
📘 Evaluating and Debugging Generative AI

This repository contains the Jupyter notebooks from the course Evaluating and Debugging Generative AI, created by DeepLearning.AI in collaboration with Weights & Biases.

The course explores practical ways to evaluate, debug, and improve generative AI models while leveraging W&B for experiment tracking and visualization.


📚 Course Contents

1. Instrument W&B

Learn how to set up and use Weights & Biases to track experiments, log results, and visualize performance. A minimal logging sketch follows the learning outcomes below.

Learning outcomes:

  • Set up W&B in your environment
  • Log metrics, predictions, and artifacts
  • Visualize and compare experiments effectively
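For example, basic W&B instrumentation might look like the sketch below. This is illustrative only; the project name, config values, metrics, and artifact contents are placeholders, not part of the course notebooks.

```python
# Minimal W&B instrumentation sketch; project name, config, and metrics are placeholders.
import random
import wandb

wandb.login()  # prompts for (or reads) your W&B API key

run = wandb.init(
    project="evaluating-debugging-genai",   # hypothetical project name
    config={"lr": 1e-3, "epochs": 5},
)

# Log scalar metrics from a dummy "training loop"
for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1) + random.random() * 0.05
    wandb.log({"epoch": epoch, "train_loss": train_loss})

# Log a versioned artifact (here just a small config file created on the fly)
artifact = wandb.Artifact("run-config", type="config")
with artifact.new_file("config.json") as f:
    f.write('{"lr": 0.001, "epochs": 5}')
run.log_artifact(artifact)

run.finish()
```

Once several runs are logged under the same project, the W&B UI lets you compare their metric curves and artifacts side by side.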

2. Training Diffusion Models with W&B

Understand how to train diffusion models (the backbone of many generative image models, such as Stable Diffusion) while using W&B to monitor training; a simplified training-loop sketch appears after the learning outcomes.

Learning outcomes:

  • Explain the basic idea of diffusion models
  • Train a diffusion model step by step
  • Track training progress and generated samples with W&B
  • Spot and debug issues during training
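The sketch below illustrates only the monitoring pattern: a toy DDPM-style noise-prediction loop in which a tiny convolutional net and random tensors stand in for the course's UNet and sprite dataset. The noise schedule, model, and project name are assumptions for illustration.

```python
# Toy DDPM-style training loop with W&B monitoring; model and "images" are placeholders.
import torch
import torch.nn as nn
import wandb

T = 100                                     # number of diffusion timesteps
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

model = nn.Sequential(                      # stand-in for a UNet noise predictor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

run = wandb.init(project="diffusion-training-sketch")

for step in range(200):
    x0 = torch.rand(8, 3, 16, 16)           # placeholder batch of images in [0, 1]
    t = torch.randint(0, T, (8,))
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward diffusion q(x_t | x_0)

    pred_noise = model(x_t)                 # a real UNet would also condition on t
    loss = nn.functional.mse_loss(pred_noise, noise)

    opt.zero_grad()
    loss.backward()
    opt.step()

    wandb.log({"step": step, "loss": loss.item()})
    if step % 50 == 0:
        # Periodically log images so you can eyeball what the model is seeing.
        imgs = (x_t[:4].clamp(0, 1) * 255).byte().permute(0, 2, 3, 1).numpy()
        wandb.log({"noisy_batch": [wandb.Image(img) for img in imgs]})

run.finish()
```

Watching the loss curve together with periodically logged samples is what makes training issues (diverging loss, degenerate samples) easy to spot early.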

3. Evaluating Diffusion Models

Discover methods for evaluating generative image models. Since traditional metrics like accuracy don't apply directly, you'll explore alternative approaches; a short FID example appears after the learning outcomes.

Learning outcomes:

  • Apply quantitative metrics (e.g., FID, IS) to measure image quality
  • Use visualizations for qualitative evaluation
  • Combine automated and human evaluation strategies
  • Understand the trade-offs between evaluation methods
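As one concrete example, FID can be computed with the torchmetrics implementation. This is a hedged sketch rather than the course's exact code: the random uint8 tensors stand in for batches of real and generated images, and torchmetrics additionally needs the torch-fidelity package installed.

```python
# Minimal FID sketch; random tensors stand in for real and generated image batches.
# Requires: pip install torchmetrics torch-fidelity
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# feature=64 keeps the sketch light; 2048 is the standard setting for reported FID scores.
fid = FrechetInceptionDistance(feature=64)

real = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)   # "real" images
fake = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)   # "generated" images

fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:", fid.compute().item())   # lower is better
# The score can also be logged to W&B alongside sample grids for qualitative comparison,
# e.g. wandb.log({"fid": fid.compute().item()})
```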

4. LLM Evaluation and Tracing with W&B

Dive into large language models (LLMs) and learn how to evaluate their outputs systematically; see the evaluation-logging sketch after the learning outcomes.

Learning outcomes:

  • Set up tracing to capture prompts, responses, and metadata
  • Evaluate LLM outputs for correctness, relevance, and safety
  • Log and visualize evaluations in W&B
  • Debug LLM behavior with structured traces
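A minimal sketch of the logging side is shown below, using a wandb.Table as a simplified stand-in for full trace logging. The generate() function and the keyword check are hypothetical placeholders for a real model call and a real evaluation criterion.

```python
# Minimal sketch: log prompt/response pairs plus evaluation metadata to a W&B Table.
import time
import wandb

def generate(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g. an OpenAI or Hugging Face model).
    return f"echo: {prompt}"

run = wandb.init(project="llm-eval-sketch")
table = wandb.Table(columns=["prompt", "response", "latency_ms", "contains_keyword"])

prompts = ["Define overfitting in one sentence.", "List two uses of W&B."]
for prompt in prompts:
    start = time.time()
    response = generate(prompt)
    latency_ms = (time.time() - start) * 1000
    # Toy automatic check; real evaluation might use exact match, rubrics, or an LLM judge.
    contains_keyword = "overfitting" in response.lower() or "w&b" in response.lower()
    table.add_data(prompt, response, latency_ms, contains_keyword)

run.log({"llm_evaluations": table})
run.finish()
```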

5. Fine-tuning an LLM

Get hands-on experience with fine-tuning large language models to adapt them for specific tasks; a compact fine-tuning sketch appears after the learning outcomes.

Learning outcomes:

  • Prepare and clean datasets for fine-tuning
  • Fine-tune an LLM on a downstream task
  • Track experiments and evaluate improvements with W&B
  • Reflect on when fine-tuning is (and isn’t) the right approach
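The sketch below shows one common pattern: fine-tuning a tiny causal language model with the Hugging Face Trainer while reporting metrics to W&B via report_to="wandb". The model name, toy texts, and hyperparameters are placeholders, not the course's setup.

```python
# Minimal fine-tuning sketch; tiny model and toy texts are placeholders.
# Requires: pip install transformers datasets wandb
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "sshleifer/tiny-gpt2"          # tiny model so the sketch runs quickly
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy "downstream task" data; replace with your cleaned fine-tuning dataset.
texts = ["Question: What is FID? Answer: A distance between image feature distributions.",
         "Question: What does W&B track? Answer: Metrics, artifacts, and experiments."] * 16
ds = Dataset.from_dict({"text": texts})
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
            batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="finetune-sketch",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=1,
    report_to="wandb",                      # sends training loss curves to W&B
    run_name="tiny-gpt2-finetune-sketch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```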

🎯 Notes for Learners

  • These notebooks are designed for hands-on learning.
  • You’ll need your own Weights & Biases account to run logging and tracking examples.
  • Some exercises may require external API keys (e.g., for Hugging Face or OpenAI); see the setup sketch after this list.
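A minimal setup sketch is shown below, assuming keys are kept in environment variables; the placeholder values and the OpenAI variable are illustrative, not required by every notebook.

```python
# Minimal credential setup sketch; replace the placeholder values with your own keys.
import os
import wandb

os.environ.setdefault("WANDB_API_KEY", "<your-wandb-api-key>")    # or run `wandb login` in a terminal
os.environ.setdefault("OPENAI_API_KEY", "<your-openai-api-key>")  # only if an exercise calls the OpenAI API

wandb.login()  # reads WANDB_API_KEY or prompts interactively
```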

🙏 Acknowledgements

This course and its materials are brought to you by:

  • DeepLearning.AI – advancing AI education for everyone.
  • Weights & Biases – tools for experiment tracking, model evaluation, and debugging.
