This repository contains the Jupyter notebooks from the course Evaluating and Debugging Generative AI, created by DeepLearning.AI in collaboration with Weights & Biases.
The course explores practical ways to evaluate, debug, and improve generative AI models while leveraging W&B for experiment tracking and visualization.
Learn how to set up and use Weights & Biases to track experiments, log results, and visualize performance.
Learning outcomes:
- Set up W&B in your environment
- Log metrics, predictions, and artifacts
- Visualize and compare experiments effectively
Understand how to train diffusion models (the backbone of many generative image models like Stable Diffusion) while using W&B to monitor training.
Learning outcomes:
- Explain the basic idea of diffusion models
- Train a diffusion model step by step
- Track training progress and generated samples with W&B
- Spot and debug issues during training
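The core of the forward (noising) process that diffusion training relies on can be sketched in a few lines of NumPy. This is a toy illustration under a standard linear beta schedule (the schedule values and array shapes are illustrative, not the course's exact configuration): the model being trained learns to predict the added noise from the noised sample.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    # Training target: predict `noise` given (xt, t), typically with MSE loss.
    return xt, noise

# Linear beta schedule (values illustrative).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))  # toy "image"
xt, eps = forward_diffuse(x0, t=500, alpha_bars=alpha_bars, rng=rng)
```

At `t=0` the sample is nearly the clean image; by `t=T-1` it is almost pure noise, which is exactly the progression worth logging to W&B while training.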
Discover methods for evaluating generative image models. Since a generated image has no single ground-truth answer, traditional metrics like accuracy don’t apply, and you’ll explore approaches designed for generative models instead.
Learning outcomes:
- Apply quantitative metrics such as Fréchet Inception Distance (FID) and Inception Score (IS) to measure image quality
- Use visualizations for qualitative evaluation
- Combine automated and human evaluation strategies
- Understand the trade-offs between evaluation methods
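As one concrete example of a quantitative metric, FID compares Gaussians fitted to Inception-network features of real and generated images. The sketch below computes the closed-form distance from precomputed feature statistics (in practice the means and covariances come from running images through Inception-v3; here they are assumed to be given):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet Inception Distance between two Gaussians:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 @ S2))."""
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # discard tiny imaginary numerical artifacts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Lower is better: identical distributions give an FID of 0, and the score grows as the generated-image statistics drift from the real ones.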
Dive into large language models (LLMs) and learn how to evaluate their outputs systematically.
Learning outcomes:
- Set up tracing to capture prompts, responses, and metadata
- Evaluate LLM outputs for correctness, relevance, and safety
- Log and visualize evaluations in W&B
- Debug LLM behavior with structured traces
Get hands-on experience with fine-tuning large language models to adapt them for specific tasks.
Learning outcomes:
- Prepare and clean datasets for fine-tuning
- Fine-tune an LLM on a downstream task
- Track experiments and evaluate improvements with W&B
- Reflect on when fine-tuning is (and isn’t) the right approach
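The dataset-preparation step can be sketched as follows. The prompt/completion format and helper name are illustrative (the exact format depends on the fine-tuning framework you use); the point is the cleaning and train/validation split:

```python
import random

def build_finetune_records(pairs, seed=0, val_fraction=0.1):
    """Turn (instruction, answer) pairs into prompt/completion records
    and split them into train and validation sets."""
    records = [
        {"prompt": f"Instruction: {q}\nAnswer:", "completion": " " + a.strip()}
        for q, a in pairs
        if q.strip() and a.strip()  # drop empty or degenerate rows
    ]
    random.Random(seed).shuffle(records)  # fixed seed for reproducibility
    n_val = max(1, int(len(records) * val_fraction))
    return records[n_val:], records[:n_val]

pairs = [("What is 2+2?", "4"), ("Capital of France?", "Paris"), ("", "bad row")]
train, val = build_finetune_records(pairs)
```

Keeping a held-out validation split is what lets you measure, in W&B, whether fine-tuning actually improved the model rather than just memorizing the training data.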
- These notebooks are designed for hands-on learning.
- You’ll need your own Weights & Biases account to run logging and tracking examples.
- Some exercises may require external APIs (e.g., Hugging Face or OpenAI).
This course and its materials are brought to you by:
- DeepLearning.AI – advancing AI education for everyone.
- Weights & Biases – tools for experiment tracking, model evaluation, and debugging.