This repository demonstrates how to perform LLM prompt evaluation, deterministic testing, and regression detection using PromptFoo. It is designed to help developers, researchers, and QA teams ensure that prompt or model updates produce consistent and reliable results.
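To show the shape of a typical PromptFoo setup, here is a minimal configuration sketch. It is illustrative rather than this repository's actual config: the prompt text, provider name, and test values are assumptions, and `temperature: 0` is pinned so runs are deterministic and assertion failures point to prompt or model changes rather than sampling noise.

```yaml
# promptfooconfig.yaml -- illustrative example, not this repo's real config
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  # Temperature 0 keeps outputs (mostly) deterministic, so a failed
  # assertion signals a real regression instead of random variation.
  - id: openai:gpt-4o-mini
    config:
      temperature: 0

tests:
  - description: "Summary must mention the key subject"
    vars:
      text: "PromptFoo runs prompts against fixed test cases and asserts on the outputs."
    assert:
      - type: contains
        value: "PromptFoo"
```

Running `npx promptfoo@latest eval` evaluates every prompt/provider/test combination and reports which assertions passed, so the same config can be re-run in CI after each prompt or model update.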
The repository also takes a hands-on look at Deepeval, an open-source framework for evaluating and red-teaming large language models (LLMs). It documents my journey of testing, benchmarking, and improving LLM reliability using custom prompts, metrics, and pipelines.
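Unlike PromptFoo's config-driven workflow, Deepeval tests are plain Python. Below is a minimal sketch of a metric-based test; the question, answer, and 0.7 threshold are illustrative assumptions rather than values from this repository, and the built-in `AnswerRelevancyMetric` uses an LLM judge under the hood, so an API key for the judge model must be configured.

```python
# test_relevancy.py -- a minimal Deepeval sketch; the strings and the
# 0.7 threshold are illustrative, not values from this repository.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    # Scores how relevant the model's answer is to the input question;
    # the test fails if the metric score drops below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    test_case = LLMTestCase(
        input="What does Deepeval do?",
        actual_output="Deepeval evaluates LLM outputs with pluggable metrics.",
    )
    assert_test(test_case, [metric])
```

Running `deepeval test run test_relevancy.py` executes the file like a pytest suite and prints per-metric scores, which makes it straightforward to wire into the same CI regression checks as the PromptFoo evals.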