Welcome to Augmented Generation with LLMs, a curated collection of interactive Colab notebooks exploring different approaches to enhance Large Language Model (LLM) outputs through context, cache, and retrieval-based techniques. Built using LangChain, Ollama, Vector Databases, and more, this repo demonstrates powerful patterns to improve LLM performance and memory capabilities.
### Cache-Augmented Generation

- Concept: Enhances LLM inference by reusing previously computed results with a smart cache layer.
- Tech Stack:
- Pickling for storing & retrieving cached outputs
- In-memory caching logic
- Minimal recomputation, blazing speed ⚡
- ✅ Ideal for repetitive or FAQ-style inputs; a minimal caching sketch follows below.
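The sketch assumes a locally pulled Ollama model; the cache file name, model name, and helper functions are illustrative rather than the notebook's actual code:

```python
import os
import pickle

from langchain_community.llms import Ollama  # import path may vary with your LangChain version

CACHE_PATH = "llm_cache.pkl"          # illustrative cache file, not the notebook's actual path
llm = Ollama(model="llama3")          # any model already pulled via `ollama pull`

def load_cache(path: str = CACHE_PATH) -> dict:
    """Load the prompt -> response cache from disk, or start empty."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}

def save_cache(cache: dict, path: str = CACHE_PATH) -> None:
    """Persist the cache so later runs can reuse earlier answers."""
    with open(path, "wb") as f:
        pickle.dump(cache, f)

cache = load_cache()

def cached_generate(prompt: str) -> str:
    """Return the cached answer on a hit; otherwise call the LLM once and store the result."""
    if prompt in cache:
        return cache[prompt]               # cache hit: no recomputation
    response = llm.invoke(prompt)          # cache miss: a single LLM call
    cache[prompt] = response
    save_cache(cache)
    return response

print(cached_generate("What are your support hours?"))   # FAQ-style query pays the LLM cost once
print(cached_generate("What are your support hours?"))   # served instantly from the cache
```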
### Context-Augmented Generation

- Concept: Augments responses using relevant context from previous interactions or documents.
- Tech Stack:
- Custom context memory
- LangChain’s prompt management
- Context-based generation pipeline
- 📚 Boosts response richness and continuity; see the context sketch below.
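The sketch keeps a deliberately simple in-memory history and an Ollama model; the prompt wording and function names are assumptions for illustration, not the notebook's code:

```python
from langchain_community.llms import Ollama      # import path may vary with your LangChain version
from langchain_core.prompts import PromptTemplate

llm = Ollama(model="llama3")   # illustrative model name

# A deliberately simple "context memory": a running list of prior exchanges.
history: list[str] = []

prompt = PromptTemplate.from_template(
    "Previous conversation:\n{context}\n\n"
    "Use that context where it is relevant.\n"
    "Question: {question}\nAnswer:"
)

def context_generate(question: str) -> str:
    """Build the prompt from accumulated context, call the LLM, then grow the context."""
    context = "\n".join(history) if history else "(none)"
    answer = llm.invoke(prompt.format(context=context, question=question))
    history.append(f"Q: {question}\nA: {answer}")
    return answer

print(context_generate("My order number is 1042. When will it ship?"))
print(context_generate("Can you remind me of my order number?"))   # answered from stored context
```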
### Retrieval-Augmented Generation (RAG)

- Concept: Integrates Vector DBs and Embeddings to retrieve relevant chunks from external data for precise answering.
- Tech Stack:
- LangChain + FAISS/Chroma
- Embedding models via Ollama
- Retrieval-Augmented Generation (RAG) flow
- 🔎 Perfect for knowledge-based systems and document Q&A; a RAG sketch follows below.
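The sketch uses FAISS with Ollama embeddings on a toy two-sentence corpus; the model names (`nomic-embed-text`, `llama3`) are assumptions, and import paths differ slightly between LangChain versions:

```python
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Toy corpus; the notebooks load their own documents.
raw_text = (
    "Our refund policy allows returns within 30 days of purchase. "
    "Standard shipping takes 3 to 5 business days."
)

# 1. Split the source text into chunks small enough to embed and retrieve.
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
chunks = splitter.split_text(raw_text)

# 2. Embed the chunks with an Ollama embedding model and index them in FAISS.
embeddings = OllamaEmbeddings(model="nomic-embed-text")   # any embedding model pulled via Ollama
store = FAISS.from_texts(chunks, embeddings)

# 3. Retrieve the chunks most relevant to the question and ground the answer in them.
llm = Ollama(model="llama3")
question = "How long do I have to return an item?"
docs = store.similarity_search(question, k=2)
context = "\n\n".join(doc.page_content for doc in docs)

answer = llm.invoke(
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(answer)
```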
| Tech / Tool | Purpose |
|---|---|
| LLMs | Generative responses |
| LangChain | Chaining prompts, memory, and tools |
| Ollama | Lightweight local LLMs |
| Vector DB | Fast document retrieval |
| Embeddings | Semantic search capability |
| Pickling | Output caching |
| Cache Memory | Efficient reuse of responses |
| Jupyter Notebook | Interactive development |
💡 Tip: You can replace the above links with your actual GitHub asset paths or embed Colab previews using badges.
- Open any notebook in Jupyter Notebook or Google Colab.
- Follow the instructions in each cell.
- Make sure you have the required models available via Ollama and the libraries installed (`langchain`, `faiss-cpu`, etc.).
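As a quick sanity check before running a full notebook, a short snippet like this (the model name is just an example) confirms that Ollama and LangChain are wired up:

```python
from langchain_community.llms import Ollama

# Assumes a model has already been pulled locally, e.g. `ollama pull llama3`.
llm = Ollama(model="llama3")
print(llm.invoke("Reply with the single word: ready"))
```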
Questions or contributions? Open an issue or PR anytime.