A Retrieval-Augmented Generation (RAG) chatbot that uses Ollama's phi4 model for intelligent question answering over documents.
- PDF document processing and text extraction
- Intelligent text chunking and embedding generation
- Vector-based semantic search
- RAG-based question answering
- LangGraph workflow for state management
- Configurable model parameters
- Progress tracking and error handling
- Modern Streamlit UI
The system follows a RAG architecture with the following components:
- Document Processing
  - PDF text extraction
  - Text chunking
  - Embedding generation
  - Vector store creation
- Query Processing
  - User question input
  - Context retrieval
  - Response generation
  - Answer presentation
For a detailed view of the data flow, see the RAG Flow Diagram.
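To make the flow concrete, here is a minimal sketch of the chunking, embedding, and retrieval steps. It calls Ollama's `/api/embeddings` HTTP endpoint directly and ranks chunks by brute-force cosine similarity; the chunk size, embedding model name, and input file are illustrative assumptions, not the project's actual code (which lives in `rag_chatbot/utils/`).

```python
import requests
import numpy as np

OLLAMA_BASE_URL = "http://127.0.0.1:11434"

def embed(text: str, model: str = "nomic-embed-text") -> np.ndarray:
    """Fetch an embedding from Ollama's /api/embeddings endpoint.

    The model name is an assumption; the project's actual embedding
    model is configured elsewhere.
    """
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/embeddings",
        json={"model": model, "prompt": text},
    )
    resp.raise_for_status()
    return np.array(resp.json()["embedding"])

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap (illustrative only)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def retrieve(question: str, chunks: list[str],
             index: list[np.ndarray], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question embedding."""
    q = embed(question)
    scores = [
        float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e)))
        for e in index
    ]
    top = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]
    return [chunks[i] for i in top]

# Build the index once per document, then answer queries against it.
# "document.txt" stands in for text already extracted from a PDF.
chunks = chunk_text(open("document.txt").read())
index = [embed(c) for c in chunks]
context = retrieve("What is this document about?", chunks, index)
```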
The system uses LangGraph for state management and workflow control:
- State Management
  - AgentState for tracking question, context, and answer
  - TypedDict for type-safe state handling
  - Clear state transitions
- Processing Nodes
  - Retrieve node for context gathering
  - Generate node for answer creation
  - Error handling and logging
For a detailed view of the workflow, see the LangGraph Workflow Diagram.
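A minimal sketch of that graph using LangGraph's `StateGraph` API. The node wiring mirrors the retrieve → generate flow described above; the two helper functions are stubs standing in for the project's vector-store lookup and Ollama call.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve_context(question: str) -> str:
    # Stub: in the real app this queries the vector store.
    return f"(chunks relevant to: {question})"

def generate_answer(question: str, context: str) -> str:
    # Stub: in the real app this prompts the Ollama phi4 model.
    return f"Answer to '{question}' based on {context}"

def retrieve(state: AgentState) -> dict:
    # Retrieve node: gather context for the current question.
    return {"context": retrieve_context(state["question"])}

def generate(state: AgentState) -> dict:
    # Generate node: produce the answer from question + context.
    return {"answer": generate_answer(state["question"], state["context"])}

workflow = StateGraph(AgentState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("generate", generate)
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)
app = workflow.compile()

result = app.invoke({"question": "What is RAG?"})
print(result["answer"])
```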
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/rag-chatbot.git
  cd rag-chatbot
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Install Ollama and pull the phi4 model (see the verification sketch after these steps):

  ```bash
  # Install Ollama (follow the instructions for your OS)
  # Pull the phi4 model
  ollama pull phi4
  ```

- Configure environment variables:

  ```bash
  cp .env.example .env
  # Edit .env with your settings
  ```
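Once installed, you can confirm that the server and model respond. A quick check using the `requests` library against Ollama's `/api/generate` endpoint (the prompt is arbitrary):

```python
import requests

# Ask phi4 for a single non-streamed completion to verify the setup.
resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "phi4", "prompt": "Say hello.", "stream": False},
)
resp.raise_for_status()
print(resp.json()["response"])
```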
The system uses the following environment variables:
- `OLLAMA_BASE_URL`: Base URL for the Ollama server (default: `http://127.0.0.1:11434`)
- `OLLAMA_MODEL`: Model name (default: `phi4`)
- `OLLAMA_TEMPERATURE`: Response creativity (default: `0.7`)
- `OLLAMA_TOP_P`: Response diversity (default: `0.9`)
- `OLLAMA_NUM_CTX`: Context length (default: `2048`)
- `OLLAMA_NUM_THREAD`: Thread count (default: `4`)
- `OLLAMA_STOP`: Stop sequences for response control
- `OLLAMA_REPEAT_PENALTY`: Penalty for repetitive responses
- `OLLAMA_TOP_K`: Top-k sampling parameter
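These variables map onto model parameters at startup. A minimal sketch of how they might be read with `python-dotenv`, using the defaults listed above; the `OLLAMA_CONFIG` dict is illustrative, not the project's actual code:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

# Hypothetical config dict; defaults match the table above.
OLLAMA_CONFIG = {
    "base_url": os.getenv("OLLAMA_BASE_URL", "http://127.0.0.1:11434"),
    "model": os.getenv("OLLAMA_MODEL", "phi4"),
    "temperature": float(os.getenv("OLLAMA_TEMPERATURE", "0.7")),
    "top_p": float(os.getenv("OLLAMA_TOP_P", "0.9")),
    "num_ctx": int(os.getenv("OLLAMA_NUM_CTX", "2048")),
    "num_thread": int(os.getenv("OLLAMA_NUM_THREAD", "4")),
}
```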
- Start the Ollama server:

  ```bash
  ollama serve
  ```

- Run the Streamlit application:

  ```bash
  streamlit run rag_chatbot/chatbot.py
  ```

- Upload a PDF document and start asking questions!
```
rag-chatbot/
├── docs/
│   ├── rag_flow_diagram.md
│   └── langgraph_workflow.md
├── rag_chatbot/
│   ├── chatbot.py
│   ├── models/
│   │   └── ollama_client.py
│   ├── utils/
│   │   ├── text_processor.py
│   │   ├── vector_store.py
│   │   ├── pdf_processor.py
│   │   └── logging_config.py
│   └── rag_chain.py
├── .env
├── .env.example
├── .cursorrules
├── requirements.txt
└── README.md
```
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.