Conversational interface to discuss The Great Gatsby (book themes, symbols, characters, and more)
A Retrieval-Augmented Generation (RAG) Web App powered by LangChain and Flask
Try it here: rag-chatbot-1090137452187.us-west1.run.app
This project is an AI-powered chatbot that lets users explore The Great Gatsby through interactive dialogue.
Using a Retrieval-Augmented Generation (RAG) pipeline built with LangChain and OpenAI, the chatbot retrieves relevant passages from the novel and generates insightful, context-aware answers.
This project demonstrates modern AI integration, vector database usage, and cloud deployment, all key skills in data engineering and applied machine learning.
The goal of this project was to build an end-to-end system that addresses the "private data" problem in AI, where a general-purpose model has no access to your own documents. The same design can serve as a template for real-world applications like:
- An internal chatbot for company documentation.
- A customer support bot trained on product manuals.
- A research assistant for specific academic papers.
- RAG pipeline: Ingests text documents into a searchable ChromaDB vector database and retrieves from it at query time.
- LangChain + OpenAI GPT: Uses LangChain Expression Language (LCEL) to build a chain that takes conversation history into account, so follow-up questions work naturally.
- ChromaDB: Used as the vector store for semantic text search.
- Responsive chat UI: A floating chat widget built with vanilla HTML, CSS, and JavaScript.
- Google Cloud Run: Deployed with secure environment variable management.
The application follows a RAG workflow built with LangChain Expression Language (LCEL) so that answers are grounded in the text of the provided book.
User Interface (HTML/JavaScript)
                │
                ▼
      Flask Server (app.py)
                │
                ▼
  LangChain (LCEL Orchestrator)
                │
       ┌────────┴────────┐
       ▼                 ▼
   ChromaDB        Chat History
(Retrieve Docs)  (Provide Context)
       │                 │
       └────────┬────────┘
                │
                ▼
      OpenAI API (GPT-3.5)
     (Augmented Generation)
                │
                ▼
   Flask Server (Returns JSON)
                │
                ▼
User Interface (Displays Answer)
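The flow in the diagram can be sketched in plain Python. This is a dependency-free illustration under stated assumptions, not the project's actual code: word-overlap scoring stands in for ChromaDB's embedding search, and `tokenize`, `cosine`, `retrieve`, `build_prompt`, and the sample passages are hypothetical names and data introduced here. In the real app, the assembled prompt would be sent to the OpenAI model through the LangChain chain.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (the 'Retrieve Docs' step)."""
    q = tokenize(question)
    return sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str], history: list[tuple[str, str]]) -> str:
    """Combine retrieved passages and chat history into one prompt (the 'Augmented' step)."""
    past = "\n".join(f"{role}: {text}" for role, text in history)
    passages = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the passages below.\n"
        f"Passages:\n{passages}\n"
        f"Conversation so far:\n{past}\n"
        f"User: {question}"
    )


# Hypothetical sample passages (paraphrased placeholders, not quotes from the novel).
chunks = [
    "Gatsby stretched his arms toward a green light at the end of a dock.",
    "The valley of ashes lies between West Egg and New York.",
    "Daisy and Tom live in East Egg.",
]
history = [("User", "Who is Gatsby?"), ("AI", "He is the novel's title character.")]
question = "What is the green light?"
prompt = build_prompt(question, retrieve(question, chunks, k=1), history)
print(prompt)
```

Keeping retrieval and prompt assembly as separate functions mirrors the two branches in the diagram, so either one could be swapped for a real vector store or a different prompt template without touching the other.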
git clone https://github.com/<your-username>/gatsby-rag-chatbot.git
cd gatsby-rag-chatbot
If you use Anaconda or Miniconda, create and activate a new environment:
conda create -n rag-chatbot python=3.10
conda activate rag-chatbot
Alternatively, you can use venv:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the root directory:
OPENAI_API_KEY="your open ai api key here"
#use chroma path below for local run
CHROMA_DB_PATH="./chroma_db"
#use chroma path below for production run
#CHROMA_DB_PATH="/app/chroma_db"
OPENAI_TEMPERATURE=0.3
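As a rough illustration of how these settings might be consumed at startup, the sketch below reads them from a mapping such as `os.environ`. It assumes the variables have already been loaded into the process environment (how the project does this is not shown here), and `load_settings` and its default values are hypothetical, chosen only to mirror the sample file.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read the three settings from the sample .env, failing fast if the API key is missing."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        # The defaults below are assumptions that mirror the sample file.
        "chroma_path": env.get("CHROMA_DB_PATH", "./chroma_db"),
        "temperature": float(env.get("OPENAI_TEMPERATURE", "0.3")),
    }


# A plain dict stands in for the real environment so the example runs anywhere.
settings = load_settings({"OPENAI_API_KEY": "sk-placeholder", "OPENAI_TEMPERATURE": "0.3"})
print(settings["chroma_path"], settings["temperature"])
```

Failing early on a missing key tends to give a clearer error than letting the first API call fail later.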
Place your book file (named book.txt) in the main project folder. Then, run the following command only once to create the vector database:
python create_database.py
This will create a chroma_db folder containing the book's knowledge base.
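Building a vector database usually starts by splitting the source text into overlapping chunks before embedding them. The sketch below shows that one step only. The function name `split_into_chunks`, the chunk size, and the overlap are assumptions for illustration; the splitter and parameters the project actually uses are not stated in this README.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


sample = "abcdefghij" * 3  # 30 characters standing in for the contents of book.txt
chunks = split_into_chunks(sample, chunk_size=12, overlap=4)
for chunk in chunks:
    print(len(chunk), chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which generally helps retrieval.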
Once the database is created, run this command to start the Flask web server:
python app.py
Open your web browser and navigate to http://127.0.0.1:5000 to see the application and start chatting.
Retrieval-Augmented Generation (RAG) Pipeline: Built using LangChain and Chroma to enable context-aware responses from The Great Gatsby text corpus.
OpenAI GPT Integration: Utilized ChatOpenAI for natural language generation with adjustable temperature and system prompts.
Vector Database Management: Embedded and stored document chunks in ChromaDB using OpenAI embeddings for efficient semantic search.
Cloud Deployment: Deployed Flask-based RAG application on Google Cloud Run, showcasing cloud scalability and CI/CD integration.
Backend Engineering: Designed RESTful /chat endpoint, implemented request/response handling, and managed environment variables securely using .env.
Data Pipeline Development: Used a Jupyter notebook (create_database.ipynb) to preprocess text data and build the vector database pipeline.
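The request/response handling behind the /chat endpoint can be illustrated without Flask. The sketch below is framework-agnostic and rests on assumptions: the JSON field names `message` and `answer`, the function name `handle_chat`, and the error messages are hypothetical and not confirmed by the project. In a Flask app, the same logic would sit inside the route function.

```python
import json
from typing import Callable


def handle_chat(raw_body: str, answer_fn: Callable[[str], str]) -> tuple[int, str]:
    """Validate a JSON chat request and return (HTTP status, JSON response body)."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "Request body must be valid JSON."})
    # "message" and "answer" are assumed field names, not confirmed by the project.
    message = payload.get("message", "") if isinstance(payload, dict) else ""
    if not isinstance(message, str) or not message.strip():
        return 400, json.dumps({"error": "Field 'message' is required."})
    return 200, json.dumps({"answer": answer_fn(message.strip())})


# A stub stands in for the RAG chain so the example runs offline.
status, body = handle_chat('{"message": "Who is Nick?"}', lambda q: f"(answer to: {q})")
print(status, body)
```

Passing the answering function in as a parameter keeps the validation logic testable without an API key or a running vector store.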
.
├── app.py                   # Flask app with LangChain RAG setup
├── create_database.ipynb    # Builds Chroma vector DB from the novel
├── requirements.txt         # Project dependencies
├── templates/
│   └── index.html           # Chat UI page
├── static/
│   ├── style.css            # Styling
│   ├── script.js            # Chat logic
│   └── images/
│       └── cover.jpg        # Book cover image
User: What does the green light symbolize?
AI: The green light symbolizes Gatsby's hopes and dreams, particularly his longing for Daisy and the life he envisions with her. It represents his pursuit of a future that seems just out of reach, yet he continues to strive for it. The light also symbolizes the idea of the American Dream and the elusive nature of success and happiness. Gatsby's fixation on the green light reflects his idealistic and romantic nature, as well as his belief in the possibility of a better future. Ultimately, the green light embodies both Gatsby's aspirations and the fleeting nature of dreams.
- Integrate vector embeddings with fine-tuned LLMs to improve semantic retrieval accuracy and response relevance
- Add persistent conversation memory to enable contextual multi-turn dialogue
- Build an automated data ingestion pipeline to index additional literary texts into the Chroma database dynamically
- Enhance RAG pipeline performance with asynchronous retrieval and model parallelization for faster responses
- Deploy continuous monitoring and logging (via Prometheus or Google Cloud Logging) to track latency, API usage, and model quality metrics
- Implement authentication and user session management for a personalized AI chat experience
Dinnara Hitt
Data Analyst & AI/ML Practitioner
California, USA
This project is licensed under the MIT License.