The Great Gatsby RAG Chatbot

Conversational interface to discuss The Great Gatsby (book themes, symbols, characters, and more)

A Retrieval-Augmented Generation (RAG) Web App powered by LangChain and Flask

Python · Flask · LangChain · Google Cloud


🌐 Live Demo

👉 Try it here: rag-chatbot-1090137452187.us-west1.run.app


📘 Overview

This project is an AI-powered chatbot that lets users explore The Great Gatsby through interactive, AI-driven dialogue.
Using a Retrieval-Augmented Generation (RAG) pipeline built with LangChain and OpenAI, the chatbot retrieves relevant passages from the novel and generates insightful, context-aware answers.

This project demonstrates modern AI integration, vector database usage, and cloud deployment, all key skills in data engineering and applied machine learning.

The goal of this project was to build an end-to-end system that solves the "private data" problem in AI, making it a strong template for real-world applications like:

  • An internal chatbot for company documentation.
  • A customer support bot trained on product manuals.
  • A research assistant for specific academic papers.

✨ Key Features

  • πŸ” RAG pipeline Implements a full RAG pipeline that ingests text documents into a searchable ChromaDB vector database.
  • 🧠 LangChain + OpenAI GPT Uses LangChain Expression Language (LCEL) to create a sophisticated chain that understands conversational history for natural follow-up questions.
  • πŸ—‚οΈ ChromaDB used as the vector store for semantic text search
  • 🌐 Fully responsive chat UI (HTML, CSS, JavaScript) A responsive, floating chat widget built with vanilla HTML, CSS, and JavaScript, providing a modern user experience.
  • ☁️ Deployed on Google Cloud Run with secure environment variable management

🧠 Architecture

The application follows a modern RAG workflow using LangChain Expression Language (LCEL) to ensure answers are grounded in the facts from the provided book.

   User Interface (HTML/JavaScript)
            │
            ▼
   Flask Server (app.py)
            │
            ▼
   LangChain (LCEL Orchestrator)
            │
   ┌────────┴────────┐
   ▼                 ▼
ChromaDB         Chat History
(Retrieve Docs)  (Provide Context)
   │                 │
   └────────┬────────┘
            │
            ▼
      OpenAI API (GPT-3.5)
    (Augmented Generation)
            │
            ▼
   Flask Server (Returns JSON)
            │
            ▼
   User Interface (Displays Answer)
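The flow above can be sketched as a minimal, dependency-free Python pipeline. The real app composes these stages with LangChain's LCEL; the `retrieve` and `generate` functions here are illustrative stubs standing in for ChromaDB and the OpenAI API:

```python
# Minimal sketch of the RAG request flow, mirroring the diagram above.
# retrieve() and generate() are stubs; the real app uses ChromaDB and OpenAI.

def retrieve(question: str, k: int = 3) -> list[str]:
    """Stand-in for the ChromaDB similarity search."""
    corpus = {
        "green light": "Gatsby stretched out his arms toward the green light.",
        "daisy": "Her voice is full of money.",
    }
    return [text for key, text in corpus.items() if key in question.lower()][:k]

def build_prompt(question: str, docs: list[str],
                 history: list[tuple[str, str]]) -> str:
    """Augment the question with retrieved passages and chat history."""
    context = "\n".join(docs)
    past = "\n".join(f"User: {q}\nAI: {a}" for q, a in history)
    return f"Context:\n{context}\n\nHistory:\n{past}\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    """Stand-in for the OpenAI chat completion call."""
    return f"[answer grounded in]: {prompt.splitlines()[1]}"

def chat(question: str, history: list[tuple[str, str]]) -> str:
    docs = retrieve(question)
    prompt = build_prompt(question, docs, history)
    return generate(prompt)

print(chat("What does the green light symbolize?", []))
```

The key property of the design is that generation only sees retrieved passages plus history, which is what keeps answers grounded in the book rather than in the model's general knowledge.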

βš™οΈ Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/<your-username>/gatsby-rag-chatbot.git
cd gatsby-rag-chatbot

2️⃣ Create a Conda Environment (Recommended)

If you use Anaconda or Miniconda, create and activate a new environment:

conda create -n rag-chatbot python=3.10
conda activate rag-chatbot

Alternatively, you can use venv:

python3 -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Configure Environment Variables

Create a .env file in the root directory:

OPENAI_API_KEY="your-openai-api-key-here"
# Use this Chroma path for local runs
CHROMA_DB_PATH="./chroma_db"
# Use this Chroma path for production runs
# CHROMA_DB_PATH="/app/chroma_db"
OPENAI_TEMPERATURE=0.3
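In the application code, these variables would typically be read with `os.getenv` after `python-dotenv` loads the `.env` file. A small sketch (the variable names come from the file above; the values are set inline here so the snippet is self-contained):

```python
import os

# In app.py this would follow dotenv.load_dotenv(); values are
# seeded directly here so the sketch runs without a .env file.
os.environ.setdefault("CHROMA_DB_PATH", "./chroma_db")
os.environ.setdefault("OPENAI_TEMPERATURE", "0.3")

chroma_path = os.getenv("CHROMA_DB_PATH", "./chroma_db")
# Environment variables are strings, so the temperature must be cast.
temperature = float(os.getenv("OPENAI_TEMPERATURE", "0.3"))

print(chroma_path, temperature)
```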

βš™οΈ How to Run

1️⃣ Create the Knowledge Base (Run Once)

Place your book file (named book.txt) in the main project folder. Then, run the following command only once to create the vector database:

python create_database.py

This will create a chroma_db folder containing the book's knowledge base.
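Under the hood, building the knowledge base amounts to splitting `book.txt` into overlapping chunks before embedding each one into ChromaDB. A minimal chunker sketch, assuming character-based splitting (the actual script likely uses LangChain's text splitters, and the 1000/200 sizes are illustrative defaults, not values taken from this project):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap, so context
    is not lost at chunk boundaries; each chunk is later embedded
    and stored in the vector database."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

book = "x" * 2500  # stand-in for the contents of book.txt
chunks = chunk_text(book)
print(len(chunks), len(chunks[0]))
```

The overlap matters for retrieval quality: a sentence that straddles a chunk boundary still appears whole in at least one chunk.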

2️⃣ Start the Chat App

Once the database is created, run this command to start the Flask web server:

python app.py

Open your web browser and navigate to http://127.0.0.1:5000 to see the application and start chatting.


βš™οΈ Technical Highlights

🔗 Retrieval-Augmented Generation (RAG) Pipeline: Built using LangChain and Chroma to enable context-aware responses from The Great Gatsby text corpus.

🧠 OpenAI GPT Integration: Utilized ChatOpenAI for natural language generation with adjustable temperature and system prompts.

πŸ—„οΈ Vector Database Management: Embedded and stored document chunks in ChromaDB using OpenAI embeddings for efficient semantic search.

☁️ Cloud Deployment: Deployed Flask-based RAG application on Google Cloud Run, showcasing cloud scalability and CI/CD integration.

🧰 Backend Engineering: Designed RESTful /chat endpoint, implemented request/response handling, and managed environment variables securely using .env.
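The `/chat` endpoint exchanges JSON over POST. A sketch of a client helper (the field names `message` and `answer` are assumptions for illustration, not confirmed by the repo; the transport is stubbed so the snippet runs without a live server):

```python
import json

def ask(question: str, transport=None) -> str:
    """POST a question to the /chat endpoint and return the answer.
    Field names ('message', 'answer') are assumed, not taken from the repo."""
    payload = json.dumps({"message": question}).encode("utf-8")
    if transport is None:
        # A real transport would be e.g. urllib.request.urlopen(
        #     "http://127.0.0.1:5000/chat", data=payload, ...)
        raise RuntimeError("no transport configured")
    raw = transport(payload)
    return json.loads(raw)["answer"]

# Stub transport so the sketch runs without a live server.
echo = lambda payload: json.dumps(
    {"answer": f"You asked: {json.loads(payload)['message']}"}
)
print(ask("What does the green light symbolize?", transport=echo))
```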

🧪 Data Pipeline Development: Used a Jupyter notebook (create_database.ipynb) to preprocess text data and build the vector database pipeline.


πŸ“ Project Structure

.
├── app.py                  # Flask app with LangChain RAG setup
├── create_database.ipynb   # Builds Chroma vector DB from the novel
├── requirements.txt        # Project dependencies
├── templates/
│   └── index.html          # Chat UI page
└── static/
    ├── style.css           # Styling
    ├── script.js           # Chat logic
    └── images/
        └── cover.jpg       # Book cover image

🧪 Example Interaction

User: What does the green light symbolize?
AI: The green light symbolizes Gatsby's hopes and dreams, particularly his longing for Daisy and the life he envisions with her. It represents his pursuit of a future that seems just out of reach, yet he continues to strive for it. The light also symbolizes the idea of the American Dream and the elusive nature of success and happiness. Gatsby's fixation on the green light reflects his idealistic and romantic nature, as well as his belief in the possibility of a better future. Ultimately, the green light embodies both Gatsby's aspirations and the fleeting nature of dreams.

🚀 Future Improvements

  • Integrate vector embeddings with fine-tuned LLMs to improve semantic retrieval accuracy and response relevance
  • Add persistent conversation memory to enable contextual multi-turn dialogue
  • Build an automated data ingestion pipeline to index additional literary texts into the Chroma database dynamically
  • Enhance RAG pipeline performance with asynchronous retrieval and model parallelization for faster responses
  • Deploy continuous monitoring and logging (via Prometheus or Google Cloud Logging) to track latency, API usage, and model quality metrics
  • Implement authentication and user session management for a personalized AI chat experience

πŸ‘©β€πŸ’» Author

Dinnara Hitt
Data Analyst & AI/ML Practitioner
πŸ“ California, USA


πŸ“œ License

This project is licensed under the MIT License.