diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 53e34bee..c435f01d 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -50,4 +50,4 @@ jobs:
           GCP_REGION: ${{ secrets.GCP_REGION }}
           GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
         run: |
-          pytest --verbose --nbval-lax python-recipes/RAG/ python-recipes/vector-search python-recipes/redis-intro --ignore python-recipes/RAG/05_nvidia_ai_rag_redis.ipynb
\ No newline at end of file
+          pytest --verbose --nbval-lax python-recipes/RAG/ python-recipes/vector-search python-recipes/redis-intro --ignore python-recipes/RAG/05_nvidia_ai_rag_redis.ipynb --ignore python-recipes/semantic-cache/doc2cache_llama3_1.ipynb
diff --git a/README.md b/README.md
index 377064f4..d61f8410 100644
--- a/README.md
+++ b/README.md
@@ -83,6 +83,7 @@ An estimated 31% of LLM queries are potentially redundant ([source](https://arxi
 | Recipe | Description |
 | --- | --- |
+| [/semantic-cache/doc2cache_llama3_1.ipynb](python-recipes/semantic-cache/doc2cache_llama3_1.ipynb) | Build a semantic cache using the Doc2Cache framework and Llama3.1 |
 | [/semantic-cache/semantic_caching_gemini.ipynb](python-recipes/semantic-cache/semantic_caching_gemini.ipynb) | Build a semantic cache with Redis and Google Gemini |
 
 ## Advanced RAG
diff --git a/assets/Doc2Cache.png b/assets/Doc2Cache.png
new file mode 100644
index 00000000..b874751b
Binary files /dev/null and b/assets/Doc2Cache.png differ
diff --git a/python-recipes/semantic-cache/doc2cache_llama3_1.ipynb b/python-recipes/semantic-cache/doc2cache_llama3_1.ipynb
new file mode 100644
index 00000000..2079cf37
--- /dev/null
+++ b/python-recipes/semantic-cache/doc2cache_llama3_1.ipynb
@@ -0,0 +1,1478 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "aiHV6ip2f5id"
+   },
+   "source": [
+    "# Doc2Cache w/ Llama3.1\n",
+    "\n",
+    "This recipe demonstrates how to convert a PDF document into a set of pre-defined FAQs that can be used to populate an LLM [Semantic Cache](https://www.redisvl.com/user_guide/llmcache_03.html), using the Llama3.1 model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "np5gjopbPBS9"
+   },
+   "source": [
+    "## Motivation\n",
+    "\n",
+    "As a framework, `Doc2Cache` solves three main problems faced by AI engineers optimizing RAG pipelines:\n",
+    "\n",
+    "1. How do you get the benefits of semantic caching from day one, without waiting for large volumes of production user traffic to accumulate?\n",
+    "2. How do you make sure that the semantic cache contains valid, factual data?\n",
+    "3. How can you test the quality of a semantic cache without a large set of \"ground truth\" (labeled) data?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Architecture\n",
+    "\n",
+    "`Doc2Cache` is an end-to-end workflow with a few stages (sketched in pseudocode below):\n",
+    "\n",
+    "- Smaller document chunks are extracted from knowledge base documents (PDFs)\n",
+    "- Each chunk is presented to the Llama3.1 LLM along with a specialized prompt to extract FAQs\n",
+    "- Generated FAQs are embedded using an embedding model\n",
+    "- FAQ embeddings are loaded into a Redis semantic cache instance\n",
+    "\n",
+    "![doc2cache](../../assets/Doc2Cache.png)"
+   ]
+  },
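+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In pseudocode, the workflow implemented in the rest of this notebook boils down to the following (the helper names here are illustrative placeholders, not a library API):\n",
+    "\n",
+    "```python\n",
+    "# Illustrative Doc2Cache pipeline sketch (hypothetical helper names)\n",
+    "chunks = split_into_chunks(load_pdf(doc_path))           # 1. chunk the PDF\n",
+    "faqs = [f for c in chunks for f in llm_extract_faqs(c)]  # 2. LLM extracts FAQs per chunk\n",
+    "for faq in faqs:                                         # 3-4. embed + load into Redis\n",
+    "    cache.store(prompt=faq[\"question\"], response=faq[\"answer\"])\n",
+    "```"
+   ]
+  },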
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "37rbBPKdL09o"
+   },
+   "source": [
+    "## 1. Setup\n",
+    "Before we begin, we must install the required libraries, initialize the LLM instance, create a Redis database, and initialize other required components.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "7MMyrImcPPBo"
+   },
+   "source": [
+    "### Install required libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true,
+    "id": "pc-IxYu3wnQm",
+    "jupyter": {
+     "outputs_hidden": true
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# NBVAL_SKIP\n",
+    "!pip install \"redisvl>=0.3.3\" \"unstructured[pdf]\" sentence-transformers openai\n",
+    "!pip install langchain-core langchain-community pypdf rapidocr-onnxruntime"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'status': 'ok', 'restart': True}"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import IPython\n",
+    "\n",
+    "app = IPython.Application.instance()\n",
+    "app.kernel.do_shutdown(True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Init Llama3.1 model with vLLM"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Set key variables necessary for downloading model weights\n",
+    "\n",
+    "import os\n",
+    "\n",
+    "HUGGING_FACE_HUB_TOKEN = os.getenv(\"HUGGING_FACE_HUB_TOKEN\")\n",
+    "MODEL_NAME = \"meta-llama/Meta-Llama-3.1-8B-Instruct\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the Llama3.1 model in a vLLM Docker container:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "89f939208d2851e56000957da56813a1d641bf5dd3197291c15f79d3bf51dc56\n"
+     ]
+    }
+   ],
+   "source": [
+    "!docker run -d \\\n",
+    "    --runtime nvidia --gpus all \\\n",
+    "    -v ~/.cache/huggingface:/root/.cache/huggingface \\\n",
+    "    --env \"HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN\" \\\n",
+    "    -p 8000:8000 --ipc=host vllm/vllm-openai:latest \\\n",
+    "    --model $MODEL_NAME \\\n",
+    "    --gpu-memory-utilization 0.95 \\\n",
+    "    --max-model-len 36640"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "THynXaAhPRNb"
+   },
+   "source": [
+    "### Connect to LLM and vectorizer instances"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "id": "4DvCWkLrUIbG",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain_community.llms import VLLMOpenAI\n",
+    "\n",
+    "# Create LLM instance\n",
+    "llama = VLLMOpenAI(\n",
+    "    openai_api_key=\"EMPTY\",\n",
+    "    openai_api_base=\"http://localhost:8000/v1\",\n",
+    "    model_name=MODEL_NAME,\n",
+    "    temperature=0.1\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {
+    "id": "Kge8K1TQPoaD",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from redisvl.utils.vectorize import HFTextVectorizer\n",
+    "\n",
+    "# Ensure the tmp cache directory exists\n",
+    "os.makedirs('/tmp/huggingface', exist_ok=True)\n",
+    "\n",
+    "class Vectorizer(HFTextVectorizer):\n",
+    "    def _initialize_client(self, model: str):\n",
+    "        \"\"\"Setup the HuggingFace client\"\"\"\n",
+    "        # Lazy import of the sentence-transformers module\n",
+    "        try:\n",
+    "            from sentence_transformers import SentenceTransformer\n",
+    "        except ImportError:\n",
+    "            raise ImportError(\n",
+    "                \"HFTextVectorizer requires the sentence-transformers library. \"\n",
+    "                \"Please install with `pip install sentence-transformers`\"\n",
+    "            )\n",
+    "\n",
+    "        self._client = SentenceTransformer(model, cache_folder='/tmp/huggingface/transformers')\n",
+    "\n",
+    "\n",
+    "vectorizer = Vectorizer(\"sentence-transformers/all-mpnet-base-v2\")"
+   ]
+  },
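+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As an optional sanity check (not part of the original recipe), we can embed a sample sentence and confirm the vector dimensionality matches what `all-mpnet-base-v2` is documented to produce (768):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional sanity check: embed one sentence and inspect the vector size\n",
+    "test_vector = vectorizer.embed(\"What is a semantic cache?\")\n",
+    "print(len(test_vector))  # expect 768 for all-mpnet-base-v2"
+   ]
+  },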
" raise ImportError(\n", + " \"HFTextVectorizer requires the sentence-transformers library. \"\n", + " \"Please install with `pip install sentence-transformers`\"\n", + " )\n", + "\n", + " self._client = SentenceTransformer(model, cache_folder='/tmp/huggingface/transformers')\n", + "\n", + "\n", + "vectorizer = Vectorizer(\"sentence-transformers/all-mpnet-base-v2\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p5kx9ePDwwp6", + "tags": [] + }, + "source": [ + "#### Run Redis locally\n", + "If you have a Redis db running elsewhere with [Redis Stack](https://redis.io/docs/about/about-stack/) installed, you don't need to run it on this machine. You can skip to the \"Connect to Redis server\" step." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "vs4KZURX4XpT", + "outputId": "04ed262d-8751-4b69-bda4-e519ebe324e2", + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unable to find image 'redis/redis-stack-server:latest' locally\n", + "latest: Pulling from redis/redis-stack-server\n", + "\n", + "\u001b[1B021b0277: Pulling fs layer \n", + "\u001b[1B764663d7: Pulling fs layer \n", + "\u001b[1Bb700ef54: Pulling fs layer \n", + "\u001b[1B2c477937: Pulling fs layer \n", + "\u001b[1B310e49ba: Pulling fs layer \n", + "\u001b[1B2f33031a: Pulling fs layer \n", + "\u001b[1B9eb144bd: Pulling fs layer \n", + "\u001b[1B77c6ca59: Pulling fs layer \n", + "\u001b[1Ba0f7b647: Pulling fs layer \n", + "\u001b[1B1312cb2e: Pulling fs layer \n", + "\u001b[1BDigest: sha256:887cf87cc744e4588ccade336d0dbb943e4e46330f738653ccb3a7a55df2f1862K\u001b[5A\u001b[2K\u001b[11A\u001b[2K\u001b[4A\u001b[2K\u001b[11A\u001b[2K\u001b[7A\u001b[2K\u001b[11A\u001b[2K\u001b[4A\u001b[2K\u001b[4A\u001b[2K\u001b[11A\u001b[2K\u001b[7A\u001b[2K\u001b[11A\u001b[2K\u001b[3A\u001b[2K\u001b[2A\u001b[2K\u001b[11A\u001b[2K\u001b[7A\u001b[2K\u001b[11A\u001b[2K\u001b[7A\u001b[2K\u001b[10A\u001b[2K\u001b[10A\u001b[2K\u001b[10A\u001b[2K\u001b[10A\u001b[2K\u001b[7A\u001b[2K\u001b[7A\u001b[2K\u001b[7A\u001b[2K\u001b[7A\u001b[2K\u001b[8A\u001b[2K\u001b[7A\u001b[2K\u001b[8A\u001b[2K\u001b[8A\u001b[2K\u001b[7A\u001b[2K\u001b[7A\u001b[2K\u001b[7A\u001b[2K\u001b[7A\u001b[2K\u001b[7A\u001b[2K\u001b[6A\u001b[2K\u001b[4A\u001b[2K\u001b[4A\u001b[2K\u001b[4A\u001b[2K\u001b[4A\u001b[2K\u001b[4A\u001b[2K\u001b[4A\u001b[2K\u001b[4A\u001b[2K\u001b[4A\u001b[2K\u001b[3A\u001b[2K\u001b[1A\u001b[2K\n", + "Status: Downloaded newer image for redis/redis-stack-server:latest\n", + "6ff8add913c50902aca6df15b28a53935eebbaf12a3ad32f190f0efcd76f9e0c\n" + ] + } + ], + "source": [ + "!docker run -d --name my-redis-stack -p 6379:6379 redis/redis-stack-server:latest" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VsNJn75A0VJm" + }, + "source": [ + "## 2. Implement Doc2Cache workflow\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "HG_w_jPSmXVb", + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain_community.document_loaders import PyPDFLoader\n", + "from langchain_text_splitters import RecursiveCharacterTextSplitter" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "BtTSE4ZmmDt5", + "outputId": "d5324418-4654-4e13-a8e2-57c0370bf36e", + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Done preprocessing. 
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "BtTSE4ZmmDt5",
+    "outputId": "d5324418-4654-4e13-a8e2-57c0370bf36e",
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Done preprocessing. Created 123 chunks of the original pdf ../RAG/resources/amzn-10k-2023.pdf\n"
+     ]
+    }
+   ],
+   "source": [
+    "doc_path = \"../RAG/resources/amzn-10k-2023.pdf\"\n",
+    "\n",
+    "# set up the file loader/extractor and text splitter to create chunks\n",
+    "text_splitter = RecursiveCharacterTextSplitter(\n",
+    "    chunk_size=3200, chunk_overlap=50\n",
+    ")\n",
+    "loader = PyPDFLoader(doc_path, extract_images=True)\n",
+    "\n",
+    "# extract, load, and make chunks\n",
+    "chunks = loader.load_and_split(text_splitter)\n",
+    "\n",
+    "print(\"Done preprocessing. Created\", len(chunks), \"chunks of the original pdf\", doc_path)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "collapsed": true,
+    "id": "rRh5Dw5jmEBr",
+    "jupyter": {
+     "outputs_hidden": true
+    },
+    "outputId": "fa442464-8193-4e02-c5dd-749a7fdb22ad",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "for chunk in chunks:\n",
+    "    print(\"############\", \"\\n\", chunk.page_content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "cLxoFY9Y1iIv"
+   },
+   "source": [
+    "#### Extract FAQs with Llama3.1\n",
+    "\n",
+    "First we will define a chain that prompts the LLM to extract FAQs from each chunk as a structured JSON object."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "metadata": {
+    "id": "JfJ_DdvL1hdQ",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain_core.output_parsers import JsonOutputParser\n",
+    "from langchain_core.prompts import PromptTemplate\n",
+    "from pydantic import BaseModel, Field\n",
+    "from typing import List\n",
+    "\n",
+    "\n",
+    "class QuestionAnswer(BaseModel):\n",
+    "    question: str = Field(description=\"Frequently asked question about information in the document.\")\n",
+    "    answer: str = Field(description=\"Factual answer from the LLM related to the user question.\")\n",
+    "\n",
+    "class FAQs(BaseModel):\n",
+    "    faqs: List[QuestionAnswer] = Field(description=\"List of question/answer pairs extracted from the document\")\n",
+    "\n",
+    "\n",
+    "# Set up a parser + inject instructions into the prompt template.\n",
+    "json_parser = JsonOutputParser(pydantic_object=FAQs)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "metadata": {
+    "id": "5IiIEF1G1wLM",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "prompt = PromptTemplate(\n",
+    "    template=\"\"\"\n",
+    "    You are a document intelligence tool used to extract FAQs\n",
+    "    from portions of a financial 10-K SEC filing for Amazon.\n",
+    "\n",
+    "    You will be given a small chunk from the doc, and your task is to extract\n",
+    "    possible frequently asked questions derived straight from the content.\n",
+    "    Put yourself in the shoes of a potential human reader and anticipate what\n",
+    "    real-world questions they might have.\n",
+    "\n",
+    "    You must reply with only a JSON object that captures the structured output\n",
+    "    according to the following schema. No exceptions:\n",
+    "\n",
+    "    {format_instructions}\n",
+    "\n",
+    "    Document Chunk:\\n{doc}\\n\n",
+    "\n",
+    "    FAQs JSON: \n",
+    "    \"\"\",\n",
+    "    input_variables=[\"doc\"],\n",
+    "    partial_variables={\"format_instructions\": json_parser.get_format_instructions()},\n",
+    ")\n",
+    "\n",
+    "doc2cache = prompt | llama | json_parser"
+   ]
+  },
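+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `|` operator composes a LangChain Expression Language (LCEL) chain: the template is filled in, sent to Llama3.1, and the raw completion is parsed into a Python dict. To inspect the schema guidance that gets injected into `{format_instructions}`, you can print it (an optional step, not required by the workflow):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional: inspect the JSON schema instructions appended to every prompt\n",
+    "print(json_parser.get_format_instructions())"
+   ]
+  },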
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "u5lsMVKc2TsF"
+   },
+   "source": [
+    "Let's test this out with a sample document from Wikipedia first."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "metadata": {
+    "id": "rKibkgR_2W28",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "sample_doc = \"\"\"Obi-Wan Kenobi (Ewan McGregor) is a young apprentice Jedi knight\n",
+    "under the tutelage of Qui-Gon Jinn (Liam Neeson) ; Anakin Skywalker (Jake Lloyd),\n",
+    "who will later father Luke Skywalker and become known as Darth Vader, is just\n",
+    "a 9-year-old boy. When the Trade Federation cuts off all routes to the planet\n",
+    "Naboo, Qui-Gon and Obi-Wan are assigned to settle the matter.\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "metadata": {
+    "id": "VAhZYyf22bW9",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "faqs = doc2cache.invoke({\"doc\": sample_doc})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "XjsEhdjl2dDj",
+    "outputId": "2f964435-5c52-452b-b148-4e705d4dd1fc",
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'faqs': [{'question': 'Who is Obi-Wan Kenobi?',\n",
+       "   'answer': 'Obi-Wan Kenobi is a young apprentice Jedi knight.'},\n",
+       "  {'question': 'Who is Qui-Gon Jinn?',\n",
+       "   'answer': 'Qui-Gon Jinn is a Jedi knight.'},\n",
+       "  {'question': 'Who is Anakin Skywalker?',\n",
+       "   'answer': 'Anakin Skywalker is a 9-year-old boy who will later father Luke Skywalker and become known as Darth Vader.'}]}"
+      ]
+     },
+     "execution_count": 44,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "faqs"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "oSRYapAa4Oue"
+   },
+   "source": [
+    "Now we can apply this same logic to the chunks from our PDF document."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "metadata": {
+    "id": "vOSHH6HO4TpV",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "def extract_faqs(chunks):\n",
+    "    all_faqs = []\n",
+    "    for i, chunk in enumerate(chunks):\n",
+    "        print(f\"Processing chunk {i+1} of {len(chunks)}\", flush=True)\n",
+    "        try:\n",
+    "            results = doc2cache.invoke({\"doc\": chunk.page_content})\n",
+    "        except Exception as e:\n",
+    "            print(\"..Skipping chunk due to error decoding LLM response\", str(e))\n",
+    "            continue  # skip this chunk instead of reusing stale results\n",
+    "        if results and results.get(\"faqs\"):\n",
+    "            all_faqs.extend(results[\"faqs\"])\n",
+    "    return all_faqs"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "collapsed": true,
+    "id": "vIepVu60Jmcq",
+    "jupyter": {
+     "outputs_hidden": true
+    },
+    "outputId": "42bd5864-2d2d-4ba2-ea44-a1c2f04ef967",
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Processing chunk 1 of 123\n",
+      "Processing chunk 2 of 123\n",
+      "Processing chunk 3 of 123\n",
+      "Processing chunk 4 of 123\n",
+      "Processing chunk 5 of 123\n",
+      "Processing chunk 6 of 123\n",
+      "Processing chunk 7 of 123\n",
+      "Processing chunk 8 of 123\n",
+      "Processing chunk 9 of 123\n",
+      "Processing chunk 10 of 123\n",
+      "..Skipping chunk due to error decoding LLM response Invalid json output: {\"faqs\": [\n",
+      "  {\"question\": \"What are the challenges faced by Amazon in the competitive market?\", \"answer\": \"Amazon faces challenges such as competitors entering into business combinations or alliances, established companies expanding to new market segments, and new technologies increasing competition.\"},\n",
+      "  {\"question\": \"What are the risks associated with Amazon's expansion into new 
products and services?\", \"answer\": \"Amazon's expansion into new products and services is subject to risks such as limited experience in new market segments, customer adoption, service disruptions, delays, setbacks, or failures or quality issues, and potential write-down or write-off of investments.\"},\n", + " {\"question\": \"What are the potential consequences of Amazon's failure to realize the benefits of its investments in new technologies, products, or services?\", \"answer\": \"If Amazon fails to realize the benefits of its investments in new technologies, products, or services, the value of those investments may be written down or written off.\"}\n", + " ]}\n", + " \n", + " Here is the output schema:\n", + " {\n", + " \"$defs\": {\n", + " \"QuestionAnswer\": {\n", + " \"properties\": {\n", + " \"question\": {\n", + " \"description\": \"Frequently asked question about information in the document.\",\n", + " \"title\": \"Question\",\n", + " \"type\": \"string\"\n", + " },\n", + "Processing chunk 11 of 123\n", + "Processing chunk 12 of 123\n", + "Processing chunk 13 of 123\n", + "Processing chunk 14 of 123\n", + "Processing chunk 15 of 123\n", + "Processing chunk 16 of 123\n", + "Processing chunk 17 of 123\n", + "Processing chunk 18 of 123\n", + "Processing chunk 19 of 123\n", + "Processing chunk 20 of 123\n", + "Processing chunk 21 of 123\n", + "Processing chunk 22 of 123\n", + "Processing chunk 23 of 123\n", + "Processing chunk 24 of 123\n", + "Processing chunk 25 of 123\n", + "Processing chunk 26 of 123\n", + "Processing chunk 27 of 123\n", + "Processing chunk 28 of 123\n", + "Processing chunk 29 of 123\n", + "Processing chunk 30 of 123\n", + "Processing chunk 31 of 123\n", + "Processing chunk 32 of 123\n", + "Processing chunk 33 of 123\n", + "Processing chunk 34 of 123\n", + "Processing chunk 35 of 123\n", + "Processing chunk 36 of 123\n", + "Processing chunk 37 of 123\n", + "Processing chunk 38 of 123\n", + "Processing chunk 39 of 123\n", + "Processing chunk 40 of 123\n", + "Processing chunk 41 of 123\n", + "Processing chunk 42 of 123\n", + "Processing chunk 43 of 123\n", + "Processing chunk 44 of 123\n", + "Processing chunk 45 of 123\n", + "Processing chunk 46 of 123\n", + "Processing chunk 47 of 123\n", + "Processing chunk 48 of 123\n", + "Processing chunk 49 of 123\n", + "Processing chunk 50 of 123\n", + "Processing chunk 51 of 123\n", + "Processing chunk 52 of 123\n", + "Processing chunk 53 of 123\n", + "Processing chunk 54 of 123\n", + "Processing chunk 55 of 123\n", + "Processing chunk 56 of 123\n", + "Processing chunk 57 of 123\n", + "Processing chunk 58 of 123\n", + "Processing chunk 59 of 123\n", + "Processing chunk 60 of 123\n", + "Processing chunk 61 of 123\n", + "Processing chunk 62 of 123\n", + "Processing chunk 63 of 123\n", + "Processing chunk 64 of 123\n", + "Processing chunk 65 of 123\n", + "Processing chunk 66 of 123\n", + "Processing chunk 67 of 123\n", + "Processing chunk 68 of 123\n", + "Processing chunk 69 of 123\n", + "Processing chunk 70 of 123\n", + "Processing chunk 71 of 123\n", + "Processing chunk 72 of 123\n", + "Processing chunk 73 of 123\n", + "Processing chunk 74 of 123\n", + "Processing chunk 75 of 123\n", + "Processing chunk 76 of 123\n", + "Processing chunk 77 of 123\n", + "Processing chunk 78 of 123\n", + "Processing chunk 79 of 123\n", + "Processing chunk 80 of 123\n", + "Processing chunk 81 of 123\n", + "Processing chunk 82 of 123\n", + "Processing chunk 83 of 123\n", + "Processing chunk 84 of 123\n", + "Processing chunk 85 of 123\n", + 
"Processing chunk 86 of 123\n", + "Processing chunk 87 of 123\n", + "Processing chunk 88 of 123\n", + "Processing chunk 89 of 123\n", + "Processing chunk 90 of 123\n", + "Processing chunk 91 of 123\n", + "Processing chunk 92 of 123\n", + "Processing chunk 93 of 123\n", + "Processing chunk 94 of 123\n", + "Processing chunk 95 of 123\n", + "Processing chunk 96 of 123\n", + "Processing chunk 97 of 123\n", + "Processing chunk 98 of 123\n", + "Processing chunk 99 of 123\n", + "Processing chunk 100 of 123\n", + "Processing chunk 101 of 123\n", + "Processing chunk 102 of 123\n", + "Processing chunk 103 of 123\n", + "Processing chunk 104 of 123\n", + "Processing chunk 105 of 123\n", + "Processing chunk 106 of 123\n", + "Processing chunk 107 of 123\n", + "Processing chunk 108 of 123\n", + "Processing chunk 109 of 123\n", + "Processing chunk 110 of 123\n", + "Processing chunk 111 of 123\n", + "Processing chunk 112 of 123\n", + "Processing chunk 113 of 123\n", + "Processing chunk 114 of 123\n", + "Processing chunk 115 of 123\n", + "Processing chunk 116 of 123\n", + "Processing chunk 117 of 123\n", + "Processing chunk 118 of 123\n", + "Processing chunk 119 of 123\n", + "Processing chunk 120 of 123\n", + "Processing chunk 121 of 123\n", + "Processing chunk 122 of 123\n", + "Processing chunk 123 of 123\n" + ] + } + ], + "source": [ + "faqs = extract_faqs(chunks)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IhTEt-2HO-cz", + "outputId": "9a7489f7-e3f8-47bb-ba4b-1dbcb80c20d1", + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Generated 534 frequently asked questions.\n" + ] + } + ], + "source": [ + "print(\"Generated\", len(faqs), \"frequently asked questions.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-LBrSPkJKhV-", + "outputId": "7a59c6d0-fbc1-4ce6-de83-224f2e965c84", + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'question': \"What factors can cause demand for Amazon's products and services to fluctuate?\",\n", + " 'answer': 'Seasonality, promotions, product launches, unforeseeable events such as recessionary fears, natural or human-caused disasters, extreme weather, or geopolitical events.'},\n", + " {'question': \"What are the potential consequences of Amazon's failure to stock or restock popular products?\",\n", + " 'answer': 'Significant affect on revenue and future growth.'},\n", + " {'question': 'What are the potential consequences of Amazon overstocking products?',\n", + " 'answer': 'Significant inventory markdowns or write-offs and commitment costs, which could materially reduce profitability.'},\n", + " {'question': \"What are the potential consequences of Amazon's websites experiencing system interruptions?\",\n", + " 'answer': 'Reduced volume of goods offered or sold and the attractiveness of products and services.'},\n", + " {}]" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "faqs[50:55]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JaZe-6xILHPS" + }, + "source": [ + "#### Index FAQs into Redis\n", + "\n", + "Now we will create embeddings of each prompt and load them into Redis for our semantic cache." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "OETsrYvfuzmX",
+    "outputId": "da469dda-f4ca-45f1-e10d-7ef6fc31b65b",
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from redisvl.extensions.llmcache import SemanticCache\n",
+    "\n",
+    "\n",
+    "def to_semantic_cache(faqs: list) -> SemanticCache:\n",
+    "    \"\"\"Convert a list of FAQs into a semantic cache instance.\"\"\"\n",
+    "    cache = SemanticCache(\n",
+    "        name=\"amzn_10k_cache\",\n",
+    "        redis_url=\"redis://localhost:6379\",  # point to your own Redis URL if necessary\n",
+    "        vectorizer=vectorizer,\n",
+    "        distance_threshold=0.2\n",
+    "    )\n",
+    "    for faq in faqs:\n",
+    "        # skip empty or malformed FAQ entries returned by the LLM\n",
+    "        if faq and \"question\" in faq and \"answer\" in faq:\n",
+    "            cache.store(\n",
+    "                prompt=faq[\"question\"],\n",
+    "                response=faq[\"answer\"]\n",
+    "            )\n",
+    "    return cache"
+   ]
+  },
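+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note the `distance_threshold=0.2` setting above: a lookup only counts as a cache hit when the vector distance (cosine, by default) between the query embedding and a stored prompt embedding is at or below this threshold. The snippet below (an optional aside, not part of the original workflow) measures the distance between two paraphrased questions to build intuition for that number:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional aside: cosine distance between two paraphrases of the same question\n",
+    "import numpy as np\n",
+    "\n",
+    "a = np.array(vectorizer.embed(\"How many employees does Amazon have?\"))\n",
+    "b = np.array(vectorizer.embed(\"What is Amazon's total employee count?\"))\n",
+    "cos_dist = 1 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))\n",
+    "print(cos_dist)  # paraphrases typically land well below the 0.2 threshold"
+   ]
+  },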
"219\n", + "220\n", + "221\n", + "222\n", + "223\n", + "224\n", + "225\n", + "226\n", + "227\n", + "228\n", + "229\n", + "230\n", + "231\n", + "232\n", + "233\n", + "234\n", + "235\n", + "236\n", + "237\n", + "238\n", + "239\n", + "240\n", + "241\n", + "242\n", + "243\n", + "244\n", + "245\n", + "246\n", + "247\n", + "248\n", + "249\n", + "250\n", + "251\n", + "252\n", + "253\n", + "254\n", + "255\n", + "256\n", + "257\n", + "258\n", + "259\n", + "260\n", + "261\n", + "262\n", + "263\n", + "264\n", + "265\n", + "266\n", + "267\n", + "268\n", + "269\n", + "270\n", + "271\n", + "272\n", + "273\n", + "274\n", + "275\n", + "276\n", + "277\n", + "278\n", + "279\n", + "280\n", + "281\n", + "282\n", + "283\n", + "284\n", + "285\n", + "286\n", + "287\n", + "288\n", + "289\n", + "290\n", + "291\n", + "292\n", + "293\n", + "294\n", + "295\n", + "296\n", + "297\n", + "298\n", + "299\n", + "300\n", + "301\n", + "302\n", + "303\n", + "304\n", + "305\n", + "306\n", + "307\n", + "308\n", + "309\n", + "310\n", + "311\n", + "312\n", + "313\n", + "314\n", + "315\n", + "316\n", + "317\n", + "318\n", + "319\n", + "320\n", + "321\n", + "322\n", + "323\n", + "324\n", + "325\n", + "326\n", + "327\n", + "328\n", + "329\n", + "330\n", + "331\n", + "332\n", + "333\n", + "334\n", + "335\n", + "336\n", + "337\n", + "338\n", + "339\n", + "340\n", + "341\n", + "342\n", + "343\n", + "344\n", + "345\n", + "346\n", + "347\n", + "348\n", + "349\n", + "350\n", + "351\n", + "352\n", + "353\n", + "354\n", + "355\n", + "356\n", + "357\n", + "358\n", + "359\n", + "360\n", + "361\n", + "362\n", + "363\n", + "364\n", + "365\n", + "366\n", + "367\n", + "368\n", + "369\n", + "370\n", + "371\n", + "372\n", + "373\n", + "374\n", + "375\n", + "376\n", + "377\n", + "378\n", + "379\n", + "380\n", + "381\n", + "382\n", + "383\n", + "384\n", + "385\n", + "386\n", + "387\n", + "388\n", + "389\n", + "390\n", + "391\n", + "392\n", + "393\n", + "394\n", + "395\n", + "396\n", + "397\n", + "398\n", + "399\n", + "400\n", + "401\n", + "402\n", + "403\n", + "404\n", + "405\n", + "406\n", + "407\n", + "408\n", + "409\n", + "410\n", + "411\n", + "412\n", + "413\n", + "414\n", + "415\n", + "416\n", + "417\n", + "418\n", + "419\n", + "420\n", + "421\n", + "422\n", + "423\n", + "424\n", + "425\n", + "426\n", + "427\n", + "428\n", + "429\n", + "430\n", + "431\n", + "432\n", + "433\n", + "434\n", + "435\n", + "436\n", + "437\n", + "438\n", + "439\n", + "440\n", + "441\n", + "442\n", + "443\n", + "444\n", + "445\n", + "446\n", + "447\n", + "448\n", + "449\n", + "450\n", + "451\n", + "452\n", + "453\n", + "454\n", + "455\n", + "456\n", + "457\n", + "458\n", + "459\n", + "460\n", + "461\n", + "462\n", + "463\n", + "464\n", + "465\n", + "466\n", + "467\n", + "468\n", + "469\n", + "470\n", + "471\n", + "472\n", + "473\n", + "474\n", + "475\n", + "476\n", + "477\n", + "478\n", + "479\n", + "480\n", + "481\n", + "482\n", + "483\n", + "484\n", + "485\n", + "486\n", + "487\n", + "488\n", + "489\n", + "490\n", + "491\n", + "492\n", + "493\n", + "494\n", + "495\n", + "496\n", + "497\n", + "498\n", + "499\n", + "500\n", + "501\n", + "502\n", + "503\n", + "504\n", + "505\n", + "506\n", + "507\n", + "508\n", + "509\n", + "510\n", + "511\n", + "512\n", + "513\n", + "514\n", + "515\n", + "516\n", + "517\n", + "518\n", + "519\n", + "520\n", + "521\n", + "522\n", + "523\n", + "524\n", + "525\n", + "526\n", + "527\n", + "528\n", + "529\n", + "530\n", + "531\n", + "532\n", + "533\n" + ] + } + ], + "source": [ + "# load doc2cache outputs into Redis semantic cache\n", + 
"cache = to_semantic_cache(faqs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IDtXfaB8MCAx" + }, + "source": [ + "## 3. Test the semantic cache" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "lgneMQDDMEhd", + "outputId": "8607536d-c6bd-4157-8e20-4f07e62ec76e", + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'entry_id': 'ac61cfeee88b0468599ec8f79dc54b54a76defd4476a1b388c41c207d7b4e749',\n", + " 'prompt': 'How many employees does Amazon have?',\n", + " 'response': 'As of December 31, 2022, Amazon employed approximately 1,541,000 full-time and part-time employees.',\n", + " 'vector_distance': 0.0474983453751,\n", + " 'inserted_at': 1727288555.11,\n", + " 'updated_at': 1727288555.11,\n", + " 'key': 'amzn_10k_cache:ac61cfeee88b0468599ec8f79dc54b54a76defd4476a1b388c41c207d7b4e749'}]" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cache.check(\"How many employees work at Amazon?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "F72oqivmMdMM", + "outputId": "10c05825-2338-49a4-9f0c-7e6830c2a2c7", + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'entry_id': '969e0aa725337085711033c81202da3a0287ec2b736b17a026f0df79592bbbd0',\n", + " 'prompt': \"What is Amazon's business principle?\",\n", + " 'response': \"Amazon's business principle is customer obsession rather than competitor focus, passion for invention, commitment to operational excellence, and long-term thinking.\",\n", + " 'vector_distance': 0.0554879903793,\n", + " 'inserted_at': 1727288554.92,\n", + " 'updated_at': 1727288554.92,\n", + " 'key': 'amzn_10k_cache:969e0aa725337085711033c81202da3a0287ec2b736b17a026f0df79592bbbd0'}]" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cache.check(\"What are Amazon's business principles?\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T1PdtQ6oMspe" + }, + "source": [ + "## Cleanup" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": { + "id": "ixmj0zaSPVGR", + "tags": [] + }, + "outputs": [], + "source": [ + "cache.delete()" + ] + } + ], + "metadata": { + "colab": { + "gpuType": "T4", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}