diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index c435f01..cf615e6 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -50,4 +50,4 @@ jobs:
GCP_REGION: ${{ secrets.GCP_REGION }}
GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
run: |
- pytest --verbose --nbval-lax python-recipes/RAG/ python-recipes/vector-search python-recipes/redis-intro --ignore python-recipes/RAG/05_nvidia_ai_rag_redis.ipynb --ignore python-recipes/semantic-cache/doc2cache_llama3_1.ipynb
+ pytest --verbose --nbval-lax python-recipes/RAG/ python-recipes/vector-search python-recipes/redis-intro python-recipes/recommendation-systems --ignore python-recipes/RAG/05_nvidia_ai_rag_redis.ipynb --ignore python-recipes/semantic-cache/doc2cache_llama3_1.ipynb
diff --git a/README.md b/README.md
index 95f9413..deac85d 100644
--- a/README.md
+++ b/README.md
@@ -96,6 +96,11 @@ For further insights on enhancing RAG applications with dense content representa
## Recommendation systems
+| Recipe | Description |
+| --- | --- |
+| [/recommendation-systems/content_filtering.ipynb](python-recipes/recommendation-systems/content_filtering.ipynb) | Intro content filtering example with redisvl |
+
+### See also
An exciting example of how Redis can power production-ready systems is highlighted in our collaboration with [NVIDIA](https://developer.nvidia.com/blog/offline-to-online-feature-storage-for-real-time-recommendation-systems-with-nvidia-merlin/) to construct a state-of-the-art recommendation system.
Within [this repository](https://github.com/redis-developer/redis-nvidia-recsys), you'll find three examples, each escalating in complexity, showcasing the process of building such a system.
diff --git a/python-recipes/RAG/01_redisvl.ipynb b/python-recipes/RAG/01_redisvl.ipynb
index a30bea9..af72deb 100644
--- a/python-recipes/RAG/01_redisvl.ipynb
+++ b/python-recipes/RAG/01_redisvl.ipynb
@@ -439,7 +439,7 @@
" 'chunk_id': i,\n",
" 'content': chunk.page_content,\n",
" # For HASH -- must convert embeddings to bytes\n",
- " 'text_embedding': array_to_buffer(embeddings[i])\n",
+ " 'text_embedding': array_to_buffer(embeddings[i], dtype='float32')\n",
" } for i, chunk in enumerate(chunks)\n",
"]\n",
"\n",
diff --git a/python-recipes/RAG/04_advanced_redisvl.ipynb b/python-recipes/RAG/04_advanced_redisvl.ipynb
index 8424cf4..25e1eb8 100644
--- a/python-recipes/RAG/04_advanced_redisvl.ipynb
+++ b/python-recipes/RAG/04_advanced_redisvl.ipynb
@@ -574,7 +574,7 @@
" 'chunk_id': f'{i}',\n",
" 'proposition': proposition,\n",
" # For HASH -- must convert embeddings to bytes\n",
- " 'text_embedding': array_to_buffer(prop_embeddings[i])\n",
+ " 'text_embedding': array_to_buffer(prop_embeddings[i], dtype=\"float32\")\n",
" } for i, proposition in enumerate(propositions)\n",
"]\n",
"\n",
diff --git a/python-recipes/recommendation-systems/content_filtering.ipynb b/python-recipes/recommendation-systems/content_filtering.ipynb
new file mode 100644
index 0000000..3d541e9
--- /dev/null
+++ b/python-recipes/recommendation-systems/content_filtering.ipynb
@@ -0,0 +1,639 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "# Content Filtering in RedisVL\n",
+ "\n",
+ "In this recipe you'll learn how to build a content filtering recommender system (RecSys) from scratch using RedisVl and an IMDB movies dataset.\n",
+ "\n",
+ "## Let's Begin!\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Recommendation systems are a common application of machine learning and serve many industries from e-commerce to music streaming platforms.\n",
+ "\n",
+ "There are many different architechtures that can be followed to build a recommender system. \n",
+ "\n",
+ "In this notebook we'll demonstrate how to build a [content filtering](https://en.wikipedia.org/wiki/Recommender_system#:~:text=of%20hybrid%20systems.-,Content%2Dbased%20filtering,-%5Bedit%5D)\n",
+ "recommender and use the movies dataset as our example data."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Content Filtering recommender systems are built on the premise that a person will want to be recommended things that are similar to things they already like.\n",
+ "\n",
+ "In the case of movies, if a person watches and enjoys a nature documentary we should recommend other nature documentaries. Or if they like classic black & white horror films we should recommend more of those.\n",
+ "\n",
+ "The question we need to answer is, 'what does it mean for movies to be similar?'. There are exact matching strategies, like using a movie's labelled genre like 'Horror', or 'Sci Fi', but that can lock people in to only a few genres. Or what if it's not the genre that a person likes, but certain story arcs that are common among many genres?\n",
+ "\n",
+ "For our content filtering recommender we'll measure similarity between movies as semantic similarity of their descriptions and keywords."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## IMPORTS\n",
+ "import pandas as pd\n",
+ "import ast\n",
+ "import os\n",
+ "import pickle\n",
+ "import requests\n",
+ "\n",
+ "# Replace values below with your own if using Redis Cloud instance\n",
+ "REDIS_HOST = os.getenv(\"REDIS_HOST\", \"localhost\") # ex: \"redis-18374.c253.us-central1-1.gce.cloud.redislabs.com\"\n",
+ "REDIS_PORT = os.getenv(\"REDIS_PORT\", \"6379\") # ex: 18374\n",
+ "REDIS_PASSWORD = os.getenv(\"REDIS_PASSWORD\", \"\") # ex: \"1TNxTEdYRDgIDKM2gDfasupCADXXXX\"\n",
+ "\n",
+ "# If SSL is enabled on the endpoint, use rediss:// as the URL prefix\n",
+ "REDIS_URL = f\"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prepare The Dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Start by downloading the movies data and doing a quick inspection of it."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " title | \n",
+ " runtime | \n",
+ " rating | \n",
+ " rating_count | \n",
+ " genres | \n",
+ " overview | \n",
+ " keywords | \n",
+ " director | \n",
+ " cast | \n",
+ " writer | \n",
+ " year | \n",
+ " path | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | 0 | \n",
+ " The Story of the Kelly Gang | \n",
+ " 1 hour 10 minutes | \n",
+ " 6.0 | \n",
+ " 772 | \n",
+ " ['Action', 'Adventure', 'Biography'] | \n",
+ " Story of Ned Kelly, an infamous 19th-century A... | \n",
+ " ['ned kelly', 'australia', 'historic figure', ... | \n",
+ " Charles Tait | \n",
+ " ['Elizabeth Tait', 'John Tait', 'Nicholas Brie... | \n",
+ " Charles Tait | \n",
+ " 1906 | \n",
+ " /title/tt0000574/ | \n",
+ "
\n",
+ " \n",
+ " | 1 | \n",
+ " Fantômas - À l'ombre de la guillotine | \n",
+ " not-released | \n",
+ " 6.9 | \n",
+ " 2300 | \n",
+ " ['Crime', 'Drama'] | \n",
+ " Inspector Juve is tasked to investigate and ca... | \n",
+ " ['silent film', 'france', 'hotel', 'duchess', ... | \n",
+ " Louis Feuillade | \n",
+ " ['Louis Feuillade', 'Pierre Souvestre', 'René ... | \n",
+ " Marcel Allain | \n",
+ " 1913 | \n",
+ " /title/tt0002844/ | \n",
+ "
\n",
+ " \n",
+ " | 2 | \n",
+ " Cabiria | \n",
+ " 2 hours 28 minutes | \n",
+ " 7.1 | \n",
+ " 3500 | \n",
+ " ['Adventure', 'Drama', 'History'] | \n",
+ " Cabiria is a Roman child when her home is dest... | \n",
+ " ['carthage', 'slave', 'moloch', '3rd century b... | \n",
+ " Giovanni Pastrone | \n",
+ " ['Titus Livius', 'Giovanni Pastrone', 'Italia ... | \n",
+ " Gabriele D'Annunzio | \n",
+ " 1914 | \n",
+ " /title/tt0003740/ | \n",
+ "
\n",
+ " \n",
+ " | 3 | \n",
+ " The Life of General Villa | \n",
+ " not-released | \n",
+ " 6.7 | \n",
+ " 65 | \n",
+ " ['Action', 'Adventure', 'Biography'] | \n",
+ " The life and career of Panccho Villa from youn... | \n",
+ " ['chihuahua mexico', 'chihuahua', 'sonora mexi... | \n",
+ " Christy Cabanne | \n",
+ " ['Frank E. Woods', 'Raoul Walsh', 'Eagle Eye',... | \n",
+ " Raoul Walsh | \n",
+ " 1914 | \n",
+ " /title/tt0004223/ | \n",
+ "
\n",
+ " \n",
+ " | 4 | \n",
+ " The Patchwork Girl of Oz | \n",
+ " not-released | \n",
+ " 5.4 | \n",
+ " 484 | \n",
+ " ['Adventure', 'Family', 'Fantasy'] | \n",
+ " Ojo and Unc Nunkie are out of food, so they de... | \n",
+ " ['silent film', 'journey', 'magic wand', 'wiza... | \n",
+ " J. Farrell MacDonald | \n",
+ " ['Violet MacMillan', 'Frank Moore', 'Raymond R... | \n",
+ " L. Frank Baum | \n",
+ " 1914 | \n",
+ " /title/tt0004457/ | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " title runtime rating \\\n",
+ "0 The Story of the Kelly Gang 1 hour 10 minutes 6.0 \n",
+ "1 Fantômas - À l'ombre de la guillotine not-released 6.9 \n",
+ "2 Cabiria 2 hours 28 minutes 7.1 \n",
+ "3 The Life of General Villa not-released 6.7 \n",
+ "4 The Patchwork Girl of Oz not-released 5.4 \n",
+ "\n",
+ " rating_count genres \\\n",
+ "0 772 ['Action', 'Adventure', 'Biography'] \n",
+ "1 2300 ['Crime', 'Drama'] \n",
+ "2 3500 ['Adventure', 'Drama', 'History'] \n",
+ "3 65 ['Action', 'Adventure', 'Biography'] \n",
+ "4 484 ['Adventure', 'Family', 'Fantasy'] \n",
+ "\n",
+ " overview \\\n",
+ "0 Story of Ned Kelly, an infamous 19th-century A... \n",
+ "1 Inspector Juve is tasked to investigate and ca... \n",
+ "2 Cabiria is a Roman child when her home is dest... \n",
+ "3 The life and career of Panccho Villa from youn... \n",
+ "4 Ojo and Unc Nunkie are out of food, so they de... \n",
+ "\n",
+ " keywords director \\\n",
+ "0 ['ned kelly', 'australia', 'historic figure', ... Charles Tait \n",
+ "1 ['silent film', 'france', 'hotel', 'duchess', ... Louis Feuillade \n",
+ "2 ['carthage', 'slave', 'moloch', '3rd century b... Giovanni Pastrone \n",
+ "3 ['chihuahua mexico', 'chihuahua', 'sonora mexi... Christy Cabanne \n",
+ "4 ['silent film', 'journey', 'magic wand', 'wiza... J. Farrell MacDonald \n",
+ "\n",
+ " cast writer \\\n",
+ "0 ['Elizabeth Tait', 'John Tait', 'Nicholas Brie... Charles Tait \n",
+ "1 ['Louis Feuillade', 'Pierre Souvestre', 'René ... Marcel Allain \n",
+ "2 ['Titus Livius', 'Giovanni Pastrone', 'Italia ... Gabriele D'Annunzio \n",
+ "3 ['Frank E. Woods', 'Raoul Walsh', 'Eagle Eye',... Raoul Walsh \n",
+ "4 ['Violet MacMillan', 'Frank Moore', 'Raymond R... L. Frank Baum \n",
+ "\n",
+ " year path \n",
+ "0 1906 /title/tt0000574/ \n",
+ "1 1913 /title/tt0002844/ \n",
+ "2 1914 /title/tt0003740/ \n",
+ "3 1914 /title/tt0004223/ \n",
+ "4 1914 /title/tt0004457/ "
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "try:\n",
+ " df = pd.read_csv(\"datasets/content_filtering/25k_imdb_movie_dataset.csv\")\n",
+ "except:\n",
+ " import requests\n",
+ " # download the file\n",
+ " url = 'https://redis-ai-resources.s3.us-east-2.amazonaws.com/recommenders/datasets/content-filtering/25k_imdb_movie_dataset.csv'\n",
+ " r = requests.get(url)\n",
+ "\n",
+ " #save the file as a csv\n",
+ " if not os.path.exists('./datasets/content_filtering'):\n",
+ " os.makedirs('./datasets/content_filtering')\n",
+ " with open('./datasets/content_filtering/25k_imdb_movie_dataset.csv', 'wb') as f:\n",
+ " f.write(r.content)\n",
+ " df = pd.read_csv(\"datasets/content_filtering/25k_imdb_movie_dataset.csv\")\n",
+ "\n",
+ "df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As with any machine learning task, the first step is to clean our data.\n",
+ "\n",
+ "We'll drop some columns that we don't plan to use, and fill missing values with some reasonable defaults.\n",
+ "\n",
+ "Lastly, we'll do a quick check to make sure we've filled in all the null and missing values."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "title 0\n",
+ "rating 0\n",
+ "rating_count 0\n",
+ "genres 0\n",
+ "overview 0\n",
+ "keywords 0\n",
+ "director 0\n",
+ "cast 0\n",
+ "year 0\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "roman_numerals = ['(I)','(II)','(III)','(IV)', '(V)', '(VI)', '(VII)', '(VIII)', '(IX)', '(XI)', '(XII)', '(XVI)', '(XIV)', '(XXXIII)', '(XVIII)', '(XIX)', '(XXVII)']\n",
+ "\n",
+ "def replace_year(x):\n",
+ " if x in roman_numerals:\n",
+ " return 1998 # the average year of the dataset\n",
+ " else:\n",
+ " return x\n",
+ "\n",
+ "df.drop(columns=['runtime', 'writer', 'path'], inplace=True)\n",
+ "df['year'] = df['year'].apply(replace_year) # replace roman numerals with average year\n",
+ "df['genres'] = df['genres'].apply(ast.literal_eval) # convert string representation of list to list\n",
+ "df['keywords'] = df['keywords'].apply(ast.literal_eval) # convert string representation of list to list\n",
+ "df['cast'] = df['cast'].apply(ast.literal_eval) # convert string representation of list to list\n",
+ "df = df[~df['overview'].isnull()] # drop rows with missing overviews\n",
+ "df = df[~df['overview'].isin(['none'])] # drop rows with 'none' as the overview\n",
+ "\n",
+ "# make sure we've filled all missing values\n",
+ "df.isnull().sum()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Generate Vector Embeddings"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Since we movie similarity as semantic similarity of movie descriptions we need a way to generate semantic vector embeddings of these descriptions.\n",
+ "\n",
+ "RedisVL supports many different embedding generators. For this example we'll use a HuggingFace model that is rated well for semantic similarity use cases.\n",
+ "\n",
+ "RedisVL also supports complex query logic, beyond just vector similarity. To showcase this we'll generate an embedding from each movies' `overview` text and list of `plot keywords`,\n",
+ "and use the remaining fields like, `genres`, `year`, and `rating` as filterable fields to target our vector queries to.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'The Story of the Kelly Gang. Story of Ned Kelly, an infamous 19th-century Australian outlaw. ned kelly, australia, historic figure, australian western, first of its kind, directorial debut, australian history, 19th century, victoria australia, australian'"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# add a column to the dataframe with all the text we want to embed\n",
+ "df[\"full_text\"] = df[\"title\"] + \". \" + df[\"overview\"] + \" \" + df['keywords'].apply(lambda x: ', '.join(x))\n",
+ "df[\"full_text\"][0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# NBVAL_SKIP\n",
+ "# # this step will take a while, but only needs to be done once for your entire dataset\n",
+ "# currently taking 10 minutes to run, so we've gone ahead and saved the vectors to a file for you\n",
+ "# if you don't want to wait, you can skip the cell and load the vectors from the file in the next cell\n",
+ "from redisvl.utils.vectorize import HFTextVectorizer\n",
+ "\n",
+ "vectorizer = HFTextVectorizer(model = 'sentence-transformers/paraphrase-MiniLM-L6-v2')\n",
+ "\n",
+ "df['embedding'] = df['full_text'].apply(lambda x: vectorizer.embed(x, as_buffer=False))\n",
+ "pickle.dump(df['embedding'], open('datasets/content_filtering/text_embeddings.pkl', 'wb'))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "try:\n",
+ " with open('datasets/content_filtering/text_embeddings.pkl', 'rb') as vector_file:\n",
+ " df['embedding'] = pickle.load(vector_file)\n",
+ "except:\n",
+ " embeddings_url = 'https://redis-ai-resources.s3.us-east-2.amazonaws.com/recommenders/datasets/content-filtering/text_embeddings.pkl'\n",
+ " r = requests.get(embeddings_url)\n",
+ " with open('./datasets/content_filtering/text_embeddings.pkl', 'wb') as f:\n",
+ " f.write(r.content)\n",
+ " with open('datasets/content_filtering/text_embeddings.pkl', 'rb') as vector_file:\n",
+ " df['embedding'] = pickle.load(vector_file)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Define our Search Schema\n",
+ "Our data is now ready to be loaded into Redis. The last step is to define our search index schema that specifies each of our data fields and the size and type of our embedding vectors.\n",
+ "\n",
+ "We'll load this from the accompanying `content_filtering_schema.yaml` file."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This schema defines what each entry will look like within Redis. It will need to specify the name of each field, like `title`, `rating`, and `rating-count`, as well as the type of each field, like `text` or `numeric`.\n",
+ "\n",
+ "The vector component of each entry similarly needs its dimension (dims), distance metric, algorithm and datatype (dtype) attributes specified."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "09:44:30 redisvl.index.index INFO Index already exists, overwriting.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from redis import Redis\n",
+ "\n",
+ "client = Redis.from_url(REDIS_URL)\n",
+ "from redisvl.schema import IndexSchema\n",
+ "from redisvl.index import SearchIndex\n",
+ "\n",
+ "movie_schema = IndexSchema.from_yaml(\"content_filtering_schema.yaml\")\n",
+ "\n",
+ "index = SearchIndex(movie_schema, redis_client=client)\n",
+ "index.create(overwrite=True, drop=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Load products into vector DB\n",
+ "Now that we have all our data cleaned and a defined schema we can load the data into RedisVL.\n",
+ "\n",
+ "We need to convert this data into a format that RedisVL can understand, which is a list of dictionaries.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data = df.to_dict(orient='records')\n",
+ "keys = index.load(data)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Querying to get recommendations\n",
+ "\n",
+ "We now have a working content filtering recommender system, all we need a starting point, so let's say we want to find movies similar to the movie with the title \"20,000 Leagues Under the Sea\"\n",
+ "\n",
+ "We can use the title to find the movie in the dataset and then use the vector to find similar movies."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'id': 'movie:b64fc099d6af440a891e1dd8314e5af7', 'vector_distance': '0.584870040417', 'title': 'The Odyssey', 'overview': 'The aquatic adventure of the highly influential and fearlessly ambitious pioneer, innovator, filmmaker, researcher, and conservationist, Jacques-Yves Cousteau, covers roughly thirty years of an inarguably rich in achievements life.'}\n",
+ "{'id': 'movie:2fbd7803b51a4bf9a8fb1aa79244ad64', 'vector_distance': '0.63329231739', 'title': 'The Inventor', 'overview': 'Inventing flying contraptions, war machines and studying cadavers, Leonardo da Vinci tackles the meaning of life itself with the help of French princess Marguerite de Nevarre.'}\n",
+ "{'id': 'movie:224a785ca7ea4006bbcdac8aad5bf1bc', 'vector_distance': '0.658123672009', 'title': 'Ruin', 'overview': 'The film follows a nameless ex-Nazi captain who navigates the ruins of post-WWII Germany determined to atone for his crimes during the war by hunting down the surviving members of his former SS Death Squad.'}\n",
+ "{'id': 'movie:b7d0af286515427fbbf3866bc7e9b739', 'vector_distance': '0.688094437122', 'title': 'The Raven', 'overview': 'A man with incredible powers is sought by the government and military.'}\n",
+ "{'id': 'movie:9aceb1903b584648b3a5b5d1bdd383b2', 'vector_distance': '0.694671392441', 'title': 'Get the Girl', 'overview': 'Sebastain \"Bash\" Danye, a legendary gun for hire hangs up his weapon to retire peacefully with his \\'it\\'s complicated\\' partner Renee. Their quiet lives are soon interrupted when they find an unconscious woman on their property, Maddie. While nursing her back to health, some bad me... Read all'}\n"
+ ]
+ }
+ ],
+ "source": [
+ "from redisvl.query import RangeQuery\n",
+ "\n",
+ "query_vector = df[df['title'] == '20,000 Leagues Under the Sea']['embedding'].values[0] # one good match\n",
+ "\n",
+ "query = RangeQuery(vector=query_vector,\n",
+ " vector_field_name='embedding',\n",
+ " num_results=5,\n",
+ " distance_threshold=0.7,\n",
+ " return_fields = ['title', 'overview', 'vector_distance'])\n",
+ "\n",
+ "results = index.query(query)\n",
+ "for r in results:\n",
+ " print(r)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Generating user recommendations\n",
+ "This systems works, but we can make it even better.\n",
+ "\n",
+ "Production recommender systems often have fields that can be configured. Users can specify if they want to see a romantic comedy or a horror film, or only see new releases.\n",
+ "\n",
+ "Let's go ahead and add this functionality by using the tags we've defined in our schema."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from redisvl.query.filter import Tag, Num, Text\n",
+ "\n",
+ "def make_filter(genres=None, release_year=None, keywords=None):\n",
+ " flexible_filter = (\n",
+ " (Num(\"year\") > release_year) & # only show movies released after this year\n",
+ " (Tag(\"genres\") == genres) & # only show movies that match at least one in list of genres\n",
+ " (Text(\"full_text\") % keywords) # only show movies that contain at least one of the keywords\n",
+ " )\n",
+ " return flexible_filter\n",
+ "\n",
+ "def get_recommendations(movie_vector, num_results=5, distance=0.6, filter=None):\n",
+ " query = RangeQuery(vector=movie_vector,\n",
+ " vector_field_name='embedding',\n",
+ " num_results=num_results,\n",
+ " distance_threshold=distance,\n",
+ " return_fields = ['title', 'overview', 'genres'],\n",
+ " filter_expression=filter,\n",
+ " )\n",
+ "\n",
+ " recommendations = index.query(query)\n",
+ " return recommendations"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a final demonstration we'll find movies similar to the classic horror film 'Nosferatu'.\n",
+ "The process has 3 steps:\n",
+ "- fetch the vector embedding of our film Nosferatu\n",
+ "- optionally define any hard filters we want. Here we'll specify we want horror movies made on or after 1990\n",
+ "- perform the vector range query to find similar movies that meet our filter criteria"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "- Wolfman:\n",
+ "\t A man becomes afflicted by an ancient curse after he is bitten by a werewolf.\n",
+ "\t Genres: [\"Horror\"]\n",
+ "- Off Season:\n",
+ "\t Tenn's relentless search for his father takes him back to his childhood town only to find a community gripped by fear. As he travels deeper into the bitter winter wilderness of the town he uncovers a dreadful secret buried long ago.\n",
+ "\t Genres: [\"Horror\",\"Mystery\",\"Thriller\"]\n",
+ "- Pieces:\n",
+ "\t The co-eds of a Boston college campus are targeted by a mysterious killer who is creating a human jigsaw puzzle from their body parts.\n",
+ "\t Genres: [\"Horror\",\"Mystery\",\"Thriller\"]\n",
+ "- Cursed:\n",
+ "\t A prominent psychiatrist at a state run hospital wrestles with madness and a dark supernatural force as he and a female police detective race to stop an escaped patient from butchering five people held hostage in a remote mansion.\n",
+ "\t Genres: [\"Horror\",\"Thriller\"]\n",
+ "- The Home:\n",
+ "\t The Home unfolds after a young man is nearly killed during an accident that leaves him physically and emotionally scarred. To recuperate, he is taken to a secluded nursing home where the elderly residents appear to be suffering from delusions. But after witnessing a violent attac... Read all\n",
+ "\t Genres: [\"Action\",\"Fantasy\",\"Horror\"]\n"
+ ]
+ }
+ ],
+ "source": [
+ "movie_vector = df[df['title'] == 'Nosferatu']['embedding'].values[0]\n",
+ "\n",
+ "filter = make_filter(genres=['Horror'], release_year=1990)\n",
+ "\n",
+ "recs = get_recommendations(movie_vector, distance=0.8, filter=filter)\n",
+ "for rec in recs:\n",
+ " print(f\"- {rec['title']}:\\n\\t {rec['overview']}\\n\\t Genres: {rec['genres']}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Now you have a working content filtering recommender system with Redis.\n",
+ "Don't forget to clean up once you're done."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# clean up your index\n",
+ "while remaining := index.clear():\n",
+ " print(f\"Deleted {remaining} keys\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "redis-ai-res",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/python-recipes/recommendation-systems/content_filtering_schema.yaml b/python-recipes/recommendation-systems/content_filtering_schema.yaml
new file mode 100644
index 0000000..615e12d
--- /dev/null
+++ b/python-recipes/recommendation-systems/content_filtering_schema.yaml
@@ -0,0 +1,34 @@
+index:
+ name: movies_recommendation
+ prefix: movie
+ storage_type: json
+
+fields:
+ - name: title
+ type: text
+ - name: rating
+ type: numeric
+ - name: rating_count
+ type: numeric
+ - name: genres
+ type: tag
+ - name: overview
+ type: text
+ - name: keywords
+ type: tag
+ - name: cast
+ type: tag
+ - name: writer
+ type: text
+ - name: year
+ type: numeric
+ - name: full_text
+ type: text
+
+ - name: embedding
+ type: vector
+ attrs:
+ dims: 384
+ distance_metric: cosine
+ algorithm: flat
+ dtype: float32
\ No newline at end of file