22 commits
807c392
Updating AutoVec tutorial
giriraj-singh-couchbase Sep 15, 2025
68e04f3
Added frontmatter.md and updated the tutorial
giriraj-singh-couchbase Sep 15, 2025
9bb439b
Applied suggestions from code review
giriraj-singh-couchbase Sep 15, 2025
dfca154
Fixed frontmatter.md
giriraj-singh-couchbase Sep 15, 2025
e52bcd5
fixing minor details
giriraj-singh-couchbase Sep 15, 2025
604bf29
Merge branch 'DA-1096_autovec_tutorial' of https://github.com/couchba…
giriraj-singh-couchbase Sep 15, 2025
fc878a4
Fixed frontmatter.md
giriraj-singh-couchbase Sep 15, 2025
38d1792
Updated autovec_langchain.ipynb
giriraj-singh-couchbase Sep 18, 2025
1e9bb07
fixed some screenshots
giriraj-singh-couchbase Sep 18, 2025
b85f7a4
updated screenshot
giriraj-singh-couchbase Sep 23, 2025
69fe694
fixed grammatical mistakes
giriraj-singh-couchbase Sep 24, 2025
9bcf7b9
fixed capella free-tier issue and added some missing content
giriraj-singh-couchbase Sep 25, 2025
0d870b1
added missing content
giriraj-singh-couchbase Sep 25, 2025
709cfe5
Updated doc
giriraj-singh-couchbase Sep 26, 2025
55242d5
added version of libraries, removed unnecessary files
giriraj-singh-couchbase Oct 30, 2025
ce09c83
updated model service name
giriraj-singh-couchbase Nov 3, 2025
e48691b
removed extra code
giriraj-singh-couchbase Nov 3, 2025
97392d9
updated screenshots
giriraj-singh-couchbase Nov 13, 2025
8cb8715
Updated tutorial to use couchbase hyperscale vector index
giriraj-singh-couchbase Dec 4, 2025
7274d17
fixed frontmatter path
giriraj-singh-couchbase Dec 4, 2025
44ed62b
renamed folder
giriraj-singh-couchbase Dec 4, 2025
90042d1
Removed title as it will be used from the frontmatter
giriraj-singh-couchbase Dec 4, 2025
53 changes: 53 additions & 0 deletions autovec-tutorial/README.md
@@ -0,0 +1,53 @@
# Couchbase Capella AI Services Auto-Vectorization with LangChain

This is a comprehensive tutorial demonstrating how to use Couchbase Capella's AI Services auto-vectorization feature to automatically convert your data into vector embeddings and perform semantic search with LangChain.

## 📋 Overview

The main tutorial is contained in the Jupyter notebook `autovec_langchain.ipynb`, which walks you through:

1. **Couchbase Capella Setup** - Creating account, cluster, and access controls
2. **Data Upload & Processing** - Using sample data
3. **Model Deployment** - Deploying embedding models for vectorization
4. **Auto-Vectorization Workflow** - Setting up automated embedding generation
5. **LangChain Integration** - Building semantic search applications with vector similarity

## 🚀 Quick Start

### Prerequisites

- Python 3.8 or higher
- A Couchbase Capella account
- Basic understanding of vector databases and embeddings
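
If you are new to embeddings, the core idea behind semantic search is comparing vectors by cosine similarity. Capella computes this server-side during search; the snippet below is only an illustrative, dependency-free sketch of the math.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real models emit hundreds of dimensions.
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]

print(cosine_similarity(v1, v2))  # identical direction -> 1.0
print(cosine_similarity(v1, v3))  # orthogonal -> 0.0
```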

### Installation Steps

1. **Clone or download this repository**
```bash
git clone <repository-url>
cd vector-search-cookbook/autovec-tutorial
```

2. **Install Python dependencies**
```bash
pip install jupyter
pip install couchbase
pip install langchain-couchbase
pip install langchain-nvidia-ai-endpoints
```

3. **Start Jupyter Notebook**
```bash
jupyter notebook
```
or
```bash
jupyter lab
```

4. **Open the tutorial notebook**
- Navigate to `autovec_langchain.ipynb` in the Jupyter interface
- Follow the step-by-step instructions in the notebook

**Note**: This tutorial is designed for educational purposes. For production deployments, ensure proper security configurations and SSL/TLS verification.
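
Before opening the notebook, you can confirm that the dependencies from the installation step are present. This stdlib-only sketch reports each package's installed version (the package names are the pip distribution names used above):

```python
from importlib import metadata

packages = [
    "jupyter",
    "couchbase",
    "langchain-couchbase",
    "langchain-nvidia-ai-endpoints",
]

report = []
for pkg in packages:
    try:
        report.append(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        report.append(f"{pkg}: NOT INSTALLED")

print("\n".join(report))
```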
301 changes: 301 additions & 0 deletions autovec-tutorial/autovec_langchain.ipynb
@@ -0,0 +1,301 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "44480f12-3bd0-4fe9-9493-25bd6a2712bb",
"metadata": {},
"source": [
"# Couchbase Capella AI Services Auto-Vectorization Tutorial\n",
"\n",
"This comprehensive tutorial demonstrates how to use Couchbase Capella's new AI Services auto-vectorization feature to automatically convert your data into vector embeddings and perform semantic search using LangChain.\n",
"\n",
"---\n",
"\n",
"## 📚 Table of Contents\n",
"\n",
"1. [Capella Account Setup](#1-capella-account-setup)\n",
"2. [Data Upload and Preparation](#2-data-upload-and-preparation)\n",
"3. [Deploying the Model](#3-deploying-the-model)\n",
"4. [Auto-Vectorization Process](#4-deploying-autovectorization-workflow)\n",
"5. [LangChain Vector Search]()\n"
]
},
{
"cell_type": "markdown",
"id": "502eb13e",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"# 1. Capella Account Setup\n",
"\n",
"Before we can use AI Services auto-vectorization, you need to set up a Couchbase Capella account and create a cluster.\n",
"\n",
"## Step 1.1: Sign Up for Couchbase Capella\n",
"\n",
"1. **Visit Capella**: Go to [https://cloud.couchbase.com](https://cloud.couchbase.com)\n",
" \n",
" <img src=\"./img/login_.png\" width=500pt height=1000pt>\n",
"\n",
"3. **Sign In**: Click \"Sign in\" or create your free account by clicking \"Try free\", you can also sign in using google, github or using your organization's SSO.\n",
"\n",
"\n",
"## Step 1.2: Create a New Cluster\n",
"\n",
"1. **Access Dashboard**: After logging in, you'll see the Capella dashboard\n",
"2. **Create Cluster**: Click \"Create Cluster\"\n",
" \n",
" <img src=\"./img/create_cluster.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
"4. **Choose Configuration**:\n",
" - **Cluster Configuration**: \n",
" - For development: Single node cluster\n",
" - For production: Multi-node with replicas\n",
" \n",
" <img src=\"./img/node_select_cluster_opt.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
" - **Cloud Provider**: AWS, Azure, or GCP (AWS recommended for this tutorial)\n",
" \n",
" <img src=\"./img/cluster_cloud_config.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
" - **Cluster Configuration**: Select number of nodes and their configuration, make sure to allow <B>searching</B> and <B>eventing</B> for using AutoVectorization.\n",
" \n",
" <img src=\"./img/cluster_no_nodes.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"\n",
"## Step 1.3: Configure Access Control \n",
"\n",
"1. **Access Control**: Navigate to the <B>access control</B> tab which is present in <B>cluster settings</B> as highlited in the image below:-\n",
" \n",
" <img src=\"./img/Access_control.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
"3. **Enter your details**:\n",
" - <B>Cluster Access Name</B>: `username`\n",
" - <B>Password</B>: Create a strong password\n",
"\n",
" \n",
" <img src=\"./img/password_cluster.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
" - Also, do not forget the level of authorization you want to give to these credentials, as shown in the above image modify the <B>bucket-level access</B> field as per the requirement.\n"
]
},
{
"cell_type": "markdown",
"id": "4369c925-adbc-4c7d-9ea6-04ff020cb1a6",
"metadata": {},
"source": [
"\n",
"# 2. Data Upload and Preparation\n",
"\n",
"Now we'll upload sample data that will be automatically vectorized by AI Services.\n",
"\n",
"## Option A: Upload Sample Dataset (Recommended)\n",
"\n",
"We'll create sample documents about different topics to demonstrate the vectorization capabilities.\n",
"\n",
"## Option B: Use Existing Couchbase Data\n",
"\n",
"If you already have data in Couchbase (like travel-sample), you can configure vectorization for existing collections.\n",
"\n",
"Let's proceed with **Option A** for this tutorial:\n",
"\n",
"## 2.1: Uploading the sample-data provided by capella in your cluster\n",
"<div style=\"display: flex; align-items: flex-start; gap: 10px;\">\n",
" <img src=\"./img/select_cluster.png\" width=\"160\" height=\"300\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" <img src=\"./img/import_sd.png\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\" width=\"800px\">\n",
" </div>\n",
" \n",
" 1. In order to upload sample data in your cluster, you need to navigate to the import section inside your cluster \n",
" 2. Click on \"<B>Load Sample Data</B>\"\n",
" 3. Click on \"<B>travel-sample</B>\"\n",
" 4. Click on \"<B>Import</B>\"\n",
" - After importing the data you can check that a bucket named travel-sample would have been created inside the cluster.\n",
" - Select the \"<B>travel-sample</B>\" bucket, \"<B>Inventory</B>\" scope, and \"<B>Hotel</B>\" collection. Then you will see the documents inside this collection.\n",
" - The document will not contain any vector embeddings inside it.\n",
" - Now, we can proceed with the formation of vectors using auto-vectorization service.\n",
" \n",
"## 2.2: Uploading data from your program\n",
"\n",
"We'll also demonstrate how to programmatically upload sample documents using Python and the Couchbase SDK.\n"
]
},
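{
"cell_type": "code",
"execution_count": null,
"id": "b2-upload-sketch-0001",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch for Section 2.2: programmatically upserting sample documents.\n",
"# Assumes the `cluster` connection created in the connection cell later in this\n",
"# notebook (run that cell first); document IDs and fields here are illustrative.\n",
"sample_docs = {\n",
"    \"hotel_demo_1\": {\"name\": \"Seaside Inn\", \"address\": \"1 Ocean Drive, USA\",\n",
"                     \"description\": \"A small hotel by the sea.\"},\n",
"    \"hotel_demo_2\": {\"name\": \"Mountain Lodge\", \"address\": \"9 Alpine Way, USA\",\n",
"                     \"description\": \"A quiet lodge in the mountains.\"},\n",
"}\n",
"collection = cluster.bucket(\"travel-sample\").scope(\"inventory\").collection(\"hotel\")\n",
"for doc_id, doc in sample_docs.items():\n",
"    collection.upsert(doc_id, doc)"
]
},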
{
"cell_type": "markdown",
"id": "7e3afd3f-9949-4f5e-b96a-1aac1a3aea29",
"metadata": {},
"source": [
"# 3. Deploying the Model\n",
"Now, before we actually create embedding for the documents we need to deploy a model which will create the embedding for us.\n",
"## 3.1: Selecting the model \n",
"1. To select the model, you first need to navigate to the \"<B>AI Services</B>\" tab, then selecting \"<B>Models</B>\" and clicking on \"<B>Deploy New Model</B>\"\n",
" \n",
" <img src=\"./img/importing_model.png\" width=\"950px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"3. Enter the <B>model name</B>, and choose the model that you want to deploy. After Selecting your model, choose the <B>model infrastructure</B> and <B>region</B> where the model will be deployed.\n",
" \n",
" <img src=\"./img/deploying_model.png\" width=\"800px\" height=\"800px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"## 3.2 Access control to the model\n",
"\n",
"1. After deploying the model, go to the \"<B>Models</B>\" tab in the <B>AI-services</B> and click on \"<B>setup access</B>\".\n",
"\n",
" <img src=\"./img/model_setup_access.png\" width=\"1100px\" height=\"400px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"3. Enter your <B>api_key_name</B>, <B>expiration time</B> and the <B>IP-address</B> from which you will be accessing the model.\n",
"\n",
" <img src=\"./img/model_api_key_form.png\" width=\"1100px\" height=\"600px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"4. Download your API key\n",
"\n",
" <img src=\"./img/download_api_key_details.png\" width=\"1200px\" height=\"800px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">"
]
},
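{
"cell_type": "code",
"execution_count": null,
"id": "model-smoke-test-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: smoke-test the deployed model by embedding a single string.\n",
"# Replace the model name and api_key with the values from your own deployment\n",
"# (Step 3.2); the values below are placeholders.\n",
"from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n",
"\n",
"test_embedder = NVIDIAEmbeddings(\n",
"    model=\"nvidia/nv-embedqa-e5-v5\",\n",
"    api_key=\"nvapi-XYZ\"  # placeholder\n",
")\n",
"vec = test_embedder.embed_query(\"hello world\")\n",
"print(f\"Embedding dimension: {len(vec)}\")"
]
},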
{
"cell_type": "markdown",
"id": "daaf6525-d4e6-45fb-8839-fc7c20081675",
"metadata": {},
"source": [
"# 4. Deploying AutoVectorization Workflow\n",
"\n",
"1. For deploying the autovectorization, you need to go to the <B>ai-services</B> tab, then click on the <B>workflows</B>, and then click on <B>Get started with RAG</B>.\n",
" <img src=\"./img/Create_auto_vec.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
"2. Start your workflow deployment by giving it a name, and selecting from where your data will be provided to the auto-vectorization service. There are currently 3 options, <B>pre-processed data(JSON format) from capella</B>, <B>pre-processed data(JSON format) from external sources(S3 buckets)</B> and <B>unstructured data from external sources (S3 buckets)</B>. For this tutorial we will be choosing first option which is pre-processed data from capella.\n",
"\n",
" <img src=\"./img/start_workflow.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"3. Now, select the <B>cluster</B>, <B>bucket</B>, <B>scope</B> and <B>collection</B> from which you want to select the documents and get the data vectorized.\n",
"\n",
" <img src=\"./img/vector_data_source.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"4. <B>Field Mapping</B> will be used to tell the AutoVectorize service that which data will be converted to embeddings.\n",
"\n",
" There are two options:-\n",
"\n",
" - <B>All source fields</B> - This feature will convert all your fields inside the document to a single vector field.\n",
" \n",
" <img src=\"./img/vector_all_field_mapping.png\" width=\"900px\" height=\"400px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"\n",
" - <B>Custom source fields</B> - This feature will convert specific fields which are chosen by the user to a single vector field, in the image below we have chosen <B>address</B>, <B>description</B> and <B>id</B> as the fields to be converted to a vector having the name as <B>vec_addr_decr_id_mapping</B>.\n",
" \n",
" <img src=\"./img/vector_custom_field_mapping.png\" width=\"900px\" height=\"400px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
"5. After choosing your type of mapping, you will be required to either have an index on the new vector_embedding field or you can skip the creation of vector index which is not recommended as you will be loosing out the functionality of vector searching.\n",
"\n",
" <img src=\"./img/vector_index.png\" width=\"1200px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"6. All the steps mentioned above.\n",
"\n",
" <img src=\"./img/vector_index_page.png\" width=\"1200px\" height=\"1200px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "30955126-0053-4cec-9dec-e4c05a8de7c3",
"metadata": {},
"outputs": [],
"source": [
"from couchbase.cluster import Cluster\n",
"from couchbase.auth import PasswordAuthenticator\n",
"from couchbase.options import ClusterOptions\n",
"\n",
"from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings \n",
"from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e4c9e8d",
"metadata": {},
"outputs": [],
"source": [
"endpoint = \"couchbases://cb.XYZ.com\" # Replace this with Connection String\n",
"username = \"testing\" \n",
"password = \"Testing@1\"\n",
"auth = PasswordAuthenticator(username, password)\n",
"# Configure cluster options with SSL verification disabled for testing, in production you should enable it\n",
"options = ClusterOptions(auth, tls_verify='none')\n",
"options.apply_profile(\"wan_development\")\n",
"cluster = Cluster(endpoint, options)"
]
},
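{
"cell_type": "code",
"execution_count": null,
"id": "embedding-field-check-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: confirm the workflow populated the embedding field using a\n",
"# SQL++ query. The field name `vector_Series_Title` comes from the workflow in\n",
"# Step 4; adjust it if your workflow used a different name.\n",
"check_query = (\n",
"    'SELECT META().id FROM `travel-sample`.`inventory`.`hotel` '\n",
"    'WHERE vector_Series_Title IS NOT MISSING LIMIT 3'\n",
")\n",
"for row in cluster.query(check_query):\n",
"    print(row)"
]
},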
{
"cell_type": "code",
"execution_count": null,
"id": "799b2efc",
"metadata": {},
"outputs": [],
"source": [
"bucket_name = \"travel-sample\"\n",
"scope_name = \"inventory\"\n",
"collection_name = \"hotel\"\n",
"index_name = \"hybrid_av_workflow1_vector_Series_Title\" # This is the name of the search index which was created in the step 4.5 and can also be seen in the search tab of the cluster.\n",
"embedder = NVIDIAEmbeddings(\n",
" model=\"nvidia/nv-embedqa-e5-v5\", # This is the model which will be used to create the embedding of the query.\n",
" api_key=\"nvapi-XYZ\" # This is the api key using which your model will be accessed.\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50b85f78",
"metadata": {},
"outputs": [],
"source": [
"vector_store = CouchbaseSearchVectorStore(\n",
" cluster=cluster,\n",
" bucket_name=bucket_name,\n",
" scope_name=scope_name,\n",
" collection_name=collection_name,\n",
" embedding=embedder,\n",
" index_name=index_name,\n",
" text_key=\"address\", # your document's text field\n",
" embedding_key=\"vector_Series_Title\" # this is the field in which your vector(embedding) is stored in the cluster.\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "177fd6d5",
"metadata": {},
"outputs": [],
"source": [
"query = \"USA\"\n",
"results = vector_store.similarity_search(query, k=3)\n",
"\n",
"# Printing out the top-k results\n",
"for rank, doc in enumerate(results, start=1):\n",
" title = doc.metadata.get(\"title\", \"<no title>\")\n",
" address_text = doc.page_content\n",
" print(f\"{rank}. {title} — Address: {address_text}\")"
]
}
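,
{
"cell_type": "code",
"execution_count": null,
"id": "search-with-score-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: retrieve similarity scores alongside the documents via the\n",
"# vector store's score variant, useful for inspecting match quality.\n",
"scored = vector_store.similarity_search_with_score(query, k=3)\n",
"for doc, score in scored:\n",
"    print(f\"score={score:.4f}  address={doc.page_content}\")"
]
}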
],
"metadata": {
"kernelspec": {
"display_name": "autovec",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Binary file added autovec-tutorial/img/Access_control.png
Binary file added autovec-tutorial/img/Create_auto_vec.png
Binary file added autovec-tutorial/img/cluster_cloud_config.png
Binary file added autovec-tutorial/img/cluster_no_nodes.png
Binary file added autovec-tutorial/img/create_cluster.png
Binary file added autovec-tutorial/img/deploying_model.png
Binary file added autovec-tutorial/img/import_sd.png
Binary file added autovec-tutorial/img/imported_data_hotel.png
Binary file added autovec-tutorial/img/importing_model.png
Binary file added autovec-tutorial/img/login.png
Binary file added autovec-tutorial/img/login_.png
Binary file added autovec-tutorial/img/model_api_key_form.png
Binary file added autovec-tutorial/img/model_setup_access.png
Binary file added autovec-tutorial/img/node_select_cluster_opt.png
Binary file added autovec-tutorial/img/password_cluster.png
Binary file added autovec-tutorial/img/select_cluster.png
Binary file added autovec-tutorial/img/setup_access.png
Binary file added autovec-tutorial/img/start_workflow.png
Binary file added autovec-tutorial/img/vector_all_field_mapping.png
Binary file added autovec-tutorial/img/vector_data_source.png
Binary file added autovec-tutorial/img/vector_field_mapping.png
Binary file added autovec-tutorial/img/vector_index.png
Binary file added autovec-tutorial/img/vector_index_page.png