22 commits
807c392
Updating AutoVec tutorial
giriraj-singh-couchbase Sep 15, 2025
68e04f3
Added frontmatter.md and updated the tutorial
giriraj-singh-couchbase Sep 15, 2025
9bb439b
Applied suggestions from code review
giriraj-singh-couchbase Sep 15, 2025
dfca154
Fixed frontmatter.md
giriraj-singh-couchbase Sep 15, 2025
e52bcd5
fixing minor details
giriraj-singh-couchbase Sep 15, 2025
604bf29
Merge branch 'DA-1096_autovec_tutorial' of https://github.com/couchba…
giriraj-singh-couchbase Sep 15, 2025
fc878a4
Fixed frontmatter.md
giriraj-singh-couchbase Sep 15, 2025
38d1792
Updated autovec_langchain.ipynb
giriraj-singh-couchbase Sep 18, 2025
1e9bb07
fixed some screenshots
giriraj-singh-couchbase Sep 18, 2025
b85f7a4
updated screenshot
giriraj-singh-couchbase Sep 23, 2025
69fe694
fixed grammatical mistakes
giriraj-singh-couchbase Sep 24, 2025
9bcf7b9
fixed capella free-tier issue and added some missing content
giriraj-singh-couchbase Sep 25, 2025
0d870b1
added missing content
giriraj-singh-couchbase Sep 25, 2025
709cfe5
Updated doc
giriraj-singh-couchbase Sep 26, 2025
55242d5
added version of libraries, removed unnecessary files
giriraj-singh-couchbase Oct 30, 2025
ce09c83
updated model service name
giriraj-singh-couchbase Nov 3, 2025
e48691b
removed extra code
giriraj-singh-couchbase Nov 3, 2025
97392d9
updated screenshots
giriraj-singh-couchbase Nov 13, 2025
8cb8715
Updated tutorial to use couchbase hyperscale vector index
giriraj-singh-couchbase Dec 4, 2025
7274d17
fixed frontmatter path
giriraj-singh-couchbase Dec 4, 2025
44ed62b
renamed folder
giriraj-singh-couchbase Dec 4, 2025
90042d1
Removed title as it will be used from the frontmatter
giriraj-singh-couchbase Dec 4, 2025
53 changes: 53 additions & 0 deletions autovec-tutorial/README.md
@@ -0,0 +1,53 @@
# Couchbase Capella AI Services Auto-Vectorization with LangChain

This is a comprehensive tutorial demonstrating how to use Couchbase Capella's AI Services auto-vectorization feature to automatically convert your data into vector embeddings and perform semantic search with LangChain.

## 📋 Overview

The main tutorial is contained in the Jupyter notebook `autovec_langchain.ipynb`, which walks you through:

1. **Couchbase Capella Setup** - Creating account, cluster, and access controls
2. **Data Upload & Processing** - Using sample data
3. **Model Deployment** - Deploying embedding models for vectorization
4. **Auto-Vectorization Workflow** - Setting up automated embedding generation
5. **LangChain Integration** - Building semantic search applications with vector similarity

## 🚀 Quick Start

### Prerequisites

- Python 3.8 or higher
- A Couchbase Capella account
- Basic understanding of vector databases and embeddings
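
If you are new to embeddings, the core idea behind semantic search is comparing vectors by cosine similarity. Capella computes this server-side during search; the snippet below is only an illustrative, dependency-free sketch of the math.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real models emit hundreds of dimensions.
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]

print(cosine_similarity(v1, v2))  # identical direction -> 1.0
print(cosine_similarity(v1, v3))  # orthogonal -> 0.0
```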

### Installation Steps

1. **Clone or download this repository**
```bash
git clone <repository-url>
cd vector-search-cookbook/autovec-tutorial
```

2. **Install Python dependencies**
```bash
pip install jupyter
pip install couchbase
pip install langchain-couchbase
pip install langchain-nvidia-ai-endpoints
```

3. **Start Jupyter Notebook**
```bash
jupyter notebook
```
or
```bash
jupyter lab
```

4. **Open the tutorial notebook**
- Navigate to `autovec_langchain.ipynb` in the Jupyter interface
- Follow the step-by-step instructions in the notebook

**Note**: This tutorial is designed for educational purposes. For production deployments, ensure proper security configurations and SSL/TLS verification.
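
Before opening the notebook, you can confirm that the dependencies from the installation step are present. This stdlib-only sketch reports each package's installed version (the package names are the pip distribution names used above):

```python
from importlib import metadata

packages = [
    "jupyter",
    "couchbase",
    "langchain-couchbase",
    "langchain-nvidia-ai-endpoints",
]

report = []
for pkg in packages:
    try:
        report.append(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        report.append(f"{pkg}: NOT INSTALLED")

print("\n".join(report))
```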
301 changes: 301 additions & 0 deletions autovec-tutorial/autovec_langchain.ipynb
@@ -0,0 +1,301 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "44480f12-3bd0-4fe9-9493-25bd6a2712bb",
"metadata": {},
"source": [
"# Couchbase Capella AI Services Auto-Vectorization Tutorial\n",
"\n",
"This comprehensive tutorial demonstrates how to use Couchbase Capella's new AI Services auto-vectorization feature to automatically convert your data into vector embeddings and perform semantic search using LangChain.\n",
"\n",
"---\n",
"\n",
"## 📚 Table of Contents\n",
"\n",
"1. [Capella Account Setup](#1-capella-account-setup)\n",
"2. [Data Upload and Preparation](#2-data-upload-and-preparation)\n",
"3. [Deploying the Model](#3-deploying-the-model)\n",
"4. [Auto-Vectorization Process](#4-deploying-autovectorization-workflow)\n",
"5. [LangChain Vector Search]()\n"
]
},
{
"cell_type": "markdown",
"id": "502eb13e",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"# 1. Capella Account Setup\n",
"\n",
"Before we can use AI Services auto-vectorization, you need to set up a Couchbase Capella account and create a cluster.\n",
"\n",
"## Step 1.1: Sign Up for Couchbase Capella\n",
"\n",
"1. **Visit Capella**: Go to [https://cloud.couchbase.com](https://cloud.couchbase.com)\n",
" \n",
" <img src=\"./img/login_.png\" width=500pt height=1000pt>\n",
"\n",
"3. **Sign In**: Click \"Sign in\" or create your free account by clicking \"Try free\", you can also sign in using google, github or using your organization's SSO.\n",
"\n",
"\n",
"## Step 1.2: Create a New Cluster\n",
"\n",
"1. **Access Dashboard**: After logging in, you'll see the Capella dashboard\n",
"2. **Create Cluster**: Click \"Create Cluster\"\n",
" \n",
" <img src=\"./img/create_cluster.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
"4. **Choose Configuration**:\n",
" - **Cluster Configuration**: \n",
" - For development: Single node cluster\n",
" - For production: Multi-node with replicas\n",
" \n",
" <img src=\"./img/node_select_cluster_opt.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
" - **Cloud Provider**: AWS, Azure, or GCP (AWS recommended for this tutorial)\n",
" \n",
" <img src=\"./img/cluster_cloud_config.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
" - **Cluster Configuration**: Select number of nodes and their configuration, make sure to allow <B>searching</B> and <B>eventing</B> for using AutoVectorization.\n",
" \n",
" <img src=\"./img/cluster_no_nodes.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"\n",
"## Step 1.3: Configure Access Control \n",
"\n",
"1. **Access Control**: Navigate to the <B>access control</B> tab which is present in <B>cluster settings</B> as highlited in the image below:-\n",
" \n",
" <img src=\"./img/Access_control.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
"3. **Enter your details**:\n",
" - <B>Cluster Access Name</B>: `username`\n",
" - <B>Password</B>: Create a strong password\n",
"\n",
" \n",
" <img src=\"./img/password_cluster.png\" width=900 style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
" - Also, do not forget the level of authorization you want to give to these credentials, as shown in the above image modify the <B>bucket-level access</B> field as per the requirement.\n"
]
},
{
"cell_type": "markdown",
"id": "4369c925-adbc-4c7d-9ea6-04ff020cb1a6",
"metadata": {},
"source": [
"\n",
"# 2. Data Upload and Preparation\n",
"\n",
"Now we'll upload sample data that will be automatically vectorized by AI Services.\n",
"\n",
"## Option A: Upload Sample Dataset (Recommended)\n",
"\n",
"We'll create sample documents about different topics to demonstrate the vectorization capabilities.\n",
"\n",
"## Option B: Use Existing Couchbase Data\n",
"\n",
"If you already have data in Couchbase (like travel-sample), you can configure vectorization for existing collections.\n",
"\n",
"Let's proceed with **Option A** for this tutorial:\n",
"\n",
"## 2.1: Uploading the sample-data provided by capella in your cluster\n",
"<div style=\"display: flex; align-items: flex-start; gap: 10px;\">\n",
" <img src=\"./img/select_cluster.png\" width=\"160\" height=\"300\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" <img src=\"./img/import_sd.png\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\" width=\"800px\">\n",
" </div>\n",
" \n",
" 1. In order to upload sample data in your cluster, you need to navigate to the import section inside your cluster \n",
" 2. Click on \"<B>Load Sample Data</B>\"\n",
" 3. Click on \"<B>travel-sample</B>\"\n",
" 4. Click on \"<B>Import</B>\"\n",
" - After importing the data you can check that a bucket named travel-sample would have been created inside the cluster.\n",
" - Select the \"<B>travel-sample</B>\" bucket, \"<B>Inventory</B>\" scope, and \"<B>Hotel</B>\" collection. Then you will see the documents inside this collection.\n",
" - The document will not contain any vector embeddings inside it.\n",
" - Now, we can proceed with the formation of vectors using auto-vectorization service.\n",
" \n",
"## 2.2: Uploading data from your program\n",
"\n",
"We'll also demonstrate how to programmatically upload sample documents using Python and the Couchbase SDK.\n"
]
},
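{
"cell_type": "code",
"execution_count": null,
"id": "b2-upload-sketch-0001",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch for Section 2.2: programmatically upserting sample documents.\n",
"# Assumes the `cluster` connection created in the connection cell later in this\n",
"# notebook (run that cell first); document IDs and fields here are illustrative.\n",
"sample_docs = {\n",
"    \"hotel_demo_1\": {\"name\": \"Seaside Inn\", \"address\": \"1 Ocean Drive, USA\",\n",
"                     \"description\": \"A small hotel by the sea.\"},\n",
"    \"hotel_demo_2\": {\"name\": \"Mountain Lodge\", \"address\": \"9 Alpine Way, USA\",\n",
"                     \"description\": \"A quiet lodge in the mountains.\"},\n",
"}\n",
"collection = cluster.bucket(\"travel-sample\").scope(\"inventory\").collection(\"hotel\")\n",
"for doc_id, doc in sample_docs.items():\n",
"    collection.upsert(doc_id, doc)"
]
},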
{
"cell_type": "markdown",
"id": "7e3afd3f-9949-4f5e-b96a-1aac1a3aea29",
"metadata": {},
"source": [
"# 3. Deploying the Model\n",
"Now, before we actually create embedding for the documents we need to deploy a model which will create the embedding for us.\n",
"## 3.1: Selecting the model \n",
"1. To select the model, you first need to navigate to the \"<B>AI Services</B>\" tab, then selecting \"<B>Models</B>\" and clicking on \"<B>Deploy New Model</B>\"\n",
" \n",
" <img src=\"./img/importing_model.png\" width=\"950px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"3. Enter the <B>model name</B>, and choose the model that you want to deploy. After Selecting your model, choose the <B>model infrastructure</B> and <B>region</B> where the model will be deployed.\n",
" \n",
" <img src=\"./img/deploying_model.png\" width=\"800px\" height=\"800px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"## 3.2 Access control to the model\n",
"\n",
"1. After deploying the model, go to the \"<B>Models</B>\" tab in the <B>AI-services</B> and click on \"<B>setup access</B>\".\n",
"\n",
" <img src=\"./img/model_setup_access.png\" width=\"1100px\" height=\"400px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"3. Enter your <B>api_key_name</B>, <B>expiration time</B> and the <B>IP-address</B> from which you will be accessing the model.\n",
"\n",
" <img src=\"./img/model_api_key_form.png\" width=\"1100px\" height=\"600px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"4. Download your API key\n",
"\n",
" <img src=\"./img/download_api_key_details.png\" width=\"1200px\" height=\"800px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">"
]
},
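{
"cell_type": "code",
"execution_count": null,
"id": "model-smoke-test-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: smoke-test the deployed model by embedding a single string.\n",
"# Replace the model name and api_key with the values from your own deployment\n",
"# (Step 3.2); the values below are placeholders.\n",
"from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n",
"\n",
"test_embedder = NVIDIAEmbeddings(\n",
"    model=\"nvidia/nv-embedqa-e5-v5\",\n",
"    api_key=\"nvapi-XYZ\"  # placeholder\n",
")\n",
"vec = test_embedder.embed_query(\"hello world\")\n",
"print(f\"Embedding dimension: {len(vec)}\")"
]
},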
{
"cell_type": "markdown",
"id": "daaf6525-d4e6-45fb-8839-fc7c20081675",
"metadata": {},
"source": [
"# 4. Deploying AutoVectorization Workflow\n",
"\n",
"1. For deploying the autovectorization, you need to go to the <B>ai-services</B> tab, then click on the <B>workflows</B>, and then click on <B>Get started with RAG</B>.\n",
" <img src=\"./img/Create_auto_vec.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
"2. Start your workflow deployment by giving it a name, and selecting from where your data will be provided to the auto-vectorization service. There are currently 3 options, <B>pre-processed data(JSON format) from capella</B>, <B>pre-processed data(JSON format) from external sources(S3 buckets)</B> and <B>unstructured data from external sources (S3 buckets)</B>. For this tutorial we will be choosing first option which is pre-processed data from capella.\n",
"\n",
" <img src=\"./img/start_workflow.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"3. Now, select the <B>cluster</B>, <B>bucket</B>, <B>scope</B> and <B>collection</B> from which you want to select the documents and get the data vectorized.\n",
"\n",
" <img src=\"./img/vector_data_source.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"4. <B>Field Mapping</B> will be used to tell the AutoVectorize service that which data will be converted to embeddings.\n",
"\n",
" There are two options:-\n",
"\n",
" - <B>All source fields</B> - This feature will convert all your fields inside the document to a single vector field.\n",
" \n",
" <img src=\"./img/vector_all_field_mapping.png\" width=\"900px\" height=\"400px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"\n",
" - <B>Custom source fields</B> - This feature will convert specific fields which are chosen by the user to a single vector field, in the image below we have chosen <B>address</B>, <B>description</B> and <B>id</B> as the fields to be converted to a vector having the name as <B>vec_addr_decr_id_mapping</B>.\n",
" \n",
" <img src=\"./img/vector_custom_field_mapping.png\" width=\"900px\" height=\"400px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" \n",
"5. After choosing your type of mapping, you will be required to either have an index on the new vector_embedding field or you can skip the creation of vector index which is not recommended as you will be loosing out the functionality of vector searching.\n",
"\n",
" <img src=\"./img/vector_index.png\" width=\"1200px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
"\n",
"6. All the steps mentioned above.\n",
"\n",
" <img src=\"./img/vector_index_page.png\" width=\"1200px\" height=\"1200px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "30955126-0053-4cec-9dec-e4c05a8de7c3",
"metadata": {},
"outputs": [],
"source": [
"from couchbase.cluster import Cluster\n",
"from couchbase.auth import PasswordAuthenticator\n",
"from couchbase.options import ClusterOptions\n",
"\n",
"from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings \n",
"from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e4c9e8d",
"metadata": {},
"outputs": [],
"source": [
"endpoint = \"couchbases://cb.XYZ.com\" # Replace this with Connection String\n",
"username = \"testing\" \n",
"password = \"Testing@1\"\n",
"auth = PasswordAuthenticator(username, password)\n",
"# Configure cluster options with SSL verification disabled for testing, in production you should enable it\n",
"options = ClusterOptions(auth, tls_verify='none')\n",
"options.apply_profile(\"wan_development\")\n",
"cluster = Cluster(endpoint, options)"
]
},
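{
"cell_type": "code",
"execution_count": null,
"id": "embedding-field-check-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: confirm the workflow populated the embedding field using a\n",
"# SQL++ query. The field name `vector_Series_Title` comes from the workflow in\n",
"# Step 4; adjust it if your workflow used a different name.\n",
"check_query = (\n",
"    'SELECT META().id FROM `travel-sample`.`inventory`.`hotel` '\n",
"    'WHERE vector_Series_Title IS NOT MISSING LIMIT 3'\n",
")\n",
"for row in cluster.query(check_query):\n",
"    print(row)"
]
},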
{
"cell_type": "code",
"execution_count": null,
"id": "799b2efc",
"metadata": {},
"outputs": [],
"source": [
"bucket_name = \"travel-sample\"\n",
"scope_name = \"inventory\"\n",
"collection_name = \"hotel\"\n",
"index_name = \"hybrid_av_workflow1_vector_Series_Title\" # This is the name of the search index which was created in the step 4.5 and can also be seen in the search tab of the cluster.\n",
"embedder = NVIDIAEmbeddings(\n",
" model=\"nvidia/nv-embedqa-e5-v5\", # This is the model which will be used to create the embedding of the query.\n",
" api_key=\"nvapi-XYZ\" # This is the api key using which your model will be accessed.\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50b85f78",
"metadata": {},
"outputs": [],
"source": [
"vector_store = CouchbaseSearchVectorStore(\n",
" cluster=cluster,\n",
" bucket_name=bucket_name,\n",
" scope_name=scope_name,\n",
" collection_name=collection_name,\n",
" embedding=embedder,\n",
" index_name=index_name,\n",
" text_key=\"address\", # your document's text field\n",
" embedding_key=\"vector_Series_Title\" # this is the field in which your vector(embedding) is stored in the cluster.\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "177fd6d5",
"metadata": {},
"outputs": [],
"source": [
"query = \"USA\"\n",
"results = vector_store.similarity_search(query, k=3)\n",
"\n",
"# Printing out the top-k results\n",
"for rank, doc in enumerate(results, start=1):\n",
" title = doc.metadata.get(\"title\", \"<no title>\")\n",
" address_text = doc.page_content\n",
" print(f\"{rank}. {title} — Address: {address_text}\")"
]
}
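,
{
"cell_type": "code",
"execution_count": null,
"id": "search-with-score-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: retrieve similarity scores alongside the documents via the\n",
"# vector store's score variant, useful for inspecting match quality.\n",
"scored = vector_store.similarity_search_with_score(query, k=3)\n",
"for doc, score in scored:\n",
"    print(f\"score={score:.4f}  address={doc.page_content}\")"
]
}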
],
"metadata": {
"kernelspec": {
"display_name": "autovec",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Binary file added autovec-tutorial/img/Access_control.png
Binary file added autovec-tutorial/img/Create_auto_vec.png
Binary file added autovec-tutorial/img/cluster_cloud_config.png
Binary file added autovec-tutorial/img/cluster_no_nodes.png
Binary file added autovec-tutorial/img/create_cluster.png
Binary file added autovec-tutorial/img/deploying_model.png
Binary file added autovec-tutorial/img/import_sd.png
Binary file added autovec-tutorial/img/imported_data_hotel.png
Binary file added autovec-tutorial/img/importing_model.png
Binary file added autovec-tutorial/img/login.png
Binary file added autovec-tutorial/img/login_.png
Binary file added autovec-tutorial/img/model_api_key_form.png
Binary file added autovec-tutorial/img/model_setup_access.png
Binary file added autovec-tutorial/img/node_select_cluster_opt.png
Binary file added autovec-tutorial/img/password_cluster.png
Binary file added autovec-tutorial/img/select_cluster.png
Binary file added autovec-tutorial/img/setup_access.png
Binary file added autovec-tutorial/img/start_workflow.png
Binary file added autovec-tutorial/img/vector_all_field_mapping.png
Binary file added autovec-tutorial/img/vector_data_source.png
Binary file added autovec-tutorial/img/vector_field_mapping.png
Binary file added autovec-tutorial/img/vector_index.png
Binary file added autovec-tutorial/img/vector_index_page.png