@giriraj-singh-couchbase
Contributor

This guide is a comprehensive tutorial demonstrating how to use Couchbase Capella's AI Services auto-vectorization feature to automatically convert your data into vector embeddings and perform semantic search using LangChain.

📋 Overview

The main tutorial is contained in the Jupyter notebook autovec_langchain.ipynb, which walks you through:

  1. Couchbase Capella Setup - Creating account, cluster, and access controls
  2. Data Upload & Processing - Using sample data
  3. Model Deployment - Deploying embedding models for vectorization
  4. Auto-Vectorization Workflow - Setting up automated embedding generation
  5. LangChain Integration - Building semantic search applications with vector similarity

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @giriraj-singh-couchbase, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a new, detailed tutorial designed to guide users through the process of leveraging Couchbase Capella's AI Services for automatic data vectorization. The tutorial provides a complete walkthrough, from initial Capella account and cluster setup to deploying embedding models and configuring auto-vectorization workflows, culminating in practical examples of semantic search using LangChain. The aim is to empower users to easily transform their data into vector embeddings and build intelligent search applications.

Highlights

  • New Auto-Vectorization Tutorial: Introduces a comprehensive tutorial demonstrating the use of Couchbase Capella's AI Services auto-vectorization feature to convert data into vector embeddings.
  • LangChain Integration: The tutorial showcases how to perform semantic search using the generated vector embeddings by integrating with LangChain.
  • Step-by-Step Guide: The tutorial covers essential steps including Couchbase Capella setup, data upload and processing, embedding model deployment, auto-vectorization workflow configuration, and practical LangChain integration examples.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

github-actions bot commented Sep 15, 2025

Caution

Notebooks or Frontmatter Files Have Been Modified

  • Please ensure that a frontmatter.md file is accompanying the notebook file, and that the frontmatter is up to date.
  • These changes will be published to the developer portal tutorials only if frontmatter.md is included.
  • Proofread all changes before merging, as changes to notebook and frontmatter content will update the developer tutorial.

1 Notebook Files Modified:

Notebook File Frontmatter Included?
autovec-tutorial/autovec_langchain.ipynb

0 Frontmatter Files Modified:

Frontmatter File
Note: frontmatter will be checked and tested in the Test Frontmatter workflow.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive tutorial on using Couchbase Capella's AI Services for auto-vectorization with LangChain. The tutorial is well-structured, but there are several areas for improvement to enhance clarity, correctness, and security for the end-user. My review includes feedback on the README file and the Jupyter notebook, addressing issues such as placeholder values, dependency management, broken links, inconsistent formatting, typos, and a hardcoded credential. Addressing these points will make the tutorial more polished and easier for users to follow.

Contributor

@nithishr nithishr left a comment

Can you apply the same comments as in #57 to this one as well?

Contributor

@nithishr nithishr left a comment

Same comments as in #57 are relevant here as well.

"source": [
"# Cluster Connection Setup\n",
" - Defines the secure connection string, user credentials, and creates a `Cluster` object.\n",
" - Disables TLS verification by `options = ClusterOptions(auth, tls_verify='none')` ONLY for quick local testing (not recommended in production) and applies the `wan_development` profile to tune timeouts for higher-latency networks."
Contributor

We don't do it anymore
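
Assuming the comment above refers to the `tls_verify='none'` override, here is a minimal sketch of a connection cell that leaves certificate verification at its default and reads credentials from the environment rather than hardcoding them. The environment variable names (`CB_CONNECTION_STRING`, `CB_USERNAME`, `CB_PASSWORD`) and both helper functions are assumptions for illustration, not taken from the notebook; `connect` also needs the `couchbase` Python SDK installed and a reachable cluster.

```python
import os
from datetime import timedelta
from urllib.parse import urlparse


def load_settings(env=os.environ):
    """Read connection settings from environment variables (hypothetical names)."""
    settings = {
        "connstr": env.get("CB_CONNECTION_STRING", ""),
        "username": env.get("CB_USERNAME", ""),
        "password": env.get("CB_PASSWORD", ""),
    }
    missing = [key for key, value in settings.items() if not value]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    # Require the TLS scheme so the connection is encrypted and verified.
    if urlparse(settings["connstr"]).scheme != "couchbases":
        raise ValueError("connection string must start with couchbases://")
    return settings


def connect(settings):
    """Create a Cluster with certificate verification left enabled."""
    # Imported lazily so load_settings works without the SDK installed.
    from couchbase.auth import PasswordAuthenticator
    from couchbase.cluster import Cluster
    from couchbase.options import ClusterOptions

    auth = PasswordAuthenticator(settings["username"], settings["password"])
    options = ClusterOptions(auth)  # no tls_verify override
    options.apply_profile("wan_development")  # longer timeouts for WAN latency
    cluster = Cluster(settings["connstr"], options)
    cluster.wait_until_ready(timedelta(seconds=10))
    return cluster
```

Typical use would be `cluster = connect(load_settings())` after exporting the three variables in the shell or notebook environment.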

"bucket_name = \"travel-sample\"\n",
"scope_name = \"inventory\"\n",
"collection_name = \"hotel\"\n",
"index_name = \"hybrid_autovec_workflow_vec_addr_descr_id\" # This is the name of the search index that was created in step 4.5 and can also be seen in the search tab of the cluster.\n",
Contributor

nit: This looks different from the one on the screenshot

}
],
"source": [
"query = \"Woodhead Road\"\n",
Contributor

This again feels like FTS rather than vector search.
Maybe just index the description & ask for synonyms of descriptions?
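
One rough way to check whether a test query exercises vector search rather than keyword matching is to confirm that it shares few or no words with the indexed text. The sketch below does that in plain Python; the `token_overlap` helper, the sample address and description, and the paraphrased query are made up for illustration and are not taken from the notebook or the `travel-sample` data.

```python
import re


def tokens(text):
    """Lowercase word tokens, the kind of terms a keyword index would match on."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap(query, document):
    """Fraction of query tokens that appear verbatim in the document."""
    query_tokens = tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokens(document)) / len(query_tokens)


# Hypothetical hotel record, for illustration only.
address = "12 Woodhead Road"
description = "Quiet family-run guesthouse beside hiking trails."

# An address lookup matches term for term, so a keyword index would find it.
print(token_overlap("Woodhead Road", address))
# A paraphrase shares no terms with the description, so a keyword index would miss it.
print(token_overlap("peaceful place to stay near walking paths", description))
```

A query with high overlap says little about the embeddings; a paraphrase with near-zero overlap that still returns the right hotel is better evidence that the vector index is doing the work.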

"source": [
"# Auto-Vectorization Using Couchbase Capella AI Services\n",
"\n",
"This comprehensive tutorial demonstrates how to use Couchbase Capella's new AI Services auto-vectorization feature to automatically convert your data into vector embeddings and perform semantic search using LangChain.\n"
Contributor

We need a paragraph stating that this tutorial is about vectorizing structured data stored in Couchbase, plus a link to the other tutorial along the lines of: "If you are looking to vectorize data from unstructured sources such as S3, check this tutorial."
The other tutorial needs the same paragraph pointing back to this one.
We have similar examples in the

Contributor

Also, the titles need to be adapted to add Structured/Unstructured to them.

@@ -0,0 +1,18 @@
---
# frontmatter
path: "/tutorial-couchbase-autovectorization-langchain"
Contributor

The path should also be different from the other autovec tutorial (#57).

Contributor

You might also want to consider grouping the tutorials in different folders to avoid confusion/conflicts.
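
A clashing path is easy to catch mechanically. Below is a minimal sketch, assuming each tutorial keeps its frontmatter in a `frontmatter.md` file with a single `path:` line as in the diff above; the function names and the folder names in the usage example are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches a line such as: path: "/tutorial-couchbase-autovectorization-langchain"
PATH_LINE = re.compile(r'^path:[ \t]*["\']?([^"\'\s]+)["\']?[ \t]*$', re.MULTILINE)


def find_duplicate_paths(frontmatter_by_file):
    """Map each repeated `path:` value to the files that declare it."""
    files_by_path = defaultdict(list)
    for filename, text in frontmatter_by_file.items():
        match = PATH_LINE.search(text)
        if match:
            files_by_path[match.group(1)].append(filename)
    return {path: files for path, files in files_by_path.items() if len(files) > 1}


def load_frontmatter(repo_root):
    """Read every frontmatter.md under repo_root into a {filename: text} dict."""
    return {
        str(p): p.read_text(encoding="utf-8")
        for p in Path(repo_root).rglob("frontmatter.md")
    }


# Two hypothetical tutorial folders that declare the same path.
shared = '---\n# frontmatter\npath: "/tutorial-couchbase-autovectorization-langchain"\n'
example = {
    "structured/frontmatter.md": shared,
    "unstructured/frontmatter.md": shared,
}
print(find_duplicate_paths(example))
```

Against a checkout, `find_duplicate_paths(load_frontmatter("."))` would return an empty dict when every tutorial has a unique path.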

3 participants