Merged
@@ -2,6 +2,8 @@ import Admonition from '@theme/Admonition';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
import ContentFrame from '@site/src/components/ContentFrame';
import Panel from '@site/src/components/Panel';

<Admonition type="note" title="">

@@ -14,16 +16,16 @@ import CodeBlock from '@theme/CodeBlock';
to understand the embeddings generation process flow.

* In this article:
* [Configuring an embeddings generation task - from the Studio](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#configuring-an-embeddings-generation-task---from-the-studio)
* [Configuring an embeddings generation task - from the Client API](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#configuring-an-embeddings-generation-task---from-the-client-api)
* [Configuring an embeddings generation task - from the Studio](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#configuring-an-embeddings-generation-task-from-the-studio)
* [Configuring an embeddings generation task - from the Client API](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#configuring-an-embeddings-generation-task-from-the-client-api)
* [Define source using PATHS](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#configure-an-embeddings-generation-task---define-source-using-paths)
* [Define source using SCRIPT](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#configure-an-embeddings-generation-task---define-source-using-script)
* [Chunking methods and tokens](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#chunking-methods-and-tokens)
* [Syntax](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#syntax)

</Admonition>

## Configuring an embeddings generation task - from the Studio
<Panel heading="Configuring an embeddings generation task - from the Studio">

* **Define the general task settings**:

@@ -93,7 +95,7 @@ import CodeBlock from '@theme/CodeBlock';
4. **Add path configuration**
Click to add the specified path to the list.
5. **List of paths**
Displays the document properties you added for embedding generation.
This table displays the document properties you added for embedding generation.

* **Define the embeddings source - using SCRIPT**:

@@ -141,18 +143,24 @@ import CodeBlock from '@theme/CodeBlock';

1. **Querying**
This label indicates that this section configures parameters only for embeddings
generated by the task for **search terms** in vector search queries.
generated by the task for **search terms in vector search queries**.
2. **Chunking method**
Select the method for splitting the search term into chunks.
Learn more in [Chunking methods and tokens](../../../ai-integration/generating-embeddings/embeddings-generation-task.mdx#chunking-methods-and-tokens).
3. **Max tokens per chunk**
Enter the maximum number of tokens allowed per chunk (this depends on the service provider).
4. **Embeddings cache expiration**
4. **Overlap tokens**
Enter the number of tokens to repeat at the start of each chunk from the end of the previous one.
This helps preserve context between chunks by carrying over some tokens from one to the next.
Applies only to the _"Plain Text: Split Paragraphs"_ and _"Markdown: Split Paragraphs"_ chunking methods.
5. **Embeddings cache expiration**
Set the expiration period for documents stored in the `@embeddings-cache` collection.
These documents contain embeddings generated from the search terms, serving as a cache for these embeddings.
The default period is `14` days. Learn more in [The embeddings cache collection](../../../ai-integration/generating-embeddings/embedding-collections.mdx#the-embeddings-cache-collection).
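
The effect of the **Overlap tokens** setting can be illustrated with a minimal sketch. It is not RavenDB's implementation: the function name `chunk_with_overlap` is hypothetical, and whitespace-separated words stand in for the model tokens that a real tokenizer would produce.

```python
def chunk_with_overlap(tokens, max_tokens_per_chunk, overlap_tokens=0):
    """Split tokens into chunks of at most max_tokens_per_chunk tokens.

    Every chunk after the first starts with the last overlap_tokens
    tokens of the previous chunk (illustrative only, not RavenDB code).
    """
    if max_tokens_per_chunk <= 0:
        raise ValueError("max_tokens_per_chunk must be positive")
    if not 0 <= overlap_tokens < max_tokens_per_chunk:
        raise ValueError("overlap_tokens must be in [0, max_tokens_per_chunk)")

    # Each new chunk advances by the chunk size minus the repeated tokens.
    step = max_tokens_per_chunk - overlap_tokens
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens_per_chunk])
        if start + max_tokens_per_chunk >= len(tokens):
            break  # the last chunk reached the end of the input
        start += step
    return chunks


# Assumption: words approximate tokens for the purpose of this example.
words = "the quick brown fox jumps over the lazy dog".split()
chunks = chunk_with_overlap(words, max_tokens_per_chunk=4, overlap_tokens=1)
for chunk in chunks:
    print(" ".join(chunk))
```

With an overlap of `1`, the last token of each chunk is repeated as the first token of the next one, which is how some context carries over between consecutive chunks. With an overlap of `0`, the chunks do not share any tokens.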

## Configuring an embeddings generation task - from the Client API
</Panel>

<Panel heading="Configuring an embeddings generation task - from the Client API">

<Admonition type="note" title="">

@@ -314,7 +322,9 @@ EmbeddingsTransformation = new EmbeddingsTransformation()

</Admonition>

## Chunking methods and tokens
</Panel>

<Panel heading="Chunking methods and tokens">

**Tokens and processing limits**:

@@ -432,7 +442,9 @@ html.strip(htmlText | [htmlText], maxTokensPerChunk);
| **maxTokensPerChunk / maxTokensPerLine** | `number` | The maximum number of tokens allowed per chunk.<br/>Default is `512`. |
| **overlapTokens** | `number` (optional) | The number of tokens to overlap between consecutive chunks. Helps preserve context continuity across chunks (e.g., between paragraphs).<br/>Default is `0`. |

## Syntax
</Panel>

<Panel heading="Syntax">

#### The embeddings generation task configuration:

@@ -463,7 +475,7 @@ public class EmbeddingsGenerationConfiguration : EtlConfiguration<AiConnectionSt
| **EmbeddingsPathConfigurations** | `List<EmbeddingPathConfiguration>` | A list of properties inside documents that contain text to be embedded, along with their chunking settings. |
| **EmbeddingsTransformation** | `EmbeddingsTransformation` | An object that contains a script defining the transformations and processing applied to the source text before generating embeddings. |
| **Quantization** | `VectorEmbeddingType` | The quantization type for the generated embeddings. |
| **ChunkingOptionsForQuerying** | `ChunkingOptions` | The chunking method and maximum token limit used when processing search terms in vector search queries. |
| **ChunkingOptionsForQuerying** | `ChunkingOptions` | The chunking method, maximum token limit, and overlap tokens used when processing search terms in vector search queries. |
| **EmbeddingsCacheExpiration** | `TimeSpan` | The expiration period for documents in the [Embedding cache collection](../../../ai-integration/generating-embeddings/embedding-collections.mdx#the-embeddings-cache-collection) that contain embeddings generated from source documents. |
| **EmbeddingsCacheForQueryingExpiration** | `TimeSpan` | The expiration period for documents in the embedding cache collection that contain embeddings generated from search terms in vector search queries. |

@@ -518,3 +530,5 @@ public enum VectorEmbeddingType
public AddEmbeddingsGenerationOperation(EmbeddingsGenerationConfiguration configuration);
```
</TabItem>

</Panel>
@@ -1,14 +1,14 @@
---
title: "Generating embeddings: Start"
title: "Generating embeddings"
sidebar_label: Start
sidebar_position: 0
---

import Admonition from '@theme/Admonition';

import CardWithImage from "@site/src/components/Common/CardWithImage";
import CardWithImageHorizontal from "@site/src/components/Common/CardWithImageHorizontal";
import ColGrid from "@site/src/components/ColGrid";

import embedGenStartApiImage from "./assets/embeddings-generation_start_api-image.png";
import embedGenStartStudioImage from "./assets/embeddings-generation_start_studio-image.png";
import aiSearchArticleImage from "./assets/ai-search-article-cover.webp";
@@ -17,43 +17,102 @@ import ayendeBlogImage from "@site/static/img/from-ayende-com.webp";
import webinarThumbnailPlaceholder from "@site/static/img/webinar.webp";
import discordLargeThumbnailPlaceholder from "@site/static/img/discord-lg.webp";

# Generating embeddings

<Admonition type="note" title="">
### Create embeddings to enable AI-powered similarity search.
[Embeddings](https://en.wikipedia.org/wiki/Embedding_(machine_learning)) are numeric vectors that you can create for data (like a text or an image) to capture meanings, contexts, or relationships related to the data. You can then search the data by running intelligent queries over its embeddings using [vector search](../../ai-integration/vector-search/vector-search_start) to find content by similarity rather than exact match.
- RavenDB allows you to create embeddings using native [ongoing embeddings-generation tasks](../../ai-integration/generating-embeddings/embeddings-generation-task) that systematically process document collections and convert document fields (like texts or arrays) into embeddings. To create the embeddings, the tasks can use either an external AI model (such as OpenAI) or RavenDB's default embedding model.
- You can also create embeddings using external embeddings providers and store them in your database (e.g., to handle other content types such as images).
- You can avoid pre-generating embeddings, and let vector search operations generate embeddings on-the-fly, while searching.
- Embeddings can be used by other RavenDB AI features. E.g., [AI agents](../../ai-integration/ai-agents/ai-agents_start) can use vector search to retrieve relevant data requested by the LLM.

* [Embeddings](https://en.wikipedia.org/wiki/Embedding_(machine_learning)) are numeric vectors that represent the meaning, context, or relationships within the data (e.g., text or images).
They allow you to search for similar content using [Vector search](../../ai-integration/vector-search/vector-search_start), rather than exact matches.

* RavenDB can generate embeddings automatically using [Embeddings-generation tasks](../../ai-integration/generating-embeddings/embeddings-generation-task) that you define.
These tasks process text fields from documents and convert them into vectors using either an AI model from external providers
(e.g., [OpenAI](../../ai-integration/connection-strings/open-ai)) or [RavenDB’s built-in model](../../ai-integration/connection-strings/embedded).
The resulting vectors are stored in a dedicated collection in your database.

* Alternatively, you can skip pre-generating embeddings via _Embeddings-generation tasks_ and let RavenDB generate them on-the-fly during dynamic vector search queries.
Learn more in [Vector search using a dynamic query](../../ai-integration/vector-search/vector-search-using-dynamic-query).

* RavenDB does not generate embeddings for non-text content such as images.
However, you can import embeddings for such content from external providers and store them in your database for similarity search.

* Embeddings also power other AI features in RavenDB, such as [GenAI](../../ai-integration/gen-ai-integration/gen-ai_start) and [AI agents](../../ai-integration/ai-agents/ai-agents_start),
which may use vector search queries to retrieve relevant documents in response to LLM requests.

* On this start page, you'll find:
* [Use cases](../../ai-integration/generating-embeddings/embeddings-generation_start#use-cases)
* [Technical documentation links](../../ai-integration/generating-embeddings/embeddings-generation_start#technical-documentation-links-)
* [Related in-depth articles](../../ai-integration/generating-embeddings/embeddings-generation_start#related-in-depth-articles)
* [Related lives & videos](../../ai-integration/generating-embeddings/embeddings-generation_start#related-lives--videos)

</Admonition>
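
To make "search by similarity rather than exact match" concrete, here is a minimal sketch that ranks a few hand-made vectors against a query vector. It assumes cosine similarity as the comparison metric, which is one common choice and is not stated here to be what RavenDB uses internally. The document IDs, the three-dimensional vectors, and the helper names are hypothetical; real embeddings come from an embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, documents):
    """Return document IDs ordered from most to least similar to the query."""
    return sorted(
        documents,
        key=lambda doc_id: cosine_similarity(query_vector, documents[doc_id]),
        reverse=True,
    )


# Hypothetical embeddings: made-up values, not produced by any model.
documents = {
    "products/1": [0.9, 0.1, 0.0],
    "products/2": [0.1, 0.9, 0.2],
    "products/3": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

ranked = rank_by_similarity(query, documents)
print(ranked)
```

The documents whose vectors point in nearly the same direction as the query vector are ranked first, even though no value matches the query exactly.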

---

### Use cases
Embeddings generation tasks can be used to prepare your data for AI-powered search, analysis, and usage, e.g., for -

Embeddings-generation tasks can prepare your data for AI-powered search, analysis, and retrieval.
Common use cases include:

* **Enterprise knowledge bases**
Generate embeddings for thousands of documents, policies, and procedures to enable instant semantic search
Generate embeddings for thousands of documents, policies, and procedures to enable fast, semantic search.
* **Legal document libraries**
Process case law, contracts, and regulations to build searchable legal repositories
Process case law, contracts, and regulations to build a searchable legal knowledge base.
* **Product catalogs**
Convert product descriptions, specifications, and reviews into embeddings for enhanced e-commerce search
Convert product descriptions, specifications, and reviews into embeddings for intelligent e-commerce search and recommendations.
* **Content management systems**
Transform blog posts, articles, and marketing materials into searchable vector representations
Transform blog posts, articles, and marketing materials into searchable vector embeddings.

### Technical documentation
Learn about generating, storing, and using embeddings in RavenDB.
---

<ColGrid colCount={2}>
<CardWithImage title="Embeddings generation overview" description="Learn the basics of embeddings generation in RavenDB" url="../../ai-integration/generating-embeddings/overview" imgIcon="embeddings" ctaLabel="Read" />
<CardWithImage title="Embeddings generation tasks UI" description="Create and configure your tasks using Studio" url="../../ai-integration/generating-embeddings/embeddings-generation-task#configuring-an-embeddings-generation-task---from-the-studio" imgIcon="studio-config" ctaLabel="Read" />
</ColGrid>
### Technical documentation links ✨

* [Generating embeddings - overview](../../ai-integration/generating-embeddings/overview)
* [Configure an embeddings generation task - from the Studio](../../ai-integration/generating-embeddings/embeddings-generation-task)
* [Configure an embeddings generation task - from the Client API](../../ai-integration/generating-embeddings/embeddings-generation-task#configuring-an-embeddings-generation-task---from-the-client-api)
* [The embedding collections](../../ai-integration/generating-embeddings/embedding-collections)

---

### Related in-depth articles

#### Learn more: In-depth embeddings generation articles
<ColGrid colCount={2}>
<CardWithImage title="Embeddings with RavenDB and External Models" description="Step-by-step setup guide for AI-powered semantic search, by Paweł Lachowski" imgSrc={aiSearchArticleImage} url="https://ravendb.net/articles/embeddings-generation-with-ravendb" imgAlt="Embeddings with RavenDB and External Models" ctaLabel="Read" />
<CardWithImage title="The integration of embeddings generation in RavenDB" description="The reasoning and architecture behind RavenDB's embeddings feature, by Oren Eini" imgSrc={ayendeBlogImage} url="https://ravendb.net/articles/ai-integration-in-ravendb-embeddings-generation" imgAlt="The integration of embeddings generation in RavenDB" ctaLabel="Read" />
<CardWithImage
title="Embeddings with RavenDB and External Models"
description="Step-by-step setup guide for AI-powered semantic search, by Paweł Lachowski"
imgSrc={aiSearchArticleImage} url="https://ravendb.net/articles/embeddings-generation-with-ravendb"
imgAlt="Embeddings with RavenDB and External Models"
ctaLabel="Read article"
/>
<CardWithImage
title="The integration of embeddings generation in RavenDB"
description="The reasoning and architecture behind RavenDB's embeddings feature, by Oren Eini"
imgSrc={ayendeBlogImage} url="https://ravendb.net/articles/ai-integration-in-ravendb-embeddings-generation"
imgAlt="The integration of embeddings generation in RavenDB"
ctaLabel="Read article"
/>
</ColGrid>

### Related lives & Videos
---

### Related lives & videos

Learn more about enhancing your applications using vector search operations.

<ColGrid colCount={1}>
<CardWithImageHorizontal title="Taking over the world with AI and RavenDB" description="Watch a webinar about vector search." url="https://www.youtube.com/watch?v=7DhbgfH_rSE" imgSrc={webinarThumbnailPlaceholder} imgAlt="Taking over the world with AI and RavenDB" ctaLabel="Watch" />
<CardWithImageHorizontal title="Vector search in RavenDB" description="The AI trend developers simply cannot ignore " url="https://www.youtube.com/watch?v=zZwid8LA-e4" imgSrc={discordLargeThumbnailPlaceholder} imgAlt="Vector search in RavenDB" ctaLabel="Watch" />
<CardWithImageHorizontal
title="Taking over the world with AI and RavenDB"
description="Learn how to integrate AI into your apps using RavenDB for semantic search, vector embeddings, and LLM-powered features - with real examples from Oren Eini."
url="https://www.youtube.com/watch?v=7DhbgfH_rSE"
imgSrc={webinarThumbnailPlaceholder}
imgAlt="Taking over the world with AI and RavenDB"
ctaLabel="Watch"
/>
<CardWithImageHorizontal
title="Vector search in RavenDB"
description="The AI trend developers simply cannot ignore."
url="https://www.youtube.com/watch?v=zZwid8LA-e4"
imgSrc={discordLargeThumbnailPlaceholder}
imgAlt="Vector search in RavenDB"
ctaLabel="Watch"
/>
</ColGrid>