Merged
14 changes: 11 additions & 3 deletions diskann/00_Introduction/README.md
@@ -7,15 +7,23 @@
- [Visual Studio Code](https://code.visualstudio.com/download)
- [Python extension for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-python.python)
- [Jupyter Notebook extension for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter)
- [Docker Desktop](https://www.docker.com/products/docker-desktop/) with [WSL 2 backend (if on Windows)](https://learn.docker.com/desktop/wsl/)
- [Docker Desktop](https://www.docker.com/products/docker-desktop/) with [WSL 2 backend (if on Windows)](https://docs.docker.com/desktop/features/wsl/)
- [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli)
- [Bicep CLI](https://learn.microsoft.com/azure/azure-resource-manager/bicep/install#install-manually)
- [PowerShell](https://learn.microsoft.com/powershell/scripting/install/installing-powershell?view=powershell-7.3)

## Why use this guide?

The future of software involves combining AI and data services, also known as intelligent applications.
This guide is for developers looking to implement intelligent applications quickly while leveraging existing skills.
The content focuses on the developer journey of implementing an Azure-based, GPT-powered chat application that is augmented with data stored in Azure Cosmos DB for NoSQL and uses Azure OpenAI services.

## Introduction

This guide walks through creating intelligent solutions that combine Azure Cosmos DB for NoSQL, vector search capabilities powered by DiskANN, and document retrieval with Azure OpenAI services to build a chatbot experience.
The guide includes labs that build and deploy a sample chat app using these technologies, with a focus on Azure Cosmos DB for NoSQL, vector search powered by DiskANN, and Azure OpenAI using the Python programming language.
For those new to using Azure OpenAI and Vector Search technologies, the guide includes explanations of the core concepts and techniques used when implementing these technologies.

> **Note:** This developer guide is targeted towards Python developers.

If you are a Node.js developer, then you may be interested in the Node.js version here: [https://github.com/AzureCosmosDB/Azure-OpenAI-Node.js-Developer-Guide](https://github.com/AzureCosmosDB/Azure-OpenAI-Node.js-Developer-Guide)
111 changes: 77 additions & 34 deletions diskann/01_Azure_Overview/README.md

Large diffs are not rendered by default.

31 changes: 22 additions & 9 deletions diskann/02_Overview_Cosmos_DB/README.md
@@ -1,25 +1,38 @@
# Overview of Azure Cosmos DB

[Azure Cosmos DB](https://learn.microsoft.com/azure/cosmos-db/introduction) is a globally distributed database for storing and querying both NoSQL and vector data, with a serverless option. It has multiple APIs, the most notable being the native NoSQL document API and MongoDB API. It provides turnkey global distribution, elastic and dynamic scaling of throughput and storage, and a comprehensive SLA (service level agreement) for single-digit millisecond latency and 99.999% high-availability.
[Azure Cosmos DB](https://learn.microsoft.com/azure/cosmos-db/introduction) is a globally distributed, multi-model database service for both NoSQL and relational workloads.
It supports multiple APIs: NoSQL, MongoDB, PostgreSQL, Cassandra, Gremlin, and Table—covering document, relational, column-family, graph, and key-value data models.
The service offers turnkey global distribution with elastic scaling of throughput and storage.
It delivers single-digit millisecond latencies at the 99th percentile and guarantees high availability through multi-homing capabilities.
Azure Cosmos DB provides comprehensive service level agreements (SLAs) covering throughput, latency, availability, and consistency—a unique combination among cloud database services.

## Azure Cosmos DB and AI

The surge of AI-powered applications has led to the need to integrate operational data from multiple data stores, introducing another layer of complexity as each data store tends to have its own workflow and operational performance. Azure Cosmos DB simplifies this process by providing a unified platform for all data types, including AI data. In particular, its support for vector storage and retrieval is a game-changer for generative AI applications. By representing complex data elements like text, images, or sound as high-dimensional vectors, Azure Cosmos DB allows for efficient storage, indexing, and querying of these vectors, which is crucial for many generative AI tasks.

Unlike traditional databases requiring separate workarounds for different data types, Azure Cosmos DB supports multiple data models within a single, integrated environment. This simplification means you can leverage the same robust platform for all your AI data needs. Many AI applications rely on external stand-alone vector stores, which can be cumbersome to manage and maintain. Azure Cosmos DB's native support for vector storage and retrieval eliminates the need for these external stores as all the application's data is located in a single place thus streamlining the development and deployment of AI applications. These features enable the building, deploying, and scaling of AI applications to be more efficient and reliable, making Azure Cosmos DB an ideal choice for handling the complex data requirements of modern generative AI solutions.
The surge of AI-powered applications has led to the need to integrate data from multiple data stores, introducing another layer of complexity as each data store tends to have its own workflow and operational performance.
Azure Cosmos DB simplifies this process by providing a unified platform for all data types, including AI data.
Azure Cosmos DB supports relational, document, vector, key-value, graph, and table data models, making it an ideal platform for AI applications.
This breadth of data model support, combined with guaranteed high availability, high throughput, low latency, and tunable consistency, is a significant advantage when building these types of applications.
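
As a rough illustration of what vector retrieval means, the sketch below ranks a few documents by cosine similarity to a query vector. It is a plain-Python exhaustive scan with made-up three-dimensional vectors and hypothetical document IDs; it is not how Azure Cosmos DB implements vector search. DiskANN is an approximate nearest neighbor index whose purpose is to avoid comparing the query against every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real embeddings have hundreds or thousands of dimensions.
documents = {
    "doc-bikes": [0.9, 0.1, 0.0],
    "doc-helmets": [0.7, 0.3, 0.1],
    "doc-recipes": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, documents, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In a chat application, the query vector would come from embedding the user's question, and the top-ranked documents would be passed to the model as context.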

## Azure Cosmos DB for NoSQL

The focus for this developer guide is [Azure Cosmos DB for NoSQL](https://learn.microsoft.com/azure/cosmos-db/nosql/) and [Vector Search](https://learn.microsoft.com/azure/cosmos-db/nosql/vector-search).

### Azure Cosmos DB for NoSQL capacity modes

Azure Cosmos DB offers three capacity modes: provisioned throughput, serverless, and autoscale.
When creating an Azure Cosmos DB account, it's essential to evaluate the workload's characteristics in order to choose the appropriate mode to optimize both performance and cost efficiency.

[**Serverless mode**](https://learn.microsoft.com/azure/cosmos-db/serverless) offers a more flexible and pay-as-you-go approach, where only the Request Units consumed are billed.
This is particularly advantageous for applications with sporadic or unpredictable usage patterns, as it eliminates the need to provision resources upfront.
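
To make consumption-based billing concrete, here is a minimal sketch that totals the Request Units consumed by a handful of operations and converts them to a cost. The per-operation charges and the price are placeholders chosen for illustration, not real Azure figures; in practice the service reports a request charge with each response.

```python
def serverless_cost(request_charges, price_per_million_rus):
    """Return (total RUs consumed, cost) for a list of per-request RU charges.

    price_per_million_rus is a hypothetical placeholder, not a published price.
    """
    total_rus = sum(request_charges)
    cost = total_rus / 1_000_000 * price_per_million_rus
    return total_rus, cost


# Hypothetical charges for four operations (for example, two reads, a write, a query).
charges = [1.0, 1.0, 5.5, 2.8]
total_rus, cost = serverless_cost(charges, price_per_million_rus=1.0)
print(f"Consumed {total_rus:.1f} RUs, cost {cost:.8f}")
```

With no traffic the list is empty and the request cost is zero, which is why this mode suits sporadic workloads.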

[**Provisioned throughput mode**](https://learn.microsoft.com/azure/cosmos-db/set-throughput) allocates a fixed amount of resources, measured in [Request Units per second (RU/s)](https://learn.microsoft.com/azure/cosmos-db/request-units), which is ideal for applications with predictable and steady workloads.
This ensures consistent performance and can be more cost-effective when there is a constant or high demand for database operations.
RU/s can be set at both the database and container levels, allowing for fine-grained control over resource allocation.
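
The following sketch models a fixed RU/s budget over a single second: requests are served while budget remains, and requests that do not fit are set aside as rate-limited (the service signals this with HTTP 429). This is a deliberately simplified model with hypothetical charges, not a description of the service's actual admission logic.

```python
def split_by_budget(provisioned_rus_per_second, request_charges):
    """Split one second of requests into (served, rate_limited) lists.

    Simplified model: a request is served only if its full RU charge
    fits in the budget remaining for that second.
    """
    remaining = provisioned_rus_per_second
    served, rate_limited = [], []
    for charge in request_charges:
        if charge <= remaining:
            served.append(charge)
            remaining -= charge
        else:
            rate_limited.append(charge)
    return served, rate_limited


# Hypothetical workload: four requests arriving within the same second.
served, rate_limited = split_by_budget(400, [150, 150, 150, 50])
print("served:", served)
print("rate-limited:", rate_limited)
```

A steady workload that stays within the budget never reaches the rate-limited branch, which is the case provisioned throughput is designed for.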

[**Autoscale mode**](https://learn.microsoft.com/azure/cosmos-db/provision-throughput-autoscale) builds upon the provisioned throughput mode but allows the database or container to scale resources up or down automatically and instantly based on demand, ensuring that the application can handle varying workloads efficiently.
When configuring autoscale, you set a maximum throughput value (Tmax), which caps the throughput and therefore makes the maximum cost predictable.
This mode is suitable for applications with fluctuating usage patterns and for applications that are used infrequently.
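
The effect of Tmax can be sketched as follows, assuming the behavior described in the linked autoscale documentation: throughput scales between 10% of Tmax and Tmax, and each hour is billed at the highest throughput reached in that hour. The function names are hypothetical and the model ignores pricing details.

```python
def autoscale_range(tmax):
    """Return (minimum, maximum) RU/s for an autoscale Tmax setting.

    Assumes the minimum is 10% of Tmax and that Tmax is a multiple of 1000.
    """
    return tmax // 10, tmax


def billed_rus_for_hour(tmax, peak_rus_needed):
    """RU/s billed for an hour, given the peak throughput the workload needed."""
    low, high = autoscale_range(tmax)
    return max(low, min(peak_rus_needed, high))


print(autoscale_range(4000))            # idle floor and ceiling
print(billed_rus_for_hour(4000, 50))    # near-idle hour is billed at the floor
print(billed_rus_for_hour(4000, 2500))  # billed at the peak actually reached
print(billed_rus_for_hour(4000, 9000))  # demand above Tmax is capped at Tmax
```

The floor explains why an always-idle application can still be cheaper in serverless mode, while the cap explains why the maximum cost is known in advance.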

[**Dynamic scaling**](https://learn.microsoft.com/azure/cosmos-db/autoscale-per-partition-region) allows for the automatic and independent scaling of non-uniform workloads across regions and partitions according to usage patterns.
For instance, in a disaster recovery configuration with two regions, the primary region may experience high traffic while the secondary region can scale down to idle, thereby saving costs.
This approach is also highly effective for multi-regional applications, where traffic patterns fluctuate based on the time of day in each region.