diff --git a/api-reference/workflow/destinations/ibm-watsonxdata.mdx b/api-reference/workflow/destinations/ibm-watsonxdata.mdx
index aa9cf952..c9905fc2 100644
--- a/api-reference/workflow/destinations/ibm-watsonxdata.mdx
+++ b/api-reference/workflow/destinations/ibm-watsonxdata.mdx
@@ -2,6 +2,17 @@
title: IBM watsonx.data
---
+
+ The IBM watsonx.data destination connector relies on an Apache Iceberg-based catalog within the watsonx.data data store instance.
+ Apache Iceberg is suitable for managed data storage and cataloging, but not for embedding storage or semantic similarity
+ queries. For embedding storage and semantic similarity queries, Unstructured recommends that you use the following destination connectors
+ instead:
+
+ - [Astra DB](/api-reference/workflow/destinations/astradb)
+ - [Milvus](/api-reference/workflow/destinations/milvus) on IBM watsonx.data
+
+
+
import FirstTimeAPIDestinationConnector from '/snippets/general-shared-text/first-time-api-destination-connector.mdx';
diff --git a/open-source/ingestion/destination-connectors/ibm-watsonxdata.mdx b/open-source/ingestion/destination-connectors/ibm-watsonxdata.mdx
index 2135acf6..9b97c33e 100644
--- a/open-source/ingestion/destination-connectors/ibm-watsonxdata.mdx
+++ b/open-source/ingestion/destination-connectors/ibm-watsonxdata.mdx
@@ -2,6 +2,17 @@
title: IBM watsonx.data
---
+
+ The IBM watsonx.data destination connector relies on an Apache Iceberg-based catalog within the watsonx.data data store instance.
+ Apache Iceberg is suitable for managed data storage and cataloging, but not for embedding storage or semantic similarity
+ queries. For embedding storage and semantic similarity queries, Unstructured recommends that you use the following destination connectors
+ instead:
+
+ - [Astra DB](/open-source/ingestion/destination-connectors/astradb)
+ - [Milvus](/open-source/ingestion/destination-connectors/milvus) on IBM watsonx.data
+
+
+
import SharedIBMWatsonxdata from '/snippets/dc-shared-text/ibm-watsonxdata-cli-api.mdx';
diff --git a/snippets/general-shared-text/astradb-api-placeholders.mdx b/snippets/general-shared-text/astradb-api-placeholders.mdx
index 0368ae40..45697f5e 100644
--- a/snippets/general-shared-text/astradb-api-placeholders.mdx
+++ b/snippets/general-shared-text/astradb-api-placeholders.mdx
@@ -1,7 +1,7 @@
- `` (_required_) - A unique name for this connector.
- `` (_required_) - The application token for the database.
-- `` (_required_) - The database’s associated API endpoint.
-- `` - The name of the collection in the namespace. If no value is provided, see the beginning of this article for the behavior at run time.
+- `` (_required_) - The database's associated API endpoint.
+- `` - The name of the collection in the keyspace. If no value is provided, see the beginning of this article for the behavior at run time.
- `` - The name of the keyspace in the collection. The default is `default_keyspace` if not otherwise specified.
- `` - The maximum number of records to send per batch. The default is `20` if not otherwise specified.
- `flatten_metadata` - Set to `true` to flatten the metadata into each record. Specifically, when flattened, the metadata key values are brought to the top level of the element, and the `metadata` key itself is removed. By default, the metadata is not flattened (`false`).
diff --git a/snippets/general-shared-text/astradb-platform.mdx b/snippets/general-shared-text/astradb-platform.mdx
index 3e4a79fd..8ff44f9b 100644
--- a/snippets/general-shared-text/astradb-platform.mdx
+++ b/snippets/general-shared-text/astradb-platform.mdx
@@ -1,7 +1,7 @@
Fill in the following fields:
- **Name** (_required_): A unique name for this connector.
-- **Collection Name**: The name of the collection in the namespace. If no value is provided, see the beginning of this article for the behavior at run time.
+- **Collection Name**: The name of the collection in the keyspace. If no value is provided, see the beginning of this article for the behavior at run time.
- **Keyspace** (_required_): The name of the keyspace in the collection.
- **Batch Size**: The maximum number of records per batch. The default is `20` if not otherwise specified.
- **Flatten Metadata**: Check this box to flatten the metadata into each record.
diff --git a/snippets/general-shared-text/astradb.mdx b/snippets/general-shared-text/astradb.mdx
index b2a8c1ec..91d4b3c8 100644
--- a/snippets/general-shared-text/astradb.mdx
+++ b/snippets/general-shared-text/astradb.mdx
@@ -8,26 +8,70 @@ allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; pic
allowfullscreen
>
-- An Astra account. [Create or sign in to an Astra account](https://astra.datastax.com/).
-- A database in the Astra account. [Create a database in an account](https://docs.datastax.com/en/astra-db-classic/databases/manage-create.html).
-- An application token for the database. [Create a database application token](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html).
-- A namespace in the database. [Create a namespace in a database](https://docs.datastax.com/en/astra-db-serverless/databases/manage-namespaces.html#create-namespace).
-- A collection in the namespace. [Create a collection in a namespace](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection).
+- An IBM Cloud account or DataStax account.
- An existing collection is not required. At runtime, the collection behavior is as follows:
+ - For an IBM Cloud account, [sign up](https://cloud.ibm.com/registration) for an IBMid, and then [sign in](https://accounts.datastax.com/session-service/v1/login) to DataStax with your IBMid.
+ - For a DataStax account, [sign up](https://astra.datastax.com/signup) for a DataStax account, and then [sign in](https://accounts.datastax.com/session-service/v1/login) to DataStax with your DataStax account.
+
+- An Astra DB database in the DataStax account. To create a database:
+
+ a. After you sign in to DataStax, click **Create database**.
+ b. Click the **Serverless (vector)** tile, if it is not already selected.
+ c. For **Database name**, enter a unique name for the database.
+ d. Select a **Provider** and a **Region**, and then click **Create database**.
+
+ [Learn more](https://docs.datastax.com/en/astra-db-classic/databases/manage-create.html).
+
+- An application token for the database. To create an application token:
+
+ a. After you sign in to DataStax, in the list of databases, click the name of the target database.
+ b. On the **Overview** tab, under **Database Details**, in the **Application Tokens** tile, click **Generate Token**.
+ c. Enter a **Token description** and select an **Expiration** time period, and then click **Generate token**.
+ d. Save the application token that is displayed to a secure location, and then click **Close**.
+
+ [Learn more](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html).
+
+- A keyspace in the database. To create a keyspace:
+
+ a. After you sign in to DataStax, in the list of databases, click the name of the target database.
+ b. On the **Data Explorer** tab, in the **Keyspace** list, select **Create keyspace**.
+ c. Enter a **Keyspace name**, and then click **Add keyspace**.
+
+ [Learn more](https://docs.datastax.com/en/astra-db-serverless/databases/manage-keyspaces.html#keyspaces).
+
+- A collection in the keyspace.
For the [Unstructured UI](/ui/overview) and [Unstructured API](/api-reference/overview):
- - If an existing collection name is specified, and Unstructured generates embeddings,
- but the number of dimensions that are generated does not match the existing collection's embedding settings, the run will fail.
- You must change your Unstructured embedding settings or your existing collection's embedding settings to match, and try the run again.
- - If a collection name is not specified, Unstructured creates a new collection in your namespace. If Unstructured generates embeddings,
- the new collections's name will be `u__`.
- If Unstructured does not generate embeddings, the new collections's name will be `u__`.
+ If Unstructured does not generate embeddings, the new collection's name will be `u
+ b. On the **Data Explorer** tab, in the **Keyspace** list, select the name of the target keyspace.
+ c. In the **Collections** list, select **Create collection**.
+ d. Enter a **Collection name**.
+ e. Turn on **Vector-enabled collection**, if it is not already turned on.
+ f. For **Embedding generation method**, select **Bring my own**.
+ g. For **Dimensions**, enter the number of dimensions for the embedding model that you plan to use.
+ h. For **Similarity metric**, select **Cosine**.
+ i. Click **Create collection**.
+
+ [Learn more](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection).
\ No newline at end of file
diff --git a/snippets/general-shared-text/milvus.mdx b/snippets/general-shared-text/milvus.mdx
index 52055b36..6db481cc 100644
--- a/snippets/general-shared-text/milvus.mdx
+++ b/snippets/general-shared-text/milvus.mdx
@@ -1,7 +1,61 @@
-- For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Milvus cloud-based instances (such as Zilliz Cloud, and Milvus on IBM watsonx.data) are supported.
+- For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), only Milvus cloud-based instances (such as Milvus on IBM watsonx.data, or Zilliz Cloud) are supported.
- For [Unstructured Ingest](/open-source/ingestion/overview), Milvus local and cloud-based instances are supported.
-The following video shows how to fulfill the minimum set of requirements for Milvus cloud-based instances, demonstrating Milvus on IBM watsonx.data:
+- For Milvus on IBM watsonx.data, you will need:
+
+
+
+ - An [IBM Cloud account](https://cloud.ibm.com/registration).
+ - An IBM watsonx.data [Lite plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-tutorial_prov_lite_1)
+ or [Enterprise plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started_1) within your IBM Cloud account.
+
+ - If you are provisioning a Lite plan, be sure to choose the **Generative AI** use case when prompted, as this is the only use case offered that includes Milvus.
+
+ - A [Milvus service instance in IBM watsonx.data](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-adding-milvus-service).
+
+ - If you are creating a Milvus service instance within a watsonx.data Lite plan, when you are prompted to choose a Milvus instance size, you can only select **Lite**. Because the Lite
+ Milvus instance size is recommended only for 384 dimensions, you should also use an embedding model that uses 384 dimensions only.
+ - If you are creating a Milvus service instance within a watsonx.data Enterprise plan, you can choose any available Milvus instance size. However, all Milvus instance sizes other than
+ **Custom** are recommended only for 384 dimensions, which means you should use an embedding model that uses 384 dimensions only.
+ The **Custom** Milvus instance size is recommended for any number of dimensions.
+
+ - The URI of the instance, which consists of `https://`, followed by the instance's **GRPC host**, followed by a colon and the **GRPC port**.
+ This takes the format of `https://:`. To get this information, do the following:
+
+ a. Sign in to your IBM Cloud account.
+ b. On the sidebar, click the **Resource list** icon. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.
+ c. Expand **Databases**, and then click the name of the target **watsonx.data** plan.
+ d. Click **Open web console**.
+ e. On the sidebar, click **Infrastructure manager**. If the sidebar is not visible, click the **Global navigation** icon to the far left of the title bar.
+ f. Click the target Milvus service instance.
+ g. On the **Details** tab, under **Type**, click **View connect details**.
+ h. Under **Service details**, expand **GRPC**, and note the value of **GRPC host** and **GRPC port**.
+
+ - The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
+ - The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database. Note the collection requirements at the end of this section.
+ - The username and password to access the instance.
+
+ - The username for Milvus on IBM watsonx.data is always `ibmlhapikey`.
+ - The password for Milvus on IBM watsonx.data is in the form of an IBM Cloud user API key. To create an IBM Cloud user API key:
+
+ a. Sign in to your IBM Cloud account.
+ b. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.
+ c. On the sidebar, under **Manage identities**, click **API keys**. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.
+ d. Click **Create**.
+ e. Enter a **Name** for the API key.
+ f. Optionally, enter a **Description** for the API key.
+ g. For **Leaked action**, leave **Disable the leaked key** selected.
+ h. For **Session management**, leave **No** selected.
+ i. Click **Create**.
+ j. Click **Download** (or **Copy**), and then download the API key to a secure location (or paste the copied API key into a secure location). You won't be able to access this API key from this dialog again. If you lose this API key, you can create a new one (and you should then delete the old one).
- For Zilliz Cloud, you will need:
@@ -54,31 +108,6 @@ The following video shows how to fulfill the minimum set of requirements for Mil
The number of dimensions for the `embeddings` field must match the number of dimensions for the embedding model that you plan to use.
-- For Milvus on IBM watsonx.data, you will need:
-
-
-
- - An [IBM Cloud account](https://cloud.ibm.com/registration).
- - The [IBM watsonx.data subscription plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started).
- - A [Milvus service instance in IBM watsonx.data](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-adding-milvus-service).
- - The URI of the instance, which takes the format of `https://`, followed by instance's **GRPC host**, followed by a colon and the **GRPC port**.
- This takes the format of `https://:`.
- [Get the instance's GRPC host and GRPC port](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-conn-to-milvus).
- - The name of the [database](https://milvus.io/docs/manage_databases.md) in the instance.
- - The name of the [collection](https://milvus.io/docs/manage-collections.md) in the database. Note the collection requirements at the end of this section.
- - The username and password to access the instance.
- The username for Milvus on IBM watsonx.data is always `ibmlhapikey`.
- The password for Milvus on IBM watsonx.data is in the form of an IBM Cloud user API key.
- [Get the user API key](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).
-
- For Milvus local, you will need:
- A [Milvus instance](https://milvus.io/docs/install-overview.md).
@@ -89,7 +118,9 @@ The following video shows how to fulfill the minimum set of requirements for Mil
- The [username and password, or token](https://milvus.io/docs/authenticate.md) to access the instance.
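
For Milvus on IBM watsonx.data, the connection values described above (the **GRPC host**, the **GRPC port**, the fixed username `ibmlhapikey`, and the IBM Cloud user API key) can be assembled as in the following minimal sketch. The function name `build_milvus_connection`, the example host `milvus.example.com`, and the example port are hypothetical and used only for illustration. The returned keys are assumed to match the `uri`, `user`, `password`, and `db_name` arguments of `MilvusClient` in the Python SDK for Milvus; check your installed version before relying on this.

```python
# Fixed username for Milvus on IBM watsonx.data, as stated in the requirements above.
WATSONX_MILVUS_USERNAME = "ibmlhapikey"


def build_milvus_connection(grpc_host: str, grpc_port: int, api_key: str, db_name: str) -> dict:
    """Assemble connection settings for Milvus on IBM watsonx.data.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    host = grpc_host.strip()
    # The GRPC host must be a bare host name: no scheme and no port.
    if not host or "://" in host or ":" in host:
        raise ValueError("grpc_host must be a bare host name, without a scheme or port")

    port = int(grpc_port)
    if not 1 <= port <= 65535:
        raise ValueError("grpc_port must be between 1 and 65535")

    if not api_key:
        raise ValueError("api_key (the IBM Cloud user API key) must not be empty")

    return {
        # https://, then the GRPC host, then a colon and the GRPC port.
        "uri": f"https://{host}:{port}",
        "user": WATSONX_MILVUS_USERNAME,
        # The password is the IBM Cloud user API key.
        "password": api_key,
        "db_name": db_name,
    }


if __name__ == "__main__":
    # Placeholder values; replace them with the details noted from the web console.
    settings = build_milvus_connection("milvus.example.com", 30439, "my-api-key", "default")
    print(settings["uri"])
```

If the assumption about the argument names holds, the resulting dictionary could be passed as `MilvusClient(**settings)`. Keeping the API key out of source code, for example by reading it from an environment variable, is advisable.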
All Milvus instances require the target collection to have a defined schema before Unstructured can write to the collection. The minimum viable
-schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows. This example code demonstrates the use of the
+schema for Unstructured contains only the fields `element_id`, `embeddings`, `record_id`, and `text`, as follows.
+
+This example code demonstrates the use of the
[Python SDK for Milvus](https://pypi.org/project/pymilvus/) to create a collection with this schema,
targeting Milvus on IBM watsonx.data. For the `MilvusClient` arguments to connect to other types of Milvus deployments, see your Milvus provider's documentation:
diff --git a/ui/destinations/ibm-watsonxdata.mdx b/ui/destinations/ibm-watsonxdata.mdx
index a2d128a1..667cb8f0 100644
--- a/ui/destinations/ibm-watsonxdata.mdx
+++ b/ui/destinations/ibm-watsonxdata.mdx
@@ -2,6 +2,17 @@
title: IBM watsonx.data
---
+
+ The IBM watsonx.data destination connector relies on an Apache Iceberg-based catalog within the watsonx.data data store instance.
+ Apache Iceberg is suitable for managed data storage and cataloging, but not for embedding storage or semantic similarity
+ queries. For embedding storage and semantic similarity queries, Unstructured recommends that you use the following destination connectors
+ instead:
+
+ - [Astra DB](/ui/destinations/astradb)
+ - [Milvus](/ui/destinations/milvus) on IBM watsonx.data
+
+
+
import FirstTimeUIDestinationConnector from '/snippets/general-shared-text/first-time-ui-destination-connector.mdx';