Skip to content

Conversation

@SimonChou12138
Copy link

@SimonChou12138 SimonChou12138 commented Jun 23, 2025

Issue

Closes #257

Change

My modifications mainly include adding the langchain4j-oceanbase module, implementing the EmbeddingStore interface, and adding test cases and documentation

General checklist

  • There are no breaking changes
  • I have added unit and/or integration tests for my change
  • The tests cover both positive and negative cases
  • I have manually run all the unit and integration tests in the module I have added/changed, and they are all green
  • I have manually run all the unit and integration tests in the core and main modules, and they are all green

Checklist for adding new maven module

  • I have added my new module in the root pom.xml and langchain4j-bom/pom.xml

Checklist for adding new embedding store integration

  • I have added a {NameOfIntegration}EmbeddingStoreIT that extends from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT
  • I have added a {NameOfIntegration}EmbeddingStoreRemovalIT that extends from EmbeddingStoreWithRemovalIT

Checklist for changing existing embedding store integration

  • I have manually verified that the {NameOfIntegration}EmbeddingStore works correctly with the data persisted using the latest released version of LangChain4j

@SimonChou12138
Copy link
Author

SimonChou12138 commented Jun 23, 2025

@dliubarskyi @Martin7-1 Hello guy, sorry for the long wait. I have migrated the vector storage integration of Oceanbase over. Please help review it

@Martin7-1 Martin7-1 added enhancement New feature or request P3 Medium priority theme: embedding store Issues/PRs related to embedding store labels Jun 24, 2025
@Martin7-1
Copy link
Collaborator

@SimonChou12138 Thank you for your contribution. Please run make format to pass the spotless check.

@lzx1026078111
Copy link
Contributor

It's able to add unit test for SQLFilter?

Copy link
Collaborator

@Martin7-1 Martin7-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SimonChou12138 Thank you! Just a quick review :)

<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-core</artifactId>
<version>1.1.0-SNAPSHOT</version>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<version>1.1.0-SNAPSHOT</version>
<version>${langchain4j.core.version}</version>

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it~


<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It would be better to use tinylog for test to keep consistency.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it~

statement.addBatch("DROP TABLE IF EXISTS " + name);
}

StringBuilder createTableSql = new StringBuilder();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It would be better to use [text block] and format it by String.format(). like this:

            statement.addBatch(String.format("""
                            CREATE TABLE IF NOT EXISTS %s (
                                %s VARCHAR(36) NOT NULL,
                                %s VECTOR(%d),
                                %s VARCHAR(4000),
                                %s JSON,
                                PRIMARY KEY (%s),
                                VECTOR INDEX %s(%s) WITH (
                                    distance=%s,
                                    type=%s
                                )
                            )
                            """,
                    name,
                    idColumn,
                    embeddingColumn, vectorDimension,
                    textColumn,
                    metadataColumn,
                    idColumn,
                    vectorIndexName, embeddingColumn,
                    distanceMetric, indexType
            ));

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okkk

private String metadataColumn = "metadata";
private int vectorDimension = 1536;
private String vectorIndexName = "idx_vector";
private String distanceMetric = "L2";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, all scores in EmbeddingStore represent cosine similarity based score. I'm not sure if it works in L2 distance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I can use cosine as the default value

* <code>true</code> if search should use an exact search, or <code>false</code> if
* it should use approximate search.
*/
private final boolean isExactSearch;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it can be a local-variable in ctor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep it as a member variable for clarity and consistency with other config fields like dataSource and table.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

* @param sqlException SQLException to transform. Not null.
* @return A RuntimeException that wraps the SQLException.
*/
private static RuntimeException uncheckSQLException(SQLException sqlException) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A best practice is to define a new Exception and extend LangChain4jException.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it~

* @return The item at the given index.
* @throws IllegalArgumentException If the item is null.
*/
private static <T> T ensureIndexNotNull(List<T> list, int index, String name) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great utility method! I think we can move it to a seperate utility class. WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okkk,I will move it~

return item;
}

private static String ensureNotEmpty(String value, String name) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

}

@Override
public List<String> generateIds(final int n) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's the same as parent's method. Is it any other intention to override it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahaha, it was just a misunderstanding. I didn't pay attention and implemented too much

ensureNotNull(embeddings, "embeddings");

if (embeddings.isEmpty()) {
return Collections.emptyList();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Collections.emptyList();
return List.of();

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okk~

@SimonChou12138
Copy link
Author

It's able to add unit test for SQLFilter?

Yup

@lzx1026078111
Copy link
Contributor

Maybe we can add a new spring-boot-starter for oceanbase.
Or open a new issue later.

@SimonChou12138 SimonChou12138 requested a review from Martin7-1 June 25, 2025 15:45
@SimonChou12138
Copy link
Author

Maybe we can add a new spring-boot-starter for oceanbase. Or open a new issue later.

I think we can bring up a new PR implementation when we have time later

@Martin7-1
Copy link
Collaborator

Martin7-1 commented Jun 27, 2025

Looks like there are some ITs failed.

Also, please fix the issue of license-maven-plugin like langchain4j-community-oci-genai.

@SimonChou12138
Copy link
Author

Looks like there are some ITs failed.

Also, please fix the issue of license-maven-plugin like langchain4j-community-oci-genai.

@Martin7-1 ok, I will check the skipCompliance plugin, and the other itcast failure seems to be due to testContainer startup failure, which may be caused by network issues. Just trigger ci again and it will be fine

@Martin7-1
Copy link
Collaborator

Martin7-1 commented Jun 30, 2025

OceanBaseEmbeddingStoreIT should extend from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT. Also, please implement EmbeddingStoreWithRemovalIT for removal functionality. You can check langchain4j-community-clickhouse for reference.

@Martin7-1
Copy link
Collaborator

@SimonChou12138 Any updates?

@SimonChou12138
Copy link
Author

OceanBaseEmbeddingStoreIT should extend from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT. Also, please implement EmbeddingStoreWithRemovalIT for removal functionality. You can check langchain4j-community-clickhouse for reference.

I'm sorry I've been a bit busy recently. I'll implement this when I have time

@SimonChou12138
Copy link
Author

SimonChou12138 commented Sep 8, 2025

@SimonChou12138 Any updates?

@Martin7-1
I'm sorry, I was very busy a while ago and came late. When I tried to adjust the test cases to extend EmbeddingStoreWithFiltering IT, I found a precision issue that I couldn't solve. The test cases had to have a precision of 9 bits, but I found that the precision was lost when the ocean case was stored, and only 7 bits were left after reading it.

Can you help me take a look together?
image

Additionally, I have found that the similarity cannot be achieved
image

It means that my deletion of extends EmbeddingStoreWithRemovalIT is normal
image

@Martin7-1
Copy link
Collaborator

The test cases had to have a precision of 9 bits, but I found that the precision was lost when the ocean case was stored, and only 7 bits were left after reading it.

I'm not clear on what data types OceanBase supports... But I think as a database there should be a feature that allows you to set how many decimal places to keep (like MySQL)

If not, you can disable this assert by overriding assertEmbedding() and return false.

Additionally, I have found that the similarity cannot be achieved

Please check your similarity calculation, all score in EmbeddingMatch is based on the relevance score (derivative of cosine distance), you can check RelevanceScore and CosineSimilarity for reference.

@SimonChou12138
Copy link
Author

okk, let me see see

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request P3 Medium priority theme: embedding store Issues/PRs related to embedding store

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support OceanBase Embedding Store

3 participants