- 
                Notifications
    You must be signed in to change notification settings 
- Fork 65
[FEATURE] Support OceanBase Embedding Store #258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| @dliubarskyi @Martin7-1 Hello guy, sorry for the long wait. I have migrated the vector storage integration of Oceanbase over. Please help review it | 
| @SimonChou12138 Thank you for your contribution. Please run  | 
| It's able to add unit test for SQLFilter? | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SimonChou12138 Thank you! Just a quick review :)
| <dependency> | ||
| <groupId>dev.langchain4j</groupId> | ||
| <artifactId>langchain4j-core</artifactId> | ||
| <version>1.1.0-SNAPSHOT</version> | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| <version>1.1.0-SNAPSHOT</version> | |
| <version>${langchain4j.core.version}</version> | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it~
|  | ||
| <dependency> | ||
| <groupId>ch.qos.logback</groupId> | ||
| <artifactId>logback-classic</artifactId> | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It would be better to use tinylog for test to keep consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it~
| statement.addBatch("DROP TABLE IF EXISTS " + name); | ||
| } | ||
|  | ||
| StringBuilder createTableSql = new StringBuilder(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It would be better to use [text block] and format it by String.format(). like this:
            statement.addBatch(String.format("""
                            CREATE TABLE IF NOT EXISTS %s (
                                %s VARCHAR(36) NOT NULL,
                                %s VECTOR(%d),
                                %s VARCHAR(4000),
                                %s JSON,
                                PRIMARY KEY (%s),
                                VECTOR INDEX %s(%s) WITH (
                                    distance=%s,
                                    type=%s
                                )
                            )
                            """,
                    name,
                    idColumn,
                    embeddingColumn, vectorDimension,
                    textColumn,
                    metadataColumn,
                    idColumn,
                    vectorIndexName, embeddingColumn,
                    distanceMetric, indexType
            ));There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okkk
| private String metadataColumn = "metadata"; | ||
| private int vectorDimension = 1536; | ||
| private String vectorIndexName = "idx_vector"; | ||
| private String distanceMetric = "L2"; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, all scores in EmbeddingStore represent cosine similarity based score. I'm not sure if it works in L2 distance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or I can use cosine as the default value
| * <code>true</code> if search should use an exact search, or <code>false</code> if | ||
| * it should use approximate search. | ||
| */ | ||
| private final boolean isExactSearch; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it can be a local-variable in ctor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep it as a member variable for clarity and consistency with other config fields like dataSource and table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
| * @param sqlException SQLException to transform. Not null. | ||
| * @return A RuntimeException that wraps the SQLException. | ||
| */ | ||
| private static RuntimeException uncheckSQLException(SQLException sqlException) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A best practice is to define a new Exception and extend LangChain4jException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it~
| * @return The item at the given index. | ||
| * @throws IllegalArgumentException If the item is null. | ||
| */ | ||
| private static <T> T ensureIndexNotNull(List<T> list, int index, String name) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great utility method! I think we can move it to a seperate utility class. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okkk,I will move it~
| return item; | ||
| } | ||
|  | ||
| private static String ensureNotEmpty(String value, String name) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
| } | ||
|  | ||
| @Override | ||
| public List<String> generateIds(final int n) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's the same as parent's method. Is it any other intention to override it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hahaha, it was just a misunderstanding. I didn't pay attention and implemented too much
| ensureNotNull(embeddings, "embeddings"); | ||
|  | ||
| if (embeddings.isEmpty()) { | ||
| return Collections.emptyList(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return Collections.emptyList(); | |
| return List.of(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okk~
| 
 Yup | 
| Maybe we can add a new spring-boot-starter for oceanbase. | 
| 
 I think we can bring up a new PR implementation when we have time later | 
| Looks like there are some ITs failed. Also, please fix the issue of  | 
| 
 @Martin7-1 ok, I will check the skipCompliance plugin, and the other itcast failure seems to be due to testContainer startup failure, which may be caused by network issues. Just trigger ci again and it will be fine | 
| 
 | 
| @SimonChou12138 Any updates? | 
| 
 I'm sorry I've been a bit busy recently. I'll implement this when I have time | 
| 
 @Martin7-1 Can you help me take a look together? Additionally, I have found that the similarity cannot be achieved It means that my deletion of extends EmbeddingStoreWithRemovalIT is normal | 
| 
 I'm not clear on what data types OceanBase supports... But I think as a database there should be a feature that allows you to set how many decimal places to keep (like MySQL) If not, you can disable this assert by overriding  
 Please check your similarity calculation, all score in  | 
| okk, let me see see | 



Issue
Closes #257
Change
My modifications mainly include adding the langchain4j-oceanbase module, implementing the EmbeddingStore interface, and adding test cases and documentation
General checklist
Checklist for adding new maven module
pom.xmlandlangchain4j-bom/pom.xmlChecklist for adding new embedding store integration
{NameOfIntegration}EmbeddingStoreITthat extends from eitherEmbeddingStoreITorEmbeddingStoreWithFilteringIT{NameOfIntegration}EmbeddingStoreRemovalITthat extends fromEmbeddingStoreWithRemovalITChecklist for changing existing embedding store integration
{NameOfIntegration}EmbeddingStoreworks correctly with the data persisted using the latest released version of LangChain4j