feat: add Azure embedder support and update configuration handling #431
base: main
Summary of Changes

Hello @gustavoreimche, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request expands the system's integration capabilities by adding full support for Azure AI as an embedding provider. It also makes the Google Generative AI API key configuration conditional, so the key is configured only when Google is actually in use. Finally, it raises the recommended token limit for requests (the threshold above which a warning is logged) from 8000 to 9000.
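The conditional Google configuration described above can be sketched as follows. This is a minimal illustration, not the PR's actual `api/main.py` code. The function name `maybe_configure_google`, the `GOOGLE_API_KEY` variable name, and passing the provider name and a `configure` callable in as arguments are all assumptions, chosen so the sketch runs without the Google SDK installed.

```python
import os
from typing import Callable, Mapping


def maybe_configure_google(
    default_provider: str,
    configure: Callable[..., None],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Hypothetical helper: configure Google only when it is the default provider.

    Returns True if `configure` was called, False otherwise.
    """
    # Skip Google setup entirely when another provider (e.g. "azure") is the default.
    if default_provider != "google":
        return False
    # Assumed environment variable name; adjust to the project's real settings.
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        return False
    configure(api_key=api_key)
    return True
```

In the application, `configure` might be `google.generativeai.configure`, which accepts an `api_key` keyword argument. Injecting it keeps the provider check testable in isolation.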
Code Review
This pull request introduces support for an Azure embedder by adding its configuration to embedder.json, integrating it into the embedder loading and type detection logic in api/config.py, and enabling its use in api/tools/embedder.py. The api/main.py file was refactored to configure the Google API key only if Google is the default provider, removing a previous general environment variable check. Additionally, the recommended token limit for requests in api/websocket_wiki.py was increased from 8000 to 9000.

The review comments suggest:
- refactoring is_azure_embedder in api/config.py to use a shared helper, reducing code duplication;
- replacing the if/elif chain in get_embedder_type with a dictionary for better scalability;
- fixing a misleading comment in api/main.py about Azure environment variables;
- defining the new token limit of 9000 in api/websocket_wiki.py as a module-level constant to avoid a magic number.
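One way to apply the dictionary suggestion for `get_embedder_type` is sketched below. The real function is not shown in this review, so several details are assumptions: the return values (`"ollama"`, `"google"`, `"bedrock"`, `"azure"`), the `"openai"` fallback, every client class name except `AzureAIClient`, and passing the config in as an argument (the real code presumably reads it via `get_embedder_config()`).

```python
from typing import Any, Mapping, Optional

# Hypothetical mapping from client class name to embedder type.
# Only "AzureAIClient" appears in this PR; the other names are assumptions.
_EMBEDDER_TYPES = {
    "OllamaClient": "ollama",
    "GoogleEmbedderClient": "google",
    "BedrockClient": "bedrock",
    "AzureAIClient": "azure",
}
_DEFAULT_EMBEDDER_TYPE = "openai"  # assumed fallback


def get_embedder_type(embedder_config: Optional[Mapping[str, Any]]) -> str:
    """Resolve the embedder type with a dict lookup instead of an if/elif chain."""
    if not embedder_config:
        return _DEFAULT_EMBEDDER_TYPE
    model_client = embedder_config.get("model_client")
    # Prefer the resolved class object's name; otherwise use the string name.
    name = model_client.__name__ if model_client else embedder_config.get("client_class", "")
    return _EMBEDDER_TYPES.get(name, _DEFAULT_EMBEDDER_TYPE)
```

With this shape, supporting another provider means adding one dictionary entry rather than another `elif` branch.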
```python
def is_azure_embedder():
    """
    Check if the current embedder configuration uses AzureAIClient.

    Returns:
        bool: True if using AzureAIClient, False otherwise
    """
    embedder_config = get_embedder_config()
    if not embedder_config:
        return False

    model_client = embedder_config.get("model_client")
    if model_client:
        return model_client.__name__ == "AzureAIClient"

    client_class = embedder_config.get("client_class", "")
    return client_class == "AzureAIClient"
```
This new function is_azure_embedder duplicates the logic found in is_ollama_embedder, is_google_embedder, and is_bedrock_embedder. To improve maintainability and reduce code duplication, consider creating a private helper function that takes the client name as an argument. This would centralize the checking logic.
For example, you could have a helper:

```python
def _is_embedder_of_type(client_name: str) -> bool:
    embedder_config = get_embedder_config()
    if not embedder_config:
        return False
    model_client = embedder_config.get("model_client")
    if model_client:
        return model_client.__name__ == client_name
    client_class = embedder_config.get("client_class", "")
    return client_class == client_name
```

Then this function could be simplified to `return _is_embedder_of_type("AzureAIClient")`.
```python
if tokens > 9000:
    logger.warning(f"Request exceeds recommended token limit ({tokens} > 9000)")
```
The token limit 9000 is used as a magic number in both the condition and the log message. To improve maintainability and avoid potential inconsistencies (like the one fixed in this change), consider defining this value as a constant at the module level (e.g., REQUEST_TOKEN_LIMIT = 9000) and referencing it in both places.
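Here is a minimal sketch of that suggestion, using the constant name proposed in the comment. The wrapper function `warn_if_over_token_limit` is hypothetical. In `api/websocket_wiki.py`, the check would more likely stay inline where `tokens` is computed.

```python
import logging

logger = logging.getLogger(__name__)

# Single source of truth for the recommended request size.
REQUEST_TOKEN_LIMIT = 9000


def warn_if_over_token_limit(tokens: int) -> bool:
    """Log a warning when a request exceeds the recommended limit.

    Returns True if the warning was emitted. The request itself is not rejected.
    """
    if tokens > REQUEST_TOKEN_LIMIT:
        # The limit is referenced once, so the check and the message cannot drift apart.
        logger.warning(
            f"Request exceeds recommended token limit ({tokens} > {REQUEST_TOKEN_LIMIT})"
        )
        return True
    return False
```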
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
No description provided.