-
Notifications
You must be signed in to change notification settings - Fork 158
fix: added retry logic for SQL DB connection, updated agent and thread management #625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
feat: Psl thread management
…with exponential backoff (#623)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances system reliability by implementing retry mechanisms for database connections and API calls, while also improving resource uniqueness and optimizing containerized deployment configuration.
- Added exponential backoff retry logic for SQL database connections with up to 5 attempts
- Implemented retry mechanism for frontend API calls with timeout and error handling
- Enhanced resource uniqueness by including resource group name in identifier calculation
- Reduced Docker container workers from 4 to 1 for optimized resource usage
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/App/backend/services/sqldb_service.py | Added retry logic with exponential backoff for database connections |
| src/App/frontend/src/api/api.ts | Implemented retry mechanism and timeout for user API calls |
| src/App/WebApp.Dockerfile | Reduced uvicorn workers from 4 to 1 |
| infra/main.bicep | Enhanced uniqueId calculation to include resource group name |
| try: | ||
| conn = pyodbc.connect( | ||
| f"DRIVER={driver};SERVER={server};DATABASE={database};UID={username};PWD={password}", | ||
| timeout=5, | ||
| ) | ||
| logging.info("Connected using Username & Password") | ||
| return conn | ||
| except pyodbc.Error as e: | ||
| logging.error(f"Failed with Username & Password: {str(e)}") | ||
|
|
||
| if attempt < max_retries - 1: | ||
| logging.info(f"Retrying in {retry_delay} seconds...") | ||
| time.sleep(retry_delay) | ||
| retry_delay *= 2 # Exponential backoff | ||
| else: | ||
| raise e | ||
|
|
||
|
|
Copilot
AI
Jul 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested try-catch blocks create complex control flow that's difficult to follow. Consider extracting the password-based connection attempt into a separate function or restructuring the retry logic to handle both authentication methods in a cleaner way.
| try: | |
| conn = pyodbc.connect( | |
| f"DRIVER={driver};SERVER={server};DATABASE={database};UID={username};PWD={password}", | |
| timeout=5, | |
| ) | |
| logging.info("Connected using Username & Password") | |
| return conn | |
| except pyodbc.Error as e: | |
| logging.error(f"Failed with Username & Password: {str(e)}") | |
| if attempt < max_retries - 1: | |
| logging.info(f"Retrying in {retry_delay} seconds...") | |
| time.sleep(retry_delay) | |
| retry_delay *= 2 # Exponential backoff | |
| else: | |
| raise e | |
| try: | |
| conn = connect_with_password() | |
| logging.info("Connected using Username & Password") | |
| return conn | |
| except pyodbc.Error as e: | |
| logging.error(f"Failed with Username & Password: {str(e)}") | |
| if attempt < max_retries - 1: | |
| logging.info(f"Retrying in {retry_delay} seconds...") | |
| time.sleep(retry_delay) | |
| retry_delay *= 2 # Exponential backoff | |
| else: | |
| raise e |
| except pyodbc.Error as e: | ||
| logging.error(f"Failed with Username & Password: {str(e)}") | ||
|
|
||
| if attempt < max_retries - 1: | ||
| logging.info(f"Retrying in {retry_delay} seconds...") | ||
| time.sleep(retry_delay) | ||
| retry_delay *= 2 # Exponential backoff | ||
| else: | ||
| raise e |
Copilot
AI
Jul 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception being raised here (e) is from the inner try-catch block (Username & Password authentication failure), but if the Default Credential also failed, that error information is lost. Consider raising a more comprehensive error that includes both failure modes or the original Default Credential error.
| except pyodbc.Error as e: | |
| logging.error(f"Failed with Username & Password: {str(e)}") | |
| if attempt < max_retries - 1: | |
| logging.info(f"Retrying in {retry_delay} seconds...") | |
| time.sleep(retry_delay) | |
| retry_delay *= 2 # Exponential backoff | |
| else: | |
| raise e | |
| except pyodbc.Error as username_password_error: | |
| logging.error(f"Failed with Username & Password: {str(username_password_error)}") | |
| if attempt < max_retries - 1: | |
| logging.info(f"Retrying in {retry_delay} seconds...") | |
| time.sleep(retry_delay) | |
| retry_delay *= 2 # Exponential backoff | |
| else: | |
| raise RuntimeError( | |
| f"Connection failed after multiple attempts. " | |
| f"Default Credential error: {str(default_credential_error)}; " | |
| f"Username & Password error: {str(username_password_error)}" | |
| ) |
| if (attempt < maxRetries && | ||
| error instanceof Error) { |
Copilot
AI
Jul 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry logic only retries on Error instances, but AbortSignal.timeout errors may not always be Error instances. Consider checking for specific error types (like AbortError for timeouts) or removing the instanceof Error check to retry on all errors.
| if (attempt < maxRetries && | |
| error instanceof Error) { | |
| if (attempt < maxRetries) { |
* working agent management * fix pylint * fix test cases * fix pylint --------- Co-authored-by: Shreyas-Microsoft <v-swaikar@microsft.com>
fix: Refactor Azure Authentication and Update Infra Config
|
🎉 This PR is included in version 1.8.0 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
Purpose
This pull request introduces several changes across different parts of the codebase, focusing on improving retry mechanisms, enhancing resource uniqueness, and optimizing configurations. Below are the most important changes grouped by theme:
Retry Mechanisms and Error Handling:
get_connectionfunction insrc/App/backend/services/sqldb_service.py, allowing up to 5 retries with increasing delays when connecting to the database fails. [1] [2]getUsersfunction insrc/App/frontend/src/api/api.tsto handle transient errors during API calls.Resource Uniqueness:
uniqueIdvariable ininfra/main.bicepto includeresourceGroup().namein its calculation, ensuring more unique resource identifiers.Configuration Changes:
CMDinstruction ofsrc/App/WebApp.Dockerfilefrom 4 to 1, likely to optimize resource usage in the containerized environment.Codebase Enhancements:
sleeputility function tosrc/App/frontend/src/api/api.tsfor implementing delays in retry logic.timemodule insrc/App/backend/services/sqldb_service.pyto support the retry mechanism with exponential backoff.Does this introduce a breaking change?
Golden Path Validation
Deployment Validation
What to Check
Verify that the following are valid
Other Information