fix: added retry logic for SQL DB connection, updated agent and thread management #625

Harsh-Microsoft · 2025-07-29T07:48:48Z

Purpose

This pull request adds retry mechanisms, makes resource identifiers more unique, and adjusts the container configuration. The most important changes are grouped by theme below:

Retry Mechanisms and Error Handling:

Added a retry mechanism with exponential backoff to the get_connection function in src/App/backend/services/sqldb_service.py, allowing up to 5 retries with increasing delays when connecting to the database fails.
Implemented a retry mechanism with a single retry and a 5-second delay in the getUsers function in src/App/frontend/src/api/api.ts to handle transient errors during API calls.
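The backoff pattern described for get_connection can be sketched as follows. This is a minimal illustration rather than the code from the PR: the helper name `retry_with_backoff`, the 1-second initial delay, and the injectable `sleep` parameter are assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 5,          # total attempts, as in the diff's loop bound
    initial_delay: float = 1.0,    # assumed value; the PR text does not state it
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_retries` attempts are used."""
    delay = initial_delay
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            # On the final attempt, re-raise so the caller sees the real error.
            if attempt == max_retries - 1:
                raise
            sleep(delay)
            delay *= 2  # exponential backoff: 1, 2, 4, 8, ...
    # Only reachable when max_retries < 1, because the loop never ran.
    raise ValueError("max_retries must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in production the default `time.sleep` applies.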

Resource Uniqueness:

Updated the uniqueId variable in infra/main.bicep to include resourceGroup().name in its calculation, making resource identifiers less likely to collide.

Configuration Changes:

Reduced the number of workers in the CMD instruction of src/App/WebApp.Dockerfile from 4 to 1, likely to optimize resource usage in the containerized environment.

Codebase Enhancements:

Added a sleep utility function to src/App/frontend/src/api/api.ts for implementing delays in retry logic.
Imported the time module in src/App/backend/services/sqldb_service.py to support the retry mechanism with exponential backoff.

Does this introduce a breaking change?

Yes
No

Golden Path Validation

I have tested the primary workflows (the "golden path") to ensure they function correctly without errors.

Deployment Validation

I have validated the deployment process successfully and all services are running as expected with this change.

What to Check

Verify that the following are valid

...

Other Information


Copilot

Pull Request Overview

This PR enhances system reliability by implementing retry mechanisms for database connections and API calls, while also improving resource uniqueness and optimizing containerized deployment configuration.

Added exponential backoff retry logic for SQL database connections with up to 5 attempts
Implemented retry mechanism for frontend API calls with timeout and error handling
Enhanced resource uniqueness by including resource group name in identifier calculation
Reduced Docker container workers from 4 to 1 for optimized resource usage

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
src/App/backend/services/sqldb_service.py	Added retry logic with exponential backoff for database connections
src/App/frontend/src/api/api.ts	Implemented retry mechanism and timeout for user API calls
src/App/WebApp.Dockerfile	Reduced uvicorn workers from 4 to 1
infra/main.bicep	Enhanced uniqueId calculation to include resource group name

Copilot · 2025-07-29T07:49:57Z

src/App/backend/services/sqldb_service.py

+            try:
+                conn = pyodbc.connect(
+                    f"DRIVER={driver};SERVER={server};DATABASE={database};UID={username};PWD={password}",
+                    timeout=5,
+                )
+                logging.info("Connected using Username & Password")
+                return conn
+            except pyodbc.Error as e:
+                logging.error(f"Failed with Username & Password: {str(e)}")
+
+                if attempt < max_retries - 1:
+                    logging.info(f"Retrying in {retry_delay} seconds...")
+                    time.sleep(retry_delay)
+                    retry_delay *= 2  # Exponential backoff
+                else:
+                    raise e




The nested try/except blocks create complex control flow that is difficult to follow. Consider extracting the password-based connection attempt into a separate function, or restructuring the retry logic so that it handles both authentication methods more cleanly.

Suggested change

-            try:
-                conn = pyodbc.connect(
-                    f"DRIVER={driver};SERVER={server};DATABASE={database};UID={username};PWD={password}",
-                    timeout=5,
-                )
-                logging.info("Connected using Username & Password")
-                return conn
-            except pyodbc.Error as e:
-                logging.error(f"Failed with Username & Password: {str(e)}")
-                if attempt < max_retries - 1:
-                    logging.info(f"Retrying in {retry_delay} seconds...")
-                    time.sleep(retry_delay)
-                    retry_delay *= 2  # Exponential backoff
-                else:
-                    raise e
+            try:
+                conn = connect_with_password()
+                logging.info("Connected using Username & Password")
+                return conn
+            except pyodbc.Error as e:
+                logging.error(f"Failed with Username & Password: {str(e)}")
+                if attempt < max_retries - 1:
+                    logging.info(f"Retrying in {retry_delay} seconds...")
+                    time.sleep(retry_delay)
+                    retry_delay *= 2  # Exponential backoff
+                else:
+                    raise e
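One way to flatten the control flow the review comment describes is to treat each authentication method as a named callable and loop over them inside the retry loop. The sketch below is only an illustration under assumptions: `connect_with_fallback`, the `connectors` list, and the 1-second starting delay are hypothetical, and it assumes every method is retried on each attempt, which the visible hunk does not fully confirm.

```python
import logging
import time
from typing import Any, Callable, List, Sequence, Tuple

# (label, zero-argument function that returns a connection or raises)
Connector = Tuple[str, Callable[[], Any]]


def connect_with_fallback(
    connectors: Sequence[Connector],
    max_retries: int = 5,
    retry_delay: float = 1.0,  # assumed starting delay
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Try each connector in order; back off between full rounds."""
    if not connectors or max_retries < 1:
        raise ValueError("need at least one connector and one attempt")

    errors: List[Exception] = []
    for attempt in range(max_retries):
        for name, connect in connectors:
            try:
                conn = connect()
                logging.info("Connected using %s", name)
                return conn
            except Exception as exc:
                logging.error("Failed with %s: %s", name, exc)
                errors.append(exc)
        # Every method failed in this round; wait unless it was the last one.
        if attempt < max_retries - 1:
            logging.info("Retrying in %s seconds...", retry_delay)
            sleep(retry_delay)
            retry_delay *= 2  # exponential backoff
    # All rounds exhausted: surface the most recent failure.
    raise errors[-1]
```

With this shape, adding or reordering authentication methods changes only the list passed in, not the retry loop.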

Copilot · 2025-07-29T07:49:57Z

src/App/backend/services/sqldb_service.py

+            except pyodbc.Error as e:
+                logging.error(f"Failed with Username & Password: {str(e)}")
+
+                if attempt < max_retries - 1:
+                    logging.info(f"Retrying in {retry_delay} seconds...")
+                    time.sleep(retry_delay)
+                    retry_delay *= 2  # Exponential backoff
+                else:
+                    raise e


The exception raised here (e) comes from the inner try/except block (the Username & Password authentication failure). If the Default Credential attempt also failed, that error information is lost. Consider raising a more comprehensive error that includes both failure modes, or the original Default Credential error.

Suggested change

-            except pyodbc.Error as e:
-                logging.error(f"Failed with Username & Password: {str(e)}")
-                if attempt < max_retries - 1:
-                    logging.info(f"Retrying in {retry_delay} seconds...")
-                    time.sleep(retry_delay)
-                    retry_delay *= 2  # Exponential backoff
-                else:
-                    raise e
+            except pyodbc.Error as username_password_error:
+                logging.error(f"Failed with Username & Password: {str(username_password_error)}")
+                if attempt < max_retries - 1:
+                    logging.info(f"Retrying in {retry_delay} seconds...")
+                    time.sleep(retry_delay)
+                    retry_delay *= 2  # Exponential backoff
+                else:
+                    raise RuntimeError(
+                        f"Connection failed after multiple attempts. "
+                        f"Default Credential error: {str(default_credential_error)}; "
+                        f"Username & Password error: {str(username_password_error)}"
+                    )
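Reporting both failures requires the first exception to still be reachable when the second one is handled. Python unbinds the name in `except ... as name` when that except block ends, so a flat version has to save the first error explicitly. The sketch below illustrates this under assumptions: `connect_once` and its two callable parameters are hypothetical stand-ins for the two authentication paths, and it covers a single attempt without the retry loop.

```python
from typing import Any, Callable, Optional


def connect_once(
    connect_default: Callable[[], Any],
    connect_password: Callable[[], Any],
) -> Any:
    """Try default credentials, then username/password; report both errors."""
    default_error: Optional[Exception] = None
    try:
        return connect_default()
    except Exception as exc:
        # Keep a reference: `exc` is unbound once this except block ends.
        default_error = exc

    try:
        return connect_password()
    except Exception as password_error:
        # Chain with `from` so the traceback keeps the underlying exception.
        raise RuntimeError(
            f"Default Credential error: {default_error}; "
            f"Username & Password error: {password_error}"
        ) from password_error
```

Chaining with `from` preserves the original exception as `__cause__`, so the combined message does not replace the underlying traceback.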

Copilot · 2025-07-29T07:49:57Z

src/App/frontend/src/api/api.ts

+      if (attempt < maxRetries && 
+          error instanceof Error) {


The retry logic only retries on Error instances, but AbortSignal.timeout errors may not always be Error instances. Consider checking for specific error types (like AbortError for timeouts) or removing the instanceof Error check to retry on all errors.

Suggested change

-      if (attempt < maxRetries &&
-          error instanceof Error) {
+      if (attempt < maxRetries) {


github-actions · 2025-07-31T16:51:21Z

🎉 This PR is included in version 1.8.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Shreyas-Microsoft and others added 7 commits July 21, 2025 16:27

remove worker from dockerfile

4aa09e8

agents from chat service

f2da255

fixed opent telemetry issue CustomDomainInUse, FlagMustBeSetForRestore (#618)

4603e3c

cleanup duplicate agents

3c51102

pylint fix

4145ad0

Merge pull request #620 from microsoft/psl-thread-management

2a7aa1b

feat: Psl thread management

fix: Implement retry logic for database connection and user fetching with exponential backoff (#623)

61b4421

Copilot AI review requested due to automatic review settings July 29, 2025 07:48

Harsh-Microsoft requested review from Avijit-Microsoft, Prajwal-Microsoft, Roopan-Microsoft, Vinay-Microsoft and aniaroramsft as code owners July 29, 2025 07:48

Copilot AI reviewed Jul 29, 2025


feat: working agent management (#626)

4f6e907

* working agent management * fix pylint * fix test cases * fix pylint --------- Co-authored-by: Shreyas-Microsoft <v-swaikar@microsft.com>

Roopan-Microsoft temporarily deployed to production July 29, 2025 13:00 — with GitHub Actions Inactive

Rafi-Microsoft and others added 6 commits July 30, 2025 20:01

sfi changes v1

c1d8230

sfi changes v2

4772ad3

sfi changes v3

2088121

sfi changes v4

2cdbaf9

updated scripts to azureclicreds

8b5e6d2

fix: Refactor Azure Authentication and Update Infra Config

e96b8bf

Avijit-Microsoft temporarily deployed to production July 30, 2025 17:57 — with GitHub Actions Inactive

Rafi-Microsoft and others added 3 commits July 31, 2025 16:51

sfi changes v5

2506157

sfi changes v2

5edf74b

Merge pull request #631 from microsoft/psl-sfi-changesr2

53dd2a8

fix: Refactor Azure Authentication and Update Infra Config

Prajwal-Microsoft temporarily deployed to production July 31, 2025 11:34 — with GitHub Actions Inactive

Prajwal-Microsoft approved these changes Jul 31, 2025


Vinay-Microsoft approved these changes Jul 31, 2025


Prajwal-Microsoft merged commit e558c65 into main Jul 31, 2025
22 of 26 checks passed

8 participants
