
Implement Enhanced SDK Logging to Debug Consensus Node Timeouts on Mainnet #4576

@quiet-node

Problem Statement

Production mainnet is experiencing Consensus Node (CN) timeouts on SDK calls made from the Relay. Investigation is severely limited because there is too little debug-level logging at the SDK-Relay boundary.

Current Behavior

  • SDK timeouts occur after 10 seconds (SDK_REQUEST_TIMEOUT default: 10000ms)
  • Upon timeout, 10 retries are attempted before final failure
  • Relay lacks debug/trace logging to identify timeout root causes
  • No visibility into which methods trigger timeouts or frequency patterns

Technical Details

Current Timeout Configuration:

  • SDK_REQUEST_TIMEOUT: 10000ms (in packages/config-service/src/services/globalConfig.ts)
  • CONSENSUS_MAX_EXECUTION_TIME: 15000ms (set via client.setMaxExecutionTime())
  • Timeout occurs in Executable.js at SDK level

Logging Gap:

  • The production Relay runs at the global default log level, info
  • SDK has trace and debug levels but inherits no logging from Relay
  • No SDK connection logging for timeout analysis

Solution

Implement configurable SDK logging in three steps:

  1. Add SDK_LOG_LEVEL Environment Variable

    • New SDK_LOG_LEVEL config option (default: info, matching the global level)
    • Allows independent control of SDK logging without affecting Relay global log level
    • Supports all standard log levels: trace, debug, info, warn, error
  2. Configure SDK Logger Inheritance

    • Use SDK.setLogger() to inherit logger from Relay
    • Create child logger with SDK_LOG_LEVEL configuration
    • Enable granular SDK logging when needed for debugging (e.g., SDK_LOG_LEVEL=debug)
  3. Add Timeout-Specific Logging

    • Log SDK connection status and timeout events
    • Track method-specific timeout patterns
    • Add debug traces at SDK-Relay boundary
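
As a rough illustration of steps 1 and 2, the sketch below validates a raw SDK_LOG_LEVEL value and uses it to create a child logger for the SDK. Only the SDK_LOG_LEVEL name, its default, and the five levels come from this issue. The helper names (resolveSdkLogLevel, createSdkChildLogger), the ParentLogger interface, and the "sdk" binding are hypothetical, and the child(bindings, options) shape is assumed to match the Relay's pino logger. How the resulting logger gets handed to the SDK is left out, since that depends on the SDK's own logger API.

```typescript
// Sketch only: everything except the SDK_LOG_LEVEL name, default, and levels is hypothetical.
const SDK_LOG_LEVELS = ["trace", "debug", "info", "warn", "error"] as const;
type SdkLogLevel = (typeof SDK_LOG_LEVELS)[number];

// Default taken from the proposal: match the Relay's global level.
const DEFAULT_SDK_LOG_LEVEL: SdkLogLevel = "info";

/** Normalizes a raw SDK_LOG_LEVEL value, falling back to the default when unset or invalid. */
function resolveSdkLogLevel(raw: string | undefined): SdkLogLevel {
  const normalized = (raw ?? "").trim().toLowerCase();
  const match = SDK_LOG_LEVELS.find((level) => level === normalized);
  return match ?? DEFAULT_SDK_LOG_LEVEL;
}

// Minimal structural type for the parent logger (assumed to match pino's child signature).
interface ParentLogger {
  child(bindings: Record<string, unknown>, options?: { level?: string }): unknown;
}

/** Creates a child logger for the SDK whose level is set independently of the parent's. */
function createSdkChildLogger(parent: ParentLogger, rawLevel: string | undefined): unknown {
  return parent.child({ name: "sdk" }, { level: resolveSdkLogLevel(rawLevel) });
}
```

In the Relay this would presumably be called with the value read from the config service, so that SDK_LOG_LEVEL=debug raises SDK verbosity while the global level stays at info. Falling back to info on an unrecognized value is a design choice of this sketch. Failing fast at startup would be an equally reasonable alternative.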

Acceptance Criteria

  • SDK_LOG_LEVEL environment variable added to config service
  • SDK logger inherits from Relay logger with SDK_LOG_LEVEL configuration
  • SDK logging level configurable independently of global Relay log level
  • Timeout events logged with method context and timing information when debug enabled
  • SDK connection status monitoring implemented
  • Solution enables investigation of mainnet timeout patterns through configurable logging
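
One possible shape for the "method context and timing information" criterion is a small in-memory tracker that counts timeouts per method and reports the average elapsed time. This is a sketch under assumptions, not a proposed implementation: SdkTimeoutTracker, TimeoutEvent, and the field names are all hypothetical, and detecting that a given SDK error is a timeout is left to the caller.

```typescript
// Sketch only: all names here are hypothetical and not part of the Relay or the SDK.
interface TimeoutEvent {
  method: string;    // Relay method whose SDK call timed out
  elapsedMs: number; // time spent before the timeout surfaced
}

interface MethodTimeoutSummary {
  method: string;
  count: number;
  avgElapsedMs: number;
}

class SdkTimeoutTracker {
  private readonly stats = new Map<string, { count: number; totalElapsedMs: number }>();

  /** Records one timeout for the given method. */
  record(event: TimeoutEvent): void {
    const entry = this.stats.get(event.method) ?? { count: 0, totalElapsedMs: 0 };
    entry.count += 1;
    entry.totalElapsedMs += event.elapsedMs;
    this.stats.set(event.method, entry);
  }

  /** Returns per-method totals, most frequently timing-out method first. */
  summary(): MethodTimeoutSummary[] {
    return Array.from(this.stats.entries())
      .map(([method, { count, totalElapsedMs }]) => ({
        method,
        count,
        avgElapsedMs: totalElapsedMs / count,
      }))
      .sort((a, b) => b.count - a.count);
  }
}
```

The summary could be emitted through the SDK child logger at debug level, so that it costs nothing in production while SDK_LOG_LEVEL stays at info.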

Impact

This enhancement provides configurable debugging capability to investigate CN timeout issues on mainnet without affecting production performance when debug logging is disabled.

Metadata

Labels

enhancement (New feature or request)
