Skip to content

sql/pgwire: check context cancellation in serveImpl #158196

@spilchen

Description

@spilchen

During node drain, every pgwire session’s context is canceled once server.shutdown.query_wait expires. In pkg/sql/pgwire/server.go (Server.serveImpl), we currently exit the client message loop only when:

  1. the network read returns a closed-connection error
  2. the returned err is (or maps to) context.Canceled
  3. repeatedErrorCount exceeds maxRepeatedErrorCount (currently 256, but 32 768 on older builds)

Note that case (2) doesn’t check the context directly. It depends on a called function to surface a context.Canceled error.

If a client keeps sending invalid SQL and we continue reading new requests successfully, the loop stays in the repeated-error path until maxRepeatedErrorCount is reached, even if the session’s context has already been canceled by drain. As a result, drain may log "some sessions did not respond to cancellation within 1s" and fail.

We should explicitly check for cancellation in ctx inside the loop. Two possible approaches:

  • Check ctx.Err() at the end of each iteration and break immediately if non-nil.
  • When incrementing repeatedErrorCount, also check ctx.Err() and exit early if canceled.

Jira issue: CRDB-57157

Metadata

Metadata

Assignees

Labels

C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.T-sql-foundationsSQL Foundations Team (formerly SQL Schema + SQL Sessions)branch-release-25.2branch-release-25.3Used to mark GA and release blockers and technical advisories for 25.3branch-release-25.4Used to mark GA and release blockers and technical advisories for 25.4target-release-26.1.0

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions