@jakobht jakobht commented Nov 28, 2025

What changed?
The ephemeral shard creator now pings shard owners immediately after creation to verify end-to-end functionality.

This also wires up the spectator, the dispatcher, and the fixed-namespace pinger.

Why?
This adds canary verification that validates:

  • Shards are created successfully
  • Owners can be reached via gRPC
  • Owners actually own the assigned shards

Previously, the shard creator only verified that GetShardOwner returned successfully, but didn't verify that the executor was actually reachable or owned the shard.

How did you test it?
Unit tests

Potential risks
Low risk - this only affects the canary test environment and adds verification without changing core shard creation logic.

Documentation

@jakobht jakobht marked this pull request as draft November 28, 2025 11:02
@jakobht jakobht force-pushed the add-ephemeral-shard-ping branch 3 times, most recently from a043f8b to f9d6569, November 28, 2025 12:03
@jakobht jakobht changed the title from "Add ping verification to ephemeral shard creator" to "feat(shard-distributor): Add ping verification to ephemeral shard creator" Nov 28, 2025
@jakobht jakobht force-pushed the add-ephemeral-shard-ping branch 7 times, most recently from 0046451 to 9353d41, December 2, 2025 13:47
After creating a shard, the ephemeral shard creator now pings the owner
to verify that:
1. The shard was created successfully
2. The owner can be reached via gRPC
3. The owner actually owns the shard

This provides end-to-end validation of the shard creation and routing
mechanisms in the canary test environment.

Changes:
- Switch from using ShardDistributor client to using Spectators
- Add pingShardOwner() method that sends canary ping after creation
- Verify executor ID and ownership in ping response
- Log warnings if executor ID mismatches or ownership is incorrect
- Add test coverage for ping verification flow
- Use refactored *spectatorclient.Spectators type

Signed-off-by: Jakob Haahr Taankvist <jht@uber.com>
Integrate all canary components into the module:
- Create SpectatorPeerChooser and manage its lifecycle
- Provide canary client using the dispatcher
- Create and start the pinger component
- Register ping handler as a YARPC server
- Wire spectators to peer chooser on startup

This connects all the pieces needed for executor-to-executor
canary testing via ping/pong requests.

Dependencies:
- Requires SpectatorPeerChooser
- Requires PingHandler
- Requires Pinger component

Signed-off-by: Jakob Haahr Taankvist <jht@uber.com>
@jakobht jakobht force-pushed the add-ephemeral-shard-ping branch from 3efbb33 to 150093f, December 2, 2025 17:05
@jakobht jakobht marked this pull request as ready for review December 3, 2025 05:36

fx.Provide(
	func(t *grpc.Transport) peer.Transport { return t },
),

We already have the transport in fx.Supply; do we need this? It seems redundant.


for executorID, state := range namespaceState.Executors {
	if now.Sub(state.LastHeartbeat) > p.cfg.HeartbeatTTL {
		p.logger.Info("Executor has not reported a heartbeat recently", tag.ShardExecutor(executorID), tag.ShardNamespace(p.namespaceCfg.Name), tag.Value(state.LastHeartbeat))

nice!

@jakobht jakobht merged commit c268569 into cadence-workflow:master Dec 5, 2025
58 of 59 checks passed