-
Notifications
You must be signed in to change notification settings - Fork 868
refactor: [shard-distributor]Handle context.Cancelled errors #7493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
07c98ce
6b28caa
a68b74f
b94764d
af3b6fa
8ff8c25
61d9aac
ffa7b1d
fcf94ce
cf23b9f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -233,8 +233,14 @@ func (s *executorStoreImpl) GetState(ctx context.Context, namespace string) (*st | |
| executorPrefix := etcdkeys.BuildExecutorsPrefix(s.prefix, namespace) | ||
| resp, err := s.client.Get(ctx, executorPrefix, clientv3.WithPrefix()) | ||
| if err != nil { | ||
| if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { | ||
| return nil, ctx.Err() | ||
| } | ||
|
||
| return nil, fmt.Errorf("get executor data: %w", err) | ||
| } | ||
| if ctxErr := ctx.Err(); errors.Is(ctxErr, context.Canceled) || errors.Is(ctxErr, context.DeadlineExceeded) { | ||
| return nil, ctxErr | ||
| } | ||
|
||
|
|
||
| for _, kv := range resp.Kvs { | ||
| key := string(kv.Key) | ||
|
|
@@ -276,8 +282,14 @@ func (s *executorStoreImpl) GetState(ctx context.Context, namespace string) (*st | |
| shardsPrefix := etcdkeys.BuildShardsPrefix(s.prefix, namespace) | ||
| shardResp, err := s.client.Get(ctx, shardsPrefix, clientv3.WithPrefix()) | ||
| if err != nil { | ||
| if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { | ||
| return nil, ctx.Err() | ||
| } | ||
| return nil, fmt.Errorf("get shard data: %w", err) | ||
| } | ||
| if ctxErr := ctx.Err(); errors.Is(ctxErr, context.Canceled) || errors.Is(ctxErr, context.DeadlineExceeded) { | ||
| return nil, ctxErr | ||
| } | ||
| for _, kv := range shardResp.Kvs { | ||
| shardID, shardKeyType, err := etcdkeys.ParseShardKey(s.prefix, namespace, string(kv.Key)) | ||
| if err != nil { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is expected to be caught in
case <-ctx.Done():instead.Which means, if ctx will be Done during the
GetStatewe won't seeShard stats cleanup loop cancelled., which will is very unexpected.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree it’s a bit counterintuitive at first glance.
In practice the case <-ctx.Done() only fires if the cancellation happens before we enter the branch. In the noisy logs we’re seeing the ticker fire, we drop into the cleanup work, and then the context gets cancelled while GetState is still in flight.
At that point we’re already executing the branch, so the select won’t re-evaluate
the only place we can notice the cancellation is via the error returned from the store.
I added the inline guard so that path exits quietly instead of logging "get executor data: context canceled" since the message is expected any time leadership or shutdown interrupts the iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand how it works technically. My point is - this can lead to absense of (which looks like an important) log message of shutting down corresponding to "starting".
We better to preserve that, if it's important. Maybe by logging this outside of select in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we can either log it also in this case, just to make sure we are not missing the implementation, or go to the loop again, in this case there is a risk that we never hit the first case , it should be minimal but exists.