Description
Search before asking
- I searched the issues and found no similar issues.
KubeRay Component
kubectl-plugin
What happened + What you expected to happen
- The kubectl-ray plugin currently sets `spec.jobId` with a separate GET -> UPDATE after creating the RayJob. This can race with the KubeRay operator's status updates and fail with:

  ```
  Error: Error occurred when trying to add job ID to RayJob: Operation cannot be fulfilled on rayjobs.ray.io "27e93fb9": the object has been modified; please apply your changes to the latest version and try again
  ```

- The job ID can instead be generated and set before the create/apply, which avoids the conflict. Additionally, `ray job submit` is started asynchronously in a goroutine, which can lead to `exec: not started`; starting the process synchronously resolves this.
Reproduction script
Add a small sleep between GET and UPDATE:
```go
options.RayJob, err = k8sClients.RayClient().RayV1().RayJobs(options.namespace).Get(ctx, options.RayJob.GetName(), v1.GetOptions{})
if err != nil {
	return fmt.Errorf("Failed to get latest version of Ray job: %w", err)
}
options.RayJob.Spec.JobId = rayJobID
time.Sleep(30 * time.Second)
_, err = k8sClients.RayClient().RayV1().RayJobs(options.namespace).Update(ctx, options.RayJob, v1.UpdateOptions{FieldManager: util.FieldManager})
if err != nil {
	return fmt.Errorf("Error occurred when trying to add job ID to RayJob: %w", err)
}
```
and then patch the RayJob in that sleep window:
```shell
kubectl patch rayjob -n ray-job 6d3ef4a3 \
  --type='merge' \
  -p '{"metadata":{"annotations":{"repro/tick":"'$(date +%s%N)'"}}}' \
  >/dev/null
```
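The proposed fix is to generate the job ID client-side and set it on the spec before the Create call, so no follow-up GET -> UPDATE is needed and there is nothing to conflict with the operator's writes. A minimal sketch of the idea, using a hypothetical `RayJobSpec` struct in place of the real kuberay API type (all names here are illustrative, not the plugin's actual helpers):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// RayJobSpec stands in for the kuberay rayv1.RayJobSpec type.
type RayJobSpec struct {
	JobId string
}

// genJobID builds a Ray-style submission ID from random bytes.
func genJobID() string {
	b := make([]byte, 8)
	rand.Read(b)
	return "raysubmit_" + hex.EncodeToString(b)
}

func main() {
	spec := RayJobSpec{}
	// Set the ID before Create/apply; with no second write from the
	// client, the operator's status updates cannot cause a conflict.
	spec.JobId = genJobID()
	fmt.Println(spec.JobId)
}
```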
Anything else
This happens regularly in normal use.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!