wstcliyu
Collaborator

Fixes / Features

  • Add a TPU resource flavor to the Kueue config. AxLearn workloads request cpu and memory as well as TPU, so xpk needs to add cpu and memory to the TPU flavor for AxLearn workloads to be accepted by Kueue.
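
As a rough illustration of the change described above, the sketch below builds one Kueue `resourceGroups` entry in which a single flavor covers `google.com/tpu`, `cpu`, and `memory`. The helper name `tpu_resource_group`, the flavor name, and the TPU and cpu quotas are hypothetical and are not taken from the xpk code; only the `2000G` memory quota appears in this PR's diff.

```python
import json


def tpu_resource_group(flavor_name, tpu_quota, cpu_quota, memory_quota):
  """Build one Kueue resourceGroups entry covering TPU, cpu and memory.

  Hypothetical helper for illustration; not the function used in xpk.
  """
  quotas = {
      "google.com/tpu": tpu_quota,
      "cpu": cpu_quota,
      "memory": memory_quota,
  }
  return {
      # Every resource a workload requests must be listed here,
      # otherwise the flavor cannot satisfy the request.
      "coveredResources": list(quotas),
      "flavors": [{
          "name": flavor_name,
          "resources": [
              {"name": name, "nominalQuota": quota}
              for name, quota in quotas.items()
          ],
      }],
  }


# Assumed example values; only "2000G" comes from the diff in this PR.
group = tpu_resource_group("tpu-flavor", 16, 480, "2000G")
print(json.dumps(group, indent=2))
```

Because JSON is a subset of YAML, the printed structure could be pasted under `spec.resourceGroups` of a ClusterQueue for inspection. The actual xpk change builds this section as a YAML string.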

Testing / Documentation

Testing details.

  • [ y ] Tests pass
  • [ y ] Appropriate changes to documentation are included in the PR

@SujeethJinesh
Collaborator

This looks good to me, but I wanted to check one thing: could you confirm that a CPU pod doesn't land on a TPU nodepool? We may need to add tolerations for this in XPK, but I'm not sure. You can test this by creating a simple cluster (or using an existing one) that has no CPU nodepools, launching a CPU-only jobset on it, and confirming that it's unschedulable.
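
The check suggested above can also be reasoned about offline. The sketch below is a simplified model of Kubernetes taint and toleration matching, not the scheduler's real implementation, and it does not replace the cluster test. It assumes TPU nodes carry a `google.com/tpu=present:NoSchedule` taint, which should be verified on the actual nodepool. The function names are hypothetical.

```python
def tolerates(toleration, taint):
  """Return True if a single toleration matches a single taint."""
  effect = toleration.get("effect", "")
  # An empty effect on the toleration matches every taint effect.
  if effect and effect != taint["effect"]:
    return False
  key = toleration.get("key", "")
  if toleration.get("operator", "Equal") == "Exists":
    # An empty key with Exists tolerates every taint.
    return key == "" or key == taint["key"]
  # Default operator is Equal: key and value must both match.
  return (key == taint["key"]
          and toleration.get("value", "") == taint.get("value", ""))


def blocking_taints(pod_tolerations, node_taints):
  """List the taints that would keep the pod off the node."""
  return [
      taint for taint in node_taints
      if taint["effect"] in ("NoSchedule", "NoExecute")
      and not any(tolerates(tol, taint) for tol in pod_tolerations)
  ]


# Assumed taint on TPU nodes; confirm against the real nodepool.
TPU_TAINT = {"key": "google.com/tpu", "value": "present",
             "effect": "NoSchedule"}

cpu_only = blocking_taints([], [TPU_TAINT])
with_toleration = blocking_taints(
    [{"key": "google.com/tpu", "operator": "Exists"}], [TPU_TAINT])
print(cpu_only, with_toleration)
```

Under this model a CPU-only pod with no tolerations is blocked by the TPU taint, while a pod that tolerates `google.com/tpu` is not. If the TPU nodes are tainted this way, the CPU-only jobset in the suggested test should stay unschedulable.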

@wstcliyu wstcliyu force-pushed the wstcliyu/kueue-tpu-flavor branch 4 times, most recently from 937615d to 1bb60af Compare July 17, 2025 20:53
nominalQuota: 2000G"""
  if args.enable_pathways:
    return resources_yaml
  # resources_yaml = """- coveredResources: ["cpu", "memory"]
Collaborator

I'd inline this function into the above.

@wstcliyu wstcliyu force-pushed the wstcliyu/kueue-tpu-flavor branch from 95d7ddb to 832f82e Compare July 23, 2025 05:24
2 participants