-
Notifications
You must be signed in to change notification settings - Fork 147
feat: list all registered schedulers (#1009) #1050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
could you provide more context to what you want to achieve with this? |
|
All the details are in the linked #1009, @kiukchung. Please let me know if more details are needed there. |
|
Hi @clumsy thanks for the pointer. Could you describe your use-case in wanting the list of supported schedulers offered to your users to be dynamic? Usually torchx users want to control the schedulers they configure for their users. |
|
Sure, @kiukchung Take NeMo for example, NVidia ships it with all dependencies, including Unfortunately nemo-run unconditionally registers custom schedulers: https://github.com/NVIDIA/NeMo-Run/blob/main/pyproject.toml#L43-L48 Thus we cannot use It makes sense to have a feature to restrict the available schedulers, but does it have to be the default one? |
|
@clumsy ah that's an interesting edge-case. What you basically want is for One thing to note about We could do something like: If you're open to it, you can add support for prefixes in and make a change in There's some interesting cases regarding name conflicts and ordering (e.g. if nemo registers a scheduler with the same name as the one somewhere else what do you do?) |
|
Thanks, @kiukchung! The proposed solution works for me and I can provide the implementation shortly. |
|
Please check this implementation, @kiukchung , @andywag, @d4l3k, @tonykao8080 The existing commands continue to work, e.g.:
|
|
@kiukchung has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
The failure seems unrelated to the PR: |
|
Hi @kiukchung, should I do a bogus push to trigger new test run? The last failure seem to have been unrelated. |
|
Hi @kiukchung does this PR need more work or is there anything I can do to help integrate the change? |
|
Sorry for bugging, but could you please let me know if I can help getting this one merged, @kiukchung , @d4l3k , @tonykao8080 , @andywag ? Thanks! |
57ec248 to
8f0c6b0
Compare
|
I just realized this was never merged. Does it need more work, @kiukchung? |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1050 +/- ##
=======================================
Coverage 91.61% 91.61%
=======================================
Files 84 84
Lines 6583 6583
=======================================
Hits 6031 6031
Misses 552 552
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
8f0c6b0 to
36e77f7
Compare
|
Hm, not sure why I wonder if it's related to the new permissions. I rebased on top of upstream main and pushed to my fork like I always do :/ |
|
Is there anything I can do here, @kiukchung? |
|
Please have a look when you get a chance, @kiukchung |
@clumsy sorry I didn't get to this sooner! Will take a closer look tomorrow morning. Just want to make sure everything is BC! |
|
Appreciate your help, @kiukchung! |
e9891d2 to
cfa412b
Compare
cfa412b to
05924eb
Compare
|
Fixed the BC issues in the new revision and added more unit tests @kiukchung |
| schedulers = load_group(group, default={}) | ||
| if not skip_defaults: | ||
| for name, path in DEFAULT_SCHEDULER_MODULES.items(): | ||
| if name not in schedulers: | ||
| schedulers[name] = _defer_load_scheduler(path) | ||
| return schedulers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for addressing one of the BC incompatibilities. But I think the other one still remains. get_scheduler_factories(skip_defaults=False) now returns BOTH my custom [torchx.schedulers] mappings AND the DEFAULT_SCHEDULER_MODULES which is not ideal for users who expect ONLY my custom schedulers to show. This is important since the default schedulers may not even be available for the users to use.
Unlike named_resources, scheduler_factories is not additive. We should've kept the same behavior in both (either additive or not).
skip_defaults argument in get_scheduler_factories() means "even if you don't find anything in entry-points, do not return schedulers in DEFAULT_SCHEDULER_MODULES. I think the confusion is that DEFAULT_SCHEDULER_MODULES is a badly worded constant. What it "should've been called are BUILTIN_SCHEDULER_MODULES (aka the ones that torchx has "builtin" support for) and skip_defaults should've been called skip_builtins.
If the motivation here is for the CLI to expose custom + builtins such that builtins is always in-sync with any new schedulers added to torchx, we are currently in a mode where we are retiring (not adding) schedulers. The goal being better support/integration rather than wider coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this PR is to be able to allow registering custom schedulers next to built-in ones. E.g. if NeMo wants to register some new ones or even overwrite some of them - the rest should be available, e.g. local_cwd. It's not practical to ask the user to adding all built-ins manually - we are not even sure in what order these will be resolved, e.g. both NeMo and some other package registering their own. The simplest idea was to allow additive behavior if we are not using the same names. I guess we can preserve the existing "clear+add" approach as well @kiukchung
|
@kiukchung has imported this pull request. If you are a Meta employee, you can view this in D74743585. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review automatically exported from Phabricator review in Meta.
A simple merge for the list of all registered schedulers.
Test plan:
[x] all existing tests should pass