(torchx/scheduler) Fill hostnames for each replica in slurm scheduler's describe API #1080
This pull request was exported from Phabricator. Differential Revision: D76485112
…'s describe API (#1080)

Summary: Additionally fill in the hostname, resource (cpu, memMB), image, and entrypoint in `describe_squeue` for each role/replica.

Differential Revision: D76485112
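A minimal sketch of the per-role/per-replica grouping described in the summary. It assumes each replica is a separate job record named `<role>-<replica_id>` and uses simplified, hypothetical keys (`state`, `hostname`, `cpus`, `memory_mb`, `image`, `entrypoint`) rather than the real `squeue --json` schema, which varies across Slurm versions. `ReplicaInfo`, `RoleInfo`, and `group_jobs_by_role` are illustrative names, not TorchX's classes or functions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplicaInfo:
    """Hypothetical per-replica record (not TorchX's ReplicaStatus)."""

    id: int
    state: str
    hostname: str


@dataclass
class RoleInfo:
    """Hypothetical per-role record (not TorchX's Role)."""

    name: str
    image: str = ""
    entrypoint: str = ""
    cpu: int = 0
    mem_mb: int = 0
    replicas: List[ReplicaInfo] = field(default_factory=list)


def group_jobs_by_role(jobs: List[dict]) -> Dict[str, RoleInfo]:
    """Group job records named "<role>-<replica_id>" into per-role entries."""
    roles: Dict[str, RoleInfo] = {}
    for job in jobs:
        # rpartition keeps role names that contain "-" intact, e.g. "ps-shard-0"
        role_name, _, replica_id = job["name"].rpartition("-")
        if role_name not in roles:
            # Role-level fields come from the first replica seen for that role.
            roles[role_name] = RoleInfo(
                name=role_name,
                image=job.get("image", ""),
                entrypoint=job.get("entrypoint", ""),
                cpu=job.get("cpus", 0),
                mem_mb=job.get("memory_mb", 0),
            )
        # Every replica contributes its own state and hostname.
        roles[role_name].replicas.append(
            ReplicaInfo(
                id=int(replica_id),
                state=job["state"],
                hostname=job.get("hostname", ""),
            )
        )
    return roles
```

Taking role-level fields (image, entrypoint, resources) from the first replica is only reasonable if all replicas of a role share them; hostnames and states stay per replica.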
LGTM
    roles=list(roles.values()),
    roles_statuses=list(roles_statuses.values()),
    state=app_state,
    msg=msg,
`msg` isn't needed?
Yeah, `msg` defaults to an empty string if not specified. We were just setting `msg=state`, so it added no real functional value, and `describe_sacct` didn't set `msg` either.
…'s describe API (#1080)

Summary: Additionally fill in the hostname, resource (cpu, memMB), image, and entrypoint in `describe_squeue` for each role/replica.

Reviewed By: d4l3k

Differential Revision: D76485112
…ion and hostnames to mesh_spec (#296)

Summary: Pull Request resolved: #296

TorchX's `status` API returns a struct that has a `replica.hostname` field. However, not every scheduler fills it in. meta-pytorch/torchx#1080 makes the Slurm scheduler in TorchX fill in the hostname information. This PR adds a `hostnames` field to `monarch.tools.mesh_spec.MeshSpec` and populates it with the hostnames returned by TorchX. This information will be used in PR (5/n) to implement a `TorchXAllocator`.

Reviewed By: suo

Differential Revision: D76847192

fbshipit-source-id: 4d55083009ef9dd6ed46717fd375f5a49ee86a95
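A sketch of how per-replica hostnames from a TorchX status could populate a `hostnames` list on a mesh spec. `ReplicaStatusStub` and `MeshSpecSketch` are hypothetical stand-ins, not the real TorchX or monarch types, and the field set is an assumption. Because not every scheduler fills in `hostname`, empty values are skipped rather than propagated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicaStatusStub:
    """Hypothetical stand-in for a TorchX per-replica status."""

    id: int
    hostname: str = ""  # may be empty on schedulers that don't report it


@dataclass
class MeshSpecSketch:
    """Hypothetical, simplified stand-in for monarch's MeshSpec."""

    name: str
    num_hosts: int
    hostnames: List[str] = field(default_factory=list)


def fill_hostnames(spec: MeshSpecSketch, replicas: List[ReplicaStatusStub]) -> MeshSpecSketch:
    """Copy replica hostnames into the spec, ordered by replica id."""
    ordered = sorted(replicas, key=lambda r: r.id)
    # Skip replicas whose scheduler did not report a hostname.
    spec.hostnames = [r.hostname for r in ordered if r.hostname]
    return spec
```

Sorting by replica id keeps `hostnames[i]` aligned with replica `i` whenever every replica reports a hostname, which an allocator would likely rely on.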
Summary: Use `scontrol` to implement the describe API that fills out the hostnames for each replica.

Differential Revision: D76485112
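The summary does not say which `scontrol` subcommand is used. One plausible building block is turning a compact Slurm node list (e.g. `node[001-003]`) into individual hostnames, which `scontrol show hostnames` does. The sketch below wraps that command and adds a hypothetical pure-Python fallback, `expand_hostlist`, that handles only simple bracket ranges (one bracket group per entry), not the full Slurm hostlist grammar.

```python
import re
import subprocess
from typing import List


def scontrol_hostnames(nodelist: str) -> List[str]:
    """Expand a Slurm hostlist via `scontrol show hostnames` (needs the Slurm CLI)."""
    out = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # scontrol prints one hostname per line.
    return out.split()


# prefix[body]suffix, e.g. "node[001-003,007]"
_RANGE = re.compile(r"^(?P<prefix>[^\[]*)\[(?P<body>[^\]]+)\](?P<suffix>.*)$")


def expand_hostlist(nodelist: str) -> List[str]:
    """Hypothetical fallback for simple hostlists like "node[001-003,007],gpu5"."""
    hosts: List[str] = []
    # Split on commas that are not inside a bracket group.
    for entry in re.split(r",(?![^\[]*\])", nodelist):
        if not entry:
            continue
        m = _RANGE.match(entry)
        if not m:
            hosts.append(entry)  # plain hostname, no range
            continue
        for part in m["body"].split(","):
            lo, _, hi = part.partition("-")
            width = len(lo)  # preserve zero padding, e.g. "001"
            for i in range(int(lo), int(hi or lo) + 1):
                hosts.append(f"{m['prefix']}{i:0{width}d}{m['suffix']}")
    return hosts
```

Shelling out to `scontrol` defers to Slurm's own parser, so it is the safer choice on a cluster; the fallback only exists for environments without the Slurm CLI.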