Description
We've had a few discussions about this, and after noodling on it for a while I thought it would help get us talking if I put together something we can poke at. Here's the example from the Rv1 RFC, with the hostname changed to disambiguate for later:
{
  "version": 1,
  "execution": {
    "R_lite": [
      {
        "rank": "19-22",
        "children": {
          "core": "0-47",
          "gpu": "0-7"
        }
      }
    ],
    "nodelist": [
      "fluke[186-189]"
    ],
    "starttime": 1676560542,
    "expiration": 1676562342
  }
}
I'm going to walk through a thought process here, so forgive me if this wanders a bit. My goals for the updated format:
- Still reasonably easy for a human to write/read
- Similar idset-based structure
- Shared implementation usable by core and sched
- Non-tree details can be ignored by core
- The containment hierarchy, our main tree that contains all resource vertices in fluxion, is specified in a way core can read if it cares to
- If possible:
  - Incremental parsing, since big JGFs are a problem and we can't do that with JGF right now
  - Ability to encode everything fluxion needs
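Since idset strings are the building block both Rv1 and anything we do here lean on, and one goal is a shared implementation for core and sched, here's a minimal Python sketch of parsing and formatting them. `parse_idset` and `format_idset` are hypothetical names, not flux-core's idset library, and the sketch only handles comma-separated IDs and inclusive ranges with optional surrounding brackets.

```python
def parse_idset(s: str) -> set:
    """Parse an idset string like "0-47", "1,3,5-7" or "[1,3-5]" into a set of ints.

    Simplified sketch: comma-separated single IDs and inclusive ranges,
    optionally wrapped in brackets. Not flux-core's idset implementation.
    """
    s = s.strip()
    if s.startswith("[") and s.endswith("]"):
        s = s[1:-1]
    ids = set()
    if not s:
        return ids
    for part in s.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad range: {part!r}")
            ids.update(range(lo, hi + 1))
        else:
            ids.add(int(part))
    return ids


def format_idset(ids) -> str:
    """Format ints back into compact range form, collapsing consecutive runs."""
    sorted_ids = sorted(set(ids))
    out = []
    i = 0
    while i < len(sorted_ids):
        start = end = sorted_ids[i]
        # extend the run while the next ID is consecutive
        while i + 1 < len(sorted_ids) and sorted_ids[i + 1] == end + 1:
            i += 1
            end = sorted_ids[i]
        out.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(out)
```

Round-tripping the Rv1 strings is a quick sanity check: `format_idset(parse_idset("19-22"))` gives back `"19-22"`, and `parse_idset("0-47")` has 48 members.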
 
 
Given that, I'm thinking we make the actual resource tree structure nested like we do for the resource request in jobspec, with requirements on the "node"-typed entries that match the current requirements. Taking just what is currently the "R_lite" key (probably "R" now, or "R_2%-skim" or something), I'm thinking something like this:
{
  "type": "node",
  "ids": "19-22",
  "children": [ // note array now
    {
      "type": "core",
      "ids": "0-47"
    },
    {
      "type": "gpu",
      "ids": "0-7"
    }
  ]
}
The main difference here is that "children" is now an array of objects that describe arbitrary resource types, and the "rank" key is gone. My thinking is that the hardware of the machine is probably more or less stable, and the tree structure makes it easy to express "local" IDs at a level where ranks are global.
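For comparison with Rv1, going from an R_lite entry to this nested form looks mechanical. Here's a quick Python sketch, assuming (as the node `"ids": "19-22"` in these examples suggests) that the node IDs carry over from the Rv1 "rank" idset, and that every key under Rv1 "children" becomes a child entry of that type. `r_lite_to_nested` is a made-up name, not an existing function in flux-core or fluxion.

```python
import json


def r_lite_to_nested(r_lite: list) -> list:
    """Turn each Rv1 R_lite entry into a proposed-format "node" object.

    Assumption: node "ids" reuse the Rv1 "rank" idset, and each key under
    Rv1 "children" becomes a child entry of that resource type.
    """
    nested = []
    for entry in r_lite:
        nested.append({
            "type": "node",
            "ids": entry["rank"],
            "children": [
                {"type": rtype, "ids": ids}
                for rtype, ids in entry["children"].items()
            ],
        })
    return nested


# The Rv1 example from above, as a Python dict.
rv1 = {
    "version": 1,
    "execution": {
        "R_lite": [{"rank": "19-22", "children": {"core": "0-47", "gpu": "0-7"}}],
        "nodelist": ["fluke[186-189]"],
        "starttime": 1676560542,
        "expiration": 1676562342,
    },
}

print(json.dumps(r_lite_to_nested(rv1["execution"]["R_lite"])[0], indent=2))
```

The reverse direction only works while every child sits directly under "node", which is exactly the restriction the nested form is meant to lift.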
Here's another one with some more levels just to see what it does:
{
  "type": "cluster",
  "ids": "0",
  "children": [
    {
      "type": "rack",
      "ids": "0-5",
      "children": [
        {
          "type": "ssd",
          "ids": "0-15"
        },
        {
          "type": "node",
          "ids": "0-14",
          "children": [ // note array now
            {
              "type": "socket",
              "ids": "0",
              "children": [
                {
                  "type": "core",
                  "ids": "24-47",
                  "ids_under": "node"
                },
                {
                  "type": "gpu",
                  "ids": "4-7",
                  "ids_under": "node"
                }
              ]
            },
            {
              "type": "socket",
              "ids": "1",
              "children": [
                {
                  "type": "core",
                  "ids": "0-23",
                  "ids_under": "node"
                },
                {
                  "type": "gpu",
                  "ids": "0-3",
                  "ids_under": "node"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
Now we have a system with 6 racks, each of which has 16 SSDs and 15 nodes, each of which has two sockets with numbered sets of cores and GPUs. I'm still scratching my head on this a bit. The ID situation makes it easy to say which cores we want to allow and which we don't, but when IDs are used in this hierarchical way to expand out into a complete system, it seems to run into problems.
I threw in an "ids_under" key here that goes on a specific resource to say that its IDs are unique, or applicable, under a specific resource type that must exist above that level. Here I'm using it for cores and GPUs, referring to the node through the socket. It could maybe also be used to have a global rank mapping in here, but I think that would get confusing fast. Ranks would then be applied the way the nodelist is now, based on a mapping to the generated resources in order.
Now that I have this written out, I'm chewing on whether there's another way to break this up, but this is pretty close to how the resources key works in jobspec. In fact, if it were "with" and "count"... yeah. Maybe worth making them similar.
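To poke at what "ids_under" would mean once the tree is expanded into a full system, here's a rough Python sketch. It assumes one possible reading: every entry is instantiated once per ID under every instance of its parent, and "ids_under" makes an ID relative to the nearest ancestor of that type instead of the immediate parent, so IDs only need to be unique within that scope. It also assumes ranks are handed to "node" vertices in depth-first generation order. `expand`, `check_unique`, and `assign_ranks` are hypothetical helpers, not anything in flux-core or fluxion.

```python
from collections import Counter


# Stripped-down idset parser (comma-separated IDs and inclusive ranges only),
# returning IDs in order; repeated here so this block runs on its own.
def parse_idset(s):
    ids = []
    for part in s.strip("[]").split(","):
        lo, _, hi = part.partition("-")
        ids.extend(range(int(lo), int(hi or lo) + 1))
    return ids


def expand(entry, ancestors=()):
    """Yield (path, scope) for every concrete vertex an entry describes.

    path is a tuple of (type, id) pairs from the root down. scope is the part
    of the path the ID is relative to: the nearest ancestor whose type matches
    "ids_under" (assumed semantics), otherwise the immediate parent.
    """
    under = entry.get("ids_under")
    scope = ancestors
    if under is not None:
        matches = [i for i, (rtype, _) in enumerate(ancestors) if rtype == under]
        if not matches:
            raise ValueError(f"no {under!r} above {entry['type']!r}")
        scope = ancestors[: matches[-1] + 1]
    # The entry is instantiated once per ID, under this particular parent.
    for rid in parse_idset(entry["ids"]):
        path = ancestors + ((entry["type"], rid),)
        yield path, scope
        for child in entry.get("children", []):
            yield from expand(child, path)


def check_unique(tree):
    """Raise if two vertices of the same type share an ID within one scope."""
    seen = set()
    for path, scope in expand(tree):
        rtype, rid = path[-1]
        if (scope, rtype, rid) in seen:
            raise ValueError(f"duplicate {rtype}{rid} under {scope}")
        seen.add((scope, rtype, rid))


def assign_ranks(tree):
    """Hand out ranks 0..N-1 to "node" vertices in depth-first generation order."""
    nodes = [path for path, _ in expand(tree) if path[-1][0] == "node"]
    return dict(enumerate(nodes))


# The multi-level example from above, with the comment removed.
tree = {"type": "cluster", "ids": "0", "children": [
    {"type": "rack", "ids": "0-5", "children": [
        {"type": "ssd", "ids": "0-15"},
        {"type": "node", "ids": "0-14", "children": [
            {"type": "socket", "ids": "0", "children": [
                {"type": "core", "ids": "24-47", "ids_under": "node"},
                {"type": "gpu", "ids": "4-7", "ids_under": "node"}]},
            {"type": "socket", "ids": "1", "children": [
                {"type": "core", "ids": "0-23", "ids_under": "node"},
                {"type": "gpu", "ids": "0-3", "ids_under": "node"}]}]}]}]}

counts = Counter(path[-1][0] for path, _ in expand(tree))
print(dict(counts))
# {'cluster': 1, 'rack': 6, 'ssd': 96, 'node': 90, 'socket': 180, 'core': 4320, 'gpu': 720}
check_unique(tree)             # passes: the two sockets split each node's cores/GPUs
print(assign_ranks(tree)[15])  # rank 15 lands on the first node in rack 1
```

This also shows where the ID problem bites: node IDs 0-14 repeat in every rack, so anything global (ranks, hostnames) has to come from generation order rather than from the IDs themselves.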
To finish out the thought on the rest of it, I'm thinking we could make the full thing a JSON stream format: multiple consecutive objects rather than one. A canonical version of that is JSONL (JSON Lines, just one object per line). It's slightly more annoying to parse, but much more efficient to produce and consume if it has to be large, and it means we could easily tolerate having it broken up into files. So, something like this might be a complete one, including edges for the fluxion graph:
{
  "version": 2,
  "type": "R",
  "resources": {
    "type": "node",
    "ids": "19-22",
    "children": [ // note array now
      {
        "type": "core",
        "ids": "0-47"
      },
      {
        "type": "gpu",
        "ids": "0-7"
      }
    ]
  }
}
{"type": "edge", "labels":"storage", "from": "node19", "to": "node20"}
... // other extra data
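Reading a stream like this back doesn't need anything special. Here's a minimal Python sketch that pulls consecutive top-level objects out of a string with the standard library's `json.JSONDecoder.raw_decode`, then splits the `"type": "R"` object from extra records such as edges. It assumes the `//` annotations above are stripped (they aren't valid JSON), and `iter_stream` and `load_r2` are hypothetical names.

```python
import json


def iter_stream(text: str):
    """Yield each top-level JSON value from a concatenated JSON stream.

    Handles JSONL (one object per line) as well as pretty-printed objects
    placed back to back. Assumes no "//" comments in the input.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def load_r2(text: str):
    """Split a stream into the "R" object and any extra records.

    Hypothetical layout following the example: one object with "type": "R";
    everything else (edges, etc.) is extra data that core could skip.
    """
    header, extras = None, []
    for obj in iter_stream(text):
        if obj.get("type") == "R":
            header = obj
        else:
            extras.append(obj)
    return header, extras


stream = '''
{"version": 2, "type": "R",
 "resources": {"type": "node", "ids": "19-22",
               "children": [{"type": "core", "ids": "0-47"},
                            {"type": "gpu", "ids": "0-7"}]}}
{"type": "edge", "labels": "storage", "from": "node19", "to": "node20"}
'''

header, extras = load_r2(stream)
print(header["resources"]["ids"], [e["type"] for e in extras])
# prints: 19-22 ['edge']
```

This still holds the whole text in memory. For genuinely incremental reads of a huge file, strict JSONL is easier, since each line can go straight to `json.loads`, which is one argument for the one-object-per-line form. Splitting across files would just mean running the same loop over each file in turn.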