Jobs vs. Flows: Packing cheap (e.g. post-processing) tasks at the end of a @job
#1777
Closed · Andrew-S-Rosen started this conversation in General
Replies: 1 comment
I ultimately agreed with the idea that the
This question was raised by @zulissimeta, and I would like to open it up for further discussion (e.g. perhaps @Nekkrad or @samblau have opinions).
There are a few `@job`s in quacc that pack multiple tasks together in a single `@job` if one or more of them are incredibly short. A good example of this is the VASP `double_relax_job`, which runs a relaxation and then one follow-up relaxation for good measure. This follow-up relaxation is often extremely inexpensive; perhaps just a step or two. Another nice example is the Quantum ESPRESSO `bands_job` proposed in #1701, which does a non-self-consistent calculation and then two cheap post-processing steps, all in a single `@job`.

The question I'd like to raise here is: going forward, should we pack cheap tasks into a single `@job`, or should we always defer to a `@flow` pattern?

The main reason for adopting a single `@job` in such scenarios is that, for users where each `@job` is a Slurm job, it reduces the number of jobs in the queue, thereby increasing overall throughput. This can make a big difference for users of Covalent or Jobflow (via FireWorks' `qlaunch` method), where this job scheduling paradigm is common. However, it doesn't make a big difference for users of Dask, Parsl, or Prefect, which all adopt the pilot job model where multiple `@job`s are packed into a single Slurm job that continually pulls in new work. It also doesn't affect people who don't use a workflow engine, or users of Redun, for whom the concept of jobs vs. flows is irrelevant.

On the flip side, the benefit of having each discrete unit of work be a `@job` is that it can make things a bit more intuitive. For instance, pretty much all `@job`s in quacc return a `dict` of some schema; however, `double_relax_job` returns a `{"relax1": RunSchema, "relax2": RunSchema}` dictionary, which is more akin to what a `@flow` typically returns. If for some reason the "short" step fails, it is also easier to rerun just that one component if the steps are separate `@job`s.

I can see both sides of the argument and would be interested in getting people's thoughts on this.
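To make the two patterns concrete, here is a minimal sketch of the packed-`@job` vs. `@flow` structures. This is illustrative only: quacc's real `@job`/`@flow` decorators come from the `quacc` package and dispatch to a workflow engine, so they are replaced here by hypothetical no-op stand-ins, and `relax` is a hypothetical cheap stand-in for an actual relaxation step.

```python
def job(fn):
    """Hypothetical no-op stand-in for quacc's @job decorator."""
    return fn


def flow(fn):
    """Hypothetical no-op stand-in for quacc's @flow decorator."""
    return fn


def relax(atoms, preset=None):
    """Hypothetical relaxation step returning a RunSchema-like dict."""
    return {"atoms": atoms, "preset": preset}


# Option 1: pack both relaxations into a single @job.
# Queue-based engines (e.g. Covalent, Jobflow via FireWorks) submit one
# Slurm job, but the return value looks like what a @flow usually returns.
@job
def double_relax_job(atoms):
    result1 = relax(atoms)
    result2 = relax(result1["atoms"], preset="tight")
    return {"relax1": result1, "relax2": result2}


# Option 2: keep each relaxation a discrete @job and compose them in a @flow.
# More entries in the queue, but each unit returns a single schema and the
# cheap follow-up step can be rerun on its own if it fails.
@job
def relax_job(atoms, preset=None):
    return relax(atoms, preset=preset)


@flow
def double_relax_flow(atoms):
    result1 = relax_job(atoms)
    result2 = relax_job(result1["atoms"], preset="tight")
    return {"relax1": result1, "relax2": result2}
```

With the no-op stand-ins both variants compute the same result; the difference only materializes in how a real workflow engine schedules the decorated units.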