Parallel hyperparameter tuning using GNU Make. A similar effect may be achieved with alternative frameworks such as GNU Parallel, the Python standard library's multiprocessing module, or the PyTorch multiprocessing module, torch.multiprocessing.
The core idea is analogous to the divide-and-conquer paradigm, only more banal, and is called scatter and gather. In this context, it means scattering the large job into smaller, typically independent, pieces that can run in parallel, and gathering the results once they are all finished.
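As an aside, the same scatter-and-gather pattern is easy to express with the multiprocessing module mentioned above. The following is only a minimal sketch: the hyperparameter grid and the train_one stand-in are illustrative assumptions, not code from the actual project.

# scatter_gather.py -- a hedged sketch of scatter and gather with multiprocessing
from itertools import product
from multiprocessing import Pool


def train_one(hparams):
    """Fit and evaluate a model for one hyperparameter set (placeholder)."""
    lr, depth = hparams
    score = -(lr * depth)  # stand-in for a real validation metric
    return {"lr": lr, "depth": depth, "score": score}


if __name__ == "__main__":
    grid = list(product([1e-3, 1e-2, 1e-1], [2, 4, 8]))  # scatter the job...
    with Pool(processes=10) as pool:
        results = pool.map(train_one, grid)              # ...run pieces in parallel...
    best = max(results, key=lambda r: r["score"])        # ...and gather the results
    print(best)

The remainder of this article achieves the same effect with GNU Make instead.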
The key observation here is that once the data is preprocessed, each training task may be run independently, and the results are collated once all of them are finished. In short, the steps are as follows:
- Create a search space for hyperparameters (a sketch follows this list);
- Preprocess: load, sanitise, split, and resave the data;
- Fit the model, evaluate it, and save the trained model, once for each set of hyperparameters;
- Collate the results and deduce the best.
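For the first step, here is a minimal sketch of what generating the search space might look like; the grid values, the run identifiers A, B, ..., and the layout of dist/hparams.json are assumptions for illustration rather than the project's exact code.

# hparams.py -- a hedged sketch of step 1: write the search space to dist/hparams.json
import json
from itertools import product
from pathlib import Path

grid = {
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "max_depth": [2, 4, 8],
}

# Cartesian product of the grid, one dict per hyperparameter set,
# keyed by an identifier (A, B, ...) that the Makefile targets can mirror.
keys = list(grid)
combos = {
    chr(ord("A") + i): dict(zip(keys, values))
    for i, values in enumerate(product(*grid.values()))
}

Path("dist").mkdir(exist_ok=True)
Path("dist/hparams.json").write_text(json.dumps(combos, indent=2))

In the actual setup, a module such as this would sit behind the dist/hparams.json target invoked below.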
make -k dist/hparams.json
make -kj 10 all

Create dist/hparams.json first, and then run the training, evaluation, and collation tasks in parallel, with at most 10 jobs at a time.
In practice, too, the parallelisation resulted in a speed up.
We use the good old GNU Make for this purpose. Make executes an order of processes defined in a Makefile, from which a dependency graph is inferred so that independent processes may be run in parallel. Once the Makefile, and thereby the dependency graph, is defined, invoking make with the appropriate switch automatically runs the independent processes in parallel.
The illustrated example is a bare-bones minimal working example (MWE) that also offers a quick refresher on GNU Make. The actual implementation is slightly more involved, and in the spirit of DRY/DIE.
Consider the following Makefile:
# Makefile
all : dist/.collated

dist/.collated : dist/.trained
	python -m collate
	touch dist/.collated

dist/.trained : dist/trained/A/.trained dist/trained/B/.trained
	touch dist/.trained

dist/trained/A/.trained : dist/preprocessed
	python -m train
	touch dist/trained/A/.trained

dist/trained/B/.trained : dist/preprocessed
	python -m train
	touch dist/trained/B/.trained

# ...and so forth

It consists of a set of relationships of the form:
target : [ dependencies ]
	[ recipe ]

Each target is a filename (or sometimes not, as with phony targets such as all). The recipe is a set of shell commands responsible for creating the target. The dependencies are prerequisites: only after ensuring that they are up to date is the recipe for the target invoked.
In the illustrated Makefile, the first rule says that the target all is satisfied if dist/.collated is. The second says that dist/.collated is satisfied if dist/.trained is and the subsequent recipe runs without error. Its recipe may be understood as invoking the Python CLI module collate and then updating (or creating) the target file explicitly.
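The collate module itself is not shown; the following is a minimal sketch of what such a module might do, under the purely illustrative assumption that each training run leaves a metrics.json, containing a score field, next to its .trained marker.

# collate.py -- a hedged sketch of the collation step; the metrics.json files,
# the "score" field, and dist/summary.json are illustrative assumptions.
import json
from pathlib import Path

# Gather: read the metrics written by every finished training run.
results = {}
for metrics_file in Path("dist/trained").glob("*/metrics.json"):
    results[metrics_file.parent.name] = json.loads(metrics_file.read_text())

# Deduce the best hyperparameter set by its validation score and record it.
best_id = max(results, key=lambda run: results[run]["score"])
Path("dist/summary.json").write_text(
    json.dumps({"best": best_id, "results": results}, indent=2)
)
print(f"best run: {best_id} -> {results[best_id]}")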
Similarly, the third, fourth, and fifth rules define how the target dist/.trained is produced.
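The train module is likewise only referenced. Here is a minimal sketch, assuming, purely for illustration, that each run receives its identifier as a command-line argument, reads its hyperparameters from dist/hparams.json, and writes a metrics.json for the collation step; in the MWE above the recipe would then read something like python -m train A.

# train.py -- a hedged sketch of one training task; the run-id argument,
# the hparams.json layout, and the dummy scoring are illustrative assumptions.
import json
import sys
from pathlib import Path

run_id = sys.argv[1]  # e.g. "A"; matches a key in dist/hparams.json
hparams = json.loads(Path("dist/hparams.json").read_text())[run_id]

# Stand-in for fitting and evaluating a real model with these hyperparameters.
score = -(hparams["learning_rate"] * hparams["max_depth"])

# Leave the metrics where the collation step expects to find them.
out_dir = Path("dist/trained") / run_id
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "metrics.json").write_text(json.dumps({"score": score, **hparams}))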
Once again, the illustrated example is a bare-bones MWE; the actual implementation is slightly more involved, and in the spirit of DRY/DIE.
A target may be invoked from the command line using

make [OPTIONS] [TARGET] [VAR=VAL]

If unspecified, the first target defined in the Makefile is the default.
Commonly used [OPTIONS] include
- -n to dry run;
- -B to always make;
- -k to keep going as far as possible (even after an error);
- -f to specify the Makefile;
- -C to change directory;
- -j to set the number of parallel jobs;
- -i to ignore errors.
Further Reading: GNU Make Manual