Hyperparameter search

Hyperparameter optimization requires multiple training runs with different configurations. In Nix AI, this is achieved by defining multiple configurations for a single training.

This page describes the trainings.<name>.configurations and trainings.<name>.mergeDirectories options. You may also find all options in the Nix AI Search.

Defining multiple training-configurations

Implicit

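We can give one configuration item several values for a parameter by writing that parameter as a list. Nix AI then expands the item into one training configuration per combination of values: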

outputs = { nix-ai, ... }: nix-ai.lib.mkFlake {
  ...
  trainings."myMnistModel" = {
    ...
    configurations = [
      {
        seed = 42;
        epochs = 20;
        learning_rate = [ 0.005 0.001 0.0005 ];
        batch_size = [ 16 32 ];
        dataset = "myMnistDataset";
      }
    ];
    ...

The example generates 6 training configurations, the Cartesian product of the 3 learning_rate values and the 2 batch_size values:

Model    epochs  learning_rate  batch_size
model-0  20      0.005          16
model-1  20      0.005          32
model-2  20      0.001          16
model-3  20      0.001          32
model-4  20      0.0005         16
model-5  20      0.0005         32

Note that the parameters that are not defined as a list (seed, epochs, dataset) stay the same in all configurations.
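The expansion described above can be sketched in plain Nix. The helper expand below is hypothetical and written only for illustration. It is not Nix AI's actual implementation, and the order of its results may differ from the table above.

```nix
let
  # Hypothetical helper for illustration only; not Nix AI's actual implementation.
  # Expands one configuration item into a list of concrete configurations by
  # taking the Cartesian product over all list-valued parameters.
  expand = cfg:
    builtins.foldl'
      (partials: name:
        let
          value = cfg.${name};
          # A non-list value is treated as a single choice, so it stays fixed.
          choices = if builtins.isList value then value else [ value ];
        in
        builtins.concatMap
          (partial: map (choice: partial // { ${name} = choice; }) choices)
          partials)
      [ { } ]
      (builtins.attrNames cfg);

  expanded = expand {
    seed = 42;
    epochs = 20;
    learning_rate = [ 0.005 0.001 0.0005 ];
    batch_size = [ 16 32 ];
    dataset = "myMnistDataset";
  };
in
{
  # 3 learning rates x 2 batch sizes = 6 configurations.
  count = builtins.length expanded;
  configurations = expanded;
}
```

Saved to a file, the expression can be evaluated with nix-instantiate --eval --strict; count evaluates to 6, and every resulting configuration carries the same seed, epochs, and dataset.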

We do not have to rely on the Cartesian product to define multiple trainings. Since trainings.<name>.configurations is a list by definition, we can also define multiple configurations explicitly:

outputs = { nix-ai, ... }: nix-ai.lib.mkFlake {
  ...
  trainings."myMnistModel" = {
    ...
    configurations = [
      {
        seed = 42;
        epochs = 20;
        learning_rate = 0.005;
        batch_size = 16;
        dataset = "myMnistDataset";
      }
      {
        seed = 42;
        epochs = 20;
        learning_rate = 0.001;
        batch_size = 16;
        dataset = "myMnistDataset";
      }
      {
        seed = 42;
        epochs = 20;
        learning_rate = 0.0005;
        batch_size = 16;
        dataset = "myMnistDataset";
      }
    ];
    ...

The example will generate only 3 training configurations:

Model    epochs  learning_rate  batch_size
model-0  20      0.005          16
model-1  20      0.001          16
model-2  20      0.0005         16

Both methods can be combined: the configurations list may contain several items, and each item may in turn use list-valued parameters.
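As an illustration of combining both methods, the following fragment mixes an item with a list-valued learning_rate and a fully explicit item. It is a sketch that assumes each item is expanded independently, as in the implicit example. The parameter values are chosen for illustration only, and the fragment shows just the configurations attribute that would sit inside trainings."myMnistModel".

```nix
{
  configurations = [
    # Implicit: assumed to expand into one configuration per learning_rate.
    {
      seed = 42;
      epochs = 20;
      learning_rate = [ 0.005 0.001 0.0005 ];
      batch_size = 16;
      dataset = "myMnistDataset";
    }
    # Explicit: a single additional configuration with a larger batch size.
    {
      seed = 42;
      epochs = 20;
      learning_rate = 0.001;
      batch_size = 32;
      dataset = "myMnistDataset";
    }
  ];
}
```

Under that assumption, the first item would contribute three configurations and the second one, for four in total, without generating the full grid of learning rates and batch sizes.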

Summarizing training results

By default, each training run creates its own folder with checkpoints, logs, and so on. Usually, however, we want the logs of all trainings in a single folder. We can achieve this with the trainings.<name>.mergeDirectories attribute:

outputs = { nix-ai, ... }: nix-ai.lib.mkFlake {
  ...
  trainings."myMnistModel" = {
    ...
    configurations = [
      # multiple configurations
      ...
    ];

    drop = [ "logs" "checkpoints" ];
    mergeDirectories = [ "logs" ];
    ...

This copies all files and directories within each run's logs directory into one central logs directory of the output. This is purely a convenience, as the individual training result folders are kept as well. Note that every item of mergeDirectories must also be listed in drop; otherwise, these files and directories would not exist in the individual training result folders.
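The requirement that mergeDirectories be a subset of drop can be expressed as a small Nix check. This is a hypothetical sketch for illustration. Whether Nix AI performs such a validation itself is not stated here, and the name missing is an assumption.

```nix
let
  # Values taken from the example above.
  drop = [ "logs" "checkpoints" ];
  mergeDirectories = [ "logs" ];

  # Hypothetical check (not part of Nix AI): collect every entry of
  # mergeDirectories that is not also listed in drop.
  missing = builtins.filter (dir: !(builtins.elem dir drop)) mergeDirectories;
in
# Evaluation aborts here if any merged directory is not listed in drop.
assert missing == [ ];
{ inherit drop mergeDirectories; }
```

With the values shown, missing is empty and the expression evaluates to the attribute set. Adding an entry such as "plots" to mergeDirectories alone would make the assertion fail.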

Merging all logs directories into one lets logging front ends such as TensorBoard read every training run from a single directory.