Define your Model

The most important part of training is the model. Here we declare how we want to train it.

This page describes the `trainings` option. You can also find all options in the Nix AI Search.

Define a Training

We are training a simple MNIST classifier:

# flake.nix
...
outputs = { nix-ai, ... }: nix-ai.lib.mkFlake {
  ...
  trainings."myMnistModel" = {
    GPU = "any";

    copyDatasets = [ "myMnistDataset" ];
    directoryPath = ./models;

    commands = ''
      python main.py
    '';

    drop = [ "checkpoints" ];

    configurations = [
      {
        # JSON key-value pairs for our config.json
        seed = 42;
        ...
      }
    ];
    ...

When training a model, there are several considerations to take into account:

  • Name: We can define many models or training procedures, and each training needs a name. The name is the string that follows trainings. in the attribute path. In our case the name is myMnistModel.

  • GPU: Which GPU do we want to use for our training? We can just set our GPU to any and use any GPU available.

    GPU = "any";

  • copyDatasets: Which dataset will we train on? Here we can reference the datasets created before:

    copyDatasets = [ "myMnistDataset" ];

  • directoryPath: Where does our training code reside? Here we will reference the folder with our Python code. It will be the root directory of our project. The directoryPath should never be ./ and should also never be in quotes:

    directoryPath = ./models;

  • commands: How do we invoke our training script? Here we invoke main.py, which was copied from our models directory:

    commands = ''
    python main.py
    '';

  • drop: Our training script produces several outputs. Here we want to keep the checkpoints folder:

    drop = [ "checkpoints" ];

    Note: In this context, 'drop' does not mean deleting files, but dropping them into the output directory.
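
On the script side, the drop entry only helps if the training code actually writes into a folder with that name. Below is a minimal Python sketch, assuming that drop paths are resolved relative to the directory in which the commands run (this page does not state that explicitly). The helper `save_checkpoint` and the file naming are hypothetical:

```python
import json
from pathlib import Path


def save_checkpoint(state: dict, epoch: int, root: Path = Path("checkpoints")) -> Path:
    """Write one checkpoint file per epoch into the folder listed under drop."""
    root.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    path = root / f"epoch_{epoch:03d}.json"  # hypothetical naming scheme
    path.write_text(json.dumps(state))
    return path
```

A real training script would store model weights with its framework's own save function; JSON is used here only to keep the sketch free of dependencies.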

Configure Training Parameters

You can customize your training by modifying the `configurations` section in your flake.nix:

...
configurations = [{
  seed = 42;
  epochs = 20;
  learning_rate = 0.001;
  batch_size = 16;
  dataset = "myMnistDataset";
}];
...

These config options are defined by your training script, not by Nix AI. When the training starts, a training directory is created in which the datasets and scripts reside next to a file called config.json. This file contains the training parameters of one configuration. We can add multiple parameter sets to the list and run multiple trainings in parallel, which also makes automated hyperparameter optimization possible.
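
To make the idea of several parameter sets concrete, here is a small Python sketch that enumerates a grid of candidate values. It is not part of Nix AI and the helper name `parameter_grid` is hypothetical. It only illustrates which entries a simple sweep would contain; each resulting dictionary corresponds to one entry you would write into `configurations`.

```python
import itertools
import json


def parameter_grid(base: dict, sweep: dict) -> list[dict]:
    """Return one config per combination of the swept values, on top of `base`."""
    keys = list(sweep)
    return [
        {**base, **dict(zip(keys, values))}
        for values in itertools.product(*(sweep[k] for k in keys))
    ]


base = {"seed": 42, "epochs": 20, "dataset": "myMnistDataset"}
sweep = {"learning_rate": [0.001, 0.01], "batch_size": [16, 32]}

configs = parameter_grid(base, sweep)
for config in configs:
    # Each line shows the key-value pairs one parameter set would carry.
    print(json.dumps(config))
```

With two learning rates and two batch sizes, this yields four parameter sets that share the same seed, epoch count, and dataset.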

Test Configuration

Before we run our training on a costly GPU, we may want to do a short test run on the CPU to confirm that our training script actually works. This is what the testConfiguration option is for:

...
testConfiguration = {
  seed = 42;
  epochs = 1; # Only use one epoch for testing purposes
  batch_size = 16;
  learning_rate = 0.001;
  dataset = "myMnistDataset";
  test = true;
};
...
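
How the script reacts to a test run is up to the training code. The sketch below assumes that the keys of testConfiguration arrive in config.json in the same way as those of a regular configuration, so that the user-defined `test` key can be read there. The function `run_settings`, the `max_batches` limit, and the device labels are hypothetical:

```python
def run_settings(config: dict) -> dict:
    """Derive cheap settings for a test run, full settings otherwise."""
    # The `test` key is absent in regular configurations, so default to False.
    is_test = bool(config.get("test", False))
    return {
        "device": "cpu" if is_test else "gpu",  # illustrative labels only
        # Cap the number of batches in a test run so it finishes quickly.
        "max_batches": 10 if is_test else None,
        "epochs": config["epochs"],
    }
```

Keeping this decision in one place means the rest of the training loop does not need to know whether it is running as a test.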

Using the Configuration in our Model

All scripts related to your training must reside in the models folder of your repository. Organize your training script so that it works with the Nix AI configuration. Here is an example of how to start your training with config.json:

# models/main.py
import json

...

# Load config.json, which nix-ai creates from the configurations in the Nix file
with open('config.json', 'r') as f:
    config = json.load(f)

...

train(
    seed=config['seed'],
    epochs=config['epochs'],
    learning_rate=config['learning_rate'],
    dataset=dataset,
    batch_size=config['batch_size'],
)

...
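
Because config.json is generated from the Nix file, a typo in a key name only surfaces once the script runs. Here is a small sketch of a loader that fails early with a readable message. `load_config` and `REQUIRED_KEYS` are hypothetical and simply mirror the keys used in the example configuration on this page:

```python
import json
from pathlib import Path

# Keys the example configuration defines; adjust to your own script.
REQUIRED_KEYS = ("seed", "epochs", "learning_rate", "batch_size", "dataset")


def load_config(path: str = "config.json") -> dict:
    """Load the generated config file and check that all expected keys exist."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"{path} is missing keys: {', '.join(missing)}")
    return config
```

In main.py, `config = load_config()` could then replace the `open` and `json.load` lines.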