Define your Model
The most important part of training is the model. Here we declare how we want to train it.
This page describes the `trainings` option. All options are also listed in the Nix AI Search.
Define a Training
We are training a simple MNIST classifier:
# flake.nix
...
outputs = { nix-ai, ... }: nix-ai.lib.mkFlake {
  ...
  trainings."myMnistModel" = {
    GPU = "any";
    copyDatasets = [ "myMnistDataset" ];
    directoryPath = ./models;
    commands = ''
      python main.py
    '';
    drop = [ "checkpoints" ];
    configurations = [
      {
        # JSON key-value pairs for our config.json
        seed = 42;
        ...
      }
    ];
    ...
When training a model there are several considerations to take into account:
- Name: We can define many models or training procedures, and each training needs a name. We set the name with the string after `trainings.`, as in trainings.[Name]. In our case the name is `myMnistModel`.
- GPU: Which GPU do we want to use for our training? Setting the GPU to "any" uses any available GPU: `GPU = "any";`
- copyDatasets: Which dataset will we train on? Here we reference the datasets created before: `copyDatasets = [ "myMnistDataset" ];`
- directoryPath: Where does our training code reside? Here we reference the folder with our Python code. It will be the root directory of our project. The `directoryPath` should never be `./` and should never be in quotes: `directoryPath = ./models;`
- commands: How do we invoke our training script? Here we invoke `main.py`, which was copied from our `models` directory:
      commands = ''
        python main.py
      '';
- drop: Our training script produces several outputs. Here we want to keep the `checkpoints` folder: `drop = [ "checkpoints" ];`
  Note: In this instance, "drop" does not mean deleting files, but dropping them into the output directory.
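To make the `drop` option concrete, here is a minimal sketch of how a training script might write its checkpoints into a `checkpoints` folder. The helper `save_checkpoint` and the file naming are hypothetical and not part of Nix AI. The sketch also assumes that the entries in `drop` are resolved relative to the directory in which the training command runs, which this page does not state explicitly.

```python
import json
from pathlib import Path


def save_checkpoint(state: dict, epoch: int, out_dir: str = "checkpoints") -> Path:
    """Write one checkpoint file into out_dir and return its path."""
    # Assumption: `drop = [ "checkpoints" ]` refers to a folder relative to
    # the directory the training command is started in.
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"epoch_{epoch:03d}.json"
    # JSON keeps this sketch dependency-free; a real script would more likely
    # use the serializer of its ML framework.
    path.write_text(json.dumps(state))
    return path
```

Calling `save_checkpoint({"loss": 0.12}, epoch=1)` inside the training loop would then create `checkpoints/epoch_001.json`.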
Configure Training Parameters
You can customize your training by modifying the `configurations` section in your flake.nix:
...
configurations = [{
  seed = 42;
  epochs = 20;
  learning_rate = 0.001;
  batch_size = 16;
  dataset = "myMnistDataset";
}];
...
These are config options defined by your training. When a training starts, a training directory is created in which the datasets and scripts reside next to a file called config.json. This file contains the training parameters. We can add multiple parameter sets to `configurations` and run the resulting trainings in parallel, which also allows automated hyperparameter optimization.
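As an illustration of running several parameter sets, the following sketch expands a small hyperparameter grid into one parameter set per combination. Each resulting dictionary would correspond to one attribute set in the `configurations` list. The helper `expand_grid` and the chosen grid values are hypothetical, and transferring the result into flake.nix is left to you.

```python
import itertools
import json


def expand_grid(base: dict, grid: dict) -> list:
    """Return one parameter set per combination of the values in grid."""
    keys = list(grid)
    combos = itertools.product(*(grid[key] for key in keys))
    # Each combination overrides or extends the shared base parameters.
    return [{**base, **dict(zip(keys, combo))} for combo in combos]


# Shared parameters, taken from the example configuration on this page.
base = {"seed": 42, "epochs": 20, "dataset": "myMnistDataset"}
# Hypothetical search space: two learning rates times two batch sizes.
grid = {"learning_rate": [0.001, 0.01], "batch_size": [16, 32]}

parameter_sets = expand_grid(base, grid)
print(json.dumps(parameter_sets, indent=2))
```

With two values for each of the two hyperparameters, this yields four parameter sets, that is, four trainings.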
Test Configuration
Before we run our training on a costly GPU, we may want to do a short test run on the CPU to confirm that our training script actually works. This is what the testConfiguration option is for:
...
testConfiguration = {
  seed = 42;
  epochs = 1; # Only use one epoch for testing purposes
  batch_size = 16;
  learning_rate = 0.001;
  dataset = "myMnistDataset";
  test = true;
};
...
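How the training script reacts to the `test` key is up to the script itself. The sketch below shows one possible approach, assuming that `test` ends up in config.json like the other keys. The function `run_settings`, the device names, and the batch limit are hypothetical choices for illustration, not Nix AI behavior.

```python
def run_settings(config: dict) -> dict:
    """Derive runtime settings from a parsed config.json."""
    # Assumption: the key is simply absent in regular configurations.
    is_test = bool(config.get("test", False))
    return {
        # Test runs stay on the CPU; "cuda" is a placeholder device name.
        "device": "cpu" if is_test else "cuda",
        # Hypothetical limit so that a test run finishes quickly.
        "max_batches": 10 if is_test else None,
        "epochs": config["epochs"],
    }
```

A script could call `run_settings(config)` once after loading config.json and pass the result on to its training loop.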
Using the Configuration in our Model
All scripts related to your training must reside in the folder models in your repository. Your training script should be organized to work with the Nix AI configuration. Here is an example of how to start your training with the config.json:
# models/main.py
import json
...
# Load the config.json file, which nix-ai creates from the configurations in the Nix file
with open('config.json', 'r') as f:
    config = json.load(f)
...
train(
    seed=config['seed'],
    epochs=config['epochs'],
    learning_rate=config['learning_rate'],
    dataset=dataset,  # assumed to be defined in the elided code above
    batch_size=config['batch_size'],
)
...
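Because the keys in config.json come from whatever is written in flake.nix, it can help to check them before the training starts. The following is a minimal sketch of such a check. The helper `load_config` and the list of required keys are assumptions based on the example configuration on this page, not part of Nix AI.

```python
import json
from pathlib import Path

# Keys the example training on this page expects; adjust to your own script.
REQUIRED_KEYS = ("seed", "epochs", "learning_rate", "batch_size", "dataset")


def load_config(path: str = "config.json") -> dict:
    """Load the config file and fail early if a required key is missing."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {', '.join(missing)}")
    return config
```

Failing early this way surfaces a misspelled option during the inexpensive test run instead of partway through a GPU training.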