Training in AI4OS
This page serves as a guide to the different options for training a model on the AI4OS platform.
Training options
There are currently three main options to train a model in the AI4OS platform:
standard mode: you get access to a persistent deployment that you can interact with through an IDE (e.g. VS Code).
batch mode: you deploy a temporary job that runs your training and is killed once the training completes.
federated mode: you deploy a federated learning server that orchestrates the training. Several clients can then join and share the training load among themselves.
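To make the server/client split in federated mode more concrete, here is a minimal sketch of one common aggregation scheme, federated averaging. It is an illustration only, not the platform's implementation: the function names (`local_update`, `federated_average`, `run_rounds`), the toy linear model, and the sample-count weighting are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Weights = List[float]            # [w, b] for the toy model y = w * x + b
Sample = Tuple[float, float]     # (x, y)


def local_update(weights: Weights, data: Sequence[Sample], lr: float = 0.1) -> Weights:
    """Hypothetical client step: one pass of SGD over the client's private data."""
    w, b = weights
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Hypothetical server step: average client weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(wts[i] * n for wts, n in updates) / total for i in range(dim)]


def run_rounds(client_data: List[Sequence[Sample]], rounds: int = 200) -> Weights:
    """The server sends the global weights out, clients train locally, the server aggregates."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        # Only weights travel between server and clients; raw data stays on each client.
        updates = [(local_update(global_weights, d), len(d)) for d in client_data]
        global_weights = federated_average(updates)
    return global_weights


if __name__ == "__main__":
    # Two clients, each holding private samples of y = 2x + 1.
    clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
    print(run_rounds(clients))
```

In this sketch the server never sees the raw samples, which is why this style of training suits sensitive data.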
Each option has its own pros and cons.
| Option | ✅ Pros | ❌ Cons |
|---|---|---|
| Standard mode (persistent deployment) | | |
| Batch mode (temporary jobs) | | |
Given these trade-offs, we recommend the following typical workflows:
Use standard mode for your preliminary training runs, when you might still need direct access to the code and data to debug things.
Use batch mode when your training script is stable and you are mostly tweaking hyperparameters.
Use federated mode if you have sensitive data and/or need to distribute your training across many machines.
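As an illustration of what a "stable" training script might look like before moving to batch mode, the sketch below exposes hyperparameters as command-line flags so that each job can vary them without touching the code. The flag names, the `train` placeholder, and the JSON summary are assumptions made for this example, not something the platform requires.

```python
import argparse
import json
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse hyperparameters from the command line (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs")
    parser.add_argument("--batch-size", type=int, default=32, help="mini-batch size")
    return parser.parse_args(argv)


def train(lr: float, epochs: int, batch_size: int) -> dict:
    """Placeholder for the real training loop; returns a summary of the run."""
    return {"lr": lr, "epochs": epochs, "batch_size": batch_size}


def main(argv: Optional[List[str]] = None) -> dict:
    args = parse_args(argv)
    summary = train(args.lr, args.epochs, args.batch_size)
    # Print a machine-readable summary so it shows up in the job's logs.
    print(json.dumps(summary))
    return summary


if __name__ == "__main__":
    main()
```

With a script shaped like this, each temporary job can run the same code with different values, for example `--lr 0.01 --epochs 3`, and no interactive access is needed.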