Experiment - Launch training
After initializing an Experiment within a Project, it is time to launch the training of the ModelVersion we aim to create within this Experiment.
Depending on how you have integrated the Picsellia platform into your infrastructure, there are three ways to launch the training of a ModelVersion through an Experiment:
- Launch the training script on the Picsellia training engine
- Launch the training script on your own training resources from the Picsellia platform
- Launch the training script manually on your own training resources
Each option has its own requirements that you need to fulfill to ensure the ModelVersion training succeeds.
1. Launch the training on the Picsellia training engine
You can use the Picsellia training engine, which is natively integrated with the platform, to execute the training script directly from your Experiment.
The Picsellia training engine is hosted by OVH and composed of NVIDIA Tesla V100s GPUs. The training engine is operated by the Picsellia team.
Consumption
Executing your training script on the Picsellia training engine consumes your MPU quota. Before launching a training script on the Picsellia training engine, make sure you have enough MPU quota left to cover the training time. As a reminder, MPU represents the use of Picsellia computing resources (GPU) for training, serving, or preprocessing. MPU consumption can be tracked through the Plan and usage dashboard.
You can launch the execution of the training script for your current Experiment by clicking Launch Experiment and then Picsellia Infra.
If you have the necessary permissions and everything has been properly set up, the execution of the training script on the Picsellia training engine will then start.
This execution can be followed in real-time through the Telemetry tab, as long as information is logged by the script on Picsellia.
The training script that is executed is the one contained in the Docker image of the Base Architecture selected for the current Experiment. If the script has been written following the guidelines provided in this guide, it will retrieve the DatasetVersion and training parameters to initialize the training step of your model. During the training, in addition to the real-time execution information shown in the Telemetry tab, any Callback defined in the training script will initialize and fill in the metrics in the Logs tab. At the end of the training, the Evaluation can also be computed and logged by the training script, and the output files stored as Artifacts on the Picsellia Experiment.
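The sequence described above (retrieve the training context, train while logging metrics, then record an evaluation and store artifacts) can be sketched as follows. This is only an illustration of the script structure: `ExperimentStandIn`, its fields, and `run_training` are hypothetical names invented for this example and are not the Picsellia SDK. A real script would replace them with the SDK calls described in the guide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExperimentStandIn:
    """Hypothetical in-memory stand-in for an Experiment (NOT the Picsellia SDK)."""
    parameters: Dict[str, Any]
    dataset_version: str
    metrics: Dict[str, List[float]] = field(default_factory=dict)
    evaluations: List[Dict[str, Any]] = field(default_factory=list)
    artifacts: Dict[str, str] = field(default_factory=dict)

    def log_metric(self, name: str, value: float) -> None:
        # Append one point to a named metric series.
        self.metrics.setdefault(name, []).append(value)


def run_training(experiment: ExperimentStandIn) -> None:
    # 1. Retrieve the training context attached to the Experiment.
    epochs = max(1, int(experiment.parameters.get("epochs", 1)))
    dataset = experiment.dataset_version

    # 2. Train, logging a metric after each epoch.
    loss = 1.0
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        experiment.log_metric("loss", loss)

    # 3. Record an evaluation and register the output files as artifacts.
    experiment.evaluations.append({"dataset": dataset, "final_loss": loss})
    experiment.artifacts["model-latest"] = "weights/model.bin"  # illustrative path


exp = ExperimentStandIn(parameters={"epochs": 3}, dataset_version="train-v1")
run_training(exp)
```

Keeping these three stages explicit in the script makes it easier to check that each tab of the Experiment (Logs, Evaluation, Artifacts) receives what it expects.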
Once again, all of these steps are defined in the training script, so they are the responsibility of the script author. This is why, when you first use the platform, we advise you to use a ModelVersion from the Public Registry as Base Architecture: the training scripts contained in the attached Docker image have been written by the Picsellia team according to the main guidelines defined in the guide.
2. Launch the training script on your own training resources from the Picsellia platform
If you want to use your own computing resources to train your ModelVersion, you can use the Docker command generated for you and run it on your training infrastructure. The data and training parameters will be pulled from your Picsellia Experiment, and the outcome of the training (Metrics, Artifacts, Evaluation, etc.) will be logged to your Picsellia Experiment by your training script.
You can launch the execution of the training script on your infrastructure for your current Experiment by clicking Launch Experiment and then Docker.
If you have the necessary permissions, a Docker command will then be displayed in a modal. Copy this command and paste it into your training environment on your own infrastructure. If everything has been properly set up on your side, the execution of the training script on your infrastructure will then start.
This execution can be followed in real-time through the Telemetry tab, as long as information is logged by the script on Picsellia.
The training script that is executed is the one contained in the Docker image of the Base Architecture selected for the current Experiment. If the script has been written following the guidelines provided in this guide, it will retrieve the DatasetVersion and training parameters to initialize the training step of your ModelVersion. During the training, in addition to the real-time execution information shown in the Telemetry tab, any Callback defined in the training script will initialize and fill in the Metrics in the Logs tab. At the end of the training, the Evaluation can also be computed and logged by the training script, and the output files stored as Artifacts on the Picsellia Experiment.
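As an illustration of the Callback mentioned above, here is a minimal, framework-agnostic sketch of a callback that forwards per-epoch metrics to a logging function. `MetricLoggingCallback` and `log_fn` are hypothetical names for this example. In a real training script, `log_fn` would wrap whatever call logs a value to the Experiment, and the callback would follow the interface of your training framework.

```python
from typing import Callable, Dict, List, Tuple


class MetricLoggingCallback:
    """Sketch of a callback that forwards per-epoch metrics to a logging function."""

    def __init__(self, log_fn: Callable[[str, float], None]) -> None:
        # log_fn is a hypothetical hook; it stands in for the call that
        # logs a metric value to the Experiment.
        self.log_fn = log_fn

    def on_epoch_end(self, epoch: int, logs: Dict[str, float]) -> None:
        # Forward each metric in a stable (alphabetical) order.
        for name, value in sorted(logs.items()):
            self.log_fn(name, float(value))


# Usage: collect the forwarded values in a list instead of a real Experiment.
received: List[Tuple[str, float]] = []
callback = MetricLoggingCallback(lambda name, value: received.append((name, value)))
callback.on_epoch_end(0, {"loss": 0.9, "accuracy": 0.4})
callback.on_epoch_end(1, {"loss": 0.5, "accuracy": 0.7})
```

Injecting the logging function keeps the callback testable without any connection to the platform.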
Once again, all of these steps are defined in the training script, so they are the responsibility of the script author. This is why, when you first use the platform, we advise you to use a ModelVersion from the Public Registry as Base Architecture: the training scripts contained in the attached Docker image have been written by the Picsellia team according to the main guidelines defined in the guide.
This way of launching the training of your ModelVersion lets you use your own computing resources while still leveraging the Picsellia platform to structure and orchestrate your Computer Vision projects.
However, this method requires that the ModelVersion used as Base Architecture has an attached Docker image containing the training script. This Docker image must also be stored either in a public registry on Docker Hub or in a private container registry provided by the Picsellia team. If you do not want to Dockerize your code, you can launch the script manually on your own infrastructure, as explained below.
3. Launch the training script manually on your own training resources
If your code is not Dockerized, or if you do not want to publish your Docker image to a container registry, you can also launch the training on your infrastructure yourself.
In this case, you need to make sure that your script has been adapted to Picsellia (as detailed in this guide) so that the DatasetVersion and training parameters attached to the Experiment are pulled properly onto the computing resource that executes the script, and so that the results of the training (Metrics, Evaluation, and Artifacts) are stored in your Picsellia Experiment. This way, you ensure the traceability and structure of your Project directly within your Picsellia Organization.
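When launching manually, the script has to be told which Experiment to attach to. One possible approach, sketched below, is to read this from environment variables and fail early if anything is missing. The variable names `PICSELLIA_API_TOKEN` and `PICSELLIA_EXPERIMENT_ID` and the function `load_run_config` are assumptions made for this example, not names defined by Picsellia.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names chosen for this sketch.
REQUIRED_VARS = ("PICSELLIA_API_TOKEN", "PICSELLIA_EXPERIMENT_ID")


def load_run_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Checking the configuration before training starts avoids spending compute time on a run whose results cannot be logged back to the Experiment.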