Configuration reference

This page covers the additional information you may need to create and use your own Processings.


A Processing is created through a form. Let's detail all the available fields.

  • Name: This is the name that will be displayed and that you can search for in order to select a Processing. The Name has to be unique across your organization.
  • Description: Here you can provide a short description of what the Processing actually does or what it needs to run.
  • Task: You have to select a task category for your Processing. Each task is detailed in its dedicated section below.
  • Default parameters: Here you can add parameters with default values. You will be able to edit them before launching a Processing and to retrieve them in your code.
  • Default execution: These are the resources that we will provide to execute your code. Set them to the values you need.
  • Docker Image: These are the details that allow us to find and launch the correct Docker Image that embeds your code. This Docker Image can be stored either in the public Docker Hub or in a Picsellia Private Registry.
    • Name: This is the full name of your image. It contains your Docker Username, the actual Image name and an optional Registry URL. It has to be written this way: REGISTRY_URL/DOCKER_USERNAME/IMAGE_NAME
    • Tag: This is the tag we are going to use for the chosen image. Because a single image can have multiple tags, remember to set it to the most recent or correct one. We suggest typical names such as 1.0 or latest.
    • Flags: This is an optional field. It contains the flags that we are going to add when running the docker run ... command. For example, if you need GPU support, you can add the flag --gpus=all, or if you need an environment variable to be set, you can add the flag -e env_var=value
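
To make the Name, Tag and Flags fields more concrete, here is a minimal sketch of how they could combine into a docker run command. The helper names (build_image_reference, build_docker_run_command) and the sample values are hypothetical. They only illustrate the format described above and are not necessarily how Picsellia assembles the command internally.

```python
import shlex


def build_image_reference(image_name, username, registry_url=None, tag="latest"):
    """Return REGISTRY_URL/DOCKER_USERNAME/IMAGE_NAME:TAG (the registry is optional)."""
    parts = [part for part in (registry_url, username, image_name) if part]
    return "/".join(parts) + ":" + tag


def build_docker_run_command(image_reference, flags=""):
    """Return the argument list for `docker run <flags> <image>`."""
    return ["docker", "run", *shlex.split(flags), image_reference]


# Hypothetical values, for illustration only.
reference = build_image_reference("my-processing", "my-user", tag="1.0")
command = build_docker_run_command(reference, "--gpus=all -e env_var=value")
print(" ".join(command))
```

Building the command as a list rather than a single string keeps each flag as a separate argument, which avoids quoting problems if a flag value contains spaces.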

That's all for the basic configuration of your Processing. The form changes according to the task you choose, so let's explore the available options.


Pre Annotation

This is the task to choose if you want to pre-annotate your images with a Model Version coming from your Model Registry or our Model HUB.

If your Processing is of type pre annotation, you will be able to choose a Model Version when launching your Processing.

In this example, we selected the yolov8-preannotation Processing (which is of type pre annotation) and a Yolov8 model from the HUB to pre-annotate with.

You can also edit or add parameters for the Processing right before launch.


If you choose pre annotation, a new (optional) input appears that allows you to select a Model Version Type Constraint.

This is useful if you want to make sure that, when launching the Processing, you are only offered Model Versions of the right type.

You can also leave it empty and choose the model you want regardless of its type.

Data Augmentation

This is the type to choose if you want to perform Data Augmentation on a Dataset Version and add the new images (with or without the originals) to another Dataset Version, which is created automatically when you launch the Processing.

Because an empty new Dataset Version is created automatically, you will be prompted with a text input when you select a Processing of this type to launch.


If you choose data augmentation, a new (optional) input appears that allows you to select a Target Dataset Version Type Constraint.

This is useful if you want the new Dataset Version to be configured with the right type as soon as it is created.

You can also leave it empty, in which case the new Dataset Version will be of type NOT_CONFIGURED.

Dataset Version Creation

This is a broader type that automatically creates a new Dataset Version before launching the Processing. It has the exact same effect and options as the data augmentation type.


In fact, the data augmentation type is just a clearer and more specific name than dataset version creation, but they are basically the same thing.

Auto Tagging

This one is the same as pre annotation, so you can refer to the Pre Annotation specification above for more information.

Auto Annotation

This one is the same as pre annotation, so you can refer to the Pre Annotation specification above for more information.
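
The task types described above can be summarized in a small lookup table. This is only an illustrative sketch: the string keys and the LAUNCH_EXTRAS name are placeholders chosen for this example, not official SDK identifiers.

```python
# Extra element involved at launch for each task type, as described on this page.
# Keys and values are illustrative placeholders, not SDK constants.
LAUNCH_EXTRAS = {
    "pre annotation": "model version",
    "auto tagging": "model version",
    "auto annotation": "model version",
    "data augmentation": "new dataset version",
    "dataset version creation": "new dataset version",
}


def launch_extra(task):
    """Return what a Processing of this task type involves at launch."""
    return LAUNCH_EXTRAS[task.strip().lower()]


print(launch_extra("Data Augmentation"))
```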

Context within Jobs

When you launch a Processing, it automatically creates and schedules a Job that you can then find in the Jobs page on Picsellia.

The Job created contains all the information about your Data Processing:

  • The input Dataset Version
  • The optional output Dataset Version
  • The optional Model Version that you select

For your code to work properly, you need to retrieve this information from your Job (using our Python SDK).

You can do it this way:

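from picsellia import Client
import os

# These variables are available in the execution environment of the Job
api_token = os.environ["api_token"]
organization_id = os.environ["organization_id"]
job_id = os.environ["job_id"]

# Authenticate against Picsellia
client = Client(
    api_token=api_token,
    organization_id=organization_id,
)

# Retrieve the Job created for this Processing
job = client.get_job_by_id(job_id)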


The api_token and organization_id environment variables are added automatically to the execution environment, so you can fetch them directly using the os library, just as above.

Your code should always start with those lines, so that you can retrieve information from your Job and communicate with Picsellia through the client.
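
If you want a clearer error when one of these variables is missing (for instance when testing the image outside of a Job), a small guard can help. This is a minimal sketch: read_required_env is a hypothetical helper, and the placeholder values are only set so that the example runs on its own.

```python
import os


def read_required_env(names):
    """Return the requested environment variables, failing early if any is missing."""
    missing = [name for name in names if name not in os.environ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}


# Placeholder values so the sketch runs outside a Picsellia Job;
# inside a Job these variables are expected to be set already.
os.environ.setdefault("api_token", "dummy-token")
os.environ.setdefault("organization_id", "dummy-organization")
os.environ.setdefault("job_id", "dummy-job")

settings = read_required_env(["api_token", "organization_id", "job_id"])
```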

Context access

Now that you have retrieved your Job using its id, it's time to retrieve what we call its context, which holds the essential information about your Data Processing.

context = job.sync()["dataset_version_processing_job"]

input_dataset_version_id = context["input_dataset_version_id"]
output_dataset_version_id = context["output_dataset_version_id"]
model_version_id = context["model_version_id"]
parameters = context["parameters"]


Remember that, except for input_dataset_version_id, all the context variables are optional and depend on the type of Processing that you launched, so they might not exist in the context.
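
Since some keys may be absent, it can be safer to read them with dict.get instead of square brackets. The sketch below assumes the context behaves like a regular Python dictionary. parse_context is a hypothetical helper, and the sample context is hand-written for illustration.

```python
def parse_context(context):
    """Extract the ids and parameters, using None or {} when an optional key is absent."""
    return {
        # The input Dataset Version is always expected to be present.
        "input_dataset_version_id": context["input_dataset_version_id"],
        "output_dataset_version_id": context.get("output_dataset_version_id"),
        "model_version_id": context.get("model_version_id"),
        "parameters": context.get("parameters") or {},
    }


# Hand-written sample context (values are made up).
sample_context = {
    "input_dataset_version_id": "input-id",
    "parameters": {"batch_size": 4},
}
parsed = parse_context(sample_context)

# Fall back to a default of 8 when the parameter was not provided.
batch_size = int(parsed["parameters"].get("batch_size", 8))
```

Your code can then check for None before using the output Dataset Version or the Model Version.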

You can now retrieve the objects needed with the SDK using their id, for example:

input_dataset_version = client.get_dataset_version_by_id(input_dataset_version_id)
output_dataset_version = client.get_dataset_version_by_id(output_dataset_version_id)
model_version = client.get_model_version_by_id(model_version_id)

You can then interact with them the way you usually do!
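
Because the output Dataset Version and the Model Version are optional, you may prefer to fetch only the objects whose id is present. The following is a sketch under assumptions: load_processing_objects is a hypothetical helper, it assumes the client exposes get_dataset_version_by_id and get_model_version_by_id, and FakeClient is a stand-in used only so the example runs without a Picsellia connection.

```python
def load_processing_objects(client, context):
    """Fetch only the objects whose id is present in the context."""
    objects = {
        "input_dataset_version": client.get_dataset_version_by_id(
            context["input_dataset_version_id"]
        )
    }
    output_id = context.get("output_dataset_version_id")
    if output_id:
        objects["output_dataset_version"] = client.get_dataset_version_by_id(output_id)
    model_id = context.get("model_version_id")
    if model_id:
        objects["model_version"] = client.get_model_version_by_id(model_id)
    return objects


class FakeClient:
    """Stand-in for the Picsellia client so the sketch runs offline."""

    def get_dataset_version_by_id(self, id):
        return ("dataset_version", id)

    def get_model_version_by_id(self, id):
        return ("model_version", id)


objects = load_processing_objects(FakeClient(), {"input_dataset_version_id": "abc"})
```

In a real Processing, you would pass the client and context created earlier instead of FakeClient and the hand-written dictionary.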