Configuration reference
This page covers the additional information you may need to create and use your own Processings.
Initialization
Here is the form that allows you to create a Processing. Let's detail all the available fields.
- Name: This is the name that will be displayed and that you can search for in order to select a Processing. The Name has to be unique across your organization.
- Description: Here you can provide a short description of what the Processing actually does or what it needs to run.
- Task: You have to select a task category for your Processing. The categories are detailed in the dedicated sections below.
- Default parameters: Here you can add parameters with default values. You will be able to edit them before launching a Processing and retrieve them in your code.
- Default execution: These are the resources that we will provide for your code execution. Set them to the values you need.
- Docker Image: These are the details that allow us to find and launch the correct Docker Image that embeds your code. This Docker Image can be stored either in the public Docker Hub or in a Picsellia Private Registry.
  - Name: This is the full name of your image. It contains your Docker Username, the actual Image name and an optional Registry URL. It has to be written this way: REGISTRY_URL/DOCKER_USERNAME/IMAGE_NAME
  - Tag: This is the Image tag we are going to use for the chosen image. Because one image can have multiple tags, remember to set it to the most recent or correct one. We suggest typical names such as 1.0 or latest.
  - Flags: This is an optional field. It contains the flags that we are going to add when running the docker run ... command. For example, if you need GPU support, you can add the flag --gpus=all, or if you need an environment variable to be set, you can add the flag -e env_var=value
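To make the relationship between the Name, Tag and Flags fields more concrete, here is a minimal sketch of how they could combine into a docker run invocation. The helper docker_run_command is hypothetical and only illustrates the idea; how Picsellia actually assembles the command is not described on this page.

```python
import shlex


def docker_run_command(name: str, tag: str, flags: str = "") -> list[str]:
    """Assemble an illustrative `docker run` argument list from the form fields.

    Hypothetical helper: it mirrors the Name, Tag and Flags fields of the form,
    not Picsellia's internal implementation.
    """
    # shlex.split tokenises the flags string the way a POSIX shell would.
    return ["docker", "run", *shlex.split(flags), f"{name}:{tag}"]


command = docker_run_command(
    name="REGISTRY_URL/DOCKER_USERNAME/IMAGE_NAME",
    tag="1.0",
    flags="--gpus=all -e env_var=value",
)
print(shlex.join(command))
# prints: docker run --gpus=all -e env_var=value REGISTRY_URL/DOCKER_USERNAME/IMAGE_NAME:1.0
```

The flags are placed before the image reference because docker run treats everything after the image as the command to execute inside the container.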
That's all for the basic configuration of your Processing. But the form is going to change according to the task you choose. Let's explore the multiple options that you have.

Pre-annotation
This is the task you have to choose if you want to pre-annotate your images with a Model Version coming from your Model Registry or our Model HUB.
If your Processing is of type pre annotation, you will be able to choose a Model Version when launching your Processing.



Here, we selected the yolov8-preannotation Processing (which is of type pre annotation) and a Yolov8 model from the HUB to pre-annotate with.
As you can see, we can edit or add the parameters from the Processing right before launch.
Configuration

If you choose pre annotation, a new (optional) input appears that allows you to select a Model Version Type Constraint.
This is useful if you want to make sure that, when you launch this Processing, you are only offered Model Versions of the right type.
You can also leave it empty and choose the model you want regardless of its type.
Data Augmentation
This is the type you have to choose if you want to perform Data Augmentation on a Dataset Version and add the new images (with or without the originals) to another Dataset Version, which is created automatically when you launch the Processing.
Because it automatically creates a new, empty Dataset Version, you will be prompted with a text input when you select a Processing of this type to launch.

Configuration

If you choose data augmentation, a new (optional) input appears that allows you to select a Target Dataset Version Type Constraint.
This is useful if you want the new Dataset Version to be configured with the right type as soon as it is created.
You can also leave it empty, in which case the new Dataset Version will be of type NOT_CONFIGURED.
Dataset Version Creation
This is a broader type that automatically creates a new Dataset Version before launching the Processing. It has the exact same effect and options as the data augmentation type.
In fact, the data augmentation type is just a clearer and more specific name than dataset version creation, but they are basically the same thing.
Auto Tagging
This one is the same as pre annotation, so you can refer to the Pre-annotation section above for more information.
Auto Annotation
This one is the same as pre annotation, so you can refer to the Pre-annotation section above for more information.
Context within Jobs
When you launch a Processing, it automatically creates and schedules a Job that you can then find in the Jobs page on Picsellia.
The Job created contains all the information about your Data Processing:
- The input Dataset Version
- The optional output Dataset Version
- The optional Model Version that you select
For your code to work properly, you will need to retrieve this information from your Job (using our Python SDK).
You can do it this way:
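from picsellia import Client
import os

# Both values are read from the Job's execution environment.
api_token = os.environ["api_token"]
job_id = os.environ["job_id"]

client = Client(
    api_token=api_token,
)

# Retrieve the Job that was created when the Processing was launched.
job = client.get_job_by_id(job_id)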
The api_token and organization_id environment variables are added automatically to the execution environment. You can read them with the os library, in the same way the snippet above reads api_token and job_id.
Your code should always start with those lines so that you can retrieve information from your Job and communicate with Picsellia through the client.
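Because these lines depend on environment variables that only exist inside a Job, it can help to fail with an explicit message when one of them is missing, for example when running the script locally. The following is a minimal sketch; require_env is a hypothetical helper and not part of the Picsellia SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper, not part of the Picsellia SDK.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "It is expected to be provided by the Job's execution environment."
        )
    return value


# Intended usage at the top of your Processing script:
#   api_token = require_env("api_token")
#   job_id = require_env("job_id")
```

Compared with os.environ["api_token"], which raises a bare KeyError, this makes it clearer why a script fails when it is started outside a Job.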
Context access
Now that you have retrieved your Job using its id, it's time to retrieve what we call its context, which holds the essential information about your Data Processing:
context = job.sync()["dataset_version_processing_job"]
input_dataset_version_id = context["input_dataset_version_id"]
output_dataset_version_id = context["output_dataset_version_id"]
model_version_id = context["model_version_id"]
parameters = context["parameters"]
Remember that, except for input_dataset_version_id, all the other context variables are optional depending on the type of Processing that you launched, so they might not exist in the context.
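Since the optional entries may be absent, direct indexing can raise a KeyError. Below is a minimal sketch of a more defensive read using dict.get. The context dictionary and its id value are placeholders made up for the example, and the fallback of 1 for batch_size is an arbitrary choice.

```python
from typing import Optional

# Hypothetical context for a Processing launched without an output
# Dataset Version or a Model Version; the id is a placeholder.
context = {
    "input_dataset_version_id": "input-id-placeholder",
    "parameters": {"batch_size": 4},
}

# The input Dataset Version is always present, so a missing key should fail loudly.
input_dataset_version_id: str = context["input_dataset_version_id"]

# Optional entries: dict.get returns None instead of raising KeyError.
output_dataset_version_id: Optional[str] = context.get("output_dataset_version_id")
model_version_id: Optional[str] = context.get("model_version_id")

# Fall back to an empty dict so that parameter lookups stay simple.
parameters: dict = context.get("parameters") or {}
batch_size = int(parameters.get("batch_size", 1))

print(output_dataset_version_id, model_version_id, batch_size)
# prints: None None 4
```

This pattern behaves the same whether an optional key is missing entirely or present with a None value.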
You can now retrieve the objects needed with the SDK using their id, for example:
input_dataset_version = client.get_dataset_version_by_id(input_dataset_version_id)
output_dataset_version = client.get_dataset_version_by_id(output_dataset_version_id)
model_version = client.get_model_version_by_id(model_version_id)
print(parameters)
>>>
{
"batch_size": 4,
...
}
And interact with them the way you usually do it!
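When the output Dataset Version or the Model Version is optional for your Processing type, you may want to fetch only the objects whose id is actually present. Here is a minimal sketch; load_job_objects is a hypothetical helper, and it assumes the client exposes get_dataset_version_by_id and get_model_version_by_id.

```python
from typing import Any, Optional


def load_job_objects(client: Any, context: dict) -> dict:
    """Fetch the objects referenced by a Job context, skipping absent ids.

    Hypothetical helper: `client` is expected to be a picsellia Client,
    but any object with the same two methods works.
    """
    output_id: Optional[str] = context.get("output_dataset_version_id")
    model_id: Optional[str] = context.get("model_version_id")
    return {
        # The input Dataset Version is always part of the context.
        "input_dataset_version": client.get_dataset_version_by_id(
            context["input_dataset_version_id"]
        ),
        # Optional objects are only fetched when their id is present.
        "output_dataset_version": (
            client.get_dataset_version_by_id(output_id) if output_id else None
        ),
        "model_version": (
            client.get_model_version_by_id(model_id) if model_id else None
        ),
    }
```

Your code can then check each entry for None before using it, instead of handling a failed lookup.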