3. Configuration reference
This section covers all the additional information that you may need to create and use your own Processing.
1. Initialization
Here is the form that allows you to create a Processing. Let's detail all the available fields.
- Name: This is the name that will be displayed and that you will be able to search in order to select a Processing. The Name has to be unique across your Organization.
- Description: Here you can provide a short description of what the Processing is actually doing or what it needs to run.
- Task: You have to select a task category for your Processing; we are going to detail this in the dedicated section below.
- Default parameters: Here you can add some parameters with default values that you will be able to edit before launching a Processing and also retrieve in your code.
- Default execution: Those are the resources that we will provide for your code execution. Set them to the needed value.
- Docker Image: Those are the details that will allow us to find and launch the correct Docker Image that embeds your code. This Docker Image can be either stored in the Public Docker Hub or in a Picsellia Private Registry.
  - Name: This is the full name of your image. It contains your Docker Username, the actual Docker Image name, and an optional Registry URL. It has to be written this way:
    REGISTRY_URL/DOCKER_USERNAME/IMAGE_NAME
  - Tag: This is the Image tag we are going to use for the chosen Docker Image. Because you can have multiple tags for one single image, remember to set it to the most recent or correct one. We suggest typical names such as 1.0 or latest.
  - Flags: This is an optional field. It contains the flags that we are going to add when running the docker run ... command. For example, if you need GPU support, you can add the flag --gpus=all, or if you need an environment variable to be set, you can add the flag -e env_var=value.
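To make the relationship between the Name, Tag, and Flags fields more concrete, here is a minimal sketch of how they could combine into a docker run command. The helper names `build_image_reference` and `build_docker_run` are hypothetical, and the way Picsellia actually assembles the command is not described here, so treat this only as an illustration of the naming convention.

```python
import shlex


def build_image_reference(image_name, username, tag="latest", registry_url=None):
    """Compose REGISTRY_URL/DOCKER_USERNAME/IMAGE_NAME:TAG (registry is optional)."""
    parts = [part for part in (registry_url, username, image_name) if part]
    return "/".join(parts) + ":" + tag


def build_docker_run(image_reference, flags=""):
    """Return the argument list of a docker run command with optional flags."""
    # shlex.split turns "--gpus=all -e env_var=value" into separate arguments.
    return ["docker", "run", *shlex.split(flags), image_reference]


# Hypothetical values, for illustration only.
reference = build_image_reference("my-processing", "my-user", tag="1.0")
command = build_docker_run(reference, flags="--gpus=all -e env_var=value")
print(" ".join(command))
```

With these example values, the image resolves to my-user/my-processing:1.0, and the flags are placed between docker run and the image reference.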
That's all for the basic configuration of your Processing. The form changes according to the task you choose, so let's explore the options that you have.
A. Pre-annotation
This is the task you have to choose if you want to pre-annotate your images with a ModelVersion coming from the Model Registry (Public or Private).
If your Processing is of type pre-annotation, you will be able to choose a ModelVersion when launching your Processing.
Here, we selected the yolov8-preannotation Processing (which is of type pre-annotation) and we selected a Yolov8 model from the HUB to pre-annotate with.
As you can see, we can edit or add the parameters from the Processing right before launch.
Configuration
If you choose pre-annotation, you will see a new (optional) input appear that will allow you to select a Model Version Type Constraint.
This is useful if you want to make sure that, when launching this Processing, you are only offered ModelVersions of the right Detection Type.
You can also leave it empty and choose the ModelVersion you want regardless of its type.
B. Data Augmentation
This is the type you have to choose if you want to perform Data Augmentation on a DatasetVersion and add the new images (with or without the originals) to another DatasetVersion that will be created automatically when you launch the Processing.
As it will automatically create a new, empty DatasetVersion, you will be prompted with a text input when you select a Processing of this type to launch.
Configuration
If you choose data augmentation, you will see a new (optional) input appear that will allow you to select a Target Dataset Version Type Constraint.
This is useful if you want the new DatasetVersion to be configured with the right Detection Type as soon as it is created.
You can also leave it empty and the new DatasetVersion will be of type NOT_CONFIGURED.
C. Dataset Version Creation
This is a broader type that will automatically create a new DatasetVersion before launching the Processing. It has the exact same effect and options as the data augmentation type.
In fact, data augmentation is just a clearer and more specific name than dataset version creation, but they are basically the same thing.
D. Auto Tagging
This one is the same as pre-annotation, so you can refer to that section for more information.
E. Auto Annotation
This one is the same as pre-annotation, so you can refer to that section for more information.
2. Context within Jobs
When you launch a Processing, it automatically creates and schedules a Job that you can later find in the Jobs overview on Picsellia.
The Job created contains all the information about your Data Processing:
- The input DatasetVersion
- The optional output DatasetVersion
- The optional ModelVersion that you select
In order for your code to work properly, you will need to retrieve that information from your Job (using our Python SDK).
You can do it this way:
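import os

from picsellia import Client

# Read the API token and the job id from the execution environment.
api_token = os.environ["api_token"]
job_id = os.environ["job_id"]

client = Client(
    api_token=api_token,
)
# Fetch the Job that was created when the Processing was launched.
job = client.get_job_by_id(job_id)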
The api_token and organization_id environment variables are added automatically to the execution environment; you can fetch them directly using the os library, just as above.
Your code should always start with those lines so that you can retrieve information from your Job and communicate with Picsellia through the Client.
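Since the script depends on these environment variables, it can help to fail early with a readable message when one of them is missing, for instance when running the code locally outside of a Job. The sketch below is only a suggestion: the helper name `read_job_environment` is hypothetical and is not part of the Picsellia SDK.

```python
import os


def read_job_environment(environ=os.environ):
    """Return (api_token, job_id), or raise a readable error if one is missing."""
    required = ("api_token", "job_id")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return environ["api_token"], environ["job_id"]


# Example with an explicit mapping instead of the real environment.
api_token, job_id = read_job_environment({"api_token": "my-token", "job_id": "my-job-id"})
```

Passing the mapping as an argument keeps the helper easy to test without touching the real environment.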
3. Context access
Now that you have retrieved your Job using its id, it's time to retrieve what we call its context, which holds the essential information about your Data Processing.
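context = job.sync()["datasetversionprocessingjob"]
# The input DatasetVersion id is always present.
input_dataset_version_id = context["input_dataset_version_id"]
# The other entries are optional: .get() returns None when a key is absent.
output_dataset_version_id = context.get("output_dataset_version_id")
model_version_id = context.get("model_version_id")
parameters = context.get("parameters")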
Remember that, except for input_dataset_version_id, all the other context variables are optional depending on the type of Processing that you launched, so they might not exist in the context.
You can now retrieve the objects needed with the SDK using their id, for example:
input_dataset_version = client.get_dataset_version_by_id(input_dataset_version_id)
output_dataset_version = client.get_dataset_version_by_id(output_dataset_version_id)
# A ModelVersion is fetched with its own getter, not the DatasetVersion one.
model_version = client.get_model_version_by_id(model_version_id)
print(parameters)
>>>
{
    "batch_size": 4,
    ...
}
You can then interact with these objects the way you usually do!
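Because most context entries are optional, a script may want to fetch only the objects whose ids are actually present. The following is a minimal sketch under a few assumptions: the helper name `load_processing_objects` is hypothetical, `client` is expected to behave like the Picsellia Client created earlier, and `get_model_version_by_id` is assumed to be the SDK getter for a ModelVersion.

```python
def load_processing_objects(client, context):
    """Fetch only the objects whose ids are present in the job context."""
    objects = {
        # The input DatasetVersion is always part of the context.
        "input_dataset_version": client.get_dataset_version_by_id(
            context["input_dataset_version_id"]
        ),
    }

    # Only fetch the output DatasetVersion when the Processing type creates one.
    output_id = context.get("output_dataset_version_id")
    if output_id:
        objects["output_dataset_version"] = client.get_dataset_version_by_id(output_id)

    # Only fetch the ModelVersion when one was selected at launch.
    model_id = context.get("model_version_id")
    if model_id:
        objects["model_version"] = client.get_model_version_by_id(model_id)

    # Parameters may be absent as well; fall back to an empty dict.
    objects["parameters"] = context.get("parameters") or {}
    return objects
```

It would typically be called as `load_processing_objects(client, context)` right after the context has been retrieved, and the returned dict can then be checked for the keys that matter to a given Processing type.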