Dataset - Processings
What is a Processing?
A Processing is a piece of code (like a Python script) that interacts with your data on the platform on demand.
To explain this, let's start with a use case:
Let's say that you want to perform data augmentation on a Picsellia DatasetVersion.
Normally, the steps to achieve this would be:
- Downloading your images locally
- Running a script that applies some data-augmentation techniques (rotating the image, for example) to all of your images
- Creating a new DatasetVersion in the Dataset you are using
- Uploading the augmented images to this new DatasetVersion
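To make the augmentation step concrete, here is a minimal sketch of a rotation. It is only an illustration: images are represented as nested lists of pixel values so the example stays dependency-free, and the names `rotate_90_clockwise` and `augment` are hypothetical. A real script would rely on an imaging library, plus the Picsellia SDK for the download and upload steps.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of pixel values, a stand-in for a real image


def rotate_90_clockwise(image: Image) -> Image:
    """Return a copy of the image rotated by 90 degrees clockwise."""
    # Reversing the rows, then transposing, turns the first column
    # (read bottom to top) into the first row.
    return [list(row) for row in zip(*image[::-1])]


def augment(images: Dict[str, Image]) -> Dict[str, Image]:
    """Build the content of the new version: originals plus rotated copies."""
    augmented = dict(images)
    for name, image in images.items():
        augmented[f"rot90_{name}"] = rotate_90_clockwise(image)
    return augmented


if __name__ == "__main__":
    local_images = {"street.png": [[1, 2], [3, 4]]}
    for name, image in augment(local_images).items():
        print(name, image)
```

Even in this toy form, every step around the rotation (getting the images, storing the results) still has to be done by hand, on your own machine.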
We know it can feel a little bit overwhelming 😮‍💨 Although running a script can be considered an automated task, the overall process is still fully manual. In addition, you need a computer that is actually able to run the code (with the correct environment set up, and so on).
This is why we came up with Processings 🎉 They let you automate this process and launch it whenever you want, on the data you want, directly from the platform!
So let's see how to use the most common Processings already available (handmade by Picsellia ❤️).
Processings can be run on a DatasetVersion, so you can perform actions like:
- Pre-annotation with a ModelVersion
- Data Augmentation
- Smart Version Creations
- Or anything you can think of regarding your data!
In the future, you will find Processings in every part of Picsellia:
- Models (automatically optimize and convert model weights)
- Experiments (perform evaluation or run benchmarks)
- Deployments (compute Custom metrics...)
But that's just a tease for now 😉
Use a Public Processing
Our journey starts on the Processings page, which you can access right below the Datasets tab in the Navigation Bar:
If you go there, you will have access to all the Processings created by the Picsellia team alongside the ones you created.
Let's have a look at this page:
For now, we can see that only two Processings are available. Given their names, we can conclude that they can be used to pre-annotate our DatasetVersion with either YOLO or TensorFlow models.
Let's click on the edit icon of yolo-preannotation to see what's inside.
You will see the same interface whether the Processing you want to edit is one of yours or one of ours.
Processing usage example: Pre-annotation
To illustrate how you can use a Processing, let's see one of the most useful examples: pre-annotation with a ModelVersion from your Registry (or our HUB).
Let's assume that we want to annotate all the cars and pedestrians in our Sample Dataset.
First, we are going to check in the Model Registry if we have a ModelVersion suitable for the task.
Great! This ModelVersion has been trained on many Labels, and among them are car and people, so it should be able to pre-annotate our DatasetVersion 😁
But first, let's go back to our DatasetVersion and create the Labels that we want our ModelVersion to annotate (in the settings).
Now that the Labels are set up, our ModelVersion will know which Labels to predict.
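One way to picture why the Labels matter: a model trained on many Labels can predict classes you do not care about, so only predictions matching a Label of the DatasetVersion are worth keeping. The sketch below illustrates that idea under stated assumptions. It is not the actual implementation of the Picsellia Processing, and the `keep_known_labels` helper and the prediction dictionaries are hypothetical.

```python
from typing import Dict, List, Set


def keep_known_labels(
    predictions: List[Dict[str, object]], dataset_labels: Set[str]
) -> List[Dict[str, object]]:
    """Keep only the predictions whose label exists in the dataset version."""
    return [p for p in predictions if p["label"] in dataset_labels]


if __name__ == "__main__":
    # Labels created in the DatasetVersion settings (assumed names).
    dataset_labels = {"car", "people"}
    # Hypothetical raw output of a model trained on many labels.
    predictions = [
        {"label": "car", "score": 0.91},
        {"label": "truck", "score": 0.78},
        {"label": "people", "score": 0.66},
    ]
    for prediction in keep_known_labels(predictions, dataset_labels):
        print(prediction)
```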
Let's go back to our DatasetVersion. From the Assets overview, you can click on the process button.
After clicking on this button, a modal opens where you can select a Processing.
As we have decided, we are going to pre-annotate using a YOLO model. This means that we can select the yolo-preannotation Processing. A new menu to select the Model and ModelVersion will open:
Let's select our smart-city-yolo ModelVersion
As we saw in the previous section, we can now edit (if we want) the default parameters of this Processing. We could increase the prediction batch_size, for example, but let's keep it at 8 for now.
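To give an intuition of what batch_size controls, here is a minimal sketch of how a pre-annotation script could group assets before sending them to the model. The `iter_batches` helper and the asset names are hypothetical, and the real Processing may organize its predictions differently.

```python
from typing import Iterator, List, Sequence


def iter_batches(assets: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size assets."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(assets), batch_size):
        yield list(assets[start:start + batch_size])


if __name__ == "__main__":
    parameters = {"batch_size": 8}  # the default value kept in this guide
    assets = [f"image_{i}.jpg" for i in range(20)]
    for batch in iter_batches(assets, parameters["batch_size"]):
        # A real script would run the model on the whole batch here.
        print(len(batch), "assets in this batch")
```

A larger batch_size means fewer, bigger calls to the model, which usually trades memory for speed.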
Now let's finally launch our Processing!
Track the Processing progress
When you launch a Processing, it creates a Job running in the background. You can access its status and much more information about it in the Jobs tab.
On this page, you can see the history of all the Jobs that ran or are currently running on your different DatasetVersions.
If you just launched a Processing, you should see it at the top of the list. Let's inspect our freshly launched pre-annotation Job.
When you launch a Processing, there is a short moment when the status is pending. Once your Job has been scheduled (and you start being billed), the status changes to running and you will see logs displayed in real time (they come from the stdout of the server it runs on).
This way, you can track the progress and the status of your Job and check that everything is going well.
Once your Job is over, you will have access to the full history of logs and the total running time, and the status will switch to succeeded (or failed, if there were issues at runtime).
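The status changes described above can be summarized as a small state machine. The sketch below uses the four status names from this page; the transition table and the `advance` and `is_finished` helpers are assumptions made for illustration, not the platform's internal logic.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Assumed transitions: a Job waits, runs, then ends in one of two states.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.SUCCEEDED, JobStatus.FAILED},
    JobStatus.SUCCEEDED: set(),
    JobStatus.FAILED: set(),
}


def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise if the transition is not expected."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


def is_finished(status: JobStatus) -> bool:
    """A Job is finished once no further transition is possible."""
    return not ALLOWED_TRANSITIONS[status]


if __name__ == "__main__":
    status = JobStatus.PENDING
    status = advance(status, JobStatus.RUNNING)
    status = advance(status, JobStatus.SUCCEEDED)
    print(status.value, is_finished(status))
```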
Your Job will fail sometimes, but you'll be able to find the issue thanks to the stack trace in the Job logs:
Once you have found the issue, fixed it, and updated your Processing's Docker image, you can click on the Re-run Job button. This will create and launch a second run, just like the one on the left of the screen.
Re-run Job
You can retry your Job as many times as you want, as long as there is no active run (meaning no run of the Job in the pending or running status).
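The re-run rule boils down to a single check over the statuses of the existing runs. Here is a minimal sketch of it, assuming runs are described by plain status strings; the `can_rerun` helper is hypothetical and only mirrors the rule stated above.

```python
from typing import Iterable

# Statuses that count as an active run, as described in this guide.
ACTIVE_STATUSES = {"pending", "running"}


def can_rerun(run_statuses: Iterable[str]) -> bool:
    """Return True when none of the existing runs is still active."""
    return not any(status in ACTIVE_STATUSES for status in run_statuses)


if __name__ == "__main__":
    print(can_rerun(["failed"]))
    print(can_rerun(["failed", "running"]))
```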
Now that our Job has finished, let's have a look at our DatasetVersion! It should be fully annotated with cars and pedestrians!
That's a full success 😎 Our DatasetVersion has been nicely pre-annotated by our ModelVersion with barely any effort. That's the power of Data Processings on Picsellia 😉
If you want to create your own Processing, you can follow this guide.