Dataset - Processings

What is a Processing?

A Processing is a piece of code (like a Python script) that interacts with your data on the platform on demand.

To explain this, let's start with a use case:

Let's say that you want to perform data augmentation on a Picsellia DatasetVersion.

Normally, the steps to achieve this would be:

  • Downloading your images locally
  • Running a script that applies some data-augmentation techniques (such as rotating the image) to all of your images
  • Creating a new DatasetVersion from the one you are using
  • Uploading the augmented images to this new DatasetVersion
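
To make these manual steps concrete, here is a minimal, self-contained sketch of the "run a script" step. It is not Picsellia code: images are represented as small nested lists of pixel values, and the names `rotate_90_clockwise`, `augment_all` and `local_images` are hypothetical, chosen only for illustration. The download, DatasetVersion creation and upload steps are left as comments because they depend on your own setup.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # a tiny grayscale image stored as rows of pixel values


def rotate_90_clockwise(image: Image) -> Image:
    """Rotate an image by 90 degrees clockwise."""
    # Reverse the row order, then turn rows into columns.
    return [list(row) for row in zip(*image[::-1])]


def augment_all(images: Dict[str, Image],
                augment: Callable[[Image], Image],
                prefix: str) -> Dict[str, Image]:
    """Apply one augmentation to every image and prefix each output name."""
    return {f"{prefix}{name}": augment(img) for name, img in images.items()}


# Step 1 (placeholder): the images you would have downloaded locally.
local_images = {"street.png": [[1, 2], [3, 4]]}

# Step 2: run the augmentation on all of the images.
augmented = augment_all(local_images, rotate_90_clockwise, prefix="rotated_")

# Steps 3 and 4 (not shown): create a new DatasetVersion and upload `augmented`.
print(augmented)  # {'rotated_street.png': [[3, 1], [4, 2]]}
```

Every step around the function call still has to be done by hand, on a machine with the right environment. That is the part a Processing automates.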

We know it can feel a little overwhelming 😮‍💨 Although running a script can be considered an automated task, the overall process is fully manual. In addition, you must use a computer that is actually able to run the code (it has to have the correct environment, and so on).

This is why we came up with Processing 🎉 It lets you automate this process and launch it whenever you want, on the data you want, directly from the platform!

So let's see how to use the most common Processing already available (handmade by Picsellia ❤️).

Processing can be run on DatasetVersion, so you can perform actions like:

  • Pre-annotation with a ModelVersion
  • Data Augmentation
  • Smart Version Creations
  • Or anything you can think of regarding your data!

In the future, you will find Processing in every part of Picsellia:

  • Models (automatically optimize and convert model weights)
  • Experiments (perform evaluation or run benchmarks)
  • Deployments (compute Custom metrics...)

But that's just a tease for now 😉

Use a Public Processing

Our journey starts on the Processings page, which you can access right below the Datasets tab in the Navigation Bar:

Access Processings

If you go there, you will have access to all the Processing created by the Picsellia team alongside the ones you created.

Let's have a look at this page:

Available Processing

For now, we can see that only two Processing are available. Given their names, we can conclude that they can be used to pre-annotate our DatasetVersion with either YOLO or TensorFlow models.

Let's click on the edit icon of yolo-preannotation to see what's inside.


You will see the same interface regardless of whether the Processing you want to edit is one of yours or ours.

Processing usage example: Pre-annotation

To illustrate how you can use a Processing, let's look at one of the most useful examples: pre-annotation with a ModelVersion from your Registry (or our HUB).

DatasetVersion to pre-annotate

Let's assume that we want to annotate all the cars and pedestrians in our Sample Dataset.

First, we are going to check in the Model Registry if we have a ModelVersion suitable for the task.

ModelVersion for pre-annotation

Great! This ModelVersion has been trained on many Labels, including car and people, so it should be able to pre-annotate our DatasetVersion.

But first, let's go back to our DatasetVersion and create the Labels that we want our ModelVersion to annotate (in the settings).

Setting up the DatasetVersion

Now that the labels are set up, our ModelVersion will know which Labels to predict.
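
One way to picture why the Labels matter is the sketch below. It assumes that predictions whose label does not exist in the DatasetVersion are simply dropped. This is an illustrative assumption, not a description of how the yolo-preannotation Processing is actually implemented, and `keep_known_labels` and the prediction dictionaries are hypothetical.

```python
from typing import Dict, List, Set


def keep_known_labels(predictions: List[Dict[str, object]],
                      dataset_labels: Set[str]) -> List[Dict[str, object]]:
    """Keep only the predictions whose label exists in the DatasetVersion."""
    return [p for p in predictions if p["label"] in dataset_labels]


# Hypothetical raw predictions from a model trained on many Labels.
predictions = [
    {"label": "car", "score": 0.91},
    {"label": "bicycle", "score": 0.77},
    {"label": "people", "score": 0.84},
]

# Labels created in the DatasetVersion settings (assumed names).
kept = keep_known_labels(predictions, {"car", "people"})
print([p["label"] for p in kept])  # ['car', 'people']
```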

Let's go back to our DatasetVersion. From the Assets overview, you can click on the Process button.

Launch the Processing

After clicking on this button, a modal opens where you can select a Processing:

Select the Processing

As we have decided, we are going to pre-annotate using a YOLO Model. This means that we can select the yolo-preannotation Processing. A new menu to select the Model and ModelVersion will open:

Select the Model and ModelVersion

Let's select our smart-city-yolo ModelVersion.

Define Processing parameters

As we saw in the previous section, we can now edit (if we want) the default parameters of this Processing. We could increase the prediction batch_size for example, but let's keep it at 8 for now.
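
As a rough illustration of what a prediction batch_size means, the sketch below splits a list of assets into groups of at most `batch_size` items, which is how batched inference is commonly organised. The helper `iter_batches` and the asset names are hypothetical. The actual Processing may handle batching differently.

```python
from typing import Iterator, List, Sequence


def iter_batches(assets: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` assets."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(assets), batch_size):
        yield list(assets[start:start + batch_size])


# 20 hypothetical assets processed with the default batch_size of 8.
asset_names = [f"image_{i}.jpg" for i in range(20)]
batch_lengths = [len(batch) for batch in iter_batches(asset_names, batch_size=8)]
print(batch_lengths)  # [8, 8, 4]
```

A larger batch_size means fewer, bigger batches. That usually speeds up prediction but needs more memory.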

Now let's finally Launch our Processing!


Track the Processing progress

When you launch a Processing, it creates a Job that runs in the background. You can access its status and much more information about it in the Jobs tab.

Reach Jobs tab

On this page, you can see the history of all the Job that ran or are currently running on your different DatasetVersion.

Jobs overview

If you just launched a Processing, you should see it at the top of the list. Let's inspect our freshly launched pre-annotation Job.

Job logs and status

When you launch a Processing, its status is pending for a short moment. Once your Job has been scheduled (and you start being billed), the status changes to running and you will see logs displayed in real time (they come from the stdout of the server the Job runs on).

Job logs and status

In this way, you can really track the progress and the status of your Job and check that everything is going well.

Once your Job is over, you will have access to the full log history and the total running time, and the status will switch to succeeded (or failed, if there were issues at runtime).

Job logs and status
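
The statuses mentioned here can be summarised as a small state machine. The sketch below is a simplified model built only from the four statuses named in this guide (pending, running, succeeded, failed). The `JobStatus` enum and the `advance` and `is_finished` helpers are hypothetical, and the platform may expose additional statuses.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Which statuses can follow each status in this simplified model.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.SUCCEEDED, JobStatus.FAILED},
    JobStatus.SUCCEEDED: set(),
    JobStatus.FAILED: set(),
}


def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


def is_finished(status: JobStatus) -> bool:
    """A Job is finished once no further transition is possible."""
    return not ALLOWED_TRANSITIONS[status]


status = advance(JobStatus.PENDING, JobStatus.RUNNING)   # Job gets scheduled
status = advance(status, JobStatus.SUCCEEDED)            # Job completes
print(status.value, is_finished(status))
```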

Your Job will fail sometimes, but you'll be able to find the issue thanks to the stack trace in the Job logs:

Job logs and status

Once you have found the issue, fixed it, and updated your Processing's Docker Image, you can click on the Re-run Job button. This will create and launch a second run just like the one on the left of the screen.




Re-run Job

You can retry your Job as many times as you want, as long as there is no active run (meaning no run with the pending or running status).
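
The re-run rule can be expressed as a short check. This is only a sketch of the rule as stated above. The function name `can_rerun` and the plain-string statuses are assumptions made for illustration.

```python
from typing import Iterable

# Statuses that count as "active" according to the rule above.
ACTIVE_STATUSES = {"pending", "running"}


def can_rerun(run_statuses: Iterable[str]) -> bool:
    """A Job can be re-run only if none of its runs is pending or running."""
    return not any(status in ACTIVE_STATUSES for status in run_statuses)


print(can_rerun(["failed"]))             # True: the only run has ended
print(can_rerun(["failed", "running"]))  # False: a second run is still active
```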

Now that our job has finished, let's have a look at our DatasetVersion! It should be fully annotated with cars and pedestrians!

Preannotated DatasetVersion

That's a full success 😎 Our DatasetVersion has been nicely pre-annotated by our ModelVersion with barely any effort. That's the power of Data Processings on Picsellia 😉


If you want to create your own Processing, you can follow this guide.