6 - Import your annotations

Objectives:

  • Visualize the assets of your dataset version
  • Import your annotations
  • Use the integrated labeling tool
  • Browse the assets of the dataset version or structure them
  • Download assets & annotations
  • Operate your data processing
  • Assess the quality of your dataset

Visualize a dataset version

You will access the dataset view by clicking on the dataset version.

First, the dataset overview lets you visualize your dataset and browse its assets. Each dataset also embeds its own tagging system (independent from the datalake tagging system), allowing you to structure your dataset as you wish.

At this stage, the dataset should not contain any annotations yet.

Import your annotations and/or use the integrated labeling tool

If your annotations are already available locally, you can import them directly into your dataset. They must be in COCO, YOLO, or PascalVOC format. To do so, press the “Annotation” button at the top right corner and select “Import annotations”. Then follow the modal’s instructions to upload the annotations file.
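Before uploading, it can help to run a quick local check on the file. The sketch below covers the COCO case only and relies on the standard top-level keys of the COCO format (`images`, `annotations`, `categories`). The function name `check_coco_file` is hypothetical, and the checks are illustrative rather than a description of what Picsellia validates during the import.

```python
import json
from pathlib import Path

# Top-level keys that a COCO annotation file is expected to contain.
REQUIRED_KEYS = ("images", "annotations", "categories")


def check_coco_file(path):
    """Return a list of problems found in a COCO file (empty list if none)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = [
        f"missing top-level key: {key}" for key in REQUIRED_KEYS if key not in data
    ]
    if problems:
        return problems

    # Every annotation should point at a declared image and a declared category.
    image_ids = {image.get("id") for image in data["images"]}
    category_ids = {category.get("id") for category in data["categories"]}
    for index, annotation in enumerate(data["annotations"]):
        if annotation.get("image_id") not in image_ids:
            problems.append(f"annotation #{index}: unknown image_id")
        if annotation.get("category_id") not in category_ids:
            problems.append(f"annotation #{index}: unknown category_id")
    return problems
```

An empty result only means the file passed these basic structural checks.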

The detection type, the labels, and the annotations will automatically be set up and displayed during the annotation import process. This import task is asynchronous, so you can follow its completion in the "Jobs" interface:

📘

Any Picsellia asynchronous task (annotation import, dataset creation, etc.) can be tracked from this "Internal jobs" overview:

To do it with the SDK:
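A minimal sketch of a COCO import with the Picsellia Python SDK. The SDK call names used here (`Client`, `get_dataset`, `get_version`, `import_annotations_coco_file`) are assumptions to verify against the SDK reference for your installed version. The wrapper function `import_coco_annotations` and its parameters are hypothetical and exist only for illustration.

```python
def import_coco_annotations(api_token, organization_name, dataset_name,
                            version_name, coco_path):
    """Import a local COCO file into a dataset version (hedged sketch)."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from picsellia import Client

    # Assumed SDK calls: check names and signatures in the SDK reference.
    client = Client(api_token=api_token, organization_name=organization_name)
    dataset_version = client.get_dataset(dataset_name).get_version(version_name)

    # Assumed method for COCO files; YOLO and PascalVOC are not covered here.
    dataset_version.import_annotations_coco_file(file_path=coco_path)
    return dataset_version
```

All argument values (token, organization, dataset and version names, file path) are placeholders to replace with your own.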

Even with imported annotations, you can still modify them using the Picsellia labeling tool. To do so, open any asset from the dataset and select the “Annotate” button. The labeling tool then opens, and you can remove, add, or modify any annotation in your dataset. More details about annotation are available here.

If you do not have an annotation file already

If your dataset needs to be manually annotated, you can use the labeling tool of Picsellia to annotate your whole dataset from scratch.

To do so, first go to the “Settings” part of your dataset and define, in the “Labels” tab, the type of recognition and the associated labels (i.e. classes).

Once done, you can go back to the “Assets” view and start annotating the assets. More details about annotation are available here.

In the case of a classification dataset

If you want to develop a classification model, you can use a handy feature called “Transform tags to classifications”. As a prerequisite, each data item must have been uploaded with the tag corresponding to its annotation. Be careful: for this operation, each uploaded data item must have exactly one tag at the datalake level, and this tag becomes its annotation in the dataset. Then, after the dataset creation, go to the “Settings” tab, set up the dataset as "Classification", and create at least one label to set the classification type of the current dataset (this label can be deleted once the tags are transformed into annotations).

Once done, go back to the “Assets” view and click the “Annotations” button, where you should see “Transform tags to classifications”. Click on it: the classes are created automatically and the assets are classified based on their tags. Please note that this operation can take a few minutes, but it runs asynchronously, so you can keep working on Picsellia in the meantime.
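The one-tag-per-data prerequisite can be checked locally before uploading. The sketch below assumes you keep your own mapping from file names to the tags you plan to attach. `tags_by_file` and `find_invalid_tagging` are hypothetical names and are not part of Picsellia.

```python
def find_invalid_tagging(tags_by_file):
    """Return the files that do not carry exactly one tag."""
    return {
        name: tags for name, tags in tags_by_file.items() if len(tags) != 1
    }


# Hypothetical local bookkeeping: file name -> tags planned for the upload.
tags_by_file = {
    "cat_001.jpg": ["cat"],
    "dog_001.jpg": ["dog", "outdoor"],  # two tags: ambiguous class
    "unknown.jpg": [],                  # no tag: nothing to transform
}

invalid = find_invalid_tagging(tags_by_file)
print(invalid)  # {'dog_001.jpg': ['dog', 'outdoor'], 'unknown.jpg': []}
```

Any file reported here would need its tags fixed so that it carries a single tag.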

📘

Picsellia offers you the opportunity to run processings on your dataset

Processings are the perfect tool to apply specific treatments to your dataset, such as data augmentation or pre-annotation. You can learn more about processings here.

Browse among assets

Now that the dataset is fully annotated, you can browse its assets using the search bar. Use the autocompletion to discover which asset properties you can filter on.

👍

You can even retrieve data properties from the dataset search bar by typing "data.***"

Ensure dataset quality

Once the dataset is fully annotated, you can leverage the metrics tab to ensure your dataset fits your needs. As dataset quality is key to model performance, those metrics should be used to assess dataset quality, diversity, and balance.
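To make the notion of balance concrete, the sketch below computes the share of annotations per label from a COCO-style dictionary (for example, one loaded from an exported annotation file). It is a local illustration, not a description of how the metrics tab is computed. `label_distribution` is a hypothetical helper and the sample data is made up.

```python
from collections import Counter


def label_distribution(coco):
    """Return the fraction of annotations carried by each label name."""
    names = {category["id"]: category["name"] for category in coco["categories"]}
    counts = Counter(names[a["category_id"]] for a in coco["annotations"])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Made-up sample: three "cat" annotations and one "dog" annotation.
sample = {
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "annotations": [
        {"image_id": 1, "category_id": 1},
        {"image_id": 2, "category_id": 1},
        {"image_id": 3, "category_id": 1},
        {"image_id": 4, "category_id": 2},
    ],
}

print(label_distribution(sample))  # {'cat': 0.75, 'dog': 0.25}
```

A strongly skewed distribution like this one suggests the under-represented labels may need more annotated assets.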