5. Create your first Dataset

Objectives:

  • Create your first Dataset & DatasetVersion
  • Use the Dataset versioning system

1. Create your first DatasetVersion

Now that your Data is uploaded to Picsellia, you can create your first Dataset, which can then be used, for instance, to train a Model. To do so, you start from your Datalake.

From your Datalake, use the Search Bar to find the images that should be included in your DatasetVersion, then select the subset of images to include in this new DatasetVersion. For example, to create a DatasetVersion with the images uploaded with the DataTag smart_city, search for those images using the Search Bar, select all or part of them (Select all or Select subset buttons), and initiate the DatasetVersion by clicking on Dataset.

For traceability, and to ease the management of your DatasetVersion, you will be asked to add a title and a description.

You can now open the newly created DatasetVersion from the Datasets view. Picsellia's Dataset versioning system allows you to create as many versions of a given Dataset as you want; each version of a Dataset is called a DatasetVersion. Keeping the history of a Dataset for a given project is crucial, because finding a well-balanced Dataset requires work and repeated modifications of the Data.

From the Datasets view, you can select the Dataset you are interested in and display all its existing DatasetVersions (at this step, only one DatasetVersion should be available).

Clicking on the DatasetVersion displays all the Assets composing this first version of your Dataset.

At this step, the DatasetVersion should be free of Annotation.

📘 Each image composing the DatasetVersion is called an Asset on Picsellia.

To do it with the SDK:
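The steps above can be sketched with the Picsellia Python SDK as follows. This is a minimal sketch, not the official snippet: it assumes the `picsellia` package is installed and that the method names used here (`get_datalake`, `list_data`, `create_dataset`, `create_version`, `add_data`) match your SDK version, so check them against the SDK reference. The environment variable names and the dataset and version names are hypothetical.

```python
import os


def connect():
    """Build a Picsellia client (assumes the `picsellia` package is installed)."""
    from picsellia import Client  # imported lazily, only when a real client is needed

    # PICSELLIA_TOKEN and PICSELLIA_ORGANIZATION are hypothetical variable names.
    return Client(
        api_token=os.environ["PICSELLIA_TOKEN"],
        organization_name=os.environ["PICSELLIA_ORGANIZATION"],
    )


def create_first_dataset_version(client, data_tag, dataset_name, version_name, description=""):
    """Create a Dataset and a first DatasetVersion from tagged Datalake Data."""
    # 1. Fetch from the Datalake the Data carrying the given DataTag.
    datalake = client.get_datalake()
    data = datalake.list_data(tags=[data_tag])

    # 2. Create the Dataset, then its first DatasetVersion.
    dataset = client.create_dataset(dataset_name, description=description)
    version = dataset.create_version(version_name)

    # 3. Add the selected Data to the DatasetVersion; each item becomes an Asset.
    version.add_data(data)
    return version
```

With valid credentials, a call such as `create_first_dataset_version(connect(), "smart_city", "my_dataset", "first")` would mirror the UI flow described above: search by DataTag, select the results, and create the DatasetVersion.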

2. Visualize a DatasetVersion

You can access the Assets view by clicking on the DatasetVersion icon.

First, the Assets view lets you visualize your DatasetVersion and browse through its Assets. Each DatasetVersion also embeds its own tagging system (independent from the Datalake tagging system), allowing you to structure your DatasetVersion as you wish using AssetTags.

As is the case for the Datalake, the Query Language allows you to navigate your DatasetVersion based on various properties and metadata.
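Browsing and tagging Assets can also be scripted. The sketch below assumes a DatasetVersion object obtained from the Picsellia Python SDK and assumes that `get_or_create_asset_tag`, `list_assets`, `asset.filename`, and `asset.add_tags` behave as named in your SDK version; the helper `tag_assets` itself is hypothetical and only illustrates the idea of an AssetTag scoped to one DatasetVersion.

```python
def tag_assets(dataset_version, tag_name, filenames):
    """Attach an AssetTag to the Assets whose filename is in `filenames`.

    Returns the filenames that were actually tagged, in listing order.
    """
    wanted = set(filenames)

    # The AssetTag belongs to this DatasetVersion, not to the Datalake.
    tag = dataset_version.get_or_create_asset_tag(tag_name)

    tagged = []
    for asset in dataset_version.list_assets():
        if asset.filename in wanted:
            asset.add_tags(tag)
            tagged.append(asset.filename)
    return tagged
```

Filtering by filename is done on the client side here to keep the example simple; for large DatasetVersions, a server-side filter would likely be preferable.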