Datalake - Dataset creation

From the Datalake, you will at some point need to create a Dataset from Data that have previously been uploaded to the Datalake.

1. Select the Data

To create a Dataset on Picsellia, you first need to select from the Datalake the Data that will compose your Dataset. To do so, you can rely on the ordering, filtering, and selecting features of the Datalake detailed previously.

Once Data are selected in the Datalake, a Dataset button should appear:

Figure: Dataset creation from Datalake

2. Dataset creation

Picsellia provides a Dataset versioning system, documented separately. In short, each Dataset can have several versions, each called a DatasetVersion.

With that in mind, the Dataset button displayed in the Datalake when Data are selected offers several options:

  • "Create New Dataset": creates a new Dataset and its first DatasetVersion with the selected Data. A modal opens where you can type the name of the new Dataset. The first DatasetVersion, created at the same time, is named initial by default.
  • "Create New Dataset Version": creates a new DatasetVersion of an existing Dataset with the selected Data. A modal opens where you select the Dataset in which the new DatasetVersion will be created and give this new DatasetVersion a name.
  • "Add To Existing DatasetVersion": appends the selected Data to an existing DatasetVersion. A modal opens where you select the Dataset and the DatasetVersion to which the selected Data will be added.

Figure: Dataset creation possibilities

🚧

A given Data cannot be included more than once in the same DatasetVersion.
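
To make the three options and the uniqueness rule concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Picsellia SDK: the class names `Dataset` and `DatasetVersion` mirror the concepts above, but every method, the `create_new_dataset` helper, and the string ids standing in for Data are hypothetical. It also assumes that Data already present in a DatasetVersion are silently skipped, and that version names are unique within a Dataset; the platform may handle either case differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetVersion:
    """Toy stand-in for a DatasetVersion: a named, duplicate-free list of Data ids."""

    name: str
    data_ids: list[str] = field(default_factory=list)

    def add_data(self, selected: list[str]) -> list[str]:
        """Model of "Add To Existing DatasetVersion".

        Appends the selected ids and returns only those actually added.
        Assumption: ids already in the version are skipped, not rejected.
        """
        added = []
        for data_id in selected:
            if data_id not in self.data_ids:
                self.data_ids.append(data_id)
                added.append(data_id)
        return added


@dataclass
class Dataset:
    """Toy stand-in for a Dataset: a name plus its versions, keyed by version name."""

    name: str
    versions: dict[str, DatasetVersion] = field(default_factory=dict)

    def create_version(self, version_name: str, selected: list[str]) -> DatasetVersion:
        """Model of "Create New Dataset Version"."""
        # Assumption of this sketch: version names are unique within a Dataset.
        if version_name in self.versions:
            raise ValueError(f"version {version_name!r} already exists")
        version = DatasetVersion(version_name)
        version.add_data(selected)
        self.versions[version_name] = version
        return version


def create_new_dataset(name: str, selected: list[str]) -> Dataset:
    """Model of "Create New Dataset": the first version is named "initial"."""
    dataset = Dataset(name)
    dataset.create_version("initial", selected)
    return dataset


# "Create New Dataset" with a selection that contains the same Data twice.
dataset = create_new_dataset("demo", ["img_001", "img_002", "img_002"])
initial = dataset.versions["initial"]
print(initial.data_ids)

# "Create New Dataset Version", then "Add To Existing DatasetVersion".
v2 = dataset.create_version("v2", ["img_003"])
added = v2.add_data(["img_003", "img_004"])
print(added)
```

In this model, the duplicated id is stored only once in the initial version, and the second `add_data` call adds only the id that was not already part of `v2`.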

You can now go to the Dataset tab of Picsellia to view the newly created DatasetVersion.