Datalake - Philosophy and infrastructure
A Datalake is a space shared by all the members of an Organization to gather all the images (called Data) used in your Computer Vision projects.
The main purpose of the Datalake feature is to make all your Data available for visualization, structuring, and exploration.
First of all, note that an Organization can have several Datalakes.
Each Datalake is connected through a Storage Connector to a bucket on an Object Storage (hosted by a Cloud provider) where the Data visualized on the Datalake is physically stored.
When a new Picsellia Organization is created, a dedicated bucket is created for it on the Picsellia Object Storage (hosted by AWS). The Datalake of the newly created Organization is called default and is connected to this bucket, so all Data uploaded to this Datalake is physically stored on the Picsellia Object Storage.
However, you can also decide to create a new Datalake for your Organization and connect it to your own bucket hosted by your Cloud provider. To do so, please refer to this tutorial.
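The relationship described above (an Organization owning several Datalakes, each connected through exactly one Storage Connector to one bucket) can be sketched as a small data model. This is a minimal illustration under stated assumptions: the class and field names are hypothetical, this is not the Picsellia SDK, and the bucket name generated in `create` is made up.

```python
from dataclasses import dataclass, field

# Hypothetical model for illustration only; not part of the Picsellia SDK.


@dataclass(frozen=True)
class StorageConnector:
    name: str                # e.g. "hinokuni-storage-production"
    bucket: str              # bucket where the Data is physically stored
    picsellia_managed: bool  # True for the bucket Picsellia creates for you


@dataclass(frozen=True)
class Datalake:
    name: str
    connector: StorageConnector  # each Datalake uses exactly one connector


@dataclass
class Organization:
    name: str
    datalakes: list[Datalake] = field(default_factory=list)

    @classmethod
    def create(cls, name: str) -> "Organization":
        # A new Organization gets a dedicated bucket on the Picsellia Object
        # Storage, exposed through a Datalake called "default".
        connector = StorageConnector(
            name="hinokuni-storage-production",
            bucket=f"picsellia-{name}",  # hypothetical bucket name
            picsellia_managed=True,
        )
        return cls(name, [Datalake("default", connector)])

    def add_datalake(self, name: str, connector: StorageConnector) -> Datalake:
        # An additional Datalake backed by a bucket on your own Cloud provider.
        lake = Datalake(name, connector)
        self.datalakes.append(lake)
        return lake
```

For example, `Organization.create("acme")` yields one Datalake named `default`; calling `add_datalake` with a connector pointing at your own bucket adds a second one alongside it.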
You can easily switch from one Datalake to another using the navigation bar as shown below:

Multi-Datalake navigation
Every machine learning project begins with data, and in Computer Vision, that data is images.
There are two ways to upload your Data using Picsellia:
- Import Data already stored on your own Cloud Object Storage into a Picsellia Datalake and access it through Picsellia.
- Upload locally stored Data directly to Picsellia. In this scenario, your Data will be physically stored by Picsellia on the bucket linked to the current Datalake.
Please note that depending on the Datalake you are using, only one or both methods are available.
Indeed, if you are using the default Datalake, which is created for you when a new Organization is initialized and is connected to the bucket created for you on the Picsellia Object Storage, you can only upload Data from your local drive. The uploaded Data is physically stored on the Picsellia Object Storage and visualized on the default Datalake through the Storage Connector created by default for your Organization, named hinokuni-storage-production.

default Datalake created for each Organization
If you create a new Datalake following this tutorial, it uses a new Storage Connector linked to your own bucket, and you can either:
- Upload Data from a local drive as explained here. In this case, the uploaded Data is physically stored by Picsellia on your bucket.
- Import Data already stored on your bucket as explained here. In this case, you can visualize and explore Data already physically stored on your bucket through your Picsellia Datalake.
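The rule for which upload methods a Datalake offers can be summarized as a small function. This is a hypothetical sketch: `UploadMethod`, `available_upload_methods`, and the `picsellia_managed` flag are illustrative names, not part of any Picsellia API.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical model for illustration only; not part of the Picsellia SDK.


class UploadMethod(Enum):
    LOCAL_UPLOAD = "upload from a local drive"
    IMPORT_FROM_BUCKET = "import Data already stored on the bucket"


@dataclass(frozen=True)
class StorageConnector:
    name: str
    picsellia_managed: bool  # True only for the bucket Picsellia creates


@dataclass(frozen=True)
class Datalake:
    name: str
    connector: StorageConnector


def available_upload_methods(lake: Datalake) -> set[UploadMethod]:
    # Uploading from a local drive is always possible: Picsellia writes the
    # files to the bucket behind the current Datalake.
    methods = {UploadMethod.LOCAL_UPLOAD}
    # Importing existing objects is only offered for a bucket you own.
    if not lake.connector.picsellia_managed:
        methods.add(UploadMethod.IMPORT_FROM_BUCKET)
    return methods
```

With `default = Datalake("default", StorageConnector("hinokuni-storage-production", True))`, `available_upload_methods(default)` returns `{UploadMethod.LOCAL_UPLOAD}`, while a Datalake whose connector points at your own bucket returns both methods.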
All users with Admin rights in a given Organization can access the Organization Settings, in particular the Storages and Datalakes tab. From this tab, you can manage the existing Datalakes and Storage Connectors. More details are available in this tutorial.

Storages and Datalakes tab