Datalake - Philosophy and infrastructure
A Datalake is a place shared by all the members of an Organization to gather all the images (called Data) used in your Computer Vision projects.
The Datalake feature mainly aims to make all your Data available for visualization, structuring, and exploration.
First of all, it is important to note that an Organization can have several Datalakes.
Each Datalake is connected through a Storage Connector to a bucket on an Object Storage (hosted by a Cloud provider) where the Data visualized on the Datalake is physically stored.
When a new Picsellia Organization is created, a new dedicated bucket is created on the Picsellia Object Storage (hosted by AWS). The Datalake of the freshly created Organization is called default and is connected to this bucket, so all Data uploaded to this Datalake is physically stored on the Picsellia Object Storage.
However, you can also decide to create a new Datalake for your Organization and connect it to your own bucket hosted by your Cloud provider. To do so, please refer to this tutorial.
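The relationships described above (an Organization holds several Datalakes, each Datalake reaches one bucket through a Storage Connector) can be sketched as a small data model. This is only an illustration: the class names, the `new_organization` helper, and the connector and bucket names are hypothetical and are not part of the Picsellia SDK.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StorageConnector:
    """Link between a Datalake and a bucket on an Object Storage."""
    name: str
    bucket: str
    provider: str                # Cloud provider hosting the bucket, e.g. "aws"
    managed_by_picsellia: bool   # True for the bucket Picsellia creates for you


@dataclass
class Datalake:
    name: str
    connector: StorageConnector


@dataclass
class Organization:
    name: str
    datalakes: dict[str, Datalake] = field(default_factory=dict)

    def add_datalake(self, datalake: Datalake) -> None:
        # An Organization can hold several Datalakes, each with a unique name.
        if datalake.name in self.datalakes:
            raise ValueError(f"Datalake {datalake.name!r} already exists")
        self.datalakes[datalake.name] = datalake

    def storage_location(self, datalake_name: str) -> str:
        """Return the bucket where the Data of a Datalake is physically stored."""
        return self.datalakes[datalake_name].connector.bucket


def new_organization(name: str) -> Organization:
    """Mimic Organization creation: a dedicated bucket plus a 'default' Datalake."""
    org = Organization(name)
    connector = StorageConnector(
        name="picsellia-managed",      # hypothetical connector name
        bucket=f"picsellia-{name}",    # hypothetical bucket name
        provider="aws",
        managed_by_picsellia=True,
    )
    org.add_datalake(Datalake("default", connector))
    return org
```

For example, `new_organization("acme")` yields an Organization with a single `default` Datalake, and calling `add_datalake` with a connector that points to your own bucket adds a second Datalake whose Data lives on that bucket instead.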
You can easily switch from one Datalake to another using the navigation bar.
Every machine learning project begins with data, and in our case of Computer Vision, it starts with images.
There are two ways to upload your Data using Picsellia:
- Import Data already stored on your own Cloud Object Storage to Picsellia's Datalake and access it through Picsellia.
- Upload locally stored Data directly to Picsellia. In this scenario, your Data will be physically stored by Picsellia on the bucket linked to the current Datalake.
Please note that, depending on the Datalake you are using, either one or both methods are available.
If you are using the default Datalake (the one created when your Organization was initialized and connected to the bucket created for you on the Picsellia Object Storage), you can only upload Data from your local drive. The uploaded Data will be physically stored on Picsellia's Object Storage and visualized on the default Datalake using the Storage Connector created by default for your Organization, named hinokuni-storage-production.
If you create a new Datalake following this tutorial, it will use a new Storage Connector linked to your own bucket, and you will be able to either:
- Upload Data from a local drive as explained here. In this case, the uploaded Data will be physically stored by Picsellia on your bucket.
- Import Data already stored on your bucket as explained here. In this case, you will be able to visualize and explore Data already physically stored on your bucket through your Picsellia Datalake.
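The availability rule above can be summarized in a few lines of code. This is a minimal sketch of the rule as stated in this page, not Picsellia's implementation: `UploadMethod` and `available_methods` are hypothetical names introduced for illustration.

```python
from enum import Enum


class UploadMethod(Enum):
    LOCAL_UPLOAD = "upload Data from a local drive"
    BUCKET_IMPORT = "import Data already stored on your bucket"


def available_methods(uses_own_bucket: bool) -> set[UploadMethod]:
    """Return the upload methods offered by a Datalake.

    uses_own_bucket is False for the default Datalake (bucket on the Picsellia
    Object Storage) and True for a Datalake connected to your own bucket.
    """
    # Uploading from a local drive is possible on every Datalake.
    methods = {UploadMethod.LOCAL_UPLOAD}
    if uses_own_bucket:
        # Importing existing objects only makes sense on a bucket you own.
        methods.add(UploadMethod.BUCKET_IMPORT)
    return methods
```

With this sketch, `available_methods(False)` contains only the local upload, while `available_methods(True)` contains both methods.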
All users with Admin rights in a given Organization can access the Organization Settings, and in particular the Storages and Datalakes tab. From this tab, you can manage the existing Datalakes and Storage Connectors. More details are available in this tutorial.