Datalake

Properties


Methods

upload_data

upload_data(
   filepaths: Union[str, Path, list[Union[str, Path]]],
   tags: Optional[list[Union[str, Tag]]] = None, source: Union[str, DataSource,
   None] = None, max_workers: Optional[int] = None,
   error_manager: Optional[ErrorManager] = None, metadata: Union[None, dict,
   list[dict]] = None, fill_metadata: Optional[bool] = False,
   wait_for_unprocessable_data: Optional[bool] = True,
   upload_dir: Optional[str] = None, custom_metadata: Union[None, dict,
   list[dict]] = None
)

Description

Upload data into this datalake.

Upload files representing data into this datalake. You can give a list of tags and a source for your data.

If some data fails to upload, check the example to see how to retrieve the list of file paths that failed.

For more information about metadata, check https://documentation.picsellia.com/docs/metadata

Examples

from picsellia.services.error_manager import ErrorManager

source_camera_one = client.get_datasource("camera-one")
source_camera_two = client.get_datasource("camera-two")

lake = client.get_datalake()

tag_car = lake.get_data_tag("car")
tag_huge_car = lake.get_data_tag("huge-car")

lake.upload_data(filepaths=["porsche.png", "ferrari.png"], tags=[tag_car], source=source_camera_one)
lake.upload_data(filepaths="truck.png", tags=[tag_huge_car], source=source_camera_two, metadata={"longitude": 43.6027273, "latitude": 1.4541129}, fill_metadata=True)

error_manager = ErrorManager()
lake.upload_data(filepaths=["twingo.png", "path/unknown.png", error_manager=error_manager)

# error_manager.errors is a list of UploadError, so you can see what went wrong
error_paths = [error.path for error in error_manager.errors]
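
# A minimal sketch of upload_dir and custom_metadata; upload_dir can only be used with a
# private object storage, and the prefix and metadata values below are illustrative.
lake.upload_data(
    filepaths="bus.png",
    tags=[tag_car],
    upload_dir="raw-uploads",
    custom_metadata={"fleet": "north", "vehicle_id": 42},
)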

Arguments

  • filepaths (str or Path or list[str or Path]) : Filepaths of your data

  • tags (list[Tag], optional) : Data Tags that will be given to data. Defaults to [].

  • source (DataSource, optional) : Source of your data.

  • max_workers (int, optional) : Number of max workers used to upload. Defaults to os.cpu_count() + 4.

  • error_manager (ErrorManager, optional) : Give an ErrorManager to retrieve the errors that occurred during upload.

  • metadata (dict or list[dict], optional) : Metadata to attach to the given data. If a list is given, its length must match the number of filepaths. Defaults to no metadata.

  • fill_metadata (bool, optional) : Whether to read the EXIF tags of the images and add them to the metadata field. Fields already given in metadata will be overridden. Defaults to False.

  • wait_for_unprocessable_data (bool, optional) : If True, this method will wait until all data are fully uploaded and processed by our services. Defaults to True.

  • upload_dir (str, optional) : This parameter can only be used with private object storages. Specify it to prefix the object name of the data. The filename will still contain a generated uuid4.

  • custom_metadata (dict or list[dict], optional) : Custom metadata to attach to the given data. If a list is given, its length must match the number of filepaths. Defaults to no custom metadata.

Returns

A Data object or a MultiData object that wraps a list of Data.


find_data

find_data(
   filename: Optional[str] = None, object_name: Optional[str] = None, id: Union[str,
   UUID, None] = None
)

Description

Find a data in this datalake.

You can find it by giving its filename, its object name, or its id.

Examples

my_data = my_datalake.find_data(filename="test.png")
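
# You can also look a data up by its object name or its id; the values below are illustrative.
my_data = my_datalake.find_data(object_name="datalake/0000-test.png")
my_data = my_datalake.find_data(id="00000000-0000-0000-0000-000000000000")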

Arguments

  • filename (str, optional) : filename of the data. Defaults to None.

  • object_name (str, optional) : object name in the S3 storage. Defaults to None.

  • id (str or UUID, optional) : id of the data. Defaults to None

Raises

If no data matches the query, a NotFoundError is raised. In some cases, an InvalidQueryError can be raised, typically when the platform stores 2 data matching the query (for example if a filename is duplicated).

Returns

The Data found


list_data

list_data(
   limit: Optional[int] = None, offset: Optional[int] = None,
   page_size: Optional[int] = None, order_by: Optional[list[str]] = None,
   tags: Union[str, Tag, list[Union[str, Tag]], None] = None,
   filenames: Optional[list[str]] = None, intersect_tags: Optional[bool] = False,
   object_names: Optional[list[str]] = None, q: Optional[str] = None,
   ids: Optional[list[Union[str, UUID]]] = None,
   custom_metadata: Optional[dict] = None
)

Description

List data of this datalake.

If there is no data, raise a NoDataError exception.

The returned object is a MultiData, an object that allows manipulation of a batch of data. You can add tags to them or feed a dataset with them.

Examples

lake = client.get_datalake()
data = lake.list_data()
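
# A sketch of common filters; the tag names and limit below are illustrative.
tag_car = lake.get_data_tag("car")
tag_huge_car = lake.get_data_tag("huge-car")
filtered = lake.list_data(
    limit=100,
    tags=[tag_car, tag_huge_car],
    intersect_tags=True,  # only data carrying both tags
)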

Arguments

  • limit (int, optional) : if given, will limit the number of data returned

  • offset (int, optional) : if given, will return data that would have been returned after this offset in the given order

  • page_size (int, optional) : deprecated.

  • order_by (list[str], optional) : if not empty, will order data by fields given in this parameter

  • filenames (list[str], optional) : if given, will return data that have a filename equal to one of the given filenames

  • object_names (list[str], optional) : if given, will return data that have an object name equal to one of the given object names

  • tags (str, Tag, list[Tag or str], optional) : if given, will return data that have at least one of the given tags by default. If intersect_tags is True, it will return data that have all the given tags

  • intersect_tags (bool, optional) : if True, and a list of tags is given, will return data that have all the given tags. Defaults to False.

  • q (str, optional) : if given, will filter data with given query. Defaults to None.

  • ids (list[str or UUID], optional) : ids of the data you're looking for. Defaults to None.

  • custom_metadata (dict, optional) : if given, will filter data on their custom metadata. Defaults to None.

Raises

  • NoDataError : Raised when the datalake has no data.

Returns

A MultiData object that wraps a list of Data.


create_data_tag

create_data_tag(
   name: str
)

Description

Create a data tag used in this datalake

Examples

tag_car = lake.create_data_tag("car")

Arguments

  • name (str) : Name of the tag to create

Returns

A Tag object


get_data_tag

get_data_tag(
   name: str
)

Description

Retrieve a data tag used in this datalake.

Examples

tag_car = lake.get_data_tag("car")

Arguments

  • name (str) : Name of the tag to retrieve

Returns

A Tag object


get_or_create_data_tag

get_or_create_data_tag(
   name: str
)

Description

Retrieve a data tag used in this datalake by its name. If the tag does not exist, create it and return it.

Examples

tag = lake.get_or_create_data_tag("new_tag")

Arguments

  • name (str) : Name of the tag to retrieve or create

Returns

A Tag object


list_data_tags

list_data_tags(
   limit: Optional[int] = None, offset: Optional[int] = None,
   order_by: Optional[list[str]] = None
)

Description

List all tags of this datalake

Examples

tags = lake.list_data_tags()
assert tag_car in tags

Arguments

  • limit (int, optional) : Limit the number of tags returned. Defaults to None.

  • offset (int, optional) : Offset to start listing tags. Defaults to None.

  • order_by (list[str], optional) : Order the tags returned by given fields. Defaults to None.

Returns

A List of Tag


create_projection

create_projection(
   data: Data, name: str, path: str, additional_info: dict = None,
   fill_metadata: bool = False
)

Description

Attach a Projection to an already existing Data. A Projection is another file that will be viewable alongside the original Data in the UI and in the annotation view (if its type is compatible with the web browser). You can add as many Projections to a Data as you want. The type of this projection will be set to 'CUSTOM'.
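
Examples

A minimal sketch of attaching a file as a projection; the data, projection name, file path, and additional_info values below are illustrative.

data = lake.find_data(filename="porsche.png")
projection = lake.create_projection(
    data=data,
    name="depth-map",
    path="porsche-depth.png",
    additional_info={"camera": "front"},
)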

Arguments

  • data (Data) : target Data.

  • name (str) : projection name.

  • path (str) : path of the file to upload

  • additional_info (dict, optional) : some data to attach to your projection. Defaults to None

  • fill_metadata (bool, optional) : if True, we will read the image and add its EXIF metadata to your projection. Defaults to False.

Returns

A DataProjection object


import_bucket_objects

import_bucket_objects(
   prefixes: list[str], tags: Optional[list[Union[str, Tag]]] = None,
   source: Union[str, DataSource, None] = None
)

Description

Asynchronously import files from your remote storage (bucket) into this Datalake. Only files with known content-types will be added.

This method takes a list of prefixes. A prefix can either be a full object name or the common prefix of a group of object names. Given tags and source will be added to all imported data. We will read the EXIF tags of your images to create metadata.

You can only call this method if you use a private object storage with this datalake, owned by your organization. Use this method carefully, as it can import your whole S3 bucket into the platform if you import, for example, "/".

If you want to import projections from your object storage, or if you want to add custom_metadata, you could instead call import_cloud_objects()

This method returns a Job object; you can call job.wait_for_done() to wait for the import to finish. As it might be a long task, this method does not call wait_for_done() itself.
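
Examples

A minimal sketch; the prefixes, tag, and source below are illustrative and should match object names in your private bucket.

lake = client.get_datalake()
tag_car = lake.get_or_create_data_tag("car")
job = lake.import_bucket_objects(
    prefixes=["images/2024/", "images/archives/truck.jpg"],
    tags=[tag_car],
    source="camera-one",
)
job.wait_for_done()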

Arguments

  • prefixes (list[str]) : list of prefixes to import

  • tags (list[str or Tag], optional) : list of tags that will be added to data

  • source (str or DataSource, optional) : data source that will be specified on data

Returns

A Job that you can wait for with wait_for_done().


import_cloud_objects

import_cloud_objects(
   cloud_objects: dict[str, Union[dict, CloudObject]]
)

Description

Asynchronously import files from your bucket into this Datalake. Only files with known content-types will be added.

This method is limited to 500 elements. If you have more elements to import, consider batching and calling this method multiple times.

The keys are the object names of the data you want to import; the values are CloudObjects that represent all the additional information that needs to be stored with your Data. CloudObject is defined in picsellia.types.schemas; it is a pydantic model, but you can also give a dict and the SDK will try to parse it.

CloudObject allows:

  • custom_metadata (dict): a dict of metadata to attach to the created Data
  • tags (list[str]): a list of tag names; the SDK will get or create each tag
  • data_source (str): the name of a data source; the SDK will get or create it

You can only call this method if you use a private object storage with this datalake, owned by your organization.

This will launch one asynchronous job; it is returned by this method and can be waited for with wait_for_done().

Examples

    from picsellia.types.schemas import CloudObject, CloudProjectionObject
    datalake = client.get_datalake()
    job = datalake.import_cloud_objects(
        cloud_objects={
            "/bucket/path/object-1.jpg": {
                "tags": ["tag-1", "tag-2"],
                "data_source": "cloud",
                "custom_metadata": {"value": 10},
            },
            "/bucket/path/object-2.jpg": CloudObject(
                tags=["tag-1"],
                data_source="cloud",
                custom_metadata={"value": 25},
            ),
        }
    )
    job.wait_for_done()
    datalake.import_cloud_projections(
        cloud_projections={
            "/bucket/path/object-1.jpg": [
                {
                    "name": "view",
                    "object_name": "/bucket/path/object-1-projection.jpg",
                }
            ],
            "/bucket/path/object-2.jpg": [
                    CloudProjectionObject(
                        name="pr1",
                        object_name="/bucket/path/object-2-projection.jpg",
                    )
            ],
        }
    )

Arguments

  • cloud_objects (dict) : dict with object names as keys and CloudObject as values

Returns

A Job that you can wait for with wait_for_done().


import_cloud_projections

import_cloud_projections(
   cloud_projections: dict[str, list[Union[dict, CloudProjectionObject]]]
)

Description

Asynchronously import files from your bucket as DataProjection into this Datalake. Only files with known content-types will be added.

This method is limited to 500 elements. If you have more elements to import, consider batching and calling this method multiple times.

The keys must be object names of Data that ALREADY exist in your Datalake. The values are lists of CloudProjectionObjects (or the corresponding dicts), each representing a DataProjection. CloudProjectionObject is defined in picsellia.types.schemas; it is a pydantic model, but you can also give a dict and the SDK will try to parse it.

CloudProjectionObject must have:

  • name (str): name of your projection
  • object_name (str): path in your bucket of your projection file.

You can only call this method if you use a private object storage with this datalake, owned by your organization.

This will launch one asynchronous job; it is returned by this method and can be waited for with wait_for_done().

Examples

    from picsellia.types.schemas import CloudObject, CloudProjectionObject
    datalake = client.get_datalake()
    datalake.import_cloud_projections(
        cloud_projections={
            "/bucket/path/object-1.jpg": [
                {
                    "name": "view",
                    "object_name": "/bucket/path/object-1-projection.jpg",
                }
            ],
            "/bucket/path/object-2.jpg": [
                CloudProjectionObject(
                    name="pr1",
                    object_name="/bucket/path/object-2-projection.jpg",
                )
            ],
        }
    )

Arguments

  • cloud_projections (dict) : dict with object names as keys and CloudProjectionObject as values

Returns

A Job that you can wait for with wait_for_done().


launch_processing

launch_processing(
   processing: Processing, data: Union[list[Data], MultiData],
   parameters: dict = None, cpu: int = None, gpu: int = None, model_version_id: UUID = None,
   target_datalake_name: str = None
)

Description

Launch the given processing onto this datalake. You can give specific cpu, gpu, or parameters. You can give a model_version_id used by the processing. Constraints defined by the processing will be checked before launching. You can give a target_datalake_name; it will create a Datalake that the processing will be able to use as output_datalake.

If not given, default values specified in the Processing will be used. If the processing cannot be launched on this Datalake, an error is raised before launching.

Examples

processing = client.get_processing("data auto tagging")
data = datalake.list_data(limit=10)
datalake.launch_processing(processing, data)
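
# A sketch with explicit resources and an output datalake; the parameter names and values
# below are illustrative, not constraints defined by the processing.
job = datalake.launch_processing(
    processing,
    data,
    parameters={"confidence_threshold": 0.5},
    cpu=4,
    gpu=1,
    target_datalake_name="auto-tagged-data",
)
job.wait_for_done()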

Returns

A Job object


embeddings_computation_status

embeddings_computation_status()

Description

Return the status of the Visual Search for this Datalake

Returns

a dict with status


visual_search

visual_search(
   data: Data, limit: int
)

Description

Return a MultiData object with data that are similar to the given Data, ordered by score. This is computed with the Visual Search feature. Each data has a temporary attribute _score if you want to access the similarity score.
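
Examples

A minimal sketch, assuming Visual Search has finished indexing this datalake; the filename is illustrative.

lake = client.get_datalake()
reference = lake.find_data(filename="porsche.png")
similar = lake.visual_search(data=reference, limit=10)
for data in similar:
    print(data._score)  # temporary similarity score attribute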

Returns

a MultiData object


text_search

text_search(
   query: str, limit: int
)

Description

Return a MultiData object with data that match your query, ordered by score. This is computed with the Visual Search feature. Each data has a temporary attribute _score if you want to access the similarity score.
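
Examples

A minimal sketch, assuming Visual Search has finished indexing this datalake; the query text is illustrative.

lake = client.get_datalake()
results = lake.text_search(query="red sports car", limit=5)
for data in results:
    print(data._score)  # temporary similarity score attribute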

Returns

a MultiData object


count_embeddings

count_embeddings()

Description

Return the number of data indexed by the Visual Search in this Datalake

Returns

number of data indexed


list_embeddings

list_embeddings(
   limit: int
)

Description

Return the list of embeddings computed for this Datalake
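
Examples

A minimal sketch based on the returned structure described below; the limit is illustrative.

lake = client.get_datalake()
embeddings = lake.list_embeddings(limit=100)
for item in embeddings:
    print(item["id"], list(item["vector"].keys()))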

Returns

A list of dicts with indexation data. Each dictionary contains:

  • id (str) : UUID of the Data
  • vector (dict) : Model-specific vector embeddings where:
    • key (str) : Embedder identifier
    • value (list) : Vector embedding as a list of floats