Datalake

Properties


Methods

upload_data

upload_data(
   filepaths: Union[str, Path, List[Union[str, Path]]],
   tags: Optional[List[Union[str, Tag]]] = None,
   source: Union[str, DataSource, None] = None,
   max_workers: Optional[int] = None,
   error_manager: Optional[ErrorManager] = None
)

Description

Upload data into this datalake.

Upload one or more files as data to this datalake.
You can optionally attach a list of tags to the uploaded data.
You can optionally specify the source of the data.

If some files fail to upload, pass an ErrorManager to collect the failures.
The example shows how to retrieve the list of file paths that failed.

Examples

tag_car = client.get_data_tag("car")
tag_huge_car = client.get_data_tag("huge-car")
source_camera_one = client.get_datasource("camera-one")
source_camera_two = client.get_datasource("camera-two")

lake = client.get_datalake()
lake.upload_data(filepaths=["porsche.png", "ferrari.png"], tags=[tag_car], source=source_camera_one)
lake.upload_data(filepaths="truck.png", tags=[tag_huge_car], source=source_camera_two)


error_manager = ErrorManager()
lake.upload_data(filepaths=["twingo.png", "path/unknown.png"], error_manager=error_manager)

# error_manager.errors is a list of UploadError describing what went wrong
error_paths = [error.path for error in error_manager.errors]
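
The failed paths collected this way can be fed back into another upload attempt. The following is a minimal sketch, not part of the SDK: retry_failed_uploads is a hypothetical helper. It assumes only what the example above shows, namely that upload_data accepts filepaths and error_manager, and that each entry of error_manager.errors exposes a path attribute. The error manager factory is passed in as an argument, so the sketch makes no assumption about where ErrorManager is imported from.

```python
from typing import Any, Callable, List


def retry_failed_uploads(
    lake: Any,
    filepaths: List[str],
    make_error_manager: Callable[[], Any],
    max_attempts: int = 3,
) -> List[str]:
    """Upload filepaths, re-submitting only the ones that failed.

    Hypothetical helper: returns the paths that still failed after
    max_attempts tries (an empty list means everything was uploaded).
    """
    remaining = list(filepaths)
    for _ in range(max_attempts):
        if not remaining:
            break
        # Use a fresh error manager per attempt so old errors are not re-read.
        error_manager = make_error_manager()
        lake.upload_data(filepaths=remaining, error_manager=error_manager)
        remaining = [error.path for error in error_manager.errors]
    return remaining
```

A call would look like retry_failed_uploads(lake, ["twingo.png", "path/unknown.png"], ErrorManager). Retrying only helps with transient failures; a path that does not exist will fail on every attempt and end up in the returned list.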

Arguments

  • filepaths (str or Path or List[str or Path]) : Path, or list of paths, of the files to upload

  • tags (List[str or Tag], optional) : Data tags that will be attached to the uploaded data. Defaults to None, meaning no tags.

  • source (str or DataSource, optional) : Source of your data.

  • max_workers (int, optional) : Maximum number of workers used to upload. Defaults to os.cpu_count() + 4.

  • error_manager (ErrorManager, optional) : If given, upload errors are collected in it so you can retrieve them afterwards

Returns

A Data object or a MultiData object that wraps a list of Data.


find_data

find_data(
   filename: Optional[str] = None,
   object_name: Optional[str] = None,
   id: Union[str, UUID, None] = None
)

Description

Find a single data in this datalake.

You can look it up by its filename, its object name, or its id.

Examples

my_data = my_datalake.find_data(filename="test.png")

Arguments

  • filename (str, optional) : Filename of the data. Defaults to None.

  • object_name (str, optional) : Object name in the S3 storage. Defaults to None.

  • id (str or UUID, optional) : Id of the data. Defaults to None.

Raises

If no data matches the query, a NotFoundError is raised.
In some cases an InvalidQueryError can be raised instead,
possibly because the platform stores two data matching the query (for example, if the filename is duplicated).
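
When a missing data is an expected outcome rather than a failure, the lookup can be wrapped so that it returns None instead of raising. This is a sketch under stated assumptions, not an SDK function: find_data_or_none is a hypothetical helper, and the exception class is passed in by the caller because this page names NotFoundError without giving its import path.

```python
from typing import Any, Optional, Type


def find_data_or_none(
    lake: Any, not_found_error: Type[Exception], **query: Any
) -> Optional[Any]:
    """Return the Data matching query, or None if the lookup raises not_found_error.

    Hypothetical helper: any other exception (for instance one raised for
    an ambiguous query) is deliberately left to propagate.
    """
    try:
        return lake.find_data(**query)
    except not_found_error:
        return None
```

Only the not-found case is swallowed here. An ambiguous query still raises, which is usually what you want, since silently picking one of several matches could hide duplicated filenames.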

Returns

The Data found


list_data

list_data(
   limit: Optional[int] = None,
   offset: Optional[int] = None,
   page_size: Optional[int] = None,
   order_by: Optional[List[str]] = None,
   tags: Union[str, Tag, List[Union[str, Tag]], None] = None,
   filenames: Optional[List[str]] = None,
   intersect_tags: Optional[bool] = False,
   object_names: Optional[List[str]] = None
)

Description

List data of this datalake.

Raises a NoDataError exception if there is no data.

The returned object is a MultiData, an object that lets you manipulate a batch of data at once.
You can add tags to them or feed a dataset with them.

Examples

lake = client.get_datalake()
data = lake.list_data()

Arguments

  • limit (int, optional) : if given, limits the number of data returned

  • offset (int, optional) : if given, returns the data that come after this offset
    in the given order

  • page_size (int, optional) : page size used when data is returned paginated; can affect performance

  • order_by (list[str], optional) : if not empty, orders data by the fields given in this parameter

  • filenames (list[str], optional) : if given, returns data whose filename equals one of the given filenames

  • object_names (list[str], optional) : if given, returns data whose object name equals one of the given object names

  • tags (str, Tag, list[Tag or str], optional) : if given, returns data that have at least one of the given tags
    by default. If intersect_tags is True, returns only data
    that have all the given tags

  • intersect_tags (bool, optional) : if True and a list of tags is given, returns only data that have
    all the given tags. Defaults to False.
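
The difference between the default tag matching and intersect_tags=True can be illustrated in plain Python. This is only a model of the semantics described above, not the SDK's implementation: filter_by_tags is a hypothetical function, and the dictionary of filenames to tag names is made-up sample data.

```python
from typing import Dict, List, Set


def filter_by_tags(
    data_tags: Dict[str, Set[str]], tags: List[str], intersect_tags: bool = False
) -> List[str]:
    """Model of the tag filter: any given tag by default, all of them if intersect_tags."""
    wanted = set(tags)
    if intersect_tags:
        # Keep data whose tags include every requested tag.
        return [name for name, have in data_tags.items() if wanted <= have]
    # Keep data sharing at least one tag with the request.
    return [name for name, have in data_tags.items() if wanted & have]


sample = {
    "porsche.png": {"car"},
    "truck.png": {"car", "huge-car"},
    "cat.png": {"animal"},
}
print(filter_by_tags(sample, ["car", "huge-car"]))
# → ['porsche.png', 'truck.png']
print(filter_by_tags(sample, ["car", "huge-car"], intersect_tags=True))
# → ['truck.png']
```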

Raises

  • NoDataError : Raised when the datalake has no data.

Returns

A MultiData object that wraps a list of Data.


create_data_tag

create_data_tag(
   name: str
)

Description

Create a data tag that can be used in this datalake.

Examples

tag_car = lake.create_data_tag("car")

Arguments

  • name (str) : Name of this tag

Returns

A Tag object


get_data_tag

get_data_tag(
   name: str
)

Description

Retrieve a data tag used in this datalake.

Examples

tag_car = lake.get_data_tag("car")

Arguments

  • name (str) : Name of the tag you're looking for

Returns

A Tag object


get_or_create_data_tag

get_or_create_data_tag(
   name: str
)

Description

Retrieve a data tag used in this datalake by its name.
If tag does not exist, create it and return it.

Examples

tag = lake.get_or_create_data_tag("new_tag")
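
The behaviour described above is roughly equivalent to trying get_data_tag first and falling back to create_data_tag. The sketch below spells that out; it is an illustration, not the SDK's actual code. get_or_create_tag is a hypothetical name, and the exception raised for a missing tag is passed in as a parameter because this page does not say which one get_data_tag raises.

```python
from typing import Any, Type


def get_or_create_tag(lake: Any, name: str, not_found_error: Type[Exception]) -> Any:
    """Return the tag called name, creating it when the lookup fails.

    Hypothetical equivalent of get_or_create_data_tag, built from
    get_data_tag and create_data_tag.
    """
    try:
        return lake.get_data_tag(name)
    except not_found_error:
        return lake.create_data_tag(name)
```

In practice, calling get_or_create_data_tag directly is simpler; the sketch is mainly useful to see which of the two documented methods ends up being used.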

Arguments

  • name (str) : Tag to retrieve or create

Returns

A Tag object


list_data_tags

list_data_tags(
   limit: Optional[int] = None,
   offset: Optional[int] = None,
   order_by: Optional[List[str]] = None
)

Description

List all data tags of this datalake.

Examples

tags = lake.list_data_tags()
assert tag_car in tags

Returns

A list of Tag objects


find_all_datas

find_all_datas(
   object_names: List[str]
)

Description