Datalake


get_resource_url_on_platform

Signature

get_resource_url_on_platform()

Description

Get the platform URL of this resource.

Examples

    print(lake.get_resource_url_on_platform())
    >>> "https://app.picsellia.com/datalake/62cffb84-b92c-450c-bc37-8c4dd4d0f590"

Returns

The URL of this resource on the platform.


upload_data

Signature

upload_data(
   filepaths: Union[str, List[str]], tags: Optional[List[Union[str, Tag]]] = None,
   source: Union[str, DataSource, None] = None, max_workers: Optional[int] = None
)

Description

Upload one or more files as data into this datalake.

You can optionally attach a list of tags and a source to the uploaded data.

Examples

    tag_car = client.get_data_tag("car")
    tag_huge_car = client.get_data_tag("huge-car")
    source_camera_one = client.get_datasource("camera-one")
    source_camera_two = client.get_datasource("camera-two")

    lake = client.get_datalake()
    lake.upload_data(filepaths=["porsche.png", "ferrari.png"], tags=[tag_car], source=source_camera_one)
    lake.upload_data(filepaths="truck.png", tags=[tag_huge_car], source=source_camera_two)

Arguments

  • filepaths (str or List[str]) : Path, or list of paths, of the files to upload.

  • tags (List[str or Tag], optional) : Data tags to attach to the uploaded data. Defaults to None.

  • source (str or DataSource, optional) : Source of your data. Defaults to None.

  • max_workers (int, optional) : Maximum number of workers to use for the multithreaded upload. Defaults to None.

Returns

A Data object or a MultiData object that wraps a list of Data.
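
A common need is to upload every image of a folder in one call. The sketch below is only an illustration, not part of the SDK: collect_filepaths and upload_directory are hypothetical helpers, the extension list is an arbitrary choice, and lake is assumed to be any object exposing the upload_data signature documented above.

```python
from pathlib import Path
from typing import Any, List, Optional, Sequence


def collect_filepaths(
    directory: str, extensions: Sequence[str] = (".png", ".jpg", ".jpeg")
) -> List[str]:
    """Return the sorted paths of files in `directory` whose suffix is in `extensions`."""
    allowed = {ext.lower() for ext in extensions}
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in allowed
    )


def upload_directory(
    lake: Any,
    directory: str,
    tags: Optional[list] = None,
    source: Any = None,
    max_workers: Optional[int] = None,
) -> Any:
    """Upload every matching file of `directory` with a single upload_data call."""
    filepaths = collect_filepaths(directory)
    if not filepaths:
        # Fail early instead of sending an empty list to the platform.
        raise ValueError(f"no matching files found in {directory!r}")
    return lake.upload_data(
        filepaths=filepaths, tags=tags, source=source, max_workers=max_workers
    )
```

Passing the whole list in one call leaves the threading to upload_data and its max_workers argument, rather than looping over files in user code.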


find_data

Signature

find_data(
   filename: Optional[str] = None, object_name: Optional[str] = None
)

Description

Find a data item in this datalake.

You can look it up by its filename or by its object name.

Examples

    my_data = my_datalake.find_data(filename="test.png")

Arguments

  • filename (str, optional) : Filename of the data. Defaults to None.

  • object_name (str, optional) : Object name of the data in the S3 storage. Defaults to None.

Raises

  • NotFoundError : When no data matches the query.

  • InvalidQueryError : In some cases, for example when the platform stores two data items matching the query (such as a duplicated filename).

Returns

The Data found
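
Because find_data raises instead of returning an empty value, callers that treat a missing file as a normal outcome usually wrap the call. The following is a sketch under assumptions: find_data_or_none is a hypothetical helper, and the two exception classes are local stand-ins for the NotFoundError and InvalidQueryError named above; in real code, use the classes shipped with the SDK.

```python
from typing import Any, Optional


# Local stand-ins so that the sketch runs on its own.
# Replace them with the exception classes provided by the SDK.
class NotFoundError(Exception):
    pass


class InvalidQueryError(Exception):
    pass


def find_data_or_none(lake: Any, filename: str) -> Optional[Any]:
    """Return the data matching `filename`, or None when nothing matches.

    A rejected query (InvalidQueryError) is re-raised with added context,
    since silently picking one of several matches would hide a real problem.
    """
    try:
        return lake.find_data(filename=filename)
    except NotFoundError:
        return None
    except InvalidQueryError as exc:
        raise InvalidQueryError(
            f"query for filename {filename!r} was rejected; "
            "the filename may be duplicated"
        ) from exc
```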


list_data

Signature

list_data(
   limit: Optional[int] = None, offset: Optional[int] = None,
   page_size: Optional[int] = None, order_by: Optional[List[str]] = None,
   tags: Union[str, Tag, List[Union[str, Tag]], None] = None,
   filenames: Optional[List[str]] = None
)

Description

List all data of this datalake.

Raises a NoDataError exception if there is no data.

The returned object is a MultiData, which lets you manipulate a batch of data at once:
you can add tags to them or feed a dataset with them.

Examples

    lake = client.get_datalake()
    data = lake.list_data()

Raises

  • NoDataError : When the datalake has no data.

Returns

A MultiData object that wraps a list of Data.
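
Since an empty result raises NoDataError, code that treats "no data" as a normal outcome needs to catch it. A minimal sketch, with assumptions stated: list_data_or_none is a hypothetical helper, NoDataError below is a local stand-in for the SDK exception, and whether the exception is also raised when a tag or filename filter matches nothing is not stated in this section.

```python
from typing import Any, List, Optional


class NoDataError(Exception):
    """Local stand-in for the SDK exception of the same name."""


def list_data_or_none(
    lake: Any,
    tags: Optional[List[str]] = None,
    filenames: Optional[List[str]] = None,
    limit: Optional[int] = None,
) -> Optional[Any]:
    """Return the result of list_data, or None when NoDataError is raised."""
    try:
        # Only parameters that appear in the documented signature are used.
        return lake.list_data(limit=limit, tags=tags, filenames=filenames)
    except NoDataError:
        return None
```

A caller can then write: data = list_data_or_none(lake, tags=["car"], limit=100), and test the result against None before using it.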


create_data_tag

Signature

create_data_tag(
   name: str
)

Description

Create a data tag in this datalake.

Examples

    tag_car = lake.create_data_tag("car")

Arguments

  • name (str) : Name of this tag

Returns

A Tag object


get_data_tag

Signature

get_data_tag(
   name: str
)

Description

Retrieve a data tag used in this datalake.

Examples

    tag_car = lake.get_data_tag("car")

Arguments

  • name (str) : Name of the tag you're looking for

Returns

A Tag object


get_or_create_data_tag

Signature

get_or_create_data_tag(
   name: str
)

Description

Retrieve a data tag used in this datalake by its name.
If the tag does not exist, it is created and returned.

Examples

    tag = lake.get_or_create_data_tag("new_tag")

Arguments

  • name (str) : Tag to retrieve or create

Returns

A Tag object
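
When an upload needs several tags that may or may not exist yet, get_or_create_data_tag can be called once per name. A small sketch, assuming only the method documented above: ensure_tags is a hypothetical helper, and lake is any object exposing get_or_create_data_tag.

```python
from typing import Any, Dict, Iterable


def ensure_tags(lake: Any, names: Iterable[str]) -> Dict[str, Any]:
    """Map each distinct tag name to the Tag returned by get_or_create_data_tag."""
    tags: Dict[str, Any] = {}
    for name in names:
        if name not in tags:  # skip duplicates so each tag is requested once
            tags[name] = lake.get_or_create_data_tag(name)
    return tags
```

The values of the returned dictionary can then be passed as the tags argument of upload_data, for example tags=list(ensure_tags(lake, ["car", "truck"]).values()).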


list_data_tags

Signature

list_data_tags(
   limit: Optional[int] = None, offset: Optional[int] = None,
   order_by: Optional[List[str]] = None
)

Description

List all tags of this datalake.

Examples

    tags = lake.list_data_tags()
    assert tag_car in tags

Returns

A list of Tag objects


find_all_datas

Signature

find_all_datas(
   object_names: List[str]
)

Description