Properties
Methods
upload_data
upload_data(
filepaths: Union[str, Path, List[Union[str, Path]]],
tags: Optional[List[Union[str, Tag]]] = None,
source: Union[str, DataSource, None] = None,
max_workers: Optional[int] = None,
error_manager: Optional[ErrorManager] = None,
metadata: Union[None, Dict, List[Dict]] = None,
fill_metadata: Optional[bool] = False
)
Description
Upload files representing data into this datalake.
You can give tags as a list, and a source for your data.
If some data fails to upload, see the example for how
to retrieve the list of file paths that failed.
For more information about metadata, check https://documentation.picsellia.com/docs/metadata
Examples
tag_car = client.get_data_tag("car")
tag_huge_car = client.get_data_tag("huge-car")
source_camera_one = client.get_datasource("camera-one")
source_camera_two = client.get_datasource("camera-two")
lake = client.get_datalake()
lake.upload_data(filepaths=["porsche.png", "ferrari.png"], tags=[tag_car], source=source_camera_one)
error_manager = ErrorManager()
lake.upload_data(filepaths=["twingo.png", "path/unknown.png"], error_manager=error_manager)
# error_manager.errors now holds a list of UploadError describing what went wrong
error_paths = [error.path for error in error_manager.errors]
Arguments
- filepaths (str or Path or List[str or Path]) : File paths of your data.
- tags (List[str or Tag], optional) : Data tags that will be given to the data. Defaults to [].
- source (str or DataSource, optional) : Source of your data.
- max_workers (int, optional) : Maximum number of workers used to upload. Defaults to os.cpu_count() + 4.
- error_manager (ErrorManager, optional) : If given, lets you retrieve the errors that occurred during the upload.
- metadata (Dict or List[Dict], optional) : Metadata to add to the given data. The length of filepaths must match this parameter. Defaults to no metadata.
- fill_metadata (bool, optional) : Whether to read the EXIF tags of each image and add them to the metadata field. Fields already given in metadata will be overridden.
Returns
A Data object or a MultiData object that wraps a list of Data.
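The error-handling flow described above can be wrapped in a small helper that returns the failed paths. This is a minimal sketch, not part of the SDK: `upload_and_collect_failures` is a hypothetical helper, and `FakeLake`, `FakeErrorManager` and `FakeUploadError` are stand-ins that only mimic the `upload_data`, `errors` and `path` names shown in the example, so the snippet runs without a platform connection.

```python
from dataclasses import dataclass, field
from typing import List


def upload_and_collect_failures(lake, filepaths: List[str], error_manager) -> List[str]:
    """Upload filepaths and return the paths reported as failed by error_manager."""
    lake.upload_data(filepaths=filepaths, error_manager=error_manager)
    return [str(error.path) for error in error_manager.errors]


# --- Hypothetical stand-ins (assumptions, not SDK classes) ---
@dataclass
class FakeUploadError:
    path: str


@dataclass
class FakeErrorManager:
    errors: List[FakeUploadError] = field(default_factory=list)


class FakeLake:
    def __init__(self, existing: List[str]):
        self.existing = set(existing)

    def upload_data(self, filepaths, error_manager=None, **kwargs):
        # Record an error for every path that does not "exist".
        for path in filepaths:
            if path not in self.existing and error_manager is not None:
                error_manager.errors.append(FakeUploadError(path))


failed = upload_and_collect_failures(
    FakeLake(existing=["twingo.png"]),
    ["twingo.png", "path/unknown.png"],
    FakeErrorManager(),
)
print(failed)
```

With a real datalake, the same helper would presumably be called with the object returned by `client.get_datalake()` and a fresh `ErrorManager()`.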
find_data
find_data(
filename: Optional[str] = None,
object_name: Optional[str] = None,
id: Union[str, UUID, None] = None
)
Description
Find a data in this datalake.
You can find it by giving its filename, its object name, or its id.
Examples
my_data = my_datalake.find_data(filename="test.png")
Arguments
- filename (str, optional) : Filename of the data. Defaults to None.
- object_name (str, optional) : Object name in the S3 storage. Defaults to None.
- id (str or UUID, optional) : Id of the data. Defaults to None.
Raises
If no data matches the query, a NotFoundError is raised.
In some cases, an InvalidQueryError can be raised,
for example when the platform stores two data matching the query (such as a duplicated filename).
Returns
The Data found
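When a missing data should not abort a script, the lookup can be wrapped so that a not-found error becomes `None`. The sketch below is an illustration under assumptions: `find_data_or_none` is a hypothetical helper, the exception class is passed in as a parameter because its import path is not given here, and `FakeLake` and `FakeNotFoundError` are stand-ins used only to make the snippet runnable.

```python
from typing import Type


def find_data_or_none(lake, not_found_error: Type[Exception], **query):
    """Return the matching data, or None when the lookup raises not_found_error."""
    try:
        return lake.find_data(**query)
    except not_found_error:
        return None


# --- Hypothetical stand-ins (assumptions, not SDK classes) ---
class FakeNotFoundError(Exception):
    pass


class FakeLake:
    def __init__(self, filenames):
        self._filenames = set(filenames)

    def find_data(self, filename=None, object_name=None, id=None):
        if filename in self._filenames:
            return {"filename": filename}
        raise FakeNotFoundError(filename)


lake = FakeLake(["test.png"])
found = find_data_or_none(lake, FakeNotFoundError, filename="test.png")
missing = find_data_or_none(lake, FakeNotFoundError, filename="absent.png")
```

Only the not-found case is swallowed; an InvalidQueryError would still propagate, which is usually what you want when a filename is ambiguous.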
list_data
list_data(
limit: Optional[int] = None, offset: Optional[int] = None,
page_size: Optional[int] = None, order_by: Optional[List[str]] = None,
tags: Union[str, Tag, List[Union[str, Tag]], None] = None,
filenames: Optional[List[str]] = None, intersect_tags: Optional[bool] = False,
object_names: Optional[List[str]] = None, q: Optional[str] = None
)
Description
List data of this datalake.
If there is no data, a NoDataError exception is raised.
The returned object is a MultiData, an object that allows manipulation of a batch of data.
You can add tags to them or feed a dataset with them.
Examples
lake = client.get_datalake()
data = lake.list_data()
Arguments
- limit (int, optional) : If given, limits the number of data returned.
- offset (int, optional) : If given, returns the data that come after this offset in the given order.
- page_size (int, optional) : Page size used when data is returned paginated; can change performance.
- order_by (list[str], optional) : If not empty, orders data by the fields given in this parameter.
- filenames (list[str], optional) : If given, returns data whose filename equals one of the given filenames.
- object_names (list[str], optional) : If given, returns data whose object name equals one of the given object names.
- tags (str, Tag, list[Tag or str], optional) : If given, returns data that have one of the given tags by default. If intersect_tags is True, returns data that have all the given tags.
- intersect_tags (bool, optional) : If True and a list of tags is given, returns data that have all the given tags. Defaults to False.
- q (str, optional) : If given, filters data with the given query. Defaults to None.
Raises
- NoDataError : Raised when the datalake has no data.
Returns
A MultiData object that wraps a list of Data.
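The difference between the default tag filter (any of the given tags) and intersect_tags=True (all of the given tags) can be illustrated with plain Python sets. This is only a model of the semantics described above, not the SDK's implementation; `filter_by_tags` and the `catalog` mapping are hypothetical.

```python
from typing import Dict, List, Set


def filter_by_tags(
    data_tags: Dict[str, Set[str]], tags: List[str], intersect_tags: bool = False
) -> List[str]:
    """Return filenames matching any tag (default) or all tags (intersect_tags=True)."""
    wanted = set(tags)
    if intersect_tags:
        # Every requested tag must be present on the data.
        matches = [name for name, have in data_tags.items() if wanted <= have]
    else:
        # At least one requested tag must be present on the data.
        matches = [name for name, have in data_tags.items() if wanted & have]
    return sorted(matches)


# Hypothetical in-memory catalog: filename -> set of tag names.
catalog = {
    "porsche.png": {"car"},
    "hummer.png": {"car", "huge-car"},
    "boat.png": {"boat"},
}

any_match = filter_by_tags(catalog, ["car", "huge-car"])
all_match = filter_by_tags(catalog, ["car", "huge-car"], intersect_tags=True)
print(any_match)  # ['hummer.png', 'porsche.png']
print(all_match)  # ['hummer.png']
```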
create_data_tag
create_data_tag(
name: str
)
Description
Create a data tag used in this datalake
Examples
tag_car = lake.create_data_tag("car")
Arguments
- name (str) : Name of this tag
Returns
A Tag object
get_data_tag
get_data_tag(
name: str
)
Description
Retrieve a data tag used in this datalake.
Examples
tag_car = lake.get_data_tag("car")
Arguments
- name (str) : Name of the tag you're looking for
Returns
A Tag object
get_or_create_data_tag
get_or_create_data_tag(
name: str
)
Description
Retrieve a data tag used in this datalake by its name.
If tag does not exist, create it and return it.
Examples
tag = lake.get_or_create_data_tag("new_tag")
Arguments
- name (str) : Tag to retrieve or create
Returns
A Tag object
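The retrieve-then-create behaviour described above can be pictured as a combination of get_data_tag and create_data_tag. The sketch below is an illustration of that pattern, not the SDK's actual code: `get_or_create` is a hypothetical helper, the exception raised for a missing tag is an assumption passed in as a parameter, and `FakeLake` and `FakeNotFoundError` are in-memory stand-ins.

```python
from typing import Type


def get_or_create(lake, name: str, not_found_error: Type[Exception]):
    """Return the tag called name, creating it when the lookup fails."""
    try:
        return lake.get_data_tag(name)
    except not_found_error:
        return lake.create_data_tag(name)


# --- Hypothetical stand-ins (assumptions, not SDK classes) ---
class FakeNotFoundError(Exception):
    pass


class FakeLake:
    def __init__(self):
        self.tags = {}
        self.created = 0

    def get_data_tag(self, name):
        if name not in self.tags:
            raise FakeNotFoundError(name)
        return self.tags[name]

    def create_data_tag(self, name):
        self.created += 1
        self.tags[name] = {"name": name}
        return self.tags[name]


lake = FakeLake()
first = get_or_create(lake, "new_tag", FakeNotFoundError)
second = get_or_create(lake, "new_tag", FakeNotFoundError)
```

Calling the helper twice with the same name creates the tag once and returns the same object the second time, which is the behaviour get_or_create_data_tag is documented to provide in a single call.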
list_data_tags
list_data_tags(
limit: Optional[int] = None, offset: Optional[int] = None,
order_by: Optional[List[str]] = None
)
Description
List all tags of this datalake
Examples
tags = lake.list_data_tags()
assert tag_car in tags
Returns
A List of Tag
find_all_datas
find_all_datas(
object_names: List[str]
)
Description