Properties
Methods
upload_data
upload_data(
filepaths: Union[str, Path, list[Union[str, Path]]],
tags: Optional[list[Union[str, Tag]]] = None,
source: Union[str, DataSource, None] = None,
max_workers: Optional[int] = None,
error_manager: Optional[ErrorManager] = None,
metadata: Union[None, dict, list[dict]] = None,
fill_metadata: Optional[bool] = False,
wait_for_unprocessable_data: Optional[bool] = True
)
Description
Upload files representing data into this datalake.
You can attach tags, given as a list, and a source to the uploaded data.
If some data fails to upload, check the example to see how
to retrieve the list of file paths that failed.
For more information about metadata, check https://documentation.picsellia.com/docs/metadata
Examples
from picsellia.services.error_manager import ErrorManager
source_camera_one = client.get_datasource("camera-one")
source_camera_two = client.get_datasource("camera-two")
lake = client.get_datalake()
tag_car = lake.get_data_tag("car")
tag_huge_car = lake.get_data_tag("huge-car")
lake.upload_data(filepaths=["porsche.png", "ferrari.png"], tags=[tag_car], source=source_camera_one)
error_manager = ErrorManager()
lake.upload_data(filepaths=["twingo.png", "path/unknown.png"], error_manager=error_manager)
# error_manager.errors holds a list of UploadError describing what went wrong
error_paths = [error.path for error in error_manager.errors]
Arguments
-
filepaths (str or Path or list[str or Path]) : Filepaths of your data
-
tags (list[Tag], optional) : Data Tags that will be given to data. Defaults to [].
-
source (DataSource, optional) : Source of your data.
-
max_workers (int, optional) : Number of max workers used to upload. Defaults to os.cpu_count() + 4.
-
error_manager (ErrorManager, optional) : Giving an ErrorManager will allow you to retrieve errors
-
metadata (Dict or list[Dict], optional) : Metadata to add to the given data. The number of filepaths must match
the length of this parameter. Defaults to no metadata.
-
fill_metadata (bool, optional) : Whether to read the EXIF tags of each image and add them to the metadata field.
If some fields are already given in metadata, they will be overridden. Defaults to False.
-
wait_for_unprocessable_data (bool, optional) : If True, this method will wait for all data to be fully
uploaded and processed by our services. Defaults to True.
Returns
A Data object or a MultiData object that wraps a list of Data.
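The metadata and error_manager arguments can be prepared and read with small helpers. The sketch below is only an illustration: build_metadata and failed_paths are hypothetical helper names, and the "campaign" and "original_name" metadata keys are made up. The only SDK detail relied upon is that error_manager.errors is a list of objects exposing a path attribute, as in the example above.

```python
from pathlib import Path
from typing import Optional, Union


def build_metadata(filepaths: list[Union[str, Path]], shared: Optional[dict] = None) -> list[dict]:
    """Return one metadata dict per file path, in the same order as `filepaths`."""
    shared = shared or {}
    # Build a fresh dict per file so that editing one entry never affects another.
    # "original_name" is a hypothetical key used for illustration only.
    return [{**shared, "original_name": Path(p).name} for p in filepaths]


def failed_paths(error_manager) -> list:
    """Collect the paths stored on `error_manager.errors` after an upload."""
    return [error.path for error in error_manager.errors]


filepaths = ["porsche.png", "photos/ferrari.png"]
metadata = build_metadata(filepaths, shared={"campaign": "spring"})

# The metadata list has exactly one entry per file path.
assert len(metadata) == len(filepaths)

# With a real datalake and ErrorManager, the call might then look like:
# lake.upload_data(filepaths=filepaths, metadata=metadata, error_manager=error_manager)
# retry_candidates = failed_paths(error_manager)
```

Building the list from filepaths itself keeps the two arguments the same length by construction.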
find_data
find_data(
filename: Optional[str] = None,
object_name: Optional[str] = None,
id: Union[str, UUID, None] = None
)
Description
Find a data in this datalake.
You can find it by giving its filename, its object name, or its id.
Examples
my_data = my_datalake.find_data(filename="test.png")
Arguments
-
filename (str, optional) : filename of the data. Defaults to None.
-
object_name (str, optional) : object name in the S3 storage. Defaults to None.
-
id (str or UUID, optional) : id of the data. Defaults to None.
Raises
If no data matches the query, a NotFoundError is raised.
In some cases, an InvalidQueryError can be raised,
possibly because the platform stores 2 data matching this query (for example, if a filename is duplicated).
Returns
The Data found
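The two failure modes described above (no match, or more than one match) can be pictured with a small local model. This is a sketch under stated assumptions, not the SDK's implementation: find_one and the records list are hypothetical, and the two exception classes are local stand-ins that only share their names with the errors mentioned in this section.

```python
from typing import Optional


class NotFoundError(Exception):
    """Local stand-in for the error raised when nothing matches."""


class InvalidQueryError(Exception):
    """Local stand-in for the error raised when the query is ambiguous."""


def find_one(records: list[dict], filename: Optional[str] = None,
             object_name: Optional[str] = None, id: Optional[str] = None) -> dict:
    """Return the single record matching every criterion that was provided."""
    given = {"filename": filename, "object_name": object_name, "id": id}
    # Ignore criteria left at None.
    criteria = {key: value for key, value in given.items() if value is not None}
    matches = [r for r in records if all(r.get(k) == v for k, v in criteria.items())]
    if not matches:
        raise NotFoundError(f"no data matches {criteria}")
    if len(matches) > 1:
        raise InvalidQueryError(f"{len(matches)} data match {criteria}")
    return matches[0]


# Hypothetical records: two of them share the same filename.
records = [
    {"id": "a1", "filename": "test.png", "object_name": "lake/a1/test.png"},
    {"id": "b2", "filename": "dup.png", "object_name": "lake/b2/dup.png"},
    {"id": "c3", "filename": "dup.png", "object_name": "lake/c3/dup.png"},
]
found = find_one(records, filename="test.png")
```

In this model, querying by a duplicated filename is ambiguous, while adding the id narrows the query back to a single record.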
list_data
list_data(
limit: Optional[int] = None, offset: Optional[int] = None,
page_size: Optional[int] = None, order_by: Optional[list[str]] = None,
tags: Union[str, Tag, list[Union[str, Tag]], None] = None,
filenames: Optional[list[str]] = None, intersect_tags: Optional[bool] = False,
object_names: Optional[list[str]] = None, q: Optional[str] = None,
ids: Optional[list[Union[str, UUID]]] = None
)
Description
List data of this datalake.
If there is no data, a NoDataError exception is raised.
The returned object is a MultiData, an object that allows manipulation of a set of data.
You can add tags to them or feed a dataset with them.
Examples
lake = client.get_datalake()
data = lake.list_data()
Arguments
-
limit (int, optional) : if given, will limit the number of data returned
-
offset (int, optional) : if given, will return data that would have been returned
after this offset in given order
-
page_size (int, optional) : page size when returning data paginated, can change performance
-
order_by (list[str], optional) : if not empty, will order data by fields given in this parameter
-
filenames (list[str], optional) : if given, will return data whose filename equals one of the given filenames
-
object_names (list[str], optional) : if given, will return data whose object name equals one of the given object names
-
tags (str, Tag, list[Tag or str], optional) : if given, will return data that have one of given tags
by default. If intersect_tags is True, it will return data
that have all the given tags
-
intersect_tags (bool, optional) : if True, and a list of tags is given, will return data that have
all the given tags. Defaults to False.
-
q (str, optional) : if given, will filter data with given query. Defaults to None.
-
ids (list[str or UUID], optional) : ids of the data you're looking for. Defaults to None.
Raises
- NoDataError : When datalake has no data, raise this exception.
Returns
A MultiData object that wraps a list of Data.
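The difference between the default tag filter and intersect_tags=True can be illustrated with plain Python sets. This is a local model of the semantics described in the arguments above, not the platform's query code; matches_tags and the catalog mapping are hypothetical names.

```python
def matches_tags(data_tags: set[str], query_tags: list[str], intersect_tags: bool = False) -> bool:
    """Tell whether a data item carrying `data_tags` would pass the tag filter."""
    wanted = set(query_tags)
    if intersect_tags:
        # Every requested tag must be present on the data.
        return wanted <= data_tags
    # Default behaviour: at least one requested tag is present.
    return bool(wanted & data_tags)


# Hypothetical catalog mapping filenames to their tag names.
catalog = {
    "porsche.png": {"car"},
    "monster-truck.png": {"car", "huge-car"},
    "bike.png": {"bike"},
}
query = ["car", "huge-car"]

any_match = sorted(name for name, tags in catalog.items() if matches_tags(tags, query))
all_match = sorted(
    name for name, tags in catalog.items() if matches_tags(tags, query, intersect_tags=True)
)
```

Here any_match contains both car images, while all_match keeps only the one that carries both tags.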
create_data_tag
create_data_tag(
name: str
)
Description
Create a data tag used in this datalake
Examples
tag_car = lake.create_data_tag("car")
Arguments
- name (str) : Name of the tag to create
Returns
A Tag object
get_data_tag
get_data_tag(
name: str
)
Description
Retrieve a data tag used in this datalake.
Examples
tag_car = lake.get_data_tag("car")
Arguments
- name (str) : Name of the tag to retrieve
Returns
A Tag object
get_or_create_data_tag
get_or_create_data_tag(
name: str
)
Description
Retrieve a data tag used in this datalake by its name.
If tag does not exist, create it and return it.
Examples
tag = lake.get_or_create_data_tag("new_tag")
Arguments
- name (str) : Name of the tag to retrieve or create
Returns
A Tag object
list_data_tags
list_data_tags(
limit: Optional[int] = None, offset: Optional[int] = None,
order_by: Optional[list[str]] = None
)
Description
List all tags of this datalake
Examples
tags = lake.list_data_tags()
assert tag_car in tags
Arguments
-
limit (int, optional) : Limit the number of tags returned. Defaults to None.
-
offset (int, optional) : Offset to start listing tags. Defaults to None.
-
order_by (list[str], optional) : Order the tags returned by given fields. Defaults to None.
Returns
A list of Tag objects
find_all_datas
find_all_datas(
object_names: list[str]
)
Description
create_projection
create_projection(
data: Data, name: str, path: str, additional_info: dict = None,
fill_metadata: bool = False
)
Description
import_bucket_objects
import_bucket_objects(
prefixes: list[str], tags: Optional[list[Union[str, Tag]]] = None,
source: Union[str, DataSource, None] = None
)
Description
Asynchronously import objects from your bucket whose object names begin with the given prefixes.
Arguments
-
prefixes : list of prefixes to import
-
tags : list of tags that will be added to data
-
source : data source that will be specified on data
Returns
A Job that you can wait on until it is done.
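To preview which objects a given list of prefixes would cover, the prefix rule can be reproduced locally. The sketch below assumes plain string-prefix matching on object names, as the description suggests; select_by_prefixes and the bucket listing are hypothetical and no SDK call is made.

```python
def select_by_prefixes(object_names: list[str], prefixes: list[str]) -> list[str]:
    """Keep the object names that begin with at least one of the prefixes."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    prefix_tuple = tuple(prefixes)
    return [name for name in object_names if name.startswith(prefix_tuple)]


# Hypothetical bucket listing.
bucket = ["raw/2024/a.png", "raw/2023/b.png", "processed/a.png", "rawdata.csv"]

selected = select_by_prefixes(bucket, ["raw/2024/", "processed/"])
# A prefix without a trailing slash also matches sibling names such as "rawdata.csv".
loose = select_by_prefixes(bucket, ["raw"])
```

Ending a prefix with "/" is a simple way to restrict the selection to one folder-like level of the bucket.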