DatasetVersion

Properties


Methods

add_tags

add_tags(
   tags: Union[Tag, List[Tag]]
)

Description

Add some tags to an object.
It can be used on Data/MultiData/Asset/MultiAsset/DatasetVersion/Dataset/Model/ModelVersion.

You can pass a single Tag or a list of Tags.

Examples

tag_bicycle = client.create_tag("bicycle", Target.DATA)
tag_car = client.create_tag("car", Target.DATA)
tag_truck = client.create_tag("truck", Target.DATA)

data.add_tags(tag_bicycle)
data.add_tags([tag_car, tag_truck])

get_tags

get_tags()

Description

Retrieve tags of your dataset version.

Examples

tags = foo_dataset_version.get_tags()
assert tags[0].name == "training-dataset"

Returns

List of Tag objects


add_data

add_data(
   data: Union[Data, List[Data], MultiData], tags: Optional[List[Union[str,
   Tag]]] = None
)

Description

Feed this version with data coming from a datalake.

A versioned dataset (DatasetVersion) takes Data from a Datalake and transforms it into annotatable Assets.
You can pass tags that will be added as asset tags to every created asset.

Examples

foo_dataset = client.create_dataset('foo_dataset')
foo_dataset_version_1 = foo_dataset.create_version('first')
some_data = client.get_datalake().list_data(limit=1000)
foo_dataset_version_1.add_data(some_data)

Arguments

  • data (Data, List[Data] or MultiData) : data to add to dataset

  • tags (List of str or Tag) : tags to add to every asset created


fork

fork(
   version: str, description: Optional[str] = None, assets: Union[List[Asset],
   MultiAsset, Asset, None] = None, type: Union[InferenceType,
   str] = InferenceType.NOT_CONFIGURED, with_tags: bool = False
)

Description

Fork this dataset version into another dataset version with the same origin.

This creates a new dataset version with the same origin and the given version name.
You can give a description and a default type.
You can give a list of assets from this dataset version to add to the new dataset version;
in that case, only these assets will be added.
If with_tags is True, the tags of each asset will be transferred to the new dataset version.

Examples

foo_dataset_version = client.get_dataset('my_datatest').get_version('first')
assets = foo_dataset_version.list_assets(limit=100)
bar_dataset_version = foo_dataset_version.fork('second', assets=assets)

Arguments

  • version (str) : new version name

  • description (str) : description, defaults to "Forked from version '<version_name>'"

  • assets (List[Asset], MultiAsset or Asset) : assets to add to the new dataset version, defaults to all assets

  • type (InferenceType) : inference type of the new dataset version, defaults to NOT_CONFIGURED

  • with_tags (bool) : if true tags of assets will be added to the new dataset version, defaults to false

Returns

A DatasetVersion with given assets


find_asset

find_asset(
   data: Optional[Data] = None, filename: Optional[str] = None,
   object_name: Optional[str] = None
)

Description

Find an asset in this dataset version.

You can find it by giving its Data object, its filename or its object name.

Examples

my_asset = my_dataset_version.find_asset(filename="test.png")

Arguments

  • data (Data, optional) : data linked to asset. Defaults to None.

  • filename (str, optional) : filename of the asset. Defaults to None.

  • object_name (str, optional) : object name in S3 storage. Defaults to None.

Raises

If no asset matches the query, a NotFoundError is raised.
In some cases, an InvalidQueryError can be raised,
for example when the platform stores two assets matching this query (such as a duplicated filename).

Returns

The Asset found
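
The lookup rules above (exactly one match expected, an error when nothing matches, another error when the query is ambiguous) can be illustrated with a plain-Python sketch. This is not the SDK's implementation: the two exception classes are local stand-ins that only borrow the names used above, and the dict-based asset records and the find_one helper are hypothetical.

```python
from typing import Dict, List, Optional


class NotFoundError(Exception):
    """Local stand-in: no asset matches the query."""


class InvalidQueryError(Exception):
    """Local stand-in: the query matches more than one asset."""


def find_one(
    assets: List[Dict[str, str]],
    filename: Optional[str] = None,
    object_name: Optional[str] = None,
) -> Dict[str, str]:
    # Keep the assets that satisfy every criterion that was given.
    matches = [
        asset
        for asset in assets
        if (filename is None or asset["filename"] == filename)
        and (object_name is None or asset["object_name"] == object_name)
    ]
    if not matches:
        raise NotFoundError("no asset matches this query")
    if len(matches) > 1:
        raise InvalidQueryError("several assets match this query")
    return matches[0]
```

With a duplicated filename, adding object_name to the query is one way to make the lookup unambiguous again.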


find_all_assets

find_all_assets(
   filenames: Optional[List[str]] = None, object_names: Optional[List[str]] = None
)

Description

Find some assets of this dataset version from their filenames or object names.

Examples

my_assets = my_dataset_version.find_all_assets(filenames=["test.png", "image2.jpg"])

Arguments

  • filenames (List[str]) : filenames of the assets you're looking for. Defaults to None.

  • object_names (List[str]) : object names of the assets you're looking for. Defaults to None.

Returns

A list of Asset found


list_assets

list_assets(
   limit: Optional[int] = None, offset: Optional[int] = None,
   page_size: Optional[int] = None, order_by: Optional[List[str]] = None,
   tags: Union[Tag, List[Tag], str, List[str], None] = None, data_tags: Union[Tag,
   List[Tag], str, List[str], None] = None, intersect_tags: bool = False,
   intersect_data_tags: bool = False
)

Description

List assets of this dataset version.

It will retrieve all asset objects of this dataset version.
You will then be able to manipulate them or add them to another dataset.

Examples

assets = foo_dataset_version.list_assets()

Arguments

  • limit (int) : limit to a certain number of assets

  • offset (int) : offset to access some new objects, if 0 will retrieve starting from the beginning

  • page_size (int) : page size when paginating.

  • order_by (List[str]) : a list of strings to use for sorting data, if None will not sort

  • tags (str, Tag, list[Tag or str], optional) : if given, will return assets that have at least one of the given tags
    by default. If intersect_tags is True,
    it will return assets that have all the given tags

  • intersect_tags (bool, optional) : if True, and a list of tags is given, will return assets that have
    all the given tags. Defaults to False.

  • data_tags (str, Tag, list[Tag or str], optional) : if given, will return assets that have at least one of the given
    data tags by default. If intersect_data_tags is True,
    it will return assets that have all the given data tags

  • intersect_data_tags (bool, optional) : if True, and a list of data tags is given, will return assets that have
    all the given data tags. Defaults to False.

Returns

A MultiAsset object that wraps some Asset that you can manipulate.
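
The difference between the default tag filter (an asset matches if it carries any of the given tags) and intersect_tags=True (an asset must carry all of them) can be shown with sets. The sketch below is plain Python and only mirrors the documented behaviour; the filter_by_tags helper and the filename-to-tags mapping are hypothetical and not part of the SDK.

```python
from typing import Dict, Iterable, List, Set


def filter_by_tags(
    asset_tags: Dict[str, Set[str]],
    tags: Iterable[str],
    intersect: bool = False,
) -> List[str]:
    wanted = set(tags)
    if not wanted:
        # No tag filter: every asset is returned.
        return sorted(asset_tags)
    if intersect:
        # Keep assets carrying all of the wanted tags.
        return sorted(name for name, owned in asset_tags.items() if wanted <= owned)
    # Default: keep assets carrying at least one of the wanted tags.
    return sorted(name for name, owned in asset_tags.items() if wanted & owned)
```

The same reasoning applies to data_tags and intersect_data_tags, which filter on the tags of the underlying data instead of the asset tags.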


delete

delete()

Description

Delete a dataset version.

:warning: DANGER ZONE: Be very careful here!

It will remove this dataset version from our database, all of its assets and annotations will be removed.

Examples

foo_dataset_version.delete()

set_type

set_type(
   type: Union[str, InferenceType]
)

Description

Set the type of this dataset version.

Examples

dataset.set_type('detection')

update

update(
   version: Optional[str] = None, description: Optional[str] = None, type: Union[str,
   InferenceType, None] = None
)

Description

Update version, description and type of a dataset version.

Examples

dataset.update(description='My favourite dataset')

download

download(
   target_path: Union[str, Path, None] = None, force_replace: bool = False,
   max_workers: Optional[int] = None
)

Description

Download assets of a dataset version.

It will download all assets of the dataset version into the specified folder.
If target_path is None, it will download into ./<dataset_name>/<dataset_version>.
You can specify the number of threads to use while downloading.

Examples

foo_dataset.download('~/Downloads/dataset_pics')

Arguments

  • target_path (str or Path, optional) : Target folder. Defaults to None.

  • force_replace (bool, optional) : Replace an existing file if exists. Defaults to False.

  • max_workers (int, optional) : Number of max workers used to download. Defaults to os.cpu_count() + 4.
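
The two documented defaults (the ./<dataset_name>/<dataset_version> folder and os.cpu_count() + 4 workers) can be spelled out as a small sketch. Both helper names are hypothetical and this is not the SDK's code; in particular, falling back to 1 when os.cpu_count() returns None is an assumption made here so the sketch always runs.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_target_path(
    target_path: Union[str, Path, None],
    dataset_name: str,
    version_name: str,
) -> Path:
    # Documented default: ./<dataset_name>/<dataset_version>
    if target_path is None:
        return Path(".") / dataset_name / version_name
    return Path(target_path)


def resolve_max_workers(max_workers: Optional[int] = None) -> int:
    # Documented default: os.cpu_count() + 4.
    # Assumption: use 1 when the CPU count cannot be determined.
    if max_workers is None:
        return (os.cpu_count() or 1) + 4
    return max_workers
```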


list_labels

list_labels()

Description

Get all labels of a dataset version.

It will retrieve a list of Label objects.

Examples

foo_dataset.create_label("today")
labels = foo_dataset.list_labels()
assert labels[0].name == "today"

Returns

List of Label


create_label

create_label(
   name: str
)

Description

Add label to a dataset version.

You have to give a name to the label.

Examples

foo_dataset.create_label("today")

Arguments

  • name (str) : label name to add

Returns

A Label object


get_label

get_label(
   name: str
)

Description

Find label in a dataset version.

Examples

label = foo_dataset.get_label("today")

Arguments

  • name (str) : label name to find

Returns

A Label object


get_or_create_label

get_or_create_label(
   name: str
)

Description

Retrieve a label used in this dataset by its name.
If label does not exist, create it and return it.

Examples

label = foo_dataset.get_or_create_label("new_label")

Arguments

  • name (str) : label to retrieve or create

Returns

A Label object
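
The get-or-create behaviour described above (return the existing label, otherwise create it) is a common pattern that can be sketched without the SDK. The LabelStore class below is a hypothetical in-memory stand-in, and using LookupError for a missing label is an assumption of this sketch, not the SDK's error type.

```python
from typing import Dict, List


class LabelStore:
    """Hypothetical in-memory stand-in for the labels of a dataset version."""

    def __init__(self) -> None:
        self._labels: Dict[str, Dict[str, str]] = {}

    def create_label(self, name: str) -> Dict[str, str]:
        label = {"name": name}
        self._labels[name] = label
        return label

    def get_label(self, name: str) -> Dict[str, str]:
        try:
            return self._labels[name]
        except KeyError:
            raise LookupError(f"label {name!r} does not exist") from None

    def get_or_create_label(self, name: str) -> Dict[str, str]:
        # Try to retrieve first; create only when the label is missing.
        try:
            return self.get_label(name)
        except LookupError:
            return self.create_label(name)

    def list_labels(self) -> List[Dict[str, str]]:
        return list(self._labels.values())
```

Calling get_or_create_label twice with the same name returns the same label and does not create a duplicate.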


list_annotations

list_annotations(
   worker: Optional[Worker] = None, status: Union[AnnotationStatus, str, None] = None,
   limit: Optional[int] = None, offset: Optional[int] = None,
   order_by: Optional[List[str]] = None, page_size: Optional[int] = None
)

Description

Retrieve annotations of a dataset.

Examples

annotations = foo_dataset.list_annotations()

Arguments

  • limit (Optional[int], optional) : Limit number of annotations to retrieve.
    Defaults to None, all annotations will be retrieved.

  • offset (Optional[int], optional) : Offset to begin with when listing annotations.
    Defaults to None, starting at 0.

  • page_size (Optional[int], optional) : Size of each page when retrieving annotations.
    Defaults to None, page size will be equal to the default pagination.

  • order_by (Optional[List[str]], optional) : Order annotation by some criteria.
    Defaults to None.

  • worker (Optional[Worker], optional) : Worker filter.
    Defaults to None.

  • status (Optional[AnnotationStatus], optional) : Status of annotations to retrieve.
    Defaults to None.

Raises

  • NoDataError : When no annotations are retrieved

Returns

A MultiAnnotation object
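
How limit, offset and page_size interact is easier to see in code. The sketch below is a generic pagination loop, not the SDK's implementation: fetch_page is a hypothetical callable standing in for one request, and the fallback page size of 100 is an assumption, since the platform's default pagination is not stated here.

```python
from typing import Callable, List, Optional


def paginate(
    fetch_page: Callable[[int, int], List[int]],
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    page_size: Optional[int] = None,
) -> List[int]:
    start = offset or 0      # offset=None starts at 0
    size = page_size or 100  # assumed default page size
    items: List[int] = []
    while limit is None or len(items) < limit:
        # Never ask for more than what is still needed to reach the limit.
        want = size if limit is None else min(size, limit - len(items))
        page = fetch_page(start, want)
        items.extend(page)
        if len(page) < want:
            break  # a short page means there is nothing left to fetch
        start += len(page)
    return items
```

With limit=None the loop stops only when a short page is returned, which matches "all annotations will be retrieved".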


load_annotations

load_annotations(
   worker: Optional[Worker] = None, status: Optional[AnnotationStatus] = None,
   chunk_size: int = 1000, max_workers: Optional[int] = None, skip_error: bool = False
)

Description

Load the annotations of this dataset version by retrieving shapes with labels, asset_id and worker_id.

Examples

dict_annotations = foo_dataset.load_annotations()

Arguments

  • worker (Optional[Worker], optional) : Worker filter. Defaults to None.

  • status (Optional[AnnotationStatus], optional) : Status of annotations to retrieve. Defaults to None.

  • chunk_size (int, optional) : Size of chunk of annotations to load by request. Defaults to 1000.

  • max_workers (int, optional) : Number of max workers used to load annotations. Defaults to os.cpu_count() + 4.

  • skip_error (bool, optional) : Skip errors of a chunk and return partial annotations. Defaults to False.
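
The roles of chunk_size and skip_error can be illustrated with a simplified, sequential sketch. It is not the SDK's implementation (which, according to max_workers, runs requests on several workers); fetch_chunk is a hypothetical callable standing in for one request, and the dict keyed by annotation id is an assumed result shape.

```python
from typing import Callable, Dict, List


def load_in_chunks(
    ids: List[str],
    fetch_chunk: Callable[[List[str]], Dict[str, dict]],
    chunk_size: int = 1000,
    skip_error: bool = False,
) -> Dict[str, dict]:
    results: Dict[str, dict] = {}
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        try:
            results.update(fetch_chunk(chunk))
        except Exception:
            # skip_error=True drops the failing chunk and keeps going,
            # so the caller receives partial results.
            if not skip_error:
                raise
    return results
```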


export_annotation_file

export_annotation_file(
   annotation_file_type: Union[AnnotationFileType, str], target_path: Union[str,
   Path] = './', assets: Union[MultiAsset, List[Asset], None] = None,
   worker: Optional[Worker] = None, status: Union[AnnotationStatus, str, None] = None,
   force_replace: bool = True
)

Description

Export annotations of this dataset version into a file, and download it.

If you give the 'worker' argument, only annotations of this worker will be retrieved, if they exist.
If you don't give 'worker', only the last created annotation and its shapes will be exported.

Examples

dataset_v0.export_annotation_file(AnnotationFileType.COCO, "./")

Arguments

  • annotation_file_type (AnnotationFileType) : choose to export in Pascal VOC format or COCO format.

  • target_path (str or Path, optional) : directory path where file is downloaded. Defaults to current directory.

  • assets (Union[MultiAsset, List[Asset], None], optional) : a list of assets of this dataset version.
    Only these assets will be concerned by this export. Defaults to None.

  • worker (Worker, optional) : worker of annotations. Defaults to None.

  • status (AnnotationStatus, optional) : status of annotations. Defaults to None.

  • force_replace (bool, optional) : if true, will replace an existing file annotation. Defaults to True.

Returns

Path of downloaded file.


import_annotation_voc_file

import_annotation_voc_file(
   file_path: Union[str, Path], worker: Optional[Worker] = None,
   mode: Union[ImportAnnotationMode, str] = ImportAnnotationMode.REPLACE,
   force_create_label: bool = True
)

Description

Read a Pascal VOC file, parse it and create annotations and shapes for one given asset.

Examples

dataset_v0.import_annotation_voc_file(file_path="voc.xml")

Arguments

  • file_path (str or Path) : Path of file to import

  • worker (Worker, optional) : Worker to use. Defaults to current user.

  • mode (ImportAnnotationMode, optional) : Mode used to import.
    REPLACE will delete worker annotation if exists and replace it.
    CONCATENATE will create shapes on existing annotation.
    SKIP will do nothing on existing annotation.
    Defaults to ImportAnnotationMode.REPLACE.

  • force_create_label (bool) : Ensures labels are created if they don't exist. Defaults to True.

Raises

  • FileNotFoundException : if file is not found
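
The three import modes can be summarized by what happens to the shapes of an annotation that already exists. The sketch below is plain Python and only restates the documented semantics: the Mode enum is a local stand-in for ImportAnnotationMode, shapes are represented as strings, and creating the annotation in every mode when none exists yet is an assumption of this sketch.

```python
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    """Local stand-in for ImportAnnotationMode."""
    REPLACE = "replace"
    CONCATENATE = "concatenate"
    SKIP = "skip"


def merge_shapes(
    existing: Optional[List[str]],
    imported: List[str],
    mode: Mode = Mode.REPLACE,
) -> List[str]:
    if existing is None:
        # Assumption: with no existing annotation, every mode creates one.
        return list(imported)
    if mode is Mode.REPLACE:
        # The existing annotation is deleted and replaced.
        return list(imported)
    if mode is Mode.CONCATENATE:
        # Imported shapes are added to the existing annotation.
        return list(existing) + list(imported)
    # SKIP: the existing annotation is left untouched.
    return list(existing)
```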

import_annotations_coco_file

import_annotations_coco_file(
   file_path: Union[Path, str], worker: Optional[Worker] = None,
   mode: Union[ImportAnnotationMode, str] = ImportAnnotationMode.REPLACE,
   force_create_label: bool = True, fail_on_asset_not_found: bool = True
)

Description

Read a COCO file, parse it and create annotations and shapes for the given assets.

Examples

dataset_v0.import_annotations_coco_file(file_path="coco.json")

Arguments

  • file_path (str or Path) : Path of file to import

  • worker (Worker, optional) : Worker to use. Defaults to current user.

  • mode (ImportAnnotationMode, optional) : Mode used to import.
    REPLACE will delete worker annotation if exists and replace it.
    CONCATENATE will create shapes on existing annotation.
    SKIP will do nothing on existing annotation.
    Defaults to ImportAnnotationMode.REPLACE.

  • force_create_label (bool) : Ensures labels are created if they don't exist. Defaults to True.

  • fail_on_asset_not_found (bool) : Raise an error if asset is not found. Defaults to True.

Raises

  • FileNotFoundException : if file is not found

Returns

(List[Tuple[Asset, Optional[Annotation]]]) : A list with tuples of Asset with non-skipped Annotation


delete_all_annotations

delete_all_annotations(
   workers: Optional[List[Worker]] = None
)

Description

Delete all annotations of this dataset version.

:warning: DANGER ZONE: Be very careful here!

It will remove all annotations of every asset of this dataset version.
You can give a list of workers to restrict the deletion to their annotations.

Examples

foo_dataset.delete_all_annotations()

synchronize

synchronize(
   target_dir: str, do_download: bool = False
)

Description

Synchronize this dataset version with a target directory by comparing the assets in the directory with the assets uploaded to the dataset version.

Examples

foo_dataset.synchronize('./foo_dataset/first')

Arguments

  • target_dir (str) : directory to synchronize against

  • do_download (bool) : download files that are not in the local directory. Defaults to False.
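
The comparison step can be pictured as a set difference between local filenames and uploaded filenames. The sketch below is a guess at the idea rather than the SDK's logic: compare_with_directory is a hypothetical helper, assets are matched by filename only, and subdirectories are ignored.

```python
from pathlib import Path
from typing import Iterable, List, Tuple, Union


def compare_with_directory(
    target_dir: Union[str, Path],
    uploaded_filenames: Iterable[str],
) -> Tuple[List[str], List[str]]:
    # Files present locally (top level only, an assumption of this sketch).
    local = {p.name for p in Path(target_dir).iterdir() if p.is_file()}
    uploaded = set(uploaded_filenames)
    missing_locally = sorted(uploaded - local)  # candidates for download
    only_local = sorted(local - uploaded)       # not part of the dataset version
    return missing_locally, only_local
```

Under this reading, do_download=True would fetch the files listed in missing_locally.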


get_or_create_asset_tag

get_or_create_asset_tag(
   name: str
)

Description

Retrieve an asset tag used in this dataset version by its name.
If tag does not exist, create it and return it.

Examples

tag = dataset_v0.get_or_create_asset_tag("new_tag")

Arguments

  • name (str) : Tag to retrieve or create

Returns

A Tag object


create_asset_tag

create_asset_tag(
   name: str
)

Description

Create an asset tag only available in this dataset version.

Examples

tag_dog = dataset_v0.create_asset_tag("dog")

Arguments

  • name (str) : name of tag

Returns

A Tag object


get_asset_tag

get_asset_tag(
   name: str
)

Description

Retrieve an asset tag used in this dataset version.

Examples

tag_dog = dataset_v0.get_asset_tag("dog")

Arguments

  • name (str) : Name of the tag you're looking for

Returns

A Tag object


convert_tags_to_classification

convert_tags_to_classification(
   tag_type: TagTarget, tags: List[Tag]
)

Description


list_asset_tags

list_asset_tags()

Description

List asset tags created in this dataset version

Examples

tags = dataset_v0.list_asset_tags()
assert tag_dog in tags

Returns

A list of Tag


train_test_split

train_test_split(
   prop: float = 0.8, random_seed: Optional[Any] = None
)

Description

Split the assets of this dataset version into a training set and a test set.

Examples

train_assets, eval_assets, count_train, count_eval, labels = dataset.train_test_split()

Arguments

  • prop (float, optional) : Proportion of data for the training set. Defaults to 0.8.

  • random_seed (Any, optional) : Use a seed to ensure the same result across multiple runs.

Returns

A tuple with all of this information (
list of train assets,
list of test assets,
dict of class distribution for train assets,
dict of class distribution for test assets,
list of labels
)
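
A seeded split that returns the same five pieces of information can be sketched in plain Python. This is an illustration of the technique, not the SDK's algorithm: split_assets is a hypothetical helper, each asset is reduced to a (filename, label) pair, and shuffling with random.Random is an assumption about how the split is drawn.

```python
import random
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

Asset = Tuple[str, str]  # (filename, label), a simplification for this sketch


def split_assets(
    assets: List[Asset],
    prop: float = 0.8,
    random_seed: Optional[Any] = None,
) -> Tuple[List[Asset], List[Asset], Dict[str, int], Dict[str, int], List[str]]:
    # A dedicated generator makes the shuffle reproducible for a given seed.
    rng = random.Random(random_seed)
    shuffled = list(assets)
    rng.shuffle(shuffled)

    cut = int(len(shuffled) * prop)
    train, test = shuffled[:cut], shuffled[cut:]

    # Class distribution of each subset.
    count_train = dict(Counter(label for _, label in train))
    count_test = dict(Counter(label for _, label in test))
    labels = sorted({label for _, label in assets})
    return train, test, count_train, count_test, labels
```

Passing the same random_seed twice yields the same split, which is what makes experiments comparable across runs.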