DatasetVersion

Properties

origin_id UUID of the Dataset origin
name Name of the Dataset origin
version Version of this DatasetVersion
type Type of this DatasetVersion

Methods

add_tags

add_tags(
   tags: Union[Tag, List[Tag]]
)

Description

Add one or more tags to an object.
This method is available on Data, MultiData, Asset, MultiAsset, DatasetVersion, Dataset, Model and ModelVersion.

You can pass a single Tag or a list of Tag objects.

Examples

tag_bicycle = client.create_tag("bicycle", Target.DATA)
tag_car = client.create_tag("car", Target.DATA)
tag_truck = client.create_tag("truck", Target.DATA)

data.add_tags(tag_bicycle)
data.add_tags([tag_car, tag_truck])

get_tags

get_tags()

Description

Retrieve tags of your dataset version.

Examples

tags = foo_dataset_version.get_tags()
assert tags[0].name == "training-dataset"

Returns

List of Tag objects

add_data

add_data(
   data: Union[Data, List[Data], MultiData], tags: Optional[List[Union[str,
   Tag]]] = None
)

Description

Feed this version with data coming from a datalake.

A DatasetVersion takes Data objects from a Datalake and turns each one into an annotatable Asset.
You can give tags that will be added as asset tags to every created asset.

Examples

foo_dataset = client.create_dataset('foo_dataset')
foo_dataset_version_1 = foo_dataset.create_version('first')
some_data = client.get_datalake().list_data(limit=1000)
foo_dataset_version_1.add_data(some_data)

Arguments

data (Data, List[Data] or MultiData) : data to add to dataset
tags (List of str or Tag) : tags to add to every asset created

Returns

A Job object that you can use to monitor the progress of this operation.

fork

fork(
   version: str, description: Optional[str] = None, assets: Union[List[Asset],
   MultiAsset, Asset, None] = None, type: Union[InferenceType,
   str] = InferenceType.NOT_CONFIGURED, with_tags: bool = False,
   with_labels: bool = False, with_annotations: bool = False
)

Description

Fork this dataset version into another dataset version, with the same origin.

It creates a new dataset version with the same origin and the given version name.
You can give a description and a default type.
You can also give a list of assets from this dataset version; only these assets will be added to the new dataset version.
If with_tags is True, the tags of each asset are transferred to the new dataset version.
If with_labels is True, the labels of the source dataset version are transferred to the new dataset version.
If with_annotations is True, labels and annotations are transferred to the new dataset version.
This might take more time.

Examples

foo_dataset_version = client.get_dataset('my_datatest').get_version('first')
assets = foo_dataset_version.list_assets(limit=100)
bar_dataset_version = foo_dataset_version.fork(version='second', assets=assets)

Arguments

version (str) : new version name
description (str, optional) : description, defaults to "Forked from version '<version_name>'"
assets (List[Asset], MultiAsset or Asset, optional) : assets to add to the new dataset version, defaults to all assets
type (InferenceType, optional) : inference type of the new dataset version, defaults to NOT_CONFIGURED
with_tags (bool, optional) : if true, tags of assets will be added to the new dataset version, defaults to false
with_labels (bool, optional) : if true, the labelmap will be transferred to the new dataset version, defaults to false
with_annotations (bool, optional) : if true, annotations of each asset will be added to the new dataset version, defaults to false

Returns

A DatasetVersion with given assets

find_asset

find_asset(
   data: Optional[Data] = None, filename: Optional[str] = None,
   object_name: Optional[str] = None
)

Description

Find an asset in this dataset version.

You can find it by giving the Data object it is linked to, its filename or its object name.

Examples

my_asset = my_dataset_version.find_asset(filename="test.png")

Arguments

data (Data, optional) : data linked to asset. Defaults to None.
filename (str, optional) : filename of the asset. Defaults to None.
object_name (str, optional) : object name in the storage S3. Defaults to None.

Raises

If no asset matches the query, a NotFoundError is raised.
In some cases, an InvalidQueryError can be raised,
for instance when the platform stores 2 assets matching this query (for example if a filename is duplicated).

Returns

The Asset found
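
The two error cases above can be pictured with a small, self-contained sketch. It does not call the SDK: `find_one`, the locally defined `NotFoundError` and `InvalidQueryError` classes, and the dict-based assets are hypothetical stand-ins, used only to show when each error would arise.

```python
from typing import Dict, List


class NotFoundError(Exception):
    """Stand-in for the SDK error raised when nothing matches."""


class InvalidQueryError(Exception):
    """Stand-in for the SDK error raised when the query is ambiguous."""


def find_one(assets: List[Dict[str, str]], filename: str) -> Dict[str, str]:
    """Return the single asset with this filename (hypothetical helper)."""
    matches = [a for a in assets if a["filename"] == filename]
    if not matches:
        raise NotFoundError(f"no asset named {filename!r}")
    if len(matches) > 1:
        raise InvalidQueryError(f"{len(matches)} assets named {filename!r}")
    return matches[0]


assets = [
    {"id": "a1", "filename": "test.png"},
    {"id": "a2", "filename": "dup.png"},
    {"id": "a3", "filename": "dup.png"},
]
found = find_one(assets, "test.png")
```

With the real method, the same distinction suggests catching the two errors separately: a missing asset can often be skipped, while a duplicated filename usually needs a more precise query (for instance by object name).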

find_all_assets

find_all_assets(
   filenames: Optional[List[str]] = None, object_names: Optional[List[str]] = None,
   ids: Optional[List[Union[str, UUID]]] = None
)

Description

Find assets of this dataset version from their filenames, object names or ids.

Examples

my_asset = my_dataset_version.find_all_assets(filenames=["test.png", "image2.jpg"])

Arguments

object_names (List[str], optional) : object names of the assets you're looking for. Defaults to None.
ids (List[str or UUID], optional) : ids of the assets you're looking for. Defaults to None.
filenames (List[str], optional) : filenames of the assets you're looking for. Defaults to None.

Returns

A MultiAsset object that wraps some Asset that you can manipulate.

list_assets

list_assets(
   limit: Optional[int] = None, offset: Optional[int] = None,
   page_size: Optional[int] = None, order_by: Optional[List[str]] = None,
   tags: Union[Tag, List[Tag], str, List[str], None] = None, data_tags: Union[Tag,
   List[Tag], str, List[str], None] = None, intersect_tags: bool = False,
   intersect_data_tags: bool = False, filenames: Optional[List[str]] = None,
   object_names: Optional[List[str]] = None, ids: Optional[List[Union[str,
   UUID]]] = None, filenames_startswith: Optional[List[str]] = None,
   q: Optional[str] = None
)

Description

List assets of this dataset version

It will retrieve all asset objects of this dataset version.
You will then be able to manipulate them or add them to another dataset.

Examples

assets = foo_dataset_version.list_assets()

Arguments

limit (int, optional) : maximum number of assets to retrieve
offset (int, optional) : number of assets to skip; if 0, retrieval starts from the beginning
page_size (int, optional) : page size when paginating
order_by (List[str], optional) : a list of strings to use for sorting data; if None, results are not sorted
tags (str, Tag, list[Tag or str], optional) : if given, will return assets that have at least one of the given tags. If intersect_tags is True, it will return assets that have all the given tags
intersect_tags (bool, optional) : if True, and a list of tags is given, will return assets that have all the given tags. Defaults to False.
data_tags (str, Tag, list[Tag or str], optional) : if given, will return assets that have at least one of the given data tags. If intersect_data_tags is True, it will return assets that have all the given data tags
intersect_data_tags (bool, optional) : if True, and a list of data tags is given, will return assets that have all the given data tags. Defaults to False.
filenames (list[str], optional) : if given, will return assets whose filename equals one of the given filenames
filenames_startswith (list[str], optional) : if given, will return assets whose filename starts with one of the given prefixes
object_names (list[str], optional) : if given, will return assets whose object name equals one of the given object names
ids (List[str or UUID], optional) : ids of the assets you're looking for. Defaults to None.
q (str, optional) : if given, will filter data with the given query. Defaults to None.
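
The difference between the default "any tag" behaviour and intersect_tags=True can be illustrated without the platform. This is a minimal sketch of the semantics only: `filter_by_tags` and the dict of tag sets are hypothetical, and the real filtering happens server-side.

```python
from typing import Dict, List, Set


def filter_by_tags(
    assets: Dict[str, Set[str]], tags: List[str], intersect: bool = False
) -> List[str]:
    """Return filenames whose tag set matches (hypothetical helper)."""
    wanted = set(tags)
    if intersect:
        # Every requested tag must be present on the asset.
        return [name for name, have in assets.items() if wanted <= have]
    # Default: at least one requested tag is present.
    return [name for name, have in assets.items() if wanted & have]


assets = {
    "a.png": {"train", "day"},
    "b.png": {"train"},
    "c.png": {"night"},
}
any_match = filter_by_tags(assets, ["train", "day"])
all_match = filter_by_tags(assets, ["train", "day"], intersect=True)
```

Here `any_match` contains a.png and b.png, while `all_match` contains only a.png. The same reasoning applies to data_tags and intersect_data_tags.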

Returns

A MultiAsset object that wraps some Asset that you can manipulate.

delete

delete()

Description

Delete a dataset version.

DANGER ZONE: Be very careful here!

It will remove this dataset version from our database; all of its assets and annotations will be removed.
It will also remove any annotation campaign of this dataset version.

Examples

foo_dataset_version.delete()

set_type

set_type(
   type: Union[str, InferenceType]
)

Description

Set inference type of this DatasetVersion.
You can pass a string with the exact key corresponding to inference type or an enum value InferenceType.

Examples

dataset_version.set_type('object_detection')
dataset_version.set_type(InferenceType.SEGMENTATION)

Arguments

type (str or InferenceType) : type to give to this dataset version

update

update(
   version: Optional[str] = None, description: Optional[str] = None, type: Union[str,
   InferenceType, None] = None
)

Description

Update version, description and type of this DatasetVersion.

Examples

dataset_version.update(description='My favourite dataset')

Arguments

version (str, optional) : New version name of the dataset. Defaults to None.
description (str, optional) : New description of the dataset. Defaults to None.
type (str or InferenceType, optional) : New type of the dataset. Defaults to None.

download

download(
   target_path: Union[str, Path, None] = None, force_replace: bool = False,
   max_workers: Optional[int] = None, use_id: bool = False
)

Description

Downloads assets of a dataset version.

It will download all assets from a dataset version into the specified folder.
If target_path is None, it will download into ./<dataset_name>/<dataset_version>
You can specify the number of threads to use while downloading.

Examples

foo_dataset_version.download('~/Downloads/dataset_pics')

Arguments

target_path (str or Path, optional) : Target folder. Defaults to None.
force_replace (bool, optional) : Replace an existing file if it exists. Defaults to False.
max_workers (int, optional) : Number of max workers used to download. Defaults to os.cpu_count() + 4.
use_id (bool, optional) : If true, will download file with id and extension as file name. Defaults to False.
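
The documented defaults for target_path and max_workers can be written down as two small helpers. This is a sketch under stated assumptions, not the SDK's code: `resolve_target_path` and `resolve_max_workers` are hypothetical names, expanding a leading "~" is an assumption, and so is the fallback used when os.cpu_count() returns None.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_target_path(
    target_path: Union[str, Path, None], dataset_name: str, version: str
) -> Path:
    """Pick the download folder (hypothetical helper, not the SDK's code)."""
    if target_path is None:
        # Documented default: ./<dataset_name>/<dataset_version>
        return Path(".") / dataset_name / version
    # expanduser() turns a leading "~" into the home directory (assumption).
    return Path(target_path).expanduser()


def resolve_max_workers(max_workers: Optional[int]) -> int:
    """Apply the documented default of os.cpu_count() + 4."""
    if max_workers is not None:
        return max_workers
    # os.cpu_count() may return None; falling back to 1 is an assumption.
    return (os.cpu_count() or 1) + 4


default_dir = resolve_target_path(None, "foo_dataset", "first")
workers = resolve_max_workers(None)
```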

list_labels

list_labels()

Description

Get all labels of a dataset version.

It will retrieve a list of label objects.

Examples

foo_dataset_version.create_label("today")
labels = foo_dataset_version.list_labels()
assert labels[0].name == "today"

Returns

List of Label

create_label

create_label(
   name: str
)

Description

Add label to a dataset version.

You have to give a name to the label.

Examples

foo_dataset_version.create_label("today")

Arguments

name (str) : label name to create

Returns

A Label object

get_label

get_label(
   name: str
)

Description

Find label in a dataset version.

Examples

label = foo_dataset_version.get_label("today")

Arguments

name (str) : label name to find

Returns

A Label object

get_or_create_label

get_or_create_label(
   name: str
)

Description

Retrieve a label used in this dataset version by its name.
If label does not exist, create it and return it.

Examples

label = dataset_version.get_or_create_label("new_label")

Arguments

name (str) : label name to retrieve or create

Returns

A Label object

list_annotations

list_annotations(
   worker: Optional[Worker] = None, status: Union[AnnotationStatus, str, None] = None,
   limit: Optional[int] = None, offset: Optional[int] = None,
   order_by: Optional[List[str]] = None, page_size: Optional[int] = None
)

Description

Retrieve annotations of a dataset version.

Examples

annotations = foo_dataset_version.list_annotations()

Arguments

limit (Optional[int], optional) : Limit number of annotations to retrieve.
Defaults to None, all annotations will be retrieved.
offset (Optional[int], optional) : Offset to begin with when listing annotations.
Defaults to None, starting at 0.
page_size (Optional[int], optional) : Size of each page when retrieving annotations.
Defaults to None, page size will be equal to the default pagination.
order_by (Optional[List[str]], optional) : Order annotation by some criteria.
Defaults to None.
worker (Optional[Worker], optional) : Worker filter.
Defaults to None.
status (Optional[AnnotationStatus], optional) : Status of annotations to retrieve.
Defaults to None.

Raises

NoDataError : When no annotations retrieved

Returns

A MultiAnnotation object
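
How limit, offset and page_size interact can be sketched with a fake backend. The `paginate` function, the `fetch_page` callback and the default page size of 100 are assumptions made for illustration; the SDK's actual pagination is not shown in this page.

```python
from typing import Callable, Iterator, List, Optional


def paginate(
    fetch_page: Callable[[int, int], List[int]],
    limit: Optional[int] = None,
    offset: int = 0,
    page_size: int = 100,
) -> Iterator[int]:
    """Yield items page by page until `limit` items or the data runs out."""
    yielded = 0
    while limit is None or yielded < limit:
        # Never ask for more items than the remaining limit allows.
        size = page_size if limit is None else min(page_size, limit - yielded)
        page = fetch_page(offset + yielded, size)
        if not page:
            return
        yield from page
        yielded += len(page)
        if len(page) < size:
            return  # short page: nothing left on the backend


# Fake backend holding 250 annotation ids.
store = list(range(250))
calls = []


def fetch_page(start: int, size: int) -> List[int]:
    calls.append((start, size))
    return store[start:start + size]


first_120 = list(paginate(fetch_page, limit=120, offset=10, page_size=50))
```

In this run, three requests are made: two full pages of 50 and a last page of 20, starting at index 10.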

load_annotations

load_annotations(
   worker: Optional[Worker] = None, status: Optional[AnnotationStatus] = None,
   assets: Union[List[Asset], MultiAsset, None] = None, chunk_size: int = 1000,
   max_workers: Optional[int] = None, skip_error: bool = False
)

Description

Load annotations of this dataset version by retrieving shapes with their labels, asset_id and worker_id.

Examples

dict_annotations = foo_dataset_version.load_annotations()

Arguments

worker (Optional[Worker], optional) : Worker filter. Defaults to None.
status (Optional[AnnotationStatus], optional) : Status of annotations to retrieve. Defaults to None.
assets (Union[List[Asset], MultiAsset, None], optional) : List of the asset to retrieve. Defaults to None.
chunk_size (int, optional) : Size of chunk of annotations to load by request. Defaults to 1000.
max_workers (int, optional) : Number of max workers used to load annotations. Defaults to os.cpu_count() + 4.
skip_error (bool, optional) : skip errors of a chunk and return partial annotations. Defaults to False.

Returns

A dict of annotations
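
The roles of chunk_size, max_workers and skip_error can be pictured with a generic chunked loader. This is only a sketch: `load_in_chunks` and `fake_fetch` are hypothetical, and the structure of the dict returned by the real method is not specified here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def load_in_chunks(
    asset_ids: List[str],
    fetch_chunk: Callable[[List[str]], Dict[str, list]],
    chunk_size: int = 1000,
    max_workers: int = 4,
    skip_error: bool = False,
) -> Dict[str, list]:
    """Fetch annotations chunk by chunk and merge them (hypothetical helper)."""
    chunks = [asset_ids[i:i + chunk_size] for i in range(0, len(asset_ids), chunk_size)]
    merged: Dict[str, list] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_chunk, chunk) for chunk in chunks]
        for future in futures:
            try:
                merged.update(future.result())
            except Exception:
                if not skip_error:
                    raise
                # skip_error=True: drop the failed chunk, keep the rest.
    return merged


def fake_fetch(chunk: List[str]) -> Dict[str, list]:
    """Pretend request that fails for any chunk containing 'bad'."""
    if "bad" in chunk:
        raise RuntimeError("chunk failed")
    return {asset_id: [] for asset_id in chunk}


partial = load_in_chunks(["a", "b", "bad", "c"], fake_fetch, chunk_size=2, skip_error=True)
```

Note that with skip_error enabled, a failed chunk drops every asset in it ("c" is lost along with "bad" above), so partial results should be checked for completeness.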

export_annotation_file

export_annotation_file(
   annotation_file_type: Union[AnnotationFileType, str], target_path: Union[str,
   Path] = './', assets: Union[MultiAsset, List[Asset], None] = None,
   worker: Optional[Worker] = None, status: Union[AnnotationStatus, str, None] = None,
   force_replace: bool = True, export_video: bool = False, use_id: bool = False
)

Description

Export annotations of this dataset version into a file, and download it.

If you give the 'worker' argument, only annotations of this worker are exported, if they exist.
If you don't give 'worker', it will only export the last created annotation and its shapes.

Examples

dataset_v0.export_annotation_file(AnnotationFileType.COCO, "./")

Arguments

annotation_file_type (AnnotationFileType) : choose to export in Pascal VOC format, YOLO format or COCO format.
target_path (str or Path, optional) : directory path where file is downloaded. Defaults to current directory.
assets (Union[MultiAsset, List[Asset], None], optional) : a list of assets of this dataset version.
Only these assets will be concerned by this export. Defaults to None.
worker (Worker, optional) : worker of annotations. Defaults to None.
status (AnnotationStatus, optional) : status of annotations. Defaults to None.
force_replace (bool, optional) : if true, will replace an existing file annotation. Defaults to True.
export_video (bool, optional) : if true, will export video of your dataset, instead of assets. Defaults to False.
use_id (bool, optional) : if true, asset ids will be used when generating annotation files.
For example, in a COCO file, an asset named "image_1.png" will be referenced by an id-based name like
018c59e3-b21b-7006-a82b-047d3931db81.png.
You should combine this option with dataset_version.download(use_id=True).
Defaults to False.

Returns

Path of downloaded file.
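
The id-based naming used with use_id=True keeps the original extension and replaces the stem with the asset id. A minimal sketch of that rule, where `id_based_name` is a hypothetical helper and not part of the SDK:

```python
from pathlib import Path


def id_based_name(asset_id: str, filename: str) -> str:
    """Build '<id><extension>' from an asset id and its original filename."""
    return f"{asset_id}{Path(filename).suffix}"


name = id_based_name("018c59e3-b21b-7006-a82b-047d3931db81", "image_1.png")
```

Because annotation files and downloaded assets must agree on names, the same use_id value should be used for both the export and the download.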

build_coco_file_locally

build_coco_file_locally(
   worker: Optional[Worker] = None, status: Union[AnnotationStatus, str, None] = None,
   enforced_ordered_categories: Optional[List[str]] = None,
   assets: Union[MultiAsset, List[Asset], None] = None, use_id: bool = False
)

Description

Build a coco file locally instead of exporting it from the platform.
This method will load annotations of a dataset with given filters, then build all coco annotations,
then load all assets and labels from platform needed in this coco file and return a coco file.

Returned coco file can be then written into a file

Examples

coco_file = dataset_v0.build_coco_file_locally()

Arguments

worker (Worker, optional) : worker of annotations. Defaults to None.
status (AnnotationStatus, optional) : status of annotations. Defaults to None.
assets (Union[MultiAsset, List[Asset]], optional) : assets of annotations. Defaults to None.
enforced_ordered_categories (List of str, optional) : use this parameter to enforce an order of categories
for the coco file. Defaults to None.
use_id (bool, optional) : set True if you downloaded assets with id as filenames, COCO File will then use ids
as filenames. Defaults to False.

Returns

A COCO File object
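
To make the role of enforced_ordered_categories more concrete, here is a sketch that builds a minimal COCO-style dictionary by hand and serializes it with the standard json module. It does not use the object returned by build_coco_file_locally, whose serialization API is not described here. `build_categories`, the ids starting at 0, and the handling of labels missing from the enforced list are all assumptions.

```python
import json
from typing import Dict, List, Optional


def build_categories(
    label_names: List[str], enforced_order: Optional[List[str]] = None
) -> List[Dict[str, object]]:
    """Assign category ids, honouring an enforced order when one is given."""
    ordered = list(enforced_order or [])
    # Labels missing from the enforced list are appended afterwards (assumption).
    ordered += [name for name in label_names if name not in ordered]
    return [{"id": i, "name": name} for i, name in enumerate(ordered)]


categories = build_categories(["car", "bicycle", "truck"], ["truck", "car"])
category_ids = {c["name"]: c["id"] for c in categories}

coco = {
    "images": [{"id": 0, "file_name": "image_1.png", "width": 640, "height": 480}],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": category_ids["car"],
            "bbox": [10, 20, 100, 50],  # [x, y, width, height] in pixels
            "area": 100 * 50,
            "iscrowd": 0,
        }
    ],
    "categories": categories,
}
coco_json = json.dumps(coco, indent=2)
```

Enforcing the order matters when a training pipeline maps category ids to fixed model outputs: the same label then keeps the same id across exports.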

import_annotations_yolo_files

import_annotations_yolo_files(
   configuration_yaml_path: Union[str, Path], file_paths: List[Union[str, Path]],
   worker: Optional[Worker] = None, mode: Union[ImportAnnotationMode,
   str] = ImportAnnotationMode.REPLACE, force_create_label: bool = True,
   fail_on_asset_not_found: bool = True, status: Optional[AnnotationStatus] = None
)

Description

Read a YOLO annotation configuration file, then read and parse all given annotation files, and create annotations and shapes for all assets.

Examples

dataset_v0.import_annotations_yolo_files(configuration_yaml_path="data.yaml", file_paths=["asset1.txt"])

Arguments

configuration_yaml_path (str, Path) : Path to file of configuration
file_paths (List of str or Path) : Paths of annotation files to import
worker (Worker, optional) : Worker to use. Defaults to current user.
mode (ImportAnnotationMode, optional) : Mode used to import.
REPLACE will delete worker annotation if exists and replace it.
CONCATENATE will create shapes on existing annotation.
KEEP will do nothing on existing annotation.
Defaults to ImportAnnotationMode.REPLACE.
force_create_label (bool) : Ensures labels are created if they don't exist. Defaults to True.
fail_on_asset_not_found (bool) : If one filename is not found in dataset, fail before importing annotations. Defaults to True.
status (AnnotationStatus, optional) : Annotation status given to created annotations. Defaults to None.

Raises

FileNotFoundException : if file is not found

Returns

A dict with annotation id as string keys and number of shapes created as integer.
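
For readers unfamiliar with the YOLO text format, each line of a detection annotation file holds a class index followed by a box center and size, all normalized by the image dimensions. The sketch below converts one such line to a pixel box; `parse_yolo_line` is a hypothetical helper, and it covers only the bounding-box variant of the format, not how the SDK parses files internally.

```python
from typing import Tuple


def parse_yolo_line(
    line: str, image_width: int, image_height: int
) -> Tuple[int, int, int, int, int]:
    """Convert one YOLO line to (class_id, x, y, w, h) in pixels."""
    class_id, xc, yc, w, h = line.split()
    box_w = float(w) * image_width
    box_h = float(h) * image_height
    # YOLO stores the box center; shift by half the size to get the top-left corner.
    x = float(xc) * image_width - box_w / 2
    y = float(yc) * image_height - box_h / 2
    return int(class_id), round(x), round(y), round(box_w), round(box_h)


box = parse_yolo_line("2 0.5 0.5 0.25 0.5", image_width=640, image_height=480)
```

The class index (2 here) is resolved to a label name through the names listed in the configuration YAML file.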

import_annotation_voc_file

import_annotation_voc_file(
   file_path: Union[str, Path], worker: Optional[Worker] = None,
   mode: Union[ImportAnnotationMode, str] = ImportAnnotationMode.REPLACE,
   force_create_label: bool = True, status: Optional[AnnotationStatus] = None
)

Description

Read a Pascal VOC file, parse it and create annotations and shapes for one given asset.

Examples

dataset_v0.import_annotation_voc_file(file_path="voc.xml")

Arguments

file_path (str or Path) : Path of file to import
worker (Worker, optional) : Worker to use. Defaults to current user.
mode (ImportAnnotationMode, optional) : Mode used to import.
REPLACE will delete worker annotation if exists and replace it.
CONCATENATE will create shapes on existing annotation.
KEEP will do nothing on existing annotation.
Defaults to ImportAnnotationMode.REPLACE.
force_create_label (bool) : Ensures labels are created if they don't exist. Defaults to True.
status (AnnotationStatus, optional) : status given to created annotations. Defaults to None.

Raises

FileNotFoundException : if file is not found

Returns

A dict with annotation id as string keys and number of shapes created as integer.
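
A Pascal VOC file is an XML document describing one image and its boxes. The following sketch parses a tiny hand-written example with the standard library to show which fields such a file carries. The XML content is made up for illustration, and this is not the SDK's parser.

```python
import xml.etree.ElementTree as ET

VOC_XML = """
<annotation>
  <filename>image_1.png</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""

root = ET.fromstring(VOC_XML.strip())
filename = root.findtext("filename")
shapes = []
for obj in root.iter("object"):
    box = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (
        int(box.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
    )
    # Store as (label, x, y, width, height).
    shapes.append((obj.findtext("name"), xmin, ymin, xmax - xmin, ymax - ymin))
```

Since one VOC file describes a single image, the filename field is what ties the file to one asset of the dataset version.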

import_annotations_coco_file

import_annotations_coco_file(
   file_path: Union[Path, str], worker: Optional[Worker] = None,
   mode: Union[ImportAnnotationMode, str] = ImportAnnotationMode.REPLACE,
   force_create_label: bool = True, fail_on_asset_not_found: bool = True,
   status: Optional[AnnotationStatus] = None
)

Description

Read a COCO file, parse it and create annotations and shapes for the given assets.

Examples

dataset_v0.import_annotations_coco_file(file_path="coco.json")

Arguments

file_path (str or Path) : Path of file to import
worker (Worker, optional) : Worker to use. Defaults to current user.
mode (ImportAnnotationMode, optional) : Mode used to import.
REPLACE will delete worker annotation if exists and replace it.
CONCATENATE will create shapes on existing annotation.
KEEP will do nothing on existing annotation.
Defaults to ImportAnnotationMode.REPLACE.
force_create_label (bool) : Ensure labels are created if they don't exist. Defaults to True
fail_on_asset_not_found (bool) : Raise an error if asset is not found. Defaults to True
status (AnnotationStatus, optional) : Annotation Status of imported annotations, default will be PENDING.
Defaults to None.

Raises

FileNotFoundException : if file is not found

Returns

A dict with annotation id as string keys and number of shapes created as integer.
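
The three import modes described for these import methods differ only in what happens when an asset already has an annotation. The sketch below models that with plain dicts and strings; `apply_import` is hypothetical, the mode names are passed as strings rather than as ImportAnnotationMode members, and real annotations are per worker, which this simplification ignores.

```python
from typing import Dict, List


def apply_import(
    existing: Dict[str, List[str]], asset: str, shapes: List[str], mode: str = "REPLACE"
) -> Dict[str, List[str]]:
    """Return annotations after importing `shapes` for `asset` (hypothetical)."""
    if mode not in ("REPLACE", "CONCATENATE", "KEEP"):
        raise ValueError(f"unknown mode: {mode}")
    result = {name: list(items) for name, items in existing.items()}
    if asset not in result or mode == "REPLACE":
        # No annotation yet, or REPLACE: imported shapes become the annotation.
        result[asset] = list(shapes)
    elif mode == "CONCATENATE":
        # Keep existing shapes and add the imported ones.
        result[asset] = result[asset] + list(shapes)
    # KEEP: the existing annotation is left untouched.
    return result


existing = {"image_1.png": ["old_box"]}
replaced = apply_import(existing, "image_1.png", ["new_box"], "REPLACE")
concatenated = apply_import(existing, "image_1.png", ["new_box"], "CONCATENATE")
kept = apply_import(existing, "image_1.png", ["new_box"], "KEEP")
```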

delete_all_annotations

delete_all_annotations(
   workers: Optional[List[Worker]] = None
)

Description

Delete all annotations of this dataset version.

DANGER ZONE: Be very careful here!

It will remove all annotation of every asset of this dataset version.
You can give workers on which it will be effectively erased.

Examples

foo_dataset_version.delete_all_annotations()

Arguments

workers (List[Worker], optional) : Workers on which annotations will be removed. Defaults to None.

synchronize

synchronize(
   target_dir: str, do_download: bool = False
)

Description

Synchronize this dataset version with target dir by comparing assets in target dir with assets uploaded in dataset version.

Examples

foo_dataset_version.synchronize('./foo_dataset/first')

Arguments

target_dir (str) : directory to synchronize against
do_download (bool, optional) : if True, download files that are missing from the local directory. Defaults to False.

Returns

A MultiAsset object with assets downloaded if do_download is True
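
The comparison that a synchronization has to perform can be expressed with set operations on filenames. This is a sketch of the idea only: `compare_with_local` and its three group names are hypothetical, and the real method may compare assets differently.

```python
from typing import Dict, Set


def compare_with_local(remote: Set[str], local: Set[str]) -> Dict[str, Set[str]]:
    """Split filenames into the three groups a sync has to consider."""
    return {
        "missing_locally": remote - local,  # candidates for download
        "only_local": local - remote,       # not uploaded in the version
        "in_sync": remote & local,
    }


report = compare_with_local(
    remote={"a.png", "b.png", "c.png"},
    local={"b.png", "c.png", "d.png"},
)
```

In practice the local set could be built with `{p.name for p in Path(target_dir).iterdir()}`, and only the "missing_locally" group would be fetched when do_download is True.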

retrieve_stats

retrieve_stats()

Description

Retrieve statistics of this dataset version (label repartition, number of objects, number of annotations).

Examples

stats = foo_dataset_version.retrieve_stats()
assert stats.nb_objects == 25
assert stats.nb_annotations == 5

Returns

A DatasetVersionStats schema with keys:

- label_repartition: dict with label names as keys and number of shapes with these labels as values
- nb_objects: total number of objects (sum of label_repartition values)
- nb_annotations: total number of Annotation objects of this dataset version
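
As a worked example of the relation between these keys, with made-up label counts:

```python
# Hypothetical label repartition for a small dataset version.
label_repartition = {"car": 12, "bicycle": 8, "truck": 5}

# nb_objects is documented as the sum of the label_repartition values.
nb_objects = sum(label_repartition.values())
```

Here nb_objects is 25. nb_annotations is counted separately, since one annotation can hold several shapes.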

get_or_create_asset_tag

get_or_create_asset_tag(
   name: str
)

Description

Retrieve an asset tag used in this dataset version by its name.
If tag does not exist, create it and return it.

Examples

tag = dataset_version.get_or_create_asset_tag("new_tag")

Arguments

name (str) : Name of the tag to retrieve or create

Returns

A Tag object

create_asset_tag

create_asset_tag(
   name: str
)

Description

Create an asset tag that is only available in this dataset version.

Examples

tag_dog = dataset_v0.create_asset_tag("dog")

Arguments

name (str) : name of tag to create

Returns

A Tag object

get_asset_tag

get_asset_tag(
   name: str
)

Description

Retrieve an asset tag used in this dataset version.

Examples

tag_dog = dataset_v0.get_asset_tag("dog")

Arguments

name (str) : Name of the tag you're looking for

Returns

A Tag object

convert_tags_to_classification

convert_tags_to_classification(
   tag_type: TagTarget, tags: List[Tag]
)

Description

list_asset_tags

list_asset_tags()

Description

List asset tags created in this dataset version

Examples

tags = dataset_v0.list_asset_tags()
assert tag_dog in tags

Returns

A list of Tag

train_test_split

train_test_split(
   prop: float = 0.8, random_seed: Optional[Any] = None, load_asset_page_size: int = 100
)

Description

Split a DatasetVersion into 2 MultiAssets and return their label repartition.

Examples

train_assets, eval_assets, count_train, count_eval, labels = dataset_version.train_test_split()

Arguments

prop (float, optional) : Proportion of data for the training set. Defaults to 0.8.
random_seed (Any, optional) : Use a seed to ensure the same result if run multiple times. Defaults to None.
load_asset_page_size (int, optional) : Page size when loading assets. Defaults to 100.

Returns

A tuple with all of this information (
list of train assets,
list of test assets,
dict of repartition of classes for train assets, with {"x": list of labels, "y": list of label count},
dict of repartition of classes for test assets, with {"x": list of labels, "y": list of label count},
list of labels
)
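
The effect of prop and random_seed, and the {"x": ..., "y": ...} shape of the repartition dicts, can be sketched on toy data. `split_two` is a hypothetical helper, not the SDK's algorithm: it assumes one label per asset and a simple shuffle-then-cut strategy.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple


def split_two(
    assets: List[Tuple[str, str]], prop: float = 0.8, random_seed: Optional[int] = None
) -> Tuple[list, list, Dict[str, list], Dict[str, list], List[str]]:
    """Shuffle (filename, label) pairs with a seed and cut them at `prop`."""
    shuffled = list(assets)
    random.Random(random_seed).shuffle(shuffled)
    cut = int(len(shuffled) * prop)
    train, test = shuffled[:cut], shuffled[cut:]
    labels = sorted({label for _, label in assets})

    def repartition(part: list) -> Dict[str, list]:
        counts = Counter(label for _, label in part)
        return {"x": labels, "y": [counts[label] for label in labels]}

    return train, test, repartition(train), repartition(test), labels


assets = [(f"img_{i}.png", "car" if i % 2 else "truck") for i in range(10)]
train, test, count_train, count_eval, labels = split_two(assets, random_seed=42)
again = split_two(assets, random_seed=42)
```

Running the split twice with the same seed gives the same train and test groups, which is what makes experiments reproducible.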

train_test_val_split

train_test_val_split(
   ratios: List[float] = None, random_seed: Optional[Any] = None,
   load_asset_page_size: int = 100
)

Description

Split a DatasetVersion into 3 MultiAssets and return their label repartition.
By default, will split with a ratio of 0.64, 0.16 and 0.20

Examples

train_assets, test_assets, val_assets, count_train, count_test, count_val, labels = dataset_version.train_test_val_split()

Arguments

ratios (list of float, optional) : Ratios of the split used for the train, test and val sets.
Defaults to [0.64, 0.16, 0.20]
random_seed (Any, optional) : Use a seed to ensure the same result if run multiple times. Defaults to None.
load_asset_page_size (int, optional) : Page size when loading assets. Defaults to 100.

Returns

A tuple with all of this information (
list of train assets,
list of test assets,
list of val assets,
dict of repartition of classes for train assets, with {"x": list of labels, "y": list of label count},
dict of repartition of classes for test assets, with {"x": list of labels, "y": list of label count},
dict of repartition of classes for val assets, with {"x": list of labels, "y": list of label count},
list of labels
)

split_into_multi_assets

split_into_multi_assets(
   ratios: List[Union[float, int]], random_seed: Optional[Any] = None,
   load_asset_page_size: int = 100
)

Description

Split dataset into multiple MultiAsset, proportionally according to given ratios.

Examples

split_assets, counts, labels = dataset.split_into_multi_assets([0.2, 0.5, 0.3])
train_assets = split_assets[0]
test_assets = split_assets[1]
val_assets = split_assets[2]

Arguments

ratios (list of float or int) : Share of data that will go into each group.
Ratios are normalized, but they should sum to one to avoid confusion.
random_seed (Any, optional) : Use a seed to ensure the same result if run multiple times. Defaults to None.
load_asset_page_size (int, optional) : Page size when loading assets. Defaults to 100.

Returns

A tuple with all of this information (
list of MultiAsset,
dict of repartition of classes for each MultiAsset,
list of labels
)
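
Normalizing ratios means that only their relative weights matter. The sketch below turns a list of ratios into group sizes; `split_sizes` is a hypothetical helper, and the rounding of group boundaries is an assumption, so the real method may distribute leftover assets differently.

```python
from typing import List, Union


def split_sizes(ratios: List[Union[float, int]], total: int) -> List[int]:
    """Normalize ratios and turn them into group sizes that sum to `total`."""
    ratio_sum = sum(ratios)
    # Cumulative boundaries, rounded, so the sizes always add up to `total`.
    bounds = [
        round(total * sum(ratios[: i + 1]) / ratio_sum) for i in range(len(ratios))
    ]
    return [end - start for start, end in zip([0] + bounds, bounds)]


sizes = split_sizes([0.2, 0.5, 0.3], total=100)
same_sizes = split_sizes([2, 5, 3], total=100)
```

Both calls yield the same sizes, which shows why ratios that do not sum to one still work, even though ratios summing to one are easier to read.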

create_campaign

create_campaign(
   name: str, description: Optional[str] = None,
   instructions_file_path: Optional[str] = None,
   instructions_text: Optional[str] = None, end_date: Optional[date] = None,
   auto_add_new_assets: Optional[bool] = False,
   auto_close_on_completion: Optional[bool] = False
)

Description

Create campaign on a dataset version.

Examples

foo_dataset_version.create_campaign(name="my-campaign")

Arguments

name (str) : name of the campaign
description (str, optional) : Description of the campaign. Defaults to None.
instructions_file_path (str, optional) : Instructions file path. Defaults to None.
instructions_text (str, optional) : Instructions text. Defaults to None.
end_date (date, optional) : End date of the campaign. Defaults to None.
auto_add_new_assets (bool, optional) : If true, new assets of this dataset version will be added as tasks
in the campaign. Defaults to False.
auto_close_on_completion (bool, optional) : If true, the campaign will be closed when all tasks are done.
Defaults to False.

Returns

An AnnotationCampaign object

get_campaign

get_campaign()

Description

Get campaign of a dataset version.
If there are multiple campaigns, only the first one is retrieved.

Examples

foo_dataset_version.get_campaign()

Returns

An AnnotationCampaign object

---