Properties
-
origin_id : UUID of the Dataset origin
-
name : Name of the Dataset origin
-
version : Version of this DatasetVersion
-
type : Type of this DatasetVersion
Methods
add_tags
add_tags(
tags: Union[Tag, list[Tag]]
)
Description
Add some tags to an object.
It can be used on Data/MultiData/Asset/MultiAsset/DatasetVersion/Dataset/Model/ModelVersion.
You can give a Tag or a list of Tag.
Examples
tag_bicycle = client.create_tag("bicycle", Target.DATA)
tag_car = client.create_tag("car", Target.DATA)
tag_truck = client.create_tag("truck", Target.DATA)
data.add_tags(tag_bicycle)
data.add_tags([tag_car, tag_truck])
remove_tags
remove_tags(
tags: Union[Tag, list[Tag]]
)
Description
Remove some tags from an object. It can be used on Data/Asset/DatasetVersion/Dataset/Model/ModelVersion.
You can give a Tag or a list of Tag.
Examples
data.remove_tags(tag_bicycle)
data.remove_tags([tag_car, tag_truck])
get_tags
get_tags()
Description
Retrieve tags of your dataset version.
Examples
tags = foo_dataset_version.get_tags()
assert tags[0].name == "training-dataset"
Returns
List of Tag objects
add_data
add_data(
data: Union[Data, list[Data], MultiData], tags: Optional[list[Union[str,
Tag]]] = None, wait: Optional[bool] = True
)
Description
Feed this version with data coming from a datalake.
A DatasetVersion takes Data from a Datalake and transforms it into annotatable Assets.
You can give tags that will be added as asset tags to every created asset.
Examples
foo_dataset = client.create_dataset('foo_dataset')
foo_dataset_version_1 = foo_dataset.create_version('first')
some_data = client.get_datalake().list_data(limit=1000)
foo_dataset_version_1.add_data(some_data)
Arguments
-
data (Data, list[Data] or MultiData) : data to add to dataset
-
tags (list of str or Tag, optional) : tags to add to every asset created. Defaults to None.
-
wait (bool, optional) : if True, it will wait for the background task to end. Defaults to True.
Returns
A Job object that you can use to monitor the progress of this operation.
fork
fork(
version: str, description: Optional[str] = None, assets: Union[list[Asset],
MultiAsset, Asset, None] = None, type: Union[InferenceType,
str] = InferenceType.NOT_CONFIGURED, with_tags: bool = False,
with_labels: bool = False, with_annotations: bool = False,
wait: Optional[bool] = True
)
Description
Fork this dataset version into a new dataset version with the same origin and the given version name.
You can give a description and a default type.
You can give a list of assets coming from this dataset version; only these assets will be added to the new dataset version.
If with_tags is True, tags of each asset will be transferred to the new dataset version.
If with_labels is True, labels of the source dataset version will be transferred to the new dataset version.
If with_annotations is True, labels and annotations will be transferred to the new dataset version.
This might take more time.
Examples
foo_dataset_version = client.get_dataset('my_dataset').get_version('first')
assets = foo_dataset_version.list_assets(limit=100)
bar_dataset_version, job = foo_dataset_version.fork(version='second', assets=assets)
Arguments
-
version (str) : new version name
-
description (str, optional) : description, defaults to "Forked from version '<version_name>'"
-
assets (MultiAsset, list of Asset or Asset, optional) : assets to add to the new dataset version. Defaults to all assets.
-
type (InferenceType or str, optional) : inference type of the new dataset version. Defaults to NOT_CONFIGURED.
-
with_tags (bool, optional) : if True, tags of assets will be added to the new dataset version. Defaults to False.
-
with_labels (bool, optional) : if True, labels will be transferred to the new dataset version. Defaults to False.
-
with_annotations (bool, optional) : if True, annotations of each asset will be added to the new dataset version. Defaults to False.
-
wait (bool, optional) : if True, it will wait for the background task to end. Defaults to True.
Returns
A tuple with DatasetVersion and Job
copy_assets_to
copy_assets_to(
destination: 'DatasetVersion', assets: Union[list[Asset], MultiAsset, Asset],
with_tags: bool = False, with_annotations: bool = False, wait: Optional[bool] = True
)
Description
Copy assets from this dataset version into a destination dataset version, which must have the same origin.
The assets you give must come from this dataset version.
If with_tags is True, tags of each asset will be transferred to the destination dataset version.
If with_annotations is True, labels and annotations will be transferred to the destination dataset version.
This might take more time.
Examples
foo_dataset_version = client.get_dataset('my_dataset').get_version('first')
assets = foo_dataset_version.list_assets(limit=100)
bar_dataset_version = client.get_dataset('my_dataset').get_version('second')
foo_dataset_version.copy_assets_to(bar_dataset_version, assets, with_annotations=True)
Arguments
-
destination (DatasetVersion) : destination dataset version. It must have the same origin.
-
assets (MultiAsset, list of Asset or Asset) : assets to add to the destination.
-
with_tags (bool, optional) : if True, tags of assets will be added to copied assets. Defaults to False.
-
with_annotations (bool, optional) : if True, annotations of each asset will be added to copied assets. Defaults to False.
-
wait (bool, optional) : if True, it will wait for the background task to end. Defaults to True.
Returns
A Job that you can wait for
find_asset
find_asset(
data: Optional[Data] = None, filename: Optional[str] = None,
object_name: Optional[str] = None, id: Union[str, UUID, None] = None
)
Description
Find an asset in this dataset version.
You can find it by giving its Data object, its filename, its object name or its id.
Examples
my_asset = my_dataset_version.find_asset(filename="test.png")
Arguments
-
data (Data, optional) : data linked to asset. Defaults to None.
-
filename (str, optional) : filename of the asset. Defaults to None.
-
object_name (str, optional) : object name in the S3 storage. Defaults to None.
-
id (str or UUID, optional) : id of the asset. Defaults to None.
Raises
If no asset matches the query, it will raise a NotFoundError.
In some cases, it can raise an InvalidQueryError, for example when the platform stores 2 assets matching this query (such as a duplicated filename).
Returns
The Asset found
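The two error cases described under Raises can be pictured with a small in-memory stand-in. This is only a sketch: NotFoundError and InvalidQueryError are redefined locally as placeholders named after the exceptions mentioned above, and find_one is a hypothetical helper, not an SDK function.

```python
class NotFoundError(Exception):
    """Local placeholder: nothing matches the query."""


class InvalidQueryError(Exception):
    """Local placeholder: the query is ambiguous."""


def find_one(assets, filename):
    # Hypothetical helper: assets is a list of dicts with a "filename" key.
    matches = [a for a in assets if a["filename"] == filename]
    if not matches:
        raise NotFoundError(f"no asset named {filename!r}")
    if len(matches) > 1:
        raise InvalidQueryError(f"{len(matches)} assets named {filename!r}")
    return matches[0]


assets = [
    {"id": 1, "filename": "test.png"},
    {"id": 2, "filename": "dup.png"},
    {"id": 3, "filename": "dup.png"},
]
found = find_one(assets, "test.png")
```

When filenames may be duplicated, searching by id is the unambiguous option.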
list_assets
list_assets(
limit: Optional[int] = None, offset: Optional[int] = None,
page_size: Optional[int] = None, order_by: Optional[list[str]] = None,
tags: Union[Tag, list[Tag], str, list[str], None] = None, data_tags: Union[Tag,
list[Tag], str, list[str], None] = None, intersect_tags: bool = False,
intersect_data_tags: bool = False, filenames: Optional[list[str]] = None,
object_names: Optional[list[str]] = None, ids: Optional[list[Union[str,
UUID]]] = None, filenames_startswith: Optional[list[str]] = None,
q: Optional[str] = None, data_ids: Optional[list[Union[str, UUID]]] = None
)
Description
List assets of this dataset version.
It will retrieve all Asset objects of this dataset version.
You will then be able to manipulate them or add them to another dataset.
Examples
assets = foo_dataset_version.list_assets()
Arguments
-
limit (int, optional) : limit to a certain number of assets
-
offset (int, optional) : offset to access some new objects; if 0, will retrieve starting from the beginning
-
page_size (int, optional) : deprecated.
-
order_by (list[str], optional) : a list of strings to use for sorting data; if None, will not sort
-
tags (str, Tag, list[Tag or str], optional) : if given, will return assets that have at least one of the given tags by default. If intersect_tags is True, it will return assets that have all the given tags.
-
intersect_tags (bool, optional) : if True, and a list of tags is given, will return assets that have all the given tags. Defaults to False.
-
data_tags (str, Tag, list[Tag or str], optional) : if given, will return assets that have at least one of the given data tags by default. If intersect_data_tags is True, it will return assets that have all the given data tags.
-
intersect_data_tags (bool, optional) : if True, and a list of data tags is given, will return assets that have all the given data tags. Defaults to False.
-
filenames (list[str], optional) : if given, will return assets whose filename equals one of the given filenames
-
filenames_startswith (list[str], optional) : if given, will return assets whose filename starts with one of the given prefixes
-
object_names (list[str], optional) : if given, will return assets whose object name equals one of the given object names
-
ids (list[str or UUID], optional) : ids of the assets you're looking for. Defaults to None.
-
q (str, optional) : if given, will filter data with given query. Defaults to None.
-
data_ids (list[str or UUID], optional) : ids of the data linked to the assets you are looking for. Defaults to None.
Returns
A MultiAsset object that wraps some Asset that you can manipulate.
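The difference between the default tag filter and intersect_tags=True is the difference between "any of" and "all of". A minimal local sketch of that semantics, using plain sets rather than the platform query; filter_by_tags and the sample data are hypothetical:

```python
def filter_by_tags(assets, tags, intersect_tags=False):
    # Hypothetical local filter: each asset is a (filename, set_of_tag_names) pair.
    wanted = set(tags)
    if intersect_tags:
        # Keep assets that carry every requested tag.
        return [name for name, asset_tags in assets if wanted <= asset_tags]
    # Keep assets that carry at least one requested tag.
    return [name for name, asset_tags in assets if wanted & asset_tags]


assets = [
    ("a.png", {"car"}),
    ("b.png", {"car", "truck"}),
    ("c.png", {"bicycle"}),
]
any_match = filter_by_tags(assets, ["car", "truck"])
all_match = filter_by_tags(assets, ["car", "truck"], intersect_tags=True)
```

The same reading applies to data_tags and intersect_data_tags, which filter on the tags of the linked Data instead.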
delete
delete()
Description
Delete a dataset version.
⚠️ DANGER ZONE: Be very careful here!
It will remove this dataset version from our database; all of its assets and annotations will be removed.
It will also remove the annotation campaign of this dataset version, if there is one.
Examples
foo_dataset_version.delete()
set_type
set_type(
type: Union[str, InferenceType]
)
Description
Set inference type of this DatasetVersion.
You can pass a string with the exact key corresponding to inference type or an enum value InferenceType.
Examples
dataset_version.set_type('object_detection')
dataset_version.set_type(InferenceType.SEGMENTATION)
Arguments
- type (str or InferenceType) : type to give to this dataset version
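Accepting either a string key or an enum value usually comes down to a small normalisation step. The sketch below uses a local stand-in enum limited to the members named on this page; its values, the case-insensitive lookup and the resolve_type helper are assumptions for illustration, not the SDK's implementation.

```python
from enum import Enum


class InferenceTypeSketch(Enum):
    # Stand-in with only the members mentioned on this page; values are assumed.
    NOT_CONFIGURED = "NOT_CONFIGURED"
    OBJECT_DETECTION = "OBJECT_DETECTION"
    SEGMENTATION = "SEGMENTATION"


def resolve_type(value):
    # Accept an enum member as-is, or look a string up by key.
    if isinstance(value, InferenceTypeSketch):
        return value
    try:
        return InferenceTypeSketch[value.upper()]
    except KeyError:
        raise ValueError(f"unknown inference type: {value!r}") from None


from_string = resolve_type("object_detection")
from_enum = resolve_type(InferenceTypeSketch.SEGMENTATION)
```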
update
update(
version: Optional[str] = None, description: Optional[str] = None, type: Union[str,
InferenceType, None] = None
)
Description
Update version, description and type of this DatasetVersion.
Examples
dataset_version.update(description='My favourite dataset')
Arguments
-
version (str, optional) : New version name of the dataset version. Defaults to None.
-
description (str, optional) : New description of the dataset version. Defaults to None.
-
type (str or InferenceType, optional) : New type of the dataset version. Defaults to None.
download
download(
target_path: Union[str, Path, None] = None, force_replace: bool = False,
max_workers: Optional[int] = None, use_id: bool = False
)
Description
Download assets of a dataset version.
It will download all assets from a dataset version into the specified folder.
If target_path is None, it will download into ./<dataset_name>/<dataset_version>.
You can specify the number of threads to use while downloading.
Examples
foo_dataset_version.download('~/Downloads/dataset_pics')
Arguments
-
target_path (str or Path, optional) : Target folder. Defaults to None.
-
force_replace (bool, optional) : Replace an existing file if it exists. Defaults to False.
-
max_workers (int, optional) : Number of max workers used to download. Defaults to os.cpu_count() + 4.
-
use_id (bool, optional) : If true, will download file with id and extension as file name. Defaults to False.
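The defaults documented above can be summarised in a few lines. This is a sketch of how they could be resolved, not the SDK's code; the three helper names are hypothetical, and expanding "~" in the target path is an assumption.

```python
import os
from pathlib import Path


def resolve_target_path(target_path, dataset_name, dataset_version):
    # Falls back to ./<dataset_name>/<dataset_version> when no path is given.
    if target_path is None:
        return Path(".") / dataset_name / dataset_version
    # Assumption: "~" is expanded to the user's home directory.
    return Path(target_path).expanduser()


def resolve_max_workers(max_workers):
    # Mirrors the documented default of os.cpu_count() + 4.
    if max_workers is None:
        return (os.cpu_count() or 1) + 4
    return max_workers


def local_filename(asset_id, filename, use_id=False):
    # With use_id, the file is named after the asset id, keeping the extension.
    if use_id:
        return f"{asset_id}{Path(filename).suffix}"
    return filename


default_dir = resolve_target_path(None, "foo_dataset", "first")
id_name = local_filename("018c59e3-b21b-7006-a82b-047d3931db81", "image_1.png", use_id=True)
```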
list_labels
list_labels()
Description
Get all labels of a dataset version.
It will retrieve a list of label objects.
Examples
foo_dataset_version.create_label("today")
labels = foo_dataset_version.list_labels()
assert labels[0].name == "today"
Returns
List of Label
create_label
create_label(
name: str
)
Description
Add label to a dataset version.
You have to give a name to the label.
Examples
foo_dataset_version.create_label("today")
Arguments
- name (str) : label name to create
Returns
A Label object
get_label
get_label(
name: str
)
Description
Find label in a dataset version.
Examples
label = foo_dataset_version.get_label("today")
Arguments
- name (str) : label name to find
Returns
A Label object
get_or_create_label
get_or_create_label(
name: str
)
Description
Retrieve a label used in this dataset version by its name.
If label does not exist, create it and return it.
Examples
label = dataset_version.get_or_create_label("new_label")
Arguments
- name (str) : label name to retrieve or create
Returns
A Label object
list_annotations
list_annotations(
worker: Optional[Worker] = None, status: Union[AnnotationStatus, str, None] = None,
limit: Optional[int] = None, offset: Optional[int] = None,
order_by: Optional[list[str]] = None, page_size: Optional[int] = None
)
Description
Retrieve annotations of a dataset version.
Examples
annotations = foo_dataset_version.list_annotations()
Arguments
-
limit (Optional[int], optional) : Limit number of annotations to retrieve. Defaults to None, all annotations will be retrieved.
-
offset (Optional[int], optional) : Offset to begin with when listing annotations. Defaults to None, starting at 0.
-
page_size (Optional[int], optional) : Size of each page when retrieving annotations. Defaults to None, page size will be equal to the default pagination.
-
order_by (Optional[list[str]], optional) : Order annotations by some criteria. Defaults to None.
-
worker (Optional[Worker], optional) : Worker filter. Defaults to None.
-
status (Optional[AnnotationStatus], optional) : Status of annotations to retrieve. Defaults to None.
Raises
- NoDataError : When no annotations retrieved
Returns
A MultiAnnotation object
load_annotations
load_annotations(
worker: Optional[Worker] = None, status: Optional[AnnotationStatus] = None,
assets: Union[list[Asset], MultiAsset, None] = None, chunk_size: int = 100,
max_workers: Optional[int] = None, skip_error: bool = False
)
Description
Load annotations of this dataset version by retrieving shapes with their labels, asset_id and worker_id.
Examples
dict_annotations = foo_dataset_version.load_annotations()
Arguments
-
worker (Optional[Worker], optional) : Worker filter. Defaults to None.
-
status (Optional[AnnotationStatus], optional) : Status of annotations to retrieve. Defaults to None.
-
assets (Union[list[Asset], MultiAsset, None], optional) : List of the asset to retrieve. Defaults to None.
-
chunk_size (int, optional) : Size of chunk of annotations to load by request. Defaults to 100.
-
max_workers (int, optional) : Number of max workers used to load annotations. Defaults to os.cpu_count() + 4.
-
skip_error (bool, optional) : skip errors of a chunk and return partial annotations. Defaults to False.
Returns
A dict of annotations
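The roles of chunk_size and skip_error can be sketched with a generic batching loop. This only illustrates the idea under stated assumptions: chunked, load_all and the fetch callback are hypothetical, and the real method also parallelises requests over max_workers.

```python
def chunked(items, chunk_size=100):
    # Yield successive slices of at most chunk_size items.
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def load_all(items, fetch, chunk_size=100, skip_error=False):
    # fetch is a hypothetical callback that loads one chunk.
    loaded = []
    for chunk in chunked(items, chunk_size):
        try:
            loaded.extend(fetch(chunk))
        except Exception:
            if not skip_error:
                raise
            # With skip_error, the failing chunk is dropped and loading continues.
    return loaded


def flaky_fetch(chunk):
    # Simulates a request that fails for the chunk containing id 150.
    if 150 in chunk:
        raise RuntimeError("request failed")
    return chunk


annotation_ids = list(range(250))
sizes = [len(c) for c in chunked(annotation_ids, chunk_size=100)]
partial = load_all(annotation_ids, flaky_fetch, chunk_size=100, skip_error=True)
```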
export_annotation_file
export_annotation_file(
annotation_file_type: Union[AnnotationFileType, str], target_path: Union[str,
Path] = './', assets: Union[MultiAsset, list[Asset], None] = None,
worker: Optional[Worker] = None, status: Union[AnnotationStatus, str, None] = None,
force_replace: bool = True, export_video: bool = False, use_id: bool = False
)
Description
Export annotations of this dataset version into a file, and download it.
If you give the 'worker' argument, you will retrieve only annotations of this worker, if they exist.
If you don't give 'worker', it will only export the last created annotation and its shapes.
Examples
dataset_v0.export_annotation_file(AnnotationFileType.COCO, "./")
Arguments
-
annotation_file_type (AnnotationFileType) : choose to export in Pascal VOC format, YOLO format or COCO format.
-
target_path (str or Path, optional) : directory path where file is downloaded. Defaults to current directory.
-
assets (Union[MultiAsset, list[Asset], None], optional) : a list of assets of this dataset version. Only these assets will be concerned by this export. Defaults to None.
-
worker (Worker, optional) : worker of annotations. Defaults to None.
-
status (AnnotationStatus, optional) : status of annotations. Defaults to None.
-
force_replace (bool, optional) : if true, will replace an existing annotation file. Defaults to True.
-
export_video (bool, optional) : if true, will export video of your dataset, instead of assets. Defaults to False.
-
use_id (bool, optional) : if true, id will be used when generating annotation files. For example, in a coco file, assuming you have "image_1.png", it will generate a file name like 018c59e3-b21b-7006-a82b-047d3931db81.png. You should combine this method with dataset_version.download(use_id=True). Defaults to False.
Returns
Path of downloaded file.
build_coco_file_locally
build_coco_file_locally(
worker: Optional[Worker] = None, status: Union[AnnotationStatus, str, None] = None,
enforced_ordered_categories: Optional[list[str]] = None,
assets: Union[MultiAsset, list[Asset], None] = None, use_id: bool = False
)
Description
Build a coco file locally instead of exporting it from the platform.
This method will load annotations of a dataset version with the given filters, build all coco annotations, then load from the platform all assets and labels needed in this coco file, and return a coco file.
It will only build a file with the last created Annotation that matches the given filters.
The returned coco file can then be written into a file.
Examples
coco_file = dataset_v0.build_coco_file_locally()
Arguments
-
worker (Worker, optional) : worker of annotations. Defaults to None.
-
status (AnnotationStatus, optional) : status of annotations. Defaults to None.
-
assets (Union[MultiAsset, list[Asset]], optional) : assets of annotations. Defaults to None.
-
enforced_ordered_categories (List of str, optional) : use this parameter to enforce an order of categories for the coco file. Defaults to None.
-
use_id (bool, optional) : set True if you downloaded assets with id as filenames, COCO File will then use ids as filenames. Defaults to False.
Returns
A COCO File object
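To show what enforcing a category order means and how a COCO structure ends up on disk, here is a sketch with a plain dict. The real method returns a COCO File object whose serialisation API is not described on this page, so the dict, build_categories and the id numbering starting at 0 are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def build_categories(label_names, enforced_ordered_categories=None):
    # Assumption: category ids follow list position, starting at 0.
    ordered = enforced_ordered_categories or sorted(label_names)
    return [{"id": i, "name": name} for i, name in enumerate(ordered)]


coco = {
    "images": [{"id": 0, "file_name": "image_1.png", "width": 640, "height": 480}],
    "annotations": [{"id": 0, "image_id": 0, "category_id": 1, "bbox": [10, 20, 100, 50]}],
    "categories": build_categories({"car", "truck"}, ["truck", "car"]),
}

# Write the structure to a JSON file, then read it back.
out_path = Path(tempfile.mkdtemp()) / "annotations.json"
out_path.write_text(json.dumps(coco, indent=2))
reloaded = json.loads(out_path.read_text())
```

Fixing the order matters when a training pipeline maps category ids to class indices.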
import_annotations_yolo_files
import_annotations_yolo_files(
configuration_yaml_path: Union[str, Path], file_paths: list[Union[str, Path]],
worker: Optional[Worker] = None, mode: Union[ImportAnnotationMode,
str] = ImportAnnotationMode.REPLACE, force_create_label: bool = True,
fail_on_asset_not_found: bool = True, status: Optional[AnnotationStatus] = None
)
Description
Read a YOLO annotation configuration file, then read all given annotation files, parse them and create annotations and shapes for all assets.
Examples
dataset_v0.import_annotations_yolo_files(configuration_yaml_path="data.yaml", file_paths=["asset1.txt"])
Arguments
-
configuration_yaml_path (str, Path) : Path to file of configuration
-
file_paths (List of str or Path) : Paths of annotation files to import
-
worker (Worker, optional) : Worker to use. Defaults to current user.
-
mode (ImportAnnotationMode, optional) : Mode used to import.
REPLACE will delete worker annotation if exists and replace it.
CONCATENATE will create shapes on existing annotation.
KEEP will do nothing on existing annotation.
Defaults to ImportAnnotationMode.REPLACE.
-
force_create_label (bool) : Ensures labels are created if they don't exist. Defaults to True.
-
fail_on_asset_not_found (bool) : If one filename is not found in the dataset version, fail before importing annotations. Defaults to True.
-
status (AnnotationStatus, optional) : Annotation status to set on created annotations. Defaults to None.
Raises
- FileNotFoundException : if file is not found
Returns
A dict with annotation id as string keys and number of shapes created as integer.
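For reference, a YOLO detection annotation file usually holds one line per box in the form "class x_center y_center width height", with coordinates normalised to the image size. The sketch below converts such a line to a pixel rectangle; parse_yolo_line is a hypothetical helper, and the label names are passed directly where they would normally come from the YAML configuration file.

```python
def parse_yolo_line(line, label_names, image_width, image_height):
    # "class x_center y_center width height", coordinates normalised to [0, 1].
    class_index, xc, yc, w, h = line.split()
    xc, yc, w, h = float(xc), float(yc), float(w), float(h)
    return {
        "label": label_names[int(class_index)],
        # Convert the box centre to a top-left corner in pixels.
        "x": round((xc - w / 2) * image_width),
        "y": round((yc - h / 2) * image_height),
        "w": round(w * image_width),
        "h": round(h * image_height),
    }


label_names = ["car", "truck"]  # would come from the YAML configuration file
shape = parse_yolo_line("1 0.5 0.5 0.25 0.5", label_names, 640, 480)
```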
import_annotation_voc_file
import_annotation_voc_file(
file_path: Union[str, Path], worker: Optional[Worker] = None,
mode: Union[ImportAnnotationMode, str] = ImportAnnotationMode.REPLACE,
force_create_label: bool = True, status: Optional[AnnotationStatus] = None
)
Description
Read a Pascal VOC file, parse it and create annotations and shapes for one given asset.
Examples
dataset_v0.import_annotation_voc_file(file_path="voc.xml")
Arguments
-
file_path (str or Path) : Path of file to import
-
worker (Worker, optional) : Worker to use. Defaults to current user.
-
mode (ImportAnnotationMode, optional) : Mode used to import.
REPLACE will delete worker annotation if exists and replace it.
CONCATENATE will create shapes on existing annotation.
KEEP will do nothing on existing annotation.
Defaults to ImportAnnotationMode.REPLACE.
-
force_create_label (bool) : Ensures labels are created if they don't exist. Defaults to True.
-
status (AnnotationStatus, optional) : status given to created annotations. Defaults to None.
Raises
- FileNotFoundException : if file is not found
Returns
A dict with annotation id as string keys and number of shapes created as integer.
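The three import modes differ only in what happens when the worker already has an annotation on the asset. A minimal sketch of that decision on plain lists of shapes; apply_import is hypothetical and the mode names are taken from the argument description above.

```python
MODES = {"REPLACE", "CONCATENATE", "KEEP"}


def apply_import(existing_shapes, new_shapes, mode="REPLACE"):
    # existing_shapes is None when the worker has no annotation on the asset yet.
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if existing_shapes is None or mode == "REPLACE":
        return list(new_shapes)
    if mode == "CONCATENATE":
        return list(existing_shapes) + list(new_shapes)
    # KEEP: the existing annotation is left untouched.
    return list(existing_shapes)


existing = ["box_a"]
imported = ["box_b", "box_c"]
replaced = apply_import(existing, imported, "REPLACE")
concatenated = apply_import(existing, imported, "CONCATENATE")
kept = apply_import(existing, imported, "KEEP")
```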
import_annotations_coco_file
import_annotations_coco_file(
file_path: Union[Path, str], worker: Optional[Worker] = None,
mode: Union[ImportAnnotationMode, str] = ImportAnnotationMode.REPLACE,
force_create_label: bool = True, fail_on_asset_not_found: bool = True,
status: Optional[AnnotationStatus] = None, use_id: bool = False
)
Description
Read a COCO file, parse it and create annotations and shapes for the given assets.
Examples
dataset_v0.import_annotations_coco_file(file_path="coco.json")
Arguments
-
file_path (str or Path) : Path of file to import
-
worker (Worker, optional) : Worker to use. Defaults to current user.
-
mode (ImportAnnotationMode, optional) : Mode used to import.
REPLACE will delete worker annotation if exists and replace it.
CONCATENATE will create shapes on existing annotation.
KEEP will do nothing on existing annotation.
Defaults to ImportAnnotationMode.REPLACE.
-
force_create_label (bool) : Ensure labels are created if they don't exist. Defaults to True.
-
fail_on_asset_not_found (bool) : Raise an error if an asset is not found. Defaults to True.
-
status (AnnotationStatus, optional) : Annotation status of imported annotations. Defaults to None, in which case the status will be PENDING.
-
use_id (bool, optional) : If your coco file has asset ids as filenames, set this to True. Defaults to False.
Raises
- FileNotFoundException : if file is not found
Returns
A dict with annotation id as string keys and number of shapes created as integer.
import_annotations_coco_video_file
import_annotations_coco_video_file(
file_path: Union[Path, str], worker: Optional[Worker] = None,
mode: Union[ImportAnnotationMode, str] = ImportAnnotationMode.REPLACE,
force_create_label: bool = True, fail_on_asset_not_found: bool = True,
status: Optional[AnnotationStatus] = None, use_id: bool = False
)
Description
Read a Video COCO file, parse it and create annotations and shapes for the given assets.
Experimental feature: the availability and support of this feature can change until it’s stable.
Examples
dataset_v0.import_annotations_coco_video_file(file_path="coco_vid.json")
Arguments
-
file_path (str or Path) : Path of file to import
-
worker (Worker, optional) : Worker to use. Defaults to current user.
-
mode (ImportAnnotationMode, optional) : Mode used to import.
REPLACE will delete worker annotation if exists and replace it.
CONCATENATE will create shapes on existing annotation.
KEEP will do nothing on existing annotation.
Defaults to ImportAnnotationMode.REPLACE.
-
force_create_label (bool) : Ensure labels are created if they don't exist. Defaults to True.
-
fail_on_asset_not_found (bool) : Raise an error if an asset is not found. Defaults to True.
-
status (AnnotationStatus, optional) : Annotation status of imported annotations. Defaults to None, in which case the status will be PENDING.
-
use_id (bool, optional) : If your coco file has asset ids as filenames, set this to True. Defaults to False.
Raises
- FileNotFoundException : if file is not found
Returns
A dict with annotation id as string keys and number of shapes created as integer.
delete_all_annotations
delete_all_annotations(
workers: Optional[list[Worker]] = None
)
Description
Delete all annotations of this dataset version.
⚠️ DANGER ZONE: Be very careful here!
It will remove all annotations of every asset of this dataset version.
You can give a list of workers to remove only their annotations.
Examples
foo_dataset_version.delete_all_annotations()
Arguments
- workers (list[Worker], optional) : Workers on which annotations will be removed. Defaults to None.
synchronize
synchronize(
target_dir: str, do_download: bool = False
)
Description
Synchronize this dataset version with target dir by comparing assets in target dir with assets uploaded in dataset version.
Examples
foo_dataset_version.synchronize('./foo_dataset/first')
Arguments
-
target_dir (str) : directory to synchronize against
-
do_download (bool) : download files when they are not in the local directory. Defaults to False.
Returns
A MultiAsset object with assets downloaded if do_download is True
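The comparison behind a synchronisation can be thought of as a set difference between local files and uploaded assets. The sketch below compares by file name only, which is an assumption since the matching rule is not stated here; compare_with_directory is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def compare_with_directory(target_dir, remote_filenames):
    # Hypothetical helper: compares names only, not file contents.
    local = {p.name for p in Path(target_dir).iterdir() if p.is_file()}
    remote = set(remote_filenames)
    return {
        "missing_locally": sorted(remote - local),
        "only_local": sorted(local - remote),
    }


# Build a throwaway directory holding two empty files.
target_dir = Path(tempfile.mkdtemp())
(target_dir / "a.png").write_bytes(b"")
(target_dir / "old.png").write_bytes(b"")
report = compare_with_directory(target_dir, ["a.png", "b.png"])
```

In this picture, the files listed under "missing_locally" are the ones do_download=True would fetch.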
retrieve_stats
retrieve_stats()
Description
Retrieve statistics of this dataset version (label repartition, number of objects, number of annotations).
Examples
stats = foo_dataset_version.retrieve_stats()
assert stats.nb_objects == 25
assert stats.nb_annotations == 5
Returns
A DatasetVersionStats schema with keys:
- label_repartition : dict with label names as keys and number of shapes with these labels as values
- nb_objects : total number of objects (sum of label_repartition values)
- nb_annotations : total number of Annotation objects of this dataset version
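The relation between the keys can be checked by hand: nb_objects is the sum of the label_repartition values. A small stand-in, assuming made-up counts chosen to match the example above; StatsSketch is not the SDK schema.

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    # Stand-in for the returned schema, with made-up numbers.
    label_repartition: dict
    nb_annotations: int

    @property
    def nb_objects(self):
        # Total number of shapes across all labels.
        return sum(self.label_repartition.values())


stats = StatsSketch({"car": 12, "truck": 8, "bicycle": 5}, nb_annotations=5)
```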
get_or_create_asset_tag
get_or_create_asset_tag(
name: str
)
Description
Retrieve an asset tag used in this dataset version by its name.
If tag does not exist, create it and return it.
Examples
tag = dataset_version.get_or_create_asset_tag("new_tag")
Arguments
- name (str) : Name of the tag to retrieve or create
Returns
A Tag object
create_asset_tag
create_asset_tag(
name: str
)
Description
Create asset tag only available in this dataset version.
Examples
tag_dog = dataset_v0.create_asset_tag("dog")
Arguments
- name (str) : name of tag to create
Returns
A Tag object
get_asset_tag
get_asset_tag(
name: str
)
Description
Retrieve an asset tag used in this dataset version.
Examples
tag_dog = dataset_v0.get_asset_tag("dog")
Arguments
- name (str) : Name of the tag you're looking for
Returns
A Tag object
convert_tags_to_classification
convert_tags_to_classification(
tag_type: TagTarget, tags: list[Tag]
)
Description
list_asset_tags
list_asset_tags()
Description
List asset tags created in this dataset version
Examples
tags = dataset_v0.list_asset_tags()
assert tag_dog in tags
Returns
A list of Tag
train_test_split
train_test_split(
prop: float = 0.8, random_seed: Optional[Any] = None, load_asset_page_size: int = 100
)
Description
Split a DatasetVersion into 2 MultiAssets and return their label repartition.
Examples
train_assets, eval_assets, count_train, count_eval, labels = dataset_version.train_test_split()
Arguments
-
prop (float, optional) : Proportion of data for the training set. Defaults to 0.8.
-
random_seed (Any, optional) : Use a seed to ensure the same result if run multiple times. Defaults to None.
-
load_asset_page_size (int, optional) : Page size when loading assets. Defaults to 100.
Returns
A tuple with all of this information (
list of train assets,
list of test assets,
dict of repartition of classes for train assets, with {"x": list of labels, "y": list of label count},
dict of repartition of classes for test assets, with {"x": list of labels, "y": list of label count},
list of labels
)
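A seeded split and the {"x": ..., "y": ...} repartition shape can be sketched locally as follows. The shuffling strategy and rounding used by the SDK are not described here, so split_two and repartition are hypothetical helpers working on (filename, label) pairs.

```python
import random
from collections import Counter


def split_two(items, prop=0.8, random_seed=None):
    # Shuffle a copy with a dedicated RNG so the same seed gives the same split.
    rng = random.Random(random_seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * prop)
    return shuffled[:cut], shuffled[cut:]


def repartition(items):
    # Builds the {"x": labels, "y": counts} shape described in Returns.
    counts = Counter(label for _, label in items)
    labels = sorted(counts)
    return {"x": labels, "y": [counts[label] for label in labels]}


items = [(f"img_{i}.png", "car" if i % 2 else "truck") for i in range(10)]
train, test = split_two(items, prop=0.8, random_seed=42)
train_again, _ = split_two(items, prop=0.8, random_seed=42)
count_train = repartition(train)
```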
train_test_val_split
train_test_val_split(
ratios: list[float] = None, random_seed: Optional[Any] = None,
load_asset_page_size: int = 100
)
Description
Split a DatasetVersion into 3 MultiAssets and return their label repartition.
By default, it will split with ratios of 0.64, 0.16 and 0.20.
Examples
train_assets, test_assets, val_assets, count_train, count_test, count_val, labels = dataset_version.train_test_val_split()
Arguments
-
ratios (list of float, optional) : Ratios of the split used for the training, test and validation sets. Defaults to [0.64, 0.16, 0.20].
-
random_seed (Any, optional) : Use a seed to ensure the same result if run multiple times. Defaults to None.
-
load_asset_page_size (int, optional) : Page size when loading assets. Defaults to 100.
Returns
A tuple with all of this information (
list of train assets,
list of test assets,
list of val assets,
dict of repartition of classes for train assets, with {"x": list of labels, "y": list of label count},
dict of repartition of classes for test assets, with {"x": list of labels, "y": list of label count},
dict of repartition of classes for val assets, with {"x": list of labels, "y": list of label count},
list of labels
)
split_into_multi_assets
split_into_multi_assets(
ratios: list[Union[float, int]], random_seed: Optional[Any] = None,
load_asset_page_size: int = 100
)
Description
Split dataset into multiple MultiAsset, proportionally according to given ratios.
Examples
split_assets, counts, labels = dataset_version.split_into_multi_assets([0.2, 0.5, 0.3])
train_assets = split_assets[0]
test_assets = split_assets[1]
val_assets = split_assets[2]
Arguments
-
ratios (list of float) : Proportion of data that will go into each category. Ratios will be normalized, but they should sum to one to avoid confusion.
-
random_seed (Any, optional) : Use a seed to ensure the same result if run multiple times. Defaults to None.
-
load_asset_page_size (int, optional) : Page size when loading assets. Defaults to 100.
Returns
A tuple with all of this information (
list of MultiAsset,
dict of repartition of classes for each MultiAsset,
list of labels
)
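Normalising the ratios means dividing each one by their sum, so [2, 5, 3] behaves like [0.2, 0.5, 0.3]. A minimal sketch of a proportional split under that reading; the cut-point rounding and the split_by_ratios helper are assumptions, not the SDK's algorithm.

```python
import random


def normalize(ratios):
    # Scale ratios so that they sum to one.
    total = sum(ratios)
    return [r / total for r in ratios]


def split_by_ratios(items, ratios, random_seed=None):
    rng = random.Random(random_seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    weights = normalize(ratios)
    splits, start, cumulative = [], 0, 0.0
    for i, weight in enumerate(weights):
        cumulative += weight
        # The last split takes the remainder so that no item is dropped.
        is_last = i == len(weights) - 1
        end = len(shuffled) if is_last else round(cumulative * len(shuffled))
        splits.append(shuffled[start:end])
        start = end
    return splits


splits = split_by_ratios(range(100), [2, 5, 3], random_seed=0)
sizes = [len(s) for s in splits]
```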
create_campaign
create_campaign(
name: Optional[str] = None, description: Optional[str] = None,
instructions_file_path: Optional[str] = None,
instructions_text: Optional[str] = None, end_date: Optional[date] = None,
auto_add_new_assets: Optional[bool] = False,
auto_close_on_completion: Optional[bool] = False
)
Description
Create campaign on a dataset version.
Examples
foo_dataset_version.create_campaign()
Arguments
-
name (str, optional) : deprecated, it should not be used anymore. Defaults to None.
-
description (str, optional) : Description of the campaign. Defaults to None.
-
instructions_file_path (str, optional) : Instructions file path. Defaults to None.
-
instructions_text (str, optional) : Instructions text. Defaults to None.
-
end_date (date, optional) : End date of the campaign. Defaults to None.
-
auto_add_new_assets (bool, optional) : If true, new assets of this dataset version will be added as tasks in the campaign. Defaults to False.
-
auto_close_on_completion (bool, optional) : If true, the campaign will be closed when all tasks are done. Defaults to False.
Returns
An AnnotationCampaign object
get_campaign
get_campaign()
Description
Get campaign of a dataset version.
Examples
foo_dataset_version.get_campaign()
Returns
An AnnotationCampaign object
launch_processing
launch_processing(
processing: Processing, parameters: dict = None, cpu: int = None, gpu: int = None
)
Description
Launch given processing onto this dataset version. You can give specific cpu, gpu or parameters.
If not given, it will use default values specified in Processing.
If processing cannot be launched on a DatasetVersion, it will raise an error before launching.
Examples
processing = client.get_processing("pre-annotation")
foo_dataset_version.launch_processing(processing)
Returns
A Job object