Dataset Management

This page will provide tutorials for managing datasets in EdgeFirst Studio.

View Dataset

This tutorial will show how to open the gallery of the dataset to see the individual samples in the dataset.

From the "Projects" page, you can click on the dataset button indicated in red to view the datasets contained in the project.

You will now see the datasets contained in the project. Each dataset has a gallery.

Dataset Attributes — Dataset Card UI Breakdown

In this dataset, you can see that it has a total of 1399 images and single label "coffeecup". This dataset also has two partitions; one for training and another for validation. There is a total of 915 images for the training group and 229 images for the validation group. Finally, the dataset has a single annotation set which shows that 1385 images have annotations and there is a total of 3020 annotations (3020 distinct objects in the dataset).

Click on the button on top of the dataset card to navigate to the dataset gallery. The dataset gallery will look like the following below.

This dataset will contain both sequences (videos) and images. Clicking on the sequences will provide video playback. Otherwise, clicking on images will expand the image view.

This dataset will have a complete set of 2D annotations (masks and bounding boxes) of coffee cups. This dataset was quickly annotated using the Automatic Ground Truth Generatation (AGTG) feature in EdgeFirst Studio.

Edit Dataset Information

The dataset name and description can be edited by clicking on the dataset extended menu on the top right portion of the dataset card. This should bring up the options and "Edit Info" as the first in the list. Click on "Edit Info".

This will bring up the window to edit the dataset "Name" and the "Description". Once the changes are made, click "Apply Changes" to save the changes.

The changes should appear on the dataset card as shown below.

View Fusion Dataset

This tutorial will show an example of a dataset that is ready for training.

Verify that the dataset has a training and validation split. The sample dataset shown below has a dedicated split for training (20066 samples) and validation (2229 samples).

Another sample dataset shown below is for training Vision models which has a dedicated split for training (1656 samples) and validation (184 samples).

Verify the contents of the dataset and the annotations. Click the button that navigates to the gallery. This will show the contents of the dataset. The dataset may be comprised of multiple sequences as shown below.

Clicking on any of these sequences will open individual images in the sequence with the visualizations of the annotations. For more information please see viewing datasets.

Datasets that train Fusion models provide annotations of the object's 3D bounding box. For more information on the dataset annotations, please see the EdgeFirst Dataset Format.

Datasets that train Vision models provide image annotations of the object's 2D bounding box and segmentation mask. For more information on the dataset annotations, please see the EdgeFirst Dataset Format.

For cases where the annotations need corrections, please see Manual 2D Annotations or Manual 3D Annotations for more details.

Create Dataset

If you have a video recording or sample images for your dataset, you can create a dataset container in EdgeFirst Studio to contain your video frames or images and annotations.

Navigate to a web browser and login to EdgeFirst Studio. Once logged in to EdgeFirst Studio, navigate to your project. In this case the project name is "Object Detection". Click on the "Datasets" button that is indicated in red below.

This will bring you to the "Datasets" page of the selected project. Create a new dataset container by clicking the "New Dataset" button that is indicated in red.

Add the dataset and annotation container name, labels, and dataset description as indicated by the fields below. It is up to you to specify the information in the fields and you do not have to strictly follow the example shown below. Click the "Create" button once the fields have been filled.

Your created dataset will look as follows.

Copy Dataset

To copy a dataset, navigate to the dataset you would like to copy. On the dataset card, select the "Copy Dataset" from the dataset options as shown below.

This will open a new dialog for the user to specify the "Destination". The "Destination" will be the location of the copied dataset. The "Source" will be set by default to the current dataset card you've selected. However, you can also modify the location here. In the example below, the original dataset is the "Source" which is the "Coffee Cup" dataset from the "Sample Project". The copied dataset will be placed as specified in the "Destination" fields. By default a new dataset container will be created in the specified project. However, you can create a dataset container before copying and specify this dataset container in the "Destination" fields.

Once the options are specified, go ahead and click "Apply" at the bottom right to start the copy process. The progress for the dataset copy will be shown on the new dataset card that was created in the project destination that was specified.

Once the copying process completes, the frames and the annotations have been copied.

Original Dataset	Copied Dataset

Combine Datasets

The process of combining datasets consists of multiple copy processes on a given dataset container. To combine datasets, first create a dataset container. Follow the process for copying a dataset onto the destination dataset container that was created. The copy process will copy the selected dataset onto the same dataset container and thus combining multiple datasets.

Split Dataset

Partitioning the dataset is crucial in reserving dataset portions used for training and portions used for validation to assess the performance of the model. In EdgeFirst Studio, the partitions are 80% towards training and 20% towards validation. This operation randomly shuffles the data prior to assigning them to the specified groups.

Warning

The dataset needs to be re-split whenever new sample images or frames are added to the dataset. Newly added samples are not automatically added to any group that already exists.

Consider the following dataset without any groups reserved.

To create the dataset groups, click on the "+" button in the "Groups" field.

This will open a new dialog to specify the percentages of the partition belonging to the "Training" group or "Validation" group. By default 80% of the samples will be dedicated to training and 20% remaining will be dedicated towards the validation samples.

Once the groups are specified, click "Split" to create the groups. This will automatically divide the samples in the dataset based on the percentages of each group specified.

Export Dataset

Coming Soon