Dataset Management

This page provides tutorials for managing datasets in EdgeFirst Studio.

Upload MCAPs

This tutorial shows how to upload an MCAP recording. To upload EdgeFirst Datasets instead, see the instructions for Upload from Zip/Arrow File.

In EdgeFirst Studio, select "Data Snapshots" under the tool options.

Note

This tutorial assumes that a project intended for object detection has already been created. This step is covered in Getting Started.

On the "Data Snapshots" page, upload the recorded MCAP by clicking "From File", which opens a file dialog for selecting the MCAP stored on your PC.

Once the MCAP file is selected, the upload to EdgeFirst Studio starts. The upload may take several minutes depending on the size of the MCAP. Once the upload is complete, the status appears as shown in the figure on the right.

Upload Progress	Completed Upload
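Because a large upload can take several minutes, it may be worth confirming locally that the file is a real MCAP before selecting it. The sketch below is not part of EdgeFirst Studio. It only checks the 8-byte magic sequence that the MCAP specification places at the start of every file and reports the file size. The function name `describe_mcap` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path

# Magic bytes at the start of every MCAP file, per the MCAP specification.
MCAP_MAGIC = b"\x89MCAP0\r\n"


def describe_mcap(path):
    """Return (looks_like_mcap, size_in_mib) for the file at `path`.

    Hypothetical helper: this is a local sanity check only and does not
    validate the rest of the recording.
    """
    file_path = Path(path)
    with file_path.open("rb") as handle:
        header = handle.read(len(MCAP_MAGIC))
    size_mib = file_path.stat().st_size / (1024 * 1024)
    return header == MCAP_MAGIC, size_mib
```

For example, calling `describe_mcap("recording.mcap")` (a hypothetical file name) returns a boolean and the size in MiB, which gives a rough idea of how long the upload might take.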

For instructions on auto-annotating uploaded MCAPs, see Auto Annotations via Snapshot.

Importing Darknet Datasets

This tutorial shows how to import a Darknet dataset into EdgeFirst Studio. To import EdgeFirst Datasets instead, see the instructions for Upload from Zip/Arrow File.

COCO128 is used as the example dataset.

To import a dataset, first create a dataset container. In this example, the dataset is named "Coco128" with the description "Demo import", and an annotation set called "Ground Truth" has also been created.

The example dataset, COCO128, was downloaded using the link provided. The download is a ZIP archive, which can then be extracted.

Once the container has been created, open the dataset options, denoted by the three vertical dots at the top-right corner of the dataset card.

Select "Import".

This opens a new window for specifying the dataset to be imported. Set "Import Type" to "Darknet", specify "coco128" as the dataset folder to import, and set the annotation set to "Ground Truth". The following figure shows these settings.

The "coco128" dataset that was specified contains the "images" and "labels" subdirectories.
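Before starting the import, the folder layout can be sanity-checked locally. The sketch below is an illustration only, not the check that EdgeFirst Studio performs. It assumes the common Darknet detection convention of one `.txt` label file per image, where each line holds a class index followed by four box values (center x, center y, width, height) normalized to the range 0 to 1. The function names and the list of image extensions are assumptions made for this example.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def is_valid_label_line(line):
    """Check one Darknet detection line: class_id cx cy w h (normalized)."""
    fields = line.split()
    if len(fields) != 5:
        return False
    try:
        class_id = int(fields[0])
        values = [float(v) for v in fields[1:]]
    except ValueError:
        return False
    return class_id >= 0 and all(0.0 <= v <= 1.0 for v in values)


def check_darknet_folder(root):
    """Summarize a folder that has "images" and "labels" subdirectories."""
    root = Path(root)
    images_dir, labels_dir = root / "images", root / "labels"
    if not images_dir.is_dir() or not labels_dir.is_dir():
        raise FileNotFoundError('expected "images" and "labels" subdirectories')

    images = sorted(
        p for p in images_dir.rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES
    )
    missing, malformed = [], []
    for image in images:
        # The label mirrors the image's relative path with a .txt extension.
        label = labels_dir / image.relative_to(images_dir).with_suffix(".txt")
        if not label.is_file():
            missing.append(image.name)
            continue
        lines = [ln for ln in label.read_text().splitlines() if ln.strip()]
        if not all(is_valid_label_line(ln) for ln in lines):
            malformed.append(label.name)
    return {
        "images": len(images),
        "missing_labels": missing,
        "malformed_labels": malformed,
    }
```

Calling `check_darknet_folder("coco128")` on the extracted folder returns the image count plus any images without a label file and any label files with unexpected lines. An image without a label file is not necessarily an error, since it may simply contain no annotated objects.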

Select "Start Import" at the bottom right to start the import process.

The import progress is then displayed as shown.

Once the import completes, refresh the page to see the changes. The dataset container now contains 128 images from COCO, with the annotations stored in the "Ground Truth" annotation set.

To view the dataset, refer to the instructions provided in Viewing Datasets.

Viewing Datasets

This tutorial shows how to open a dataset's gallery to see its individual samples.

From the "Projects" page, click the dataset button indicated in red to view the datasets contained in the project.

You will now see the datasets contained in the project. Each dataset has a gallery. To see the images, open the gallery by clicking the gallery button indicated in red.

After clicking the gallery button, you will see either the images in the dataset (for Image-Based Datasets) or the sequences (for Sequence-Based Datasets).

For sequence-based datasets, click on the sequence you would like to view.

Once the sequence is clicked, you will see the frames stored in the sequence along with their annotations.

Verifying Datasets

This tutorial shows an example of a dataset that is ready for training.

Verify that the dataset has a training and validation split. The sample dataset shown below has a dedicated split for training (20066 samples) and validation (2229 samples).

Another sample dataset, shown below, is for training Vision models and has a dedicated split for training (1656 samples) and validation (184 samples).
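To put the sample counts quoted above in perspective, the short sketch below computes the fraction of samples in each group. It is plain arithmetic on the numbers from this page, and the helper name `split_fractions` is introduced here only for illustration.

```python
def split_fractions(train_count, val_count):
    """Return (training_fraction, validation_fraction) for two sample counts."""
    total = train_count + val_count
    if total == 0:
        raise ValueError("the dataset has no samples in either group")
    return train_count / total, val_count / total


# Sample counts quoted in this tutorial.
examples = [
    ("First sample dataset", 20066, 2229),
    ("Vision sample dataset", 1656, 184),
]

for name, train_count, val_count in examples:
    train_frac, val_frac = split_fractions(train_count, val_count)
    print(f"{name}: {train_frac:.1%} training, {val_frac:.1%} validation")
```

Both sample datasets work out to roughly 90% training and 10% validation.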

Verify the contents of the dataset and the annotations. Click the button that navigates to the gallery to show the contents of the dataset. The dataset may consist of multiple sequences, as shown below.

Clicking on any of these sequences opens the individual images in the sequence with their annotations visualized. For more information, please see Viewing Datasets above.

Info

Datasets that train Fusion models provide annotations of the object's 3D bounding box. For more information on the dataset annotations, please see EdgeFirst Dataset Format.

Info

Datasets that train Vision models provide image annotations of the object's 2D bounding box and segmentation mask. For more information on the dataset annotations, please see EdgeFirst Dataset Format.

For cases where the annotations need corrections, please see Audit 2D Annotations or Audit 3D Annotations for more details.

Creating Datasets

This tutorial shows how to create an empty dataset container in EdgeFirst Studio.

To create a dataset, first select the project that will store the new dataset. Next, click the dataset button indicated in red to view the datasets in the selected project.

Next, create a new dataset by clicking the "New Dataset" button indicated in red at the top right.

Provide the dataset name and the dataset description for this new dataset. Once the fields are filled in, click the "Create" button at the bottom left of the dialog.

Once the dataset is created, define an annotation set. The annotation set is a container for storing the annotations. To create an annotation set, click the "+" button in the "Annotation Sets" field.

Next, provide the name and description for the annotation set as shown below. Then click "Create New Set" to create the annotation set.

You have now created a dataset and an annotation set container as shown below. This container can be used to store copied or combined datasets.

Copying Datasets

This tutorial shows how to copy datasets.

To copy a dataset, navigate to the dataset you would like to copy. On the dataset card, select "Copy Dataset" from the dataset options as shown below.

This opens a dialog with "Source Dataset" and "Destination Dataset" fields. The "Source Dataset" defaults to the dataset whose card you selected, but you can change it here. The "Destination Dataset" is the location where the copy will be placed.

In the example below, the "Source Dataset" is the "Raivin Ultra Short 25.03" dataset from the "Sample Project". By default, a new dataset container is created in the specified destination project. Alternatively, you can create a dataset container before copying and select it under "Dataset" in the "Destination Dataset" fields.

Once the options are specified, click "Apply" at the bottom right to start the copy process.

The progress of the copy is shown on the new dataset card created in the specified destination project.

When the copy process completes, the destination dataset contains the copied frames and annotations.

Original Dataset	Copied Dataset

Combining Datasets

Combining datasets consists of running multiple copy processes into the same dataset container. First, create a dataset container. Then follow the process for copying a dataset, selecting the container you created as the destination. Repeat the copy for each dataset to be combined; each copy adds the selected dataset to the same container.

Splitting Datasets

A dataset that is ready for training has samples reserved for both training and validation. This tutorial shows how to split the samples in a dataset into training and validation groups. The operation randomly shuffles the samples before assigning them to the specified groups.

Warning

Repeat this operation whenever new sample images or frames are added to the dataset. Newly added samples are not automatically assigned to any existing group.

Consider the following dataset without any groups reserved.

To create the dataset groups, click on the "+" button in the "Groups" field.

This opens a dialog for specifying the percentage of samples assigned to the "Training" group and the "Validation" group. By default, 80% of the samples are assigned to training and the remaining 20% to validation.

Once the groups are specified, click "Add Groups" to create them. The samples in the dataset are then divided automatically according to the percentage specified for each group.
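The shuffle-then-partition behavior described in this section can be sketched in a few lines. This is a minimal illustration of the general technique, assuming a simple percentage cut on a shuffled list. It is not EdgeFirst Studio's implementation, and the function name, the `seed` parameter, and the rounding rule are assumptions made for the example.

```python
import random


def split_samples(sample_ids, train_percent=80, seed=None):
    """Shuffle the samples, then assign them to two groups by percentage.

    Illustrative only: how Studio rounds group sizes is not documented here,
    so this sketch simply rounds the training count to the nearest integer.
    """
    if not 0 <= train_percent <= 100:
        raise ValueError("train_percent must be between 0 and 100")
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, not the input
    cut = round(len(shuffled) * train_percent / 100)
    return {"Training": shuffled[:cut], "Validation": shuffled[cut:]}


# Ten hypothetical sample IDs split with the default 80/20 percentages.
groups = split_samples(range(10), seed=0)
print(len(groups["Training"]), len(groups["Validation"]))  # prints "8 2"
```

Every sample lands in exactly one group. Samples added after the split are not part of the shuffled list, which mirrors the warning above that newly added samples are not assigned to any group until the split is repeated.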

Exporting Datasets

Coming Soon