Dataset Management

This page provides tutorials for managing datasets in EdgeFirst Studio.

Upload MCAPs

This tutorial shows how to upload an MCAP recording. To upload EdgeFirst Datasets instead, see the instructions for Upload from Zip/Arrow File.

In EdgeFirst Studio, select "Data Snapshots" under the tool options.

Data Snapshots

Note

This tutorial assumes that a project intended for object detection has already been created. That step is covered in Getting Started.

On the "Data Snapshots" page, upload the recorded MCAP by clicking "From File". This opens a file dialog for selecting the MCAP stored on your PC.

Upload MCAP
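Because large uploads can take several minutes, it may be worth checking locally that a recording is a complete MCAP file before selecting it. The sketch below is not part of EdgeFirst Studio. It relies only on the MCAP specification, which places the same 8-byte magic sequence at the start and the end of every file. The function name `looks_like_mcap` is a hypothetical helper chosen for illustration.

```python
from pathlib import Path

# 8-byte magic sequence that opens and closes every MCAP file (per the MCAP spec).
MCAP_MAGIC = b"\x89MCAP0\r\n"


def looks_like_mcap(path):
    """Return True if the file starts and ends with the MCAP magic bytes.

    This is only a quick sanity check: it catches wrong file types and
    recordings that were cut off mid-write, but it does not validate the
    records inside the file.
    """
    path = Path(path)
    if path.stat().st_size < 2 * len(MCAP_MAGIC):
        return False
    with path.open("rb") as f:
        head = f.read(len(MCAP_MAGIC))
        f.seek(-len(MCAP_MAGIC), 2)  # 2 = seek relative to end of file
        tail = f.read(len(MCAP_MAGIC))
    return head == MCAP_MAGIC and tail == MCAP_MAGIC
```

Calling `looks_like_mcap("recording.mcap")` (the file name is a placeholder) returns `False` for a file whose trailing magic is missing, which usually indicates an interrupted recording or copy.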

Once the MCAP file is selected, the upload to EdgeFirst Studio starts. The upload may take several minutes depending on the size of the MCAP. Once the upload is complete, the status is shown as in the figure on the right.

Upload Progress (left) and Completed Upload (right)

For instructions on auto-annotating uploaded MCAPs, see Auto Annotations via Snapshot.

Importing Darknet Datasets

This tutorial shows how to import a Darknet dataset into EdgeFirst Studio. To import EdgeFirst Datasets instead, see the instructions for Upload from Zip/Arrow File.

COCO128 is used as the example dataset.

To import a dataset, first create a dataset container. In this example, the container is named "Coco128" with the description "Demo import", and an annotation set called "Ground Truth" has been created in it.

COCO128 Dataset Container

The example dataset, COCO128, was downloaded using the link provided. The download is a ZIP archive, which is then extracted.

Once the container has been created, open the dataset options, denoted by the three vertical dots in the top-right corner of the dataset card.

Dataset Options

Select "Import".

Import Option

This opens a new window for specifying the dataset to import. Set "Import Type" to "Darknet", select the extracted "coco128" folder as the dataset to import, and set the annotation set to "Ground Truth". The following figure shows these settings.

Import Options

The "coco128" dataset that was specified contains the "images" and "labels" subdirectories.

COCO128
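Before starting an import, a quick local check can confirm that the folder has the layout described above. The following is a minimal sketch, not an EdgeFirst Studio tool. It assumes the common Darknet convention that each image under "images" is paired with a ".txt" file of the same name under "labels". The helper name `check_darknet_folder` and the list of image suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: image types commonly found in Darknet-style datasets.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_darknet_folder(root):
    """Summarize how images under images/ pair with label files under labels/."""
    root = Path(root)
    images_dir = root / "images"
    labels_dir = root / "labels"

    missing = [d.name for d in (images_dir, labels_dir) if not d.is_dir()]
    if missing:
        raise FileNotFoundError(f"missing subdirectories: {missing}")

    # Compare paths relative to each subdirectory, with the extension removed,
    # so images/train2017/a.jpg pairs with labels/train2017/a.txt.
    images = {
        p.relative_to(images_dir).with_suffix("").as_posix()
        for p in images_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }
    labels = {
        p.relative_to(labels_dir).with_suffix("").as_posix()
        for p in labels_dir.rglob("*.txt")
    }

    return {
        "images": len(images),
        "labels": len(labels),
        "images_without_labels": sorted(images - labels),
        "labels_without_images": sorted(labels - images),
    }
```

An image without a label file is not necessarily an error, since many Darknet-style tools treat it as an image containing no objects. A label file without an image usually points to a misplaced or renamed file.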

Select "Start Import" at the bottom right to start the import process.

Start Import

This will start the import process as shown.

Import Process

Once the import completes, refresh the page to see the changes. The dataset container now contains the 128 COCO images, and the annotations are stored in the "Ground Truth" annotation set.

Imported COCO128 Dataset
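When comparing the imported annotations against the source files, it helps to know how a Darknet bounding-box label is laid out. In the widely used Darknet/YOLO convention, each line of a label file holds `class_id x_center y_center width height`, with the four box values normalized to the image size. The sketch below parses one such line and converts it to pixel corners. The function names are hypothetical, and it covers box labels only; how EdgeFirst Studio stores the boxes internally is not described here.

```python
def parse_darknet_line(line):
    """Parse one 'class_id x_center y_center width height' label line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:])
    return class_id, x_center, y_center, width, height


def to_pixel_corners(x_center, y_center, width, height, image_width, image_height):
    """Convert a normalized center-format box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return x_min, y_min, x_max, y_max
```

For example, `to_pixel_corners(*parse_darknet_line("0 0.5 0.5 0.2 0.4")[1:], 640, 480)` describes a box centered in a 640x480 image that spans 20% of its width and 40% of its height.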

To view the dataset, refer to the instructions provided in Viewing Datasets.

Viewing Datasets

This tutorial shows how to open a dataset's gallery to view its individual samples.

From the "Projects" page, you can click on the dataset button indicated in red to view the datasets contained in the project.

View Datasets

You will now see the datasets contained in the project. Each dataset has a gallery. To see the images in the gallery, open the gallery by clicking the gallery button indicated in red.

Gallery Button

After clicking the gallery button, you will see either the images in the dataset (for Image-Based Datasets) or its sequences (for Sequence-Based Datasets).

For sequence-based datasets, you need to specify which sequence you would like to view. This can be done by clicking on the sequence.

Dataset Sequence

When the sequence is clicked, you will now see the frames stored in the sequence along with the annotations.

Dataset Sequence

Verifying Datasets

This tutorial will show an example of a dataset that is ready for training.

Verify that the dataset has a training and validation split. The sample dataset shown below has a dedicated split for training (20066 samples) and validation (2229 samples).

Fusion Dataset Groups

The second sample dataset, shown below, is for training Vision models. It also has a dedicated split for training (1656 samples) and validation (184 samples).

Vision Dataset Groups
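The sample counts quoted above can be turned into split fractions to confirm that a split looks reasonable. This is a small arithmetic sketch using only the numbers from this page. The helper name `split_fractions` is hypothetical.

```python
def split_fractions(counts):
    """Return each group's share of the total number of samples."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("dataset has no samples assigned to any group")
    return {name: n / total for name, n in counts.items()}


# Sample counts quoted in this tutorial.
fusion = split_fractions({"training": 20066, "validation": 2229})
vision = split_fractions({"training": 1656, "validation": 184})
```

Both sample datasets work out to roughly 90% training and 10% validation. A group with a zero or near-zero share is a sign that the split needs to be created again.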

Verify the contents of the dataset and the annotations. Click the gallery button to see the contents of the dataset. The dataset may consist of multiple sequences, as shown below.

Fusion Dataset Sequences
Vision Dataset Sequences

Clicking on any of these sequences will open individual images in the sequence with the visualizations of the annotations. For more information please see Viewing Datasets above.

Info

Datasets that train Fusion models provide annotations of the object's 3D bounding box. For more information on the dataset annotations, please see EdgeFirst Dataset Format.

Fusion Annotations

Info

Datasets that train Vision models provide image annotations of the object's 2D bounding box and segmentation mask. For more information on the dataset annotations, please see EdgeFirst Dataset Format.

Vision Annotations

For cases where the annotations need corrections, please see Audit 2D Annotations or Audit 3D Annotations for more details.

Creating Datasets

This tutorial will show how to create an empty dataset container in EdgeFirst Studio.

To create a dataset, first select the project to store the new dataset. Next click the dataset button indicated in red to view the datasets in that selected project.

Dataset Button

Next create a new dataset by clicking the "New Dataset" button indicated in red on the top right.

Create Dataset Button

Provide a name and a description for the new dataset. Once the fields are filled in, click the "Create" button at the bottom left of the dialog.

Create Dataset Fields

Once created, define an annotation set. The annotation set is a container for storing the annotations. To create an annotation set, click the "+" button in the "Annotation Sets" field.

Create Annotation Set

Next provide the name and description for the annotation container as shown below. Once provided, click "Create New Set" to create the annotation set.

Annotation Set Fields

You have now created a dataset and an annotation set container as shown below. This container can be used to store copied or combined datasets.

Created Dataset

Copying Datasets

This tutorial will show how to copy datasets.

To copy a dataset, navigate to the dataset you would like to copy. On the dataset card, select "Copy Dataset" from the dataset options as shown below.

Copy Dataset

This opens a dialog for specifying the "Source Dataset" and the "Destination Dataset". The "Source Dataset" defaults to the dataset card you selected, but it can be changed here. The "Destination Dataset" is where the copy will be placed.

In the example below, the "Source Dataset" is the "Raivin Ultra Short 25.03" dataset from the "Sample Project". By default, a new dataset container is created in the specified destination project. Alternatively, you can create a dataset container before copying and select it under "Dataset" in the "Destination Dataset" fields.

Copy Dataset Options

Once the options are specified, click "Apply" at the bottom right to start the copy process.

Copy Dataset Process

The progress for the dataset copy will be shown on the new dataset card that was created in the project destination that was specified.

Copy Dataset Progress

Once the copy process completes, the destination dataset contains the same frames and annotations as the original.

Original Dataset and Copied Dataset

Combining Datasets

Combining datasets consists of running multiple copy processes into a single dataset container. First create a dataset container, then follow the process for copying a dataset, selecting that container as the destination. Repeat the copy for each source dataset. Because every copy writes into the same container, the result is a combined dataset.

Splitting Datasets

A dataset that is ready for training has samples reserved for both training and validation. This tutorial shows how to split the samples in a dataset into training and validation groups. The operation randomly shuffles the samples before assigning them to the specified groups.

Warning

Repeat this operation whenever new images or frames are added to the dataset. Newly added samples are not automatically assigned to any existing group.

Consider the following dataset without any groups reserved.

No Groups

To create the dataset groups, click on the "+" button in the "Groups" field.

Add Groups

This opens a dialog for specifying the percentage of samples assigned to the "Training" group and to the "Validation" group. By default, 80% of the samples are assigned to training and the remaining 20% to validation.

Groups Field

Once the groups are specified, click "Add Groups" to create the groups. This will automatically divide the samples in the dataset based on the percentages of each group specified.

Dataset Groups
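The shuffle-then-partition behavior described in this section can be illustrated with a short sketch. This is not EdgeFirst Studio's implementation: the function name `split_samples`, the `seed` parameter, and the rounding rule are all assumptions chosen for the example. It only shows the general technique of shuffling the samples and then slicing them by the requested percentages.

```python
import random


def split_samples(sample_ids, fractions, seed=None):
    """Shuffle the samples, then partition them according to the given fractions.

    `fractions` maps group names to shares that must sum to 1.0,
    e.g. {"Training": 0.8, "Validation": 0.2}.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("group fractions must sum to 1.0")

    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # shuffle a copy; the input is left untouched

    groups = {}
    names = list(fractions)
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(ids)  # last group takes whatever remains
        else:
            end = min(start + round(fractions[name] * len(ids)), len(ids))
        groups[name] = ids[start:end]
        start = end
    return groups
```

With the default percentages from the dialog, `split_samples(range(100), {"Training": 0.8, "Validation": 0.2})` yields 80 training samples and 20 validation samples with no overlap. Because every sample is reassigned on each run, a split computed this way over a grown dataset would differ from the previous one.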

Exporting Datasets

Coming Soon