Directory Structure
EdgeFirst datasets follow a consistent directory layout where the annotation file and the sensor data container share the same base name.
File Naming
Annotation files use the dataset base name with the format extension:
dataset_name/
├── dataset_name.arrow # Arrow IPC (default)
│ # — OR —
├── dataset_name.parquet # Parquet (transfer)
│ # — OR —
├── dataset_name.json # JSON (human-readable)
└── dataset_name/ # Sensor container (directory or .zip)
Each dataset directory contains exactly one annotation file, in a single format:
.arrow, .parquet, or .json. The tree above shows the three supported alternatives,
not files that coexist. The sensor container name matches the dataset base name
regardless of which annotation file format you choose.
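The naming rule above can be checked with a short script. This is a minimal sketch, not part of the EdgeFirst Client SDK: the function name `resolve_dataset` is hypothetical, and preferring a directory over a `.zip` when both exist is an assumption the text does not specify.

```python
from pathlib import Path

# Annotation formats listed in the tree above.
ANNOTATION_EXTENSIONS = (".arrow", ".parquet", ".json")


def resolve_dataset(dataset_dir):
    """Return (annotation_file, sensor_container) for a dataset directory.

    Hypothetical helper: enforces the "exactly one annotation file" rule and
    locates the sensor container that shares the dataset base name.
    """
    root = Path(dataset_dir)
    base = root.name

    # Collect every annotation file that uses the dataset base name.
    candidates = [root / f"{base}{ext}" for ext in ANNOTATION_EXTENSIONS]
    annotations = [path for path in candidates if path.is_file()]
    if len(annotations) != 1:
        found = [path.name for path in annotations]
        raise ValueError(f"expected exactly one annotation file, found {found}")

    # The container is either a directory or a .zip with the same base name.
    # Assumption: when both exist, the directory wins.
    container_dir = root / base
    container_zip = root / f"{base}.zip"
    if container_dir.is_dir():
        container = container_dir
    elif container_zip.is_file():
        container = container_zip
    else:
        raise FileNotFoundError(f"no sensor container named {base!r} in {root}")

    return annotations[0], container
```

Calling `resolve_dataset("deer_dataset")` on a directory that holds both `deer_dataset.arrow` and `deer_dataset.json` raises `ValueError`, which matches the one-file rule.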
Dataset Layouts
EdgeFirst supports three organizational patterns.
1. Sequence-Based Datasets
Video frames with temporal ordering (from MCAP recordings or video files):
deer_dataset/
├── deer_dataset.arrow
└── deer_dataset/
    └── 9331381uhd_3840_2160_24fps/
        ├── 9331381uhd_3840_2160_24fps_110.camera.jpeg
        ├── 9331381uhd_3840_2160_24fps_111.camera.jpeg
        └── ...
File naming convention:
- Sequence format: {hostname}_{date}_{time} (from MCAP)
- Frame format: {sequence_name}_{frame_number}.{sensor}.{ext}
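The frame format can be split into its parts with a regular expression. The sketch below is illustrative only: `FRAME_PATTERN`, `FrameName`, and `parse_frame_name` are hypothetical names, and the pattern assumes the frame number is the last underscore-separated run of digits before the sensor tag.

```python
import re
from typing import NamedTuple, Optional

# {sequence_name}_{frame_number}.{sensor}.{ext}
# Assumption: the frame number is the final "_<digits>" group before the
# first dot, and the sensor tag is lowercase letters only.
FRAME_PATTERN = re.compile(
    r"^(?P<sequence>.+)_(?P<frame>\d+)\.(?P<sensor>[a-z]+)\.(?P<ext>[A-Za-z0-9]+)$"
)


class FrameName(NamedTuple):
    sequence: str
    frame: int
    sensor: str
    ext: str


def parse_frame_name(filename: str) -> Optional[FrameName]:
    """Parse a frame file name, or return None if it does not fit the format."""
    match = FRAME_PATTERN.match(filename)
    if match is None:
        return None
    return FrameName(
        sequence=match["sequence"],
        frame=int(match["frame"]),
        sensor=match["sensor"],
        ext=match["ext"],
    )
```

For `9331381uhd_3840_2160_24fps_110.camera.jpeg` this yields the sequence `9331381uhd_3840_2160_24fps`, frame `110`, sensor `camera`, and extension `jpeg`. A standalone name such as `image001.jpg` has no sensor tag and returns `None`.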
2. Image-Based Datasets
Standalone images without temporal ordering:
coco_subset/
├── coco_subset.arrow
└── coco_subset/
    ├── image001.jpg
    ├── image002.jpg
    └── ...
3. Mixed Datasets
Combination of sequences and standalone images:
mixed_dataset/
├── mixed_dataset.arrow
└── mixed_dataset/
    ├── sequence_A/
    │   ├── sequence_A_001.camera.jpeg
    │   └── sequence_A_002.camera.jpeg
    ├── standalone_image1.jpg
    └── standalone_image2.jpg
Multi-Sensor Examples
A single frame can include multiple sensor modalities:
sensor_fusion/
├── sensor_fusion.parquet
└── sensor_fusion/
    └── drive_2026_03_18/
        ├── drive_2026_03_18_001.camera.jpeg
        ├── drive_2026_03_18_001.radar.png
        ├── drive_2026_03_18_001.radar.pcd
        ├── drive_2026_03_18_001.lidar.pcd
        ├── drive_2026_03_18_002.camera.jpeg
        ├── drive_2026_03_18_002.radar.png
        └── ...
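Files that belong to the same frame share everything before the sensor tag, so they can be grouped by that prefix. The following is a rough sketch with a hypothetical `group_by_frame` helper. It assumes every multi-sensor file ends in `.{sensor}.{ext}` and skips names with fewer than two dots.

```python
from collections import defaultdict


def group_by_frame(filenames):
    """Group file names by frame, keyed by the part before '.{sensor}.{ext}'.

    Returns {frame_id: {"sensor.ext": filename}}. Names without a sensor tag
    (fewer than two dots) are ignored in this sketch.
    """
    frames = defaultdict(dict)
    for name in filenames:
        parts = name.rsplit(".", 2)
        if len(parts) != 3:
            continue  # e.g. "image001.jpg" has no sensor tag
        frame_id, sensor, ext = parts
        frames[frame_id][f"{sensor}.{ext}"] = name
    return dict(frames)
```

Applied to the listing above, the key `drive_2026_03_18_001` maps to four entries: `camera.jpeg`, `radar.png`, `radar.pcd`, and `lidar.pcd`.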
Flattened Structure
As an alternative to nested subdirectories, datasets may use a flat layout with sequence prefixes:
dataset_name/
├── dataset_name.arrow
└── dataset_name/
    ├── sequence_A_001.camera.jpeg
    ├── sequence_A_002.camera.jpeg
    ├── sequence_B_001.camera.jpeg
    └── standalone_image.jpg
The EdgeFirst Client SDK detects the layout automatically, so no manual configuration is needed.
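How the SDK performs this detection is not documented here. The sketch below is one plausible heuristic, not the SDK's algorithm: a file counts as part of a sequence if it sits in a subdirectory (nested layout) or its name fits the frame format (flat layout), and anything else counts as a standalone image. The name `classify_layout` and the returned labels are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Same shape as the frame format: {sequence_name}_{frame_number}.{sensor}.{ext}
FRAME_PATTERN = re.compile(r"^.+_\d+\.[a-z]+\.[A-Za-z0-9]+$")


def classify_layout(relative_paths):
    """Classify a sensor container as 'sequence', 'image', 'mixed', or 'empty'.

    relative_paths: file paths relative to the sensor container, using '/'.
    This is an illustrative heuristic, not the EdgeFirst Client SDK logic.
    """
    has_sequence = False
    has_standalone = False
    for rel in relative_paths:
        path = PurePosixPath(rel)
        in_subdirectory = len(path.parts) > 1
        if in_subdirectory or FRAME_PATTERN.match(path.name):
            has_sequence = True
        else:
            has_standalone = True

    if has_sequence and has_standalone:
        return "mixed"
    if has_sequence:
        return "sequence"
    if has_standalone:
        return "image"
    return "empty"
```

Under this heuristic, the flat listing above classifies as `mixed`, because `standalone_image.jpg` carries no frame number or sensor tag.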
ZIP Format
EdgeFirst supports ZIP64 as an alternative to directories for the sensor container:
dataset_name/
├── dataset_name.arrow
└── dataset_name.zip # sensor data in ZIP
ZIP64 provides:
- Random access via file index
- Per-file compression control: uncompressed storage is recommended, since JPEG and PNG are already compressed, though PCD and other formats may benefit from ZIP compression
- Cross-platform support
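As an illustration of these properties, the sketch below writes and reads a sensor archive with Python's standard `zipfile` module. The helper names and the `STORED_EXTENSIONS` tuple are hypothetical. Deflating non-image files is an optional choice based on the note about PCD, not a requirement, and `zipfile` emits ZIP64 records only when an archive actually needs them.

```python
import zipfile

# Already-compressed image formats are stored without ZIP compression.
STORED_EXTENSIONS = (".jpeg", ".jpg", ".png")


def pack_sensor_files(zip_path, files):
    """Write {archive_name: bytes} into a ZIP, choosing compression per file."""
    # allowZip64=True lets zipfile use ZIP64 extensions for large archives.
    with zipfile.ZipFile(zip_path, "w", allowZip64=True) as archive:
        for name, data in files.items():
            if name.lower().endswith(STORED_EXTENSIONS):
                method = zipfile.ZIP_STORED
            else:
                # Optional: formats such as PCD may benefit from compression.
                method = zipfile.ZIP_DEFLATED
            archive.writestr(name, data, compress_type=method)


def read_sensor_file(zip_path, name):
    """Read one member by name; the ZIP index locates it without a full scan."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(name)
```

Reading a single frame this way does not extract the rest of the archive, which is the practical benefit of random access through the file index.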
Sensor File Extensions
| Extension | Sensor | Description |
|---|---|---|
| .camera.jpeg | Camera | Camera image (default) |
| .camera.png | Camera | Camera image (lossless) |
| .jpg, .png | Camera | Generic image formats |
| .radar.pcd | Radar | Radar point cloud |
| .radar.png | Radar | Radar data cube (16-bit PNG) |
| .lidar.pcd | LiDAR | LiDAR point cloud |
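The table translates directly into a suffix lookup. In this minimal sketch, `SENSOR_SUFFIXES` and `sensor_for` are hypothetical names. Longer suffixes are tried first so that `.radar.png` is not mistaken for a generic `.png`, and case-insensitive matching is an assumption.

```python
# Suffix -> sensor, taken from the table above.
SENSOR_SUFFIXES = {
    ".camera.jpeg": "camera",
    ".camera.png": "camera",
    ".radar.pcd": "radar",
    ".radar.png": "radar",
    ".lidar.pcd": "lidar",
    ".jpg": "camera",  # generic image formats
    ".png": "camera",
}


def sensor_for(filename):
    """Return the sensor for a file name, or None if the suffix is unknown."""
    lowered = filename.lower()  # assumption: matching ignores case
    # Try the most specific (longest) suffixes before the generic ones.
    for suffix in sorted(SENSOR_SUFFIXES, key=len, reverse=True):
        if lowered.endswith(suffix):
            return SENSOR_SUFFIXES[suffix]
    return None
```

With this ordering, `drive_2026_03_18_001.radar.png` resolves to `radar`, while `image001.jpg` falls through to the generic camera entry.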