Annotation Schema & Fields

This page documents all the fields in your EdgeFirst annotations, what they mean, and how to use them.

How Annotations Are Created

Annotations in EdgeFirst are created through several methods:

| Method | Fields Populated | Source |
| --- | --- | --- |
| Manual annotation | label, box2d, mask | User draws in Instance Dashboard |
| AGTG (Automatic) | label, box2d, mask, status | SAM-2 AI auto-detection |
| Model inference | label, box2d, box3d | Trained model predictions |
| Import from snapshot | All fields | Restored from Arrow file |

AGTG fills annotation fields automatically

When you run Automatic Ground Truth Generation (AGTG) on a dataset, EdgeFirst Studio uses SAM-2 to detect objects and populate:

  • label: Object class detected
  • box2d: Bounding box coordinates (center-based)
  • mask: Pixel-level segmentation polygon
  • status: Annotation quality indicator

You can then review and adjust these annotations in the Instance Dashboard.

Understanding the Annotation Structure

Each annotation describes one labeled object in one sample (image or frame). An annotation contains:

%%{init: {'flowchart': {'padding': '40'}}}%%
graph TB
    Ann["📝 Annotation"]
    
    Ann -->|"Identifies"| What["🏷️ What (label, class)"]
    Ann -->|"Locates"| Where["📍 Where (box2d, box3d, mask)"]
    Ann -->|"Tracks"| ID["🔗 Identity (object_id)"]
    Ann -->|"Categorizes"| Split["📊 Split (group)"]
    
    style Ann fill:#e1f5ff,stroke:#0277bd,stroke-width:2px
    style What fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style Where fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    style ID fill:#f8bbd0,stroke:#c2185b,stroke-width:2px
    style Split fill:#d1c4e9,stroke:#5e35b1,stroke-width:2px

Core Fields

name

Type: String
What it is: Sample identifier—links the annotation to a specific image or frame
Example: sequence_001_042 or background_image_1

How it's derived:

  • For sequences: Filename without extension and frame number
  • For images: Filename without extension

Examples:

scene_001.camera.jpg      →  name = "scene_001"
deer_sequence_042.jpg     →  name = "deer_sequence" 
image_background.png      →  name = "image_background"

frame

Type: UInt64 (nullable)
What it is: Frame number within a sequence (0-indexed)
Example: 42 for the 43rd frame of a sequence (numbering starts at 0), or null for standalone images

When it's used:

  • Sequences: frame = {number} (e.g., 0, 1, 2, ...)
  • Standalone images: frame = null

The combination of (name, frame) uniquely identifies a sample in the dataset.
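
A quick way to check this is to count distinct (name, frame) pairs against annotation rows. This is a minimal sketch, assuming the same Polars file loading shown in Querying Annotations below:

import polars as pl

df = pl.read_ipc("dataset.arrow")

# Each distinct (name, frame) pair is one sample; several annotation
# rows may share the same pair.
n_samples = df.select(["name", "frame"]).unique().height
print(f"{df.height} annotations across {n_samples} samples")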

label

Type: String (Categorical)
What it is: The object classification—what is this thing?
Examples: "person", "car", "tree", "bicycle"

label_index

Type: UInt64
What it is: Numeric index for the label (used by ML models)
Example: In the COCO dataset, "person"=0, "bicycle"=1, "car"=2

Why it exists: Pre-trained models expect numeric indices, not strings. The mapping ensures consistency.
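
If you need the mapping explicitly (for example, to configure a model head), you can recover it from the annotations themselves. A minimal sketch, assuming the Polars DataFrame from Querying Annotations below:

import polars as pl

df = pl.read_ipc("dataset.arrow")

# Distinct (label, label_index) pairs = the label map used by this dataset
label_map = df.select(["label", "label_index"]).unique().sort("label_index")
print(label_map)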

object_id

Type: String (UUID strongly recommended)
What it is: Unique identifier for tracking objects across frames and linking annotations
Examples:

550e8400-e29b-41d4-a716-446655440000  (UUID - recommended)
car_track_005
person_01

Use cases:

  • Track the same object across multiple frames in a sequence
  • Link a 2D box to a segmentation mask on the same object
  • Enable object-level queries ("show me all frames with object X")

Best practice: Use UUIDs for guaranteed uniqueness across datasets
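
A small sketch of both sides of this: generating a UUID for a new object, and counting how many frames each existing object_id spans (using the Polars DataFrame from Querying Annotations below):

import uuid

import polars as pl

# A fresh object_id for a newly tracked object
new_object_id = str(uuid.uuid4())

df = pl.read_ipc("dataset.arrow")

# How many annotation rows (frames) each object appears in
frames_per_object = df.group_by("object_id").agg(pl.len().alias("n_frames"))
print(frames_per_object.sort("n_frames", descending=True).head())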

group

Type: String (Categorical, nullable)
What it is: Optional dataset split assignment—which set does this sample belong to?
Values: null, "train", "val", "test", or any custom string

Default behavior in Studio: When you split a dataset in EdgeFirst Studio, it assigns "train" and "val" by default. You can also use custom split names for your specific workflow.

Important: This is a sample-level field, not per-annotation. All annotations from the same sample have the same group value.

Typical distribution (when used):

  • 70% train
  • 20% validation
  • 10% test
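
To see how your own dataset is actually distributed, count distinct samples per group value. A minimal sketch (Polars, as in Querying Annotations below):

import polars as pl

df = pl.read_ipc("dataset.arrow")

# group is sample-level, so count distinct (name, frame) pairs per split
splits = (
    df.group_by("group")
    .agg(pl.struct("name", "frame").n_unique().alias("samples"))
    .with_columns((pl.col("samples") / pl.col("samples").sum()).alias("fraction"))
)
print(splits)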

Geometry Fields

These fields describe where the object is located in the image or 3D world.

box2d

Type: Array(Float32, shape=(4,))
Format: [cx, cy, width, height] (center-based)
Coordinate system: Normalized (0–1), top-left origin

Values:

  • cx: Box center x-coordinate (0=left edge, 1=right edge)
  • cy: Box center y-coordinate (0=top edge, 1=bottom edge)
  • width: Box width as fraction of image width
  • height: Box height as fraction of image height

Example: [0.5, 0.5, 0.2, 0.3] means a box centered in the image, 20% of image width, 30% of image height

Visual:

Image
(0,0) ─────────────────────────→ x=1
  │
  │         ┌───────┐
  │         │   ●   │ 0.3h    ● = center (0.5, 0.5)
  │         └───────┘
  │           0.2w
  │
  ▼
  y=1

Learn more in Bounding Box Formats.
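
A small helper for converting a center-based normalized box to pixel corner coordinates; the function name is ours, and the image size comes from the size field described under Sample Metadata Fields:

def box2d_to_pixels(box2d, size):
    """Convert normalized [cx, cy, width, height] to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box2d
    img_w, img_h = size
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return x1, y1, x2, y2

# The example above on a 1920x1080 image -> (768.0, 378.0, 1152.0, 702.0)
print(box2d_to_pixels([0.5, 0.5, 0.2, 0.3], [1920, 1080]))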

box3d

Type: Array(Float32, shape=(6,))
Format: [x, y, z, length, width, height]
Coordinate system: ROS/Ouster (X=forward, Y=left, Z=up)
Units: Meters (normalized 0–1 in some contexts)

Values:

  • x, y, z: Box center in 3D world space
  • length: Dimension along the X axis (forward/backward)
  • width: Dimension along the Y axis (left/right)
  • height: Dimension along the Z axis (up/down)

Example: [5.0, -2.0, 1.5, 2.0, 1.8, 4.5] describes an object centered 5 m ahead, 2 m to the right, and 1.5 m up, measuring 2.0 m long, 1.8 m wide, and 4.5 m tall
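
As a small illustration of the coordinate convention (the helper name is ours), the straight-line distance from the sensor origin to the box center is just the Euclidean norm of [x, y, z]:

import math

def box3d_range(box3d):
    """Distance from the sensor origin to the 3D box center, in meters."""
    x, y, z = box3d[:3]
    return math.sqrt(x * x + y * y + z * z)

# The example box above is roughly 5.6 m from the sensor
print(round(box3d_range([5.0, -2.0, 1.5, 2.0, 1.8, 4.5]), 2))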

mask

Type: List(Float32)
What it is: Pixel-level segmentation—precise boundary of the object
Format: Flattened array with NaN separators for multiple polygons

Structure:

# Single polygon: [x1, y1, x2, y2, x3, y3, ...]
# Multiple polygons: [x1, y1, x2, y2, ..., NaN, x4, y4, ...]
#                                      ↑ polygon separator

Example (single polygon around a person):

[0.4, 0.3, 0.45, 0.25, 0.5, 0.25, 0.52, 0.28, 0.5, 0.4, 0.45, 0.42, 0.4, 0.35]

Coordinate system: Normalized (0–1), same as 2D boxes

Learn more in Annotation Schema and the official format docs.
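
A minimal sketch (plain Python, the function name is ours) for splitting a flattened mask into per-polygon (x, y) vertex lists, using NaN as the separator described above:

import math

def split_mask_polygons(mask):
    """Split [x1, y1, ..., NaN, x, y, ...] into lists of (x, y) vertices."""
    polygons, current = [], []
    for value in mask:
        if math.isnan(value):
            if current:
                polygons.append(list(zip(current[0::2], current[1::2])))
            current = []
        else:
            current.append(value)
    if current:
        polygons.append(list(zip(current[0::2], current[1::2])))
    return polygons

# A simple rectangular mask -> one polygon with four (x, y) vertices
print(split_mask_polygons([0.48, 0.4, 0.52, 0.4, 0.52, 0.6, 0.48, 0.6]))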

Sample Metadata Fields

These fields describe properties of the sample (image), not individual annotations. In Arrow format, they're repeated for each annotation row from the same sample.

size

Type: Array(UInt32, shape=(2,))
Format: [width, height]
What it is: Image dimensions in pixels

Example: [1920, 1080] for a Full HD image

Usage:

# Image size of the first annotation row (Polars DataFrame)
size = df["size"][0]          # [width, height]
width, height = size[0], size[1]

# Convert from normalized to pixel coordinates
pixel_x = normalized_x * width
pixel_y = normalized_y * height

location

Type: Array(Float32, shape=(2,))
Format: [latitude, longitude]
What it is: GPS coordinates where the image was captured

Example: [37.7749, -122.4194] (San Francisco)

Source:

  • EXIF metadata in image
  • MCAP NavSat topic
  • Manual entry

Note: Altitude may be added in future versions

pose

Type: Array(Float32, shape=(3,))
Format: [roll, pitch, yaw]
What it is: IMU orientation of the camera when image was captured

Values in degrees:

  • roll: Rotation around X axis (-180 to 180°)
  • pitch: Rotation around Y axis (-90 to 90°)
  • yaw: Rotation around Z axis (-180 to 180°)

Example: [0.5, -1.2, 45.3] means a slight roll and pitch, with the camera yawed roughly 45° counterclockwise about the vertical axis

Source:

  • MCAP IMU topic
  • IMU sensor readings
  • Manual entry
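
If you need the orientation as a rotation matrix, SciPy can convert the Euler angles. Note that the "xyz" axis order and convention below are assumptions, so verify them against your capture pipeline:

from scipy.spatial.transform import Rotation

roll, pitch, yaw = 0.5, -1.2, 45.3  # degrees, from the example above

# "xyz" = rotate about X (roll), then Y (pitch), then Z (yaw); this ordering
# is an assumption and may differ from EdgeFirst's convention.
rotation = Rotation.from_euler("xyz", [roll, pitch, yaw], degrees=True)
print(rotation.as_matrix())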

degradation

Type: String (nullable)
What it is: Visual quality indicator—how compromised is the camera view?

Typical values:

  • "none": Perfect view, objects fully visible
  • "low": Slight obstruction, targets clearly visible
  • "medium": Higher obstruction, targets visible but not obvious
  • "high": Severe obstruction, objects cannot be seen

Examples of degradation:

  • Fog, rain, snow
  • Camera obstruction (dirt, condensation)
  • Low light, night
  • Backlighting

Use cases:

  • Filter training data by quality level (see the sketch after this list)
  • Train robust models for adverse weather
  • Identify which sensor to trust (use radar when camera degraded)
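
For example, to drop heavily degraded samples before training (a sketch using the Polars DataFrame from Querying Annotations below):

import polars as pl

df = pl.read_ipc("dataset.arrow")

# Keep only samples where targets are still visible
usable = df.filter(pl.col("degradation").is_in(["none", "low", "medium"]))

# See how much of the dataset falls into each quality level
print(df.group_by("degradation").agg(pl.len().alias("count")))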

Complete Example

Here's a complete annotation with all fields:

{
    # Sample identification
    "name": "sequence_001_042",
    "frame": 42,
    
    # Object identification
    "label": "person",
    "label_index": 0,
    "object_id": "550e8400-e29b-41d4-a716-446655440000",
    
    # Dataset split
    "group": "train",
    
    # Geometry
    "box2d": [0.5, 0.5, 0.2, 0.3],
    "box3d": [5.0, -2.0, 1.5, 2.0, 1.8, 4.5],
    "mask": [0.48, 0.4, 0.52, 0.4, 0.52, 0.6, 0.48, 0.6],
    
    # Sample metadata
    "size": [1920, 1080],
    "location": [37.7749, -122.4194],
    "pose": [0.5, -1.2, 45.3],
    "degradation": "low"
}

Optional Fields

Some fields may be null or missing depending on your dataset:

| Field | When Null | Reason |
| --- | --- | --- |
| frame | Always for standalone images | Images don't have frame numbers |
| box2d | Sometimes | 3D-only annotations, or images without a 2D box |
| box3d | Sometimes | 2D-only annotations |
| mask | Often | Not all datasets include segmentation |
| object_id | Rarely | Required for tracking |
| location | Often | Not all images have GPS data |
| pose | Often | Not all images have IMU data |
| degradation | Often | Optional quality indicator |
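
To see which optional fields are actually populated in your dataset, check the per-column null counts (Polars, as in Querying Annotations below):

import polars as pl

df = pl.read_ipc("dataset.arrow")

# Number of null values in every column
print(df.null_count())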

Querying Annotations

Once you understand the schema, you can query your annotations:

import polars as pl

df = pl.read_ipc("dataset.arrow")

# Get all person detections
people = df.filter(pl.col("label") == "person")

# Get training split
train_data = df.filter(pl.col("group") == "train")

# Find annotations with 3D boxes
boxes_3d = df.filter(pl.col("box3d").is_not_null())

# Find samples captured near San Francisco (latitude is the first array element;
# use .list.get(0) instead if your column is stored as a List rather than an Array)
sf_samples = df.filter(
    (pl.col("location").arr.get(0) > 37.77) & (pl.col("location").arr.get(0) < 37.78)
)

# Track an object across frames
object_track = df.filter(pl.col("object_id") == "550e8400-e29b-41d4-a716-446655440000")

print(f"Total annotations: {len(df)}")
print(f"Unique objects: {df['object_id'].n_unique()}")
print(f"Date range: {df['name'].min()} to {df['name'].max()}")

Further Reading