Bounding Box Formats

This page explains how 2D bounding boxes work in the EdgeFirst Dataset Format, including coordinate systems, normalized coordinates, and the differences between Arrow and JSON formats.

Understanding Normalized Coordinates

All 2D image coordinates in EdgeFirst are normalized to the 0–1 range. This makes annotations independent of image resolution: the same annotation works for 640×480 images or 4K (3840×2160) images.

normalized_x = pixel_x / image_width
normalized_y = pixel_y / image_height
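
The formulas above can be sketched in Python as follows. The function names `to_normalized` and `to_pixels` are hypothetical helpers chosen for illustration; they are not part of any EdgeFirst library.

```python
def to_normalized(pixel_x, pixel_y, image_width, image_height):
    """Convert a pixel position to normalized (0-1) coordinates."""
    return pixel_x / image_width, pixel_y / image_height


def to_pixels(normalized_x, normalized_y, image_width, image_height):
    """Convert normalized coordinates back to a pixel position."""
    return normalized_x * image_width, normalized_y * image_height


# The image center normalizes to the same value at both resolutions.
print(to_normalized(320, 240, 640, 480))      # (0.5, 0.5)
print(to_normalized(1920, 1080, 3840, 2160))  # (0.5, 0.5)
print(to_pixels(0.5, 0.5, 640, 480))          # (320.0, 240.0)
```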

Why Normalization?

%%{init: {'flowchart': {'padding': '40'}}}%%
graph TB
    subgraph Raw["Pixel Coordinates (Resolution-Specific)"]
        P1["640×480 image - pixel 320,240"]
        P2["3840×2160 image - pixel 1920,1080"]
    end
    
    subgraph Norm["Normalized (Universal)"]
        N["Both = (0.5, 0.5) - center of image"]
    end
    
    P1 -->|"÷ 640, ÷ 480"| N
    P2 -->|"÷ 3840, ÷ 2160"| N
    
    style Raw fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    style Norm fill:#c8e6c9,stroke:#388e3c,stroke-width:2px

Benefits:

  • Resize images without updating annotations
  • Combine datasets with different resolutions
  • Share models across camera resolutions

Coordinate Systems

EdgeFirst uses a top-left origin for image coordinates (the same convention as most image libraries):

(0,0) ─────────────────> x (normalized: 0 to 1)
  │
  │     Box example:
  │     ┌─────┐ (cx, cy) = center
  │     │  ●  │ 
  │     └─────┘
  │     width, height
  ▼
  y (normalized: 0 to 1)

EdgeFirst Box2D Format

In the EdgeFirst Dataset Format, 2D bounding boxes are stored in the box2d column as a fixed-size array.

Arrow Format (Primary)

The Arrow file stores box2d as a center-based array:

box2d: Array(Float32, shape=(4,))  # [cx, cy, width, height]

| Index | Field  | Description                          |
|-------|--------|--------------------------------------|
| 0     | cx     | Center X coordinate (normalized 0–1) |
| 1     | cy     | Center Y coordinate (normalized 0–1) |
| 2     | width  | Box width (normalized 0–1)           |
| 3     | height | Box height (normalized 0–1)          |

Example:

box2d = [0.691406, 0.368056, 0.015104, 0.050926]
#        cx        cy        width     height

This format matches the normalized, center-based layout used by ML frameworks like YOLO, so boxes can be fed to such training pipelines without a coordinate conversion.
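
To draw or inspect a center-based box, it usually has to be turned into pixel corners. The following is a minimal sketch; `box2d_to_pixel_corners` is a hypothetical helper, and the 1920×1080 image size is an assumption made only for this illustration.

```python
def box2d_to_pixel_corners(box2d, image_width, image_height):
    """Convert a normalized [cx, cy, width, height] box to pixel corners.

    Returns (x1, y1, x2, y2): top-left and bottom-right corners in pixels.
    """
    cx, cy, w, h = box2d
    x1 = (cx - w / 2) * image_width
    y1 = (cy - h / 2) * image_height
    x2 = (cx + w / 2) * image_width
    y2 = (cy + h / 2) * image_height
    return x1, y1, x2, y2


box2d = [0.691406, 0.368056, 0.015104, 0.050926]
# Image size assumed for illustration only.
corners = box2d_to_pixel_corners(box2d, 1920, 1080)
print([round(v) for v in corners])  # [1313, 370, 1342, 425]
```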

JSON Format (Legacy)

The JSON format uses a top-left corner representation for legacy Studio API compatibility:

{
  "box2d": {
    "x": 0.683854,
    "y": 0.342593,
    "w": 0.015104,
    "h": 0.050926
  }
}

| Field | Description                 |
|-------|-----------------------------|
| x     | Left edge (normalized 0–1)  |
| y     | Top edge (normalized 0–1)   |
| w     | Box width (normalized 0–1)  |
| h     | Box height (normalized 0–1) |

Format Difference

The Arrow and JSON formats share the same top-left image origin but describe a box from different reference points:

  • Arrow: Center-based [cx, cy, w, h]
  • JSON: Top-left {x, y, w, h} (legacy)

The edgefirst_client library handles these conversions automatically.

Conversion Between Formats

Arrow → JSON (center to top-left):

x = cx - width / 2
y = cy - height / 2
# w, h stay the same

JSON → Arrow (top-left to center):

cx = x + width / 2
cy = y + height / 2
# w, h stay the same
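
The two conversions can be sketched as a pair of plain Python functions. The names `arrow_to_json` and `json_to_arrow` are hypothetical and are not the `edgefirst_client` API, which performs these conversions internally.

```python
def arrow_to_json(box2d):
    """Center-based [cx, cy, w, h] -> top-left {"x", "y", "w", "h"}."""
    cx, cy, w, h = box2d
    return {"x": cx - w / 2, "y": cy - h / 2, "w": w, "h": h}


def json_to_arrow(box):
    """Top-left {"x", "y", "w", "h"} -> center-based [cx, cy, w, h]."""
    return [
        box["x"] + box["w"] / 2,
        box["y"] + box["h"] / 2,
        box["w"],
        box["h"],
    ]


arrow_box = [0.691406, 0.368056, 0.015104, 0.050926]
json_box = arrow_to_json(arrow_box)
print(json_box)                 # x is about 0.683854, y about 0.342593
print(json_to_arrow(json_box))  # recovers the original box up to float rounding
```

Applied to the Arrow example from this page, the result matches the JSON example to six decimal places, so the two examples describe the same box.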

Box2D in the Schema

The box2d field is one column in the annotation schema. Related fields include:

| Column      | Type              | Description                              |
|-------------|-------------------|------------------------------------------|
| label       | Categorical       | Object class (e.g., "person", "car")     |
| label_index | UInt64            | Numeric index for ML models              |
| box2d       | Array(Float32, 4) | 2D bounding box [cx, cy, w, h]           |
| box3d       | Array(Float32, 6) | 3D bounding box                          |
| mask        | List(Float32)     | Segmentation polygon                     |
| object_id   | String            | UUID for tracking across frames          |

See Annotation Schema for the complete field reference.

How boxes are created

Whichever method is used to create a bounding box in EdgeFirst Studio, the box is stored in the center-based format in the Arrow file.

edgefirst-client abstracts format differences

The edgefirst_client Python library handles format conversions automatically, allowing you to work with your preferred coordinate system regardless of how annotations are stored internally.

3D Bounding Boxes

The box3d column stores 3D bounding boxes in world coordinates:

box3d: Array(Float32, shape=(6,))  # [x, y, z, width, height, length]

| Index | Field  | Description        |
|-------|--------|--------------------|
| 0     | x      | Center X in meters |
| 1     | y      | Center Y in meters |
| 2     | z      | Center Z in meters |
| 3     | width  | Width (Y-axis)     |
| 4     | height | Height (Z-axis)    |
| 5     | length | Length (X-axis)    |

Coordinate frame: ROS convention (X=forward, Y=left, Z=up)

Origin: Center of capture device (e.g., Maivin) at (0, 0, 0)
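
As a rough illustration of the axis mapping, the sketch below computes the minimum and maximum extent of a box along each axis. It assumes the box is axis-aligned, since the six-element array carries no rotation. `box3d_extents` is a hypothetical helper and the sample values are made up.

```python
def box3d_extents(box3d):
    """Return ((x_min, x_max), (y_min, y_max), (z_min, z_max)) in meters.

    Assumes an axis-aligned box: length runs along X (forward),
    width along Y (left), and height along Z (up).
    """
    x, y, z, width, height, length = box3d
    return (
        (x - length / 2, x + length / 2),
        (y - width / 2, y + width / 2),
        (z - height / 2, z + height / 2),
    )


# Made-up example: an object 10 m ahead and 2 m to the right of the sensor.
box3d = [10.0, -2.0, 0.5, 2.0, 1.0, 4.0]
print(box3d_extents(box3d))  # ((8.0, 12.0), (-3.0, -1.0), (0.0, 1.0))
```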

Consistent format

Unlike box2d, the box3d format is identical in both Arrow and JSON—both use center-point representation.

Common Box Format Standards

For reference, here's how EdgeFirst compares to other common formats:

| Format          | Representation                     | Used By                  |
|-----------------|------------------------------------|--------------------------|
| EdgeFirst Arrow | [cx, cy, w, h] (center)            | Arrow files, ML training |
| EdgeFirst JSON  | {x, y, w, h} (top-left)            | Legacy Studio API        |
| YOLO            | cx cy w h (center, normalized)     | Darknet, Ultralytics     |
| COCO            | [x, y, w, h] (top-left, pixels)    | MS COCO dataset          |
| Pascal VOC      | [x1, y1, x2, y2] (corners, pixels) | Pascal VOC dataset       |

The EdgeFirst Arrow format aligns with YOLO conventions, making it straightforward to use with popular ML frameworks.
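
A minimal sketch of exporting an EdgeFirst Arrow box to the other formats in the table might look like this. The helper names are hypothetical, and the pixel values are left as unrounded floats with no 1-based offset, which a real exporter may need to handle differently.

```python
def to_yolo_line(label_index, box2d):
    """Format a box as a YOLO label line: 'class cx cy w h' (normalized)."""
    cx, cy, w, h = box2d
    return f"{label_index} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def to_coco(box2d, image_width, image_height):
    """Convert to COCO [x, y, w, h]: top-left corner and size in pixels."""
    cx, cy, w, h = box2d
    return [
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        w * image_width,
        h * image_height,
    ]


def to_pascal_voc(box2d, image_width, image_height):
    """Convert to Pascal VOC [x1, y1, x2, y2]: corner coordinates in pixels."""
    x, y, w, h = to_coco(box2d, image_width, image_height)
    return [x, y, x + w, y + h]


box2d = [0.5, 0.5, 0.25, 0.5]
print(to_yolo_line(0, box2d))          # 0 0.500000 0.500000 0.250000 0.500000
print(to_coco(box2d, 640, 480))        # [240.0, 120.0, 160.0, 240.0]
print(to_pascal_voc(box2d, 640, 480))  # [240.0, 120.0, 400.0, 360.0]
```

Only the YOLO export leaves the numbers unchanged; the COCO and Pascal VOC exports need the image size because those formats use pixels.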

Debugging Boxes

If your boxes look wrong, check these common issues:

| Problem               | Cause                                 | Fix                          |
|-----------------------|---------------------------------------|------------------------------|
| Box appears too small | Coordinates in pixels, not normalized | Divide by image width/height |
| Box off-center        | Using JSON format as Arrow            | Convert: cx = x + w/2        |
| Box position wrong    | Coordinate origin confusion           | Verify top-left (0,0) origin |
| Box too large         | Swapped width/height                  | Check order of w and h       |
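
Some of these issues can be caught with a quick sanity check before training. The sketch below is a heuristic only: `check_box2d` is a hypothetical helper, and a box that legitimately extends past the image edge would also be flagged.

```python
def check_box2d(box2d, tolerance=1e-6):
    """Return a list of likely problems with a center-based [cx, cy, w, h] box."""
    cx, cy, w, h = box2d
    problems = []

    # Normalized values never exceed 1, so larger values suggest pixels.
    if any(v > 1.0 + tolerance for v in box2d):
        problems.append("value above 1: coordinates may be in pixels, not normalized")

    if w <= 0 or h <= 0:
        problems.append("width or height is not positive")

    # A box spilling past the image can indicate a center vs top-left mix-up.
    left, right = cx - w / 2, cx + w / 2
    top, bottom = cy - h / 2, cy + h / 2
    if (left < -tolerance or top < -tolerance
            or right > 1.0 + tolerance or bottom > 1.0 + tolerance):
        problems.append("box extends past the image edge: possible center vs top-left mix-up")

    return problems


print(check_box2d([0.691406, 0.368056, 0.015104, 0.050926]))  # []
print(check_box2d([320, 240, 50, 80]))  # flags pixel coordinates
```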

Further Reading