Upgrading from 2025.10 to 2026.04

This guide covers the breaking changes introduced in schema version 2026.04 and how to migrate existing datasets and code.

What Changed

| Area | 2025.10 | 2026.04 | Impact |
|---|---|---|---|
| Polygon storage | `mask: List<Float32>` with NaN separators | `polygon: List<List<Float32>>` nested lists | Column name and type changed |
| Mask type | `List<Float32>` (polygon data) | `Binary` (PNG-encoded raster pixels) | Column type changed; semantics changed |
| `label_index` semantics | Alphabetically re-indexed (0-based, contiguous) | Source-faithful `category_id` (non-contiguous, preserves gaps) | Existing files remain valid; new exports may differ |
| `iscrowd` type | `UInt8` (0/1) | `Boolean` (true/false) | Column type changed |
| New columns | N/A | `polygon_score`, `mask_score`, `box2d_score`, `box3d_score`, `timing`, `category_frequency`, `neg_label_indices`, `not_exhaustive_label_indices` | Additive (non-breaking for readers that ignore unknown columns) |
| File metadata | None | `schema_version`, `box2d_format`, `box2d_normalized`, `category_metadata`, `labels`, etc. | Additive |
| JSON structure | Bare array `[...]` | Object wrapper `{"schema_version": ..., "samples": [...]}` | Readers must detect top-level type |
| LiDAR sensors | `.lidar.png`, `.lidar.jpeg` | Removed | Breaking for pipelines that depend on projected LiDAR images |
| Parquet support | N/A | `.parquet` files supported | New capability |

Old code will produce corrupt data

Code that reads the mask column as List<Float32> and splits on NaN values will fail or produce incorrect results when applied to 2026.04 files, where mask is Binary (PNG-encoded raster data). Additionally, iscrowd changed from UInt8 to Boolean. Always check the schema version (or the physical type of the mask column) before processing.
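At the row level, a couple of defensive checks can keep old code paths from silently misreading new data. The sketch below is illustrative only: `read_iscrowd` and `is_png_mask` are hypothetical helpers, not part of the EdgeFirst SDK. It relies on the standard 8-byte PNG file signature to tell a 2026.04 raster mask apart from a 2025.10 float list.

```python
# Hypothetical helpers (not EdgeFirst SDK APIs) for version-tolerant row access.

# Standard 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_iscrowd(value) -> bool:
    """Normalize iscrowd: UInt8 0/1 in 2025.10, Boolean in 2026.04."""
    if isinstance(value, bool):
        return value
    if value in (0, 1):
        return bool(value)
    raise ValueError(f"unexpected iscrowd value: {value!r}")


def is_png_mask(value) -> bool:
    """True if a mask cell holds PNG bytes (2026.04 raster) rather than
    a float list (2025.10 NaN-separated polygon data)."""
    if not isinstance(value, (bytes, bytearray, memoryview)):
        return False
    return bytes(value[:8]) == PNG_SIGNATURE
```

Checking the signature before decoding means a float list is never handed to a PNG decoder, and PNG bytes are never split on NaN.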

Migration Command

Planned for edgefirst-client 2.10.0

The edgefirst migrate command is planned for the EdgeFirst Client SDK version 2.10.0. It is not yet available in current releases.

```text
edgefirst migrate <input.arrow> [--output <output.arrow>]
```

The migration utility will perform these steps:

1. Reads the 2025.10 mask: List<Float32> column with NaN separators.
2. Converts it to polygon: List<List<Float32>> by splitting on NaN, so each ring becomes one inner list of interleaved x, y coordinates.
3. Removes the old mask column. There is no raster data to migrate, because raster masks did not exist in 2025.10.
4. Sets schema_version = "2026.04" in the file metadata.
5. Writes the new file, preserving all other columns unchanged.

This is a lossless conversion for polygon data. Raster masks are a new capability with no migration path (they did not exist in 2025.10).
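The core of step 2 can be sketched in plain Python. This is a minimal illustration, not the `edgefirst migrate` implementation; the function name `nan_separated_to_rings` is hypothetical, and it assumes each ring is a run of interleaved x, y values with rings separated by one or more NaN values (empty runs are dropped).

```python
import math
from typing import List, Optional, Sequence


def nan_separated_to_rings(values: Sequence[Optional[float]]) -> List[List[float]]:
    """Split a 2025.10 flat mask list into 2026.04-style polygon rings.

    Assumes interleaved x, y values, with rings separated by one or
    more NaN (or null) entries. Empty runs are skipped.
    """
    rings: List[List[float]] = []
    current: List[float] = []
    for v in values:
        if v is None or math.isnan(v):
            # Separator: close the current ring if it has any points.
            if current:
                rings.append(current)
                current = []
        else:
            current.append(float(v))
    if current:
        rings.append(current)

    # Every ring must contain whole x, y pairs.
    for ring in rings:
        if len(ring) % 2 != 0:
            raise ValueError(f"ring has an odd number of values: {len(ring)}")
    return rings
```

For example, `[1, 2, 3, 4, 5, 6, nan, 7, 8, 9, 10, 11, 12]` becomes two rings of three points each, which is the nested shape the 2026.04 `polygon` column stores.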

Version Detection

Arrow / Parquet Files

| Signal | Interpretation |
|---|---|
| `schema_version` metadata = `"2026.04"` | 2026.04 format |
| `schema_version` absent + `mask: List<Float32>` | 2025.10: NaN-separated polygon data in `mask` |
| `schema_version` absent + no `mask` / no `polygon` | 2025.10: no geometry |
| `schema_version` absent + `mask: Binary` | 2026.04: the type is unambiguous |
| `polygon` column present | 2026.04 |

Robustness rule: If the physical type of the mask column is Binary, treat as 2026.04 regardless of schema_version presence. The column type itself is unambiguous.
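The detection table and the robustness rule can be encoded as one decision function. In this sketch, `classify_schema` and `read_schema_signals` are hypothetical names, and the type tags (`"binary"`, `"list<float32>"`) are a simplification introduced here. The optional adapter assumes `pyarrow` is installed and that `.arrow` files use the Arrow IPC file format.

```python
from typing import Mapping, Optional


def classify_schema(schema_version: Optional[str],
                    column_types: Mapping[str, str]) -> str:
    """Return "2025.10" or "2026.04" following the detection table.

    column_types maps column name -> simplified tag: "binary" for
    Binary/LargeBinary, "list<float32>" for a flat Float32 list.
    """
    mask_type = column_types.get("mask")
    if mask_type == "binary":
        return "2026.04"  # robustness rule: Binary mask is unambiguous
    if schema_version is not None:
        return schema_version
    if "polygon" in column_types:
        return "2026.04"
    if mask_type == "list<float32>":
        return "2025.10"  # NaN-separated polygon data in `mask`
    if mask_type is None:
        return "2025.10"  # no geometry columns
    raise ValueError(f"unrecognized type for 'mask' column: {mask_type}")


def read_schema_signals(path: str):
    """Collect classify_schema inputs with pyarrow (assumed installed)."""
    import pyarrow as pa
    import pyarrow.ipc as ipc
    import pyarrow.parquet as pq

    if path.endswith(".parquet"):
        schema = pq.read_schema(path)
    else:
        with pa.memory_map(path) as source:
            schema = ipc.open_file(source).schema

    # Arrow schema metadata keys and values are bytes.
    version = (schema.metadata or {}).get(b"schema_version")

    def tag(t) -> str:
        if pa.types.is_binary(t) or pa.types.is_large_binary(t):
            return "binary"
        if (pa.types.is_list(t) or pa.types.is_large_list(t)) \
                and t.value_type == pa.float32():
            return "list<float32>"
        return str(t)

    types = {field.name: tag(field.type) for field in schema}
    return (version.decode() if version is not None else None), types
```

Usage: `classify_schema(*read_schema_signals("dataset.arrow"))`. Checking the Binary type before the metadata is what makes the result follow the robustness rule even when `schema_version` is missing.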

JSON Files

| Signal | Interpretation |
|---|---|
| Top-level is a JSON array `[...]` | 2025.10: bare array of samples |
| Top-level is a JSON object with `schema_version` | 2026.04: metadata wrapper |

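```python
import json

with open("annotations.json") as f:
    data = json.load(f)

if isinstance(data, list):
    # 2025.10 legacy: bare array of samples, no metadata
    samples = data
    version = "2025.10"
else:
    # 2026.04: object wrapper carrying metadata
    samples = data["samples"]
    version = data.get("schema_version")
```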

Reading Both Versions

```python
import polars as pl


def read_dataset(path: str) -> pl.DataFrame:
    """Read an EdgeFirst dataset, handling both 2025.10 and 2026.04."""
    if path.endswith(".parquet"):
        df = pl.read_parquet(path)
    else:
        df = pl.read_ipc(path)

    # Detect the version from column names and physical types. Per the
    # robustness rule, the mask column type alone is unambiguous, so the
    # schema_version metadata is not consulted here.
    if "polygon" in df.columns:
        return read_2026_04(df)

    if "mask" in df.columns:
        mask_dtype = df["mask"].dtype
        if mask_dtype == pl.List(pl.Float32):
            return read_2025_10(df)
        if mask_dtype == pl.Binary:
            return read_2026_04(df)
        raise ValueError(f"unexpected type for 'mask' column: {mask_dtype}")

    # No geometry columns: compatible with either version
    return df


def read_2025_10(df: pl.DataFrame) -> pl.DataFrame:
    """Handle 2025.10 NaN-separated polygon data in the mask column."""
    # mask: List<f32> with NaN separators. Converting it to
    # polygon: List<List<f32>> is what `edgefirst migrate` will do.
    print("2025.10 format detected; consider running: edgefirst migrate")
    return df


def read_2026_04(df: pl.DataFrame) -> pl.DataFrame:
    """Handle 2026.04 format with polygon and raster mask columns."""
    # polygon: List<List<f32>>, interleaved x, y pairs per ring
    # mask: Binary, PNG-encoded raster pixels
    return df
```

Code Migration Checklist

If you read the `mask` column directly

```python
# OLD (2025.10): WILL BREAK on 2026.04 files
mask_data = row["mask"]  # List<f32> with NaN separators
rings = split_on_nan(mask_data)  # split_on_nan: your own helper

# NEW (2026.04)
polygon_data = row["polygon"]  # List<List<f32>>, already split into rings
for ring in polygon_data:
    # Each ring is interleaved [x0, y0, x1, y1, ...]; pair the values up
    points = list(zip(ring[0::2], ring[1::2]))
```

If you check `box2d_format`

```python
# OLD (2025.10): assumed cxcywh for Arrow, ltwh for JSON
cx, cy, w, h = row["box2d"]

# NEW (2026.04): check metadata
# box2d_format metadata describes the layout; default is still cxcywh for Arrow
cx, cy, w, h = row["box2d"]  # same default, but verify via metadata
```
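Verifying the layout amounts to reading `box2d_format` and converting when needed. This sketch assumes the decoded metadata is a string-to-string mapping and that the format values are literally `"cxcywh"` and `"ltwh"`; neither the helper names nor those exact strings are confirmed by this guide. The conversion itself works the same whether or not `box2d_normalized` is set, as long as all four values use the same units.

```python
from typing import Mapping, Sequence, Tuple

Box = Tuple[float, float, float, float]


def box2d_format_of(metadata: Mapping[str, str]) -> str:
    """Read box2d_format from decoded file metadata (assumed str -> str),
    falling back to cxcywh, the Arrow default."""
    return metadata.get("box2d_format", "cxcywh")


def box2d_to_ltwh(box: Sequence[float], box2d_format: str) -> Box:
    """Convert one box2d value to (left, top, width, height)."""
    a, b, w, h = box
    if box2d_format == "ltwh":
        return (a, b, w, h)
    if box2d_format == "cxcywh":
        # Move from the center to the top-left corner by half the extent.
        return (a - w / 2, b - h / 2, w, h)
    raise ValueError(f"unsupported box2d_format: {box2d_format!r}")
```

Converting once at load time to a single internal layout keeps the rest of a pipeline independent of how each file was exported.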

If you process LiDAR visualizations

```python
# OLD (2025.10): projected LiDAR images shipped with the dataset
depth_image = load("frame_001.lidar.png")
reflect_image = load("frame_001.lidar.jpeg")

# NEW (2026.04): project from the PCD yourself
# (load, project_to_depth and calibration are placeholders for your own code)
pcd = load("frame_001.lidar.pcd")
depth_image = project_to_depth(pcd, calibration)
```
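One way to fill in the projection step is a standard pinhole projection with a z-buffer. The sketch below is an assumption-laden stand-in, not an EdgeFirst API: `project_points_to_depth` is a hypothetical name, PCD parsing is out of scope, and the points are assumed to already be in the camera frame (OpenCV convention: x right, y down, z forward) with lens distortion ignored.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def project_points_to_depth(points: Iterable[Point],
                            fx: float, fy: float, cx: float, cy: float,
                            width: int, height: int) -> List[List[float]]:
    """Render a sparse depth image from camera-frame points.

    Assumes an undistorted pinhole camera and that the LiDAR-to-camera
    extrinsic transform has already been applied. Empty pixels are 0.0.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:
            continue  # behind the camera plane
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest return when several points land on one pixel.
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth
```

A reflectance image can be rendered the same way by storing the point's intensity instead of `z`, still using `z` to decide which point wins each pixel.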

FAQ

Q: Can I read 2026.04 files with old SDK versions?

No. SDK versions prior to 3.0 do not understand the polygon column or the Binary mask type. Update the EdgeFirst Client SDK to version 3.0 or later.

Q: Do I need to migrate all my datasets at once?

No. The EdgeFirst Client SDK 3.0+ reads both 2025.10 and 2026.04 files transparently. Migrate when convenient — there is no deadline.

Q: What happens to raster mask data during migration?

Raster masks (Binary, PNG-encoded) are a new capability in 2026.04. The 2025.10 mask column contained polygon data (NaN-separated List<Float32>), not raster data. Migration moves polygon data to the new polygon column and removes the old mask column. There is no raster data to lose.

Q: Can polygon and mask coexist in the same file?

Yes. A 2026.04 file can have both polygon (vector contours) and mask (raster pixels) columns populated. This supports use cases like panoptic segmentation where instance polygons and semantic raster masks are both needed.

Q: How do I tell if a JSON file is 2025.10 or 2026.04?

Check the top-level structure. 2025.10 JSON files are a bare array [...]. 2026.04 files are an object {"schema_version": "2026.04", "samples": [...]}.

Q: My label_index values changed after re-importing a dataset. Is that expected?

Yes, if the dataset you are comparing against was imported with the 2025.10 SDK. The 2025.10 SDK assigned label_index values using alphabetical ordering of category names (0-based, contiguous). The 2026.04 SDK preserves the original source category_id as label_index (non-contiguous, may not start at 0). Both are valid; the 2026.04 behavior is correct for round-trip fidelity with COCO and LVIS datasets.
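If older models or evaluation scripts still expect the 2025.10 indices, a lookup table between the two schemes can be rebuilt from the category list. This is a hedged sketch: `legacy_index_to_category_id` is a hypothetical helper, and it assumes plain Python string ordering reproduces the 2025.10 SDK's alphabetical order. Case sensitivity or locale collation in the actual SDK could differ, so compare the result against a known 2025.10 export before relying on it.

```python
from typing import Dict, Mapping


def legacy_index_to_category_id(categories: Mapping[int, str]) -> Dict[int, int]:
    """Map a 2025.10-style label_index to the source category_id.

    categories maps source category_id -> category name. Assumes the
    2025.10 order equals Python's default string sort on the name.
    """
    ordered = sorted(categories.items(), key=lambda item: item[1])
    return {index: cat_id for index, (cat_id, _name) in enumerate(ordered)}
```

Inverting the returned dict gives the other direction, from 2026.04 `label_index` (the source `category_id`) back to the legacy contiguous index.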

Q: Does the COCO importer support LVIS annotations?

Yes. The COCO format importer (--format coco) handles LVIS extensions automatically. LVIS-specific fields (neg_category_ids, not_exhaustive_category_ids, category frequency/synset/synonyms/def) are parsed when present and mapped to the corresponding EdgeFirst columns and file-level metadata. No separate LVIS mode is needed.

Q: What are neg_label_indices and not_exhaustive_label_indices?

These are LVIS-specific sample-level columns that support the LVIS federated annotation evaluation protocol. neg_label_indices lists categories confirmed absent from an image, so predictions of those categories count as false positives. not_exhaustive_label_indices lists categories with potentially incomplete annotation (unmatched predictions are ignored). Both reference label_index values. They are optional and absent from non-LVIS datasets.
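The per-image decision these columns drive can be sketched as follows. This is a simplified illustration: `lvis_prediction_outcome` is a hypothetical name, the positive set is assumed to be the label_index values with at least one ground-truth annotation in the sample, and `matched` is assumed to come from a separate IoU matching step. A full LVIS evaluation also handles score ranking, IoU thresholds, and per-image detection limits.

```python
from typing import AbstractSet


def lvis_prediction_outcome(label_index: int, matched: bool,
                            positive: AbstractSet[int],
                            neg_label_indices: AbstractSet[int],
                            not_exhaustive_label_indices: AbstractSet[int]) -> str:
    """Classify one prediction on one image under LVIS federated rules."""
    if label_index not in positive and label_index not in neg_label_indices:
        return "ignored"  # category was never verified on this image
    if matched:
        return "true_positive"
    if label_index in not_exhaustive_label_indices:
        return "ignored"  # annotation may be incomplete for this category
    return "false_positive"
```

The first check is what makes the protocol "federated": each image is judged only on the categories that were verified for it.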