MinIO AIStor Tables and Iceberg V3: Genuine Engineering, Premature Ecosystem

MinIO claims AIStor Tables is the first data store with native Apache Iceberg V3 support, embedding the catalog directly into object storage. The technical architecture is sound and the V3 features are real. But with Trino, Athena, and Snowflake still lacking V3 support, being first to a spec the rest of the ecosystem can't yet read raises questions about who can actually use this today.

MinIO announced general availability of AIStor Tables in February 2026, positioning it as “the first data store in the industry to support Apache Iceberg V3” with the catalog REST API embedded directly into the object store. MinIO’s announcement details four major V3 features — deletion vectors, row-level lineage, variant types, and native geospatial types — and argues that embedding the Iceberg catalog eliminates the operational complexity of external catalog services like Hive Metastore or AWS Glue.

The technical claims about Iceberg V3’s capabilities are accurate — they describe real features in the ratified V3 specification. The architectural decision to embed the catalog into the storage layer is genuinely interesting. But MinIO’s “first mover” framing obscures a critical reality: the V3 ecosystem isn’t ready, and being first to implement a spec that most query engines can’t use yet is a different proposition than being first to deliver a usable product.

The V3 Features: Real and Well-Described

MinIO’s article accurately describes four significant improvements in the Iceberg V3 specification.

Deletion vectors replace V2’s accumulating delete files with compact binary bitmaps (Roaring Bitmaps in Puffin format) that mark deleted rows within data files. V2’s approach of creating separate delete files for each delete operation caused read performance to degrade over time as engines had to merge an increasing number of small delete files during queries. V3 requires writers to consolidate deletes into a single vector per file, shifting the merge cost from read-time to write-time. For CDC pipelines and event correction workflows that perform frequent deletes, this is a meaningful performance improvement.
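The shift from many small delete files to one consolidated bitmap per data file can be sketched in a few lines. This is a toy model only: the spec uses Roaring bitmaps serialized into Puffin files, while here a plain Python integer stands in for the bitmap.

```python
# Sketch of V3-style deletion vectors: one bitmap per data file marks
# deleted row positions. The real format uses Roaring bitmaps stored in
# Puffin files; a Python int serves as a toy bitmap here.

class DeletionVector:
    def __init__(self):
        self.bits = 0  # bit i set => row at position i is deleted

    def delete(self, pos: int) -> None:
        self.bits |= 1 << pos

    def merge(self, other: "DeletionVector") -> None:
        # V3 writers consolidate deletes into a single vector per file,
        # so readers merge at most one vector instead of many delete files.
        self.bits |= other.bits

    def is_deleted(self, pos: int) -> bool:
        return bool(self.bits >> pos & 1)

def read_file(rows, dv: DeletionVector):
    # Read-time work is a single bitmap probe per row, regardless of how
    # many delete operations produced the vector.
    return [row for i, row in enumerate(rows) if not dv.is_deleted(i)]

# Two delete operations against the same data file...
dv_a, dv_b = DeletionVector(), DeletionVector()
dv_a.delete(1)
dv_b.delete(3)
# ...are merged at write time into one vector per file.
dv_a.merge(dv_b)
print(read_file(["r0", "r1", "r2", "r3", "r4"], dv_a))  # ['r0', 'r2', 'r4']
```

The `merge` call is where V3 moves the cost: consolidation happens once at write time instead of on every read.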

Row-level lineage assigns each row a persistent ID and last-modified sequence number that survive compaction and file rewrites. This enables precise identification of which rows changed between snapshots without comparing entire datasets — a capability that makes incremental processing substantially more efficient. Before V3, engines performing incremental reads had to scan all data files modified since the last snapshot, even if only a few rows actually changed.
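An incremental consumer using row lineage can be sketched as a filter on the lineage columns. The field names below match the V3 metadata columns; the in-memory table layout is purely illustrative.

```python
# Sketch of V3 row lineage: each row carries a stable _row_id and the
# sequence number of the last commit that modified it. The list-of-dicts
# "table" here is illustrative.

rows = [
    {"_row_id": 1, "_last_updated_sequence_number": 3, "val": "a"},
    {"_row_id": 2, "_last_updated_sequence_number": 7, "val": "b"},
    {"_row_id": 3, "_last_updated_sequence_number": 5, "val": "c"},
]

def changed_since(rows, seq: int):
    # Incremental consumers pick up only rows modified after `seq`,
    # instead of rescanning every data file touched since that snapshot.
    return [r["_row_id"] for r in rows if r["_last_updated_sequence_number"] > seq]

print(changed_since(rows, 5))  # [2]: only one row changed after sequence 5
```

Because `_row_id` survives compaction and rewrites, the same filter stays correct even after files are physically reorganized.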

Variant types replace opaque string columns for semi-structured data with a binary format that engines can navigate without full parsing. Instead of deserializing entire JSON blobs to evaluate predicates, engines can skip rows based on individual field values at the storage level. For workloads involving event logs, API responses, or IoT telemetry — data sources where schema inconsistency is the norm — this reduces both I/O and CPU overhead.
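The difference between an opaque JSON column and variant-style storage can be sketched as follows. The "shredded" sub-column is a simplification of how engines typically materialize common variant fields; the data is illustrative.

```python
import json

# Opaque JSON column: evaluating status == 500 must parse every blob.
json_col = [
    '{"event": "click", "status": 200, "payload": {"x": 1}}',
    '{"event": "error", "status": 500, "payload": {"x": 2}}',
]

def filter_json(col, want):
    # Full deserialization per row just to read one field.
    return [i for i, blob in enumerate(col) if json.loads(blob)["status"] == want]

# Variant-style (shredded) storage: frequently queried fields are kept as
# typed sub-columns, so the predicate touches only one small column.
shredded = {"status": [200, 500]}  # engine-managed sub-column, illustrative

def filter_variant(status_col, want):
    return [i for i, s in enumerate(status_col) if s == want]

assert filter_json(json_col, 500) == filter_variant(shredded["status"], 500) == [1]
```

Both paths return the same rows; the variant path just avoids deserializing the payloads it never needed.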

Native geospatial types add geometry and geography types with bounding box metadata in manifest files, enabling spatial partition pruning. Queries with geographic predicates can skip entire data files whose bounding boxes don’t intersect the query region, replacing full-table scans for location-based filtering.
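Spatial partition pruning reduces to a bounding-box intersection test against per-file bounds from the manifests. A minimal sketch, with illustrative file names and boxes as (xmin, ymin, xmax, ymax):

```python
# Sketch of spatial pruning with bounding boxes. V3 manifests carry
# per-file geometry bounds; files whose boxes don't intersect the query
# region are skipped without being opened. All values are illustrative.

files = {
    "data-001.parquet": (-123.0, 37.0, -121.0, 38.5),  # Bay Area
    "data-002.parquet": (-74.5, 40.4, -73.5, 41.0),    # New York
    "data-003.parquet": (2.0, 48.6, 2.6, 49.0),        # Paris
}

def intersects(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def prune(files, query_box):
    # Only files whose bounds overlap the query region get scanned.
    return [f for f, box in files.items() if intersects(box, query_box)]

print(prune(files, (-75.0, 40.0, -73.0, 41.5)))  # ['data-002.parquet']
```

A query over the New York area opens one file instead of three; the pruning decision never touches the data itself.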

These are all well-engineered additions to the Iceberg specification. MinIO describes them accurately and explains the practical impact clearly. The technical content of the article is solid.

The Embedded Catalog: Architecturally Sound

MinIO’s decision to embed the Iceberg REST Catalog directly into the object store eliminates an operational dependency that causes real pain in production Iceberg deployments. Running a separate catalog service — Hive Metastore, AWS Glue, Nessie, or Polaris — adds a component that must be deployed, monitored, scaled, and secured independently from the storage layer. The catalog becomes a single point of failure for metadata operations, and catalog-storage consistency requires careful coordination.

By implementing the Iceberg REST Catalog API within AIStor itself, MinIO makes the catalog a native function of the storage system. Object metadata and table metadata live in the same system, managed by the same process, subject to the same consistency guarantees. This is architecturally elegant — it reduces the number of components an operator must manage and eliminates the network hop between catalog and storage for metadata operations.
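From an engine's point of view, the embedded catalog is invisible: clients speak the standard Iceberg REST Catalog protocol either way, and only the base URL changes. The paths below follow the Iceberg REST spec; the host names are made up for illustration.

```python
# Sketch: engines target the standard Iceberg REST Catalog endpoints, so
# embedding the catalog only changes where those endpoints are served.
# Host names are illustrative; the path shape follows the REST spec.

def table_metadata_path(prefix: str, namespace: str, table: str) -> str:
    # GET {base}/v1/{prefix}/namespaces/{namespace}/tables/{table}
    return f"/v1/{prefix}/namespaces/{namespace}/tables/{table}"

# External catalog service: a separate host from the object store.
external = "https://catalog.example.com" + table_metadata_path("warehouse", "sales", "orders")

# Embedded catalog: same endpoint shape, served by the store itself,
# so metadata and data requests hit one system.
embedded = "https://aistor.example.com" + table_metadata_path("warehouse", "sales", "orders")

assert external.endswith("/v1/warehouse/namespaces/sales/tables/orders")
```

This is also why engine compatibility carries over: any REST-Catalog-capable engine can point at the embedded endpoint without code changes.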

The approach has precedent. Snowflake embeds its catalog within the query engine. Databricks Unity Catalog is tightly integrated with the Delta Lake runtime. The trend toward reducing external dependencies in data lakehouse architectures is driven by operational reality: every additional service is another thing that can break, needs upgrading, and requires expertise to operate.

For organizations running MinIO AIStor as their primary object store, having the Iceberg catalog built in rather than bolted on is a genuine operational advantage.

The Ecosystem Problem: V3 Isn’t Ready

Here’s what MinIO’s announcement doesn’t address: most of the query engines that organizations use to access Iceberg tables don’t support V3 yet.

As of February 2026, Spark 4.0 has the most complete V3 support and is the only engine that can reliably run V3 features in production. Flink’s Iceberg integration has caught up on core V3 capabilities. But Trino, Amazon Athena, and Snowflake — three of the most widely deployed engines for querying Iceberg tables — do not support V3 today. Dremio and Starburst (commercial Trino) are in various stages of V3 development.

MinIO’s article lists compatibility with “Spark, Trino, Dremio, Starburst, and Flink.” This is true for V2 compatibility — these engines can read and write V2 Iceberg tables through the REST Catalog API. But “supporting V3” as a catalog and data store doesn’t help if the engines querying those tables can’t use V3 features. Deletion vectors created by one V3-compatible engine can’t be read by a V2-only engine. Variant types stored in V3 format aren’t accessible to engines that don’t implement variant parsing. The V3 features MinIO describes are only useful when both the storage layer and the query engine support them.
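The hard boundary here is the `format-version` field in the table's metadata file: a reader that doesn't implement that version must refuse the table outright, with no partial fallback. A minimal sketch, with the metadata abbreviated to the one relevant key:

```python
# Sketch of why V3 tables are invisible to V2-only engines: table
# metadata declares its format-version, and readers must reject any
# version they don't implement. Metadata is abbreviated for illustration.

metadata = {"format-version": 3}  # from the table's metadata.json

def can_read(engine_max_version: int, metadata: dict) -> bool:
    # There is no partial fallback: a reader either implements the
    # table's format version or cannot open the table at all.
    return metadata["format-version"] <= engine_max_version

print(can_read(2, metadata))  # False: a V2-only engine cannot open this table
print(can_read(3, metadata))  # True
```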

This means that for the majority of Iceberg users today — those running Trino, Athena, or Snowflake — AIStor Tables’ V3 support is a forward-looking capability, not an immediately useful one. Being “first to V3” is architecturally commendable, but the practical benefit is limited to the subset of organizations running Spark 4.0 or Flink with V3-compatible configurations.

The “First” Claim: Context Matters

MinIO claims to be “the first data store in the industry to support V3.” This is likely accurate in the narrow sense that no other storage system has implemented the Iceberg V3 REST Catalog specification before AIStor. But the claim requires context.

The Iceberg V3 specification was ratified in mid-2025. Google contributed significantly to the spec development and has been publishing V3 test datasets. Databricks, Snowflake, and other platforms are working on V3 support. Being first to ship a V3 catalog implementation is an engineering achievement, but it doesn’t confer a durable advantage if competitors ship their implementations within months.

The more relevant question is: when the V3 ecosystem matures — when Trino, Athena, and Snowflake support V3 — will AIStor Tables’ implementation be the most complete, the most performant, and the most operationally simple? That’s a question that can only be answered with time and competitive benchmarks, not with first-mover press releases.

Comparisons MinIO Doesn’t Make

The article doesn’t address how AIStor Tables compares to alternative Iceberg catalog implementations on dimensions that matter for production deployments.

How does AIStor Tables’ catalog performance compare to Nessie, Polaris, or AWS Glue for metadata-heavy operations? When a Spark job commits a transaction modifying thousands of files, what’s the catalog’s commit latency compared to alternatives? How does the embedded catalog handle concurrent writers — can multiple Spark clusters and Flink jobs operate against the same tables without contention?

What’s the failure model? If a MinIO node hosting catalog state goes down, how is metadata consistency maintained? Is catalog state replicated across the erasure-coded storage the same way object data is? What’s the RTO for catalog recovery?

How does multi-table ACID transaction performance scale with the number of tables and the frequency of commits? Iceberg’s optimistic concurrency model relies on metadata file updates — at high commit rates, conflict retries can become a bottleneck. How does AIStor’s embedded implementation handle this compared to a dedicated catalog service with optimized conflict resolution?
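The contention point in question is Iceberg's optimistic commit loop, which can be sketched as a compare-and-swap on the table's metadata pointer. The `Catalog` class below is a stand-in for whatever the catalog implementation actually does; it exists only to show where retries happen.

```python
# Sketch of Iceberg's optimistic commit protocol: a writer reads the
# current metadata pointer, prepares a new snapshot against it, and swaps
# the pointer only if no one else committed first. At high commit rates,
# the retries in this loop are where catalog implementations differ.

class Catalog:
    def __init__(self):
        self.version = 0  # current metadata pointer (illustrative)

    def swap(self, expected: int) -> bool:
        # Atomic compare-and-swap on the metadata pointer.
        if self.version == expected:
            self.version += 1
            return True
        return False

def commit(catalog: Catalog, max_retries: int = 5) -> int:
    for attempt in range(max_retries):
        base = catalog.version   # read current table state
        # ... prepare new snapshot / rewrite manifests against `base` ...
        if catalog.swap(base):   # succeeds only if nothing landed in between
            return attempt       # number of retries this commit needed
    raise RuntimeError("commit failed after retries")

cat = Catalog()
print(commit(cat))  # 0: no contention, first attempt wins
```

Every failed `swap` means re-reading metadata and re-preparing the snapshot, so a catalog's conflict-resolution latency multiplies under concurrent writers.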

These are the questions that matter for production lakehouse deployments, and they’re the questions that benchmarks and operational documentation should answer.

The Bottom Line

MinIO’s AIStor Tables represents genuine engineering work. The V3 features are real. The embedded catalog architecture eliminates operational complexity. The decision to implement V3 before competitors demonstrates engineering ambition and execution capability.

But “first to V3” is a market positioning claim, not a product readiness claim. With Trino, Athena, and Snowflake lacking V3 support, the majority of Iceberg users can’t take advantage of V3 features regardless of what their storage layer supports. MinIO is building for a future state of the ecosystem — a reasonable engineering strategy, but one that should be communicated honestly rather than framed as immediately deliverable value.

When the V3 ecosystem matures, AIStor Tables will be well-positioned. Whether it’s the best-positioned depends on performance, reliability, and operational simplicity under production workloads — metrics that require benchmarks, not press releases.
