VAST Data's 99.9991% Uptime and 10x Kafka Claims: The New Standard for Unverifiable Marketing

VAST Data claims 99.9991% measured uptime (4.7 minutes downtime per year) and 10x performance advantage over Kafka. We analyze what these numbers actually mean and why they're nearly impossible to verify.

VAST Data has made two extraordinary claims that deserve mathematical scrutiny: 99.9991% measured uptime and a 10x performance advantage over Apache Kafka. Let’s examine what these numbers actually mean and whether they can be verified.

If these claims sound familiar, you may have read our analysis of Cloudian’s absurd “26 nines” durability claim. VAST’s uptime assertion follows the same playbook: pick an impressive-sounding number, provide no methodology, and wait for uncritical tech media coverage to amplify it.

The 99.9991% Uptime Claim

VAST Data claims 99.9991% measured uptime across their customer base. This translates to approximately 4.7 minutes of downtime per year. For context:

Annual downtime by availability percentage:

  - 99.9% (“three nines”): about 8.8 hours per year
  - 99.99% (“four nines”): about 53 minutes per year
  - 99.999% (“five nines”): about 5.3 minutes per year
  - 99.9991% (VAST’s claim): about 4.7 minutes per year
  - 99.9999% (“six nines”): about 32 seconds per year

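For readers who want to check the conversion themselves, here is a minimal Python sketch. It is pure arithmetic on the percentages above; no vendor data is involved.

```python
# Convert an availability percentage into annual downtime (365-day year).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def annual_downtime_minutes(availability_pct: float) -> float:
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR

for pct in (99.9, 99.99, 99.999, 99.9991, 99.9999):
    print(f"{pct}% -> {annual_downtime_minutes(pct):.1f} minutes/year")
# 99.9%    -> 525.6 minutes/year (~8.8 hours)
# 99.9991% -> 4.7 minutes/year (VAST's claimed figure)
```
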
VAST was founded in 2016. For a company that’s existed for roughly 9 years to claim this level of uptime requires answering several questions that Blocks and Files—the primary source amplifying this claim—apparently didn’t think to ask.

What Gets Measured?

“Measured uptime” is ambiguous. Does this include:

System-level failures? Does a single node failure count if the cluster remains available? Most distributed systems remain operational during individual component failures.

Customer-reported incidents? Are planned maintenance windows excluded? Are partial degradations counted? Is a 50% performance drop considered “up” or “down”?

Which customers? Is this averaged across all customers, or is it the median? A fleet-wide average can look excellent even when individual customers suffered serious outages, because the many trouble-free deployments mathematically dilute the few bad ones (see the short example after these questions).

What time period? Is this lifetime uptime since 2016, or only recent deployments? Newer systems naturally have better uptime records simply because they’ve had less time to fail.
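
To make the averaging problem concrete, here is a toy example with entirely hypothetical numbers: 99 customers with zero downtime and one customer down for a full day.

```python
# Toy illustration (synthetic numbers, not VAST data): the fleet-wide *mean*
# stays high even when one customer has a very bad day; the median hides it entirely.
from statistics import mean, median

MINUTES_PER_YEAR = 365 * 24 * 60

# Hypothetical fleet: 99 customers with zero downtime, 1 customer down for 24 hours.
downtime_minutes = [0] * 99 + [24 * 60]
availability = [100 * (1 - d / MINUTES_PER_YEAR) for d in downtime_minutes]

print(f"mean:   {mean(availability):.5f}%")    # ~99.99726% -- still sounds impressive
print(f"median: {median(availability):.5f}%")  # 100.00000% -- the outage disappears
```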

The Statistical Problem

For VAST to claim 99.9991% uptime with statistical confidence, they need either a massive sample size or a long observation period. Let’s calculate what’s required.

If we assume VAST has 100 production customers with an average deployment age of 3 years, that is 300 system-years of operation. At 99.9991% uptime, the downtime budget for the entire fleet is roughly 300 × 4.73 ≈ 1,420 minutes, or about 23.7 hours, across all customers combined.

This means VAST can only afford an average of 14.2 minutes of downtime per customer over 3 years. A single catastrophic failure at one customer site lasting 4 hours would consume 16.9% of the entire downtime budget for all customers.
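
The same back-of-the-envelope budget in code; the customer count and deployment age are assumptions for illustration, not figures VAST has disclosed.

```python
# Fleet-wide downtime budget at the claimed availability.
# Assumed figures: 100 customers, 3-year average deployment age -- not disclosed by VAST.
MINUTES_PER_YEAR = 365 * 24 * 60
CLAIMED_AVAILABILITY = 0.999991

customers = 100
years = 3
system_minutes = customers * years * MINUTES_PER_YEAR          # ~157.7M system-minutes
budget_minutes = system_minutes * (1 - CLAIMED_AVAILABILITY)   # ~1,420 minutes total
per_customer = budget_minutes / customers                      # ~14.2 minutes over 3 years

outage = 4 * 60  # one 4-hour incident at a single site
print(f"fleet budget: {budget_minutes:.0f} min, per customer: {per_customer:.1f} min")
print(f"a single 4-hour outage consumes {100 * outage / budget_minutes:.1f}% of it")
```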

Comparison to Cloud Providers

Major cloud providers with far larger scale and longer operational history publish lower uptime numbers:

  - Amazon S3: designed for 99.99% availability, with an SLA that only begins crediting customers below 99.9%
  - Azure Blob Storage: 99.9% availability SLA for standard locally redundant storage
  - Google Cloud Storage: 99.95% availability SLA for multi-region buckets, 99.9% for regional buckets

These companies operate at exascale with thousands of engineers and decades of operational experience. VAST claiming better uptime than these hyperscalers with a fraction of the scale and operational maturity is extraordinary.

Why This Matters

Uptime claims without methodology are marketing, not engineering. To make this claim credible, VAST needs to publish:

  1. Definition of downtime: What constitutes “down” vs. degraded performance?
  2. Measurement methodology: How is uptime calculated? Customer-reported? Automated monitoring?
  3. Sample size: How many customers? How many system-years of data?
  4. Exclusions: Are planned maintenance, customer-caused issues, or network problems excluded?
  5. Statistical confidence: What’s the confidence interval around 99.9991%?

Without this information, 99.9991% is just a number designed to sound impressive. It’s the new “26 nines” durability claim—mathematically implausible and practically unverifiable.

For readers unfamiliar with the “26 nines” reference: Cloudian claimed 99.999999999999999999999999% durability (26 consecutive nines), which would require their drives to last longer than the age of the universe. We analyzed why this claim is mathematical nonsense. VAST’s uptime claim isn’t quite as absurd, but it follows the same pattern of unverifiable big numbers.

The Event Broker: 10x Faster Than Kafka

In February 2025, VAST claimed their Event Broker delivers “a 10x+ performance advantage over Kafka on like-for-like hardware” with capability to process “over 500 million messages per second across VAST’s largest cluster deployments.”

Later, in November 2025, they refined this to “604% more throughput than Apache Kafka” (a 7.04x multiple, not 10x) and “one Event Broker node is 156% faster than Redpanda” (2.56x).

The “Like-for-Like Hardware” Question

“Like-for-like hardware” is the key weasel phrase here. Both Kafka and VAST would typically run on NVMe SSDs in modern deployments, so disk type isn’t the differentiator. The real questions are:

Replication factor: Kafka typically runs with replication factor 3 for durability. What replication factor is VAST using in this comparison? Replication factor 1 is faster than replication factor 3 by design—but it’s not “like-for-like” in terms of durability guarantees.

Acknowledgment semantics: Kafka offers different acknowledgment levels:

  - acks=0: the producer doesn’t wait for any acknowledgment (fastest, weakest durability)
  - acks=1: only the partition leader must acknowledge the write
  - acks=all: every in-sync replica must acknowledge the write (slowest, strongest durability)

Which setting is being compared? If VAST uses acks=1 and Kafka uses acks=all, the comparison isn’t measuring equivalent durability guarantees.
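
For reference, here is a minimal sketch of where these durability knobs live, assuming the kafka-python client and a local broker; it says nothing about how VAST configured either system in its comparison.

```python
# Producer-side durability settings in Kafka (kafka-python client assumed).
from kafka import KafkaProducer

# acks='all' waits for every in-sync replica; acks=1 waits only for the leader;
# acks=0 does not wait at all. Throughput rises as the durability guarantee weakens.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",  # strongest setting; a fair benchmark must state this value on both sides
)
producer.send("events", b"payload")
producer.flush()

# Replication factor is set per topic at creation time, e.g.:
#   kafka-topics.sh --create --topic events --partitions 12 \
#       --replication-factor 3 --bootstrap-server localhost:9092
```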

Network topology: VAST’s architecture may assume data center-class networking (RDMA, low-latency fabrics) while Kafka is often deployed in cloud environments with higher network latency. Are both systems tested on identical network infrastructure?

Cluster configuration: How many nodes, partitions, brokers? These parameters dramatically affect throughput and latency characteristics.

Burst vs. Sustained Performance

“500 million messages per second” across VAST’s largest clusters is impressive if sustained, but meaningless if it’s burst performance measured over milliseconds.

Message brokers typically quote two metrics:

  - Peak (burst) throughput: the maximum rate over a short window, often measured in milliseconds or seconds
  - Sustained throughput: the rate the system can hold for minutes or hours under steady production load

Marketing materials often conflate these. A system might achieve 500M msg/sec for 100 milliseconds but sustain only 50M msg/sec over an hour. Which number is VAST reporting?
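
As a toy illustration with entirely synthetic numbers, the same per-second measurements can honestly be reported as either figure depending on the window:

```python
# Synthetic throughput trace: one 1-second burst, then steady state for the rest of an hour.
per_second_counts = [500_000_000] * 1 + [50_000_000] * 3599

peak_1s   = max(per_second_counts)                           # 500M msg/sec "peak"
sustained = sum(per_second_counts) / len(per_second_counts)  # ~50.1M msg/sec over the hour

print(f"peak:      {peak_1s:,.0f} msg/sec")
print(f"sustained: {sustained:,.0f} msg/sec")
```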

Missing Context

To evaluate VAST’s Event Broker claims, we need:

Message size: 500M messages/second of 100-byte messages is 50 GB/sec. 500M messages/second of 10KB messages is 5 TB/sec. Message size radically changes the performance profile (a quick sanity check follows this list).

Cluster size: How many nodes deliver 500M msg/sec? 10 nodes? 100 nodes? 1,000 nodes? Per-node throughput matters more than total cluster throughput.

Durability guarantees: Is this with synchronous replication? Asynchronous? No replication? Durability trades off directly with throughput.

Latency distribution: What’s the p50, p99, p999 latency? High throughput with high tail latency is useless for many real-time applications.

Workload characteristics: Is this a single producer, single consumer? Multiple producers? What’s the partition count? Consumer group count?
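
A quick sanity check on the 500 million messages per second figure, using hypothetical message sizes and cluster sizes (VAST has disclosed neither), shows why these details matter:

```python
# Sanity check on the headline figure. Message sizes and cluster sizes below
# are hypothetical -- they are not from any published VAST benchmark.
MSGS_PER_SEC = 500_000_000

for msg_bytes in (100, 1_000, 10_000):
    gb_per_sec = MSGS_PER_SEC * msg_bytes / 1e9
    print(f"{msg_bytes:>6}-byte messages -> {gb_per_sec:>6,.0f} GB/sec aggregate bandwidth")

for nodes in (10, 100, 1_000):
    print(f"{nodes:>5} nodes -> {MSGS_PER_SEC / nodes:,.0f} msg/sec per node")
```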

The Benchmark Gap

VAST hasn’t published:

  - The benchmark hardware and cluster configuration
  - The Kafka (and Redpanda) versions and broker/producer settings used
  - The replication factor and acknowledgment settings on either side
  - Message sizes, partition counts, and workload shape
  - Whether the quoted figures are burst or sustained
  - Any reproducible test harness or raw results

Without this, “10x faster than Kafka” is an assertion, not a verified claim. It’s entirely possible VAST’s Event Broker is genuinely faster—but we can’t verify it without the methodology.

The most likely explanation: VAST is comparing their system with strong durability assumptions against Kafka configured for maximum performance with weaker durability guarantees, or vice versa. This is the classic benchmark trick: measure what makes you look good, omit the configuration details that would reveal the unfair comparison.

The Tech Journalism Problem

Chris Mellor at Blocks and Files has enthusiastically covered both of these VAST claims without apparent due diligence. In his December 2025 article “The nature of the VAST Data beast,” he uncritically reports the 99.9991% uptime figure and the Event Broker performance claims as facts rather than unverified vendor assertions.

Basic journalistic questions that should have been asked:

  - How is 99.9991% uptime measured, across how many customers, and over what period?
  - What counts as downtime, and what is excluded?
  - What Kafka configuration (replication factor, acknowledgment settings, message size) was used in the 10x comparison?
  - Is 500 million messages per second a burst figure or sustained?
  - Has any customer or independent third party verified either claim?

Instead, the article reads like a press release with added prose. This isn’t journalism—it’s vendor amplification. When tech media publishes unverified performance claims without skepticism, they become part of the marketing apparatus rather than a check on vendor hyperbole.

This isn’t unique to VAST coverage. We’ve seen the same pattern with Cloudian’s absurd durability claims, Dell ECS’s “eleven nines” transparency theater, and countless unverifiable benchmark claims across the storage industry. Tech journalism that fails to ask basic mathematical questions about vendor claims isn’t serving its readers—it’s serving vendor marketing departments.

The Pattern

Both claims—99.9991% uptime and 10x Kafka performance—follow the same pattern:

  1. Big numbers: Pick metrics that sound impressive
  2. Vague methodology: Don’t explain how you measured it
  3. Missing context: Omit crucial details about configuration, workload, or measurement period
  4. Unverifiable: Don’t publish enough information for independent verification
  5. Uncritical amplification: Rely on tech media to report numbers as facts without verification

This is marketing, not engineering. VAST may genuinely have excellent uptime and a fast event broker. But without transparent methodology and reproducible benchmarks, these claims remain unverifiable marketing assertions designed to generate press coverage.

What Would Credible Claims Look Like?

For uptime:

  - A published definition of downtime, including how degraded performance is treated
  - The measurement methodology, sample size, and observation period (customers and system-years)
  - Explicit exclusions: planned maintenance, customer-caused outages, network issues
  - A confidence interval, ideally backed by third-party or customer-verifiable reporting

For Event Broker performance:

  - Full benchmark configuration: hardware, cluster size, message size, partition count
  - The Kafka and Redpanda versions and settings, including replication factor and acknowledgment level
  - Sustained throughput over a stated duration, not just peak figures
  - Latency distributions (p50, p99, p999) alongside throughput
  - A reproducible test harness so results can be independently verified

Until VAST publishes this information, treat their uptime and performance claims as aspirational marketing rather than verified fact. And if tech journalists won’t ask these basic questions, engineers evaluating storage systems must do the due diligence themselves.

Conclusion

VAST Data’s 99.9991% uptime claim and 10x Kafka performance advantage are extraordinary assertions that require extraordinary evidence. Without transparent methodology, these numbers serve marketing purposes rather than providing meaningful information for technical decision-making.

Storage and infrastructure decisions involve millions of dollars and years of operational commitment. Vendors claiming industry-leading metrics have a responsibility to show their work. Tech journalists covering these claims have a responsibility to ask basic verification questions rather than amplifying vendor press releases.

Until both happen, skepticism is the appropriate response.

For more analysis of unverifiable vendor claims, see our coverage of Cloudian’s 26 nines durability claim, Dell ECS’s transparency theater, and the broader problem of unverifiable benchmarks in storage marketing.
