Our Methodology

Mithril follows a rigorous, reproducible methodology for analyzing storage vendor claims.

We gather claims from:

  • Vendor blog posts and press releases
  • Technical whitepapers and documentation
  • Marketing materials and product pages
  • Conference presentations
  • Customer case studies

We focus on claims we can test:

  • Quantifiable claims: Numbers we can validate
  • Performance metrics: Throughput, latency, IOPS
  • Durability claims: “X nines” of durability/availability
  • Efficiency claims: Overhead, compression, deduplication ratios
  • Cost claims: TCO, price-performance

For erasure coding schemes such as Reed-Solomon, LRC, and LEC:

# Calculate actual overhead
overhead = (parity_shards / data_shards) * 100
# Validate failure tolerance
theoretical_tolerance = parity_shards
practical_tolerance = calculate_with_locality(...)
# Rebuild I/O requirements
rebuild_reads = calculate_rebuild_cost(...)

Key Questions:

  • Does the scheme satisfy information-theoretic bounds?
  • What’s the practical vs theoretical failure tolerance?
  • What’s the actual I/O amplification for rebuilds?
  • How does correlated failure affect protection?
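
One such bound for locally recoverable codes (LRC) is the minimum-distance bound d ≤ n − k − ⌈k/r⌉ + 2 (Gopalan et al.): an (n, k) code with locality r can tolerate at most d − 1 arbitrary shard failures. A minimal sketch of the check, with hypothetical parameters rather than any vendor’s actual layout:

from math import ceil

def max_lrc_distance(n: int, k: int, r: int) -> int:
    # Upper bound on minimum distance for an (n, k) LRC with locality r
    return n - k - ceil(k / r) + 2

n, k, r = 18, 14, 7                   # hypothetical 14+4 layout, local groups of 7
d_max = max_lrc_distance(n, k, r)     # 4, so at most 3 arbitrary failures survivable
print(f"claimed failure tolerance must not exceed {d_max - 1}")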

For throughput and performance claims:

# Validate throughput claims: convert the claimed GB/s to Gbps
total_bandwidth_gbps = claimed_gb_per_s * 8
# Minimum nodes needed to carry that bandwidth over the network
nodes_required = total_bandwidth_gbps / (network_speed_gbps * efficiency)
# Calculate per-node performance in the vendor's tested configuration
per_node_throughput = claimed_gb_per_s / nodes_in_config

Key Questions:

  • How many nodes to achieve claimed throughput?
  • What’s the actual per-node performance?
  • Is this peak or sustained?
  • What’s p50, p99, p999 latency?
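
As a worked sketch with assumed numbers (a hypothetical 360 GB/s claim, 100GbE networking, and an assumed 70% usable fraction of line rate):

claimed_gb_per_s = 360       # hypothetical vendor claim
network_speed_gbps = 100     # 100GbE
efficiency = 0.70            # assumed usable fraction of line rate

total_bandwidth_gbps = claimed_gb_per_s * 8                                # 2880 Gbps
nodes_required = total_bandwidth_gbps / (network_speed_gbps * efficiency)  # ~41 nodes
print(f"nodes required to sustain the claim: {nodes_required:.0f}")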

For durability claims:

# Annual Failure Rate (AFR) calculations
from math import log10

afr = 0.01  # ~1% AFR is typical for HDDs
failures_per_year = num_drives * afr
# Probability of losing more shards than the code tolerates
loss_probability = calculate_exceeding_tolerance(...)
durability_nines = -log10(loss_probability)

Key Questions:

  • Does the “X nines” claim match the math?
  • Are assumptions realistic (AFR, MTTR, correlated failures)?
  • What’s the actual risk over 1 year, 5 years, 10 years?
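
A minimal sketch of how such a claim can be sanity-checked under simplified assumptions: independent drive failures, an assumed AFR and repair time, and a hypothetical 14+4 stripe. Correlated failures make the real number worse.

from math import comb, log10

afr = 0.01             # assumed 1% annual failure rate per drive
mttr_hours = 24        # assumed time to rebuild a failed drive
n, tolerance = 18, 4   # hypothetical stripe: 14 data + 4 parity shards

p_window = afr * mttr_hours / 8760   # per-drive failure probability in one repair window
windows_per_year = 8760 / mttr_hours

# Probability that more than `tolerance` of the n drives fail inside one window
p_loss_window = sum(
    comb(n, k) * p_window**k * (1 - p_window)**(n - k)
    for k in range(tolerance + 1, n + 1)
)
# Small-probability (union-bound) approximation over a year of windows
p_loss_year = min(1.0, p_loss_window * windows_per_year)
print(f"~{-log10(p_loss_year):.1f} nines per stripe under these optimistic assumptions")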

Numbers without context are meaningless. We ask about the hardware and configuration:

  • Cluster size? Number of nodes?
  • Network topology? 10GbE, 25GbE, 100GbE?
  • Drive types? NVMe, SSD, HDD?
  • Cache sizes? RAM, persistent memory?

And about the workload and measurement conditions:

  • Sequential or random?
  • Read or write heavy?
  • Block size?
  • Concurrency level?
  • Synthetic benchmark or real workload?
  • Fresh system or aged data?
  • Peak burst or sustained?
  • Empty system or 80% full?

We compare claims across vendors:

Vendor     Claim      Configuration      Reality
Vendor A   360 GB/s   29 nodes, 100GbE   12.4 GB/s per node
Vendor B   200 GB/s   10 nodes, 25GbE    20 GB/s per node

This reveals:

  • Vendor B has 60% higher per-node efficiency
  • Vendor A’s claim requires 3× more infrastructure
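
The per-node numbers in the table take one line to reproduce; a minimal sketch:

claims = [
    ("Vendor A", 360, 29),   # claimed GB/s, nodes in the tested configuration
    ("Vendor B", 200, 10),
]
for name, gb_per_s, nodes in claims:
    print(f"{name}: {gb_per_s / nodes:.1f} GB/s per node across {nodes} nodes")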

Every design involves trade-offs:

High locality (fast rebuilds):

  • ✅ Low rebuild I/O
  • ❌ Reduced failure tolerance
  • ❌ Complex failure domains

Low locality (standard RS):

  • ✅ Maximum failure tolerance
  • ✅ Simple failure model
  • ❌ High rebuild I/O

We make these trade-offs explicit.
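
A rough sketch of the rebuild-I/O side of that trade-off, under the standard single-shard repair model and assumed sizes: a Reed-Solomon repair reads k surviving shards, while an LRC local repair reads only the r shards in the failed shard’s local group.

shard_size_gib = 64   # assumed shard size
k = 14                # data shards in the stripe (hypothetical)
r = 7                 # LRC locality: shards read for a local repair (hypothetical)

rs_repair_read = k * shard_size_gib    # 896 GiB read to rebuild one lost shard
lrc_repair_read = r * shard_size_gib   # 448 GiB for a local repair
print(f"RS: {rs_repair_read} GiB vs LRC: {lrc_repair_read} GiB per single-shard repair")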

Theory vs Practice:

In theory:

  • Information-theoretic bounds
  • Mathematical guarantees
  • Best-case scenarios

In practice:

  • Correlated failures (firmware bugs, bad batches)
  • Human error (misconfiguration)
  • Operational complexity
  • Vendor lock-in risk
  • Long-term viability

We provide guidance based on:

  1. Scale: Small (<100TB), Medium (100TB-10PB), Large (>10PB)
  2. Workload: Object, block, file, database, AI/ML
  3. Priority: Performance, cost, simplicity, features
  4. Risk tolerance: Conservative vs aggressive

Never: “Use Vendor X.”
Always: “For workload Y at scale Z, consider options A, B, C with trade-offs…”

All analysis is reproducible:

# Clone our tools
git clone https://github.com/yourusername/mithril
# Run the same analysis
python src/mithril/validator/erasure_coding_comprehensive.py
# See the numbers yourself

Our tools are:

  • Open source (MIT license)
  • Well-documented
  • Tested
  • Peer-reviewable

For every analysis:

  • Raw vendor claims saved
  • Processing steps documented
  • Calculations shown
  • Assumptions stated

We make mistakes. When we do:

  1. Acknowledge: Clearly note corrections
  2. Update: Fix the analysis
  3. Explain: What was wrong and why
  4. Archive: Keep history visible

Submit corrections via:

  • GitHub Pull Requests (with math/evidence)
  • GitHub Issues (for discussion)
  • Email (for private disclosure)

We do not accept:

  • ❌ Vendor sponsorships
  • ❌ Paid reviews
  • ❌ Affiliate commissions
  • ❌ Speaking fees from vendors we analyze

What keeps us independent:

  • No vendor relationships
  • No financial incentives
  • Open source tools
  • Community-driven corrections

See our first analysis

Applied Methodology:

  1. ✅ Collected claims from gist document
  2. ✅ Validated 146+4 erasure coding mathematically
  3. ✅ Identified LRC bound violation
  4. ✅ Compared with MinIO AIStor, Ceph, AWS, Azure
  5. ✅ Explained trade-offs (overhead vs protection)
  6. ✅ Assessed real-world risks (correlated failures)
  7. ✅ Provided vendor-neutral recommendations
  8. ✅ Published reproducible code
  9. ✅ Open to corrections

Result: The practical failure tolerance is 2, not the claimed 4.


Think our methodology has flaws? Let us know.

We improve by being challenged.


The goal: Make storage vendor analysis as reproducible as scientific research.

That’s how we stay honest.