Our Methodology

Mithril follows a rigorous, reproducible methodology for analyzing storage vendor claims.

We gather claims from:

  • Vendor blog posts and press releases
  • Technical whitepapers and documentation
  • Marketing materials and product pages
  • Conference presentations
  • Customer case studies

We focus on claims we can test:

  • Quantifiable claims: Numbers we can validate
  • Performance metrics: Throughput, latency, IOPS
  • Durability claims: “X nines” of durability/availability
  • Efficiency claims: Overhead, compression, deduplication ratios
  • Cost claims: TCO, price-performance

For erasure coding schemes such as Reed-Solomon, LRC, and LEC:

# Calculate actual overhead
overhead = (parity_shards / data_shards) * 100
# Validate failure tolerance
theoretical_tolerance = parity_shards
practical_tolerance = calculate_with_locality(...)
# Rebuild I/O requirements
rebuild_reads = calculate_rebuild_cost(...)

Key Questions:

  • Does the scheme satisfy information-theoretic bounds?
  • What’s the practical vs theoretical failure tolerance?
  • What’s the actual I/O amplification for rebuilds?
  • How does correlated failure affect protection?
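
One such bound for locally recoverable codes (LRC) is the minimum-distance bound d ≤ n − k − ⌈k/r⌉ + 2 (Gopalan et al.): an (n, k) code with locality r can tolerate at most d − 1 arbitrary shard failures. A minimal sketch of the check, with hypothetical parameters rather than any vendor’s actual layout:

from math import ceil

def max_lrc_distance(n: int, k: int, r: int) -> int:
    # Upper bound on minimum distance for an (n, k) LRC with locality r
    return n - k - ceil(k / r) + 2

n, k, r = 18, 14, 7                   # hypothetical 14+4 layout, local groups of 7
d_max = max_lrc_distance(n, k, r)     # 4, so at most 3 arbitrary failures survivable
print(f"claimed failure tolerance must not exceed {d_max - 1}")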

For throughput and performance claims:

# Validate throughput claims: convert the claimed GB/s to Gbps
total_bandwidth_gbps = claimed_gb_per_s * 8
# Minimum nodes needed to carry that bandwidth over the network
nodes_required = total_bandwidth_gbps / (network_speed_gbps * efficiency)
# Calculate per-node performance in the vendor's tested configuration
per_node_throughput = claimed_gb_per_s / nodes_in_config

Key Questions:

  • How many nodes to achieve claimed throughput?
  • What’s the actual per-node performance?
  • Is this peak or sustained?
  • What’s p50, p99, p999 latency?
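
As a worked sketch with assumed numbers (a hypothetical 360 GB/s claim, 100GbE networking, and an assumed 70% usable fraction of line rate):

claimed_gb_per_s = 360       # hypothetical vendor claim
network_speed_gbps = 100     # 100GbE
efficiency = 0.70            # assumed usable fraction of line rate

total_bandwidth_gbps = claimed_gb_per_s * 8                                # 2880 Gbps
nodes_required = total_bandwidth_gbps / (network_speed_gbps * efficiency)  # ~41 nodes
print(f"nodes required to sustain the claim: {nodes_required:.0f}")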

For durability claims:

# Annual Failure Rate (AFR) calculations
from math import log10

afr = 0.01  # ~1% AFR is typical for HDDs
failures_per_year = num_drives * afr
# Probability of losing more shards than the code tolerates
loss_probability = calculate_exceeding_tolerance(...)
durability_nines = -log10(loss_probability)

Key Questions:

  • Does the “X nines” claim match the math?
  • Are assumptions realistic (AFR, MTTR, correlated failures)?
  • What’s the actual risk over 1 year, 5 years, 10 years?
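
A minimal sketch of how such a claim can be sanity-checked under simplified assumptions: independent drive failures, an assumed AFR and repair time, and a hypothetical 14+4 stripe. Correlated failures make the real number worse.

from math import comb, log10

afr = 0.01             # assumed 1% annual failure rate per drive
mttr_hours = 24        # assumed time to rebuild a failed drive
n, tolerance = 18, 4   # hypothetical stripe: 14 data + 4 parity shards

p_window = afr * mttr_hours / 8760   # per-drive failure probability in one repair window
windows_per_year = 8760 / mttr_hours

# Probability that more than `tolerance` of the n drives fail inside one window
p_loss_window = sum(
    comb(n, k) * p_window**k * (1 - p_window)**(n - k)
    for k in range(tolerance + 1, n + 1)
)
# Small-probability (union-bound) approximation over a year of windows
p_loss_year = min(1.0, p_loss_window * windows_per_year)
print(f"~{-log10(p_loss_year):.1f} nines per stripe under these optimistic assumptions")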

Numbers without context are meaningless. We ask about the hardware and configuration:

  • Cluster size? Number of nodes?
  • Network topology? 10GbE, 25GbE, 100GbE?
  • Drive types? NVMe, SSD, HDD?
  • Cache sizes? RAM, persistent memory?

And about the workload and measurement conditions:

  • Sequential or random?
  • Read or write heavy?
  • Block size?
  • Concurrency level?
  • Synthetic benchmark or real workload?
  • Fresh system or aged data?
  • Peak burst or sustained?
  • Empty system or 80% full?

We compare claims across vendors:

Vendor     Claim      Configuration      Reality
Vendor A   360 GB/s   29 nodes, 100GbE   12.4 GB/s per node
Vendor B   200 GB/s   10 nodes, 25GbE    20 GB/s per node

This reveals:

  • Vendor B has 60% higher per-node efficiency
  • Vendor A’s claim requires 3× more infrastructure
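
The per-node numbers in the table take one line to reproduce; a minimal sketch:

claims = [
    ("Vendor A", 360, 29),   # claimed GB/s, nodes in the tested configuration
    ("Vendor B", 200, 10),
]
for name, gb_per_s, nodes in claims:
    print(f"{name}: {gb_per_s / nodes:.1f} GB/s per node across {nodes} nodes")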

Every design involves trade-offs:

High locality (fast rebuilds):

  • ✅ Low rebuild I/O
  • ❌ Reduced failure tolerance
  • ❌ Complex failure domains

Low locality (standard RS):

  • ✅ Maximum failure tolerance
  • ✅ Simple failure model
  • ❌ High rebuild I/O

We make these trade-offs explicit.
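
A rough sketch of the rebuild-I/O side of that trade-off, under the standard single-shard repair model and assumed sizes: a Reed-Solomon repair reads k surviving shards, while an LRC local repair reads only the r shards in the failed shard’s local group.

shard_size_gib = 64   # assumed shard size
k = 14                # data shards in the stripe (hypothetical)
r = 7                 # LRC locality: shards read for a local repair (hypothetical)

rs_repair_read = k * shard_size_gib    # 896 GiB read to rebuild one lost shard
lrc_repair_read = r * shard_size_gib   # 448 GiB for a local repair
print(f"RS: {rs_repair_read} GiB vs LRC: {lrc_repair_read} GiB per single-shard repair")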

Theory vs Practice:

In theory:

  • Information-theoretic bounds
  • Mathematical guarantees
  • Best-case scenarios

In practice:

  • Correlated failures (firmware bugs, bad batches)
  • Human error (misconfiguration)
  • Operational complexity
  • Vendor lock-in risk
  • Long-term viability

We provide guidance based on:

  1. Scale: Small (<100TB), Medium (100TB-10PB), Large (>10PB)
  2. Workload: Object, block, file, database, AI/ML
  3. Priority: Performance, cost, simplicity, features
  4. Risk tolerance: Conservative vs aggressive

Never: “Use Vendor X.”
Always: “For workload Y at scale Z, consider options A, B, C with trade-offs…”

All analysis is reproducible:

# Clone our tools
git clone https://github.com/yourusername/mithril
# Run the same analysis
python src/mithril/validator/erasure_coding_comprehensive.py
# See the numbers yourself

Our tools are:

  • Open source (MIT license)
  • Well-documented
  • Tested
  • Peer-reviewable

For every analysis:

  • Raw vendor claims saved
  • Processing steps documented
  • Calculations shown
  • Assumptions stated

We make mistakes. When we do:

  1. Acknowledge: Clearly note corrections
  2. Update: Fix the analysis
  3. Explain: What was wrong and why
  4. Archive: Keep history visible

Submit corrections via:

  • GitHub Pull Requests (with math/evidence)
  • GitHub Issues (for discussion)
  • Email (for private disclosure)

We do not accept:

  • ❌ Vendor sponsorships
  • ❌ Paid reviews
  • ❌ Affiliate commissions
  • ❌ Speaking fees from vendors we analyze

What keeps us independent:

  • No vendor relationships
  • No financial incentives
  • Open source tools
  • Community-driven corrections

See our first analysis

Applied Methodology:

  1. ✅ Collected claims from gist document
  2. ✅ Validated 146+4 erasure coding mathematically
  3. ✅ Identified LRC bound violation
  4. ✅ Compared with MinIO AIStor, Ceph, AWS, Azure
  5. ✅ Explained trade-offs (overhead vs protection)
  6. ✅ Assessed real-world risks (correlated failures)
  7. ✅ Provided vendor-neutral recommendations
  8. ✅ Published reproducible code
  9. ✅ Open to corrections

Result: The practical failure tolerance is 2, not the claimed 4.


Think our methodology has flaws? Let us know.

We improve by being challenged.


The goal: Make storage vendor analysis as reproducible as scientific research.

That’s how we stay honest.