Our Methodology
How We Analyze Storage Claims
Mithril follows a rigorous, reproducible methodology for analyzing storage vendor claims.
1. Claim Collection
Sources
Section titled “Sources”- Vendor blog posts and press releases
- Technical whitepapers and documentation
- Marketing materials and product pages
- Conference presentations
- Customer case studies
What We Look For
- Quantifiable claims: Numbers we can validate
- Performance metrics: Throughput, latency, IOPS
- Durability claims: “X nines” of durability/availability
- Efficiency claims: Overhead, compression, deduplication ratios
- Cost claims: TCO, price-performance
2. Mathematical Validation
Erasure Coding Analysis
For schemes like Reed-Solomon, LRC, LEC:
```python
# Calculate actual overhead
overhead = (parity_shards / data_shards) * 100

# Validate failure tolerance
theoretical_tolerance = parity_shards
practical_tolerance = calculate_with_locality(...)

# Rebuild I/O requirements
rebuild_reads = calculate_rebuild_cost(...)
```

Key Questions:
- Does the scheme satisfy information-theoretic bounds?
- What’s the practical vs theoretical failure tolerance?
- What’s the actual I/O amplification for rebuilds?
- How does correlated failure affect protection?
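To make the pseudocode above concrete, here is a minimal runnable sketch of the same three checks for a plain k+m Reed-Solomon layout. The 10+4 parameters are illustrative only, not taken from any vendor.

```python
def ec_overhead_pct(data_shards: int, parity_shards: int) -> float:
    """Storage overhead of a k+m erasure code, as a percentage of user data."""
    return parity_shards / data_shards * 100

def rs_failure_tolerance(parity_shards: int) -> int:
    """An MDS code such as Reed-Solomon survives the loss of any m shards."""
    return parity_shards

def rs_rebuild_reads(data_shards: int) -> int:
    """Rebuilding one lost shard with plain Reed-Solomon reads k surviving shards."""
    return data_shards

k, m = 10, 4  # illustrative 10+4 layout
print(f"overhead: {ec_overhead_pct(k, m):.1f}%")               # 40.0%
print(f"failure tolerance: {rs_failure_tolerance(m)} shards")  # 4
print(f"rebuild reads per lost shard: {rs_rebuild_reads(k)}")  # 10
```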
Performance Claims
```python
# Validate throughput claims
total_bandwidth_gbps = claimed_throughput_gb_per_s * 8  # convert GB/s to Gbps
nodes_required = total_bandwidth_gbps / (network_speed_gbps * efficiency)

# Calculate per-node performance
per_node_throughput = total_bandwidth_gbps / nodes_required
```

Key Questions:
- How many nodes to achieve claimed throughput?
- What’s the actual per-node performance?
- Is this peak or sustained?
- What’s p50, p99, p999 latency?
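A runnable version of the throughput check, with made-up inputs (a hypothetical 100 GB/s claim over 100GbE at 70% usable network efficiency) standing in for any real vendor figures:

```python
def nodes_required(claimed_gb_per_s: float, nic_gbps: float, efficiency: float) -> float:
    """Nodes needed to carry a claimed aggregate throughput over a given NIC."""
    total_gbps = claimed_gb_per_s * 8             # GB/s -> Gbps
    usable_gbps_per_node = nic_gbps * efficiency
    return total_gbps / usable_gbps_per_node

claimed = 100.0                                   # GB/s, hypothetical claim
nodes = nodes_required(claimed, nic_gbps=100, efficiency=0.7)
per_node = claimed / nodes                        # GB/s per node
print(f"nodes required: {nodes:.1f}")             # ~11.4
print(f"per-node throughput: {per_node:.2f} GB/s")  # ~8.75
```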
Durability Mathematics
```python
# Annual Failure Rate (AFR) calculations
afr = 0.01  # 1% typical for HDDs
failures_per_year = num_drives * afr

# Probability of data loss
loss_probability = calculate_exceeding_tolerance(...)
durability_nines = -log10(loss_probability)
```

Key Questions:
- Does “X nines” claim match the math?
- Are assumptions realistic (AFR, MTTR, correlated failures)?
- What’s the actual risk over 1 year, 5 years, 10 years?
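A minimal sketch of the durability arithmetic, under strong simplifying assumptions (independent drive failures, fixed AFR, fixed rebuild window, a single stripe). Real systems violate all of these, which is exactly why we check the assumptions behind an "X nines" claim.

```python
from math import comb, log10

def stripe_loss_probability(n: int, tolerance: int, afr: float, mttr_hours: float) -> float:
    """Probability that more than `tolerance` of a stripe's n drives fail within
    one rebuild window, assuming independent failures."""
    p = afr * mttr_hours / (365 * 24)  # per-drive failure probability during one rebuild
    return sum(comb(n, f) * p**f * (1 - p)**(n - f) for f in range(tolerance + 1, n + 1))

# Illustrative only: a 14-drive stripe that tolerates 2 failures,
# 1% AFR, 24-hour rebuild window.
p_loss = stripe_loss_probability(n=14, tolerance=2, afr=0.01, mttr_hours=24)
print(f"per-stripe loss probability per rebuild window: {p_loss:.2e}")
print(f"'nines' for that single window: {-log10(p_loss):.1f}")
# Annualizing requires scaling by rebuild windows per year and by stripe count,
# which reduces the headline number further.
```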
3. Context Analysis
Numbers without context are meaningless. We ask:
Configuration Details
- Cluster size? Number of nodes?
- Network topology? 10GbE, 25GbE, 100GbE?
- Drive types? NVMe, SSD, HDD?
- Cache sizes? RAM, persistent memory?
Workload Characteristics
- Sequential or random?
- Read or write heavy?
- Block size?
- Concurrency level?
Test Conditions
- Synthetic benchmark or real workload?
- Fresh system or aged data?
- Peak burst or sustained?
- Empty system or 80% full?
4. Comparative Analysis
We compare claims across vendors:
| Vendor | Claim | Configuration | Reality (per node) |
|---|---|---|---|
| Vendor A | 360 GB/s | 29 nodes, 100GbE | 12.4 GB/s per node |
| Vendor B | 200 GB/s | 10 nodes, 25GbE | 20 GB/s per node |
This reveals:
- Vendor B delivers roughly 60% higher per-node throughput
- Vendor A’s claim requires nearly 3× as many nodes
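The per-node figures in the table are simple arithmetic on the published claims; a sketch of the computation:

```python
claims = {
    "Vendor A": {"claimed_gb_per_s": 360, "nodes": 29},
    "Vendor B": {"claimed_gb_per_s": 200, "nodes": 10},
}

for vendor, c in claims.items():
    per_node = c["claimed_gb_per_s"] / c["nodes"]
    print(f"{vendor}: {per_node:.1f} GB/s per node")
# Vendor A: 12.4 GB/s per node
# Vendor B: 20.0 GB/s per node
```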
5. Trade-off Identification
Every design involves trade-offs:
Example: Locality vs Protection
High locality (fast rebuilds):
- ✅ Low rebuild I/O
- ❌ Reduced failure tolerance
- ❌ Complex failure domains
Low locality (standard RS):
- ✅ Maximum failure tolerance
- ✅ Simple failure model
- ❌ High rebuild I/O
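To put rough numbers on the rebuild-I/O side of this trade-off, here is a toy comparison assuming a hypothetical 12+4 Reed-Solomon stripe versus an LRC-style layout that splits the same 12 data shards into two local groups of 6, each with its own local parity:

```python
# Plain Reed-Solomon 12+4: repairing one lost shard reads a full set of
# 12 surviving shards.
rs_data_shards = 12
rs_rebuild_reads = rs_data_shards            # 12 reads

# LRC-style layout: two local groups of 6 data shards + 1 local parity each.
# Repairing one lost data shard reads the 5 surviving data shards in its
# group plus the local parity.
lrc_group_data = 6
lrc_rebuild_reads = (lrc_group_data - 1) + 1  # 6 reads

print(f"RS rebuild reads:  {rs_rebuild_reads}")
print(f"LRC rebuild reads: {lrc_rebuild_reads}")
# The 2x saving in rebuild I/O is paid for with extra parity shards and a
# failure model in which some multi-shard failure patterns are not locally
# repairable.
```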
We make these trade-offs explicit.
6. Real-World Risk Assessment
Theory vs Practice:
Theoretical Analysis
- Information-theoretic bounds
- Mathematical guarantees
- Best-case scenarios
Practical Analysis
- Correlated failures (firmware bugs, bad batches)
- Human error (misconfiguration)
- Operational complexity
- Vendor lock-in risk
- Long-term viability
7. Vendor-Neutral Recommendations
We provide guidance based on:
- Scale: Small (<100TB), Medium (100TB-10PB), Large (>10PB)
- Workload: Object, block, file, database, AI/ML
- Priority: Performance, cost, simplicity, features
- Risk tolerance: Conservative vs aggressive
Never: “Use Vendor X.”
Always: “For workload Y at scale Z, consider options A, B, C with trade-offs…”
8. Reproducibility
All analysis is reproducible:
```bash
# Clone our tools
git clone https://github.com/yourusername/mithril

# Run the same analysis
python src/mithril/validator/erasure_coding_comprehensive.py

# See the numbers yourself
```

Our Code
- Open source (MIT license)
- Well-documented
- Tested
- Peer-reviewable
Our Data
- Raw vendor claims saved
- Processing steps documented
- Calculations shown
- Assumptions stated
9. Correction Process
We make mistakes. When we do:
- Acknowledge: Clearly note corrections
- Update: Fix the analysis
- Explain: What was wrong and why
- Archive: Keep history visible
Submit corrections via:
- GitHub Pull Requests (with math/evidence)
- GitHub Issues (for discussion)
- Email (for private disclosure)
10. Disclosure
What We Don’t Accept
Section titled “What We Don’t Accept”- ❌ Vendor sponsorships
- ❌ Paid reviews
- ❌ Affiliate commissions
- ❌ Speaking fees from vendors we analyze
How We Stay Independent
- No vendor relationships
- No financial incentives
- Open source tools
- Community-driven corrections
Example: VAST Data Analysis
Applied Methodology:
- ✅ Collected claims from the published gist document
- ✅ Validated 146+4 erasure coding mathematically
- ✅ Identified LRC bound violation
- ✅ Compared with MinIO AIStor, Ceph, AWS, Azure
- ✅ Explained trade-offs (overhead vs protection)
- ✅ Assessed real-world risks (correlated failures)
- ✅ Provided vendor-neutral recommendations
- ✅ Published reproducible code
- ✅ Open to corrections
Result: We found the practical failure tolerance is 2, not the claimed 4.
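The LRC bound check mentioned above can be made mechanical. Here is a sketch using the Gopalan et al. locality bound, d ≤ n - k - ⌈k/r⌉ + 2; the example evaluates it against Azure's published LRC(12, 2, 2) layout as a sanity check, since applying it to any other product requires knowing that product's actual locality structure, which is an assumption rather than a given.

```python
from math import ceil

def lrc_tolerance_upper_bound(n: int, k: int, r: int) -> int:
    """Upper bound on arbitrary-failure tolerance (d - 1) for an (n, k) code
    in which every data symbol has locality r: d <= n - k - ceil(k/r) + 2."""
    d_max = n - k - ceil(k / r) + 2
    return d_max - 1

# Azure LRC(12, 2, 2): 12 data + 2 local + 2 global parities, locality 6.
print(lrc_tolerance_upper_bound(n=16, k=12, r=6))  # 3 -- matches "any 3 failures"
```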
Questions?
Think our methodology has flaws? Let us know:
- GitHub: Open an issue
- Email: methodology@mithril.blog
We improve by being challenged.
The goal: Make storage vendor analysis as reproducible as scientific research.
That’s how we stay honest.