Erasure Code Calculator
Calculate storage efficiency and overhead, and analyze MDS properties and Zigzag feasibility
Compare erasure coding schemes, understand MDS properties, and see when advanced codes like Zigzag are feasible. The calculator provides real-time analysis of theoretical limits and practical trade-offs.
Understanding MDS Codes
Maximum Distance Separable (MDS) codes achieve the Singleton bound: with m parity shards, an MDS code can recover from ANY combination of m failures. Reed-Solomon is the classic MDS code used in most storage systems.
The key property: d = n - k + 1 = m + 1, where n = k + m is the total number of shards and d is the minimum distance. No code with the same overhead can tolerate more failures.
When MDS matters: If you need guaranteed recovery from any m failures, you need MDS. Locality codes (LRC, LDEC) trade this guarantee for faster single-failure rebuilds.
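As a rough illustration of the figures discussed in this section, the sketch below computes efficiency, overhead, and minimum distance for a k+m MDS code. The function name `mds_summary` and its return keys are hypothetical, chosen for this example only. They are not taken from the calculator.

```python
from fractions import Fraction


def mds_summary(k: int, m: int) -> dict:
    """Basic figures for a k+m MDS code (k data shards, m parity shards)."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    n = k + m
    return {
        "n": n,
        "efficiency": Fraction(k, n),   # usable fraction of raw capacity
        "overhead": Fraction(n, k),     # raw bytes stored per user byte
        "min_distance": n - k + 1,      # Singleton bound, met with equality
        "tolerated_failures": m,        # d - 1 erasures
    }


s = mds_summary(12, 4)
print(s["efficiency"], s["overhead"], s["min_distance"])  # 3/4 4/3 5
```

Exact fractions are used so that 12+4 reports 3/4 rather than a rounded float.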
Zigzag Codes
Zigzag codes (Tamo, Wang, Bruck 2013) are MDS array codes optimized for single-node rebuilding. With r parity nodes they achieve a rebuild ratio of 1/r: repairing one failed node reads only 1/r of the surviving data.
Constraints for optimal rebuilding:
- Code rate k/n must be ≤ 0.5 (at least as many parity nodes as data nodes)
- Subpacketization grows as r^k, limiting practical k values
Example: For 4+4 (k=4, r=4), Zigzag achieves 25% rebuild reads. For 12+4 (k=12, r=4), the rate is 75% > 50%, so optimal rebuilding is not achievable.
For wide stripes like 146+4, Zigzag is practically infeasible. The subpacketization would be approximately 4^146 ≈ 10^88, far more than the roughly 10^80 atoms in the observable universe.
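The checks described in this section can be sketched as follows. This is a minimal illustration that takes the page's stated constraints at face value: a rate limit on k/(k+r) and subpacketization on the order of r^k. A rate of exactly 0.5 is treated as allowed, since the 4+4 example has that rate. The function name `zigzag_check` and its keys are hypothetical.

```python
import math


def zigzag_check(k: int, r: int) -> dict:
    """Apply the two constraints listed above to a k+r layout.

    Assumptions taken from the surrounding text: the rate limit is
    k/(k+r) <= 0.5 and subpacketization scales as r**k.
    """
    if k < 1 or r < 1:
        raise ValueError("need k >= 1 and r >= 1")
    return {
        "rate": k / (k + r),
        "rate_ok": 2 * k <= k + r,  # k/(k+r) <= 0.5, tested in integers
        "rebuild_ratio": 1 / r,
        "log10_subpacketization": k * math.log10(r),  # log10(r**k)
    }


for k, r in [(4, 4), (12, 4), (146, 4)]:
    c = zigzag_check(k, r)
    print(
        f"{k}+{r}: rate={c['rate']:.2f} rate_ok={c['rate_ok']} "
        f"rebuild={c['rebuild_ratio']:.0%} "
        f"subpacketization~10^{c['log10_subpacketization']:.1f}"
    )
```

Working with the base-10 logarithm keeps the output readable for wide stripes. Python integers could hold 4**146 exactly, but the exponent is the figure that matters here.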
Reed-Solomon Trade-offs
Reed-Solomon is always feasible and provides MDS guarantees, but rebuilding any lost shard requires reading k shards, which is k/(k+m) of the stripe. For 12+4, that means reading 12 of the 16 shards (75% of the stripe) for any reconstruction.
At small scale (k ≤ 16), this is acceptable. At wide stripe scale (k > 100), rebuilds become impractical without locality optimization.
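To make the rebuild cost concrete, here is a small sketch that counts the reads needed to reconstruct one lost shard under plain Reed-Solomon. The function name `rs_rebuild_cost` and the 1 MiB shard size are assumptions chosen for illustration.

```python
def rs_rebuild_cost(k: int, m: int, shard_bytes: int) -> dict:
    """Reads needed to rebuild one shard of a k+m Reed-Solomon stripe."""
    if k < 1 or m < 1 or shard_bytes < 0:
        raise ValueError("need k >= 1, m >= 1, shard_bytes >= 0")
    return {
        "shards_read": k,                 # any k surviving shards suffice
        "stripe_fraction": k / (k + m),   # share of the full stripe
        "bytes_read": k * shard_bytes,    # I/O to rebuild shard_bytes of data
    }


MIB = 1024 * 1024  # assumed shard size for the example
for k, m in [(12, 4), (146, 4)]:
    c = rs_rebuild_cost(k, m, MIB)
    print(
        f"{k}+{m}: read {c['shards_read']} shards "
        f"({c['stripe_fraction']:.1%} of stripe, {c['bytes_read'] // MIB} MiB)"
    )
```

The number of shards read grows linearly with k, which is why the same fraction that is tolerable at 12+4 becomes a problem at 146+4.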
Local Reconstruction Codes
LRC (Azure), LDEC (VAST), and similar codes add local parity groups. This enables ~25% rebuild reads for single failures regardless of stripe width.
The trade-off: LRC codes are NOT MDS. They cannot guarantee recovery from all m-failure patterns. Azure’s LRC 12+2+2 explicitly states it “cannot tolerate arbitrary 4 failures.”
For wide stripes, this is often acceptable: the probability of specific failure patterns is low, and the operational benefit of fast rebuilds outweighs theoretical worst-case tolerance.
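The effect of local parity on single-failure rebuilds can be sketched under a simplified model: k data shards split evenly into local groups, each with one local parity, plus some global parities. The names `lrc_single_failure_reads` and `compare_reads` are hypothetical, and the model ignores layout details of any specific product. The resulting fraction depends on how many local groups are chosen.

```python
def lrc_single_failure_reads(k: int, local_groups: int) -> int:
    """Shards read to rebuild one lost data shard inside its local group."""
    if local_groups < 1 or k % local_groups != 0:
        raise ValueError("k must split evenly into local_groups")
    group_size = k // local_groups
    # Read the other (group_size - 1) data shards plus the group's local parity.
    return (group_size - 1) + 1


def compare_reads(k: int, local_groups: int, global_parities: int) -> dict:
    """Contrast full-stripe decoding (k reads) with a local-group repair."""
    n = k + local_groups + global_parities
    lrc_reads = lrc_single_failure_reads(k, local_groups)
    return {
        "n": n,
        "full_decode_reads": k,
        "lrc_reads": lrc_reads,
        "lrc_stripe_fraction": lrc_reads / n,
    }


# A 12+2+2 layout: 12 data shards, 2 local parities, 2 global parities.
print(compare_reads(12, 2, 2))
```

In this model a 12+2+2 layout repairs a single data shard from 6 reads instead of 12. More local groups lower the read count further, at the cost of extra parity shards.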
Practical Recommendations
For k ≤ 16, m ≤ 4: Standard Reed-Solomon. Simple, proven, full MDS guarantees.
For k > 16, rebuild-sensitive workloads: Consider LRC to shorten single-failure rebuilds.
For k > 100 (wide stripes): Locality is essential. Accept the MDS trade-off or accept very slow rebuilds.
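The guidance above could be encoded as a simple lookup. This is only a sketch: the function name `recommend` and the `rebuild_sensitive` flag are hypothetical, and combinations the recommendations do not address return a neutral message rather than a guess.

```python
def recommend(k: int, m: int, rebuild_sensitive: bool = False) -> str:
    """Map (k, m) to the recommendation tiers listed above."""
    if k < 1 or m < 1:
        raise ValueError("need k >= 1 and m >= 1")
    if k > 100:
        # Wide stripes: locality is essential per the guidance.
        return "Locality code; accept the MDS trade-off or very slow rebuilds"
    if k > 16:
        if rebuild_sensitive:
            return "Consider LRC"
        return "No specific guidance; compare Reed-Solomon and LRC"
    if m <= 4:
        return "Standard Reed-Solomon"
    return "No specific guidance for this combination"


for args in [(12, 4), (32, 4, True), (146, 4)]:
    print(args, "->", recommend(*args))
```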
The calculator above shows real-time analysis as you adjust parameters.