Erasure Code Calculator

Calculate storage efficiency and overhead, and analyze MDS properties and Zigzag feasibility.

Inputs: Data Shards (k), Parity Shards (m), Usable Capacity (TB), Cost per TB ($)

Storage Efficiency: 75.00%
Storage Overhead: 33.33%
Failure Tolerance: 4 drives
Rebuild Reads: 75.00%
Raw Capacity Needed: 133.33 TB
Total Hardware Cost: $13,333

Configuration: Reed-Solomon 12+4 MDS

Use Local Reconstruction Codes (LRC/LDEC)

Enables fast rebuilds but trades the MDS property for locality. Not all failure patterns are recoverable.

Compare erasure coding schemes, understand MDS properties, and see when advanced codes like Zigzag are feasible. The calculator provides real-time analysis of theoretical limits and practical trade-offs.
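The figures in the results panel follow from a handful of ratios. The following is a minimal sketch, assuming inputs of 100 TB usable capacity and $100 per TB (values inferred from the displayed results, not stated on the page). The function name `erasure_metrics` and its field names are hypothetical.

```python
def erasure_metrics(k, m, usable_tb, cost_per_tb):
    """Basic storage metrics for a k+m Reed-Solomon (MDS) layout."""
    n = k + m                       # total shards per stripe
    efficiency = k / n              # fraction of raw capacity holding data
    overhead = m / k                # extra raw capacity relative to data
    raw_tb = usable_tb / efficiency
    return {
        "efficiency_pct": 100 * efficiency,
        "overhead_pct": 100 * overhead,
        "failure_tolerance": m,     # an MDS code survives any m failures
        "rebuild_reads_pct": 100 * k / n,
        "raw_tb": raw_tb,
        "total_cost": raw_tb * cost_per_tb,
    }


# Assumed inputs: 12+4 layout, 100 TB usable, $100 per TB.
metrics = erasure_metrics(k=12, m=4, usable_tb=100, cost_per_tb=100)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these assumed inputs the sketch gives 75% efficiency, 33.33% overhead, and about 133.33 TB of raw capacity, in line with the panel.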

Understanding MDS Codes

Maximum Distance Separable (MDS) codes achieve the Singleton bound: with m parity shards, an MDS code can recover from ANY combination of m failures. Reed-Solomon is the classic MDS code used in most storage systems.

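The key property: d = n - k + 1 = m + 1, where n = k + m is the total shard count and d is the minimum distance. A code with minimum distance d can recover from any d - 1 = m erasures, which is the best possible failure tolerance for m parity shards.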

When MDS matters: If you need guaranteed recovery from any m failures, you need MDS. Locality codes (LRC, LDEC) trade this guarantee for faster single-failure rebuilds.
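For a linear code, the MDS guarantee is equivalent to every set of k columns of the k x n generator matrix being linearly independent, so any k surviving shards determine the data. The sketch below checks this by brute force on a toy Reed-Solomon-style (Vandermonde) generator. The field GF(7), the parameters k = 3 and n = 5, and the helper names are assumptions chosen for illustration. Production systems typically use GF(2^8) and optimized libraries.

```python
from itertools import combinations

P = 7  # small prime field GF(7), an illustrative assumption


def det_mod_p(matrix, p=P):
    """Determinant of a square matrix over GF(p) via Gaussian elimination."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] % p), None)
        if pivot is None:
            return 0                      # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                    # row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)     # modular inverse (Python 3.8+)
        for r in range(col + 1, size):
            factor = a[r][col] * inv % p
            for c in range(col, size):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def is_mds(generator, k, n, p=P):
    """True if every k-column submatrix of the k x n generator is invertible."""
    for cols in combinations(range(n), k):
        sub = [[generator[i][j] for j in cols] for i in range(k)]
        if det_mod_p(sub, p) == 0:
            return False
    return True


k, n = 3, 5
points = [1, 2, 3, 4, 5]  # distinct evaluation points in GF(7)
vandermonde = [[pow(x, i, P) for x in points] for i in range(k)]

# Duplicating a column means two shards carry the same information.
broken = [row[:] for row in vandermonde]
for i in range(k):
    broken[i][4] = broken[i][0]

print(is_mds(vandermonde, k, n))
print(is_mds(broken, k, n))
```

The Vandermonde generator passes because its evaluation points are distinct. The variant with a duplicated column fails, since a survivor set containing both copies cannot determine the data.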

Zigzag Codes

Zigzag codes (Tamo, Wang, Bruck 2013) are MDS array codes optimized for single-node rebuilding. With r parity nodes they achieve a rebuild ratio of 1/r: repairing one failed data node reads only 1/r of the surviving data.

Constraints for optimal rebuilding:

Code rate k/n must be ≤ 0.5 (at least as many parity nodes as data nodes)
Subpacketization grows as r^k, limiting practical k values

Example: For 4+4 (k=4, r=4), Zigzag achieves 25% rebuild reads. For 12+4 (k=12, r=4), the rate is 75% > 50%, so optimal rebuilding is not achievable.

For wide stripes like 146+4, Zigzag is practically infeasible. The subpacketization would be approximately 4^146 ≈ 8 × 10^87, more than the estimated number of atoms in the observable universe (roughly 10^80).
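The feasibility rules in this section can be sketched as a small function. It is a sketch of the page's stated constraints, not of the Zigzag construction itself. It treats a rate of exactly 0.5 as allowed, consistent with the 4+4 example, and uses the r^k subpacketization estimate given above. The function and field names are hypothetical.

```python
import math


def zigzag_analysis(k, r):
    """Apply the rate and subpacketization rules described in this section."""
    rate = k / (k + r)
    feasible = rate <= 0.5          # rule as stated on this page
    return {
        "rate": rate,
        "optimal_rebuild_feasible": feasible,
        # Rebuild ratio 1/r is only reported when the rate rule is met.
        "rebuild_ratio": 1 / r if feasible else None,
        # log10 of r^k, so very wide stripes do not need huge integers.
        "log10_subpacketization": k * math.log10(r),
    }


for k, r in [(4, 4), (12, 4), (146, 4)]:
    result = zigzag_analysis(k, r)
    print(f"{k}+{r}: rate={result['rate']:.2f}, "
          f"feasible={result['optimal_rebuild_feasible']}, "
          f"subpacketization ~ 10^{result['log10_subpacketization']:.1f}")
```

Working in log10 keeps the 146+4 case readable: the exponent comes out near 87.9.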

Reed-Solomon Trade-offs

Reed-Solomon is always feasible and provides MDS guarantees, but every reconstruction must read k shards, which is k/(k+m) of the stripe. For 12+4, that means reading 12 of the 15 surviving shards (75% of the full stripe) to rebuild a single lost shard.

At small scale (k ≤ 16), this is acceptable. At wide stripe scale (k > 100), rebuilds become impractical without locality optimization.
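To make the scaling concrete, the sketch below computes how much data a conventional Reed-Solomon rebuild reads for one lost shard. The function name and the 1 TB shard size are illustrative assumptions.

```python
def rs_rebuild_cost(k, m, shard_tb=1.0):
    """Read cost of rebuilding one lost shard with plain Reed-Solomon."""
    n = k + m
    return {
        "shards_read": k,                       # any k survivors suffice
        "fraction_of_stripe": k / n,
        "fraction_of_survivors": k / (n - 1),   # n - 1 shards remain
        "tb_read_per_lost_shard": k * shard_tb,
    }


for k, m in [(12, 4), (146, 4)]:
    cost = rs_rebuild_cost(k, m)
    print(f"{k}+{m}: read {cost['shards_read']} shards "
          f"({cost['fraction_of_stripe']:.1%} of stripe), "
          f"{cost['tb_read_per_lost_shard']:.0f} TB per 1 TB rebuilt")
```

The fraction of the stripe changes modestly between 12+4 and 146+4, but the absolute read volume per rebuilt shard grows linearly with k, which is what makes wide stripes expensive to repair.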

Local Reconstruction Codes

LRC (Azure), LDEC (VAST), and similar codes add local parity groups. This enables ~25% rebuild reads for single failures regardless of stripe width.

The trade-off: LRC codes are NOT MDS. They cannot guarantee recovery from every pattern of m failures, where m counts all local and global parities. Azure’s LRC 12+2+2 is explicitly described as a code that “cannot tolerate arbitrary 4 failures.”

For wide stripes, this is often acceptable: the probability of specific failure patterns is low, and the operational benefit of fast rebuilds outweighs theoretical worst-case tolerance.
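How often a 12+2+2 layout fails on four simultaneous losses can be estimated by enumeration. The sketch below assumes two local groups of 6 data shards plus 1 local parity each, and 2 global parities. It also assumes the code is maximally recoverable, meaning a pattern is decodable whenever a counting argument allows it: each local group can absorb one failure, and the global parities must cover the rest. The layout and function names are hypothetical and do not reproduce any vendor's implementation.

```python
from itertools import combinations

# Assumed layout: group A (6 data + 1 local parity), group B (same),
# and 2 global parities labeled G.
GROUP_OF = ["A"] * 7 + ["B"] * 7 + ["G"] * 2


def recoverable(failed):
    """Counting test for decodability under the maximal-recoverability assumption."""
    per_group = {"A": 0, "B": 0, "G": 0}
    for node in failed:
        per_group[GROUP_OF[node]] += 1
    # Each local group repairs one of its own failures via its local equation.
    leftover = max(per_group["A"] - 1, 0) + max(per_group["B"] - 1, 0)
    # Remaining failures, plus lost global parities, must fit within 2 globals.
    return leftover + per_group["G"] <= 2


def recoverable_count(num_failures):
    """Return (recoverable patterns, total patterns) for a failure count."""
    patterns = list(combinations(range(len(GROUP_OF)), num_failures))
    good = sum(recoverable(p) for p in patterns)
    return good, len(patterns)


for failures in (3, 4):
    good, total = recoverable_count(failures)
    print(f"{failures} failures: {good}/{total} patterns recoverable "
          f"({good / total:.1%})")
```

Under these assumptions every 3-failure pattern is recoverable, while roughly 86% of 4-failure patterns are. The unrecoverable cases are those that concentrate failures in one local group or combine them with lost global parities.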

Practical Recommendations

For k ≤ 16, m ≤ 4: Standard Reed-Solomon. Simple, proven, full MDS guarantees.

For k > 16, rebuild-sensitive: Consider LRC if single-failure rebuild time is critical.

For k > 100 (wide stripes): Locality is essential. Accept the MDS trade-off or accept very slow rebuilds.

The calculator above shows real-time analysis as you adjust parameters.