Erasure Coding

A data protection method that splits, encodes, and stores files across many nodes with redundancy so the original can be reconstructed.

Erasure coding is an error-correction and data-protection technique used in distributed systems to keep data available even when some pieces are lost or nodes go offline. Instead of storing one full copy of a file in a single place, erasure coding breaks the data into fragments, expands it with mathematically derived redundant fragments (often called parity), and distributes those pieces across multiple locations.

How erasure coding works

At a high level, a dataset is split into “data shards,” then an algorithm computes additional “parity shards.” A common way to describe this is an (k, n) scheme, where k shards hold original data and n−k shards hold parity. The system can reconstruct the original dataset from any k shards out of n, meaning it can tolerate the loss of up to n−k shards without losing the file. Because each individual shard is only a piece of the whole, a single shard is typically not useful on its own, while the full data remains recoverable when enough shards are available.

Why blockchains and decentralized storage use it

In crypto and Web3, erasure coding helps distributed storage networks and data-availability layers balance resilience with efficiency. Replicating full copies across many nodes is simple but expensive. Erasure coding reduces storage overhead while still providing strong guarantees that data remains retrievable.

For example, a decentralized storage protocol might distribute coded shards of a user’s file across many independent operators. Even if several operators go offline or delete their shards, the network can still recover the file from the remaining shards. Similarly, some blockchain architectures use related techniques to improve data availability, ensuring participants can obtain the data needed to verify blocks.

Erasure coding matters because it strengthens integrity and availability in decentralized systems, helping networks stay reliable without requiring excessive duplication of data.