MUSE: Multi-Use Error Correcting Codes

Evgeny Manzhosov, Adam Hastings, Meghna Pancholi, Ryan Piersma, Mohamed Tarek Ibn Ziad, Simha Sethumadhavan

In this work we present a new set of error correcting codes -- Multi-Use Error Correcting Codes (MUSE ECC) -- that match the reliability guarantees of all commodity, conventional state-of-the-art ECC with fewer bits of storage. MUSE ECC derives its power by building on arithmetic coding methods (first used in an experimental system in the 1960s). We show that our MUSE construction can be used as a "drop in" replacement within error correction frameworks used widely today. Further, we show how MUSE is a promising fit for emerging technologies such as DDR5 memories. Concretely, all instantiations of MUSE we show in this paper offer 100% Single Error Correction, and multi-bit error detection between 70% and 95%, while using fewer check bits. MUSE ECC corrects the failure of a single chip on a DIMM with check bit space savings of 12.5% compared to conventional techniques. The performance overheads, if any, are negligible. Our results open the possibility of reusing ECC storage for purposes beyond reliability without compromising reliability, thus solving a 40-year-old puzzle.
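
To give a feel for what an arithmetic code looks like, the sketch below implements a classic textbook AN code, where a data word is encoded by multiplying it by a constant and errors are located from the remainder of the received value. This is only an illustration of the general family, not the MUSE construction itself: the multiplier `A = 19`, the 9-bit codeword width, and all function names are assumptions chosen for the example.

```python
# Illustrative AN code (a classic arithmetic code), NOT the MUSE construction.
# A and N_BITS are assumed values picked so single-bit errors are correctable.
A = 19       # code multiplier (assumption for this example)
N_BITS = 9   # codeword width in bits (assumption for this example)


def build_syndrome_table(a=A, n_bits=N_BITS):
    """Map each remainder to the single-bit error (+/- 2**i) that causes it."""
    table = {}
    for i in range(n_bits):
        for sign in (+1, -1):
            error = sign * (1 << i)
            syndrome = error % a
            # Correction needs every single-bit error to leave a unique,
            # nonzero remainder.
            if syndrome == 0 or syndrome in table:
                raise ValueError("multiplier cannot correct all single-bit errors")
            table[syndrome] = error
    return table


SYNDROMES = build_syndrome_table()


def encode(data):
    """Encode by multiplication; valid codewords are exact multiples of A."""
    codeword = A * data
    if data < 0 or codeword >= (1 << N_BITS):
        raise ValueError("data does not fit in the codeword width")
    return codeword


def decode(received):
    """Correct at most one flipped bit, then recover the data by division."""
    syndrome = received % A
    if syndrome != 0:
        # A flipped bit adds or subtracts a power of two; undo that change.
        received -= SYNDROMES[syndrome]
    return received // A
```

For example, `decode(encode(13) ^ (1 << 4))` returns 13 again: flipping bit 4 changes the codeword by a power of two, and the remainder modulo `A` identifies which one. In this toy code a multi-bit error can still be miscorrected, which is why detection rates for such errors are reported separately from single error correction.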
