Mitigating the Effects of Large Multiple Cell Upsets (MCUs) in Memories
- Authors
- Antonio Maestro, Juan; Reviriego, Pedro; Baeg, Sanghyeon; Wen, Shijie; Wong, Richard
- Issue Date
- Oct-2011
- Publisher
- ASSOC COMPUTING MACHINERY
- Keywords
- Fault-tolerant memory; Error-correcting codes; high-level protection technique; protection against radiation
- Citation
- ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, v.16, no.4, pp.1 - 10
- Indexed
- SCIE; SCOPUS
- Journal Title
- ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS
- Volume
- 16
- Number
- 4
- Start Page
- 1
- End Page
- 10
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/39224
- DOI
- 10.1145/2003695.2003705
- ISSN
- 1084-4309
- Abstract
- Reliability is a critical issue for memories. Radiation particles that hit the device can cause errors in some cells, which can lead to data corruption. To avoid this problem, memories are protected with per-word error correction codes (ECCs). Typically, single-error correction and double-error detection (SEC-DED) codes are used. As technology scales, errors caused by radiation particles on memories tend to affect more than one cell, which is known as a multiple cell upset (MCU). To ensure that only a single cell is affected in each word, interleaving is used. With interleaving, cells that belong to the same word are placed at a sufficient distance such that an MCU will only affect a single cell on each word. The use of interleaving significantly increases the cost of the device. Also, determining the interleaving distance (ID) required to avoid MCUs causing double errors is not trivial. Typically, accelerated radiation experiments with a limited number of particle hits are used. They provide a lower bound on the required ID, but larger MCUs may occur with a low probability. Even if the percentage of such large MCUs is very low, the impact on reliability can be significant. This article presents a technique to mitigate the effects of large MCUs, that is, those that exceed the ID, on memory reliability. The proposed approach is able to correct most double errors caused by large MCUs by exploiting the locality of the errors within an MCU.
- Appears in Collections
- COLLEGE OF ENGINEERING SCIENCES > SCHOOL OF ELECTRICAL ENGINEERING > 1. Journal Articles
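
The interleaving argument in the abstract can be illustrated with a minimal sketch. It is not the technique proposed in the article. It assumes a simplified one-dimensional model in which an MCU flips a contiguous run of cells in one memory row, and physical column `c` holds a bit of word `c % interleaving_distance`. The function names and the layout are hypothetical, chosen only for illustration.

```python
from collections import Counter


def errors_per_word(start: int, width: int, interleaving_distance: int) -> Counter:
    """Count flipped cells per logical word for one contiguous MCU in a row.

    Assumed layout (illustrative only): physical column c stores a bit of
    word (c % interleaving_distance), so two cells of the same word are
    interleaving_distance columns apart.
    """
    hits = Counter()
    for column in range(start, start + width):
        hits[column % interleaving_distance] += 1
    return hits


def worst_case_errors(width: int, interleaving_distance: int) -> int:
    """Largest number of errors any single word receives from an MCU of this width."""
    # Starting at column 0 puts the first hit on word 0, which then receives
    # ceil(width / interleaving_distance) hits, the maximum for any start.
    return max(errors_per_word(0, width, interleaving_distance).values())


if __name__ == "__main__":
    interleaving_distance = 4
    for width in (3, 4, 5, 9):
        worst = worst_case_errors(width, interleaving_distance)
        if worst <= 1:
            status = "correctable by SEC-DED"
        elif worst == 2:
            status = "detected but not corrected by SEC-DED"
        else:
            status = "beyond SEC-DED guarantees"
        print(f"MCU width {width}: up to {worst} error(s) per word, {status}")
```

Under this model, an MCU no wider than the ID leaves at most one error in each word, which a SEC-DED code corrects. An MCU one cell wider than the ID already places two errors in one word, which is the case the article targets. Real MCUs are two-dimensional and not necessarily contiguous, so the sketch only shows why the ID acts as a threshold.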