Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Abstract— Currently, faults suffered by SRAM memory Traditionally, error correction codes (ECCs) have been used
systems have increased due to the aggressive CMOS integra- to protect memory systems. Common ECCs employed to
tion density. Thus, the probability of occurrence of single-cell protect standard memories are single-error correction (SEC)
upsets (SCUs) or multiple-cell upsets (MCUs) augments. One
of the main causes of MCUs in space applications is cosmic or single-error-correction–double-error-detection (SEC–DED)
radiation. A common solution is the use of error correction codes [11]–[13]. SEC codes are able to correct an error in one
codes (ECCs). Nevertheless, when using ECCs in space applica- single memory cell. SEC–DED codes can correct an error in
tions, they must achieve a good balance between error coverage one single memory cell, as well as they can detect two errors
and redundancy, and their encoding/decoding circuits must be in two independent cells.
efficient in terms of area, power, and delay. Different codes have
been proposed to tolerate MCUs. For instance, Matrix codes use In critical applications, such as space applications, more
Hamming codes and parity checks in a bi-dimensional layout to complex and sophisticated codes are used [14]–[17], [19]–[21].
correct and detect some patterns of MCUs. Recently presented, For instance, Matrix code [17] is a well-known code that
column–line–code (CLC) has been designed to tolerate MCUs combines Hamming codes with parity check in a matrix,
in space applications. CLC is a modified Matrix code, based allowing the correction of two bits in error. Recently presented,
on extended Hamming codes and parity checks. Nevertheless,
a common property of these codes is the high redundancy column–line–code (CLC) [19] follows a similar approach, that
introduced. In this paper, we present a series of new low- is, it uses extended Hamming codes and parity bits to correct
redundant ECCs able to correct MCUs with reduced area, power, up to two adjacent bits in error.
and delay overheads. Also, these new codes maintain, or even The main problem when memory systems employ an ECC
improve, memory error coverage with respect to Matrix and is the redundancy required. The extra bits added are used to
CLC codes.
detect and/or correct the possible errors occurred. Also, redun-
Index Terms— Error correction codes (ECCs), fault tolerance, dant bits must be added for each data word stored in memory.
multiple-cell upsets (MCUs), reliability. In this way, the amount of storage occupied for redundant bit
scales with the memory capacity. For example, if an ECC with
I. I NTRODUCTION
100% of redundancy is employed in a 2-GB memory, only 1
TABLE II
S YNDROME B ITS FOR THE H AMMING (7, 4) C ODE
Fig. 2. Layout of a 16-bit data word for the (32, 16) Matrix code [18].
Fig. 3. Layout of a 16-bit data word for the (40, 16) CLC [19].
H · eiT = H · eTj ; ∀ei ∈ E , e j ∈ E + . (5) code implemented in this paper presents better correction and
However, several detectable errors may have the same detection performance than an extended Hamming code, as it
syndrome. is able to correct all single errors and to correct or to detect
To find the matrix, a recursive backtracking algorithm is all 2-bit burst errors. Nevertheless, this Matrix code presents
used. It checks partial matrices and adds a new column only a higher redundancy than an extended Hamming code. In this
if the previous matrix satisfies the requirements. In this way, way, memory required for code bits is increased, and also,
the algorithm starts with an empty partial_H matrix. New area, energy, and delay overheads.
columns, with n–k rows, are added, and the new partial Recently presented, CLC code [19], [30] is another matrix
matrices are checked recursively. The added columns must be code proposed to be used in space applications. The layout we
nonzero, so there are 2n−k –1 combinations for each column. have employed for the implementation of this code is shown
The complete execution of the algorithm is commonly in Fig. 3 (extracted from [19]), where X i is the primary data
unfeasible. Nevertheless, the first solutions are usually found input, divided into groups of 4 bits. Each group is codified by
quickly, if the code exists. Once selected the H matrix, it is an SEC–DED (8, 4) extended Hamming code (Ci and Pai ).
easy to determine the logic equations to calculate each parity Finally, a set of vertical parity bits (Pi ) form the matrix.
and syndrome bit, as well as the syndrome lookup table. They As just commented, and unlike the Matrix code, CLC uses
are required for the encoder and decoder implementation. an Extended Hamming code, allowing the correction of all
In addition, we can apply two different optimization criteria. single and 2-bit burst errors. Nevertheless, CLC introduces a
If we want to decrease the delay of the encoders and decoders, higher number of redundant bits, provoking a greater area,
we have to reduce the number of 1’s in those rows with the power, and delay overheads, and reducing the available mem-
highest number of 1’s of the parity-check matrix. In the case ory for the payload.
of area reduction, the total number of 1’s in the parity-check On the other hand, Saiz-Adalid et al. [27] introduce an
matrix must be reduced. SEC–double-adjacent error detection (SEC–DAED) code with
A detailed explanation of this algorithm, as well as a code a very low redundancy. This code is able to correct all single
design example, can be found in [31]. errors or to detect all double adjacent errors in a 16-bit data
word with only five redundant bits. Fig. 4 shows the data word
III. E RROR C ORRECTION C ODES D ESCRIPTION layout of this code, where Ci are the code bits, and X i are the
A. Previous Proposals data bits.
As commented previously, Matrix and CLC codes have been On the contrary, Sanchez-Macian et al. [28] present a
designed to tolerate MCUs [17], [19], a critical concern in different approach to generate Hamming-based ECCs. In this
space applications. case, we have selected an SEC–DAED code with the same
Combining Hamming codes and parity checks [17], [18], coverage characteristics, as well as the same redundancy of
Matrix codes form a 2-D scheme for correcting and detecting the SEC–DAED code from [27]. The main difference is the
some patterns of MCUs. For instance, in this paper, we have code layout, as shown in Fig. 5.
used the bit layout shown in Fig. 2 (extracted from [18]), To obtain the value of the different Ci bits, the parity-check
where X i are the data bits, Ci are the horizontal check bits matrix of these two SEC–DAED codes can be seen in [27]
(calculated as a Hamming code), and Pi are the column parity and [28], respectively.
bits (even parity).
The basic behavior of this Matrix code is as follows. The B. Our Approach
primary data input (X i ) is divided into groups of several bits. By using the FUEC methodology [31], we have been able
In this paper, this division is in groups of 4 bits. Each group to design several codes that improve the coverage and/or
is codified by a (7, 4) Hamming code (Ci ). Last, a set of the redundancy of the different codes presented previously
vertical parity bits (Pi ) completes the matrix. The Matrix (Matrix, CLC, and both SEC–DAED codes). The layout of
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
TABLE IV
N UMBER OF C ODE B ITS FOR A 16-bit D ATA W ORD
Fig. 11. Correction coverage for burst errors. Fig. 12. Detection coverage for burst errors.
TABLE V
A REA O CCUPIED BY D IFFERENT ECC S ( IN nm2 )
Fig. 11 shows the results obtained for the correction cover-
age of each code. This coverage has been calculated as
Errors_Corrected
Ccorrec = × 100 (9)
Errors_Injected
where Errors_Corrected are the number of errors corrected by
the ECC, and Errors_Injected are the total amount of errors
injected for a given burst length.
As expected, FUEC–TAEC and FUEC–QUAEC codes
present better correction coverage than Matrix, CLC, and
both SEC–DAED codes, as they are able to correct up to
In conclusion, our codes are very efficient to tolerate burst
3- and 4-bit burst errors respectively. On the other hand, our
errors of from 2- to 4-bit length. Beyond 4-bit burst errors,
FUEC–DAEC code can correct up to 2-bit burst errors, such
the performance of our codes decreases notably due to their
as CLC.
low redundancy. If these errors are expected to occur, more
Nevertheless, for longer burst errors (5 bits or more), the
powerful codes (with a higher redundancy) must be employed.
correction capabilities of our codes degrade more deeply
Nevertheless, the probability of occurrence of burst errors
than Matrix and CLC ones. This is provoked by the lower
decreases significantly when increasing its length [6], [7].
redundancy of our codes, which causes a lower number of
available syndromes to be used to correct longer burst errors.
In other words, our codes employ the available syndromes B. Synthesis Results
in a more efficient way inside the expected error rank, but We have shown in Section III that our three codes,
degrading quickly outside this rank. This effect can be seen FUEC–DAEC, FUEC–TAEC, and FUEC–QUAEC, present a
also in both SEC–DAED codes. In fact, these codes present lower redundancy with respect to Matrix and CLC codes. This
the greatest degradation from multiple-bit errors (2-bit burst lower redundancy provokes a lower storage requirement for
errors or longer). As it happens with our codes, the very low code bits.
redundancy of both SEC–DAED codes provokes this result. Nevertheless, the question that arises now is that this lower
Finally, Fig. 12 shows the detection coverage for all codes. redundancy would translate into an improvement on the area,
This detection coverage is calculated as power, and delay overheads in the encoder and decoder circuits
with respect to matrix, CLC, and both SEC–DAED codes.
Errors_Corrected + Errors_Detected
Cdetec = × 100 (10) Although the area overhead of the encoders and decoders may
Errors_Injected be negligible in comparison with memory overhead, power and
where Errors_Detected corresponds to the number of errors delay overheads can be important, especially in deep space
not corrected but detected by the ECCs. systems.
As can be seen from Fig. 12, all our codes present a 100% To solve this question, we have synthesized the encoder and
detection of up to 4-bit burst errors, improving the behavior decoder circuits for all ECCs. To do this, we have implemented
of Matrix, CLC, and both SEC–DAED codes. them in VHDL, and using CADENCE software [35], we have
As in the correction coverage, the percentage of detected carried out a logic synthesis for 45-nm technology by using
errors in our FUEC–TAEC and FUEC–QUAEC codes the NanGate FreePDK45 Open Cell Library [36], [37].
degrades sharply for longer bursts. By contrast, FUEC–DAEC Table V (and Fig. 13) shows the area occupied by the
maintains a coverage over 60%, near the Matrix detection different circuits in nm2 (1 nm = 10−9 m). As it can be
capability. seen, our encoders present a slightly greater area than the
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 13. Area occupied by the different ECCs (in nm2 ). Fig. 14. Power consumption (in μW).
TABLE VI
TABLE VII
P OWER C ONSUMPTION ( IN μW)
D ELAY OVERHEAD FOR D IFFERENT ECC S ( IN ps)
With respect to the delay, FUEC–DAEC presents the fastest [15] S. Pontarelli, G. C. Cardarilli, M. Re, and A. Salsano, “Error correction
correction, even better than both SEC–DAED codes. On the codes for SEU and SEFI tolerant memory systems,” in Proc. 24th
IEEE Int. Symp. Defect Fault Tolerance VLSI Syst. (DFT), Oct. 2009,
contrary, FUEC–TAEC and FUEC–QUAEC codes introduce pp. 425–430.
the highest correction delay, because they are designed to [16] A. Sánchez-Macián, P. Reviriego, J. Tabero, A. Regadío, and
correct longer burst errors. In this way, decoder circuits are J. A. Maestro, “SEFI protection for nanosat 16-bit chip onboard
computer memories,” IEEE Trans. Device Mater. Rel., vol. 17, no. 4,
more complex. Nevertheless, encoder circuits are quite quick. pp. 698–707, Dec. 2017, doi: 10.1109/TDMR.2017.2750718.
In general, and according to the M metric, introduced [17] C. Argyrides, D. K. Pradhan, and T. Kocak, “Matrix codes
to evaluate the overall features of the analyzed codes, our for reliable and cost efficient memory chips,” IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., vol. 19, no. 3, pp. 420–428,
FUEC–DAEC code presents the best performance to correct Mar. 2011.
single or 2-bit burst errors, or to detect 3- or 4-bit burst [18] C. Argyrides, H. R. Zarandi, and D. K. Pradhan, “Matrix codes:
errors. In addition, FUEC–TAEC and FUEC–QUAEC have Multiple bit upsets tolerant method for SRAM memories,” in Proc.
22nd IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., Sep. 2007,
demonstrated good features to correct 3-bit and 4-bit burst pp. 340–348.
errors, respectively. Thus, our codes become an appropriate [19] H. S. de Castro, J. A. N. da Silveira, A. A. P. Coelho, F. G. A. e Silva,
option for critical applications in embedded systems. Beyond P. D. S. Magalhães, and O. A. de Lima, “A correction code for multiple
cells upsets in memory devices for space applications,” in Proc. 14th
4-bit burst errors, the performance of our codes decreases IEEE Int. New Circuits Syst. Conf. (NEWCAS), Jun. 2016, pp. 1–4.
notably due to their low redundancy. If these errors are [20] S. Ahmad, M. Zahra, S. Z. Farooq, and A. Zafar, “Comparison of EDAC
expected to occur, more powerful ECCs must be employed. schemes for DDR memory in space applications,” in Proc. Int. Conf.
Aerosp. Sci. Eng. (ICASE), Aug. 2013, pp. 1–5.
In a future work, we want to continue developing ECCs, [21] D. E. Müller, “Application of Boolean algebra to switching circuit design
decreasing area, power, and delay overheads while maintain- and to error detection,” IRE Trans. Electron. Comput., vol. 3, no. 3,
ing, or even increasing, the code coverage. We want focus on pp. 6–12, 1954.
[22] J. M. Berger, “A note on error detection codes for asymmetric channels,”
long burst errors, which are expected to have more and more Inf. Control, vol. 4, pp. 68–73, Mar. 1961.
impact on space systems. [23] O. Goldreich and L. A. Levin, “A hard-core predicate for all one-
way functions,” in Proc. 21st ACM Symp. Theory Comput., 1989,
pp. 25–32.
R EFERENCES [24] M. Bossert, Channel Coding for Telecommunications. New York, NY,
USA: Wiley. 1999.
[1] (2013). The International Technology Roadmap for Semiconductors. [25] M. J. E. Golay, “Notes on digital coding,” Proc. IRE, vol. 37, p. 657,
[Online]. Available: http://www.itrs2.net/2013-itrs.html Jun. 1949.
[2] S. K. Kurinec and K. Iniewski, Nanoscale Semiconductor Memories: [26] N. B. Anne, U. Thirunavukkarasu, and S. Latifi, “Three and four-
Technology and Application. Boca Raton, FL, USA: CRC Press, 2014. dimensional parity-check codes for correction and detection of multiple
[3] J. Barak, M. Murat, and A. Akkerman, “SEU due to electrons in silicon errors,” in Proc. Int. Conf. Inf. Technol. Coding Comput. (ITCC),
devices with nanometric sensitive volumes and small critical charge,” Apr. 2004, pp. 837–842.
Nucl. Instrum. Methods Phys. Res. B, Beam Interact. Mater. At., vol. 287, [27] L. J. Saiz-Adalid, P. Gil, J.-C. Baraza-Calvo, J.-C. Ruiz,
pp. 113–119, Sep. 2012. D. Gil-Tomas, and J. Gracia-Moran, “Modified Hamming codes
[4] E. Ibe, H. Taniguchi, Y. Yahagi, K. Shimbo, and T. Toba, “Impact of to enhance short burst error detection in semiconductor memories,” in
scaling on neutron-induced soft error in SRAMs from a 250 nm to Proc. IEEE Eur. Dependable Comput. Conf. (EDCC), Newcastle, U.K.,
a 22 nm design rule,” IEEE Trans. Electron Devices, vol. 57, no. 7, May 2014, pp. 62–65.
pp. 1527–1538, Jul. 2010. [28] A. Sánchez-Macián, P. Reviriego, and J. A. Maestro, “Hamming SEC-
[5] G. Tsiligiannis et al., “Multiple cell upset classification in commercial DAED and extended Hamming SEC-DED-TAED codes through selec-
SRAMs,” IEEE Trans. Nucl. Sci., vol. 61, no. 4, pp. 1747–1754, tive shortening and bit placement,” IEEE Trans. Device Mater. Rel.,
Aug. 2014. vol. 14, no. 1, pp. 574–576, Mar. 2014.
[6] G. I. Zebrev, K. S. Zemtsov, R. G. Useinov, M. S. Gorbunov, [29] A. Neubauer, J. Freudenberger, and V. Kühn, Coding Theory: Algo-
V. V. Emeliyanov, and A. I. Ozerov, “Multiple cell upset cross-section rithms, Architectures and Applications. New York, NY, USA: Wiley,
uncertainty in nanoscale memories: Microdosimetric approach,” in Proc. 2007.
15th Eur. Conf. Radiat. Effects Compon. Syst. (RADECS), Sep. 2015, [30] F. Silva et al., “Evaluation of multiple bit upset tolerant codes for
pp. 1–5. NoCs buffering,” in Proc. IEEE 8th Latin Amer. Symp. Circuits
[7] N. Chechenin and M. Sajid, “Multiple cell upsets rate estimation for Syst. (LASCAS), Feb. 2017, pp. 1–4.
65 nm SRAM bit-cell in space radiation environment,” in Proc. 3rd Int. [31] L. J. Saiz-Adalid et al., “Flexible unequal error control codes with
Conf. Exhib. Satell. Space Missions, May 2017, p. 77. selectable error detection and correction levels,” in Proc. 32th Int. Conf.
[8] N. N. Mahatme, B. L. Bhuva, Y.-P. Fang, and A. S. Oates, “Impact of Comput. Safety, Rel. Secur. (SAFECOMP), Sep. 2013, pp. 178–189.
strained-Si PMOS transistors on SRAM soft error rates,” IEEE Trans. [32] K. A. LaBel. (Aug. 2009). Proton Single Event Effects (SEE) Guideline.
Nucl. Sci., vol. 59, no. 4, pp. 845–850, Aug. 2012. NASA Electronic Parts and Packaging (NEPP). [Online]. Available:
[9] Y. Bentoutou, “Program memories error detection and correction on- https://nepp.nasa.gov/files/18365/Proton_RHAGuide_NASAAug09.pdf
board earth observation satellites,” Int. J. Elect. Comput. Eng., vol. 4, [33] M. Greenberg. (2014). Reliability, Availability, and Service-
no. 6, pp. 933–936, 2010. ability (RAS) for DDR Dram Interfaces. [Online]. Available:
[10] M. J. Gadlage, A. H. Roach, A. R. Duncan, A. M. Williams, http://www.memcon.com/pdfs/proceedings2014/NET105.pdf
D. P. Bossev, and M. J. Kay, “ Multiple-cell upsets induced by sin- [34] S. Shamshiri and K.-T. Cheng, “Error-locality-aware linear coding to
gle high-energy electrons,” IEEE Trans. Nucl. Sci., vol. 65, no. 1, correct multi-bit upsets in SRAMs,” in Proc. IEEE Int. Test Conf. (ITC),
pp. 211–216, Jan. 2018, doi: 10.1109/TNS.2017.2756441. Nov. 2010, pp. 1–10.
[11] E. Fujiwara, Code Design for Dependable Systems: Theory and Prac- [35] Cadence: EDA Tools and IP for System Design Enablement. Accessed:
tical Applications. Hoboken, NJ, USA: Wiley, 2006. May 23, 2018. [Online]. Available: https://www.cadence.com
[12] R. W. Hamming, “Error detecting and error correcting codes,” Bell Syst. [36] J. E. Stine et al., “FreePDK: An open-source variation-aware design kit,”
Tech. J., vol. 29, no. 2, pp. 147–160, Apr. 1950. in Proc. IEEE Int. Conf. Microelectron. Syst. Educ. (MSE), Jun. 2007,
[13] C. L. Chen and M. Y. Hsiao, “Error-correcting codes for semiconductor pp. 173–174.
memory applications: A state-of-the-art review,” IBM J. Res. Develop., [37] NanGate FreePDK45 Open Cell Library. Accessed: May 28, 2018.
vol. 58, no. 2, pp. 124–134, Mar. 1984. [Online]. Available: http://www.nangate.com/?page_id=2325
[14] G. C. Cardarilli, M. Ottavi, S. Pontarelli, M. Re, and A. Salsano, “Fault [38] A. Mukati, “Review: A survey of memory error correcting techniques
tolerant solid state mass memory for space applications,” IEEE Trans. for improved reliability,” J. Netw. Comput. Appl., vol. 34, no. 2,
Aerosp. Electron. Syst., vol. 41, no. 4, pp. 1353–1372, Oct. 2005. pp. 517–522, Mar. 2011.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Joaquín Gracia-Morán received the B.Sc., M.Sc., Daniel Gil-Tomás received the B.Sc. degree in elec-
and Ph.D. degrees in computer engineering from the trical and electronic physics and the Ph.D. degree
Universitat Politècnica de València (UPV), Valencia, in computer engineering from the Universitat de
Spain, in 1995, 1997, and 2004, respectively. València (UPV), Valencia, Spain, in 1985 and 1999,
He is currently an Associate Professor at the respectively.
Department of Computer Engineering, UPV. He is He is currently an Associate Professor at the
a member of the Fault-Tolerant Systems (STF) Department of Computer Engineering, UPV. He is
research line within the Instituto ITACA. His current a member of STF-ITACA. His current research
research interests include the design and implemen- interests include the design and validation of fault-
tation of digital systems, the design and validation tolerant systems, reliability physics, and the reliabil-
of STF, and VHDL-based fault injection. ity of emerging nanotechnologies.