Microarchitecture-level reliability assessment of multi-bit upsets in processors

Postgraduate Thesis uoadl:2875205

Unit:
Specialization in Integrated Circuit Design
Informatics
Deposit date:
2019-05-29
Year:
2019
Author:
Katsoridas Giorgos
Gavanas Christos
Supervisors info:
Dimitrios Gizopoulos, Professor, Department of Informatics & Telecommunications, National and Kapodistrian University of Athens
Original Title:
Microarchitecture-level reliability assessment of multi-bit upsets in processors
Languages:
English
Translated title:
Microarchitecture-level reliability assessment of multi-bit upsets in processors
Summary:
The continuing decrease in feature sizes of modern Integrated Circuits (ICs) makes reliability and vulnerability assessment of the core increasingly important in the early stages of the design (pre-silicon validation). As lithography resolution increases in recent technology nodes, radiation effects play a bigger role, leading to more severe effects in the devices and a larger number of multi-bit faults. It is therefore crucial to evaluate each design with common fault injection mechanisms on microarchitectural simulators, which offer more flexibility and faster simulation than RTL (Register Transfer Level) designs.
This thesis focuses on multi-bit faults and shows their effects on different components of a microarchitectural model of the ARM Cortex-A9 core, implemented on the gem5 simulator. The fault injection campaigns use GeFIN (Gem5-based Fault INjector), extended with an improved fault mask generation tool that creates fault masks with particular characteristics. The improved generator can inject multi-bit faults in adjacent areas of a structure, a case that is very common in real environments. It can also insert faults in interleaved memories; interleaving is a widely used technique for mitigating the effects of multiple bit upsets.
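The two generator capabilities described above can be illustrated with a minimal sketch. It is not the thesis tool: the function names, the rectangular bit-array model, and the simple column-modulo interleaving scheme are all assumptions made for illustration. The sketch picks a run of physically adjacent cells in one row and then maps each physical column to a logical word and bit, so that with an interleaving degree above 1 the adjacent upsets land in different logical words.

```python
import random

def adjacent_fault_mask(rows, cols, n_faults, rng):
    """Hypothetical helper: choose n_faults horizontally adjacent
    physical cells in one randomly selected row of a rows x cols array."""
    if not 1 <= n_faults <= cols:
        raise ValueError("n_faults must be between 1 and cols")
    row = rng.randrange(rows)
    start = rng.randrange(cols - n_faults + 1)
    return [(row, start + i) for i in range(n_faults)]

def physical_to_logical(col, degree):
    """Assumed interleaving scheme: physical column `col` holds
    bit (col // degree) of logical word (col % degree)."""
    return col % degree, col // degree

def logical_faults(mask, degree):
    """Translate physical (row, col) faults into (row, word, bit)."""
    return [(row, *physical_to_logical(col, degree)) for row, col in mask]

rng = random.Random(0)                      # fixed seed for repeatability
mask = adjacent_fault_mask(rows=4, cols=32, n_faults=3, rng=rng)
print(logical_faults(mask, degree=1))       # no interleaving: one word hit
print(logical_faults(mask, degree=4))       # 4-way: three different words
```

Under this assumed scheme, a triple-bit adjacent upset becomes three single-bit errors in separate words when the degree is at least 3, which is the mitigation effect interleaving is meant to provide.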
The results showed that some components of the core under test (e.g. the Instruction Translation Lookaside Buffer) are significantly vulnerable to fault injection, with as few as 25% correct executions over 1000 experiments, while others, such as the Level 1 Data/Instruction Caches and the Level 2 Cache, are more sensitive to the number of faults injected, with a variation of up to 24% between single-bit and triple-bit fault injection for the L1 D-Cache. These numbers relate to the “theoretical” Architectural Vulnerability Factor (AVF), which is independent of the fabrication technology node. The calculation was extended to compute the AVF for each technology node from 250 nm to 22 nm, showing that the AVF increases as the node shrinks.
Lastly, a reliability assessment using the Failures in Time (FIT) metric showed the highest values for the Level 2 Cache, primarily because of its size (4 MBits), with a FIT of 822.9 at the 130 nm node. The FIT of the whole core peaked at 918 at the same node; for nodes smaller than 130 nm the FITs decreased, primarily because the raw FIT factor of each technology decreases.
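The relationship between injection outcomes, AVF and FIT can be sketched as follows. The record does not give the thesis formulas, so this assumes the commonly used definitions: AVF as the fraction of injections that do not end in a correct execution, and the FIT of a structure as raw FIT per bit times number of bits times AVF, with the core FIT taken as the sum over structures. All function names and the raw FIT and size values in the example are hypothetical and are not results from the thesis.

```python
def avf(injections, correct_runs):
    """Assumed definition: share of injections that were not masked."""
    if injections <= 0 or not 0 <= correct_runs <= injections:
        raise ValueError("need 0 <= correct_runs <= injections, injections > 0")
    return (injections - correct_runs) / injections

def structure_fit(raw_fit_per_bit, bits, avf_value):
    """Assumed model: FIT = raw FIT per bit * number of bits * AVF."""
    return raw_fit_per_bit * bits * avf_value

def core_fit(structures):
    """Sum the FIT of each (raw_fit_per_bit, bits, avf) tuple."""
    return sum(structure_fit(*s) for s in structures)

# 25% correct executions over 1000 experiments, as quoted for the I-TLB
print(avf(1000, 250))   # 0.75

# Hypothetical structures: (raw FIT per bit, size in bits, AVF)
example = [(1e-4, 1000, 0.5), (1e-4, 4000, 0.25)]
print(core_fit(example))
```

Under this model, a large structure such as the L2 cache can dominate the core FIT even with a moderate AVF, and a lower raw FIT per bit at smaller nodes can reduce the FIT even while the AVF grows.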
Main subject category:
Technology - Computer science
Keywords:
fault tolerance, multiple bit faults, microarchitectural simulation, reliability & vulnerability assessment, interleaving
Index:
Yes
Number of index pages:
7
Contains images:
Yes
Number of references:
18
Number of pages:
125
THESIS_DOCUMENTATION.pdf (2 MB)