Assessment of modern RISC-V microprocessors reliability using runtime hardware measurements

Graduate Thesis uoadl:3480370

Unit:
Department of Informatics and Telecommunications
Informatics
Deposit date:
2025-04-25
Year:
2025
Author:
KONSTANTINIDIS ILIAS
Supervisors info:
Dimitris Gizopoulos, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens (NKUA)
Original Title:
Assessment of modern RISC-V microprocessors reliability using runtime hardware measurements
Languages:
English
Translated title:
Assessment of modern RISC-V microprocessors reliability using runtime hardware measurements
Summary:
Assessing hardware reliability against external or internal disturbances is a critical challenge in processor design, especially for complex microarchitectures with out-of-order (O3) execution, where increased instruction-level parallelism can change the impact of transient faults. This thesis explores the prediction of the Architectural Vulnerability Factor (AVF, the standard metric for measuring the effect of transient faults) and of related error outcomes (Silent Data Corruptions (SDCs), Timeouts, Assertions/Crashes) across key hardware structures (Register File, L1 Data Cache, L1 Instruction Cache) of RISC-V designs. Using the popular gem5 simulator, a series of automated Python scripts was developed to create and execute checkpoints for collecting runtime hardware metrics. A recent microarchitectural modeling and injection framework (gem5-MARVEL) was employed to calculate the corresponding AVF values through statistical fault injection: single-bit faults are injected into random locations of the hardware structures at random CPU cycles during program execution. Feature selection strategies based on the correlation of the performance metrics were implemented to identify the most relevant hardware metrics for each component and to support efficient regression. Several regression techniques (linear, polynomial, ridge, and lasso models) were evaluated using various scientific Python libraries, with additional analysis performed using the Patient Rule Induction Method (PRIM). While moderately strong R² values were observed for total-AVF and SDC-AVF, the final conclusions highlight the difficulty of accurate AVF prediction at runtime.
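As an illustration of the statistical fault-injection step described in the summary, the following minimal Python sketch estimates the total AVF and the per-class AVF of one hardware structure from a list of injection outcomes. It is not the thesis code or the gem5-MARVEL interface: the function name `avf_from_outcomes`, the outcome labels, and the normal-approximation error margin are assumptions made for this example only, and the sample counts are invented.

```python
from collections import Counter
from math import sqrt

# Hypothetical outcome labels. The summary names SDC, Timeout and
# Assertion/Crash as error classes; "masked" stands for an injection
# that had no visible effect on the program.
FAILURE_CLASSES = ("sdc", "timeout", "crash")


def avf_from_outcomes(outcomes, z=1.96):
    """Estimate AVF as the fraction of injections that were not masked.

    `outcomes` holds one label per single-bit injection run. The margin
    is a simple normal-approximation interval (z=1.96 for ~95%), which
    is an assumption of this sketch, not the thesis's stated method.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one injection run")
    counts = Counter(outcomes)
    # Per-class AVF: share of all injections ending in each error class.
    per_class = {c: counts[c] / n for c in FAILURE_CLASSES}
    # Total AVF: share of all injections ending in any error class.
    failures = sum(counts[c] for c in FAILURE_CLASSES)
    total = failures / n
    margin = z * sqrt(total * (1.0 - total) / n)
    return {"total": total, "margin": margin, **per_class}


# Invented example: 100 injections into one structure.
runs = ["masked"] * 90 + ["sdc"] * 6 + ["timeout"] * 1 + ["crash"] * 3
result = avf_from_outcomes(runs)
print(result)
```

Values such as `result["total"]` and `result["sdc"]` would play the role of the total-AVF and SDC-AVF regression targets mentioned in the summary, with the collected runtime hardware metrics as the predictors.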
Main subject category:
Technology - Computer science
Keywords:
Architectural Vulnerability Factor (AVF), Silent Data Corruption (SDC), fault injections, AVF estimation, regression
Index:
No
Number of index pages:
0
Contains images:
Yes
Number of references:
22
Number of pages:
65
BSc_Thesis___Ilias_Konstantinidis.pdf (3 MB)