GPGPU injector 4.0: A Framework for Architectural Vulnerability Factor (AVF) Assessments Across Nvidia GPUs Generations using GPGPU-Sim 4.0 simulator

Postgraduate Thesis uoadl:2967354

Unit:
Track / specialization: Computing Systems: Software and Hardware (ΣΥΣ)
Informatics
Deposit date:
2021-11-26
Year:
2021
Author:
Sartzetakis Dimitrios
Supervisors info:
Gizopoulos Dimitrios, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Original Title:
GPGPU injector 4.0: A Framework for Architectural Vulnerability Factor (AVF) Assessments Across Nvidia GPUs Generations using GPGPU-Sim 4.0 simulator
Languages:
English
Greek
Translated title:
GPGPU injector 4.0: A Framework for Architectural Vulnerability Factor (AVF) Assessments Across Nvidia GPUs Generations using GPGPU-Sim 4.0 simulator
Summary:
A Graphics Processing Unit (GPU) is a programmable processor on which thousands of processing cores run simultaneously in massive parallelism. Each core is focused on performing calculations efficiently, facilitating real-time processing and analysis of enormous datasets. With the development of general-purpose parallel programming environments and languages, all modern GPUs are general-purpose GPUs (GPGPUs): they can be programmed for non-graphics applications and can direct their processing power towards massively parallel problems. Therefore, as in all general-purpose computing platforms, the reliability of GPU hardware structures is an important factor that architects need to estimate accurately early in the design cycle, to weigh the benefits of error protection techniques against their costs.
In this thesis, we introduce GPGPU injector 4.0, a fault injection framework for Architectural Vulnerability Factor (AVF) assessment of hardware structures and entire GPU chips that runs on top of GPGPU-Sim, the state-of-the-art performance simulator for Nvidia GPU architectures. We use GPGPU injector 4.0 to inject transient faults (soft errors) into CUDA-enabled GPU architectures. The target hardware structures include the register file, the shared memory, the L1 data/texture cache and the L2 cache, which together account for several tens of MBs of on-chip GPU storage. More specifically, we compute the AVF of two widely used recent graphics cards, the RTX 2060 and the Quadro GV100, by experimenting with ten different CUDA benchmarks that are simulated on the actual instruction set (SASS).
Main subject category:
Technology - Computer science
Keywords:
transient faults, AVF estimation, Failures In Time (FIT), register file, shared memory, cache memories, GPGPU-Sim
Index:
Yes
Number of index pages:
2
Contains images:
Yes
Number of references:
29
Number of pages:
54
Master's thesis Dimitris Sartzetakis.pdf (1 MB)