Recovering LLVM-IR type information using static analysis

Graduate Thesis uoadl:3389619

Unit:
Department of Informatics and Telecommunications
Informatics
Deposit date:
2024-02-15
Year:
2024
Author:
ARGYROS ANARGYROS
Supervisors info:
Yannis Smaragdakis, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens (NKUA)
Original Title:
Recovering LLVM-IR type information using static analysis
Languages:
English
Translated title:
Recovering LLVM-IR type information using static analysis
Summary:
The "LLVM Compiler Infrastructure Project" is an extremely popular collection of tools and technologies for building compilers. One of LLVM's central features is a code representation in Static Single Assignment (SSA) form known as the LLVM Intermediate Representation (LLVM-IR). The LLVM-IR is the target language of many compilers that build on the LLVM framework, such as Clang/Clang++, Rustc, and Swift.

Many static analysis tools run their analyses at the LLVM-IR level, because it encodes the information those analyses require while filtering out language-specific high-level concepts that may confuse or complicate the process. In addition, analyzing at the intermediate representation level makes these tools compatible with a plethora of languages, since the respective LLVM-IR is generated by each language's native compiler. Cclyzer-Soufflé is such a tool: it uses the LLVM framework to parse the LLVM-IR generated by a language's native compiler, generates facts about the program's source code, and then executes various static analysis algorithms defined in Datalog.

In version 17 of the LLVM framework, a decision was made to move from a strongly typed pointer type system to opaque pointer types. This change obscured much of the pointer type information in the LLVM-IR that many static analysis tools, including Cclyzer-Soufflé, required in order to work effectively. Although pointer type information is no longer directly available, a significant portion remains in the LLVM-IR and can be inferred through static analysis methods. In this work we use static analysis to recover missing pointer type information at the LLVM-IR level, and we integrate the type-inference mechanism we developed into the Cclyzer toolchain.
Main subject category:
Technology - Computer science
Keywords:
Static Analysis, Type Inference, Datalog, LLVM-IR, Compilers
Index:
Yes
Number of index pages:
1
Contains images:
No
Number of references:
5
Number of pages:
26
thesis.pdf (207 KB)