Improving the Locality of Page Table Walks in the Cache Hierarchy of Modern Microprocessors

Graduate Thesis uoadl:3413330

Unit:
Department of Informatics and Telecommunications
Informatics
Deposit date:
2024-08-05
Year:
2024
Author:
CHATZOPOULOS ANGELOS-DOROTHEOS
Supervisors info:
Vasileios Karakostas, Assistant Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Original Title:
Improving the Locality of Page Table Walks in the Cache Hierarchy of Modern Microprocessors
Languages:
English
Greek
Translated title:
Improving the Locality of Page Table Walks in the Cache Hierarchy of Modern Microprocessors
Summary:
As the memory footprints of modern memory-intensive workloads grow, conventional TLBs often cannot cover their working sets, leading to frequent TLB misses and long-latency page table walks that access main memory.

In this thesis we propose PT-Baker, a software-based, system-level approach for reducing the latency of page table walks for applications that suffer from a high number of TLB misses and poor page table walk locality. The key idea of PT-Baker is to introduce a helper thread that periodically iterates over and accesses the workload’s page table entries, keeping them in the cache hierarchy and accelerating address translation. We design and implement this approach at both (i) user level, where the helper thread touches the workload’s allocated memory with a page-size stride to deliberately trigger page walks, and (ii) kernel level, by introducing a new system call that directly accesses the application’s page table. The user-level approach avoids any kernel modifications. However, it increases memory pressure because, as the helper thread iterates through the application’s allocated memory to touch the page table entries and bring them into the cache hierarchy, it also fetches the corresponding application data. The kernel-level approach, on the other hand, fetches the page table entries directly without filling the memory hierarchy with application data, but it requires kernel modifications in addition to application modifications.
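
The user-level variant described above can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`touch_pages`, `run_helper`, `start_helper`), the sweep interval, and the use of an anonymous `mmap` region as a stand-in for the workload's memory are all assumptions made for illustration. A real helper would more likely be written in C and pinned to a chosen core; the sketch only shows the access pattern, namely one load per page, repeated periodically.

```python
import mmap
import threading

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the OS (commonly 4096 bytes)


def touch_pages(buf, page_size=PAGE_SIZE):
    """Read one byte from every page of buf; return the number of pages touched."""
    touched = 0
    for offset in range(0, len(buf), page_size):
        _ = buf[offset]  # one load per page, so each page needs an address translation
        touched += 1
    return touched


def run_helper(buf, stop, interval_s, stats):
    """Sweep buf repeatedly until stop is set, pausing interval_s between sweeps."""
    while not stop.is_set():
        stats["pages"] += touch_pages(buf)
        stats["sweeps"] += 1
        stop.wait(interval_s)  # hypothetical knob for the helper's aggressiveness


def start_helper(buf, interval_s=0.01):
    """Start the (hypothetical) helper thread; return (thread, stop_event, stats)."""
    stop = threading.Event()
    stats = {"sweeps": 0, "pages": 0}
    thread = threading.Thread(
        target=run_helper, args=(buf, stop, interval_s, stats), daemon=True
    )
    thread.start()
    return thread, stop, stats
```

Because each load also pulls a cache line of application data, the sketch shows the drawback noted above: the sweep fetches workload data along with the page table entries it is meant to keep warm.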

We evaluate our approach on a native and a virtualized system, as well as on a system under memory pressure. We conduct experiments with various parameter settings to adjust the aggressiveness of the helper thread, and we also consider thread placement on the microprocessor to utilize different parts of the cache hierarchy. Our evaluation shows that, with the user-level approach on a native system, PT-Baker reduces the main memory accesses due to page walks by up to 83% and improves performance by up to 5.9%. With the kernel-level approach, PT-Baker reduces the main memory accesses due to page walks by up to 99% and improves performance by up to 18%.
Main subject category:
Technology - Computer science
Keywords:
Virtual Memory, Address Translation, Translation Lookaside Buffer, Page Table, Cache Hierarchy, Cache Locality, Memory System
Index:
Yes
Number of index pages:
2
Contains images:
Yes
Number of references:
33
Number of pages:
51
File:
File access is restricted until 2025-02-05.

Thesis_Angelos_Chatzopoulos.pdf
3 MB