Scalable Far Memory: Balancing Faults and Evictions

Date: February 20, 2026

Title: Scalable Far Memory: Balancing Faults and Evictions

Speaker: Xilin Tang

Abstract: Page-based far memory systems transparently expand an application's memory capacity beyond a single machine without modifying application code. However, existing systems are tailored to scenarios with low application thread counts, and fail to scale on today's multi-core machines. This makes them unsuitable for data-intensive applications that both rely on far memory support and scale with increasing thread count. Our analysis reveals that this poor scalability stems from inefficient holistic coordination between page fault-in and eviction operations. As thread count increases, current systems encounter scalability bottlenecks in TLB shootdowns, page accounting, and memory allocation.
This paper presents three design principles that address these scalability challenges and enable efficient memory offloading: always-asynchronous decoupling, which handles eviction operations as asynchronously as possible; cross-batch pipelined execution, which avoids idle waiting periods; and scalability prioritization, which avoids synchronization overheads at high thread counts at the cost of eviction accuracy. We implement these principles in both the Linux kernel and a library OS. Our evaluation shows that this approach increases throughput for batch-processing applications by up to 4.2× and reduces 99th percentile latency for a latency-critical memcached application by 94.5%.

Bio: Xilin Tang is a Ph.D. student in the Department of Computer Science at Cornell University, primarily based at the NYC campus of Cornell Tech, working with Prof. Alex Conway. His research focuses on quantitatively and formally understanding system behaviors and leveraging this understanding to enhance practical efficiency with theoretical assurance.