Runtime Supports for Scalable and Efficient Big Data Processing
Big Data analytics has come to play an important role in modern computing. The availability of enormous amounts of data has led to the proliferation of large-scale, data-intensive applications. Popular Big Data systems are developed in managed languages such as Java, Scala, and C#, primarily because their ease of use and automatic memory management enable fast development cycles. However, a managed runtime comes at a cost that is easily magnified in the context of Big Data, causing unsatisfactory performance and poor scalability. Our experience with dozens of real-world systems reveals that the root cause is a mismatch between the fundamental assumptions on which current runtimes are designed and the characteristics of data-intensive workloads.
In this talk, I will present my work on developing a Big-Data-friendly runtime system that resolves these mismatches in real-world systems. Specifically, I will discuss two representative components: Yak, a hybrid garbage collector (GC) that provides both high throughput and low latency, and Skyway, an efficient mechanism for connecting the managed heaps of different nodes in a cluster.
Khanh Nguyen is a Ph.D. candidate in the Computer Science Department at UCLA, working with Harry Xu at the intersection of systems and programming languages. He has led the development of a series of compiler and runtime-system techniques that improve the performance of several real-world Big Data systems, such as Spark and Hadoop. His work has attracted considerable attention from both academia and industry. He is a recipient of the Google Ph.D. Fellowship in Systems and Networking and a Facebook Ph.D. Fellowship finalist.
Thursday, March 21, 2019
St Marys, 326