Accelerating Complex Data Transfer for Cluster Computing

Friday, June 10th, 12-1 PM @ BA5205

Speaker: Alexey Khrabrov

The ability to move data quickly between the nodes of a distributed system is important for the performance of cluster computing frameworks such as Hadoop and Spark. We show that in a cluster with modern networking technology, data serialization is the main bottleneck and source of overhead when transferring complex, object-rich data in systems built on high-level programming languages such as Java. We propose a new data transfer mechanism that avoids serialization altogether by storing data in a shared, cluster-wide address space. We describe the design and a prototype implementation of this approach, show that our mechanism is significantly faster than serialized data transfer, and propose several possible applications for it.

Alexey Khrabrov is a first-year PhD student at the University of Toronto, supervised by Prof. Eyal de Lara. His research interests lie in the performance of distributed systems. His current work focuses on leveraging modern network technologies and designing new programming models to improve data transfer performance in cluster computing systems.