
The current big data era routinely requires the processing of large-scale data sets on massive distributed computing clusters. In these applications, the data are often so large that they cannot fit within the memory or the disk of any single computer, so both the data and the processing must be distributed across multiple nodes; distributed computation is a necessity rather than a luxury. The widespread use of such clusters offers clear advantages over traditional centralized computing, but it also introduces new challenges in which coding-theoretic ideas have recently had a significant impact. Large-scale (and often heterogeneous) clusters suffer from stragglers, i.e., worker nodes that are slow or have failed. In the absence of a careful assignment of tasks to the worker nodes, the overall completion time of a computation is typically dominated by the slowest node.
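As a simple illustration of why the slowest node dominates (a standard toy model, not tied to any particular system discussed here), suppose a job is split evenly over $n$ workers whose individual completion times $X_1, \ldots, X_n$ are i.i.d. exponential with rate $\mu$. The job finishes only when the last worker does, and
\[
\mathbb{E}\Big[\max_{1 \le i \le n} X_i\Big] \;=\; \frac{1}{\mu} \sum_{k=1}^{n} \frac{1}{k} \;\approx\; \frac{\ln n}{\mu},
\]
so even though each worker is fast on average, the expected completion time grows logarithmically with the number of workers. Coding-theoretic schemes counteract this effect by introducing redundancy so that the result can be recovered from the responses of a sufficiently large subset of the workers, rather than from all of them.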