🗜️ GPU-Accelerated Compressed Collective Operations

Collective operations are essential building blocks of distributed applications, enabling efficient aggregation, distribution, and collection of data across multiple nodes. While these operations are critical to many workloads, including machine learning and scientific computing, their performance is often dominated by the large data volumes exchanged between nodes. One way to alleviate this bottleneck is to compress the data before transmission, trading a small, controlled loss of accuracy for substantially reduced communication volume.

Recent work has explored compression techniques such as quantization, sparsification, and error-bounded lossy compression [1], but integrating them efficiently into collective operations remains an open challenge. Further acceleration can be achieved by leveraging GPUs, which are already central to many distributed workloads.

In this thesis, the student will design and implement GPU-accelerated compressed collective operations. The work will explore several compression approaches, including quantization, sparsification, and advanced techniques such as error-bounded lossy compression [1]. The implementation will be evaluated on a GPU cluster and compared against uncompressed collective operations to quantify the performance gains and accuracy trade-offs.
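To make the idea concrete, the following is a minimal sketch of a quantization-compressed allreduce, simulated in plain Python over in-memory "ranks". All names here (`compress`, `decompress`, `compressed_allreduce`) are hypothetical illustrations, not an existing API; a real implementation would run the (de)compression kernels on the GPU inside an MPI/NCCL collective.

```python
def compress(buf, bits=8):
    """Lossy min-max quantization: floats -> small ints plus (lo, scale)."""
    lo, hi = min(buf), max(buf)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # avoid div-by-zero on constant buffers
    return [round((v - lo) / scale) for v in buf], lo, scale

def decompress(payload):
    """Inverse of compress, up to quantization error."""
    q, lo, scale = payload
    return [lo + x * scale for x in q]

def compressed_allreduce(rank_buffers):
    """Each rank compresses its buffer; the compact payloads are exchanged
    (simulated here by a local all-gather), decompressed, and summed
    elementwise to produce the reduced result."""
    payloads = [compress(buf) for buf in rank_buffers]
    decoded = [decompress(p) for p in payloads]
    return [sum(col) for col in zip(*decoded)]

# Two simulated ranks reducing a 4-element gradient buffer; the result
# approximates the exact sum within the quantization error bound.
result = compressed_allreduce([[0.0, 1.0, 2.0, 3.0],
                               [4.0, 5.0, 6.0, 7.0]])
```

The sketch deliberately ignores the questions the thesis would actually tackle: where in the collective's communication pattern to (de)compress, how to overlap compression kernels with transfers, and how error accumulates across reduction steps.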

[1] J. Huang et al. "gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters."

Approximate composition: 20% state-of-the-art analysis, 30% theory/design, 50% implementation/experiments