Order-preserving triggering mechanism and data buffering method for collective communication hardware offloading
Affiliation: College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China

CLC number: TP302.2

    Abstract:

    To further optimize network-interface-card-based hardware offloading of collective communication in the "Tianhe" network, and to support more types of collective communication algorithms and larger message sizes, this work investigates an order-preserving triggering mechanism and a data buffering method for collective communication hardware offloading. An order-preserving triggering mechanism for concurrent multitasking is proposed; it satisfies the required semantics of collective communication and ensures that floating-point computation results are reproducible. A dynamic network data buffering method based on hash tables and pulsed credit flow control is proposed to reconcile limited hardware buffering resources with the large volume of network data that concurrent tasks need to buffer. Experimental results show that, compared with software-based collective communication operations, the proposed methods support hardware offloading of various algorithms for several typical collective communication operations and deliver significant performance improvements. In addition, the hardware implementation cost is low, and buffering resources are utilized efficiently.
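The reproducibility claim rests on a general fact: IEEE 754 floating-point addition is not associative, so a reduction that adds contributions in whatever order they arrive from the network can return different results from run to run. The minimal sketch below illustrates only that general principle. It is not the authors' hardware mechanism. The function names, the rank-order policy, and the Python dict standing in for a hardware buffer are all hypothetical. Early arrivals are buffered, and accumulation is triggered strictly in rank order, so every arrival order yields the same sum.

```python
from itertools import permutations

def reduce_in_arrival_order(values, arrival_order):
    """Add each rank's value the moment it arrives; the order can vary between runs."""
    acc = 0.0
    for rank in arrival_order:
        acc += values[rank]
    return acc

def reduce_in_rank_order(values, arrival_order):
    """Buffer early arrivals and add them strictly in rank order 0, 1, 2, ..."""
    pending = {}    # rank -> value that arrived before its turn (hypothetical buffer)
    next_rank = 0   # the only rank whose value may be accumulated next
    acc = 0.0
    for rank in arrival_order:
        pending[rank] = values[rank]
        # Trigger accumulation for every buffered value whose turn has come.
        while next_rank in pending:
            acc += pending.pop(next_rank)
            next_rank += 1
    return acc

# Four ranks whose sum depends on evaluation order in IEEE 754 double precision.
values = [1e16, 1.0, -1e16, 1.0]
orders = list(permutations(range(len(values))))
print(sorted({reduce_in_arrival_order(values, o) for o in orders}))
print(sorted({reduce_in_rank_order(values, o) for o in orders}))
```

Both variants add every value exactly once and differ only in the order of the additions. The rank-order variant gains determinism by holding early arrivals in a buffer. That cost shows why order preservation and limited buffer capacity pull against each other, the tension the buffering method above is meant to ease.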


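Citation: XU Jinbo, DONG Dezun, LI Baofeng, et al. Order-preserving triggering mechanism and data buffering method for collective communication hardware offloading[J]. Journal of National University of Defense Technology, 2025, 47(6): 13-23.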

History
  • Received: March 04, 2025
  • Online: December 02, 2025