Order-preserving triggering mechanism and data buffering method for collective communication hardware offloading
Affiliation:

College of Computer Science and Technology, National University of Defense Technology

CLC number:

TN302.2

Fund Project:

National Defense Science and Technology Key Laboratory (2022-KJWPDL-11); Independent Innovation Science Fund (22-ZZCX-002)


    Abstract:

    To further optimize the NIC (network interface card)-based hardware offloading of collective communication in the "Tianhe" network, so that it supports more types of collective communication algorithms and larger message sizes, this study investigates an order-preserving triggering mechanism and a data buffering method for collective communication hardware offloading. An order-preserving triggering mechanism for concurrent multitasking is proposed; it satisfies the expected semantics of collective communication and ensures that floating-point computation results are reproducible. A dynamic network data buffering method based on a hash table and pulsed credit flow control is proposed to resolve the conflict between limited hardware buffer resources and the large amount of network data that concurrent tasks need to buffer. Experimental results show that, compared with software-based collective communication, this work supports hardware offloading of multiple algorithms for several typical collective operations with significant performance gains. The hardware implementation cost is low, and buffer resources in particular are utilized efficiently.
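The reproducibility guarantee rests on a standard property of IEEE 754 arithmetic: floating-point addition is not associative, so a NIC that reduced contributions in whatever order packets happened to arrive could produce different rounded results from run to run. The Python sketch below is an illustration under assumptions, not the authors' hardware design: it models one sum-reduction over four hypothetical ranks, parks early arrivals in a dict, and applies contributions strictly in rank order, which is one simple way to make the summation order independent of arrival order. `OrderedReducer`, `arrival_order_sum`, and `ordered_sum` are hypothetical names.

```python
from itertools import permutations


class OrderedReducer:
    """Hypothetical reducer that applies contributions strictly in rank order.

    Early arrivals are parked in a dict until every lower rank has been
    applied, so the summation order, and therefore the rounded result,
    is the same for every arrival order.
    """

    def __init__(self, num_ranks: int) -> None:
        self.num_ranks = num_ranks
        self.next_rank = 0   # lowest rank not yet applied
        self.pending = {}    # rank -> value for out-of-order arrivals
        self.acc = 0.0

    def arrive(self, rank: int, value: float) -> None:
        self.pending[rank] = value
        # Apply every contribution that is now in sequence.
        while self.next_rank in self.pending:
            self.acc += self.pending.pop(self.next_rank)
            self.next_rank += 1

    def done(self) -> bool:
        return self.next_rank == self.num_ranks


def arrival_order_sum(values, order):
    """Reduce in whatever order the contributions arrive."""
    acc = 0.0
    for rank in order:
        acc += values[rank]
    return acc


def ordered_sum(values, order):
    """Reduce through OrderedReducer, regardless of arrival order."""
    red = OrderedReducer(len(values))
    for rank in order:
        red.arrive(rank, values[rank])
    assert red.done()
    return red.acc


values = [1e16, 1.0, -1e16, 1.0]  # contributions from ranks 0..3
orders = list(permutations(range(len(values))))  # all 24 arrival orders
naive = {arrival_order_sum(values, o) for o in orders}
fixed = {ordered_sum(values, o) for o in orders}
print(sorted(naive))  # [0.0, 1.0, 2.0]
print(sorted(fixed))  # [1.0]
```

Summing in arrival order yields three different results across the 24 possible orders, while the ordered reducer returns the same value every time. The cost is that out-of-order data must be held somewhere until its turn comes, which is why order preservation and data buffering go hand in hand.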

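The buffering problem can likewise be pictured with a small model. The Python sketch below assumes a fixed pool of buffer slots shared by all concurrent collective tasks, a hash table (here a Python dict) mapping a hypothetical `(task_id, seq)` key to a slot, and ordinary credit-based flow control in which one credit stands for one free slot, so a sender can never overrun the pool. `SharedBuffer` and `CreditSender` are hypothetical names, and the "pulsed" aspect of the paper's credit scheme is not modeled, since the abstract does not describe how it works.

```python
from collections import deque


class SharedBuffer:
    """Hypothetical fixed pool of slots shared by all concurrent tasks."""

    def __init__(self, num_slots: int) -> None:
        self.free_slots = deque(range(num_slots))  # indices of unused slots
        self.slots = [None] * num_slots            # slot storage
        self.index = {}                            # hash table: (task_id, seq) -> slot

    def store(self, task_id: int, seq: int, payload: bytes) -> None:
        # Credits guarantee a free slot exists, so popleft() cannot fail here.
        slot = self.free_slots.popleft()
        self.slots[slot] = payload
        self.index[(task_id, seq)] = slot

    def take(self, task_id: int, seq: int) -> bytes:
        # Look up the slot by key, release it, and hand back the payload.
        slot = self.index.pop((task_id, seq))
        payload, self.slots[slot] = self.slots[slot], None
        self.free_slots.append(slot)
        return payload


class CreditSender:
    """Plain credit-based flow control: one credit per free receiver slot."""

    def __init__(self, credits: int) -> None:
        self.credits = credits

    def try_send(self, buf: SharedBuffer, task_id: int, seq: int, payload: bytes) -> bool:
        if self.credits == 0:
            return False  # back-pressure: hold the data until a credit returns
        self.credits -= 1
        buf.store(task_id, seq, payload)
        return True

    def return_credit(self) -> None:
        self.credits += 1


buf = SharedBuffer(num_slots=2)
sender = CreditSender(credits=2)
log = [
    sender.try_send(buf, task_id=7, seq=0, payload=b"a"),  # True
    sender.try_send(buf, task_id=9, seq=0, payload=b"b"),  # True
    sender.try_send(buf, task_id=7, seq=1, payload=b"c"),  # False: no credits left
]
received = buf.take(task_id=9, seq=0)  # consumer drains task 9 first
sender.return_credit()                 # the freed slot becomes a credit again
log.append(sender.try_send(buf, task_id=7, seq=1, payload=b"c"))  # True
print(log, received)
```

Because slots come from one shared pool instead of being reserved per task, an idle task holds no buffer space, and the hash-table index lets any task's data occupy any free slot; the credit counter is what keeps the pool from overflowing when many tasks are active at once.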
History
  • Received: 2025-03-04
  • Last revised: 2025-05-20
  • Accepted: 2025-05-22