Abstract: To further optimize NIC (network interface card)-based hardware offloading of collective communication in the "Tianhe" network, and to support more types of collective communication algorithms and larger message sizes, this study investigated the order-preserving triggering mechanism and data buffering methods for collective communication hardware offloading. An order-preserving triggering mechanism for concurrent multitasking was proposed. It meets the expected semantics of collective communication and ensures that floating-point computation results are reproducible. A dynamic network data buffering method based on hash tables and pulsed credit flow control was also proposed. It eases the conflict between limited hardware buffering resources and the heavy demand for buffering large amounts of network data from concurrent tasks. Experimental results show that, compared with software-based collective communication operations, the proposed method supports hardware offloading of various algorithms for several typical collective communication operations and delivers significant performance improvements. Meanwhile, its hardware implementation cost is low, and it makes particularly efficient use of buffering resources.