NIC-based offloading mechanism supporting reduction operation on high-speed interconnection system
Affiliation: College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China

CLC Number: TN95

    Abstract:

    Collective communication is widely used in high-performance computing research and engineering. In large-scale scientific and engineering computing, collective communication accounts for a large share of the overhead, sometimes as much as 80% of the total messaging overhead, which makes it a performance bottleneck of high-performance computing systems. This paper proposes a NIC-based offloading mechanism that supports reduction operations. By embedding reduction logic in the NIC (network interface card), data are computed on while in transit, which reduces both the burden on the CPU and the communication delay. A 16-node protocol operation experiment was carried out on an FPGA (field-programmable gate array) platform, and protocol operation at different node scales was simulated with the xNetSimPlus simulator. The experiments show that the method effectively reduces protocol operation time in collective communication, and that the proposed NIC offloading mechanism with hardware reduction offload accelerates all-reduce operations by up to 2.71 times.
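The idea of reducing data while it is in transit can be illustrated with a small software simulation. The sketch below is not the paper's hardware design or protocol: the function name `tree_allreduce`, the `fanout` parameter, and the k-ary tree layout are all assumptions made for illustration. Each tree level stands in for one hop at which a NIC would combine the partial results arriving from its children before forwarding a single message upward.

```python
from operator import add
from typing import Callable, List, Sequence, Tuple


def tree_allreduce(
    values: Sequence[float],
    op: Callable[[float, float], float] = add,
    fanout: int = 2,
) -> Tuple[List[float], int]:
    """Simulate an all-reduce over a k-ary tree of nodes (hypothetical model).

    Returns the value every node holds after the broadcast phase, plus the
    number of reduce levels, i.e. hops at which partial results are combined.
    """
    if not values:
        raise ValueError("need at least one node")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")

    level = list(values)
    levels = 0
    # Reduce phase: each group of up to `fanout` partial results is combined
    # into one value, as an in-transit reduction unit would do per hop.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            acc = group[0]
            for v in group[1:]:
                acc = op(acc, v)
            next_level.append(acc)
        level = next_level
        levels += 1

    # Broadcast phase: every node receives the same final value.
    return [level[0]] * len(values), levels


if __name__ == "__main__":
    result, levels = tree_allreduce(list(range(1, 17)))
    print(result[0], levels)
```

For 16 nodes holding the values 1 to 16, every node ends with the sum 136 after 4 reduce levels at fanout 2, or 2 levels at fanout 4. The model counts combining steps only; it says nothing about the latencies measured in the paper.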

History
  • Received: September 01, 2020
  • Online: September 28, 2022
  • Published: October 28, 2022