Optimizing operator computation of MiniGo on a high-performance heterogeneous accelerator
Affiliation: 1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; 2. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China

CLC Number: TP391

Abstract:

An efficient parallel computing method was proposed based on the characteristics of a high-performance heterogeneous accelerator and the training mode of MiniGo. On-chip computing resources were planned rationally to achieve pipelined parallelism between the heterogeneous devices. Shared-memory programming was designed to exploit the storage segments shared between the heterogeneous devices, reducing data-transmission costs. According to the multiple computing resources in a digital signal processor (DSP) cluster, combined with the compute and memory-access characteristics of the operators, different optimization strategies were designed for different operators. The method also provides an easy-to-use, high-performance operator library for TensorFlow. Experimental results show that the method realizes multi-core parallel computing of operators: convolution achieves a speedup of 24.69 over a single core, and, compared with the cropped version of MiniGo running on an 8-core FT2000+ CPU, training and self-play execution achieve speedups of 3.83 and 1.5, respectively.
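The operator library is exposed to TensorFlow through its standard custom-operator mechanism. The sketch below shows how such a library would typically be loaded and invoked from Python; the shared-object name libdsp_ops.so and the op name dsp_conv2d are illustrative assumptions, not names taken from the paper.

    import tensorflow as tf

    # Load the accelerator operator library as a TensorFlow custom-op plugin.
    # "libdsp_ops.so" is a hypothetical name for the compiled operator library.
    dsp_ops = tf.load_op_library("./libdsp_ops.so")

    # MiniGo input: a 19x19 board with 17 feature planes (AlphaGo Zero encoding).
    x = tf.random.normal([1, 19, 19, 17])
    w = tf.random.normal([3, 3, 17, 256])  # 3x3 kernels, 256 output channels

    # Dispatch the convolution to the DSP cluster rather than the default
    # CPU kernel. "dsp_conv2d" is an assumed op name exported by the library.
    y = dsp_ops.dsp_conv2d(x, w)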

History
  • Received: December 15, 2022
  • Online: January 28, 2024
  • Published: February 28, 2024