Abstract: An efficient multi-level parallel training method was proposed for training MiniGo agents on large-scale heterogeneous computing platforms, comprising task-level parallelism between nodes, CPU-DSP (central processing unit-digital signal processor) heterogeneous parallelism, and DSP core-level parallelism. Efficient input/output deployment was realized and the network communication bottleneck was eliminated. A heterogeneous memory management scheme oriented to the CPU-DSP shared memory architecture was proposed to reduce data transfer between heterogeneous devices. Shared memory programming was optimized, and the dense convolution operator was accelerated on the DSP. Results show that, compared with 16-core CPU computation, the maximum speedup of single-core DSP operator acceleration is 16.44. With this method, the number of computing nodes is scaled from 1 067 to 4 139, the time required to reach the given termination condition is reduced from 43.02 h to 16.05 h, and the scaling efficiency is 69.1%. Evaluation shows that this method enables efficient parallel training of MiniGo on large-scale heterogeneous computing platforms.
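For context, the reported scaling efficiency is consistent with the ratio of the achieved speedup to the growth in node count; a minimal sketch of this calculation from the figures quoted above, assuming scaling efficiency is defined as speedup divided by the node-count ratio, is:
\[
  E \;=\; \frac{T_{1}/T_{2}}{N_{2}/N_{1}}
    \;=\; \frac{43.02\,\mathrm{h} / 16.05\,\mathrm{h}}{4\,139 / 1\,067}
    \;\approx\; \frac{2.68}{3.88}
    \;\approx\; 0.691 \;=\; 69.1\%
\]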