Cite this article:
李荣春, 贺周雨, 乔鹏, 等. 面向大规模异构计算平台的MiniGo高效训练方法[J]. 国防科技大学学报, 2024, 46(5): 209-218.
LI Rongchun, HE Zhouyu, QIAO Peng, et al. High efficient training method of MiniGo on large-scale heterogeneous computing platform[J]. Journal of National University of Defense Technology, 2024, 46(5): 209-218.
DOI: 10.11887/j.cn.202405022
Received: 2022-06-27
Funding: National Natural Science Foundation of China (61902415)

High efficient training method of MiniGo on large-scale heterogeneous computing platform
LI Rongchun, HE Zhouyu, QIAO Peng, JIANG Jingfei, DOU Yong, LI Dongsheng
(National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China)

Abstract:
An efficient multi-level parallel training method for MiniGo agents on large-scale heterogeneous computing platforms was proposed, comprising task-level parallelism between nodes, CPU-DSP (central processing unit-digital signal processor) heterogeneous parallelism, and intra-core parallelism on the DSP. An efficient input/output deployment was implemented to eliminate the network communication bottleneck. A heterogeneous-computing memory management scheme oriented to the CPU-DSP shared-memory structure was proposed to reduce data movement between the heterogeneous devices. Shared-memory programming optimization was implemented, and the dense convolution operators were accelerated on the DSP. Results show that, compared with computation on 16 CPU cores, the maximum speedup of the single-core DSP operator acceleration reaches 16.44. With this method, the number of computing nodes is scaled from 1 067 to 4 139, the time required to reach the given termination condition is reduced from 43.02 h to 16.05 h, and the scaling efficiency is 69.1%. The evaluation shows that the proposed method enables efficient parallel training of MiniGo on large-scale heterogeneous computing platforms.
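To make the task-level parallelism between nodes more concrete, the following minimal C/MPI sketch shows one common way such a split can be organized: one rank runs the training task while the remaining ranks generate self-play games. This is only an illustration under assumptions, not code from the paper; run_selfplay and run_training are hypothetical placeholders, and the paper does not state that MPI is the mechanism used.

    /* Minimal illustrative sketch (assumptions, not the paper's code):
     * task-level parallelism between nodes, with rank 0 training and all
     * other ranks producing self-play games. */
    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical placeholders for the real self-play and training routines. */
    static void run_selfplay(int rank)        { printf("rank %d: self-play worker\n", rank); }
    static void run_training(int num_workers) { printf("rank 0: training, %d self-play workers\n", num_workers); }

    int main(int argc, char **argv)
    {
        int rank = 0, size = 1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
            run_training(size - 1);   /* one training task */
        else
            run_selfplay(rank);       /* remaining ranks generate games in parallel */

        MPI_Finalize();
        return 0;
    }

How self-play samples are transported back to the training task, and the ratio of self-play to training ranks, are where the input/output deployment described in the abstract would matter; the sketch shows only the task split.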
Keywords: MiniGo; large-scale heterogeneous computing platform; DSP
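The CPU-DSP shared-memory management mentioned in the abstract can likewise be pictured with a minimal C sketch. This is an assumption-laden illustration rather than the paper's API: shared_alloc and dsp_conv2d stand in for the platform's real heterogeneous runtime calls. The point illustrated is that, on a chip where the CPU and DSP share physical memory, tensors can be allocated once and handed to the DSP kernel by pointer, so no host-to-device copies are needed.

    /* Minimal illustrative sketch (hypothetical API, not the paper's code). */
    #include <stdlib.h>
    #include <string.h>

    /* Stand-ins: shared_alloc() would return memory visible to both CPU and DSP;
     * dsp_conv2d() would launch the dense-convolution kernel on the DSP cores. */
    static void *shared_alloc(size_t bytes) { return malloc(bytes); }
    static void dsp_conv2d(const float *in, const float *w, float *out, size_t n)
    {
        (void)in; (void)w;
        memset(out, 0, n * sizeof(float));   /* placeholder for the DSP kernel */
    }

    int main(void)
    {
        const size_t n = 1u << 20;

        /* Allocate once in the CPU-DSP shared region; both sides use the same buffers. */
        float *input   = shared_alloc(n * sizeof(float));
        float *weights = shared_alloc(n * sizeof(float));
        float *output  = shared_alloc(n * sizeof(float));

        memset(input,   0, n * sizeof(float));   /* CPU prepares data in place */
        memset(weights, 0, n * sizeof(float));

        dsp_conv2d(input, weights, output, n);   /* DSP reads/writes the same memory */

        free(input);
        free(weights);
        free(output);
        return 0;
    }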