Cite this article: WU Linping, JING Cuiping, LIU Xu, et al. Diagnostic methods for communication waiting in MPI parallel programs and applications[J]. Journal of National University of Defense Technology, 2020, 42(2): 47-54.
Diagnostic methods for communication waiting in MPI parallel programs and applications
WU Linping, JING Cuiping, LIU Xu, TIAN Hongyun
(Institute of Applied Physics and Computational Mathematics, Beijing 100094, China)

DOI: 10.11887/j.cn.202002006
Received: 2019-09-20
Funding: National Key Research and Development Program of China (2018YFB0204003); National Natural Science Foundation of China (61672003); National Natural Science Foundation of China, Young Scientists Fund (11601034)

Abstract:
As the scale of parallelism grows, existing methods for diagnosing communication waiting suffer from large memory overhead and large measurement time overhead. Based on an in-depth analysis of the existing diagnostic methods, and taking into account the practical need for controllable measurement overhead, a diagnosis model for communication waiting based on hotspot functions was established. From this model, a more concise and more practical diagnostic method was derived. The method was applied to diagnosing communication waiting in large-scale MPI parallel programs, including the two-dimensional LARED integration program, LARED-S, and LAP3D. The results show that the method can accurately locate the key code segments that cause communication waiting, and that the optimization schemes and the estimated room for performance improvement it provides are a useful reference for subsequent program improvement. After optimization guided by the diagnostic results, the performance of the LARED-S program improved by 32% and its communication waiting time fell by 44%.
Keywords: communication waiting; MPI parallel programs; load balance; performance diagnosis