Cite this article: TIAN Hongyun, WU Linping, DONG Yong, et al. User-level parallel I/O configuration optimize strategy toward large-scale cluster[J]. Journal of National University of Defense Technology, 2020, 42(2): 23-30.
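User-level parallel I/O configuration optimization strategy for large-scale clusters
TIAN Hongyun1, WU Linping1, DONG Yong2, JING Cuiping1, LUO Hongbing1, MO Zeyao1
(1. Institute of Applied Physics and Computational Mathematics, Beijing 100094, China; 2. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)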
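Abstract:
The key factors affecting application I/O performance fall into three levels: the implementation of the application's I/O interface, the performance of the architecture and file system components, and the application's I/O parameter configuration. From the perspective of application I/O configuration optimization, this paper analyzes the configuration tuning space of parallel I/O on large-scale clusters and, on that basis, presents a method for testing and analyzing the parallel I/O performance characteristics of large-scale clusters. Using this method, a series of I/O tests and analyses was carried out on a domestic supercomputing cluster to characterize the system's I/O performance, which in turn guides the I/O configuration optimization of parallel applications. With the optimized configuration parameters, in two typical parallel I/O scenarios for a class of production applications, the restart-data write time at 8192 processes fell by 15%, and the job loading time of a 4096-core program was shortened from 10 min to 5 s.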
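Keywords: parallel I/O optimization strategy; Lustre file system; large-scale cluster; transfer size; stripe count
DOI: 10.11887/j.cn.202002003
Received: 2019-09-19
Funding: National Key Research and Development Program of China (2018YFB0204003)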
|
User-level parallel I/O configuration optimize strategy toward large-scale cluster
TIAN Hongyun1, WU Linping1, DONG Yong2, JING Cuiping1, LUO Hongbing1, MO Zeyao1
(1. Institute of Applied Physics and Computational Mathematics, Beijing 100094, China; 2. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
|
Abstract:
Three key factors strongly influence an application's I/O performance: the I/O programming interface, the performance characteristics of the I/O subsystem (both architecture and system software), and the user-level I/O configuration parameters. From the user's perspective, this paper discusses the user-level parallel I/O configuration optimization space for large-scale clusters and proposes a method for testing and analyzing the I/O characteristics of such clusters. Based on this method, an I/O performance portrait of a domestic supercomputer was built and several user-level parallel I/O optimization suggestions were put forward. With these carefully selected I/O configuration parameters, the restart-data write time was cut by 15% at 8192 processes in a real application environment, and the program's job loading time was shortened from 10 minutes to 5 seconds at 4096 processes.
Keywords: parallel I/O optimization strategy; Lustre file system; large-scale cluster; transfer size; stripe count
|
|
|
|
|