Cite this article: CAO Wenbin, LI Hua, XIE Wenjia, et al. Parallel computation of compressible turbulence using multi-GPU clusters[J]. Journal of National University of Defense Technology, 2015, 37(3): 78-83.
DOI: 10.11887/j.cn.201503013
Submitted: 2014-10-07
Funding: National Natural Science Foundation of China (91016010, 91216117)
|
Parallel computation of compressible turbulence using multi-GPU clusters |
CAO Wenbin, LI Hua, XIE Wenjia, ZHANG Ran |
(College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China)
|
Abstract: |
A computational fluid dynamics solver for compressible turbulence simulations on the GPU (graphics processing unit) was developed with CUDA Fortran. The solver uses a structured-grid finite volume method with the AUSMPW+ scheme for spatial discretization, the two-equation k-ω SST turbulence model, and MPI for parallel computing. For the latest GPU architecture, optimization strategies for the flux computation and a multi-GPU parallel algorithm that overlaps PCIe data transfer and MPI communication with GPU computation are discussed. Several test cases, such as a supersonic inlet and a space shuttle, were simulated to demonstrate the acceleration performance of the GPU on large grids. The results show that a single NVIDIA GTX Titan Black GPU achieves a speedup of 107 to 125 over a single core of an Intel Xeon E5-2670 CPU. Fast computation of a complex configuration with 134 million cells was achieved with four GPUs at a parallel efficiency of 91.6%.
Keywords: CUDA; graphics processing unit; turbulence; parallel computing; computational fluid dynamics
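
The overlap of GPU computation with PCIe data transfer and MPI communication mentioned in the abstract follows a common multi-GPU pattern. The paper's solver is written in CUDA Fortran and its code is not reproduced here; the sketch below is a minimal, hypothetical CUDA C++ illustration of that general pattern, with every name (updateCells, nGhost, the periodic 1-D decomposition) invented for this example.

// Hypothetical CUDA C++ sketch of the compute/transfer/communication overlap pattern;
// the actual solver described in the paper is written in CUDA Fortran and differs in detail.
#include <mpi.h>
#include <cuda_runtime.h>

// Stand-in kernel for the per-cell update; a real solver would evaluate AUSMPW+ fluxes
// and advance the conserved variables here.
__global__ void updateCells(double* q, int begin, int end)
{
    int i = begin + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < end) q[i] += 1.0e-3 * q[i];
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nGhost = 1024;            // ghost-cell layer exchanged with the neighbour
    const int nOwned = 1 << 20;         // cells owned by this rank (illustrative size)
    const int nTotal = nGhost + nOwned; // layout: [ghost | owned]

    double* d_q;
    cudaMalloc((void**)&d_q, nTotal * sizeof(double));
    cudaMemset(d_q, 0, nTotal * sizeof(double));

    // Pinned host buffers are required for the copies to run asynchronously over PCIe.
    double *h_send, *h_recv;
    cudaMallocHost((void**)&h_send, nGhost * sizeof(double));
    cudaMallocHost((void**)&h_recv, nGhost * sizeof(double));

    cudaStream_t interior, boundary;
    cudaStreamCreate(&interior);
    cudaStreamCreate(&boundary);

    // Periodic 1-D decomposition: send the last owned layer to the right neighbour,
    // receive the left neighbour's layer into the ghost region.
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    for (int step = 0; step < 10; ++step) {
        // 1) Update interior cells (those that do not touch ghost data) on their own stream.
        updateCells<<<(nOwned - nGhost + 255) / 256, 256, 0, interior>>>(d_q, 2 * nGhost, nTotal);

        // 2) Meanwhile, stage the outgoing boundary layer to the host over PCIe.
        cudaMemcpyAsync(h_send, d_q + nTotal - nGhost, nGhost * sizeof(double),
                        cudaMemcpyDeviceToHost, boundary);
        cudaStreamSynchronize(boundary);

        // 3) Exchange halos through MPI while the interior kernel is still running.
        MPI_Sendrecv(h_send, nGhost, MPI_DOUBLE, right, 0,
                     h_recv, nGhost, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // 4) Push the received halo back to the device and update the boundary cells.
        cudaMemcpyAsync(d_q, h_recv, nGhost * sizeof(double),
                        cudaMemcpyHostToDevice, boundary);
        updateCells<<<(nGhost + 255) / 256, 256, 0, boundary>>>(d_q, nGhost, 2 * nGhost);

        cudaDeviceSynchronize();  // both streams done before the next time step
    }

    cudaFreeHost(h_send);
    cudaFreeHost(h_recv);
    cudaFree(d_q);
    cudaStreamDestroy(interior);
    cudaStreamDestroy(boundary);
    MPI_Finalize();
    return 0;
}

In this pattern the interior update hides the cost of the halo traffic; the pinned buffers and the second stream are what allow the PCIe copies to proceed while the interior kernel occupies the GPU. Whether the actual solver stages data through the host in exactly this way is not stated in the abstract.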
|
|
|
|
|