符合粒子输运模拟的专用加速器体系结构

2025,47(2):155-164
张建民
国防科技大学 计算机学院, 湖南 长沙 410073
刘津津
国防科技大学 计算机学院, 湖南 长沙 410073
许炜康
国防科技大学 计算机学院, 湖南 长沙 410073
黎铁军
国防科技大学 计算机学院, 湖南 长沙 410073
摘要:
粒子输运模拟是高性能计算机的主要应用,对于其日益增长的计算规模需求,通用微处理器由于其单核结构复杂,无法适应程序特征,难以获得较高的性能功耗比。因此,对求解粒子输运非确定性数值模拟的程序特征进行提取与分析;基于算法特征,对开源微处理器内核架构进行定制设计,包括加速器流水线结构、分支预测部件、多级Cache层次与主存设计,构建一种符合粒子输运程序特征的专用加速器体系结构。在业界通用体系结构模拟器上运行粒子输运程序的模拟结果表明,与ARM Cortex-A15相比,所提出的专用加速器体系结构在同等功耗下可获得4.6倍的性能提升,在同等面积下可获得3.2倍的性能提升。
基金项目:
国家重点研发计划资助项目(2022YFB2803405);国家自然科学基金资助项目(62072464,U19A2062)

Specific accelerator architecture conforming to particle transport simulation

ZHANG Jianmin
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
LIU Jinjin
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
XU Weikang
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
LI Tiejun
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
Abstract:
Particle transport simulation is one of the main applications of high-performance computers. As its computational demands grow, general-purpose microprocessors struggle to achieve a high performance-to-power ratio, because their complex single-core structure is poorly matched to the features of particle transport programs. Therefore, the program features of non-deterministic numerical simulation for particle transport were extracted and analyzed. Based on the characteristics of the algorithm, the architecture of an open-source microprocessor core was customized, including the accelerator pipeline structure, the branch prediction unit, the multi-level cache hierarchy, and the main memory design, to build a specific accelerator architecture that conforms to the features of particle transport programs. Simulation results from running the particle transport program on a widely used architecture simulator show that, compared with the ARM Cortex-A15, the proposed specific accelerator architecture achieves a 4.6 times performance improvement at the same power consumption and a 3.2 times performance improvement at the same area.