Optimization of the Yinyang K-means algorithm for many-core processors

2024,46(1):93-102
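ZHOU Tianyang (周天阳)
College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073, China, zhoutianyang@nudt.edu.cn
WANG Qinglin (王庆林)
College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073, China
LI Rongchun (李荣春)
College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073, China
MEI Songzhu (梅松竹)
College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073, China
YIN Shangfei (尹尚飞)
College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073, China
HAO Ruochen (郝若晨)
College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073, China
LIU Jie (刘杰)
College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073, China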
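Abstract:
The traditional Yinyang K-means algorithm is computationally expensive on large-scale clustering problems. Targeting the architectural characteristics of typical many-core processors, an efficient parallel implementation of the Yinyang K-means algorithm is proposed. The implementation is based on a new in-memory data layout, uses the vector units of many-core processors to accelerate the distance computations in Yinyang K-means, and applies targeted memory access optimizations for non-uniform memory access (NUMA) characteristics. Compared with the open-source multi-threaded implementation of the Yinyang K-means algorithm, it achieves speedups of up to about 5.6 and 8.7 on ARMv8 and x86 many-core platforms, respectively. These optimizations therefore successfully accelerate the Yinyang K-means algorithm on many-core processors.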
Funding:
National Natural Science Foundation of China (62002365)

Optimizing Yinyang K-means algorithm on many-core CPUs

ZHOU Tianyang
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China, zhoutianyang@nudt.edu.cn
WANG Qinglin
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
LI Rongchun
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
MEI Songzhu
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
YIN Shangfei
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
HAO Ruochen
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
LIU Jie
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China;
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China
Abstract:
The traditional Yinyang K-means algorithm is computationally expensive when dealing with large-scale clustering problems. An efficient parallel implementation of the Yinyang K-means algorithm was proposed on the basis of the architectural characteristics of typical many-core CPUs. The implementation was based on a new memory data layout, used the vector units of many-core CPUs to accelerate distance calculation in Yinyang K-means, and applied memory access optimizations targeted at NUMA (non-uniform memory access) characteristics. Compared with the open-source multi-threaded version of the Yinyang K-means algorithm, this implementation achieved speedups of up to approximately 5.6 and 8.7 on ARMv8 and x86 many-core CPUs, respectively. Experiments show that these optimizations successfully accelerate the Yinyang K-means algorithm on many-core CPUs.
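The abstract does not specify the new memory data layout, so the following is only a minimal sketch of one common option: a dimension-major (structure-of-arrays) arrangement, in which the values of one dimension are stored contiguously across points. The function names `to_dim_major`, `sq_dists_dim_major`, and `sq_dists_row_major` are hypothetical and are not taken from the paper. Pure Python does not use vector units itself; the sketch only shows the access pattern that SIMD code could process several points at a time.

```python
def to_dim_major(points):
    """Transpose row-major points [n][d] into dimension-major columns [d][n]."""
    return [list(col) for col in zip(*points)]


def sq_dists_dim_major(cols, centroid):
    """Squared Euclidean distance from every point to one centroid.

    The inner loop walks contiguous values of a single dimension across
    points; this is the loop a SIMD unit could execute several lanes at a time.
    (Assumed layout for illustration, not the paper's actual scheme.)
    """
    n = len(cols[0])
    acc = [0.0] * n
    for dim_values, c in zip(cols, centroid):
        for i in range(n):  # lane loop: one iteration per point
            diff = dim_values[i] - c
            acc[i] += diff * diff
    return acc


def sq_dists_row_major(points, centroid):
    """Reference version: one point at a time over the row-major layout."""
    return [sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in points]


points = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [6.0, 8.0]]
centroid = [0.0, 0.0]
cols = to_dim_major(points)
dists = sq_dists_dim_major(cols, centroid)
```

Both functions return the same distances; only the traversal order differs. In a compiled implementation, the lane loop is the part that would typically be mapped to NEON, SVE, or AVX instructions.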
Received:
2022-09-06