Abstract: In the inference of mixture-of-experts (MoE) models, matrix operators constitute the primary performance bottleneck, with those in the attention module and in expert computation being particularly time-consuming. Although existing approaches have extensively optimized matrix operators on GPUs, the substantial architectural differences between GPUs and CPUs in memory hierarchy and compute units make these optimizations difficult to transfer directly to CPU platforms. To address this limitation, FlashMatrix is introduced as a matrix-operator optimization scheme tailored for CPUs equipped with Intel Advanced Matrix Extensions (AMX). FlashMatrix incorporates an efficient data-layout transformation strategy that avoids the additional memory-access overhead of layout conversion, and employs a carefully designed matrix-multiplication micro-kernel that achieves an optimal compute-to-memory ratio through effective register reuse. Experimental results show that, compared with oneDNN, the state-of-the-art CPU matrix-computation library, FlashMatrix delivers an average 2.5× speedup on matrix operators, and improves end-to-end inference performance by approximately 1.2×.
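As background for the compute-to-memory claim, the following is the standard arithmetic-intensity analysis for a register-blocked micro-kernel (a generic derivation, not taken from the paper; the tile sizes $m$ and $n$ are illustrative symbols):

\[
I(m, n) \;=\; \frac{\text{flops}}{\text{loads}} \;=\; \frac{2\,m\,n\,k}{(m + n)\,k} \;=\; \frac{2\,m\,n}{m + n}
\]

Here an $m \times n$ output tile is held in registers while accumulating over the shared dimension $k$: each accumulation step loads $m + n$ input elements and performs $2\,m\,n$ multiply-add operations. Since $I(m,n)$ grows with the tile dimensions, a micro-kernel that maximizes the accumulator tile kept in registers (subject to register-file or AMX tile capacity) maximizes compute per byte moved, which is the sense in which register reuse drives the compute-to-memory ratio toward its optimum.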