Cite this article: HE Yuanhong, JIANG Jingfei, XU Jinwei. Quantization and pruning optimization method for attention mechanism[J]. Journal of National University of Defense Technology, 2024, 46(1): 113-120.
Quantization and pruning optimization method for attention mechanism
HE Yuanhong1,2, JIANG Jingfei1,2, XU Jinwei1,2
(1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; 2. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China)

Abstract:
To address the huge computation and memory-access overhead of attention-based models, model compression techniques that jointly optimize quantization and pruning were studied. A symmetric linear fixed-point quantization method was proposed for the four activation matrices in the attention mechanism: query, key, value and probability. In addition, a probability-matrix pruning method and a progressive pruning strategy were proposed to effectively reduce the accuracy loss caused by pruning. Experimental results on different datasets show that, for BERT, a typical attention-based model, the proposed method achieves 4-bit or 8-bit fixed-point quantization and a sparsity of 0.93 to 0.98 with little or no accuracy loss, which greatly reduces the amount of model computation and lays a solid foundation for accelerating the inference of quantized sparse models.
Keywords: natural language processing; attention mechanism; quantization; pruning
DOI: 10.11887/j.cn.202401012
Received: 2022-10-17
Funding: Key Laboratory Stable Support key funding project (WDZC20215250103)
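Note: this page carries only the abstract, with no code. As an illustrative aid, the minimal NumPy sketch below shows what symmetric linear fixed-point quantization of an attention activation matrix (query, key, value or probability) could look like; the per-tensor scaling, the function names symmetric_quantize and dequantize, and the 4-bit example setting are assumptions made for illustration, not the paper's implementation.

import numpy as np

def symmetric_quantize(x, num_bits=8):
    # Symmetric linear fixed-point quantization (illustrative sketch):
    # map a real-valued activation tensor to signed integers in
    # [-(2**(num_bits-1)-1), 2**(num_bits-1)-1] using one per-tensor scale.
    qmax = 2 ** (num_bits - 1) - 1                # 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(x)) / qmax + 1e-12      # per-tensor scale (assumption)
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate real-valued tensor from its fixed-point form.
    return q.astype(np.float32) * scale

# Example: quantize the query activation matrix of one attention head to 4 bits.
rng = np.random.default_rng(0)
query = rng.standard_normal((128, 64)).astype(np.float32)
q_int, q_scale = symmetric_quantize(query, num_bits=4)
error = np.abs(query - dequantize(q_int, q_scale)).max()
print(f"max 4-bit quantization error: {error:.4f}")
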
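Similarly, the abstract mentions probability-matrix pruning with a progressive pruning strategy but does not specify the pruning criterion or the schedule. The sketch below assumes magnitude-based pruning of the softmax probability matrix and a linearly ramped target sparsity; the names prune_probability_matrix and progressive_sparsity and the linear schedule are illustrative assumptions, not the method described in the paper.

import numpy as np

def prune_probability_matrix(p, sparsity):
    # Zero out the smallest attention probabilities so that roughly the given
    # fraction of entries becomes zero (magnitude-based pruning, an assumption).
    k = int(round(sparsity * p.size))
    if k <= 0:
        return p
    threshold = np.partition(p.ravel(), k - 1)[k - 1]   # k-th smallest value
    return np.where(p <= threshold, 0.0, p)

def progressive_sparsity(step, total_steps, start=0.0, end=0.95):
    # Progressive pruning schedule: ramp the target sparsity linearly from
    # `start` to `end` over training (the linear shape is an assumption).
    t = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * t

# Example: prune the attention probability matrix of one head late in training.
rng = np.random.default_rng(1)
logits = rng.standard_normal((128, 128))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # row-wise softmax
target = progressive_sparsity(step=900, total_steps=1000, end=0.95)
pruned = prune_probability_matrix(probs, target)
print(f"target sparsity {target:.2f}, achieved {np.mean(pruned == 0):.2f}")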