Quantization and pruning optimization method for attention mechanism
Author: HE Yuanhong, JIANG Jingfei, XU Jinwei
Affiliation: 1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; 2. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China

CLC Number: TP18

Abstract:

To address the significant computation and memory overhead of attention-based models, model compression techniques, specifically the collaborative optimization of quantization and pruning, were studied. A symmetric linear fixed-point quantization method was proposed for the four activation matrices of the attention mechanism: query, key, value, and probability. Meanwhile, a probability-matrix pruning method and a progressive pruning strategy were proposed to effectively reduce the accuracy loss caused by pruning. Experimental results on different datasets show that, for the typical attention-based model BERT, the proposed optimization method achieves 4-bit or 8-bit fixed-point quantization and 0.93 to 0.98 sparsity with little or no accuracy loss, which greatly reduces the model computation and lays a strong foundation for accelerating the inference of quantized sparse models.
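The abstract describes two techniques: symmetric linear fixed-point quantization of the query, key, value, and probability activations, and pruning of the probability matrix under a progressive sparsity schedule. The NumPy sketch below only illustrates how such a pipeline could fit together; the bit width, the cubic ramp schedule, the per-tensor scaling, the magnitude-based pruning criterion, and all function names are illustrative assumptions, not the authors' implementation.

import numpy as np

def symmetric_quantize(x, num_bits=8):
    """Symmetric linear fixed-point quantization: map x onto signed integers
    in [-(2^(b-1) - 1), 2^(b-1) - 1] with a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale  # dequantize as q * scale

def prune_probability(p, sparsity):
    """Zero the smallest attention probabilities (those at or below the
    k-th smallest value) and renormalize each row."""
    k = int(sparsity * p.size)
    if k > 0:
        thresh = np.partition(p.ravel(), k - 1)[k - 1]
        p = np.where(p <= thresh, 0.0, p)
    return p / np.maximum(p.sum(axis=-1, keepdims=True), 1e-9)

def progressive_sparsity(step, total_steps, final_sparsity=0.95):
    """Progressive pruning schedule: ramp sparsity from 0 to the final
    target over training (cubic ramp, a common heuristic, assumed here)."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)

# Toy single-head attention step using quantized Q, K, V and a pruned,
# quantized probability matrix.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
(Qq, sq), (Kq, sk), (Vq, sv) = (symmetric_quantize(m, 8) for m in (Q, K, V))
scores = (Qq * sq) @ (Kq * sk).T / np.sqrt(8.0)
P = np.exp(scores - scores.max(axis=-1, keepdims=True))
P /= P.sum(axis=-1, keepdims=True)
P = prune_probability(P, progressive_sparsity(step=800, total_steps=1000))
Pq, sp = symmetric_quantize(P, 8)   # quantize the pruned probability matrix
out = (Pq * sp) @ (Vq * sv)         # attention output from quantized activations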

Get Citation

HE Yuanhong, JIANG Jingfei, XU Jinwei. Quantization and pruning optimization method for attention mechanism[J]. Journal of National University of Defense Technology, 2024, 46(1): 113-120.

History
  • Received: October 17, 2022
  • Online: January 28, 2024
  • Published: February 28, 2024