High-efficiency data loading and output buffering strategy for sparse convolutional computing

(College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China)

Author biographies: LIU Biao (b. 1998), male, from Loudi, Hunan Province, master's student, E-mail: liubiao_nudt@163.com; YU Hongqi (corresponding author), male, from Kaifeng, Henan Province, associate professor, Ph.D., master's supervisor, E-mail: 13755132901@163.com

CLC number: TN492

Funding: National Natural Science Foundation of China (61804181, 62074166); National Key R&D Program of China (2019YFB2205102)



    Abstract:

    Aiming at the problems of inefficient data loading, low utilization of multiply-accumulate resources, and complex output buffer addressing logic in existing neural network accelerators when processing sparse neural networks, a high-efficiency data loading and output buffering strategy for sparse convolutional computing is proposed. The strategy performs an all-to-all multiply-accumulate operation on the non-zero input feature map data and the non-zero weights belonging to the same input channel, which reduces the difficulty of pairing non-zero data and improves the utilization of multiply-accumulate resources. By adopting input-stationary computation and dense cyclic loading of feature map data, it greatly reduces the number of off-chip data fetches. It also optimizes the output buffer design, eliminating the output buffer address access contention and storage congestion found in existing solutions. Experimental results show that, compared with a fine-grained systolic accelerator of similar architecture, the processing element area is reduced by 21.45%, the data loading speed is increased by 117.71% on average, and the average multiplier utilization is increased by 11.25%, reaching 89%.
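    To make the all-to-all pairing concrete, the following is a minimal Python sketch of a per-channel all-to-all sparse convolution, assuming standard cross-correlation with valid padding over a dense array holding zeros. The function name, data layout, and scalar loops are illustrative only; they are not the accelerator's actual datapath or buffer design.

        import numpy as np

        def sparse_conv2d_all_to_all(x, w, stride=1):
            """All-to-all sparse convolution sketch (cross-correlation, valid padding).

            x: input feature map, shape (C_in, H, W), stored dense with zeros
            w: weights, shape (C_out, C_in, K, K)
            """
            c_in, h, w_in = x.shape
            c_out, _, k, _ = w.shape
            h_out = (h - k) // stride + 1
            w_out = (w_in - k) // stride + 1
            y = np.zeros((c_out, h_out, w_out))

            # The outer loop keeps one input channel's activations resident while
            # all of their output contributions are produced, loosely mirroring an
            # input-stationary dataflow.
            for c in range(c_in):
                acts = [(i, j, x[c, i, j])
                        for i in range(h) for j in range(w_in) if x[c, i, j] != 0]
                wts = [(co, ki, kj, w[co, c, ki, kj])
                       for co in range(c_out) for ki in range(k) for kj in range(k)
                       if w[co, c, ki, kj] != 0]
                # All-to-all pairing: every non-zero activation is multiplied with
                # every non-zero weight of the same channel, so no sparse index
                # matching between the two operand streams is needed.
                for i, j, a in acts:
                    for co, ki, kj, wv in wts:
                        oi, oj = i - ki, j - kj
                        if oi % stride or oj % stride:
                            continue  # pair does not land on an output position
                        oi //= stride
                        oj //= stride
                        if 0 <= oi < h_out and 0 <= oj < w_out:
                            y[co, oi, oj] += a * wv  # scatter-accumulate into output
            return y

    Because each activation-weight product is independent, the pairs can be streamed to a multiplier array back-to-back regardless of where the zeros fall, which is the property behind the claimed multiplier utilization; the scatter-accumulate into y is the step where a hardware implementation must arbitrate output buffer addresses, i.e., the contention that the proposed buffer design targets.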

Cite this article:

LIU Biao, CHEN Changlin, ZHANG Yufei, et al. High-efficiency data loading and output buffering strategy for sparse convolutional computing[J]. Journal of National University of Defense Technology, 2023, 45(5): 212-221.

History
  • Received: 2022-06-08
  • Published online: 2023-09-26
  • Published: 2023-10-28