Efficient RNN inference engine on very long vector processor
Affiliation:

(1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; 2. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China)

CLC Number:

TP391

    Abstract:

    As networks grow deeper and the sequences they process vary in length, optimizing the performance of recurrent neural networks (RNNs) on different processors is difficult for researchers. An efficient RNN acceleration engine was implemented for the self-developed long vector processor FT-M7032. The engine uses a row-first matrix-vector multiplication algorithm and a data-aware multi-core parallel method to improve the computational efficiency of matrix-vector multiplication, and a two-level kernel fusion optimization method to reduce the overhead of temporary data transmission. Optimized handwritten assembly code for multiple operators was integrated to further exploit the performance potential of the long vector processor. Experiments show that the RNN engine is efficient on long vector processors: compared with a multi-core ARM CPU and an Intel Golden CPU, the long short-term memory (LSTM) network, an RNN-like model, achieves speedups of up to 62.68 times and 3.12 times, respectively.
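
    The abstract names the techniques without detailing them, so the following is only a minimal illustrative sketch and not the engine's actual implementation. It assumes one possible reading of "row-first" (walking the weight matrix one contiguous row at a time with a scalar-times-vector accumulate, which maps naturally onto long vector lanes) and a simple form of fusion in which all gate activations and the state update happen in one pass. The function names, the `[i, f, g, o]` gate packing, and the weight layout are assumptions made for illustration.

```python
import math

def matvec_row_first(w_rows, x):
    """Compute y = x @ W by walking W one row at a time (assumed reading).

    w_rows[i] is row i of W (length = output size). Each step is a
    scalar-times-vector accumulate over a whole contiguous row, the kind
    of operation a long vector unit can spread across many lanes.
    """
    y = [0.0] * len(w_rows[0])
    for xi, row in zip(x, w_rows):
        for j, wij in enumerate(row):
            y[j] += xi * wij
    return y

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h, c, w_x, w_h, b):
    """One LSTM time step with the four gates packed as [i, f, g, o].

    w_x has len(x) rows and w_h has len(h) rows; every row has length 4*H,
    so a single row traversal produces the pre-activations of all gates.
    """
    n = len(h)
    zx = matvec_row_first(w_x, x)
    zh = matvec_row_first(w_h, h)
    h_new, c_new = [], []
    # Gate activations and the state update run in one pass over the
    # hidden units, so no per-gate temporary vectors are materialized.
    for k in range(n):
        i = sigmoid(zx[k] + zh[k] + b[k])
        f = sigmoid(zx[n + k] + zh[n + k] + b[n + k])
        g = math.tanh(zx[2 * n + k] + zh[2 * n + k] + b[2 * n + k])
        o = sigmoid(zx[3 * n + k] + zh[3 * n + k] + b[3 * n + k])
        ck = f * c[k] + i * g
        c_new.append(ck)
        h_new.append(o * math.tanh(ck))
    return h_new, c_new
```

    In this sketch, calling `lstm_step` once per sequence element and feeding the returned `h_new` and `c_new` back in handles sequences of any length. A multi-core version could split the 4*H output columns across cores, though how the engine actually partitions the work is not stated in the abstract.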

Get Citation

SU Huayou, CHEN Kangkang, YANG Qianming. Efficient RNN inference engine on very long vector processor[J]. Journal of National University of Defense Technology,2024,46(1):121-130.

History
  • Received: November 07, 2022
  • Online: January 28, 2024
  • Published: February 28, 2024