In-Memory Computing Architectures and Brain-Inspired Computing

With the continued maturation of neural network algorithms such as convolutional neural networks (CNN), deep neural networks (DNN), and recurrent neural networks (RNN), artificial intelligence (AI) has been widely adopted in application areas including autonomous driving, speech and image recognition, knowledge search, and semantic understanding. However, most current AI chips are still essentially built on the von Neumann architecture, which separates storage from computation: data must be frequently shuttled between memory units and processing units during computation, resulting in high latency and energy consumption. Developing high-performance artificial synapses is therefore a current research focus for breaking through the von Neumann bottleneck and realizing energy-efficient in-memory brain-inspired computing.

This special topic focuses on in-memory brain-inspired computing architectures, covering a range of in-memory computing chip technologies: mapping methods for matrix-vector multiplication on memristor architectures, calibration methods, neuromorphic computing approaches with in-situ in-memory training, memristor array structure design and integration processes, and multi-memristor-array interconnection structures. These studies not only provide advanced technical support and innovative solutions for energy-efficient memristor-based brain-inspired chips, but also highlight the problems and challenges that must be addressed before such chips reach practical application.
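The crossbar matrix-vector multiplication that underpins these chips can be illustrated with a minimal numerical sketch. The differential-pair mapping and the conductance limits `g_min`/`g_max` below are illustrative assumptions, not parameters from any of the papers; real designs differ in mapping scheme and device characteristics:

```python
import numpy as np

def map_weights(W, g_min=1e-6, g_max=1e-4):
    # Differential-pair mapping: each signed weight becomes a pair of
    # non-negative conductances (G+, G-) with W proportional to G+ - G-.
    # g_min/g_max are illustrative device limits, not real device data.
    scale = (g_max - g_min) / np.abs(W).max()
    g_pos = g_min + scale * np.clip(W, 0, None)
    g_neg = g_min + scale * np.clip(-W, 0, None)
    return g_pos, g_neg, scale

def crossbar_mvm(g_pos, g_neg, v, scale):
    # Kirchhoff's current law: column current I_j = sum_i V_i * G_ij,
    # so a full matrix-vector product happens in one "analog" step.
    return (v @ g_pos - v @ g_neg) / scale

W = np.array([[0.5, -1.0],
              [2.0, 0.25]])
v = np.array([1.0, -3.0])
g_pos, g_neg, s = map_weights(W.T)        # crossbar rows carry voltages
y = crossbar_mvm(g_pos, g_neg, v, s)      # matches W @ v
```

Because the shared `g_min` offset cancels in the differential readout, the recovered result equals the ideal `W @ v` up to device non-idealities, which is exactly what the calibration and in-situ training work below addresses.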


  • 1  Review on the memristor based neuromorphic chips
    CHEN Changlin LUO Changhang LIU Sen LIU Haijun
    2023, 45(1):1-14. DOI: 10.11887/j.cn.202301001
    Abstract:
To master the current development status and trends of memristor-based neuromorphic chips, the existing memristor-based neuromorphic chips and architectures were investigated. The memristor array structures and integration processes, pre-neuron and post-neuron circuits, multi-array interconnection topologies and data transmission strategies used in these chips, as well as the system simulation and evaluation methods used during chip design, were compared and analyzed. It is concluded that current circuit design for memristor-based neuromorphic chips still needs to solve problems such as the limited number of resistance states, large device parameter fluctuation, complex array peripheral circuits, and small integration scale. It is pointed out that practical application of this type of chip still faces challenges such as improving the memristor fabrication process, strengthening development tool support, developing dedicated instruction sets, and identifying typical driving applications.
    2  Memristive neuromorphic computing approach combining calibration method and in-memory training
    DU Xiangyu PENG Jie LIU Haijun
    2023, 45(5):202-206. DOI: 10.11887/j.cn.202305023
    Abstract:
Memristor-based neuromorphic computing architectures have achieved good results in image classification, speech recognition, and other fields, but their performance declines significantly when the memristor array suffers from low yield. A method combining a calibration-based memristive neuromorphic computing approach with in-situ training was proposed: the calibration method increases the accuracy of the multiply-accumulate computation, and the in-situ training method reduces the training error. To verify the performance of the proposed method, a multi-layer perceptron architecture was used for simulation. The simulation results show that the accuracy of the neural network improves markedly (by nearly 40%). Experimental results show that, compared with the calibration method alone, the accuracy of the network trained by the proposed method improves by about 30%, and compared with other mainstream methods it improves by 0.29%.
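The combination of output calibration and in-situ training can be sketched on a toy linear layer. This is not the paper's algorithm: the stuck-cell fault model, the per-output affine calibration, and the masked gradient updates below are simplified illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target linear map and a low-yield array trying to hold it:
# 20% of cells are assumed stuck at zero and ignore their target weight.
W_ideal = rng.normal(size=(8, 4))
stuck = rng.random(W_ideal.shape) < 0.2
W_dev = W_ideal.copy()
W_dev[stuck] = 0.0

# Calibration: fit a per-output affine correction y ~ a*y_dev + b
# from known input/output pairs (least squares per column).
X = rng.normal(size=(256, 8))
Y_ideal = X @ W_ideal
Y_dev = X @ W_dev
a = np.sum(Y_dev * Y_ideal, 0) / np.sum(Y_dev ** 2, 0)
b = (Y_ideal - a * Y_dev).mean(0)

# In-situ training: gradient steps on the writable cells only,
# using the calibrated (device-perturbed) forward pass.
lr, W_trained = 0.01, W_dev.copy()
for _ in range(200):
    E = (X @ W_trained) * a + b - Y_ideal     # calibrated residual
    grad = X.T @ (E * a) / len(X)
    grad[stuck] = 0.0                         # stuck cells cannot move
    W_trained -= lr * grad

err_before = np.abs(Y_dev - Y_ideal).mean()
err_after = np.abs((X @ W_trained) * a + b - Y_ideal).mean()
```

The design point mirrors the abstract: calibration corrects the readout cheaply, while in-situ training absorbs the remaining error into the cells that can still be programmed, so the two methods compound rather than compete.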
    3  Multi-memristor-array interconnection structure design for large scale CNN acceleration
    TANG Liqin DIAO Jietao CHEN Changlin LUO Changhang LIU Biao LIU Sitong ZHANG Yufei WANG Qin
    2023, 45(5):222-230. DOI: 10.11887/j.cn.202305026
    Abstract:
To address the problems of inefficient data loading and readout and the poor flexibility of array collaboration in existing multi-memristor-array designs, a highly efficient and flexible multi-array interconnection architecture was proposed. The data loading strategy of the architecture supports data reuse across multiple weight mapping modes, reducing off-chip data access; the readout network supports flexible combination of multiple processing units to realize convolutional operations of different scales, as well as fast accumulation and readout of computation results, thus improving chip flexibility and overall computing power. Simulation experiments on the NeuroSim platform running the VGG-8 network indicate a 146% increase in processing speed over the MAX2 neural network accelerator, with only a 6% increase in area overhead.
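Why multi-array interconnection matters can be seen from how a weight matrix larger than one crossbar must be tiled across several arrays, with input slices reused by every tile in the same column and partial currents accumulated across tiles in the same row. The tiling below is a generic sketch with an assumed 4x4 tile size, not the paper's architecture (real crossbars are far larger, e.g. 128x128):

```python
import numpy as np

TILE = 4  # assumed crossbar size for illustration

def tile_weights(W, tile=TILE):
    # Split an (m, n) weight matrix into a grid of tile x tile crossbars,
    # zero-padding the edges; each tile is one processing unit.
    m, n = W.shape
    gm, gn = -(-m // tile), -(-n // tile)    # ceil division
    P = np.zeros((gm * tile, gn * tile))
    P[:m, :n] = W
    return [[P[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
             for j in range(gn)] for i in range(gm)], (m, n)

def tiled_mvm(tiles, shape, v, tile=TILE):
    # Tiles in the same grid column consume the same input slice (data
    # reuse); tiles in the same grid row accumulate partial sums, which
    # is what a flexible readout/accumulation network must support.
    m, n = shape
    v_pad = np.zeros(len(tiles[0]) * tile)
    v_pad[:n] = v
    y = np.zeros(len(tiles) * tile)
    for i, row in enumerate(tiles):
        for j, T in enumerate(row):
            y[i*tile:(i+1)*tile] += T @ v_pad[j*tile:(j+1)*tile]
    return y[:m]

W = np.arange(30.0).reshape(6, 5)
v = np.ones(5)
tiles, shape = tile_weights(W)
out = tiled_mvm(tiles, shape, v)   # equals W @ v
```

The accumulation loop is exactly the cross-tile traffic that a rigid interconnect makes expensive: every partial sum must reach an adder tree, and every reused input slice must be broadcast, which is what the proposed loading and readout networks optimize.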
    4  Accelerating parallel reduction and scan primitives on ReRAM-based architectures
    JIN Zhou DUAN Yiru YI Enxin JI Haonan LIU Weifeng
    2022, 44(5):80-91. DOI: 10.11887/j.cn.202205009
    Abstract:
Reduction and scan are two critical primitives in parallel computing, so accelerating them is of great importance. However, the von Neumann architecture suffers from the performance and energy bottleneck known as the "memory wall" due to unavoidable data migration. Recently, non-volatile memory (NVM) such as resistive random access memory (ReRAM) has enabled in-situ computing without data movement, and its crossbar structure can naturally perform a parallel matrix-vector multiplication (GEMV) in one step. ReRAM-based architectures have demonstrated great success in many areas, e.g., accelerating machine learning and graph computing applications. Parallel acceleration methods were proposed for the reduction and scan primitives on a ReRAM-based processing-in-memory (PIM) architecture, focusing on formulating the computation as GEMV operations and on the mapping method onto the ReRAM crossbar, and software-hardware co-design was employed to reduce power consumption and improve performance. Compared with a GPU, the proposed reduction and scan algorithms achieve speedups of two orders of magnitude, both at peak and on average; the segmented variants achieve speedups of up to five orders of magnitude (four on average). Meanwhile, power consumption is reduced by 79%.
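The GEMV formulation of these two primitives is simple to state: sum-reduction is a dot product with an all-ones row, and an inclusive scan is multiplication by a lower-triangular all-ones matrix, so each maps to one crossbar read. This is the textbook formulation only; the paper's actual mapping and partitioning across finite crossbars is more involved.

```python
import numpy as np

def reduction_as_gemv(x):
    # Sum-reduction as GEMV: an all-ones "weight" row collapses the
    # vector to its total in a single analog crossbar step.
    ones_row = np.ones((1, len(x)))
    return (ones_row @ x)[0]

def scan_as_gemv(x):
    # Inclusive prefix sum as GEMV: y_i = sum_{j<=i} x_j, i.e. a
    # lower-triangular all-ones matrix applied to x; mapping that
    # matrix onto a crossbar yields the whole scan in one step.
    L = np.tril(np.ones((len(x), len(x))))
    return L @ x

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
total = reduction_as_gemv(x)   # 14.0
prefix = scan_as_gemv(x)       # [3., 4., 8., 9., 14.]
```

The trade-off behind the co-design is visible here: the scan matrix has n*(n+1)/2 nonzero cells, so for long inputs it must be tiled across crossbars, which is where the mapping method and the segmented variants in the paper come in.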