Abstract: Mapping matrix operations onto SIMD processors requires a large amount of data rearrangement, which lowers system performance. This study proposes a customized Multi-Grain Matrix Register File (MMRF) that supports multi-grained parallel row-wise and column-wise access, eliminating this data rearrangement and improving the performance of matrix operations. The MMRF can be configured into different parallel access modes, in which one or several sub-matrices are accessed in parallel. Experimental results show that, compared with a traditional Vector Register File (VRF) and a Matrix Register File (MRF), the MMRF achieves average performance improvements of about 2.21x and 1.6x, respectively, while its area increases by 14.3% and 3.7% and its power by 14.6% and 2.2%, respectively. Compared with the TMS320C64x+, the FT-Matrix SIMD processor achieves about 5.65x to 7.71x higher performance by employing the MMRF. With a hierarchical custom design approach, the area and critical-path delay of the MMRF are reduced by 17.9% and 39.1%, respectively.
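The access modes described above can be illustrated with a small behavioral sketch. This is not the MMRF hardware design: the function name `read_lanes`, the tiled layout of sub-matrices, and the `grain` and `band` parameters are all assumptions made for illustration. The sketch models an n x n register array in which a single access returns n elements, either one full row or the same column of every grain x grain sub-matrix in one band of rows.

```python
def read_lanes(regs, index, grain, column_wise=False, band=0):
    """Return n elements from an n x n register array in one modeled access.

    Hypothetical model: the array is tiled into grain x grain sub-matrices.
    `band` selects a horizontal band of tiles, `index` selects a row or
    column inside each tile of that band.
    """
    n = len(regs)
    if n % grain != 0:
        raise ValueError("grain must divide the register array size")
    tiles = n // grain
    if not (0 <= index < grain and 0 <= band < tiles):
        raise ValueError("index or band out of range")

    if not column_wise:
        # Row-wise: row `index` of every tile in the band is one physical row.
        return list(regs[band * grain + index])

    # Column-wise: gather column `index` of each tile in the band.
    out = []
    for t in range(tiles):
        col = t * grain + index
        for k in range(grain):
            out.append(regs[band * grain + k][col])
    return out


# A 4 x 4 array holding 0..15 in row-major order.
regs = [[r * 4 + c for c in range(4)] for r in range(4)]

whole_column = read_lanes(regs, 1, grain=4, column_wise=True)
two_tiles = read_lanes(regs, 0, grain=2, column_wise=True, band=0)
row_access = read_lanes(regs, 1, grain=2, column_wise=False, band=1)
```

With `grain=4` the array behaves as one 4 x 4 matrix and the column-wise access returns a full column. With `grain=2` the same access returns column 0 of two 2 x 2 sub-matrices at once, which is the kind of access that would otherwise need explicit shuffle or transpose instructions on a row-only vector register file.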