Abstract: In large-size sliding-window applications, there is a wide gap between the data input speed and the processing speed. To narrow this gap, a parallel processing scheme is proposed that achieves high data reuse and parallelism while using as few memory resources and as simple memory access control logic as possible. The scheme combines two forms of parallelism: parallelism among different sliding windows and parallelism among the data within a single window. The windows are divided into groups and mapped onto multiple processing elements. The data within a single window are buffered in a multi-module memory structure, for which a module assignment and addressing scheme is designed to allow conflict-free parallel access. Experimental results on an FPGA show that the approach improves processing speed significantly without requiring excessive memory resources or overly complicated memory access control logic.
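
To make the idea of conflict-free parallel access concrete, the following minimal sketch shows one well-known interleaved assignment, not necessarily the scheme designed in this work. It assumes a square window of side `win`, a bank of `win * win` memory modules, and a raster image of width `image_width`; the function names and parameters are hypothetical and chosen only for illustration.

```python
from itertools import product


def module_and_address(row, col, win, image_width):
    """Map pixel (row, col) to (module index, address inside that module).

    Assumed scheme (illustrative only): low-order interleaving in both
    dimensions, using win * win modules.
    """
    blocks_per_row = -(-image_width // win)  # ceiling division
    module = (row % win) * win + (col % win)
    address = (row // win) * blocks_per_row + (col // win)
    return module, address


def window_is_conflict_free(top, left, win, image_width):
    """Return True if every pixel of the window lies in a distinct module."""
    modules = [
        module_and_address(top + dr, left + dc, win, image_width)[0]
        for dr, dc in product(range(win), repeat=2)
    ]
    return len(set(modules)) == win * win


if __name__ == "__main__":
    # Hypothetical sizes: a 16 x 16 image scanned by a 3 x 3 window.
    width = height = 16
    win = 3
    ok = all(
        window_is_conflict_free(top, left, win, width)
        for top in range(height - win + 1)
        for left in range(width - win + 1)
    )
    print("all window positions conflict-free:", ok)
```

Under this assumed mapping, any `win` consecutive rows have distinct values of `row % win`, and likewise for columns, so the `win * win` pixels of a window at any position fall into different modules and can be read in the same cycle. The cost is one module per window element, which is exactly the kind of memory overhead that a more economical assignment would aim to reduce.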