Efficient OpenCL Programming System for Multi-Core DSPs

Affiliation: College of Computer Science and Technology, National University of Defense Technology

CLC number: TP314

Fund project: National Key Research and Development Program of China (2023YFB3001503)

Abstract:

As multi-core digital signal processors (DSPs) become widely used in high-performance computing and artificial intelligence, achieving efficient and portable parallel programming on their heterogeneous architectures has become a significant challenge. This paper presents MOCL4, an efficient OpenCL heterogeneous parallel programming system designed and implemented for FT-M7032, a domestically developed heterogeneous multi-core DSP platform. Through co-optimization of the runtime and the compiler, MOCL4 efficiently maps OpenCL's SPMD execution model onto the DSP's SIMD vector units, and it supports DMA-based on-chip data movement and memory-hierarchy management. Experimental results show that MOCL4 preserves OpenCL semantics while significantly improving kernel execution performance: the average speedup on the PolyBench benchmark suite is 10.12x, and performance on typical compute-intensive tasks such as GEMM is close to that of manually optimized code. MOCL4 thus provides a parallel programming solution for multi-core DSPs that combines high performance with programmability.
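
The abstract does not describe how MOCL4 performs the SPMD-to-SIMD mapping. As a rough illustration of the general idea only, the sketch below uses a work-item loop, a common technique in OpenCL implementations for CPU-like targets: the per-work-item kernel body is wrapped in a loop over the local IDs of a work-group, which gives a vectorizing compiler a loop it can turn into SIMD instructions. All names here (`vadd_work_item`, `vadd_work_group`, `vadd_ndrange`) are hypothetical and are not taken from MOCL4; the host-side loop merely stands in for a runtime that would distribute work-groups across DSP cores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body: the work done by ONE OpenCL work-item for
 * the kernel `c[gid] = a[gid] + b[gid]`. */
static inline void vadd_work_item(size_t gid, const float *a,
                                  const float *b, float *c) {
    c[gid] = a[gid] + b[gid];
}

/* Work-item loop (an assumed, generic technique; not MOCL4's code):
 * one core executes a whole work-group by iterating over its local IDs.
 * The kernel has no barriers and no cross-iteration dependences, so a
 * vectorizing compiler is free to map this loop onto SIMD lanes. */
void vadd_work_group(size_t group_id, size_t local_size, size_t global_size,
                     const float *restrict a, const float *restrict b,
                     float *restrict c) {
    size_t base = group_id * local_size;
    for (size_t lid = 0; lid < local_size; ++lid) {
        size_t gid = base + lid;
        if (gid < global_size) {          /* guard the partial last group */
            vadd_work_item(gid, a, b, c);
        }
    }
}

/* Stand-in for the runtime: launches every work-group sequentially.
 * A real runtime would hand work-groups to different cores. */
void vadd_ndrange(size_t global_size, size_t local_size,
                  const float *a, const float *b, float *c) {
    assert(local_size > 0);
    size_t num_groups = (global_size + local_size - 1) / local_size;
    for (size_t g = 0; g < num_groups; ++g) {
        vadd_work_group(g, local_size, global_size, a, b, c);
    }
}
```

Calling `vadd_ndrange(10, 4, a, b, c)` would run three work-groups, the last one only partially filled. Kernels that contain barriers need the loop to be split at each barrier, which this sketch does not attempt.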

History
  • Received: 2026-01-22
  • Last revised: 2026-03-31
  • Accepted: 2026-04-01