Design and implementation of a novel off-chip memory access path for graph computing
Authors: ZHANG Xu, CHANG Yisong, ZHANG Ke, et al.
Affiliations:

(1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Peng Cheng Laboratory, Shenzhen 518000, China)

Author biographies:

ZHANG Xu (born 1996), male, from Puyang, Henan; Ph.D. candidate; E-mail: zhangxu19s@ict.ac.cn. ZHANG Ke (corresponding author), male, associate researcher, Ph.D.; E-mail: zhangke@ict.ac.cn

CLC number:

TN95

Funding:

National Key Research and Development Program of China (2017YFB1001602); National Natural Science Foundation of China (61702485); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017143)

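    Abstract (Chinese version):

    To address the memory access characteristics of graph computing applications, a high-concurrency memory access module, the High Concurrency and high Performance Fetcher (HCPF), is proposed and implemented; it supports highly concurrent, out-of-order, and asynchronous memory access. Through hardware-software co-design, HCPF can process 192 memory access requests of 8 types concurrently, with user-defined access granularity, which meets the demand of graph computing applications for massive, low-latency, fine-grained data access. HCPF also extends a custom memory-semantic interconnect across computing nodes to support fine-grained direct access to remote memory, providing a technical basis for a future distributed graph computing framework. Combining these two core contributions, a RISC-V System-on-Chip (SoC) architecture supporting HCPF was designed and implemented on top of a pipelined RISC-V processor core, an FPGA-based prototype verification platform was built, and a preliminary performance evaluation of HCPF was carried out with in-house test programs. Experimental results show that, compared with the original memory access path, HCPF raises the performance of array-based and random-address-based random memory access to as much as 3.5x and 2.7x, respectively. The latency of directly accessing 4 bytes of data in remote memory is only 1.63 μs.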

    Abstract:

    A novel asynchronous memory access path that supports highly concurrent, out-of-order off-chip memory requests is proposed. To satisfy the requirements of graph applications, a software-defined interface is implemented in the proposed memory access path via a hardware-software co-design methodology; it handles 192 concurrent off-chip memory requests of 8 types at user-defined granularity. A custom memory-semantic interconnect is designed for fine-grained remote memory access among computing nodes, targeting future distributed graph processing scenarios. The proposed memory access path is integrated into a RISC-V-based SoC (system-on-chip) architecture, and an FPGA prototype is implemented. Based on custom random-access microbenchmarks, preliminary evaluation results show that the performance of array-based and random-address-based off-chip memory access improves by up to 3.5x and 2.7x, respectively, with the proposed asynchronous memory access path, and that accessing 4 bytes of data from remote memory takes only 1.63 μs.

Cite this article

ZHANG Xu, CHANG Yisong, ZHANG Ke, et al. Design and implementation of a novel off-chip memory access path for graph computing[J]. Journal of National University of Defense Technology, 2020, 42(2): 13-22.

History
  • Received: 2019-09-15
  • Published online: 2020-04-29
  • Publication date: 2020-04-28