Design and implementation of a novel off-chip memory access path for graph computing

DOI: 10.11887/j.cn.202002002

Affiliation: (1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Peng Cheng Laboratory, Shenzhen 518000, China)
Author biographies: ZHANG Xu (b. 1996), male, from Puyang, Henan, Ph.D. candidate, E-mail: zhangxu19s@ict.ac.cn; ZHANG Ke (corresponding author), male, associate researcher, Ph.D., E-mail: zhangke@ict.ac.cn
CLC number: TN95
Fund Project: National Key Research and Development Program of China (2017YFB1001602); National Natural Science Foundation of China (61702485); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017143)


Abstract:

To match the memory access characteristics of graph computing applications, a High Concurrency and high Performance Fetcher (HCPF) is proposed and implemented: a memory access module that supports highly concurrent, out-of-order, and asynchronous off-chip memory requests. Through a hardware-software co-design methodology, HCPF can handle 192 memory access requests of 8 types concurrently, with an access granularity defined by the user, which meets the demand of graph applications for massive, low-latency, fine-grained data accesses. HCPF also extends a custom memory-semantic interconnect across computing nodes to support fine-grained direct access to remote memory, providing a technical basis for a future distributed graph computing framework. Building on these two contributions, a RISC-V system-on-chip (SoC) architecture that supports HCPF was designed and implemented around a pipelined RISC-V processor core, an FPGA-based prototype verification platform was built, and HCPF was given a preliminary performance evaluation using custom random-access microbenchmarks. Experimental results show that, compared with the original memory access path, HCPF raises the performance of array-based and random-address-based random memory accesses to as much as 3.5 times and 2.7 times the baseline, respectively. Directly accessing 4 bytes of data in remote memory takes only 1.63 μs.
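The benefit of keeping many requests outstanding can be illustrated with a toy software model. The sketch below is not the HCPF hardware or the authors' test program; the function name `simulate`, the zero-cost issue assumption, and the latency values are all hypothetical. Only the window size of 192 comes from the abstract. It issues requests in order, allows at most `window` of them to be outstanding, and lets them complete out of order.

```python
import heapq

MAX_IN_FLIGHT = 192  # concurrency figure taken from the abstract


def simulate(latencies, window):
    """Return (finish_time, completion_order) for requests issued in order
    with at most `window` requests outstanding at any moment.

    Toy timing model (an assumption, not the HCPF microarchitecture):
    issuing is free, and request i completes latencies[i] cycles after issue.
    """
    in_flight = []   # min-heap of (completion_time, request_id)
    completed = []   # request ids in the order they complete
    now = 0
    for req_id, lat in enumerate(latencies):
        if len(in_flight) == window:
            # Window full: stall until the earliest outstanding request returns.
            now, done = heapq.heappop(in_flight)
            completed.append(done)
        heapq.heappush(in_flight, (now + lat, req_id))
    # Drain the requests that are still outstanding.
    while in_flight:
        now, done = heapq.heappop(in_flight)
        completed.append(done)
    return now, completed


if __name__ == "__main__":
    lat = [100] * 384                         # hypothetical fixed latency, in cycles
    blocking, _ = simulate(lat, 1)            # one request at a time
    concurrent, _ = simulate(lat, MAX_IN_FLIGHT)
    print(blocking, concurrent)
```

In this idealized model the gap between the blocking and the concurrent case is far larger than the 3.5x and 2.7x reported in the abstract, since the model ignores issue cost, memory bandwidth, and contention; it only shows why a latency-bound random-access workload benefits from a wide request window.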


Cite this article:

ZHANG Xu, CHANG Yisong, ZHANG Ke, et al. Design and implementation of a novel off-chip memory access path for graph computing[J]. Journal of National University of Defense Technology, 2020, 42(2): 13-22.


History:

Received: 2019-09-15
Published online: 2020-04-29
Publication date: 2020-04-28
