<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005">
<channel xmlns:cfi="http://www.microsoft.com/schemas/rss/core/2005/internal" cfi:lastdownloaderror="None">
<title cf:type="text"><![CDATA[Editorial Department of the Journal of National University of Defense Technology --> Special Topic: High-Performance Computing]]></title>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[A survey of the techniques of volume rendering for large-scale scientific data]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002001]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[Volume rendering is an effective and highly expressive method for visualizing complex physical features in large-scale scientific data. However, processing huge amounts of data and capturing complex features remain major challenges for volume rendering. To address these challenges and improve the efficiency and quality of volume rendering, researchers have studied volume rendering algorithms in depth from three aspects. First, distributing the computation across many processor cores reduces the workload of each core and is an effective way to improve rendering efficiency. Second, data reduction methods that fully exploit the intrinsic characteristics of three-dimensional data fields can greatly decrease the amount of data processed during rendering and thus reduce the overhead of a volume rendering algorithm. Third, feature analysis and enhancement techniques can be integrated into volume rendering algorithms to highlight complex physical features in the data fields and achieve high-quality rendering of scientific data. This paper surveys recent advances in volume rendering techniques and analyzes the various research methods. Finally, it outlines future research directions for volume rendering of large-scale scientific data, including application-driven feature volume rendering, feature-based data reduction in volume rendering, hardware-adapted multi-level acceleration of volume rendering, and in-situ intelligent volume rendering.]]></description>
<pubDate>2020/4/29 0:00:00</pubDate>
<category><![CDATA[Special Topic: High-Performance Computing]]></category>
<author><![CDATA[WANG Huawei, HE Liu, CAO Yi, XIAO Li]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>WANG Huawei, HE Liu, CAO Yi, XIAO Li</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002001]]></guid><cfi:id>8</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Design and implementation of a novel off-chip memory access path for graph computing]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002002]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[A novel asynchronous memory access path that supports highly concurrent, out-of-order off-chip memory requests was proposed. To meet the requirements of graph applications, a software-defined interface was implemented in the proposed memory access path through hardware-software co-design; this interface handles hundreds of kinds of off-chip memory requests with arbitrary granularity. A custom memory-semantic interconnect was designed for fine-grained remote memory access among computing nodes, for use in future distributed graph processing scenarios. Finally, the proposed memory access path was integrated into an SoC (system-on-chip) architecture based on the RISC-V instruction set architecture, and an FPGA prototype was implemented. Preliminary evaluation with custom random-access microbenchmarks shows that the asynchronous memory access path improves the performance of array-based and random-address-based off-chip memory accesses by 3.5× and 2.7×, respectively, and that accessing 4 bytes of data from remote memory takes only 1.63 μs.]]></description>
<pubDate>2020/4/29 0:00:00</pubDate>
<category><![CDATA[Special Topic: High-Performance Computing]]></category>
<author><![CDATA[ZHANG Xu, CHANG Yisong, ZHANG Ke, CHEN Mingyu]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>ZHANG Xu, CHANG Yisong, ZHANG Ke, CHEN Mingyu</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002002]]></guid><cfi:id>7</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[User-level parallel I/O configuration optimize strategy toward large-scale cluster]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002003]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[Three key factors strongly influence an application's I/O performance: the I/O programming interface, the performance characteristics of the I/O subsystem (both architecture and system software), and the user-level I/O configuration parameters. From the user's perspective, this paper discussed the optimization space of user-level parallel I/O configuration for large-scale clusters. In addition, a method for testing and analyzing the I/O characteristics of large-scale clusters was proposed. Based on this method, an I/O performance portrait of a domestic supercomputer was built, and several user-level parallel I/O optimization suggestions were put forward. With carefully selected I/O configuration parameters, the time of the restart-data write operation was reduced by 15% with 8192 processes in a real application environment, and the program's initialization time was shortened from 10 minutes to 5 seconds at the scale of 4096 processes.]]></description>
<pubDate>2020/4/29 0:00:00</pubDate>
<category><![CDATA[Special Topic: High-Performance Computing]]></category>
<author><![CDATA[TIAN Hongyun, WU Linping, DONG Yong, JING Cuiping, LUO Hongbing, MO Zeyao]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>TIAN Hongyun, WU Linping, DONG Yong, JING Cuiping, LUO Hongbing, MO Zeyao</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002003]]></guid><cfi:id>6</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Inner-out subdomain dividing heterogeneous parallel algorithm for high order CFD solver]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002004]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[An offload-mode heterogeneous parallel algorithm based on inner-out subdomain dividing was proposed for the CFD (computational fluid dynamics) program CNS. Taking into account the characteristics of finite-difference computation and the fourth-order Runge-Kutta method on structured meshes, a ghost-region scheme was introduced, and on that basis a Ghost-Region-Shrinking computing scheme was designed. This scheme significantly reduces the overhead of data movement between heterogeneous computing resources and, under load-balanced conditions, fully overlaps the computation and MPI communication on the CPU with the accelerator computation, yielding better heterogeneous cooperative parallelism. The ghost-region parameter required for computational validity was derived, and load-balance tuning was demonstrated. On a server with two 12-core Intel Haswell Xeon E5-2670 CPUs and two Xeon Phi 7120A MICs, an average performance improvement of 5.9× was achieved over the algorithm that offloads whole task blocks to the accelerator. Compared with an MPI/OpenMP two-level parallel algorithm running on 24 Intel Haswell CPU cores, the proposed method achieved speedups of 1.27× with one MIC and 1.45× with two MICs. Finally, the bottlenecks and limitations of the method were discussed.]]></description>
<pubDate>2020/4/29 0:00:00</pubDate>
<category><![CDATA[Special Topic: High-Performance Computing]]></category>
<author><![CDATA[WANG Wei, XU Chuanfu, CHE Yonggang]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>WANG Wei, XU Chuanfu, CHE Yonggang</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002004]]></guid><cfi:id>5</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Design and implementation of pipelined floating-point reciprocal approximation operation unit]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002005]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[Some low-precision applications require a pipelined floating-point reciprocal operation. Based on the SRT-4 algorithm, a pipelined floating-point reciprocal operation unit was designed and implemented as a 6-stage pipeline that produces results with 8 valid fraction bits. To process denormal numbers in hardware and thereby achieve higher performance, the unit was extended into an 8-stage pipeline by adding source-operand pre-normalization and result post-normalization components. After logic synthesis, the area of the improved unit increased by 19.23%, which is acceptable; its timing was not noticeably affected and met the target frequency of 1.6 GHz.]]></description>
<pubDate>2020/4/29 0:00:00</pubDate>
<category><![CDATA[Special Topic: High-Performance Computing]]></category>
<author><![CDATA[HE Jun, WANG Li]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>HE Jun, WANG Li</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002005]]></guid><cfi:id>4</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Diagnostic methods for communication waiting in MPI parallel programs and applications]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002006]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[As parallel systems grow in scale, existing methods for diagnosing communication waiting suffer from high measurement cost and memory overhead. Based on an in-depth analysis of existing diagnostic methods and the practical need for controllable measurement, a hotspot-function-based diagnosis model for communication waiting was established, and a concise, practical diagnostic method built on this model was presented. The method was applied to diagnosing communication waiting in large-scale MPI parallel programs such as the LARED integration, LARED-S, and LAP3D. The results show that the method accurately identifies the key code segments that cause communication waiting, and that the proposed optimization solutions and estimated performance improvement space provide a useful reference for subsequent program improvement. After optimization guided by the diagnostic results, the LARED-S program achieved a 32% performance improvement and a 44% reduction in communication waiting time.]]></description>
<pubDate>2020/4/29 0:00:00</pubDate>
<category><![CDATA[Special Topic: High-Performance Computing]]></category>
<author><![CDATA[WU Linping, JING Cuiping, LIU Xu, TIAN Hongyun]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>WU Linping, JING Cuiping, LIU Xu, TIAN Hongyun</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002006]]></guid><cfi:id>3</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Pattern mining of gale warning for high-speed railway]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002007]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[The traditional method for issuing gale alarms for high-speed rail traffic is based on an instantaneous threshold. Although it captures all alarm events, it produces many unnecessary alarms, which reduce the efficiency of high-speed rail operations. An early-warning method based on sequential patterns was proposed. It mines frequent patterns in the data preceding alarm events to discover how conditions change before such events occur. The characteristic patterns unique to early-warning sequences were obtained by filtering out the frequent patterns they share with non-early-warning sequences, and a database of early-warning patterns was constructed. Verification with monitoring data from the Lanzhou-Urumchi high-speed railway shows that the method improves prediction accuracy while also reducing the missed-alarm rate. It effectively reduces the time required for pattern matching and leaves a sufficient time window for early warning, so it better meets practical application requirements.]]></description>
<pubDate>2020/4/29 0:00:00</pubDate>
<category><![CDATA[Special Topic: High-Performance Computing]]></category>
<author><![CDATA[TENG Fei, LIU Jianzhu, ZHU Jinye, GOU Hongye]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>TENG Fei, LIU Jianzhu, ZHU Jinye, GOU Hongye</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002007]]></guid><cfi:id>2</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Software and hardware co-design of data placement in heterogeneous hybrid store]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002008]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[An analysis of typical existing hybrid storage architectures in big data centers found that they cannot fully exploit the advantages of both hybrid storage management systems and hybrid storage devices. Therefore, a hardware-software co-design data placement strategy was proposed. It simultaneously considers hybrid storage management systems at the software level and hybrid storage devices at the hardware level, and determines the data placement path across both storage systems and devices according to application characteristics. Moreover, a static placement pattern applied before running and a dynamic placement pattern applied during running were proposed for different application scenarios. In the experiments, three kinds of workloads were run on data placement strategies simulated with a model built from the performance parameters of the storage management systems and storage devices. The results show that the proposed design outperforms traditional designs, which consider either storage management systems or storage devices separately, by up to 30%.]]></description>
<pubDate>2020/4/29 0:00:00</pubDate>
<category><![CDATA[Special Topic: High-Performance Computing]]></category>
<author><![CDATA[LI Hongfei, DU Yimo, ZENG Yi, WANG Lei]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>LI Hongfei, DU Yimo, ZENG Yi, WANG Lei</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/202002008]]></guid><cfi:id>1</cfi:id><cfi:read>true</cfi:read></item>
</channel>
</rss>