<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005">
<channel xmlns:cfi="http://www.microsoft.com/schemas/rss/core/2005/internal" cfi:lastdownloaderror="None">
<title cf:type="text"><![CDATA[Editorial department of the Journal of National University of Defense Technology -->Computer System and technology]]></title>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Research progress on new computing-controlled network architecture and key technologies]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250601]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[To address the new network communication challenge of efficient data interaction between components in open interactive environments, a novel C2N (computing and control network) was proposed. Aiming at the extreme requirements for efficiency, real-time performance, flexibility, and security, C2N adopts intelligent and simplified designs in protocol architecture, planning, application, and security, providing high-performance and highly flexible basic network support for strong real-time collaborative fusion among heterogeneous resources. Based on a detailed survey of related work, key technologies of C2N were discussed, such as data link layer enhancement, remote direct memory access for sensor-controllers, and service-oriented sensing and control middleware. The key technology research and test evaluation carried out by the network chip and system team of the National University of Defense Technology were also introduced, and future challenges and research directions were outlined to help China gain leading advantages in high-end equipment systems and innovative ecosystems.]]></description>
<pubDate>2025/12/2 0:00:00</pubDate>
<category><![CDATA[Computer System and technology]]></category>
<author><![CDATA[YANG Hui, DONG Dezun, XUN Peng, LIU Rulin, LI Junnan, TANG Zhu, LYU Gaofeng, QUAN Wei, ZHONG Jincheng, LI Tao]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>YANG Hui, DONG Dezun, XUN Peng, LIU Rulin, LI Junnan, TANG Zhu, LYU Gaofeng, QUAN Wei, ZHONG Jincheng, LI Tao</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250601]]></guid><cfi:id>7</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Order-preserving triggering mechanism and data buffering method for collective communication hardware offloading]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250602]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[To further optimize the hardware offloading of collective communication based on the network interface card in the "Tianhe" network, and to support more types of collective communication algorithms and larger message sizes, an order-preserving triggering mechanism and a data buffering method for collective communication hardware offloading were investigated. An order-preserving triggering mechanism for concurrent multitasking was proposed, which meets the desired semantics of collective communication and ensures the reproducibility of floating-point computation results. A dynamic network data buffering method based on Hash tables and pulsed credit flow control was proposed to ease the conflict between limited hardware buffering resources and the high demand for buffering a large amount of network data from concurrent multitasking. Experimental results show that, compared with software-based collective communication operations, this method can support the hardware offloading of various algorithms for several typical collective communication operations, with significant performance improvement. Meanwhile, the hardware implementation cost is low, and the utilization of buffering resources is high.]]></description>
<pubDate>2025/12/2 0:00:00</pubDate>
<category><![CDATA[Computer System and technology]]></category>
<author><![CDATA[XU Jinbo, DONG Dezun, LI Baofeng, ZHANG Wei, XING Jianying, ZHANG Peng]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>XU Jinbo, DONG Dezun, LI Baofeng, ZHANG Wei, XING Jianying, ZHANG Peng</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250602]]></guid><cfi:id>6</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[A model for base station network traffic prediction using an enhanced random ensemble-based mixed kernel K nearest neighbor algorithm]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250603]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[An ER-MKKNN (enhanced random mixed kernel <i>K</i> nearest neighbors algorithm) was developed to meet the requirements of base station network traffic prediction in ultra-dense 5G/6G environments. A hybrid kernel function was formed by combining a radial basis function kernel with a white-noise kernel, thereby overcoming the trade-off between nonlinear relationship modeling and noise suppression that plagues single-kernel methods. Dual random subsampling of both samples and features, together with a randomized hyperparameter-interval strategy, was employed to bolster generalization stability in high-dimensional, sparse settings. A dynamic weight-allocation mechanism based on inversion of out-of-bag errors was introduced to improve robustness against abrupt traffic fluctuations. Finally, a multi-level parallel architecture was implemented to deliver a scalable prediction framework for ultra-dense network topologies. Experimental evaluations show that ER-MKKNN outperformed deep-learning models in root mean square error, mean absolute percentage error, and mean absolute error, establishing a new technical pathway for intelligent network operations and maintenance.]]></description>
<pubDate>2025/12/2 0:00:00</pubDate>
<category><![CDATA[Computer System and technology]]></category>
<author><![CDATA[SUN Ning, LI Zhuoxuan, SHI Xinli, SUN Peichong, XU Mingjie, CAO Jinde]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>SUN Ning, LI Zhuoxuan, SHI Xinli, SUN Peichong, XU Mingjie, CAO Jinde</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250603]]></guid><cfi:id>5</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Probability tunable random number generator for random simulation of accelerated particle transport]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250604]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[Particle transport simulations using stochastic methods face significant challenges on conventional von Neumann architectures, particularly due to random branching events and irregular memory access patterns. These limitations stem from the fundamental mismatch between probabilistic algorithms and deterministic computing paradigms. To bridge the gap between architecture and algorithms, a probabilistically tunable true random number generator was developed based on spintronic and ferroelectric devices. The physical randomness of spintronic devices was leveraged to provide a physical random source for the architecture, and the throughput of random bits was enhanced through optimized control logic and writing mechanisms. Next, programmable synapses were designed based on the memristive properties of ferroelectric devices, enabling non-volatile continuous weight storage with tunable probabilities. The experimental results indicate that the proposed approach achieves performance improvements ranging from 171 to 1,028 times compared to a general-purpose CPU when solving a sample transport problem. Furthermore, compared to existing spin-transfer torque magnetic tunnel junction based true random number generators, the developed method not only enables tunable probability random sampling but also achieves a throughput of 303 Mbit/s when generating uniformly distributed random sequences.]]></description>
<pubDate>2025/12/2 0:00:00</pubDate>
<category><![CDATA[Computer System and technology]]></category>
<author><![CDATA[FU Siqing, LI Tiejun, WU Lizhou, ZHANG Chunyuan, MA Sheng, ZHANG Jianmin, REN Ruixuan]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>FU Siqing, LI Tiejun, WU Lizhou, ZHANG Chunyuan, MA Sheng, ZHANG Jianmin, REN Ruixuan</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250604]]></guid><cfi:id>4</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Segment routing optimization algorithm fusing deep reinforcement learning and load centrality theory]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250605]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[Combining software defined networking and SR (segment routing) can optimize network performance, but in large-scale dynamic networks, excessive link utilization at key nodes can lead to a surge in queue delays. To address this, an SROD-LC (segment routing optimization algorithm based on deep reinforcement learning and load centrality theory) was proposed. By quantifying the importance of network nodes using load centrality theory, key nodes are identified and their link load states are monitored. Using a multi-agent reinforcement learning framework, distributed deep reinforcement learning agents are deployed at key nodes and coordinate routing decisions through a shared reward mechanism to achieve proactive optimization of link loads. At the same time, leveraging the flexibility of SR, segment identifier lists are dynamically adjusted to quickly reroute part of the traffic, reducing local link utilization and avoiding potential congestion. Simulation experiments based on real network topologies show that when the proportion of SR key nodes is in the range of 0.3 to 0.5, the SROD-LC algorithm exhibits significant optimization effects, reducing the network's maximum link utilization by 21% to 35% compared to baseline algorithms.]]></description>
<pubDate>2025/12/2 0:00:00</pubDate>
<category><![CDATA[Computer System and technology]]></category>
<author><![CDATA[CAO Jijun, WU Zongming, TANG Qiang, LI Xiaoyu]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>CAO Jijun, WU Zongming, TANG Qiang, LI Xiaoyu</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250605]]></guid><cfi:id>3</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Operator-aware tensor offloading approach for large language model inference in resource-constrained scenarios]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250606]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[Efficient inference deployment of large language models faces severe challenges in resource-constrained scenarios. Although current mainstream inference optimization techniques have improved model inference efficiency to some extent, they still suffer from issues like coarse-grained deployment and poor inference accuracy. Based on the discovery that different operators exhibit varying degrees of GPU affinity, an OATO (operator-aware tensor offloading) approach was proposed. OATO extracted operators' semantic knowledge and used it to design an intelligent scheduling algorithm, which further yielded a globally optimal model-deployment plan. Meanwhile, the OATO approach was integrated into the latest large model inference framework Llama.cpp to implement an operator-aware tensor offloading enhanced inference engine, referred to as OALlama.cpp. Experimental results show that compared with the state-of-the-art inference engines Llama.cpp and FlexGen, OALlama.cpp achieves the best inference performance on three large models. Notably, in the scenario where 75% of the LlaMA3-8B model weights are loaded on the GPU, the first-token generation speed of OALlama.cpp is nearly doubled compared with FlexGen and Llama.cpp.]]></description>
<pubDate>2025/12/2 0:00:00</pubDate>
<category><![CDATA[Computer System and technology]]></category>
<author><![CDATA[ZHANG Jianfeng, XIE Dong, JIAN Songlei, LI Bao, WANG Xiaochuan, GUO Yong, YU Jie]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>ZHANG Jianfeng, XIE Dong, JIAN Songlei, LI Bao, WANG Xiaochuan, GUO Yong, YU Jie</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250606]]></guid><cfi:id>2</cfi:id><cfi:read>true</cfi:read></item>
<item>
<title xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="text"><![CDATA[Memory optimization method for control flow computation graph]]></title>
<link><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250607]]></link>
<description xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005" cf:type="html"><![CDATA[AI chips face on-chip memory limits in deep learning. Current optimization methods focus on static computation graphs, leaving room to improve memory efficiency for dynamic graphs. To overcome this limitation, a memory optimization framework for control-flow computation graphs was developed. The framework realized operator-level memory reuse within subgraphs and further achieved recursive reuse across subgraphs by exploiting control-flow characteristics. In addition, a ping-pong buffering strategy for weight data was introduced to mitigate the memory wall between on-chip and off-chip memory, thereby allowing overlapping of memory access and computation operations within subgraphs. Validation on the domestic LUNA AI chip has demonstrated that the proposed framework improves on-chip memory utilization by 5.9% compared with existing methods. Moreover, the strategy effectively alleviates the memory wall problem by reducing data transfer time between on-chip and off-chip memory, resulting in execution efficiency improvements of up to 29%.]]></description>
<pubDate>2025/12/2 0:00:00</pubDate>
<category><![CDATA[Computer System and technology]]></category>
<author><![CDATA[WANG Xiangqian, SHEN Yuhao, JING Kun, LYU Yafei]]></author>
<atom:author xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>WANG Xiangqian, SHEN Yuhao, JING Kun, LYU Yafei</atom:name>
</atom:author>
<guid><![CDATA[http://journal.nudt.edu.cn/gfkjdxxben/article/abstract/20250607]]></guid><cfi:id>1</cfi:id><cfi:read>true</cfi:read></item>
</channel>
</rss>