Cite this article: LIU Yunsheng, ZHANG Tong, ZHANG Chuanfu, et al. The Fault-tolerance Mechanism in Grid-based Distributed Simulation System[J]. Journal of National University of Defense Technology, 2005, 27(1): 35-38.
Received: 2004-09-06
Funding: Project supported by a National Ministries and Commissions Fund (51404010403KG0155)
The Fault-tolerance Mechanism in Grid-based Distributed Simulation System
LIU Yunsheng, ZHANG Tong, ZHANG Chuanfu, ZHA Yabing
(College of Mechatronics Engineering and Automation, National Univ. of Defense Technology, Changsha 410073, China)

Abstract:
To meet the needs of distributed simulation, this paper builds a general-purpose fault-tolerance system for distributed simulation on top of the grid. The system consists of three parts: a simulation resource status monitoring module, a data saving module, and an error recovery module. The monitoring module is built on the grid's MDS, while data saving (covering both the process space and the interaction relationships between processes) and error recovery are implemented in user space using a checkpoint mechanism. The paper also analyzes the relationship between the added fault-tolerance mechanism and the existing functional modules of the simulation system. Finally, a fault-tolerance broker in Client/Server mode is designed and implemented on top of the grid and the modules above to automate fault tolerance for the simulation system.
Keywords: HLA; fault tolerance; grid
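
As a rough illustration of how the three parts named in the abstract could fit together, the Python sketch below pairs a monitoring stub with checkpoint-based saving and recovery. It is not the paper's implementation. Every name in it (`SimProcess`, `CheckpointStore`, `recover_if_failed`, the `status_table` dictionary) is hypothetical: the status table stands in for information the paper obtains from the grid's MDS, and JSON serialization stands in for the user-space checkpoint mechanism, whose details the abstract does not give.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SimProcess:
    """Hypothetical stand-in for one simulation process."""
    name: str
    sim_time: int = 0
    state: dict = field(default_factory=dict)
    inbox: list = field(default_factory=list)  # pending messages from peers

    def step(self):
        # Advance simulated time and consume any pending messages.
        self.sim_time += 1
        self.state["received"] = self.state.get("received", 0) + len(self.inbox)
        self.inbox.clear()


class CheckpointStore:
    """Data saving (assumed design): latest serialized snapshot per process."""

    def __init__(self):
        self._snapshots = {}

    def save(self, proc):
        # Store process state together with pending messages, so that
        # inter-process interactions are captured with the process space.
        self._snapshots[proc.name] = json.dumps(asdict(proc))

    def restore(self, name):
        return SimProcess(**json.loads(self._snapshots[name]))


def is_alive(status_table, name):
    """Monitoring stub; a real system would query a resource monitor here."""
    return status_table.get(name, False)


def recover_if_failed(status_table, store, procs):
    """Error recovery: replace each failed process with its last checkpoint."""
    recovered = []
    for name in list(procs):
        if not is_alive(status_table, name):
            procs[name] = store.restore(name)
            status_table[name] = True
            recovered.append(name)
    return recovered
```

In this sketch a process that fails after a checkpoint resumes from the saved simulation time rather than from the start. Keeping pending messages inside the snapshot is one simple way to keep the restored process consistent with what its peers had already sent.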