Cite this article: | LEI Ting, ZHU Cheng, ZHANG Weiming. Research on the method of target selecting policy based on the Markov decision process[J]. Journal of National University of Defense Technology, 2014, 36(2): 161-167. |
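Target selection policy based on Markov decision making |
LEI Ting1,2, ZHU Cheng1, ZHANG Weiming1 |
(1. Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China; 2. Institute of Military Operation Research, Academy of the Military Science, Beijing 100091, China)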
|
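Abstract: |
Target selection is one of the key elements of military planning. A Markov decision method is used to solve the multi-stage target selection problem in which targets have complex interrelations. An AND/OR tree describes how the states at each layer of the target system influence one another, and, taking overall failure of the target system as the objective, a multi-stage strike target selection model based on a discrete-time MDP is established. Building on the LRTDP algorithm, a heuristic is proposed that evaluates the current state by estimating the likely resource consumption and failure probability along the evolution from the current target-system state to the system-failure state. The heuristic effectively excludes intermediate states in the search space from which system failure cannot be reached, which compresses the huge state space that grows out of the complex relations among targets. Experiments validate the method; the results show that it is intuitive and practical and offers some reference value for strike decisions against targets with complex interrelations. |
Keywords: target selection; target system; AND/OR tree; discrete-time Markov decision process |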
DOI:10.11887/j.cn.201402027 |
Received: 2013-07-16 |
Funding: National Natural Science Foundation of China (61273322, 71001105, 91024006) |
|
Research on the method of target selecting policy based on the Markov decision process |
LEI Ting1,2, ZHU Cheng1, ZHANG Weiming1 |
(1.Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China;
2.Institute of Military Operation Research, Academy of the Military Science, Beijing 100091, China)
|
Abstract: |
Target selection is an important aspect of military operational planning. The Markov decision process (MDP) method is used to solve the multi-phase target selection problem in which targets have complex interrelations. First, an AND/OR tree describes the relations among the layers of the target system of systems (TSoS), and a discrete-time Markov decision process (DTMDP) model is proposed for target selection, with neutralizing the TSoS as its objective. Second, a heuristic built on the LRTDP algorithm is proposed to estimate the value of the current TSoS state; the estimate considers the potential resource consumption and failure probability of the evolution from the current state to the failure (lapse) state of the TSoS. The heuristic effectively excludes intermediate states from which the failure state cannot be reached, which reduces the huge search space that the complex relations among targets create. Finally, a case study validates the method. The results show that the method is intuitive and practical, and that it can support target selection decisions when targets have complex interrelations. |
Keywords: target selecting; target system of systems; AND/OR tree; discrete-time Markov decision process |
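
To make the abstract's idea of an AND/OR tree with a state-evaluation heuristic more concrete, here is a minimal Python sketch. It is not the authors' model or heuristic: the node semantics (an "and" node fails when any child fails, an "or" node fails only when all children fail), the `cost` and `p_hit` fields, the example targets, and the cost estimate itself (expected cost of repeated independent strikes, `cost / p_hit`) are all assumptions made for illustration. The estimate returns infinity when failure of the system cannot be reached, which loosely mirrors the pruning of dead-end states described in the abstract.

```python
from dataclasses import dataclass, field
from math import inf
from typing import List, Set


@dataclass
class Node:
    name: str
    kind: str = "leaf"  # "leaf", "and", or "or" (assumed node types)
    children: List["Node"] = field(default_factory=list)
    cost: float = 0.0   # assumed resource cost of one strike (leaf only)
    p_hit: float = 0.0  # assumed success probability of one strike (leaf only)


def failed(node: Node, destroyed: Set[str]) -> bool:
    """Return True if this node has failed, given the set of destroyed leaves."""
    if node.kind == "leaf":
        return node.name in destroyed
    child_failed = [failed(c, destroyed) for c in node.children]
    if node.kind == "and":
        # Assumption: an "and" node needs every child, so one failure breaks it.
        return any(child_failed)
    # Assumption: an "or" node survives while any child survives.
    return all(child_failed)


def estimated_cost(node: Node, destroyed: Set[str]) -> float:
    """Optimistic estimate of the expected resources needed to fail this node."""
    if failed(node, destroyed):
        return 0.0
    if node.kind == "leaf":
        # Expected strikes until the first success is 1 / p_hit (geometric),
        # assuming independent repeated strikes; unreachable if p_hit is 0.
        return node.cost / node.p_hit if node.p_hit > 0 else inf
    costs = [estimated_cost(c, destroyed) for c in node.children]
    # "and": failing the cheapest child suffices; "or": every child must fail.
    return min(costs) if node.kind == "and" else sum(costs)


# Hypothetical target system: a radar network with a backup site.
radar = Node("radar", cost=2.0, p_hit=0.5)
power = Node("power", cost=1.0, p_hit=0.5)
backup = Node("backup", cost=3.0, p_hit=0.5)
radar_net = Node("radar_net", "and", [radar, power])
root = Node("air_defense", "or", [radar_net, backup])

print(estimated_cost(root, set()))      # 8.0
print(estimated_cost(root, {"power"}))  # 6.0
print(failed(root, {"power", "backup"}))  # True
```

In a search such as LRTDP, an estimate of this kind would serve as the initial value of a state, and a state whose estimate is infinite could be discarded without expansion. Whether the estimate is admissible depends on the actual transition and cost model, which this sketch does not reproduce.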