Application of reinforcement learning in multi-period weapon portfolio planning problems

doi:10.11887/j.cn.202105015

2021, Vol. 43, No. 5, pp. 127-136

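Affiliations: (1. The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China; 2. School of Economics, Zhejiang University of Finance & Economics, Hangzhou 310018, China; 3. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China; 4. Southwest Electronics and Telecommunication Technology Research Institute, Chengdu 610041, China)
About the author: ZHANG Xiaoxiong (born 1990), male, from Huai'an, Jiangsu; senior engineer, Ph.D., master's supervisor. E-mail: zxxandxx@163.com
CLC number: O22; N94
Funding: National Natural Science Foundation of China (71901215, 71901191); Research Program of the National University of Defense Technology (ZK20-46)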



Abstract:

To address the difficulty of selecting and scheduling equipment in multi-period weapon portfolio planning, a hybrid optimization approach combining a multi-objective optimization algorithm with reinforcement learning is proposed. For each period, a single-period multi-objective optimization model is built that maximizes portfolio capability and minimizes portfolio cost, and a solving algorithm based on the non-dominated sorting genetic algorithm-III (NSGA-III) generates the Pareto set for that period. A multi-period portfolio optimization model is then built on these per-period Pareto sets. The Q-Learning method, a reinforcement learning algorithm, searches each period's Pareto set in either exploration or exploitation mode to select that period's weapon portfolio. Each selection guides the selection in the next period, and the sequence of selections forms the plan for the entire planning horizon. Comparative experiments on an illustrative example demonstrate the effectiveness of the proposed model and hybrid algorithm, which can support decision making in multi-period weapon portfolio planning.
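The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a brute-force dominance filter stands in for NSGA-III, and the state encoding (period, remaining budget), the shared budget constraint, the reward (portfolio capability), the toy candidate data, and all function and parameter names are assumptions introduced here for illustration only.

```python
import random
from collections import defaultdict


def pareto_front(points):
    """Keep (capability, cost) pairs that no other pair dominates.

    Capability is maximized and cost is minimized. This brute-force filter
    is a stand-in for NSGA-III, which the paper uses for this step.
    """
    front = []
    for i, (cap_i, cost_i) in enumerate(points):
        dominated = any(
            cap_j >= cap_i and cost_j <= cost_i
            and (cap_j > cap_i or cost_j < cost_i)
            for j, (cap_j, cost_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((cap_i, cost_i))
    return front


def q_learning_plan(fronts, budget, episodes=5000, alpha=0.5,
                    gamma=1.0, epsilon=0.3, seed=0):
    """Pick one portfolio per period with tabular Q-learning.

    Assumed setup (hypothetical): the state is (period, remaining budget),
    an action is an index into that period's Pareto set, and the reward is
    the capability of the chosen portfolio.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # key: (period, remaining_budget, action)

    def feasible(t, remaining):
        return [a for a, (_, cost) in enumerate(fronts[t]) if cost <= remaining]

    def best_value(t, remaining):
        if t == len(fronts):  # past the last period: nothing left to gain
            return 0.0
        return max((q[(t, remaining, a)] for a in feasible(t, remaining)),
                   default=0.0)

    def choose(t, remaining, eps):
        actions = feasible(t, remaining)
        if not actions:
            return None
        if rng.random() < eps:  # exploration: random feasible portfolio
            return rng.choice(actions)
        # exploitation: portfolio with the highest learned value
        return max(actions, key=lambda a: q[(t, remaining, a)])

    for _ in range(episodes):
        remaining = budget
        for t in range(len(fronts)):
            a = choose(t, remaining, epsilon)
            if a is None:
                break
            cap, cost = fronts[t][a]
            nxt = remaining - cost
            # Q-learning update: move toward reward + best next-state value
            target = cap + gamma * best_value(t + 1, nxt)
            q[(t, remaining, a)] += alpha * (target - q[(t, remaining, a)])
            remaining = nxt

    # Greedy rollout of the learned values gives the multi-period plan.
    plan, remaining = [], budget
    for t in range(len(fronts)):
        a = choose(t, remaining, 0.0)
        if a is None:
            break
        plan.append(fronts[t][a])
        remaining -= fronts[t][a][1]
    return plan


# Toy data: candidate (capability, cost) pairs for two periods.
candidates = [
    [(5, 2), (8, 4), (6, 5), (9, 7)],
    [(4, 1), (7, 3), (10, 6), (3, 2)],
]
fronts = [pareto_front(c) for c in candidates]
plan = q_learning_plan(fronts, budget=10)
print(fronts)
print(plan)
```

In this sketch the remaining budget is what couples the periods: spending more early narrows the feasible part of the next period's Pareto set, which is one simple way to model a choice in one period guiding the next.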


Cite this article

ZHANG Xiaoxiong, DING Song, LI Minghao, et al. Application of reinforcement learning in multi-period weapon portfolio planning problems[J]. Journal of National University of Defense Technology, 2021, 43(5): 127-136.


History

Received: 2020-01-18
Published online: 2021-09-29
Publication date: 2021-10-28
