Shapley value decomposition method in dynamic force deployment strategy planning

doi:10.11887/j.issn.1001-2486.2409002

Journal of National University of Defense Technology, 2025, Vol. 47, No. 4, pp. 123-131

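Affiliation: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China
Author biography: LUO Junren (b. 1989), male, from Daye, Hubei; doctoral candidate. E-mail: luojunren17@nudt.edu.cn
CLC number: TP183
Funding: National Natural Science Foundation of China (61806212); Postgraduate Scientific Research Innovation Project of Hunan Province (CX20210011)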

Abstract:

To address the dynamic force deployment problem, a strategy planning method based on multi-agent reinforcement learning with SVD (Shapley value decomposition) is proposed. SVD is used to explain how reward is allocated among cooperating agents, and an SVD-based reinforcement learning method is used to solve for strategies in Markov convex games. For a naval-air cross-domain cooperative confrontation scenario, the allocation of spatial-domain combat resources in heterogeneous multi-entity cooperative confrontation is analysed, a dynamic force deployment strategy planning model is built, and the state space, action space and reward function of the problem are designed. For typical application scenarios, simulation experiments on the dynamic force deployment problem are carried out with a wargaming system. The results show that, compared with several classes of baseline algorithms, the proposed method performs well in dynamic force deployment strategy planning and is interpretable in theory. It learns a long-horizon dynamic force deployment strategy of "layer-by-layer interception, zoned confrontation, covering the core, and layered strikes".
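The reward allocation idea behind the Shapley value can be illustrated on a toy cooperative game. The sketch below is not the authors' learning algorithm: it computes exact Shapley values by averaging each agent's marginal contribution over all join orders. The agent names, their weights and the `team_reward` function are hypothetical, chosen only so that the game is convex (each agent's marginal contribution grows as the coalition grows).

```python
from itertools import permutations
from math import isclose

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical agents and strengths (illustrative values, not from the paper).
WEIGHTS = {"ship": 3.0, "fighter": 2.0, "uav": 1.0}

def team_reward(coalition):
    """Hypothetical convex game: reward is the squared total strength."""
    return sum(WEIGHTS[p] for p in coalition) ** 2

agents = list(WEIGHTS)
phi = shapley_values(agents, team_reward)
for name in agents:
    print(name, phi[name])
```

In this toy game the shares sum to the reward of the full team (the efficiency property), which is what makes the allocation interpretable as a credit split. Exact enumeration scales factorially with the number of agents, so it is practical only for very small teams.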

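Cite this article

LUO Junren, ZHANG Wanpeng, SU Jiongming, et al. Shapley value decomposition method in dynamic force deployment strategy planning[J]. Journal of National University of Defense Technology, 2025, 47(4): 123-131.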

History

Received: 2024-09-14
Published online: 2025-07-23
