Cooperative game via Shapley value decomposition reinforcement learning for dynamic force deployment strategy planning
Affiliation:

College of Intelligence Science and Technology, National University of Defense Technology

CLC number:

TP183

Fund Project:

National Natural Science Foundation of China (General Program, Key Program, Major Program)

    Abstract:

    In complex, highly adversarial environments, entities perceive only incomplete information and must respond in real time, which challenges long-horizon, forward-looking dynamic force deployment decisions. Achieving efficient policy exploration through interpretable and effective reward incentives is the key to driving dynamic force deployment strategy planning with learning-based methods. To address the dynamic force deployment problem, this paper first proposes a strategy planning method based on multi-agent reinforcement learning with Shapley value decomposition (SVD). Shapley value decomposition is used to explain how reward is allocated among cooperating agents, and the SVD-based reinforcement learning method is used to solve for the strategy of a Markov convex game. Second, for a sea-air cross-domain cooperative confrontation scenario, the paper analyses the allocation of combat resources across the spatial domain in heterogeneous multi-entity cooperative confrontation, builds a dynamic force deployment strategy planning model, and designs the state space, action space and reward function of the problem. Finally, for typical application scenarios, simulation experiments on the dynamic force deployment problem are carried out with a wargaming system. The results show that, compared with several classes of baseline algorithms, the proposed method performs very well in dynamic force deployment strategy planning and is interpretable in theory. It learned a long-horizon dynamic force deployment strategy of "layer-by-layer interception, zoned confrontation, covering the core, and layered strikes". Project repository: https://gitee.com/jrluo2049/shapleymarl
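As background for the reward allocation idea mentioned in the abstract, the sketch below computes exact Shapley values for a small cooperative game. It is only an illustration of the classical Shapley value formula, not the paper's learning algorithm, which is not described on this page. The function names, the three unit types, and the toy value function (the square of a coalition's total strength, chosen because it is convex) are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a coalition value function.

    `value` maps a frozenset of players to a number and should
    return 0 for the empty coalition.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a size-k coalition that p joins: k! (n-k-1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p to coalition s
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical units and strengths, for illustration only.
strength = {"ship": 1.0, "fighter": 2.0, "bomber": 3.0}


def coalition_value(coalition):
    # Toy convex game: payoff grows with the square of total strength.
    return sum(strength[p] for p in coalition) ** 2


phi = shapley_values(list(strength), coalition_value)
for unit, share in phi.items():
    print(f"{unit}: {share:.2f}")
```

In this toy game each unit's share works out to its own strength times the total strength, and the shares sum to the payoff of the full coalition, which is the efficiency property that makes the allocation easy to interpret. Exact enumeration scales exponentially with the number of players, so it is only practical for small teams.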

History
  • Received: 2024-09-14
  • Last revised: 2025-02-24
  • Accepted: 2025-02-25