Cite this article: GAO Xianzhong, XIANG Lei, WANG Baolai, et al. Rule and intelligence coupling constraint training method for UAV swarm confrontation[J]. Journal of National University of Defense Technology, 2023, 45(1): 157-166.
DOI: 10.11887/j.cn.202301018
Received: 2021-02-20
Funding: National Natural Science Foundation of China (11602298)
Rule and intelligence coupling constraint training method for UAV swarm confrontation
GAO Xianzhong1, XIANG Lei2, WANG Baolai2, JIA Gaowei1, HOU Zhongxi1
(1. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China;2. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
Abstract:
    Based on the concept of intelligent offense-defense confrontation between UAV (unmanned aerial vehicle) swarms, a simulation environment for intelligent UAV swarm combat was established. To address the difficulty, in traditional reinforcement learning algorithms, of precisely controlling the speed and attack angle of UAVs during confrontation through reward signals alone, the RIC-MADDPG (rule and intelligence coupling constrained multi-agent deep deterministic policy gradient) algorithm was proposed. The algorithm uses rules to constrain the actions of UAVs during reinforcement learning. Simulation results show that the winning rate of the red UAV swarm trained with RIC-MADDPG improved from 53% to 79%, indicating that the cycle of "agent training → problem finding → rule making → agent retraining → further problem finding → further rule making" is effective for optimizing agent combat strategies. The results provide a reference for building a training system for intelligent UAV swarm offense-defense strategies and for studying swarm tactics that couple rules with intelligence.
Keywords: UAV swarms; MADDPG algorithm; agent decision making; countermeasure model; rule-constrained
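The core idea of coupling rules with a learned policy, as described in the abstract, can be illustrated with a minimal sketch. This is not the paper's implementation: the rule set, the speed envelope, and the action layout (speed, heading change) are all assumptions made for illustration. Hard rules clamp the raw actor output before it is executed, so the reward signal no longer has to enforce physical or tactical limits such as flight speed and attack angle.

```python
import numpy as np

# Hypothetical rule parameters (not from the paper): an assumed flight-speed
# envelope in m/s and a maximum heading change per decision step.
SPEED_MIN, SPEED_MAX = 10.0, 30.0
MAX_TURN = np.radians(30.0)

def apply_rules(raw_action: np.ndarray) -> np.ndarray:
    """Constrain a raw [speed, turn] action proposed by the actor network.

    The rule layer sits between the MADDPG actor and the environment,
    clipping each component into its allowed range.
    """
    speed = np.clip(raw_action[0], SPEED_MIN, SPEED_MAX)
    turn = np.clip(raw_action[1], -MAX_TURN, MAX_TURN)
    return np.array([speed, turn])

# Example: an out-of-envelope action from an untrained actor is clamped
# to the boundary of the rule set before being sent to the simulator.
constrained = apply_rules(np.array([55.0, 1.2]))
```

In this sketch the rules are static clipping bounds; the iterative workflow described in the abstract would grow this rule layer over successive training rounds as new failure modes are found.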