Cite this article: GAO Xianzhong, XIANG Lei, WANG Baolai, et al. Rule and intelligence coupling constraint training method for UAV swarm confrontation[J]. Journal of National University of Defense Technology, 2023, 45(1): 157-166.
Rule and intelligence coupling constraint training method for UAV swarm confrontation
GAO Xianzhong1, XIANG Lei2, WANG Baolai2, JIA Gaowei1, HOU Zhongxi1
(1. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China; 2. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

Abstract:
Based on the concept of intelligent attack-defense confrontation between UAV swarms, a simulation environment for intelligent UAV swarm confrontation was established. To address the difficulty, in traditional reinforcement learning algorithms, of precisely controlling quantities such as UAV speed and attack angle through reward signals alone, a rule and intelligence coupling constrained multi-agent deep deterministic policy gradient (RIC-MADDPG) algorithm is proposed, in which hand-written rules constrain the actions of the UAVs during reinforcement learning. Experimental results show that the swarm confrontation model trained with RIC-MADDPG raises the red swarm's win rate in confrontation from 53% to 79%, indicating that the cycle of "agent training → problem finding → rule writing → retraining → finding new problems → writing new rules" is effective for optimizing agent confrontation strategies. The results provide a useful reference for building a training system for intelligent UAV swarm attack-defense strategies and for studying swarm tactics that couple rules with learned intelligence.
Keywords: UAV swarms; MADDPG algorithm; agent decision making; confrontation model; rule constraints
DOI: 10.11887/j.cn.202301018
Received: 2021-02-20
Funding: National Natural Science Foundation of China (11602298)
|
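The core idea the abstract describes — bounding UAV speed and attack angle with explicit rules rather than hoping reward shaping will induce those limits — can be sketched as a filter applied to the action a learned policy proposes. This is a minimal illustration only: the thresholds, function name, and action representation below are assumptions for the sketch, not the paper's actual rules.

```python
import math

# Hypothetical rule thresholds (illustrative values, not from the paper)
V_MIN, V_MAX = 5.0, 20.0              # permitted speed envelope, m/s
MAX_ATTACK_ANGLE = math.radians(30)   # firing allowed only within this angle off target

def apply_rules(speed, heading, target_bearing, wants_to_fire):
    """Couple hand-written rules with a learned action: clamp the
    commanded speed to the envelope and veto firing when the attack
    geometry lies outside the allowed cone."""
    # Rule 1: keep the policy's commanded speed inside the envelope.
    speed = min(max(speed, V_MIN), V_MAX)
    # Rule 2: compute the signed angle off target, wrapped to [-pi, pi],
    # and allow firing only when its magnitude is within the cone.
    angle_off = abs((target_bearing - heading + math.pi) % (2 * math.pi) - math.pi)
    can_fire = wants_to_fire and angle_off <= MAX_ATTACK_ANGLE
    return speed, can_fire
```

In a MADDPG-style training loop, such a filter would sit between the actor network's output and the environment step, so the replay buffer only ever contains rule-compliant actions; the "train → find problem → write rule → retrain" cycle then amounts to extending this filter as failure modes are observed.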
|
|
|
|
|