Cite this article: ZHU Jianwen, ZHAO Changjian, LI Xiaoping, et al. Multi constraint optimal intelligent gliding guidance via reinforcement learning[J]. Journal of National University of Defense Technology, 2022, 44(4): 116-124.
Multi constraint optimal intelligent gliding guidance via reinforcement learning
ZHU Jianwen1, ZHAO Changjian2, LI Xiaoping1, BAO Weimin1,3
(1. School of Aerospace Science and Technology, Xidian University, Xi'an 710126, China; 2. China Academy of Launch Vehicle Technology, Beijing 100076, China; 3. China Aerospace Science and Technology Corporation, Beijing 100048, China)

DOI: 10.11887/j.cn.202204012
Received: 2020-10-13
Funding: National Natural Science Foundation of China (61703409); China Postdoctoral Science Foundation (2019M66364)

Abstract:
To improve the autonomy of gliding guidance in complex flight missions, a multi-constraint intelligent gliding guidance strategy based on optimal guidance and reinforcement learning (RL) is proposed. Three-dimensional optimal guidance is introduced to satisfy the terminal longitude, latitude, altitude, and flight-path-angle constraints. A velocity control strategy based on lateral sinusoidal maneuvers is proposed, and an analytical method for predicting the terminal velocity under maneuvering flight is developed. Because the maneuver amplitude used for velocity control cannot be determined offline, an RL-based intelligent parameter-tuning method is studied. The method defines the state space from the terminal velocity and the action space from the maneuver amplitude, constructs a reward function that combines the terminal velocity error with the gliding guidance task, and uses Q-Learning to adjust the maneuver amplitude. Simulation results show that the intelligent gliding guidance method satisfies the multiple terminal constraints with high accuracy and effectively improves autonomous decision-making in complex missions.
Keywords: gliding flight; optimal guidance; intelligent parameter adjustment; reinforcement learning; Q-Learning
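The abstract names Q-Learning as the mechanism for tuning the maneuver amplitude but gives no formulas, so the following is only a minimal tabular sketch of that idea, not the authors' implementation. Everything numeric here is an assumption made for illustration: the amplitude grid, the target velocity, the toy function `predicted_terminal_velocity` (a stand-in for the paper's analytical prediction), and a reward that uses only the terminal velocity error and omits the guidance-task term. Because each amplitude level maps to exactly one predicted terminal velocity in this toy model, the state is indexed by amplitude level.

```python
import random

# All values below are illustrative assumptions, not taken from the paper.
AMPLITUDES = [2.0 * i for i in range(16)]  # candidate maneuver amplitudes, deg
ACTIONS = (-1, 0, 1)                       # lower / hold / raise amplitude by one level
V_TARGET = 1200.0                          # required terminal velocity, m/s


def predicted_terminal_velocity(amplitude):
    """Toy surrogate: a larger lateral maneuver dissipates more energy."""
    return 1500.0 - 0.75 * amplitude ** 2


def step(state, action_idx):
    """Apply an amplitude change; reward is the negative scaled velocity error."""
    next_state = min(max(state + ACTIONS[action_idx], 0), len(AMPLITUDES) - 1)
    error = abs(predicted_terminal_velocity(AMPLITUDES[next_state]) - V_TARGET)
    return next_state, -error / 100.0


def greedy_action(q, state):
    """Index of the highest-valued action in this state (first one on ties)."""
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])


def train(episodes=2000, steps=30, alpha=0.2, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-Learning with epsilon-greedy exploration and random starts."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in AMPLITUDES]
    for _ in range(episodes):
        state = rng.randrange(len(AMPLITUDES))
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy_action(q, state)
            nxt, reward = step(state, action)
            # Standard Q-Learning update: bootstrap from the best next action.
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q


def greedy_rollout(q, state=0, steps=40):
    """Follow the learned greedy policy and return the final amplitude."""
    for _ in range(steps):
        state, _ = step(state, greedy_action(q, state))
    return AMPLITUDES[state]


if __name__ == "__main__":
    q_table = train()
    amplitude = greedy_rollout(q_table)
    print(amplitude, predicted_terminal_velocity(amplitude))
```

In this sketch the agent starts from zero amplitude and steps toward the level whose predicted terminal velocity is closest to the target. A fuller version would replace the toy velocity model with the analytical predictor and add a guidance-task term to the reward, as the abstract describes.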