Journal of National University of Defense Technology

Cite this article: LI Yaoyu, ZHU Yifan, YANG Feng, et al. Inverse reinforcement learning based optimal schedule generation approach for carrier aircraft on flight deck[J]. Journal of National University of Defense Technology, 2013, 35(4): 171-175.



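Inverse reinforcement learning based optimal schedule generation approach for carrier aircraft on flight deck

LI Yaoyu, ZHU Yifan, YANG Feng, JIA Quan

(College of Information System and Management, National University of Defense Technology, Changsha, Hunan 410073, China)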

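Abstract:

Computer-aided command and scheduling of carrier aircraft deck operations cannot take the human out of the decision process. To address this, a reinforcement learning method based on inverse learning is introduced, with demonstrations by the commander or an expert as the learning target. By analyzing the deck activities of carrier aircraft, a Markov decision process (MDP) framework for flight-deck scheduling is established. Under a linear approximation, the reward function is computed by inverse learning, so that a reinforcement learning method can then obtain an intelligent optimized policy and generate the deck scheduling plan. Simulation experiments show that the proposed method learns the expert demonstrations well, that its results meet the requirements of schedule optimization, and that it provides a basis for decision support.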

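Keywords: inverse reinforcement learning; reinforcement learning; carrier aircraft; flight-deck scheduling; optimized schedule generation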

DOI:

Received: 2012-10-25

Funding: National Natural Science Foundation of China (71031007)

Inverse reinforcement learning based optimal schedule generation approach for carrier aircraft on flight deck

LI Yaoyu, ZHU Yifan, YANG Feng, JIA Quan

(College of Information System and Management, National University of Defense Technology, Changsha 410073, China)

Abstract:

Traditional aircraft scheduling on a carrier flight deck relies heavily on the decisions of human commanders. To improve computer-aided decision making, an inverse reinforcement learning method is proposed that learns from demonstrations by the commander or an expert. A Markov decision process (MDP) based aircraft scheduling model is built by analyzing aircraft operations on deck. The reward function is then recovered by inverse reinforcement learning under a linear approximation, and the optimal policy and schedule are generated by reinforcement learning. Simulation results show that the method learns the expert's demonstrations well, satisfies the requirements of scheduling optimization, and facilitates computer-aided decision making.

Keywords: inverse reinforcement learning; reinforcement learning; aircraft scheduling on flight deck; optimal schedule generation
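
The pipeline summarized in the abstract (an MDP, a reward function that is linear in state features, inverse learning from demonstrations, then reinforcement learning on the recovered reward) can be illustrated with a minimal sketch. The abstract does not say which inverse reinforcement learning algorithm the authors use, so the sketch assumes the projection variant of apprenticeship learning (Abbeel and Ng). The five-state chain, the one-hot features, and every function name below are hypothetical stand-ins, not the paper's flight-deck model.

```python
# Toy sketch: linear-reward IRL by feature-expectation matching (projection method).
# Everything here (chain MDP, features, names) is an illustrative assumption.
N_STATES, ACTIONS, GAMMA, HORIZON = 5, (0, 1), 0.9, 50


def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right."""
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)


def phi(s):
    """One-hot state features, so the reward is R(s) = w . phi(s)."""
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def greedy_policy(w, sweeps=200):
    """Reinforcement learning step: value iteration on the reward defined by w."""
    reward = [dot(w, phi(s)) for s in range(N_STATES)]
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [max(reward[s] + GAMMA * v[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    return [max(ACTIONS, key=lambda a: reward[s] + GAMMA * v[step(s, a)])
            for s in range(N_STATES)]


def feature_expectations(policy, start=0):
    """Discounted feature counts of a deterministic policy rolled out from start."""
    mu, s = [0.0] * N_STATES, start
    for t in range(HORIZON):
        mu = [m + GAMMA ** t * f for m, f in zip(mu, phi(s))]
        s = step(s, policy[s])
    return mu


def projection_irl(mu_expert, max_iter=20, eps=1e-9):
    """Inverse learning step: find weights w whose greedy policy matches the expert."""
    mu_bar = feature_expectations(greedy_policy([0.0] * N_STATES))
    for _ in range(max_iter):
        w = [e - b for e, b in zip(mu_expert, mu_bar)]
        policy = greedy_policy(w)
        d = [m - b for m, b in zip(feature_expectations(policy), mu_bar)]
        if dot(d, d) < eps:  # the new policy adds nothing, stop
            break
        scale = dot(d, w) / dot(d, d)  # orthogonal projection onto the segment
        mu_bar = [b + scale * x for b, x in zip(mu_bar, d)]
        gap = [e - b for e, b in zip(mu_expert, mu_bar)]
        if dot(gap, gap) < eps:  # expert feature expectations matched
            break
    return w, policy


# The "expert demonstration" is a policy that always moves right.
expert_policy = [1] * N_STATES
mu_expert = feature_expectations(expert_policy)
w, learned_policy = projection_irl(mu_expert)
print("recovered reward weights:", [round(x, 3) for x in w])
print("learned policy:", learned_policy)
```

On this toy chain the learned policy should coincide with the demonstration (move right in every state), and the recovered weights should favor the rightmost state. A realistic deck-scheduling model would need a far richer state space, task-specific features, and demonstrations recorded from commanders, none of which the abstract specifies.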