Asymmetric Actor-Critic reinforcement learning for long-sequence autonomous manipulation
Author:
Affiliation:

1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China; 2. National Key Laboratory of Equipment State Sensing and Smart Support, Changsha 410073, China; 3. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China

Biography:

REN Junkai (1991—), male, born in Shijiazhuang, Hebei, associate professor, Ph.D., master's supervisor, E-mail: jk.ren@nudt.edu.cn

Corresponding author:

CLC number:

TP249

Fund project:

National Natural Science Foundation of China (62373201); Independent Innovation Science Fund of National University of Defense Technology (ZK2023-30, 24-ZZCX-GZZ-11)



    Abstract:

    Long-sequence autonomous manipulation has become one of the bottlenecks hindering the practical application of intelligent robots. To meet the diverse long-sequence manipulation skill requirements that robots face in complex scenarios, an efficient and robust asymmetric Actor-Critic reinforcement learning method is proposed to address the challenges of high learning difficulty and complex reward-function design in long-sequence tasks. Multiple Critic networks are integrated to collaboratively train a single Actor network, and generative adversarial imitation learning (GAIL) is introduced to generate intrinsic rewards for the Critic networks, thereby reducing the learning difficulty of long-sequence tasks. On this basis, a two-stage learning scheme is designed in which imitation learning provides a high-quality pre-trained behavior policy for reinforcement learning, further improving learning efficiency while enhancing the generalization of the policy. Simulation results on long-sequence autonomous manipulation in a chemical laboratory show that the proposed method significantly improves both the learning efficiency of the robot's long-sequence manipulation skills and the robustness of its behavior policies.

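As a rough illustration of the abstract's two ingredients — a GAIL discriminator supplying intrinsic rewards, and several critics jointly driving a single actor — the reward shaping and critic aggregation might be sketched as follows. This is not the authors' implementation: the function names, the simple averaging rule, and the weight `lam` are assumptions for illustration only.

```python
import math

def gail_intrinsic_reward(d_out, eps=1e-8):
    """GAIL-style intrinsic reward r = -log(1 - D(s, a)):
    the more expert-like the discriminator judges (s, a),
    the larger the reward fed to the intrinsic-reward critic."""
    return -math.log(1.0 - d_out + eps)

def aggregate_critics(*values):
    """Single-actor update signal: average the value estimates of
    several critics (one critic per reward channel in this sketch)."""
    return sum(values) / len(values)

# Hypothetical three-step rollout (all numbers are illustrative).
d_out = [0.2, 0.5, 0.9]    # discriminator outputs D(s, a)
r_ext = [0.0, 0.0, 1.0]    # sparse extrinsic (task) reward
lam = 0.1                  # intrinsic-reward weight (an assumption)

# Shaped reward: sparse task reward plus weighted GAIL reward.
r_total = [re + lam * gail_intrinsic_reward(d)
           for re, d in zip(r_ext, d_out)]

# Two critics: one trained on the task reward, one on the GAIL reward;
# their estimates are combined into one update signal for the actor.
v_task = [0.3, 0.5, 0.8]
v_gail = [0.1, 0.4, 0.9]
v_actor = [aggregate_critics(vt, vg) for vt, vg in zip(v_task, v_gail)]
```

In a full training loop the discriminator would itself be trained to separate expert demonstrations from policy rollouts, and the critics would be learned value networks rather than fixed numbers.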

Cite this article:

REN Junkai, QU Yuke, LUO Jiawei, et al. Asymmetric Actor-Critic reinforcement learning for long-sequence autonomous manipulation[J]. Journal of National University of Defense Technology, 2025, 47(4): 111-122.

History
  • Received: 2024-12-16
  • Published online: 2025-07-23