Abstract: In robotics, long-sequence autonomous manipulation has become one of the bottlenecks hindering the practical deployment of intelligent robots. To meet the diverse long-sequence manipulation skill requirements that robots face in complex scenarios, an efficient and robust asymmetric Actor-Critic reinforcement learning method is proposed, addressing the high learning difficulty and complex reward-function design of long-sequence tasks. Multiple Critic networks are integrated to collaboratively train a single Actor network, and Generative Adversarial Imitation Learning (GAIL) is introduced to generate intrinsic rewards for the Critic networks, reducing the learning difficulty of long-sequence tasks. On this basis, a two-stage learning scheme is designed in which imitation learning provides a high-quality pre-trained behavior policy for reinforcement learning, improving both learning efficiency and the generalization of the resulting policy. Simulation results on long-sequence autonomous task execution in a chemical laboratory show that the proposed method significantly improves the learning efficiency of long-sequence robot skills and the robustness of the learned behavior policies.
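To make the described architecture concrete, the sketch below gives one plausible reading of the asymmetric Actor-Critic setup: several sub-task Critics jointly supervise a single shared Actor, while a GAIL discriminator produces the intrinsic reward that stands in for a hand-designed dense reward. This is a minimal illustration, not the authors' implementation; all class names, network sizes, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Single shared policy network mapping states to continuous actions."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """One Q-network per sub-task of the long-sequence manipulation task."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Discriminator(nn.Module):
    """GAIL discriminator: higher output means the (s, a) pair looks expert-like."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def gail_intrinsic_reward(disc, state, action, eps=1e-8):
    """Standard GAIL reward r = -log(1 - D(s, a)), fed to the critics
    in place of a hand-crafted dense task reward."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))
    return -torch.log(1.0 - d + eps)

# Illustrative wiring (dimensions are placeholders): one critic per
# sub-task; during a given sub-task, its critic provides the gradient
# signal for the single shared actor.
num_subtasks = 4
actor = Actor(state_dim=32, action_dim=7)
critics = [Critic(32, 7) for _ in range(num_subtasks)]
disc = Discriminator(32, 7)
```

Under this reading, the two-stage scheme would first pre-train `actor` by behavior cloning on expert demonstrations, then fine-tune it with the critics using the GAIL intrinsic reward above.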