Asymmetric Actor-Critic reinforcement learning for long-sequence autonomous manipulation

doi:10.11887/j.issn.1001-2486.24120032

Volume 47, Issue 4, 2025, pp. 111-122

Affiliation: 1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China; 2. National Key Laboratory of Equipment State Sensing and Smart Support, Changsha 410073, China; 3. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
CLC Number: TP249

Abstract:

Long-sequence autonomous manipulation has become one of the bottlenecks hindering the practical application of intelligent robots. To meet the diverse long-sequence manipulation skill requirements that robots face in complex scenarios, an efficient and robust asymmetric Actor-Critic reinforcement learning method is proposed. The method targets two challenges of long-sequence tasks: high learning difficulty and complex reward function design. Multiple Critic networks are integrated to collaboratively train a single Actor network, and GAIL (generative adversarial imitation learning) is introduced to generate intrinsic rewards for the Critic network, which reduces the learning difficulty of long-sequence tasks. On this basis, a two-stage learning method is designed in which imitation learning provides a high-quality pre-trained behavior policy for reinforcement learning, improving both learning efficiency and the generalization performance of the policy. Simulation results for long-sequence autonomous task execution in a chemical laboratory show that the proposed method significantly improves the learning efficiency of robot long-sequence skills and the robustness of the behavior policies.
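
The idea of several Critics jointly guiding one Actor, with one reward stream coming from GAIL, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the reward form -log(1 - D(s, a)) is one common GAIL convention, the one-step TD advantage is a generic estimator, and the function names, the weights, and the numeric values are hypothetical.

```python
import math


def gail_intrinsic_reward(d_prob, eps=1e-8):
    """Assumed GAIL reward form: -log(1 - D(s, a)), with D(s, a) in (0, 1)."""
    d = min(max(d_prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(1.0 - d)


def one_step_advantage(reward, value, next_value, gamma=0.99):
    """Generic TD(0) advantage estimate: r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


def combined_advantage(advantages, weights):
    """Weighted sum of per-Critic advantages that a single Actor would use."""
    if len(advantages) != len(weights):
        raise ValueError("need one weight per Critic")
    return sum(w * a for w, a in zip(weights, advantages))


# Hypothetical numbers: one Critic tracks the task (extrinsic) reward,
# another tracks the GAIL-based intrinsic reward.
r_ext = 1.0
r_int = gail_intrinsic_reward(0.5)
adv_ext = one_step_advantage(r_ext, value=0.5, next_value=0.6)
adv_int = one_step_advantage(r_int, value=0.2, next_value=0.3)
actor_adv = combined_advantage([adv_ext, adv_int], weights=[1.0, 0.5])
```

In this sketch each Critic keeps its own value estimate for its own reward stream, and only the combined advantage reaches the Actor update. How the paper actually weights or combines its Critics is not stated in the abstract.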

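Citation:

REN Junkai (任君凯), QU Yuke (瞿宇珂), LUO Jiawei (罗嘉威), et al. Asymmetric Actor-Critic reinforcement learning for long-sequence autonomous manipulation[J]. Journal of National University of Defense Technology (国防科技大学学报), 2025, 47(4): 111-122.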

History

Received: December 16, 2024
Online: July 23, 2025
Published: August 28, 2025
