Abstract: In robotics, long-sequence autonomous manipulation has become one of the bottlenecks hindering the practical deployment of intelligent robots. To meet the diverse long-sequence manipulation skill requirements that robots face in complex scenarios, an efficient and robust asymmetric Actor-Critic reinforcement learning method is proposed, addressing the high learning difficulty and complex reward-function design of long-sequence tasks. Multiple Critic networks are integrated to collaboratively train a single Actor network, and Generative Adversarial Imitation Learning (GAIL) is introduced to generate intrinsic rewards for the Critic networks, reducing the learning difficulty of long-sequence tasks. On this basis, a two-stage learning scheme is designed in which imitation learning provides a high-quality pre-trained behavior policy for reinforcement learning, improving both learning efficiency and the generalization of the resulting policy. Simulation results on long-sequence autonomous task execution in a chemical laboratory show that the proposed method significantly improves the learning efficiency of long-sequence robot skills and the robustness of the learned behavior policies.
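To make the described architecture concrete, the sketch below gives one plausible reading of the asymmetric Actor-Critic setup: several sub-task Critics jointly supervise a single shared Actor, while a GAIL discriminator produces the intrinsic reward that stands in for a hand-designed dense reward. This is a minimal illustration, not the authors' implementation; all class names, network sizes, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Single shared policy network mapping states to continuous actions."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """One Q-network per sub-task of the long-sequence manipulation task."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Discriminator(nn.Module):
    """GAIL discriminator: higher output means the (s, a) pair looks expert-like."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def gail_intrinsic_reward(disc, state, action, eps=1e-8):
    """Standard GAIL reward r = -log(1 - D(s, a)), fed to the critics
    in place of a hand-crafted dense task reward."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))
    return -torch.log(1.0 - d + eps)

# Illustrative wiring (dimensions are placeholders): one critic per
# sub-task; during a given sub-task, its critic provides the gradient
# signal for the single shared actor.
num_subtasks = 4
actor = Actor(state_dim=32, action_dim=7)
critics = [Critic(32, 7) for _ in range(num_subtasks)]
disc = Discriminator(32, 7)
```

Under this reading, the two-stage scheme would first pre-train `actor` by behavior cloning on expert demonstrations, then fine-tune it with the critics using the GAIL intrinsic reward above.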