Reinforcement learning method via meta-learning the exploring latent variable
Affiliation:

1. National Innovation Institute of Defense Technology, Academy of Military, Beijing 100071, China; 2. The PLA Unit 32806, Beijing 100091, China; 3. Xi'an Satellite Control Center, Xi'an 710043, China

CLC Number:

TP181

    Abstract:

    Traditional agent exploration methods either make poor use of interaction data or require additional task data. To address this, an online-learnable exploring latent variable is introduced that characterizes the features of the current task and assists the policy network in decision-making. The method requires neither additional multi-task data nor additional environment interaction steps in the current task. The exploring latent variable is updated within a learnable environment model, and the environment model is in turn updated in a supervised manner using data from the agent's interactions with the real environment. The exploring latent variable thus supports exploration in advance within the simulated environment model, which improves the agent's performance in the real environment. In experiments on typical continuous control tasks, performance improved by about 30%, a result that offers guidance for research on single-task exploration and meta reinforcement learning.
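
The loop described in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation: the one-dimensional environment, the linear model `s' ≈ s + w * a`, the scalar latent `z`, the finite-difference update, and all function names and hyperparameters below are assumptions chosen only to show the three steps (collect real data, fit the model by supervised learning, update the latent inside the model without extra real-environment steps).

```python
import random

TRUE_GAIN = 2.0  # hidden task feature of the toy "real" environment (assumption)
GOAL = 1.0       # target state of the toy task (assumption)

def real_step(s, a):
    """Toy real environment: s' = s + TRUE_GAIN * a, reward = -(s' - GOAL)^2."""
    s_next = s + TRUE_GAIN * a
    return s_next, -(s_next - GOAL) ** 2

def policy(s, z):
    """Latent-conditioned policy: z scales how strongly the agent moves toward GOAL."""
    return z * (GOAL - s)

def fit_model(transitions):
    """Supervised update: least-squares fit of w in s' ≈ s + w * a."""
    num = sum(a * (s_next - s) for s, a, s_next in transitions)
    den = sum(a * a for s, a, s_next in transitions)
    return num / den if den > 0 else 0.0

def imagined_return(z, w, s0=0.0, horizon=5):
    """Roll the policy out in the learned model only; no real-environment steps."""
    s, total = s0, 0.0
    for _ in range(horizon):
        s = s + w * policy(s, z)
        total += -(s - GOAL) ** 2
    return total

def update_latent(z, w, lr=0.02, eps=1e-4, steps=200):
    """Finite-difference gradient ascent on the imagined return with respect to z."""
    for _ in range(steps):
        grad = (imagined_return(z + eps, w) - imagined_return(z - eps, w)) / (2 * eps)
        z += lr * grad
    return z

def run(episodes=5, horizon=5, seed=0):
    """Alternate real interaction, model fitting, and latent updates in the model."""
    rng = random.Random(seed)
    z, buffer, returns = 0.1, [], []
    for _ in range(episodes):
        s, ep_ret = 0.0, 0.0
        for _ in range(horizon):
            a = policy(s, z) + rng.gauss(0.0, 0.01)  # small action noise
            s_next, r = real_step(s, a)
            buffer.append((s, a, s_next))
            s, ep_ret = s_next, ep_ret + r
        returns.append(ep_ret)
        w = fit_model(buffer)    # environment model learns from real data
        z = update_latent(z, w)  # latent is refined inside the model
    return z, w, returns
```

Calling `run()` returns the final latent, the fitted model gain, and the per-episode real returns. In this toy setting the latent settles near the reciprocal of the hidden gain, so later real episodes score higher than the first one even though the latent was tuned only inside the model.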

Get Citation

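LI Yiying, ZHOU Wei. Reinforcement learning method via meta-learning the exploring latent variable[J]. Journal of National University of Defense Technology, 2025, 47(5): 197-205.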

History
  • Received: May 21, 2023
  • Online: October 08, 2025