Reinforcement learning method via meta-learning the exploring latent variable
Affiliations:

1. Artificial Intelligence Research Center, National Innovation Institute of Defense Technology, Academy of Military Sciences; 2. Unit 32806 of the Chinese People's Liberation Army; 3. Xi'an Satellite Control Center

CLC number: TP181

Fund: National Natural Science Foundation of China Youth Science Fund (Grant No. 62206307)


Abstract:

Efficient online exploration matters in reinforcement learning, yet conventional exploration methods either use the agent's interaction data inefficiently or depend on additional data from other tasks. To address this, an online-learnable exploration latent variable that captures the characteristics of the current task is introduced to assist the policy network in action selection. The latent variable is updated inside a learnable environment model, so it requires neither extra multi-task data nor additional environment interaction steps in the current task; the environment model itself is updated in a supervised fashion from the agent's interaction data with the real environment. The exploration latent variable therefore "scouts ahead" in a model that simulates the real environment, and the task-exploration information it gathers helps the agent explore more effectively and perform better in the real environment. Experiments on typical continuous-control reinforcement learning tasks show a performance improvement of about 30%, offering guidance and reference for single-task exploration methods and meta-reinforcement-learning research.
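The abstract describes the training loop only in prose. The sketch below is one plausible reading of it in PyTorch, not the authors' implementation: all names (DynamicsModel, LatentPolicy, adapt_latent), network sizes, horizons, and learning rates are illustrative assumptions. It shows the three pieces the abstract names: a latent z that conditions the policy, gradient adaptation of z on imagined return inside a learned dynamics model, and supervised fitting of that model to real transitions.

import torch
import torch.nn as nn

S_DIM, A_DIM, Z_DIM = 8, 2, 4  # assumed state/action/latent sizes

class DynamicsModel(nn.Module):
    """Learnable environment model: predicts next state and reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(S_DIM + A_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, S_DIM + 1))

    def forward(self, s, a):
        out = self.net(torch.cat([s, a], dim=-1))
        return out[..., :S_DIM], out[..., S_DIM]  # (next state, reward)

class LatentPolicy(nn.Module):
    """Policy network conditioned on the exploration latent variable z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(S_DIM + Z_DIM, 64), nn.Tanh(),
                                 nn.Linear(64, A_DIM), nn.Tanh())

    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))

model, policy = DynamicsModel(), LatentPolicy()
z = nn.Parameter(torch.zeros(Z_DIM))  # exploration latent, adapted online

def fit_model(batch, model_opt):
    """Supervised update of the environment model on real transitions."""
    s, a, r, s_next = batch
    pred_s, pred_r = model(s, a)
    loss = ((pred_s - s_next) ** 2).mean() + ((pred_r - r) ** 2).mean()
    model_opt.zero_grad(); loss.backward(); model_opt.step()

def adapt_latent(s0, horizon=5, steps=10, lr=1e-2):
    """'Scout ahead' inside the learned model: nudge z toward higher
    imagined return, at zero cost in real environment steps."""
    z_opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        s, ret = s0, 0.0
        for _ in range(horizon):
            a = policy(s, z)    # action depends on z ...
            s, r = model(s, a)  # ... rolled out in the model, not the real env
            ret = ret + r
        loss = -ret.mean()      # maximize imagined return w.r.t. z only
        z_opt.zero_grad(); loss.backward(); z_opt.step()

A full loop would alternate three phases: act in the real environment with policy(s, z.detach()) to fill a replay buffer, call fit_model on sampled batches, and call adapt_latent before further acting; the hyperparameters above are placeholders for whatever the paper actually uses.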

History
  • Received: 2023-05-21
  • Revised: 2025-07-05
  • Accepted: 2023-10-20
  • Published online: 2025-07-08