Abstract: Efficient online exploration is important for agents in reinforcement learning tasks, yet existing approaches either make poor use of the data collected through interaction with the environment or require additional data from other tasks. To address this problem, an online exploration latent variable that captures the characteristics of the current task was introduced to guide the agent's behavior, requiring neither additional multi-task data nor extra environment interaction steps in the current task. The exploration latent variable was updated within a learnable environment model, while the environment model itself was updated in a supervised manner from the data generated by the agent's interaction with the real environment. Guided by the exploration latent variable, the agent explored in advance within the simulated environment model, which in turn improved its performance in the real environment. In experiments on typical continuous control tasks, performance improved by about 30%, offering guidance for research on single-task exploration and meta reinforcement learning.
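The loop the abstract describes — supervised updates of a learnable environment model from real interaction data, with an exploration latent variable updated only through imagined rollouts in that model — can be illustrated with a minimal sketch. Everything below is hypothetical: a one-dimensional toy task with linear dynamics, a least-squares environment model, and a scalar latent `z` that scales exploration noise; none of these names or choices come from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_step(s, a):
    """'Real' toy dynamics (assumed for illustration): s' = 0.9*s + a + noise."""
    return 0.9 * s + a + 0.05 * rng.standard_normal()

def model_step(w, s, a):
    """Learned linear environment model: s' ~= w0*s + w1*a + w2."""
    return w[0] * s + w[1] * a + w[2]

w = np.zeros(3)   # environment-model parameters, fit by supervised regression
z = 1.0           # exploration latent variable, updated only inside the model

for outer in range(5):
    # 1) Interact with the real environment; z shapes the behavior policy
    #    by scaling its exploration noise (a small dither keeps the model
    #    identifiable even when z shrinks toward zero).
    data, s = [], 0.0
    for t in range(200):
        a = -0.5 * s + (0.1 + abs(z)) * rng.standard_normal()
        s_next = real_step(s, a)
        data.append((s, a, s_next))
        s = s_next

    # 2) Supervised model update on real interaction data (least squares).
    X = np.array([[si, ai, 1.0] for si, ai, _ in data])
    y = np.array([sn for _, _, sn in data])
    w = np.linalg.lstsq(X, y, rcond=None)[0]

    # 3) Update z using imagined rollouts in the learned model only --
    #    no extra real environment steps. A finite-difference step on the
    #    imagined return stands in for a gradient update of the latent.
    def imagined_return(z_val, horizon=20):
        s_im, ret = 1.0, 0.0
        for _ in range(horizon):
            a_im = -0.5 * s_im + 0.1 * z_val
            s_im = model_step(w, s_im, a_im)
            ret -= s_im ** 2          # imagined reward: drive state to 0
        return ret

    eps = 0.1
    z += 0.5 * (imagined_return(z + eps) - imagined_return(z - eps)) / (2 * eps)

# The supervised fit should roughly recover the true dynamics [0.9, 1.0, 0.0].
print(np.round(w, 2))
```

The point of the sketch is the division of labor: the environment model is the only component trained on real data, while the exploration latent is refined entirely "in imagination", which is why the scheme needs no additional interaction steps or multi-task data.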