Abstract: This paper focuses on efficient online exploration by intelligent agents in single-task reinforcement learning. Traditional exploration methods either make poor use of the data gathered through interaction with the environment or require additional data from other tasks. To address this, this paper introduces an exploration latent variable that is learned online, captures the characteristics of the current task, and assists the agent in acting in the environment; no additional multi-task data or extra environment interaction steps are required for the current task. The exploration latent variable is updated within a learnable environment model introduced in this paper, while the environment model itself is updated in a supervised manner on real interaction data. The latent variable therefore "explores" in advance inside a model that simulates the real environment dynamics, and this exploratory information strengthens the agent's exploration and improves its performance in the real environment. Experiments show that the proposed method improves performance by about 30% on typical continuous-control reinforcement learning tasks, providing guidance and a reference for research on exploration in single-task reinforcement learning and in meta reinforcement learning.
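To make the two-step loop in the abstract concrete, the sketch below shows one plausible reading of the mechanism: an environment model is fitted to real transitions by supervised learning, and an exploration latent variable z is then refined purely inside that model (consuming no extra environment steps) before the policy, conditioned on z, acts in the real environment. All module names, dimensions, and the imagination objective (a simple state-dispersion proxy for novelty) are assumptions of this sketch, not the paper's exact implementation.

```python
# Hypothetical sketch of the mechanism described above: a learned dynamics
# model is trained on real transitions, and an exploration latent z is
# updated inside that model; the policy is conditioned on z when acting.
# Names, sizes, and the imagination objective are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, Z_DIM = 8, 2, 4

class DynamicsModel(nn.Module):
    """Predicts the next observation from (obs, action)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
            nn.Linear(64, OBS_DIM))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class Policy(nn.Module):
    """Action head conditioned on the observation and the exploration latent z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + Z_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z.expand(obs.shape[0], -1)], dim=-1))

model, policy = DynamicsModel(), Policy()
z = torch.zeros(1, Z_DIM, requires_grad=True)       # exploration latent variable
model_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
z_opt = torch.optim.Adam([z], lr=1e-2)

def model_update(obs, act, next_obs):
    """Supervised update of the environment model on real interaction data."""
    loss = (model(obs, act) - next_obs).pow(2).mean()
    model_opt.zero_grad(); loss.backward(); model_opt.step()

def latent_update(obs, horizon=5):
    """Update z by 'exploring' in the learned model: roll the policy out in
    imagination and push z toward rollouts whose imagined states are spread
    out (a dispersion proxy for novelty; the true objective is a design
    choice of the paper, not reproduced here)."""
    states, s = [], obs
    for _ in range(horizon):
        s = model(s, policy(s, z))                  # imagined transition
        states.append(s)
    novelty = torch.stack(states).var(dim=0).mean() # spread of imagined states
    loss = -novelty                                 # maximize dispersion
    z_opt.zero_grad(); loss.backward(); z_opt.step()

# Usage with random stand-in data for one training step:
obs = torch.randn(32, OBS_DIM)
act = torch.randn(32, ACT_DIM)
next_obs = torch.randn(32, OBS_DIM)
model_update(obs, act, next_obs)                    # fit model to real data
latent_update(obs)                                  # explore inside the model
action = policy(obs, z.detach())                    # act in the real environment
```

The key design point this sketch tries to capture is that only `model_update` consumes real transitions; `latent_update` improves z entirely in imagination, which is why the approach needs no additional environment interaction steps.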