Abstract: A multi-satellite cooperative planning problem was formulated, taking into account the characteristics of task requests and satellite constraints. The original performance function of each satellite agent was then modified by introducing a constraint punishing operator and a multi-satellite joint punishing operator. Next, a multi-satellite reinforcement learning algorithm (MUSARLA) was proposed to derive a coordinated task allocation strategy. Furthermore, the interaction among the satellites was designed on a blackboard architecture to reduce the communication cost during learning. Finally, simulation experiments were carried out, and the results verified the effectiveness of the proposed algorithm.