Abstract: To address the trajectory adjustment problem that arises when the flight trajectory of a ballistic missile deviates substantially after midcourse penetration, an optimization model for the timing of maneuver adjustments was established. A reverse-order Q-learning algorithm for maneuver adjustment was designed, in which a tile-coding approximator encodes the state feature space and the value function is approximated linearly over the encoded features. A reverse-order update mechanism combining the Q-learning algorithm with the Monte Carlo method was constructed and used to train the optimal timing of missile maneuver adjustments. Simulation results show that, under the given scenario parameters, the policy obtained after 10 000 generations of reinforcement learning training reliably controls the trajectory adjustment decisions after missile penetration with the minimum number of maneuvers, which verifies the effectiveness of the method.
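The two ingredients named in the abstract, tile coding with a linear value function and a reverse-order update over a completed episode, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `TileCoder` and `LinearQ`, the function `reverse_order_update`, the one-dimensional state, the two actions, and the toy episode are all assumptions made for illustration. Here "reverse order" is interpreted as replaying a full (Monte Carlo style) episode from its last transition to its first and applying the Q-learning target at each step.

```python
import numpy as np


class TileCoder:
    """Uniform tile coding over a box-shaped state space (hypothetical layout)."""

    def __init__(self, low, high, tiles_per_dim=8, n_tilings=4):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.tiles_per_dim = tiles_per_dim
        self.n_tilings = n_tilings
        # One extra cell per dimension absorbs the offset of shifted tilings.
        self.cells = tiles_per_dim + 1
        self.tiling_size = self.cells ** len(self.low)
        self.n_features = n_tilings * self.tiling_size

    def active(self, state):
        """Return the index of the single active tile in each tiling."""
        scaled = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        scaled = np.clip(scaled, 0.0, 1.0) * self.tiles_per_dim
        indices = []
        for t in range(self.n_tilings):
            offset = t / self.n_tilings  # each tiling is shifted by a fraction of a tile
            coords = np.floor(scaled + offset).astype(int)
            flat = 0
            for c in coords:
                flat = flat * self.cells + int(c)
            indices.append(t * self.tiling_size + flat)
        return indices


class LinearQ:
    """Q(s, a) as the sum of the weights of the active tiles for action a."""

    def __init__(self, coder, n_actions):
        self.coder = coder
        self.w = np.zeros((n_actions, coder.n_features))

    def value(self, state, action):
        return self.w[action, self.coder.active(state)].sum()

    def best_value(self, state):
        return max(self.value(state, a) for a in range(len(self.w)))


def reverse_order_update(q, episode, alpha=0.1, gamma=1.0):
    """Apply Q-learning updates to a finished episode, last transition first.

    episode: list of (state, action, reward, next_state, done) tuples.
    """
    step = alpha / q.coder.n_tilings  # split the step size across tilings
    for state, action, reward, next_state, done in reversed(episode):
        target = reward if done else reward + gamma * q.best_value(next_state)
        td_error = target - q.value(state, action)
        q.w[action, q.coder.active(state)] += step * td_error


# Toy episode (assumed): action 0 = coast, action 1 = maneuver, reward only at the end.
q = LinearQ(TileCoder(low=[0.0], high=[1.0]), n_actions=2)
episode = [
    ([0.1], 0, 0.0, [0.5], False),
    ([0.5], 0, 0.0, [0.9], False),
    ([0.9], 1, 1.0, [0.9], True),
]
reverse_order_update(q, episode, alpha=1.0, gamma=1.0)
print(q.value([0.1], 0))
```

Because the transitions are processed from the end of the episode, the terminal reward reaches the value of the first state in a single pass; updating the same transitions in forward order would leave that value at zero after one pass. This is one plausible reason for preferring a reverse-order update when the reward arrives only at the end of a trajectory.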