Abstract: For the dynamic contest between a radar and an active jammer, this paper treats the operating frequencies of the radar and the adversarial jammer as the action space of a multi-armed bandit (MAB) model from online learning theory. The radar explores the uncertain state of the jamming environment by transmitting pulses over multiple rounds, and a frequency-channel jamming recognizer based on a convolutional neural network provides a posterior probability estimate of the belief state of each frequency channel. The Thompson sampling algorithm is then used to solve the resulting MAB model efficiently, balancing exploration and exploitation. Simulation results show that, compared with random frequency agility and deep reinforcement learning algorithms, the proposed method achieves better convergence performance and adapts more readily to dynamic, fast-changing jamming environments, allowing the radar to exploit more fully the countermeasure advantage of active waveform transmission.
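
As an illustration of the Thompson sampling step only, the following minimal sketch selects a frequency channel for each pulse. It rests on assumptions that are not taken from the paper: the jammer is stationary, each channel is jammed independently with a fixed probability, and a Beta-Bernoulli posterior stands in for the belief-state estimate that the paper obtains from its convolutional neural network recognizer. The function name `thompson_frequency_selection` and the values in `jam_prob` are hypothetical.

```python
import numpy as np


def thompson_frequency_selection(jam_prob, n_pulses=2000, seed=0):
    """Pick one frequency channel per pulse with Beta-Bernoulli Thompson sampling.

    jam_prob[k] is an assumed, fixed probability that channel k is jammed.
    The Beta posterior here is a stand-in for the CNN-based belief estimate.
    """
    rng = np.random.default_rng(seed)
    n_channels = len(jam_prob)
    alpha = np.ones(n_channels)  # 1 + number of unjammed pulses per channel
    beta = np.ones(n_channels)   # 1 + number of jammed pulses per channel
    counts = np.zeros(n_channels, dtype=int)

    for _ in range(n_pulses):
        # Draw one plausible "unjammed probability" per channel from its posterior.
        theta = rng.beta(alpha, beta)
        # Transmit on the channel whose sampled value is largest.
        k = int(np.argmax(theta))
        # Simulated environment feedback: 1 if the pulse was not jammed, else 0.
        reward = int(rng.random() >= jam_prob[k])
        # Bayesian update of the chosen channel's posterior.
        alpha[k] += reward
        beta[k] += 1 - reward
        counts[k] += 1

    posterior_mean = alpha / (alpha + beta)
    return counts, posterior_mean


# Hypothetical scenario: channel 2 is jammed least often.
jam_prob = np.array([0.9, 0.8, 0.2, 0.7])
counts, posterior_mean = thompson_frequency_selection(jam_prob)
print("pulses per channel:", counts)
print("estimated unjammed probability:", np.round(posterior_mean, 2))
```

Sampling from the posterior, rather than always choosing the channel with the highest posterior mean, is what provides the exploration: channels with few observations have wide posteriors and are still selected occasionally, while well-observed clear channels are exploited most of the time.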