Abstract: To address the insufficient adversarial robustness of object detection models under adversarial attack, we propose a gradient-alignment-based method for enhancing their robustness. During adversarial training, we construct a composite loss function from an adversarial loss and an alignment loss; the gradient alignment strategy constrains the difference between the gradients computed on adversarial and clean examples. Furthermore, by combining the supervisory signal of knowledge distillation with the representational capability of self-supervised learning, our approach maximizes the feature similarity between adversarial and clean examples. Experimental results on the PASCAL VOC and MS COCO datasets demonstrate that the proposed method effectively improves the model's robust accuracy against adversarial examples.
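The abstract does not give the exact form of each loss term. The following PyTorch sketch shows one plausible instantiation of the composite objective: a task loss on adversarial inputs, a cosine-based penalty on the gap between clean and adversarial input gradients, and a cosine feature-consistency term in the spirit of distillation. The helper names (`task_loss_fn`, `model.backbone`) and the weights `lambda_align` / `lambda_feat` are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn.functional as F

def composite_loss(model, x_clean, x_adv, targets, task_loss_fn,
                   lambda_align=1.0, lambda_feat=1.0):
    """Adversarial loss + gradient-alignment penalty + feature-similarity term.

    A minimal sketch of the composite objective described in the abstract;
    the specific loss forms and weights are assumptions for illustration.
    """
    # Adversarial task loss (e.g., detection classification + box regression).
    loss_adv = task_loss_fn(model(x_adv), targets)

    # Gradient alignment: penalize the angle between the input gradients of
    # the clean and adversarial losses, so the two point in similar directions.
    x_c = x_clean.clone().requires_grad_(True)
    x_a = x_adv.clone().requires_grad_(True)
    g_clean = torch.autograd.grad(task_loss_fn(model(x_c), targets), x_c,
                                  create_graph=True)[0]
    g_adv = torch.autograd.grad(task_loss_fn(model(x_a), targets), x_a,
                                create_graph=True)[0]
    cos = F.cosine_similarity(g_clean.flatten(1), g_adv.flatten(1), dim=1)
    loss_align = (1.0 - cos).mean()  # zero when gradients are perfectly aligned

    # Feature consistency: pull adversarial features toward the clean
    # ("teacher") features, echoing the distillation / self-supervised signal.
    # `model.backbone` is an assumed attribute exposing intermediate features.
    with torch.no_grad():
        feat_clean = model.backbone(x_clean)
    feat_adv = model.backbone(x_adv)
    loss_feat = (1.0 - F.cosine_similarity(feat_adv.flatten(1),
                                           feat_clean.flatten(1), dim=1)).mean()

    return loss_adv + lambda_align * loss_align + lambda_feat * loss_feat
```

Under this reading, minimizing `loss_align` pushes the model toward a loss surface where adversarial and clean inputs produce similar gradient directions, while `loss_feat` enforces similar internal representations; both complement the standard adversarial training term `loss_adv`.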