Abstract: Most existing face forgery detection methods achieve acceptable performance on known attacks, but they remain prone to overfitting and fail to maintain detection capability in unseen scenarios. To address this problem, an effective face forgery detection framework based on multi-view learning and consistent representation is proposed. To capture more comprehensive forgery traces, the input image is transformed into two complementary views, and a dual-stream backbone network is used for multi-view feature learning. A consistency metric is introduced to explicitly constrain the similarity of the local features produced by the two views in a patch-level supervised manner. To further improve detection accuracy, a feature decomposition strategy refines the forgery-relevant features to reduce the interference of irrelevant factors, and the decision made in the forgery-relevant feature space serves as the final prediction. Extensive experiments on benchmark datasets show that the proposed method outperforms existing mainstream approaches and exhibits strong cross-domain generalization.
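The sketch below is a minimal PyTorch-style illustration of the overall structure described in the abstract, not the authors' implementation. The concrete choices are assumptions made only for illustration: the second view is taken to be a simple high-pass residual, each stream is a tiny placeholder CNN, the patch-level consistency constraint is a cosine-similarity loss over spatial positions, the decomposition into forgery-relevant and irrelevant components uses 1x1 projections with an orthogonality penalty, and the loss weights are arbitrary.

```python
# Minimal sketch (assumptions noted above): two complementary views, a
# dual-stream backbone, a patch-level consistency constraint, and a feature
# decomposition whose forgery-relevant branch produces the final prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F


def high_frequency_view(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical second view: high-pass residual of the RGB image."""
    blurred = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
    return x - blurred


class StreamEncoder(nn.Module):
    """Tiny CNN standing in for one stream of the dual-stream backbone."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):  # -> (B, C, H', W') patch-level feature map
        return self.features(x)


class MultiViewDetector(nn.Module):
    def __init__(self, channels: int = 64, num_classes: int = 2):
        super().__init__()
        self.rgb_stream = StreamEncoder(channels)
        self.freq_stream = StreamEncoder(channels)
        # Assumed decomposition into forgery-relevant / irrelevant subspaces.
        self.relevant_proj = nn.Conv2d(2 * channels, channels, 1)
        self.irrelevant_proj = nn.Conv2d(2 * channels, channels, 1)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x):
        f_rgb = self.rgb_stream(x)                          # view 1 features
        f_freq = self.freq_stream(high_frequency_view(x))   # view 2 features

        # Patch-level consistency: encourage local features from the two views
        # to agree at every spatial position (cosine similarity per patch).
        consistency_loss = 1.0 - F.cosine_similarity(f_rgb, f_freq, dim=1).mean()

        fused = torch.cat([f_rgb, f_freq], dim=1)
        relevant = self.relevant_proj(fused)
        irrelevant = self.irrelevant_proj(fused)
        # Orthogonality penalty keeping irrelevant factors out of the
        # forgery-relevant subspace (one possible decomposition constraint).
        ortho_loss = (F.normalize(relevant.flatten(1), dim=1)
                      * F.normalize(irrelevant.flatten(1), dim=1)).sum(dim=1).abs().mean()

        # The final decision is made only from the forgery-relevant features.
        pooled = relevant.mean(dim=(2, 3))
        logits = self.classifier(pooled)
        return logits, consistency_loss, ortho_loss


if __name__ == "__main__":
    model = MultiViewDetector()
    images = torch.randn(4, 3, 128, 128)
    labels = torch.randint(0, 2, (4,))
    logits, l_cons, l_ortho = model(images)
    # Illustrative loss weights; the actual weighting is not given in the abstract.
    loss = F.cross_entropy(logits, labels) + 0.5 * l_cons + 0.1 * l_ortho
    loss.backward()
    print(logits.shape, float(loss))
```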