Abstract: Automatic image annotation is a challenging task because of the well-known semantic gap. To address the difference between low-level visual features and high-level semantic concepts, a framework for automatic image annotation was constructed from two aspects: image representation and semantic modeling. For image representation, a new method, regularized nonnegative tensor representation (RNTP), was presented to abstract the detailed high-order tensor structures in accordance with human intuitive recognition. For semantic modeling, a three-level hierarchical Bayesian model, extended latent Dirichlet allocation (ELDA), was developed. In ELDA, each item of the multiple image factors was modeled as a finite mixture over latent variables. In addition, an efficient expectation-maximization algorithm based on variational inference was proposed for parameter estimation. Extensive experimental results on the NUS-WIDE dataset are reported to validate the effectiveness of the proposed solution to the automatic image annotation problem in comparison with other state-of-the-art methods.