Abstract: State-of-the-art cross-collection topic models share a serious flaw: they cannot capture the tendency of words to appear in bursts. Building on Latent Dirichlet Allocation (LDA), we propose CDCMLDA (Cross-collection Dirichlet Compound Multinomial Latent Dirichlet Allocation), a topic model that captures word burstiness using the Dirichlet compound multinomial (DCM) distribution. A Monte Carlo Expectation Maximization algorithm is presented for model inference. Qualitative and quantitative evaluations show that CDCMLDA not only discovers the common and unique aspects of topics across collections, but also improves perplexity compared with two existing cross-collection topic models.
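
To illustrate why a DCM distribution can account for burstiness, the following minimal sketch compares the probability of a "bursty" word-count vector under a DCM and under a plain multinomial. It is not the CDCMLDA model or its inference procedure; the function names, the two-word vocabulary, and the parameter values (`alpha = [0.1, 0.1]`, uniform multinomial probabilities) are assumptions chosen purely for illustration.

```python
import math


def dcm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet compound multinomial.

    counts: word counts for one document (non-negative integers)
    alpha:  DCM parameters, one positive value per vocabulary word (illustrative)
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient n! / prod(x_w!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Gamma(sum alpha) / Gamma(n + sum alpha)
    log_norm = math.lgamma(a) - math.lgamma(n + a)
    # prod Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_terms = sum(
        math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_coef + log_norm + log_terms


def multinomial_log_pmf(counts, probs):
    """Log-probability of a count vector under a plain multinomial."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    return log_coef + sum(x * math.log(p) for x, p in zip(counts, probs) if x > 0)


# Hypothetical two-word vocabulary: one word appears four times, the other never.
bursty_counts = [4, 0]
alpha = [0.1, 0.1]   # small alpha values favor repeated occurrences of one word
probs = [0.5, 0.5]   # multinomial with the same mean word proportions

p_dcm = math.exp(dcm_log_pmf(bursty_counts, alpha))
p_multinomial = math.exp(multinomial_log_pmf(bursty_counts, probs))
print(f"DCM: {p_dcm:.4f}  multinomial: {p_multinomial:.4f}")
```

With small parameter values the DCM places much more mass on count vectors in which a word that has appeared once appears again, which is the behavior a plain multinomial with the same mean proportions cannot reproduce.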