A comparative topic model for words burstiness
    Abstract:

    State-of-the-art cross-collection topic models share a serious flaw: they cannot capture the tendency of words to appear in bursts. Building on LDA (Latent Dirichlet Allocation), this paper proposes CDCMLDA (Cross-collection Dirichlet Compound Multinomial Latent Dirichlet Allocation), a topic model that captures word burstiness using the Dirichlet compound multinomial (DCM) distribution. A Monte Carlo Expectation Maximization algorithm is presented for model inference. Qualitative and quantitative evaluations show that CDCMLDA not only discovers the common and unique aspects of topics across collections, but also improves model perplexity relative to the two cross-collection topic models it was compared against.
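
    To illustrate why a DCM distribution can capture burstiness where a plain multinomial cannot, the following is a minimal sketch, not the paper's CDCMLDA model or its inference procedure. It compares the log-probability of a "bursty" count vector (one word repeated) with a "spread" one under both distributions. The function names, the two-word vocabulary, and the parameter values (alpha = 0.1 per word, uniform multinomial) are assumptions chosen only for illustration.

```python
import math


def dcm_log_prob(counts, alpha):
    """Log-probability of a word-count vector under a Dirichlet compound
    multinomial (Polya) distribution with parameter vector alpha."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod(x_w!)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Gamma(sum alpha) / Gamma(sum alpha + n)
    log_norm = math.lgamma(a) - math.lgamma(a + n)
    # prod_w Gamma(alpha_w + x_w) / Gamma(alpha_w)
    log_terms = sum(
        math.lgamma(al + x) - math.lgamma(al) for x, al in zip(counts, alpha)
    )
    return log_coeff + log_norm + log_terms


def multinomial_log_prob(counts, probs):
    """Log-probability of a word-count vector under a plain multinomial."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    return log_coeff + sum(
        x * math.log(p) for x, p in zip(counts, probs) if x > 0
    )


# Hypothetical two-word vocabulary, documents of length 4.
bursty = [4, 0]   # one word repeated four times
spread = [2, 2]   # both words used equally

alpha = [0.1, 0.1]   # small alpha (assumed) makes the DCM favor bursts
probs = [0.5, 0.5]   # uniform multinomial for comparison

# Under the DCM the bursty document is the more likely of the two;
# under the multinomial the spread document is.
dcm_prefers_bursty = dcm_log_prob(bursty, alpha) > dcm_log_prob(spread, alpha)
multinomial_prefers_bursty = (
    multinomial_log_prob(bursty, probs) > multinomial_log_prob(spread, probs)
)
print(dcm_prefers_bursty, multinomial_prefers_bursty)
```

    With small alpha values, the DCM assigns more mass to documents in which a word that has already occurred occurs again, which is the burstiness effect the abstract refers to.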

History
  • Received: December 18, 2012
  • Online: August 22, 2013