Semi-supervised Text Classification Based on Self-training EM Algorithm

Fund Project: Major Research Plan of the National Natural Science Foundation of China (90604006); Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20049998027)

    Abstract:

    To improve computational efficiency, this paper proposes STEM, an improved EM algorithm based on self-training. In the E-step of each iteration, the unlabeled samples whose classes the current intermediate classifier predicts with the highest confidence are moved to the labeled set and then used in the M-step to train the next intermediate classifier. This introduces a self-training mechanism that exploits intermediate results. Text classification experiments show that STEM achieves higher classification accuracy than EM in most cases and improves the computational efficiency of classifier learning by reducing the number of iterations.
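The loop described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not name the base classifier, the confidence measure, or how many samples move per iteration, so the sketch assumes a multinomial naive Bayes classifier with Laplace smoothing, uses the maximum class posterior as the confidence, and moves a hypothetical `k` samples per E-step. The names `train_nb`, `posterior`, `predict`, `stem`, and the toy corpus are all made up for the example.

```python
import math
from collections import defaultdict


def train_nb(docs, weights, classes, vocab):
    """M-step (assumed): fit multinomial naive Bayes from hard or soft class weights."""
    prior = {c: 1.0 for c in classes}            # Laplace-smoothed class counts
    counts = {c: defaultdict(float) for c in classes}
    totals = {c: 0.0 for c in classes}
    for tokens, w in zip(docs, weights):
        for c in classes:
            prior[c] += w[c]
            for t in tokens:
                counts[c][t] += w[c]
                totals[c] += w[c]
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    log_like = {
        c: {t: math.log((counts[c][t] + 1.0) / (totals[c] + len(vocab))) for t in vocab}
        for c in classes
    }
    return log_prior, log_like


def posterior(model, tokens, classes):
    """Class posterior of one document; tokens outside the vocabulary are ignored."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in classes
    }
    m = max(scores.values())
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def predict(model, tokens, classes):
    p = posterior(model, tokens, classes)
    return max(p, key=p.get)


def stem(labeled, unlabeled, classes, k=1, max_iter=50):
    """Self-training EM sketch: k (an assumption) samples are promoted per E-step."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}

    def hard(y):
        return {c: 1.0 if c == y else 0.0 for c in classes}

    # Initial classifier from the labeled set only.
    model = train_nb([d for d, _ in labeled], [hard(y) for _, y in labeled], classes, vocab)
    for _ in range(max_iter):
        if not unlabeled:
            break
        # E-step: posteriors for every unlabeled document.
        posts = [posterior(model, d, classes) for d in unlabeled]
        ranked = sorted(range(len(unlabeled)),
                        key=lambda i: max(posts[i].values()), reverse=True)
        moved = set(ranked[:k])
        # Self-training: promote the most confident documents with their predicted label.
        for i in sorted(moved):
            labeled.append((unlabeled[i], max(posts[i], key=posts[i].get)))
        soft = [posts[i] for i in range(len(unlabeled)) if i not in moved]
        unlabeled = [d for i, d in enumerate(unlabeled) if i not in moved]
        # M-step: retrain on hard labels plus soft posteriors of what remains.
        docs = [d for d, _ in labeled] + unlabeled
        weights = [hard(y) for _, y in labeled] + soft
        model = train_nb(docs, weights, classes, vocab)
    return model


# Toy corpus (made up for illustration).
CLASSES = ["sport", "politics"]
labeled_docs = [(["ball", "goal", "team"], "sport"),
                (["vote", "law", "party"], "politics")]
unlabeled_docs = [["goal", "team", "match"], ["party", "vote", "election"],
                  ["match", "ball"], ["election", "law"]]
model = stem(labeled_docs, unlabeled_docs, CLASSES, k=1)
```

Because every E-step removes at least one document from the unlabeled pool, this sketch stops after at most ⌈n/k⌉ iterations for n unlabeled documents. Whether the paper bounds its iteration count in the same way is not stated in the abstract.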

Cite this article

ZHANG Bofeng, BAI Bing, SU Jinshu. Semi-supervised Text Classification Based on Self-training EM Algorithm[J]. Journal of National University of Defense Technology, 2007, 29(6): 65-69.

History
  • Received: 2007-04-18
  • Published online: 2013-02-28