Cite this article: ZHANG Bofeng, BAI Bing, SU Jinshu. Semi-supervised Text Classification Based on Self-training EM Algorithm[J]. Journal of National University of Defense Technology, 2007, 29(6): 65-69.
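Semi-supervised Text Classification Based on Self-training EM Algorithm
ZHANG Bofeng, BAI Bing, SU Jinshu
(College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China)

Abstract:
To improve computational efficiency, STEM, an improved EM algorithm based on self-training, is proposed. In the E-step of each iteration, the unlabeled sample whose class the current intermediate classifier can predict with the most confidence is moved to the labeled set and used in the M-step to train the next intermediate classifier. This introduces a self-training mechanism that exploits intermediate results. Text classification experiments show that STEM achieves higher classification accuracy than EM in most cases and improves the computational efficiency of classifier learning by reducing the number of iterations.
Keywords: semi-supervised learning; EM algorithm; self-training; text classification; naive Bayes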
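Received: 2007-04-18
Funding: Major Research Plan of the National Natural Science Foundation of China (90604006); Doctoral Program Foundation of Institutions of Higher Education, Ministry of Education of China (20049998027)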
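The abstract gives only an outline of STEM. The Python sketch below makes that loop concrete under several assumptions that the abstract does not state: a multinomial naive Bayes classifier with Laplace smoothing (`alpha`), documents represented as word-count `Counter`s, `per_iter` unlabeled documents moved per iteration (1 by default), still-unlabeled documents contributing their posterior probabilities as soft labels in the M-step as in standard EM, and a stopping rule based on the largest parameter change (`tol`). All function and parameter names (`train_nb`, `posterior`, `max_change`, `stem`) are hypothetical; this is an illustration, not the authors' implementation.

```python
import math
from collections import Counter


def train_nb(docs, weights, n_classes, vocab, alpha=1.0):
    """M-step: fit a multinomial naive Bayes model from class-weighted documents.

    docs: list of Counter objects (word -> count).
    weights: per-document class weights; one-hot for labeled documents,
             posterior probabilities for still-unlabeled ones.
    alpha: Laplace smoothing constant (an assumption, not from the paper).
    """
    class_mass = [alpha] * n_classes
    word_mass = [Counter() for _ in range(n_classes)]
    total_mass = [0.0] * n_classes
    for doc, w in zip(docs, weights):
        for c in range(n_classes):
            if w[c] == 0.0:
                continue
            class_mass[c] += w[c]
            for word, cnt in doc.items():
                word_mass[c][word] += w[c] * cnt
                total_mass[c] += w[c] * cnt
    z = sum(class_mass)
    log_prior = [math.log(m / z) for m in class_mass]
    log_lik = []
    for c in range(n_classes):
        denom = total_mass[c] + alpha * len(vocab)
        log_lik.append({word: math.log((word_mass[c][word] + alpha) / denom)
                        for word in vocab})
    return log_prior, log_lik


def posterior(model, doc):
    """E-step for one document: normalized P(class | doc) under the model."""
    log_prior, log_lik = model
    scores = [lp + sum(cnt * ll[word] for word, cnt in doc.items() if word in ll)
              for lp, ll in zip(log_prior, log_lik)]
    top = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_change(old, new):
    """Largest absolute change in any log-parameter between two models."""
    (prior0, lik0), (prior1, lik1) = old, new
    delta = max(abs(a - b) for a, b in zip(prior0, prior1))
    for d0, d1 in zip(lik0, lik1):
        delta = max(delta, max(abs(d0[w] - d1[w]) for w in d0))
    return delta


def stem(labeled, labels, unlabeled, n_classes, per_iter=1, tol=1e-4, max_iter=100):
    """Self-training EM sketch; returns (model, number_of_iterations)."""
    vocab = {w for doc in labeled + unlabeled for w in doc}

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(n_classes)]

    lab_docs = list(labeled)
    lab_w = [one_hot(y) for y in labels]
    pool = list(unlabeled)
    model = train_nb(lab_docs, lab_w, n_classes, vocab)  # initial classifier
    iterations = 0
    while pool and iterations < max_iter:
        iterations += 1
        # E-step: posterior class probabilities for every remaining unlabeled doc.
        post = [posterior(model, d) for d in pool]
        # Self-training: move the most confidently predicted doc(s) to the
        # labeled set, hard-labeled with their most probable class.
        ranked = sorted(range(len(pool)), key=lambda i: max(post[i]), reverse=True)
        moved = set(ranked[:per_iter])
        for i in sorted(moved):
            lab_docs.append(pool[i])
            lab_w.append(one_hot(post[i].index(max(post[i]))))
        pool = [d for i, d in enumerate(pool) if i not in moved]
        post = [p for i, p in enumerate(post) if i not in moved]
        # M-step: hard labels for the labeled set, soft labels for the rest.
        new_model = train_nb(lab_docs + pool, lab_w + post, n_classes, vocab)
        converged = max_change(model, new_model) < tol
        model = new_model
        if converged:
            break
    return model, iterations
```

Because every iteration turns at least one soft label into a hard one, the unlabeled pool shrinks monotonically, so in this sketch the loop ends after at most ceil(len(unlabeled) / per_iter) iterations even if the `tol` check never fires. Whether the paper bounds its iterations the same way is not stated in the abstract.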