Cite this article: CHE Lei, YANG Xiaoping. News topic discovery model of multi feature fusion text clustering[J]. Journal of National University of Defense Technology, 2017, 39(3): 85-90.
News topic discovery model of multi feature fusion text clustering
CHE Lei1,2, YANG Xiaoping1,3
(1. School of Information, Renmin University of China, Beijing 100872, China; 2. School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China; 3.)
DOI:10.11887/j.cn.201703014
Submitted: 2016-02-10
Funding: National Natural Science Foundation of China (61272513); General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (KM201511232016, SM201511232004); Beijing Higher Education Young Elite Teacher Project (YETP1503)
Abstract:
    This paper proposes a news topic discovery model based on multi-feature fusion text clustering, which fuses several features of news: named entities, news headlines, important paragraphs, and text semantics. The model introduces a multi-feature fusion text clustering method that builds three components: a vector space model and similarity algorithm over feature words, news headlines, and important paragraphs; a subject space model and similarity algorithm based on latent Dirichlet allocation (LDA); and a named entity model and similarity algorithm over named entities. The three similarity algorithms are then combined in an optimal fusion. Building on this clustering method, the model improves the Single-Pass algorithm used for news topic discovery. Experiments on a real news data set show that the model improves the accuracy, recall, and comprehensive evaluation index of news topic discovery, and that it has some capacity for self-adaptation.
Key words:  news topic  multi feature fusion  latent Dirichlet allocation  vector space model  subject space model
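The pipeline outlined in the abstract (three similarities fused into one score, which then drives Single-Pass clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives neither the similarity formulas nor the fusion weights, so everything specific here is an assumption. The function names `cosine`, `jaccard`, `fused_similarity`, and `single_pass` are hypothetical, as are the document fields `terms`, `topics`, and `entities`. Jaccard overlap stands in for the named entity similarity, the weights `(0.4, 0.3, 0.3)` and the threshold `0.5` are arbitrary, and the topic distributions are assumed to have been computed by an LDA model beforehand.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Set overlap; an assumed stand-in for the named entity similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_similarity(doc, cluster, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three similarities (weights are illustrative)."""
    w_vsm, w_lda, w_ne = weights
    return (w_vsm * cosine(doc["terms"], cluster["terms"])
            + w_lda * cosine(doc["topics"], cluster["topics"])
            + w_ne * jaccard(doc["entities"], cluster["entities"]))


def single_pass(docs, threshold=0.5, weights=(0.4, 0.3, 0.3)):
    """Assign each document, in arrival order, to the most similar existing
    cluster, or open a new cluster when no score reaches the threshold."""
    clusters = []
    for index, doc in enumerate(docs):
        best, best_score = None, -1.0
        for cluster in clusters:
            score = fused_similarity(doc, cluster, weights)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            # Fold the document into the cluster representation:
            # sum the vectors and take the union of the entity sets.
            for key, value in doc["terms"].items():
                best["terms"][key] = best["terms"].get(key, 0.0) + value
            for key, value in doc["topics"].items():
                best["topics"][key] = best["topics"].get(key, 0.0) + value
            best["entities"] |= doc["entities"]
            best["members"].append(index)
        else:
            clusters.append({
                "terms": dict(doc["terms"]),
                "topics": dict(doc["topics"]),
                "entities": set(doc["entities"]),
                "members": [index],
            })
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    # Toy documents with made-up term weights, topic mixtures, and entities.
    docs = [
        {"terms": {"earthquake": 2, "rescue": 1},
         "topics": {0: 0.9, 1: 0.1}, "entities": {"Sichuan"}},
        {"terms": {"earthquake": 1, "rescue": 2},
         "topics": {0: 0.8, 1: 0.2}, "entities": {"Sichuan", "Red Cross"}},
        {"terms": {"football": 2, "league": 1},
         "topics": {0: 0.1, 1: 0.9}, "entities": {"FIFA"}},
    ]
    print(single_pass(docs))
```

On the toy input, the two earthquake reports end up in one topic and the football report opens a second. Because Single-Pass sees each document only once, the result depends on arrival order and on the threshold, which is one reason a stronger similarity score matters for this algorithm.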