Cite this article: CAO Jianjun, CHANG Chen, TAO Jiaqing, et al. Truth discovery method for multi-source text data[J]. Journal of National University of Defense Technology, 2022, 44(4): 172-179.
Truth discovery method for multi-source text data
CAO Jianjun1, CHANG Chen2, TAO Jiaqing1,3, WENG Nianfeng1, JIANG Guoquan1
(1. The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China; 2. Command and Control Engineering College, Army Engineering University, Nanjing 210007, China; 3. Department of Industrial Engineering, Nanjing Tech University, Nanjing 211800, China)
DOI: 10.11887/j.cn.202204019
Received: 2020-11-24
Foundation items: National Natural Science Foundation of China (61371196); China Postdoctoral Science Foundation (20090461425); China Postdoctoral Science Foundation Special Funded Project (201003797)

Abstract:
Traditional truth discovery algorithms cannot be applied directly to text data. To address this problem, a truth discovery algorithm for multi-source text data based on a deep neural network (NN_Truth) is proposed. To account for the multifactorial nature of text answers, the diversity of word usage, and the sparsity of text data, the "source-answer" vector is used as the network input and the identified truth vector of the answers is the network output. Following the general assumption of truth discovery, the network learns the relationships among the answer vectors of the data sources without supervision and finally obtains the truth of the answers. Experimental results show that the proposed algorithm is suitable for truth discovery on text data and outperforms both retrieval-based methods and traditional truth discovery algorithms.
Keywords: data quality; truth discovery; neural network; text mining
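
The abstract relies on the general assumption of truth discovery: answers backed by reliable sources are more likely to be true, and sources that give true answers are more reliable. The sketch below illustrates only that assumption with a classic iterative weighting scheme over bag-of-words vectors. It is not the NN_Truth network, whose architecture is not given here, and the names `bag_of_words`, `cosine`, and `discover_truth`, the whitespace tokenizer, and the sample answers are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a text answer into a sparse term-count vector (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def discover_truth(answers, iterations=10):
    """answers maps a source name to its text answer.

    Returns the source whose answer is closest to the estimated truth,
    together with the final reliability weight of every source.
    """
    vectors = {s: bag_of_words(a) for s, a in answers.items()}
    weights = {s: 1.0 for s in vectors}  # start by trusting all sources equally
    for _ in range(iterations):
        # Truth estimate: reliability-weighted sum of the answer vectors.
        truth = Counter()
        for s, vec in vectors.items():
            for term, count in vec.items():
                truth[term] += weights[s] * count
        # Reliability: similarity of each source's answer to the estimate.
        weights = {s: cosine(vec, truth) for s, vec in vectors.items()}
    best = max(weights, key=weights.get)
    return best, weights


# Illustrative data only: two sources roughly agree, one disagrees.
answers = {
    "s1": "the heart pumps blood through the body",
    "s2": "the heart pumps blood around the body",
    "s3": "the liver filters toxins",
}
best, weights = discover_truth(answers)
print(best, weights)
```

Exact word overlap is a crude similarity measure. The diversity of word usage named in the abstract is precisely what such a baseline handles poorly, which is the gap a learned representation is meant to close.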