Cite this article: HUANG Xiaohui, QIAO Lisheng, YU Wentao, LI Jing, XUE Han. Joint learning of Chinese word segmentation and named entity recognition[J]. Journal of National University of Defense Technology, 2021, 43(1): 86-94.
Joint learning of Chinese word segmentation and named entity recognition
HUANG Xiaohui1,2, QIAO Lisheng1, YU Wentao2, LI Jing1, XUE Han2
(1. College of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China; 2. Luoyang Campus of the Information Engineering University of the Strategic Support Force, Luoyang 471003, China)
Abstract:
    A convolutional structure was introduced into the recurrent neural network to construct a convolutional recurrent neural network. Based on this network, a sequence labeling model for the joint learning of Chinese word segmentation and entity recognition was built. The model uses the convolutional recurrent neural network as its feature-encoding layer, which jointly extracts local spatial features and long-distance time-dependent features from Chinese character sequences; it uses an improved recurrent neural network as its tag-decoding layer, which effectively models long-distance time dependencies in the tag sequence; and it uses a unified sequence labeling scheme for word segmentation and entity recognition to learn segmentation information and entity information jointly, which avoids the error propagation problem of traditional pipeline methods. Experimental results on the People's Daily corpus and the Microsoft annotated corpus show that the framework achieves a significant performance improvement over traditional statistical models and neural network models, and that it clearly outperforms the other methods when recognizing named entities that contain many characters.
Key words:  convolutional recurrent neural network  local spatial features  time-dependent features  word segmentation and entity recognition
DOI: 10.11887/j.cn.202101012
Received: 2019-08-27
Funding: National Key Research and Development Program of China (2016YFB0201402)
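
The unified labeling scheme named in the abstract can be illustrated with a small sketch. The abstract does not give the actual tag set, so everything below is an assumption made for illustration: each character receives one tag that combines a word-boundary position (B, M, E, S) with an entity type (PER, LOC, ORG, or O for an ordinary word), and every entity is treated as a single segmentation unit. The function names `encode` and `decode` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical joint tag format: "<position>-<type>".
#   position: B (begin), M (middle), E (end), S (single-character word)
#   type:     an entity label such as PER/LOC/ORG, or "O" for a non-entity word
# This tag set is an assumption; the abstract does not specify the scheme used.

Word = Tuple[str, Optional[str]]   # (surface form, entity type or None)
Tagged = Tuple[str, str]           # (character, joint tag)


def encode(words: List[Word]) -> List[Tagged]:
    """Turn segmented, entity-annotated words into per-character joint tags."""
    tagged: List[Tagged] = []
    for word, entity in words:
        label = entity if entity is not None else "O"
        if len(word) == 1:
            positions = ["S"]
        else:
            positions = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        for char, position in zip(word, positions):
            tagged.append((char, f"{position}-{label}"))
    return tagged


def decode(tagged: List[Tagged]) -> List[Word]:
    """Recover words and entity types from a per-character joint tag sequence."""
    words: List[Word] = []
    buffer = ""
    for char, tag in tagged:
        position, label = tag.split("-", 1)
        buffer += char
        if position in ("E", "S"):  # a word closes on E or S
            words.append((buffer, None if label == "O" else label))
            buffer = ""
    if buffer:  # tolerate a sequence that ends mid-word
        words.append((buffer, None))
    return words


sentence = [("中国科学技术大学", "ORG"), ("位于", None), ("合肥", "LOC")]
tags = encode(sentence)
restored = decode(tags)
```

Because a single tag sequence carries both the word boundaries and the entity types, one sequence labeler can learn both tasks at once, with no separate segmentation step whose errors would propagate into entity recognition. In this sketch the eight-character organization name becomes one B tag, six M tags, and one E tag, which shows why long entities require the decoder to keep a consistent label over many steps.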