Joint learning of Chinese word segmentation and named entity recognition

doi:10.11887/j.cn.202101012

2021, Vol. 43, No. 1, pp. 86-94

Authors: HUANG Xiaohui, QIAO Lisheng, YU Wentao, et al.
Affiliations: (1. College of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China; 2. Luoyang Campus of the Information Engineering University of the Strategic Support Force, Luoyang 471003, China)
Author biography: HUANG Xiaohui (born 1986), male, from Luoyang, Henan; lecturer and doctoral candidate. E-mail: huangxia@mail.ustc.edu.cn
CLC number: TP183
Funding: National Key Research and Development Program of China (2016YFB0201402)

Abstract:

A convolutional recurrent neural network is constructed by introducing a convolutional structure into the recurrent neural network. On this basis, a sequence labeling model is built for the joint learning of Chinese word segmentation and named entity recognition. The model uses the convolutional recurrent neural network as its feature-encoding layer, which jointly extracts local spatial features and long-distance temporal dependencies from Chinese character sequences. It uses an improved recurrent neural network as its tag-decoding layer, which models long-distance temporal dependencies in the tag sequence. It uses a unified sequence labeling scheme for word segmentation and entity recognition to learn segmentation information and entity information jointly, which avoids the error propagation of traditional pipeline methods. Experimental results on the People's Daily corpus and the Microsoft annotated corpus show that the framework achieves a significant performance improvement over traditional statistical models and neural network models, and that its advantage over other methods is most pronounced when recognizing named entities that contain many characters.
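To make the idea of a unified labeling scheme concrete, the sketch below folds word-boundary and entity-type information into a single tag per character, so that one sequence labeler can learn both tasks at once. The abstract does not specify the paper's tag set, so everything here is an assumption for illustration: the BMES position letters, the `-TYPE` suffix, the simplification that each entity is a single segmentation unit, and the helper names `encode` and `decode`.

```python
from typing import List, Optional, Tuple

# Hypothetical input format (an assumption, not the paper's):
# a sentence is a list of (word, entity_type) pairs, with
# entity_type set to None for ordinary, non-entity words.
Word = Tuple[str, Optional[str]]


def encode(words: List[Word]) -> List[Tuple[str, str]]:
    """Map segmented, entity-labelled words to one unified tag per character."""
    tagged = []
    for word, ent in words:
        suffix = f"-{ent}" if ent else ""
        if len(word) == 1:
            positions = ["S"]  # single-character word
        else:
            # Begin, zero or more Middle, End
            positions = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        tagged.extend((ch, pos + suffix) for ch, pos in zip(word, positions))
    return tagged


def decode(tagged: List[Tuple[str, str]]) -> List[Word]:
    """Recover words and entity types from per-character unified tags."""
    words = []
    buffer = ""
    for ch, tag in tagged:
        pos, _, ent = tag.partition("-")
        buffer += ch
        if pos in ("E", "S"):  # a word boundary closes here
            words.append((buffer, ent or None))
            buffer = ""
    if buffer:  # tolerate a tag sequence that ends mid-word
        words.append((buffer, None))
    return words


sentence = [("中国科学技术大学", "ORG"), ("位于", None), ("合肥", "LOC")]
tags = encode(sentence)
```

Because a single tag sequence carries both kinds of information, segmentation and entity recognition are predicted together rather than in a pipeline where segmentation errors feed into the entity recognizer. Calling `decode(tags)` recovers the original word and entity list.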

Cite this article

HUANG Xiaohui, QIAO Lisheng, YU Wentao, et al. Joint learning of Chinese word segmentation and named entity recognition[J]. Journal of National University of Defense Technology, 2021, 43(1): 86-94.

History

Received: 2019-08-27
Published online: 2021-01-26
Publication date: 2021-02-28
