Satellite domain corpus construction and named entity recognition
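Authors: XU Cong, SHI Huipeng, CHEN Zhimin, et al.
Affiliations:

(1. Key Laboratory of Electronics and Information Technology for Space Systems, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. The State Radio Monitoring Center Testing Center, Beijing 100041, China)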

Author biography:

XU Cong (born 1997), female, from Jining, Shandong; Ph.D. candidate; E-mail: xucong19@mails.ucas.edu.cn

CLC number:

V419; TP391.1

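Fund project:

Supported by the Merit-based Fund of the Key Laboratory of Electronics and Information Technology for Space Systems, Chinese Academy of Sciences (Y42613A32S)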


    Abstract:

    Named entity corpora for the satellite domain are scarce, and existing algorithms recognize satellite-domain entities poorly. To address this, a satellite-domain entity labeling method that accounts for fuzzy entity boundaries was proposed and used to build a corpus covering 8 common types of satellite-domain entities; compared with existing corpora in this field, it is finer-grained and has wider coverage. On this basis, a satellite-domain entity recognition algorithm combining transfer learning and multi-network fusion was proposed. The algorithm uses pretrained bidirectional encoder representations from transformers (BERT) to smoothly transfer the semantics of the corpus and obtain subword-level features, a bi-directional long short-term memory (BiLSTM) network to capture contextual information and determine entity boundaries, and a conditional random field (CRF) as the decoder to predict labels. Experimental results show that the algorithm outperforms traditional models such as BiLSTM: its F1-score is above 92% on every one of the 8 entity types, and its micro-averaged F1-score reaches 96.10%.
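The abstract names a CRF as the decoder on top of the BiLSTM. As a rough illustration of what CRF decoding does at inference time, the following is a minimal Viterbi sketch for a generic linear-chain CRF. It assumes the BiLSTM emission scores and the learned transition and start scores are already available as plain Python lists. The tag set `O`/`B-SAT`/`I-SAT`, the function name `viterbi_decode`, and all numbers are hypothetical; this is not the authors' implementation, and CRF training (the log-likelihood loss) is omitted.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float]) -> List[int]:
    """Return the highest-scoring tag index sequence under a linear-chain CRF.

    emissions[t][j]   -- score of tag j at token t (e.g. a BiLSTM output)
    transitions[i][j] -- score of moving from tag i to tag j
    start[j]          -- score of tag j at the first token
    """
    n_tags = len(start)
    # score[j]: best total score of any path that ends in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # pick the previous tag i that maximizes score[i] + transitions[i][j]
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # start from the best final tag and follow the backpointers to the first token
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


# Hypothetical example: O -> I-SAT and starting in I-SAT are heavily penalized.
TAGS = ["O", "B-SAT", "I-SAT"]
NEG = -1e4
emissions = [[2.0, 1.0, 0.0],    # token 0 looks like O
             [0.0, 0.5, 1.0]]    # token 1 greedily looks like I-SAT
transitions = [[0.0, 0.0, NEG],  # from O
               [0.0, 0.0, 0.0],  # from B-SAT
               [0.0, 0.0, 0.0]]  # from I-SAT
start = [0.0, 0.0, NEG]
print([TAGS[i] for i in viterbi_decode(emissions, transitions, start)])  # ['O', 'B-SAT']
```

A greedy argmax over each token's emission scores would output `O, I-SAT`, an ill-formed BIO sequence; the transition penalty lets the decoder choose `O, B-SAT` instead. Enforcing such label dependencies is the usual motivation for a CRF layer over a BiLSTM, though the abstract does not spell out the authors' own reasoning.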
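The abstract reports both per-type F1-scores and a micro-averaged F1-score. The sketch below shows one common way such numbers are computed for NER: entities are read from BIO tag sequences as (type, start, end) spans, a prediction counts as correct only if its type and both boundaries match exactly, and micro-averaging pools the counts over all types. The BIO scheme, the exact-match criterion, the entity type names `SAT` and `ORG`, and the helper names are assumptions for illustration; the abstract does not state the authors' tagging scheme or evaluation script.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    etype, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # continuation of the open entity
        if etype is not None:
            spans.add((etype, start, i))  # close the open entity
            etype = None
        if tag.startswith(("B-", "I-")):
            # an I- tag without a matching open entity is treated as a new start
            etype, start = tag[2:], i
    return spans


def _f1(tp: int, n_pred: int, n_gold: int) -> float:
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def ner_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[Dict[str, float], float]:
    """Per-type F1 and micro-averaged F1 with exact span matching."""
    tp, n_pred, n_gold = Counter(), Counter(), Counter()
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_spans(g_tags), bio_spans(p_tags)
        n_gold.update(s[0] for s in g)
        n_pred.update(s[0] for s in p)
        tp.update(s[0] for s in g & p)  # correct only if type and both boundaries match
    per_type = {e: _f1(tp[e], n_pred[e], n_gold[e]) for e in n_gold.keys() | n_pred.keys()}
    # micro-average: pool counts over all types before computing precision and recall
    micro = _f1(sum(tp.values()), sum(n_pred.values()), sum(n_gold.values()))
    return per_type, micro


gold = [["B-SAT", "I-SAT", "O", "B-ORG"], ["B-SAT", "I-SAT", "O"]]
pred = [["B-SAT", "I-SAT", "O", "O"], ["B-SAT", "O", "O"]]
per_type, micro = ner_f1(gold, pred)
print(sorted(per_type.items()), round(micro, 4))  # [('ORG', 0.0), ('SAT', 0.5)] 0.4
```

Because micro-averaging pools counts, frequent entity types weigh more in the micro-F1 than rare ones; reporting per-type F1 alongside it, as the abstract does, shows whether the less frequent types are also recognized well.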

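Cite this article:

XU Cong, SHI Huipeng, CHEN Zhimin, et al. Satellite domain corpus construction and named entity recognition[J]. Journal of National University of Defense Technology, 2024, 46(4): 175-183.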

History
  • Received: 2022-04-15
  • Published online: 2024-07-19
  • Published: 2024-08-28