Cite this article: WU Jiansheng, TANG Shidi, MEI Dejin, et al. Multi-instance multi-label learning for labels with directed acyclic graph structures in protein function prediction[J]. Journal of National University of Defense Technology, 2022, 44(3): 23-30.
Multi-instance multi-label learning for labels with directed acyclic graph structures in protein function prediction
WU Jiansheng1, TANG Shidi1, MEI Dejin2, ZHU Yanxiang3, DIAO Yemin4
(1. School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; 2. School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 3. Nanjing Renmian Integrated Circuit Technology Limited Company, Nanjing 210088, China; 4. Nanjing Triangular Plus Culture Development Centre, Nanjing 210005, China)

Abstract:
In multi-instance multi-label (MIML) learning tasks, labels are often correlated with one another. The directed acyclic graph (DAG) is a common hierarchical structure for such correlations and arises, for example, in predicting the Gene Ontology biological functions of proteins. To handle labels organized as a DAG, a novel MIML algorithm named MIMLDAG (multi-instance multi-label directed acyclic graph) is proposed. MIMLDAG learns a low-dimensional subspace shared by all labels from the feature space of the original data, reduces the ranking loss of the model by stochastic gradient descent, and incorporates the DAG relationships among labels to refine the predicted labels. MIMLDAG was applied to protein function prediction on multiple datasets, and the experimental results show that it achieves higher efficiency and better predictive performance.
Keywords: multi-instance multi-label learning; protein function prediction; labels with directed acyclic graph structure; label correlation
DOI: 10.11887/j.cn.202203004
Received: 2021-06-22
Foundation items: National Natural Science Foundation of China (61872198, 61971216); General Program of the Basic Research Plan of the Science and Technology Department of Jiangsu Province (BK20201378)
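
The abstract states that the DAG relationships among labels are used to refine the predicted labels, but it does not give the rule. As a rough illustration only, the sketch below applies one common hierarchy-consistency convention: a label's score is capped by the scores of its parents, so a child term is never ranked above a term it depends on. This is an assumption and not necessarily what MIMLDAG does; the function name `enforce_dag_consistency` and the toy labels are hypothetical.

```python
def enforce_dag_consistency(scores, parents):
    """Cap each label's score by the adjusted scores of its parents.

    scores:  dict mapping label -> raw predicted score
    parents: dict mapping label -> iterable of parent labels (a DAG);
             every parent is assumed to appear in `scores`
    Illustrative helper only, not the published MIMLDAG procedure.
    """
    adjusted = {}

    def resolve(label):
        # Memoised recursion: parents are resolved before their children.
        if label not in adjusted:
            cap = min(
                (resolve(p) for p in parents.get(label, ())),
                default=float("inf"),  # root labels have no cap
            )
            adjusted[label] = min(scores[label], cap)
        return adjusted[label]

    for label in scores:
        resolve(label)
    return adjusted


# Toy DAG: "c" has two parents, "a" and "b", which both descend from "root".
toy_parents = {"a": ["root"], "b": ["root"], "c": ["a", "b"]}
toy_scores = {"root": 0.9, "a": 0.4, "b": 0.7, "c": 0.8}
toy_adjusted = enforce_dag_consistency(toy_scores, toy_parents)
print(toy_adjusted["c"])  # 0.4, capped by the weaker parent "a"
```

Because a label may have several parents in a DAG (unlike a tree), the cap takes the minimum over all of them; other conventions, such as raising parent scores to match their children, would be equally plausible under the abstract's wording.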