Cite this article: | LI Gangguo, WANG Zhengzhi, WANG Guangyun, et al. A Clustering Method for Gene Expression Data Based on Linear Manifold[J]. Journal of National University of Defense Technology, 2010, 32(4): 150-156. |
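A Clustering Method for Gene Expression Data Based on Linear Manifold |
LI Gangguo (黎刚果), WANG Zhengzhi (王正志), WANG Guangyun (王广云), NI Qingshan (倪青山), QIANG Bo (强波) |
(College of Mechatronics Engineering and Automation, National University of Defense Technology, Changsha, Hunan 410073, China)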
|
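Abstract: |
Because gene expression data are sparse and noisy, traditional clustering algorithms do not cluster them well. To address this problem, a new linear manifold method is proposed. Its basic idea is to search the dataset for line manifold clusters and then merge some of them to construct higher-dimensional manifold clusters. The algorithm uses the tangential distance and the normal distance as distance measures for linear manifolds, exploits spatial nearest-neighbor information, and takes the average expression level of the clustered genes as the translation vector, which improves clustering accuracy. Experimental results show that the algorithm is more accurate than other clustering algorithms and maintains high clustering accuracy on noisy data; when clustering HeLa gene expression data, it produced clusters with significant biological meaning. These results indicate that the proposed algorithm is applicable to and effective for clustering gene expression data. |
Keywords: gene expression data; linear manifold; subspace clustering; line manifold |
DOI: |
Received: 2010-01-05 |
Funding: National Natural Science Foundation of China (60835005) |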
|
A Clustering Method for Gene Expression Data Based on Linear Manifold |
LI Gangguo, WANG Zhengzhi, WANG Guangyun, NI Qingshan, QIANG Bo |
(College of Mechatronics Engineering and Automation, National Univ. of Defense Technology, Changsha 410073, China)
|
Abstract: |
Conventional clustering methods perform poorly on gene expression data because the data are inherently sparse and noisy. A new linear manifold clustering method is proposed to address this problem. The basic idea is to search for the line manifold clusters hidden in a dataset and then fuse some of them to construct higher-dimensional manifold clusters. The method uses the orthogonal distance and the tangent distance as the linear manifold distance metrics, utilizes spatial neighbor information, and takes the real gene expression profile as the translation vector. Experimental results show that the method outperforms competing clustering methods in clustering accuracy and is robust to noise. Moreover, the proposed method obtains clusters with significant biological meaning on HeLa gene expression data. These results demonstrate that the proposed method is suitable and effective for clustering gene expression data. |
Keywords: gene expression data; linear manifold; subspace clustering; line manifold |
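
The two distance measures named in the abstract can be illustrated with a small NumPy sketch. It rests on assumptions, since the abstract gives no formulas: the linear manifold is taken to be the set of points `mu + basis @ t`, the orthogonal (normal) distance is the length of the component of `x - mu` perpendicular to the manifold, and the tangent distance is the length of the component lying within it. The function name `manifold_distances` and the parameters `mu` and `basis` are hypothetical and are not taken from the paper.

```python
import numpy as np

def manifold_distances(x, mu, basis):
    """Return (normal, tangent) distances from point x to the linear
    manifold {mu + basis @ t}.  Assumed definitions, not the paper's:
    normal  = length of the part of (x - mu) orthogonal to the manifold,
    tangent = length of the part of (x - mu) lying inside the manifold."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    B = np.asarray(basis, dtype=float)
    if B.ndim == 1:               # a single direction vector: a line manifold
        B = B[:, None]
    Q, _ = np.linalg.qr(B)        # orthonormal basis spanning the same directions
    d = x - mu                    # offset from the translation vector
    coords = Q.T @ d              # coordinates of the projection within the manifold
    tangent = np.linalg.norm(coords)
    normal = np.linalg.norm(d - Q @ coords)
    return normal, tangent

# Toy example: a line through the origin along the first axis.
normal, tangent = manifold_distances([3.0, 4.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(normal, tangent)  # approximately 4.0 and 3.0
```

Under these assumptions, a point with a small normal distance lies close to the manifold, while the tangent distance says how far along the manifold it sits from the translation vector. How the method combines the two measures is not specified in the abstract.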