Abstract: Conventional clustering methods perform poorly on gene expression data because the data are inherently sparse and noisy. A new linear manifold clustering method is proposed to address this problem. The basic idea is to search for the line manifold clusters hidden in a dataset and then fuse some of them to construct higher-dimensional manifold clusters. The method uses the orthogonal distance and the tangent distance as linear manifold distance metrics, exploits spatial neighbor information, and takes a real gene expression profile as the transition vector. Experimental results show that the method outperforms competing clustering methods in both clustering accuracy and robustness to noise. Moreover, the proposed method obtains clusters with significant biological meaning on HeLa gene expression data. These results indicate that the proposed method is suitable and effective for clustering gene expression data.
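As an illustration of the orthogonal distance mentioned above, the following is a minimal sketch that assumes the usual Euclidean distance from a point to a line manifold, written as an origin point plus a direction vector. The abstract does not give the exact formulation, so the function name `orthogonal_distance` and its parameters are hypothetical, and the tangent distance is not covered here.

```python
import math


def orthogonal_distance(point, origin, direction):
    """Euclidean distance from `point` to the line {origin + t * direction}.

    Assumed formulation for illustration only; not taken from the paper.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be a non-zero vector")
    unit = [d / norm for d in direction]

    # Vector from the line's origin to the point.
    offset = [p - o for p, o in zip(point, origin)]

    # Length of the projection of the offset onto the line.
    t = sum(a * u for a, u in zip(offset, unit))

    # What remains after removing the projection is orthogonal to the line.
    residual = [a - t * u for a, u in zip(offset, unit)]
    return math.sqrt(sum(r * r for r in residual))
```

Under this assumption, an expression profile that lies exactly on a line manifold has distance zero, and points would be assigned to the line manifold with the smallest such distance.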