Cite this article: | DU Yaohua, AO Wei, NI Qingshan, et al. A Combined Features Algorithm for Prediction of E. coli σ70 Promoter Regions[J]. Journal of National University of Defense Technology, 2005, 27(6): 113-119. |
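A Combined-Feature Algorithm for Recognizing E. coli σ70 Promoters |
DU Yaohua, AO Wei, NI Qingshan, WANG Zhengzhi |
(College of Mechatronics Engineering and Automation, National University of Defense Technology, Changsha, Hunan 410073, China)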
|
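Abstract: |
Promoter recognition is an important step in studying the regulation of gene transcription, but the recognition accuracy of current algorithms is rather low. Based on an in-depth analysis of the biological characteristics of promoters, an algorithm for recognizing E. coli σ70 promoters from a combination of multiple features is proposed. Ten typical features are selected from the compositional, signal, and structural features of promoter sequences, and on this basis promoters located in non-coding regions and those located inside coding regions are recognized separately. First, feature description models are used to compute the score of each feature in promoter and non-promoter sequences, and the feature scores are combined into a 10-dimensional feature vector; quadratic discriminant analysis is then used for training and recognition on the set of feature vectors. Jackknife tests on a real dataset confirm the effectiveness of the algorithm. For promoters located in non-coding regions, the average accuracy reaches 86.7%, clearly better than that of other algorithms; for promoters located inside coding regions, the average accuracy reaches 82.4%. The algorithm is also readily extensible: new features can be added easily, allowing recognition performance to keep improving. |
Keywords: Escherichia coli; σ70 promoter recognition; combined features; quadratic discriminant analysis; jackknife method |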
DOI: |
Received: 2005-06-13 |
Funding: Supported by the National Natural Science Foundation of China (60471003) |
|
A Combined Features Algorithm for Prediction of E. coli σ70 Promoter Regions |
DU Yaohua, AO Wei, NI Qingshan, WANG Zhengzhi |
(College of Mechatronics Engineering and Automation, National Univ. of Defense Technology, Changsha 410073, China)
|
Abstract: |
Promoter identification is an essential task in research on transcription regulation, but computational prediction of promoters has remained one of the most elusive problems despite considerable effort. A new prediction algorithm for E. coli σ70 promoters based on combined features is proposed. According to their location, promoters are divided into two classes, those in non-coding regions and those in gene regions, and the two classes are processed separately. In each region, features of the primary sequence, comprising 1 content feature, 5 signal features, and 4 structure features, are combined into a 10-dimensional vector, which quadratic discriminant analysis then uses to predict potential promoter regions. The algorithm has been trained and tested on an E. coli σ70 promoter dataset by the jackknife method. The average prediction accuracies for “non-coding” promoters and “coding” promoters are 86.7% and 82.4%, respectively. The results indicate that the algorithm outperforms most existing approaches on several performance measures. Furthermore, the algorithm framework is extensible and can accept new features to further improve the prediction results. |
Keywords: Escherichia coli; σ70 promoter prediction; combined features; quadratic discriminant analysis (QDA); jackknife method |
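The classification stage described in the abstract (10-dimensional feature vectors, quadratic discriminant analysis, jackknife testing) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_qda`, `predict_qda`, and `jackknife_accuracy` are hypothetical, the feature vectors are synthetic random data standing in for the real feature scores (which the abstract does not specify), and the small ridge term added to each covariance matrix is an assumption made only for numerical stability.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate per-class mean, inverse covariance, log-determinant, and log-prior."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        # Small ridge term (an assumption) keeps the covariance invertible.
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        params[int(label)] = (
            Xc.mean(axis=0),
            np.linalg.inv(cov),
            np.linalg.slogdet(cov)[1],
            np.log(len(Xc) / len(X)),
        )
    return params

def predict_qda(params, x):
    """Return the class label with the largest quadratic discriminant score."""
    def score(p):
        mean, inv_cov, logdet, log_prior = p
        d = x - mean
        return -0.5 * logdet - 0.5 * (d @ inv_cov @ d) + log_prior
    return max(params, key=lambda label: score(params[label]))

def jackknife_accuracy(X, y):
    """Leave-one-out test: retrain without sample i, then classify sample i."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model = fit_qda(X[keep], y[keep])
        hits += int(predict_qda(model, X[i]) == y[i])
    return hits / len(X)

# Synthetic stand-in for 10-dimensional feature vectors (not real promoter data).
rng = np.random.default_rng(0)
promoters = rng.normal(1.0, 1.0, size=(60, 10))
non_promoters = rng.normal(-1.0, 1.5, size=(60, 10))
X = np.vstack([promoters, non_promoters])
y = np.array([1] * 60 + [0] * 60)

accuracy = jackknife_accuracy(X, y)
print(f"jackknife accuracy on synthetic data: {accuracy:.3f}")
```

Because each class gets its own covariance matrix, the decision boundary is quadratic rather than linear, which is what distinguishes QDA from linear discriminant analysis. Appending an eleventh feature would only change the width of `X`, which reflects the extensibility the abstract mentions.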