Abstract: Analysis of short sequence motifs is an important component of gene sequence analysis, and motif information is commonly used to identify biological signals. However, the number of possible short sequence motifs is very large. If all of them are used for signal identification, the model has too many parameters, which obscures the main characteristics of the signal. To find the key short sequence motifs for signal identification, this paper adopts a stepwise strategy that ranks motifs by their information gain, so that motifs are selected in order of importance. In this way, good results are achieved with fewer motifs. A maximum entropy model consistent with the selected motifs is then used to approximate the true distribution of the signal sequences, which makes it possible to identify a given sequence. Finally, the model was applied to the identification of 5' splice sites, and satisfactory experimental results were obtained.
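As a rough illustration of ranking motifs by information gain, the sketch below scores each k-mer by how much knowing whether it occurs in a sequence reduces the entropy of a signal/background label. This is a simplified, one-pass stand-in and not the stepwise maximum entropy procedure of the paper. The function names (`entropy`, `information_gain`, `rank_motifs`), the presence/absence feature, and the default `k=2` are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(motif, sequences, labels):
    """Gain of the binary feature 'motif occurs in the sequence' about the label.

    Assumption: a motif is a plain substring and labels are hashable class
    ids (e.g. 1 = signal, 0 = background).
    """
    gain = entropy(list(Counter(labels).values()))
    for present in (True, False):
        subset = [y for s, y in zip(sequences, labels) if (motif in s) == present]
        if subset:
            weight = len(subset) / len(labels)
            gain -= weight * entropy(list(Counter(subset).values()))
    return gain


def rank_motifs(sequences, labels, k=2):
    """Return (gain, motif) pairs for all observed k-mers, best first."""
    motifs = {s[i:i + k] for s in sequences for i in range(len(s) - k + 1)}
    scored = [(information_gain(m, sequences, labels), m) for m in motifs]
    # Sort by descending gain, breaking ties alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

With a toy set of labeled sequences, `rank_motifs(sequences, labels)[:n]` would give the `n` most informative k-mers under this simplified criterion. A stepwise variant would instead re-score the remaining motifs after each selection, conditioning on the motifs already chosen.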