Abstract: First, the MSERs (maximally stable extremal regions) corresponding to the strokes of Chinese characters were extracted. A morphological closing operation was used to connect nearby MSERs, so that each fused MSER corresponded to a Chinese character. The gray-level co-occurrence matrix (GLCM) was used to describe the textural characteristics of each fused MSER rectangle, and these texture descriptors served as the input to a CNN (convolutional neural network). The CNN classified the MSER rectangles in order to filter out rectangles that did not contain Chinese characters. Then, Chinese text candidates were constructed by clustering the MSER rectangles based on features such as the Bhattacharyya distance between the color histograms of the rectangles. The CNN was reused to classify the Chinese text candidates and filter out non-text clusters. Finally, the bounding rectangles of the remaining clusters were taken as the Chinese text regions of the natural scene image. Experiments show that the proposed algorithm is effective in localizing Chinese text in natural scene images.
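
Two of the descriptors named in the abstract, the gray-level co-occurrence matrix and the Bhattacharyya distance between histograms, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the number of gray levels, the pixel offset, the GLCM statistics used, or which form of the Bhattacharyya distance is meant. The sketch assumes a single horizontal offset, the contrast statistic, and the form D = -ln(BC), where BC is the Bhattacharyya coefficient. The function names `glcm`, `glcm_contrast`, and `bhattacharyya_distance` are hypothetical.

```python
import math


def glcm(image, dx=1, dy=0, levels=4):
    """Normalized gray-level co-occurrence matrix for one pixel offset.

    `image` is a list of rows of integer gray levels in [0, levels).
    The offset (dx, dy) and the number of levels are assumptions; the
    abstract does not state which values the authors used.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[v / total for v in row] for row in counts]


def glcm_contrast(p):
    """Contrast statistic of a normalized GLCM: sum of p[i][j] * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def bhattacharyya_distance(h1, h2):
    """Bhattacharyya distance -ln(BC) between two non-negative histograms.

    Both histograms are normalized to sum to 1 first. Returns math.inf
    when the histograms do not overlap. The -ln(BC) form is an assumption;
    some libraries use sqrt(1 - BC) instead.
    """
    s1, s2 = sum(h1), sum(h2)
    if s1 == 0 or s2 == 0:
        raise ValueError("histograms must have non-zero mass")
    bc = sum(math.sqrt((a / s1) * (b / s2)) for a, b in zip(h1, h2))
    bc = min(bc, 1.0)  # guard against floating-point overshoot
    return -math.log(bc) if bc > 0 else math.inf


# Toy example: a 2x3 patch with 3 gray levels and two small histograms.
patch = [[0, 0, 1],
         [1, 2, 2]]
texture = glcm_contrast(glcm(patch, dx=1, dy=0, levels=3))
color_gap = bhattacharyya_distance([1, 2, 1], [2, 1, 1])
```

In a pipeline like the one described, a texture value such as `texture` would be one component of the feature vector for a fused MSER rectangle, and a distance such as `color_gap` would be one of the cues used to decide whether two rectangles belong to the same text cluster.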