Abstract: Named Entity Recognition (NER) is a basic task in information extraction, and using abundant unlabeled corpora to improve the performance of NER systems is an important research direction in this domain. An approach combining self-training with active learning based on conditional random fields (CRF), called SACRF, is proposed. It selects samples by setting thresholds on confidence and 2-gram frequency, and expands the training set by annotating the unlabeled corpus both manually and automatically. The experiments show that this approach not only improves the precision and recall of the NER system but also greatly reduces the manual annotation effort.
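
The sample-selection step can be illustrated with a minimal sketch. The abstract does not give the exact selection rule, so everything below is an assumption made for illustration: the function names (`bigram_counts`, `max_bigram_freq`, `route_samples`), the threshold values, and the routing rule itself (high-confidence sentences are labeled automatically, as in self-training; low-confidence sentences that contain a frequent 2-gram are sent for manual annotation, as in active learning). The confidence scores are supplied by hand here; in practice they would come from the CRF model.

```python
from collections import Counter
from typing import List, Tuple

Sentence = List[str]


def bigram_counts(sentences: List[Sentence]) -> Counter:
    """Count every adjacent token pair (2-gram) across the corpus."""
    counts: Counter = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def max_bigram_freq(tokens: Sentence, counts: Counter) -> int:
    """Corpus frequency of the sentence's most common 2-gram (0 if it has none)."""
    return max((counts[bg] for bg in zip(tokens, tokens[1:])), default=0)


def route_samples(
    scored: List[Tuple[Sentence, float]],
    counts: Counter,
    conf_threshold: float = 0.9,  # hypothetical value, not from the paper
    freq_threshold: int = 2,      # hypothetical value, not from the paper
) -> Tuple[List[Sentence], List[Sentence], List[Sentence]]:
    """Split scored sentences into (auto-labeled, manually labeled, deferred).

    Assumed rule: confident predictions are trusted as-is; uncertain
    sentences are worth a human's time only if they contain a 2-gram
    that recurs in the corpus; everything else stays unlabeled.
    """
    auto, manual, deferred = [], [], []
    for tokens, confidence in scored:
        if confidence >= conf_threshold:
            auto.append(tokens)
        elif max_bigram_freq(tokens, counts) >= freq_threshold:
            manual.append(tokens)
        else:
            deferred.append(tokens)
    return auto, manual, deferred


# Toy corpus with hand-assigned confidence scores (illustrative only).
corpus = [
    ["Alice", "visited", "New", "York"],
    ["Bob", "visited", "New", "York"],
    ["Carol", "likes", "tea"],
]
scored = [(corpus[0], 0.95), (corpus[1], 0.40), (corpus[2], 0.30)]
auto, manual, deferred = route_samples(scored, bigram_counts(corpus))
```

In this toy run the first sentence is labeled automatically, the second goes to a human annotator because it shares the 2-grams ("visited", "New") and ("New", "York") with another sentence, and the third is deferred. Both the automatic and the manual sets would then be added to the training set before the model is retrained.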