Abstract: In data engineering, using artificial intelligence methods to process real-world massive raw data more effectively is currently a research hotspot, because such data is multi-source, heterogeneous, and noisy. Following the general research framework of data engineering, the latest research progress in intelligent data engineering methods is systematically reviewed across three key stages: data cleaning, data linking, and data discovery. The principles and effectiveness of the methods for each stage are analyzed in detail. Finally, an outlook on future research in data engineering is provided in light of development trends in intelligent methods.