Cite this article: LIU Yi, QIN Wei, LI Gengsong, et al. Particle swarm optimization based data imputation method for mixed features[J]. Journal of National University of Defense Technology, 2024, 46(6): 107-112.
Particle swarm optimization based data imputation method for mixed features
LIU Yi, QIN Wei, LI Gengsong, LIU Kun, WANG Qiang, ZHENG Qibin, REN Xiaoguang
(Academy of Military Sciences, Beijing 100091, China)

DOI: 10.11887/j.cn.202406011
Received: 2022-07-15
Funding: National Natural Science Foundation of China (91948303); National Natural Science Foundation of China, Youth Science Fund (61802426)

Abstract:
To address the inability of traditional data imputation methods to make effective use of label information and of the random information in missing data, a particle swarm optimization based imputation algorithm for mixed features is proposed. The values of each continuous feature are modeled as a Gaussian distribution, with the mean and standard deviation as the parameters to be optimized. For each discrete (categorical) feature, the probabilities of its values are the parameters to be optimized. Classification accuracy is used as the optimization objective, so that both the label information and the random information of the missing data are fully exploited. Four statistics-based imputation methods and two evolutionary-algorithm-based imputation methods were used as baselines in experiments on six typical classification datasets. The results show that the proposed method significantly outperforms the compared algorithms in classification accuracy while keeping a relatively favorable time overhead, and that it can effectively handle missing data in mixed-feature datasets.
Keywords: missing data; data imputation; particle swarm optimization; mixed features; classification
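
The abstract gives only the outline of the method, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It encodes one continuous feature as a Gaussian (mean, standard deviation) and one categorical feature as a vector of value probabilities, samples the missing entries from those distributions, and lets a basic particle swarm maximize classification accuracy. The function names, the leave-one-out 1-nearest-neighbour classifier, the swarm coefficients (0.7, 1.5, 1.5), the fixed sampling seed, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def decode(particle, n_cat):
    """Split a particle into (mean, std, category probabilities)."""
    mean, std = particle[0], abs(particle[1]) + 1e-6
    probs = np.abs(particle[2:2 + n_cat]) + 1e-9
    return mean, std, probs / probs.sum()

def impute(X, mask, particle, n_cat, seed):
    """Fill missing entries; column 0 is continuous, column 1 is categorical."""
    rng = np.random.default_rng(seed)  # assumption: fixed seed keeps fitness deterministic
    mean, std, probs = decode(particle, n_cat)
    Xf = X.copy()
    miss0, miss1 = mask[:, 0], mask[:, 1]
    Xf[miss0, 0] = rng.normal(mean, std, miss0.sum())
    Xf[miss1, 1] = rng.choice(n_cat, size=miss1.sum(), p=probs)
    return Xf

def accuracy(Xf, y):
    """Leave-one-out 1-NN accuracy (a stand-in classifier, not from the paper)."""
    d = ((Xf[:, None, :] - Xf[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    return float((y[d.argmin(axis=1)] == y).mean())

def pso_impute(X, mask, y, n_cat, n_particles=10, n_iter=20, seed=0):
    """Search distribution parameters with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    dim = 2 + n_cat
    pos = rng.uniform(0.0, 1.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    fit = lambda p: accuracy(impute(X, mask, p, n_cat, seed), y)
    pbest = pos.copy()
    pbest_fit = np.array([fit(p) for p in pos])
    g = pbest_fit.argmax()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        fits = np.array([fit(p) for p in pos])
        better = fits > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fits[better]
        if pbest_fit.max() > gbest_fit:
            g = pbest_fit.argmax()
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return impute(X, mask, gbest, n_cat, seed), gbest_fit

# Toy data (hypothetical): two classes, one continuous and one 3-valued feature.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = np.column_stack([rng.normal(3.0 * y, 1.0),
                     (y + rng.integers(0, 2, 40)) % 3]).astype(float)
mask = rng.random(X.shape) < 0.2           # about 20% of entries go missing
X_missing = np.where(mask, np.nan, X)
X_filled, acc = pso_impute(X_missing, mask, y, n_cat=3)
print("best leave-one-out accuracy:", acc)
```

Treating the category code as a numeric coordinate in the distance computation is a simplification; a real mixed-feature pipeline would more likely use a mixed-type distance or the classifier evaluated in the paper.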