[Journal Article]


Missing value imputation using a fuzzy clustering-based EM approach

Authors:
Md. Geaur Rahman; Md Zahidul Islam

Year of publication: 2016

Pages: 389-422
Publisher: Springer Nature


Abstract:

Data preprocessing and cleansing play a vital role in data mining by ensuring good quality of data. Data-cleansing tasks include imputation of missing values, identification of outliers, and identification and correction of noisy data. In this paper, we present a novel technique called A Fuzzy Expectation Maximization and Fuzzy Clustering-based Missing Value Imputation Framework for Data Pre-processing (FEMI). It imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record having a missing value. While identifying a group of similar records and making a guess based on the group, it applies a fuzzy clustering approach and our novel fuzzy expectation maximization algorithm. We evaluate FEMI on eight publicly available natural data sets by comparing its performance with the performance of five high-quality existing techniques, namely EMI, GkNN, FKMI, SVR, and IBLLS. We use thirty-two types (patterns) of missing values for each data set. Two evaluation criteria, namely root mean squared error and mean absolute error, are used. Our experimental results indicate (according to a confidence interval and \(t\)-test analysis) that FEMI performs significantly better than EMI, GkNN, FKMI, SVR, and IBLLS.
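
As an illustration of the two evaluation criteria named in the abstract, the following minimal sketch computes root mean squared error and mean absolute error between original values and their imputed replacements. It assumes the standard textbook definitions; the paper's exact formulation may differ. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, imputed):
    """Root mean squared error between original and imputed values."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, imputed))
    return math.sqrt(squared / len(actual))


def mae(actual, imputed):
    """Mean absolute error between original and imputed values."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)


# Hypothetical example: values that were artificially removed, and the
# values some imputation method (not FEMI itself) filled in for them.
actual = [2.0, 4.0, 6.0, 8.0]
imputed = [2.0, 5.0, 6.0, 6.0]

print("RMSE:", rmse(actual, imputed))
print("MAE:", mae(actual, imputed))
```

Lower values of both measures indicate imputed values closer to the originals; RMSE penalizes large individual errors more heavily than MAE.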



Keywords:

Data preprocessing; Data cleansing; Data quality; Missing value imputation; Fuzzy clustering


Journal:
Knowledge and Information Systems
ISSN: 0219-1377
Source: Springer Nature