Volume 19, Issue 2 (9-2022)                   JSDP 2022, 19(2): 39-60

Daneshpour N, Mirabolghasemi S F. Missing Data Imputation in Multivariate Time Series Data. JSDP 2022; 19(2): 39-60
URL: http://jsdp.rcisp.ac.ir/article-1-1104-en.html
Shahid Rajaee Teacher Training University
Abstract:
Multivariate time series data arise in a variety of fields, such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing values. Imputing missing data in multivariate time series is challenging and needs to be handled carefully before any learning or prediction is performed on the series. Many studies have applied different techniques to time series missing data imputation, but they usually rely on simple analytic methods, on models tailored to specific applications, or on univariate time series.

In this paper, a hybrid approach to estimating missing data is proposed. An improved version of inverse distance weighting (IDW) interpolation is used for missing data imputation. The IDW interpolation method has two major limitations: 1) finding the points closest to the missing data, and 2) choosing the optimal effect power for the neighbors of the missing data. Clustering is used to address the first limitation and find the points closest to the missing data. With the help of clustering, the search radius and the number of input points used in the interpolation calculations are limited and controlled, and it becomes possible to determine which points are used to estimate a missing value. In this way, the data most similar to the missing data are found. In this paper, the k-means clustering method is used to find similar data. This method has been more accurate than other clustering methods on multivariate time series.
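The idea of restricting IDW to the members of one cluster can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmeans`, `idw_impute`), the Euclidean distance, the fixed power of 2, and the choice to cluster only the complete rows are all assumptions made for illustration.

```python
import math
import random

def distance(a, b, dims):
    """Euclidean distance restricted to the feature indices in dims."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))

def kmeans(rows, k, dims, iters=50, seed=0):
    """Plain Lloyd's k-means on complete rows; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            j = min(range(k), key=lambda c: distance(r, centroids[c], dims))
            groups[j].append(r)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def idw_impute(row, target, complete_rows, k=2, power=2.0):
    """Estimate the missing row[target] from complete rows in the same cluster."""
    all_dims = list(range(len(row)))
    observed = [d for d in all_dims if d != target]
    centroids = kmeans(complete_rows, k, all_dims)

    def nearest(r, dims):
        return min(range(k), key=lambda c: distance(r, centroids[c], dims))

    # Assign the incomplete row using only the features it actually has.
    cluster = nearest(row, observed)
    members = [r for r in complete_rows if nearest(r, all_dims) == cluster]
    if not members:  # assumption: fall back to all rows if the cluster is empty
        members = complete_rows

    num = den = 0.0
    for r in members:
        d = distance(row, r, observed)
        if d == 0.0:
            return r[target]  # identical on all observed features
        w = 1.0 / d ** power  # closer samples get larger weights
        num += w * r[target]
        den += w
    return num / den

# Hypothetical toy data: two well-separated groups, third column is the target.
data = [[1.0, 1.0, 10.0], [1.2, 0.9, 11.0], [0.9, 1.1, 9.0],
        [8.0, 8.0, 50.0], [8.2, 7.9, 52.0], [7.9, 8.1, 49.0]]
print(idw_impute([1.1, 1.0, None], target=2, complete_rows=data))
```

Because the weights are positive and normalized, the estimate always lies between the smallest and largest target values of the cluster members, which is why limiting the neighbors to one cluster matters.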
Evolutionary algorithms are used to address the second limitation by finding the optimal effect power of each data point. Because each sample within a cluster has a different effect on the estimate of the missing data, cuckoo search is used to find that effect. The cuckoo search algorithm is applied to the data of each cluster, so that samples more similar to the missing data have more influence, and samples less similar have less influence, in determining the missing value. Among evolutionary algorithms, cuckoo search is chosen for its high convergence speed, its much lower probability of being trapped in local optima, and its ability to quickly solve high-dimensional optimization problems in multivariate time series.
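A simplified sketch of how cuckoo search could tune the IDW power within one cluster is given below. It is an illustration under stated assumptions, not the method as implemented in the paper: it searches a single shared power rather than one per data point, it uses leave-one-out RMSE inside the cluster as the cost, and the names `loo_error` and `cuckoo_search` as well as all parameter defaults (`nests`, `pa`, `alpha`, the search bounds) are hypothetical.

```python
import math
import random

def idw(query, neighbors, power):
    """IDW estimate from (features, target) pairs for a given power."""
    num = den = 0.0
    for feats, target in neighbors:
        d = math.dist(query, feats)
        if d == 0.0:
            return target
        w = d ** -power
        num += w * target
        den += w
    return num / den

def loo_error(power, cluster):
    """Leave-one-out RMSE of IDW inside one cluster (assumed cost function)."""
    sq = 0.0
    for i, (feats, target) in enumerate(cluster):
        rest = cluster[:i] + cluster[i + 1:]
        sq += (idw(feats, rest, power) - target) ** 2
    return math.sqrt(sq / len(cluster))

def levy_step(rng, beta=1.5):
    """Levy-distributed step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(cost, low, high, nests=10, iters=100, pa=0.25, alpha=0.5, seed=0):
    """Minimize a 1-D cost on [low, high]; returns (best_x, best_cost)."""
    rng = random.Random(seed)
    clip = lambda x: max(low, min(high, x))
    pop = [rng.uniform(low, high) for _ in range(nests)]
    fit = [cost(x) for x in pop]
    for _ in range(iters):
        # 1) Each cuckoo lays an egg via a Levy flight and tries a random nest.
        for i in range(nests):
            cand = clip(pop[i] + alpha * levy_step(rng))
            c = cost(cand)
            j = rng.randrange(nests)
            if c < fit[j]:
                pop[j], fit[j] = cand, c
        # 2) A fraction pa of the worst nests is abandoned and rebuilt at random.
        order = sorted(range(nests), key=lambda n: fit[n])
        for n in order[nests - int(pa * nests):]:
            pop[n] = rng.uniform(low, high)
            fit[n] = cost(pop[n])
    b = min(range(nests), key=lambda n: fit[n])
    return pop[b], fit[b]

# Hypothetical cluster: (features, target) pairs with complete values.
cluster = [((1.0, 1.0), 10.0), ((1.2, 0.9), 11.0),
           ((0.9, 1.1), 9.0), ((1.1, 1.2), 10.5)]
best_power, err = cuckoo_search(lambda p: loo_error(p, cluster), 0.5, 6.0)
print(best_power, err)
```

The best nest is never among those abandoned in step 2, so the best cost found cannot get worse from one iteration to the next.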
To evaluate the performance of the proposed method, the RMSE, MAE, MSE and MAPE criteria are used. Experiments are carried out on four UCI datasets with different percentages of missingness. In general, the proposed algorithm performs better than the three comparison methods, with an average RMSE of 0.05, MAE of 0.04, MSE of 0.003, and MAPE of 5. The correlation between the actual data and the estimated values in the proposed method is about 99%.
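For reference, the four error criteria and the correlation can be computed as in the following sketch, which uses the standard textbook definitions. The function names `evaluate` and `pearson` are illustrative, MAPE is expressed here as a percentage (an assumption about the paper's convention), and the sample values are made up for the demonstration.

```python
import math

def evaluate(actual, estimated):
    """Return MSE, RMSE, MAE and MAPE (in percent) for paired values."""
    n = len(actual)
    errors = [a - e for a, e in zip(actual, estimated)]
    mse = sum(err ** 2 for err in errors) / n
    mae = sum(abs(err) for err in errors) / n
    # MAPE is undefined when an actual value is zero.
    mape = 100.0 / n * sum(abs(err) / abs(a) for err, a in zip(errors, actual))
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}

def pearson(actual, estimated):
    """Pearson correlation between actual and estimated values."""
    n = len(actual)
    ma = sum(actual) / n
    me = sum(estimated) / n
    cov = sum((a - ma) * (e - me) for a, e in zip(actual, estimated))
    va = sum((a - ma) ** 2 for a in actual)
    ve = sum((e - me) ** 2 for e in estimated)
    return cov / math.sqrt(va * ve)

# Made-up values standing in for true and imputed entries.
truth = [2.0, 4.0, 5.0]
imputed = [2.0, 5.0, 4.0]
print(evaluate(truth, imputed))
print(pearson(truth, imputed))
```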
Article number: 4
Type of Study: Research | Subject: Paper
Received: 2019/12/30 | Accepted: 2020/10/13 | Published: 2022/09/30 | ePublished: 2022/09/30


Rights and permissions
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
