Signal and Data Processing

fa Missing Data Imputation in Multivariate Time Series Data Digital Data Processing Papers Paper Research <div style="text-align: justify;">Multivariate time series data are found in various fields such as bioinformatics, biology, genetics, astronomy, the geographical sciences and finance. Many of these datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic and must be handled carefully before learning from or forecasting the series. Much research has applied different methods to imputing missing time series data, typically simple analysis and modeling methods for specific applications or for univariate time series. In this paper, an improved version of inverse distance weighted interpolation is proposed for imputing missing data. Inverse distance weighted interpolation has two basic limitations: 1) finding the best points nearest to the missing data, and 2) choosing the optimal effect power for the neighbors of the missing data. To improve the interpolation method, k-means clustering is used so that the neighbors most similar to the data pattern are selected. Because each neighbor influences the missing data to a different degree, the cuckoo search algorithm is used to determine the neighborhood effect power. Five well-known evaluation criteria are used to assess the performance of the proposed method. Experimental results were examined on four UCI datasets with different percentages of missingness; overall, the proposed algorithm performed better than the three comparison methods, with on average about 0.05 RMSE, 0.04 MAE, 0.003 MSE and 5 percent MAPE. The correlation between the actual data and the estimated values in the proposed method is very good, at about 99 percent.</div> <div style="text-align: justify;">Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. 
Imputing missing data in multivariate time series is challenging and needs to be handled carefully before learning from or forecasting the series. Considerable research has applied various techniques to time series missing data imputation, but these usually involve simple analytic methods and models for specific applications or for univariate time series. In this paper, a hybrid approach to imputing missing data is proposed, based on an improved version of inverse distance weighting (IDW) interpolation. The IDW interpolation method has two major limitations: 1) finding the closest points to the missing data, and 2) choosing the optimal effect power for the neighbors of the missing data. Clustering is used to address the first limitation. It limits and controls the search radius and the number of input points used in the interpolation, and so determines which points are used to estimate a missing value. In this way, the data most similar to the missing data are found. The k-means clustering method is used for this purpose; it has been more accurate than other clustering methods on multivariate time series. Evolutionary algorithms are used to address the second limitation by finding the optimal effect power of each data point. Because each sample within a cluster has a different effect on the estimate of the missing data, cuckoo search is used to determine that effect. The cuckoo search algorithm is applied to the data of each cluster, so that samples more similar to the missing data have more influence on the estimated value and less similar samples have less. 
Among evolutionary algorithms, cuckoo search is chosen for its high convergence speed, its much lower probability of being trapped in local optima, and its ability to quickly solve high-dimensional optimization problems in multivariate time series. To evaluate the performance of the proposed method, the RMSE, MAE, MSE and MAPE criteria are used, together with a fifth criterion [symbol missing in source]. Experimental results on four UCI datasets with different percentages of missingness show that, overall, the proposed algorithm performs better than the three comparison methods, with an average RMSE of 0.05, MAE of 0.04, MSE of 0.003, and MAPE of 5%. The correlation between the actual data and the estimated values in the proposed method is about 99%. </div> Missing Data Imputation, IDW Interpolation, Cuckoo Search Algorithm, k-means Clustering, Multivariate Time Series 39 60 http://jsdp.rcisp.ac.ir/browse.php?a_code=A-10-815-6&slc_lang=fa&sid=1 Negin Daneshpour ndaneshpour@sru.ac.ir 100319475328460011251 100319475328460011251 Yes Shahid Rajaee Teacher Training University Faculty of Computer Engineering, Shahid Rajaee Teacher Training University Seyedeh Fatemeh Mirabolghasemi fmirabolghasemi@yahoo.com 100319475328460011252 100319475328460011252 No Shahid Rajaee Teacher Training University Faculty of Computer Engineering, Shahid Rajaee Teacher Training University
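
The abstract describes restricting IDW neighbors to a k-means cluster and weighting them by an effect power. The following is a minimal sketch of that idea, not the authors' implementation. All function names (`kmeans`, `idw_impute_row`, `error_metrics`) are hypothetical, the k-means is a plain Lloyd's iteration written for illustration, and the effect power is passed in as a fixed parameter: the cuckoo search step that the paper uses to tune it is not implemented here. The sketch also assumes that clustering is done on complete rows only and that distances are Euclidean over the observed features.

```python
import numpy as np


def _nearest(X, centroids):
    """Index of the closest centroid (squared Euclidean) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on complete rows; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _nearest(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, _nearest(X, centroids)


def idw_impute_row(row, complete, centroids, labels, power=2.0):
    """Fill the NaN entries of `row` by IDW over complete rows of its cluster.

    `power` is the IDW effect power; here it is a fixed assumption, whereas
    the paper tunes it with cuckoo search.
    """
    obs = ~np.isnan(row)
    # Assign the incomplete row to a cluster using its observed features only.
    cluster = ((centroids[:, obs] - row[obs]) ** 2).sum(axis=1).argmin()
    neighbors = complete[labels == cluster]
    if len(neighbors) == 0:  # fall back to all complete rows
        neighbors = complete
    dist = np.sqrt(((neighbors[:, obs] - row[obs]) ** 2).sum(axis=1))
    exact = dist == 0
    # Exact matches take all the weight; otherwise weight = 1 / distance^power.
    weights = exact.astype(float) if exact.any() else dist ** -power
    filled = row.copy()
    filled[~obs] = (weights[:, None] * neighbors[:, ~obs]).sum(axis=0) / weights.sum()
    return filled


def error_metrics(actual, estimate):
    """MSE, RMSE, MAE and MAPE (in percent); MAPE assumes no zero in `actual`."""
    a = np.asarray(actual, dtype=float)
    err = np.asarray(estimate, dtype=float) - a
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100 * np.mean(np.abs(err / a))),
    }
```

In use, one would cluster the complete rows once with `kmeans`, call `idw_impute_row` for each incomplete row, and compare imputed values against held-out true values with `error_metrics`. Replacing the fixed `power` with a value found by an optimizer that minimizes one of these errors on held-out entries would be the natural place for the cuckoo search step.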