Predicting employee turnover using tree-based ensemble learning algorithms

Mazarei, Seyede Mahboobe; Pouramini, Jafar

doi:10.61186/jsdp.20.3.73

Volume 20, Issue 3 (12-2023) JSDP 2023, 20(3): 73-86

Mazarei S M, Pouramini J. Predicting employee turnover using tree-based ensemble learning algorithms. JSDP 2023; 20(3): 6
URL: http://jsdp.rcisp.ac.ir/article-1-1315-en.html

Seyede Mahboobe Mazarei*, Jafar Pouramini

Payame Noor University

Abstract
Turnover of key employees is one of the most important concerns of human resource managers (HRMs): an organization that loses valuable staff also loses the skills and experience they gained over the years. Predicting employee turnover therefore helps HRMs hire and retain permanent employees. Data mining methods are an effective tool for this task, and many researchers have worked in this field. This study reviews recently published articles that apply machine learning models to Kaggle human resource (HR) datasets [1-5] and compares them with the proposed models. In [9], the authors selected 11 of the most important features by collecting features common to previous articles and filtering them with feature review and selection algorithms. After converting non-numerical variables to numerical ones and normalizing the data to the range [0,1], their attrition prediction approach, based on machine, deep and ensemble learning models, was tested on a large and a medium-sized simulated HR dataset and then on a small real dataset of 450 responses. That approach achieved higher accuracy than previous solutions (0.96, 0.98 and 0.99 for the three datasets, respectively). In 2021, the authors of [3] examined the relationship between features using the Pearson correlation coefficient and selected the 11 features with the highest correlation coefficients. They then used six machine learning algorithms, including Random Forest (RF), Logistic Regression (LR), …, to predict employee turnover; the highest accuracy they obtained was 0.85, for RF. In [1], the authors used two IBM datasets and a database containing HR information from a regional bank in the USA to predict employee turnover.
After cleaning and preprocessing the data, the performance of 10 machine learning algorithms, such as Decision Tree (DT), RF, LR, Neural Network, …, was evaluated using ROC criteria on 10 small, medium, and large subsets drawn at random from the unassigned primary datasets. The average accuracy of the algorithms was 0.83 on the small datasets, 0.81 on the medium datasets and 0.86 on the large datasets. The authors of [4] ran three main experiments on IBM Watson simulated datasets to predict employee turnover. The first experiment trained the following machine learning models on the original class-imbalanced dataset: support vector machine with several kernel functions, random forest and K-nearest neighbour (KNN). The second experiment used the adaptive synthetic (ADASYN) approach to overcome class imbalance and then retrained the same models on the new dataset. Training KNN (K = 3) on the ADASYN-balanced dataset achieved the highest performance, with an F1-score of 0.93.
The turnover prediction approach proposed in this study is based on tree-based ensemble learning models. It was tested on a large standard simulated HR dataset (hr_data) with 15,000 samples and 10 features, and on a medium-sized dataset (IBM) with 1470 samples and 34 features. The employee turnover rate is 16.1% in the IBM dataset and 23.8% in hr_data, so both datasets are imbalanced. To balance the data, random under-sampling and its combination with random over-sampling were used, with a ratio of 0.5965 for the IBM dataset and 0.6558 for hr_data. In the preprocessing stage, features with zero variance and samples containing missing values were removed. Categorical (non-numeric) values were then converted to binary fields, and all features were scaled to [0,1] by normalization.
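The balancing and scaling steps described above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the helper names are hypothetical, the toy label vector only mimics the IBM class proportions, and the ratio is assumed to mean minority count divided by majority count after resampling, which the abstract does not state.

```python
import random
from collections import Counter


def _split_by_class(labels):
    """Return (minority_indices, majority_indices) for a two-class label list."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def random_under_sample(labels, ratio, seed=0):
    """Drop random majority rows until minority/majority is about `ratio`."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_by_class(labels)
    keep = min(len(majority_idx), round(len(minority_idx) / ratio))
    return sorted(minority_idx + rng.sample(majority_idx, keep))


def random_over_sample(labels, ratio, seed=0):
    """Duplicate random minority rows until minority/majority is about `ratio`."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_by_class(labels)
    extra = max(0, round(ratio * len(majority_idx)) - len(minority_idx))
    return sorted(minority_idx + majority_idx + rng.choices(minority_idx, k=extra))


def min_max_scale(column):
    """Scale a numeric column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Toy labels only: 237 leavers out of 1470 rows is about 16.1%, as in the IBM dataset.
labels = [1] * 237 + [0] * 1233
under = Counter(labels[i] for i in random_under_sample(labels, 0.5965))
over = Counter(labels[i] for i in random_over_sample(labels, 0.5965))
print(under, over)
```

Both helpers return row indices rather than copied rows, so the same selection can be applied to the feature table and the label vector together.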
To reduce the feature dimensions of the IBM dataset, we used Non-negative Matrix Factorization (NMF) (n_components=17, max_iter=500); for initialization, a non-negative singular value decomposition method in which zeros are filled with the X value was used. After reviewing and cleaning the data, in the processing stage six classification algorithms, including KNN (k=1), RF (number of trees = 1500), DT, ExtraTreesClassifier (number of trees = 1000) and Support Vector Classifier, were trained on 70% of the data. The optimal hyperparameter values for the algorithms were set using the RandomizedSearchCV and GridSearchCV techniques. To investigate the effect of balancing and dimensionality reduction on model performance, experiments were performed in three stages (before balancing; after balancing but before dimensionality reduction; after balancing and dimensionality reduction) on the remaining 30% of the data. The results shown in Tables 2-4 indicate that the proposed model, which uses optimized tree-based ensemble learning algorithms with data balancing and NMF dimensionality reduction, increases the F1-score of turnover prediction. On the hr_data dataset, the best F1-score, 99.52%, was obtained with the RandomForest algorithm; on the IBM HR dataset, the best F1-score, 95.82%, was obtained with the ExtraTreesClassifier algorithm. Both are higher than in previous research. Table 5 compares the results of previous research with those of this study. Because predicting employee attrition is not sufficient without identifying the characteristics that affect it, after building the models and evaluating their performance the most effective features were selected with a combined feature selection method. This method averages the results of a univariate feature selection method, SelectKBest, and a wrapper feature selection method, recursive feature elimination (RFE), applied with four learning algorithms: RF, DT, ExtraTreesClassifier and AdaBoost.
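The dimensionality reduction and training steps described above could look roughly like the following scikit-learn sketch. It runs on synthetic data rather than the IBM dataset, assumes the features are already scaled to [0,1], and reads the initialization described in the abstract as scikit-learn's `init="nndsvda"`, which is an assumption. Only `n_components=17`, `max_iter=500`, the 1000 trees and the 70/30 split come from the abstract; the hyperparameter search is omitted.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a preprocessed HR table: 34 features already in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((600, 34))
y = (X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(600) > 1.0).astype(int)

# 70% of the rows for training, the remaining 30% for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# NMF needs non-negative input, which the [0, 1] scaling guarantees.
model = make_pipeline(
    NMF(n_components=17, init="nndsvda", max_iter=500, random_state=0),
    ExtraTreesClassifier(n_estimators=1000, random_state=0),
)
model.fit(X_train, y_train)

score = f1_score(y_test, model.predict(X_test))
print(f"F1 on held-out data: {score:.3f}")
```

Wrapping NMF and the classifier in one pipeline means the factorization is fitted on the training rows only and then applied unchanged to the held-out rows.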
SelectKBest combines the chi2 univariate statistical test with the selection of the K features that score highest on that test against the target variable. In the RFE method, a machine learning algorithm is trained recursively and the least important features are removed after each round, until the number of features reaches the set number (17 features in this article). The performance of the models on the selected features is shown in Table 6. The most effective characteristics are "age", "daily rate", "over time", "NumCompaniesWorked" and "monthly income".
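One possible reading of the combined feature selection described above is to turn each method's output into a rank per feature and average the ranks. The sketch below illustrates that idea with scikit-learn on synthetic data. The rank-averaging rule, the `to_ranks` helper, and the use of only two of the four learners are assumptions made for brevity; the abstract does not specify how the results are averaged.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.tree import DecisionTreeClassifier

# Synthetic non-negative data: only feature 2 determines the label.
rng = np.random.default_rng(0)
X = rng.random((300, 8))
y = (X[:, 2] > 0.5).astype(int)


def to_ranks(scores):
    """Convert scores (higher is better) to ranks where 1 is best."""
    order = np.argsort(-scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks


# Filter step: chi2 score of each feature against the target.
kbest = SelectKBest(score_func=chi2, k=3).fit(X, y)
rankings = [to_ranks(kbest.scores_)]

# Wrapper step: run RFE down to one feature so ranking_ orders every feature.
for estimator in (
    RandomForestClassifier(n_estimators=100, random_state=0),
    DecisionTreeClassifier(random_state=0),
):
    rfe = RFE(estimator, n_features_to_select=1).fit(X, y)
    rankings.append(rfe.ranking_.astype(float))

# Combine by averaging ranks (an assumed reading of "averaging the results").
mean_rank = np.mean(rankings, axis=0)
top_features = np.argsort(mean_rank)[:3]
print(top_features, mean_rank)
```

Averaging ranks instead of raw scores keeps the chi2 statistic and the tree-based importances on a comparable scale.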

Article number: 6

Keywords: data mining, human resource management, ensemble learning, employee turnover

Type of Study: Applicable | Subject: Paper
Received: 2022/05/31 | Accepted: 2023/07/18 | Published: 2024/01/14 | ePublished: 2024/01/14

References

[1] Y. Zhao, M. K. Hryniewicki, F. Cheng, B. Fu, and X. Zhu, "Employee turnover prediction with machine learning: A reliable approach," in Proceedings of SAI Intelligent Systems Conference, 2018: Springer, pp. 737-758.

[2] N. B. Yahia, J. Hlel, and R. Colomo-Palacios, "From big data to deep data to support people analytics for employee attrition prediction," IEEE Access, vol. 9, pp. 60447-60458, 2021.

[3] M. Pratt, M. Boudhane, and S. Cakula, "Employee Attrition Estimation Using Random Forest Algorithm," Baltic Journal of Modern Computing, vol. 9, no. 1, pp. 49-66, 2021.

[4] S. S. Alduayj and K. Rajpoot, "Predicting employee attrition using machine learning," in 2018 International Conference on Innovations in Information Technology (IIT), 2018: IEEE, pp. 93-98.

[5] A. Huda and N. Ardi, "Predictive Analytic on Human Resource Department Data Based on Uncertain Numeric Features Classification," Int. J. Interact. Mob. Technol., vol. 15, no. 8, pp. 172-181, 2021.

[6] M. Al Akasheh, E. F. Malik, O. Hujran, and N. Zaki, "A Decade of Research on Data Mining Techniques for Predicting Employee Turnover: A Systematic Literature Review," Available at SSRN 4401862.

[7] P. Ajit, "Prediction of employee turnover in organizations using machine learning algorithms," algorithms, vol. 4, no. 5, p. C5, 2016.

[8] A. M. Esmaieeli Sikaroudi, R. Ghousi, and A. Sikaroudi, "A data mining approach to employee turnover prediction (case study: Arak automotive parts manufacturing)," Journal of Industrial and Systems Engineering, vol. 8, no. 4, pp. 106-121, 2015.

[9] S. H. Dolatabadi and F. Keynia, "Designing of customer and employee churn prediction model based on data mining method and neural predictor," in 2017 2nd International Conference on Computer and Communication Systems (ICCCS), 2017: IEEE, pp. 74-77.

[10] S. Khaziri Afravi, M. Sardari Zarchi, and S. M. M. Fatemi Bushehri, "Factors affecting the tendency to leave the organization using algorithms based on multi-objective neural network and genetics," Human Resource Management in the Oil Industry, 1397.

[11] X. Cai et al., "DBGE: employee turnover prediction based on dynamic bipartite graph embedding," IEEE Access, vol. 8, pp. 10390-10402, 2020.

[12] P. K. Jain, M. Jain, and R. Pamula, "Explaining and predicting employees' attrition: a machine learning approach," SN Applied Sciences, vol. 2, pp. 1-11, 2020.

[13] M. Lazzari, J. M. Alvarez, and S. Ruggieri, "Predicting and explaining employee turnover intention," International Journal of Data Science and Analytics, vol. 14, no. 3, pp. 279-292, 2022.

[14] X. Gao, J. Wen, and C. Zhang, "An improved random forest algorithm for predicting employee turnover," Mathematical Problems in Engineering, vol. 2019, 2019.

[15] M. Teng, H. Zhu, C. Liu, and H. Xiong, "Exploiting network fusion for organizational turnover prediction," ACM Transactions on Management Information Systems (TMIS), vol. 12, no. 2, pp. 1-18, 2021.

[16] N. Jain, A. Tomar, and P. K. Jana, "A novel scheme for employee churn problem using multi-attribute decision making approach and machine learning," Journal of Intelligent Information Systems, vol. 56, pp. 279-302, 2021.

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
