PaullikeAI/Google_Advanced_Data_Analytics_Capstone_Project

Google Advanced Data Analytics Capstone Project

Overview

The goal of this project was to build a model that predicts whether an employee will leave the company. The final XGBoost model achieved 97.89% accuracy and 93% recall on employees who left.
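As a minimal sketch of how the two headline metrics are defined, the snippet below computes accuracy and recall from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the project's results. The second half shows why recall is reported alongside accuracy: on data where 83.4% of employees stayed, a model that always predicts "stays" already scores 83.4% accuracy while catching no leavers.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Share of actual leavers that the model identified."""
    return tp / (tp + fn)


# Hypothetical counts for 1,000 employees (positive class = employee left).
tp, fn = 150, 16   # leavers correctly flagged / leavers missed
tn, fp = 824, 10   # stayers correctly cleared / stayers wrongly flagged

model_acc = accuracy(tp, fp, tn, fn)
model_rec = recall(tp, fn)
print(f"model    accuracy={model_acc:.3f} recall={model_rec:.3f}")

# Baseline that always predicts "stays": every leaver becomes a false negative.
baseline_acc = accuracy(tp=0, fp=0, tn=834, fn=166)
baseline_rec = recall(tp=0, fn=166)
print(f"baseline accuracy={baseline_acc:.3f} recall={baseline_rec:.3f}")
```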

The features deemed to be most important in determining whether an employee would leave were employee satisfaction, time spent at the company and the number of projects they were working on. The number of hours worked was also a strong indicator within a subset of employees who worked very long hours.

Exploratory Data Analysis

This was a small dataset with 15,000 rows and 9 features. The dataset was imbalanced, with 83.4% of employees not having left the company. The skew of each feature's distribution was analysed, along with the counts and percentages of certain features, such as salary. A chi-square analysis was then carried out between the features salary and department, as it was noted that a high percentage of employees in management had a high salary, and it was found that there was a link between these features. Pair plots and correlation heatmaps were then used to visualise patterns in the data. Additionally, monthly hours and last evaluation score were plotted against each other, and it was noted that every employee who worked over 288 hours a month left the company.
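The chi-square test of independence mentioned above can be sketched in plain Python. This is an illustration of the technique, not the notebook's code: the function name and the contingency counts are hypothetical. In practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two features were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = department, columns = salary (low, medium, high).
observed = [
    [100, 200, 300],  # management
    [500, 400, 100],  # sales
]

stat, dof = chi_square_independence(observed)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

With 2 degrees of freedom, the critical value at the 5% level is about 5.99, so a statistic well above that would lead to rejecting independence between the two features.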

Modelling and Evaluation

Several single models were tested, with Naïve Bayes as the baseline. The results are shown below.

| Model | Train Accuracy | Test Accuracy |
| --- | --- | --- |
| Naïve Bayes | 76.13% | 77.32% |
| Decision tree | 98.49% | 98.55% |
| Logistic regression | 84.11% | 82.38% |
| Support Vector Machine | 95.77% | 96% |

Ensemble models were then tested, with the results shown below.

| Model | Train Accuracy | Test Accuracy |
| --- | --- | --- |
| Random Forest | 98.40% | 98.17% |
| AdaBoost | 96.31% | 96.78% |
| XGBoost | 98.45% | 98.28% |
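One quick way to read the two results tables is to compare train and test accuracy per model, since a large positive gap can hint at overfitting. The sketch below only does arithmetic on the figures reported above; the variable names are illustrative.

```python
# Accuracies in percent, copied from the two results tables: (train, test).
results = {
    "Naive Bayes": (76.13, 77.32),
    "Decision tree": (98.49, 98.55),
    "Logistic regression": (84.11, 82.38),
    "Support Vector Machine": (95.77, 96.00),
    "Random Forest": (98.40, 98.17),
    "AdaBoost": (96.31, 96.78),
    "XGBoost": (98.45, 98.28),
}

# Positive gap = the model scores better on training data than on test data.
gaps = {name: round(train - test, 2) for name, (train, test) in results.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24} train-test gap: {gap:+.2f} points")

largest_gap_model = max(gaps, key=gaps.get)
```

On these figures every gap is under two percentage points, so none of the models shows a large difference between train and test accuracy.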

The XGBoost model was chosen for a deeper analysis using feature importances, permutation importance and SHAP values. From these, conclusions were drawn and business recommendations given.
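To illustrate the idea behind permutation importance, here is a small self-contained sketch: shuffle one feature's values, re-score the model, and record how much accuracy drops. Everything in it is an assumption made for illustration (the toy model, the feature names `satisfaction` and `noise`, and the synthetic rows); it is not the XGBoost model or the data used in this project. In practice a routine such as `sklearn.inspection.permutation_importance` could be applied to a fitted model.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the shuffled feature.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical stand-in for a fitted model: flags low-satisfaction employees.
def toy_model(row):
    return int(row["satisfaction"] < 0.4)


# Synthetic data: labels follow the toy rule exactly; "noise" is irrelevant.
data_rng = random.Random(42)
rows = [{"satisfaction": data_rng.random(), "noise": data_rng.random()}
        for _ in range(500)]
labels = [toy_model(r) for r in rows]

imp_satisfaction = permutation_importance(toy_model, rows, labels, "satisfaction")
imp_noise = permutation_importance(toy_model, rows, labels, "noise")
print(f"satisfaction: {imp_satisfaction:.3f}, noise: {imp_noise:.3f}")
```

A feature the model relies on shows a clear accuracy drop when shuffled, while a feature the model ignores shows none.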

Conclusions and Business Recommendations

To retain employees, I would make the following recommendations, based on the model results and EDA. Cap the number of projects an employee works on at one time, to prevent burnout. Cap the number of overtime hours an employee is allowed to work, unless a special exception is made. Long hours do not help employee satisfaction, so ensure employees are fairly rewarded for working them when absolutely necessary.

About

The Capstone Project for the Google Advanced Data Analytics course. Exploratory data analysis followed by feature engineering, model building and evaluation.
