How high is your income?

Predicting income based on your background


Income matters because it shapes the choices we make in our lives. Our health and lifestyle often differ depending on whether our income is higher or lower. I think we should always strive for a higher income, because it can give us more time for the other things that make us happy. However, it is sometimes hard to pinpoint which variables contribute to a higher income.


This dataset comes from a census in the United States. It describes people's backgrounds and records whether they make above or below $50,000.


Before I build the models, I need to establish a baseline so that I can check whether they actually make the predictions more accurate.
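The post does not say which baseline was used. A common choice for a two-class target like this one is the majority-class rate: the accuracy you would get by always predicting the most frequent label. The sketch below assumes that approach and uses toy labels, not the census data.

```python
from collections import Counter

def majority_baseline(labels):
    """Return (most frequent label, accuracy of always predicting it)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

# Toy labels for illustration only; these are not the census values.
y_train = ["<=50K", "<=50K", "<=50K", ">50K"]
label, accuracy = majority_baseline(y_train)
print(label, accuracy)  # <=50K 0.75
```

Any model worth keeping should score above this number on the validation data.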


Now that I have established a baseline, I can start on the models. I chose two models so that I can compare which one gives the more accurate prediction.

  1. Logistic Regression Model
  2. Random Forest Classifier Model


Logistic Regression Model:

Training Accuracy:   0.8435376084174605
Validation Accuracy: 0.8445177434030937

Random Forest Classifier Model:

Training Accuracy:   0.9528224086449595
Validation Accuracy: 0.8409918107370337
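The training code is not shown in the post, so here is a minimal sketch of how the two models could be fitted and compared with scikit-learn. The synthetic data from `make_classification`, the split ratio, and the model settings are all assumptions standing in for the real census features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the census features and the above/below $50,000 target.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy for classifiers.
    scores[name] = (model.score(X_train, y_train), model.score(X_val, y_val))
    print(f"{name}: train={scores[name][0]:.4f}, validation={scores[name][1]:.4f}")
```

A training accuracy far above the validation accuracy, such as 0.95 against 0.84, usually suggests the model is overfitting the training data.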

ROC Curve

Since the accuracy scores are very close, we will also compare the models visually, using the ROC curve, to see which one has the larger area under its curve.


ROC-AUC Score is the area under the curve.

Logistic Regression ROC-AUC:            0.7501116955668395
Random Forest Classification ROC-AUC: 0.7582744751152407
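To make "area under the curve" concrete, here is a small pure-Python sketch. It relies on the fact that ROC-AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The post does not state whether its scores were computed from predicted probabilities or from hard 0/1 predictions, so the example shows both on hypothetical values; the two generally give different results.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive example outscores a negative one,
    with ties counted as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
proba = [0.1, 0.4, 0.45, 0.8]  # hypothetical predicted probabilities
hard = [0, 0, 0, 1]            # the same scores thresholded at 0.5

print(roc_auc(y_true, proba))  # 1.0
print(roc_auc(y_true, hard))   # 0.75
```

In scikit-learn the same quantity is available as `sklearn.metrics.roc_auc_score`, which is typically given the positive-class column of `predict_proba`.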

Tune Model

Because the Random Forest has the higher ROC-AUC score, I will tune its hyperparameters to try to improve accuracy, and then check whether the tuned model gives a more accurate income prediction.

Training Accuracy:    0.8718043509171051
Validation Accuracy: 0.8620336669699727
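The tuning procedure is not described in the post. One common way to tune a Random Forest is a randomized search with cross-validation, sketched below. The search space, the number of iterations, and the synthetic data are assumptions chosen for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the census training data.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hypothetical search space; the post does not list the values it tried.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, 20, None],
    "min_samples_leaf": [1, 2, 5, 10],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,            # number of sampled parameter combinations
    cv=3,                # 3-fold cross-validation for each combination
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.4f}")
```

Limiting `max_depth` or raising `min_samples_leaf` tends to narrow the gap between training and validation accuracy, which would be consistent with the tuned scores above.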

Top 5

I also wanted to check the top 5 features that affect income. This is their order of importance:

Feature Importance
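How the importances were computed is not stated. For a scikit-learn Random Forest, one common source is the fitted model's `feature_importances_` attribute, and the sketch below ranks the top 5 that way. The column names and data are placeholders, not the census columns.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder column names; the real census columns are not listed in the post.
feature_names = [f"feature_{i}" for i in range(8)]
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# feature_importances_ holds one impurity-based importance per column; they sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top5 = ranked[:5]
for name, importance in top5:
    print(f"{name}: {importance:.3f}")
```

These scores rank how much each feature helped the model split the data. They do not show the direction of the effect or prove that a feature causes higher income.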


I think this shows that education can really help us to be competitive and to perform the tasks required in different job positions. We also tend to earn more with time and experience. It is wise to manage our assets properly in order to have capital gains. Of course, it is also important to use our time effectively while working. Our relationships and responsibilities are great driving forces for a higher income as well. With all of this information presented, I think time is on our side. We have to use our resources effectively in order to improve ourselves, our decisions, and our wellbeing.

Data Science Student

