Analysis of data
Here we try to predict kids' test scores using their mother's IQ, high-school completion, work status, and age. Only a few of these predictor variables have a substantial impact on kid_score. As we shall see, we can improve the fit (by increasing the adjusted R-squared) by eliminating some of these variables. To establish a baseline, we begin by fitting the full model.
We can see that the variables "mom_workyes" and "mom_age" have high p-values, suggesting they contribute little once the other predictors are in the model.
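As a rough sketch of what fitting the full model looks like, here is a plain-NumPy ordinary least squares fit. The real post works with the kid-score dataset; the data below is a synthetic stand-in (the column names and the generating coefficients are assumptions made purely for illustration), so the numbers will not match the post's output.

```python
import numpy as np

# Synthetic stand-in data (names and effect sizes are assumptions, not the
# real kid-score dataset): only mom_iq and mom_hs actually drive the response.
rng = np.random.default_rng(0)
n = 200
mom_iq = rng.normal(100, 15, n)
mom_hs = rng.integers(0, 2, n).astype(float)    # 1 = finished high school
mom_work = rng.integers(0, 2, n).astype(float)  # 1 = works ("mom_workyes")
mom_age = rng.uniform(18, 40, n)
kid_score = 25 + 0.55 * mom_iq + 5.0 * mom_hs + rng.normal(0, 18, n)

# Full model: intercept plus all four predictors.
X = np.column_stack([np.ones(n), mom_iq, mom_hs, mom_work, mom_age])
beta, *_ = np.linalg.lstsq(X, kid_score, rcond=None)

# Adjusted R-squared penalizes R-squared for the number of predictors p.
resid = kid_score - X @ beta
r2 = 1 - (resid @ resid) / ((kid_score - kid_score.mean()) ** 2).sum()
p = X.shape[1] - 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print("coefficients:", beta)
print("adjusted R-squared:", round(adj_r2, 3))
```

In a real analysis you would read the p-values off the model summary rather than compute them by hand; the point here is only the shape of the fit and the adjusted R-squared formula.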
We start by fitting simple linear regression models with only one predictor variable. First, create a list of the predictor variables to iterate over.
Fitting kid_score against each predictor variable in the list ("mom_hs", "mom_iq", "mom_work", "mom_age") we get the following adjusted R-squared values.
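The loop over single-predictor models can be sketched as follows. Again this uses synthetic stand-in data with assumed names and effects, not the post's actual dataset, so only the pattern (not the values) should carry over.

```python
import numpy as np

# Synthetic stand-in data; names and coefficients are assumptions.
rng = np.random.default_rng(1)
n = 200
predictors = {
    "mom_hs": rng.integers(0, 2, n).astype(float),
    "mom_iq": rng.normal(100, 15, n),
    "mom_work": rng.integers(0, 2, n).astype(float),
    "mom_age": rng.uniform(18, 40, n),
}
kid_score = (25 + 0.55 * predictors["mom_iq"]
             + 5.0 * predictors["mom_hs"] + rng.normal(0, 18, n))

def adj_r2(y, cols):
    """Fit OLS with an intercept plus the given columns; return adjusted R^2."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    p = X.shape[1] - 1
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)

# One simple regression per predictor.
scores = {name: adj_r2(kid_score, [x]) for name, x in predictors.items()}
for name, s in scores.items():
    print(name, round(s, 3))
```

With this synthetic setup, mom_iq comes out with the largest adjusted R-squared, mirroring the conclusion drawn from the real data.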
The adjusted R-squared values demonstrate that the mother's IQ is the best single predictor of the kid's test score.
Fitting all possible combinations by hand is a lot of work: with four predictors there are already 2^4 − 1 = 15 candidate models. We would rather use Python to perform those tasks. I will write a separate blog post performing the same analysis in Python.
We can, however, analyze a few of the models manually by performing multiple linear regression (MLR) with one predictor variable removed at a time.
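The drop-one-at-a-time comparison might be sketched like this, again on synthetic stand-in data (names and effects are assumptions): refit the model with each predictor removed in turn and compare the adjusted R-squared against the full model.

```python
import numpy as np

# Synthetic stand-in data; names and coefficients are assumptions.
rng = np.random.default_rng(2)
n = 200
cols = {
    "mom_hs": rng.integers(0, 2, n).astype(float),
    "mom_iq": rng.normal(100, 15, n),
    "mom_work": rng.integers(0, 2, n).astype(float),
    "mom_age": rng.uniform(18, 40, n),
}
kid_score = (25 + 0.55 * cols["mom_iq"]
             + 5.0 * cols["mom_hs"] + rng.normal(0, 18, n))

def adj_r2(y, xcols):
    """Fit OLS with an intercept plus the given columns; return adjusted R^2."""
    X = np.column_stack([np.ones(len(y))] + xcols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    p = X.shape[1] - 1
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)

names = list(cols)
full = adj_r2(kid_score, [cols[k] for k in names])
print("full model:", round(full, 3))

# Refit with each predictor dropped in turn.
dropped = {}
for name in names:
    rest = [cols[k] for k in names if k != name]
    dropped[name] = adj_r2(kid_score, rest)
    print("without", name + ":", round(dropped[name], 3))
```

Dropping an important predictor (here mom_iq) makes the adjusted R-squared fall sharply, while dropping a noise predictor leaves it roughly unchanged or slightly higher, which is exactly the signal used for manual backward elimination.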