
Bayesian Linear Regression on the Swiss dataset


Today we are again walking through a multivariate linear regression method (see my previous post on the topic here). This time, however, we discuss the Bayesian approach and carry out all analysis and modeling in R. My relationship with R has been tempestuous to say the least, but the more I use it the more enjoyable it becomes.

Import R libraries

First thing to do is load up the libraries we'll be using. For example, we load the MASS library to get access to the stepAIC function, and the dplyr library lets us use the piping operator %>%.

library(ggplot2)
library(GGally)
library(dplyr)
library(BAS)
library(MASS)

Please note: I will be using “=” in place of “<-” when writing R code because WordPress has a bad habit of changing my < characters in code snippets.

The Swiss dataset

The swiss dataset contains 47 observations on 6 variables.

# Store the swiss dataframe in memory
data(swiss)

# Create a pairplot
ggpairs(swiss)
[Figure: pairplot of the swiss dataset produced by ggpairs]

Each sample is for a province in Switzerland, and we are given the fertility measure, the % of males involved in an agricultural occupation, the % of draftees receiving the highest mark on an army examination, the % of draftees with education beyond primary school, the % Catholic population, and the infant mortality rate. The data is from the year 1888, by the way. We'll use Bayesian linear regression to model the fertility of the population, but first let's start with a Frequentist approach: Ordinary Least Squares (OLS).
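If you want to see the variable names and a few sample rows for yourself, a quick inspection works (this is plain base R, not part of the original walkthrough):

# Inspect the structure of the dataset: 47 observations of 6 numeric variables
str(swiss)

# Peek at the first few provinces
head(swiss)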

Ordinary least squares

For OLS we model the response $y$ (here, Fertility) as a function of the features $x_1, \ldots, x_p$ with the equation:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon$$

and solve for the parameters $\beta_0, \beta_1, \ldots, \beta_p$ by minimizing the least squares objective function.
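For reference, that objective is the residual sum of squares; this is the standard OLS criterion rather than anything specific to this post:

$$\hat{\beta} = \underset{\beta}{\arg\min} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{i,j} \Big)^2$$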

In R this can be done as follows, where Fertility is modeled as a function of every other feature (as indicated by the . in the model formula).

swiss.lm_full = lm(formula = Fertility ~ ., data = swiss)
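If you want to check what came out of the fit, the usual base-R accessors apply (a generic inspection step, not part of the original post):

# Estimated coefficients, standard errors, and R-squared
summary(swiss.lm_full)

# Just the fitted parameter values
coef(swiss.lm_full)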

What will happen if we try and plot the resulting line of best fit?

# Set up dataframe containing predictions
predict = data.frame(predict(swiss.lm_full))
predict$x = swiss$Agriculture
names(predict) = c('y', 'x')

# Plot data and predictions
p = ggplot() +
  geom_point(data = swiss, aes(Agriculture, Fertility, color='black'), size=3)
p = p + geom_line(data = predict, aes(x=x, y=y, color='red', alpha=0.8), size=1)
p + scale_colour_manual(name='', values=c('black', 'red'),
                        labels=c('y_true', 'y_predict'))
[Figure: Fertility vs. Agriculture with the model's predictions connected by a line]

Expecting the line of best fit to be straight? We are fitting a model with 5 features, so the fit is a hyperplane and we would need a 6-dimensional plot (5 features plus the target) to illustrate it. Since none of us have that many dimensions lying around, we'll just have to trust the math on this one. By now you may have already realized that the plot above is not even valid, because we are simply drawing lines between predicted points. The figure should look like this:

p = ggplot() +
  geom_point(data = swiss, aes(Agriculture, Fertility, color='black'), size=3)
p = p + geom_point(data = predict, aes(x=x, y=y, color='red'), size=3, shape=1)
p + scale_colour_manual(name='', values=c('black', 'red'),
                        labels=c('y_true', 'y_predict'))
[Figure: Fertility vs. Agriculture with observed (filled) and predicted (hollow) points]

This is awful to look at; the information is better conveyed by a residual plot, where we plot the differences between the black filled points and the red hollow ones.


[Figure: residual plot for the full OLS model]
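In case you want to reproduce something like it, here is a minimal sketch of my own (assuming, as in the figures above, that the residuals are plotted against Agriculture):

# Residuals of the full model vs. the Agriculture feature
residuals_df = data.frame(x = swiss$Agriculture, r = resid(swiss.lm_full))
ggplot(residuals_df, aes(x=x, y=r)) +
  geom_point(size=3) +
  geom_hline(yintercept=0, color='red') +
  labs(x='Agriculture', y='Residual')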
The model above was trained on all of the features, but it may be better to use only a subset. One method of determining the optimal subset of features is with the stepAIC function, here set up to minimize the Bayesian Information Criterion (BIC). This metric ranks the models according to goodness of fit but includes a penalty for having more parameters that goes as

$$k \ln(n)$$

where $k$ is the number of parameters and $n$ is the number of samples. Passing k=log(n) tells stepAIC to use the BIC penalty instead of its default AIC one:

stepAIC(lm(Fertility ~ ., data = swiss), k = log(nrow(swiss)))
[Figure: stepAIC console output showing the stepwise search]

As can be seen, the BIC was reduced by removing the “Examination” feature. After this step no lower value could be achieved by removing additional features, and the algorithm terminated.
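The selected model can then be refit directly; a small sketch of that step (the `- Examination` formula syntax is standard R, and the variable name swiss.lm_bic is my own):

# Refit using the BIC-selected subset, i.e. everything except Examination
swiss.lm_bic = lm(Fertility ~ . - Examination, data = swiss)
summary(swiss.lm_bic)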

Bayesian linear regression

In Bayesian linear regression we write a similar equation to the OLS method:

$$y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p} + \epsilon_i$$

where $i$ represents the sample number and $\epsilon_i$ is the error of each sample. Before revealing how the parameters $\beta_0, \beta_1, \ldots$ are determined [1], let's talk about the errors. By rearranging, we could calculate $\epsilon_i$ for a given sample by evaluating

$$\epsilon_i = y_i - (\beta_0 + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p}).$$

The errors are assumed to be normally distributed with a mean of 0. We can check this assumption for the OLS swiss dataset model by solving for each $\epsilon_i$ and plotting the distribution. In other words, we plot a histogram of the residuals:

# Compute errors
errors = resid(swiss.lm_full)

# Plot histogram and fitted density line
as.data.frame(errors) %>%
  ggplot(aes(errors)) +
  geom_histogram(binwidth=1.5, aes(y=..density..)) +
  geom_density(adjust=1.2, size=1, color='red') +
  xlim(-23, 23)
[Figure: histogram of the OLS residuals with a fitted density curve in red]

Even with this small dataset of 47 samples we see the normal distribution beginning to take shape, as suggested by the red curve.
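Another common way to eyeball normality, not used in the original analysis but cheap to add, is a normal Q-Q plot of the same residuals:

# Quantile-quantile plot: points near the line support the normality assumption
qqnorm(errors)
qqline(errors, col = 'red')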

In Bayesian regression we assign prior probability distributions to the parameters $\beta_j$ and use a likelihood function to determine the posterior using Bayes' rule. For a given parameter $\beta_j$ this rule can be stated as:

$$P(\beta_j \mid \text{data}) \propto P(\text{data} \mid \beta_j)\, P(\beta_j)$$

where $P(\beta_j)$ is the prior distribution of $\beta_j$, $P(\beta_j \mid \text{data})$ is the posterior distribution given the data, and the remaining term $P(\text{data} \mid \beta_j)$ is the likelihood [2].

We can see how the posterior will in principle depend on the choice of both prior and likelihood, but in this post we never explicitly define any priors because they will be dominated by the likelihood under our BIC assumptions. For more details, check out the top answer to my Stack Exchange question.

Once we have determined the posterior distribution for each $\beta_j$, we can set the parameters for our linear model. Our choice should depend on the loss function we wish to minimize: for a quadratic loss function (the one used in OLS) we should take the posterior mean, and for a linear (absolute) loss function we should take the posterior median. In this post our posteriors are symmetric, so the two choices are equivalent.
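A tiny self-contained check of that loss-function fact (my own illustration, not from the original post), using simulated draws standing in for a skewed posterior over a single coefficient:

# Simulated, deliberately skewed 'posterior' draws for one parameter
set.seed(42)
draws = rexp(10000, rate = 0.5)

# Candidate point estimates to compare
candidates = seq(0, 6, by = 0.01)

# Expected quadratic loss is minimized near the mean ...
quad_loss = sapply(candidates, function(b) mean((draws - b)^2))
candidates[which.min(quad_loss)]; mean(draws)

# ... while expected absolute loss is minimized near the median
abs_loss = sapply(candidates, function(b) mean(abs(draws - b)))
candidates[which.min(abs_loss)]; median(draws)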

To implement this in R we'll import the BAS library and use the bas.lm function to evaluate a set of Bayesian models containing different combinations of features. We can then make predictions using various combinations of the resulting models.

swiss.lm_bay = bas.lm(Fertility ~ ., data = swiss,
                      prior = "BIC",          # matches the BIC assumption discussed above
                      modelprior = uniform()) # note: these arguments are assumed; the original snippet was cut off here
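From there, the BAS package's own accessors can summarize the model space and generate predictions; a short sketch, assuming the fit above:

# Posterior inclusion probabilities and coefficient summaries
summary(swiss.lm_bay)
coef(swiss.lm_bay)

# Predictions averaged over all models (Bayesian Model Averaging)
swiss.pred = predict(swiss.lm_bay, estimator = "BMA")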
