
# Bayesian Linear Regression on the Swiss dataset

Today we are again walking through a multivariate linear regression method (see my previous post on the topic here). This time, however, we discuss the Bayesian approach and carry out all analysis and modeling in R. My relationship with R has been tempestuous to say the least, but the more I use it the more enjoyable it becomes.

## Import R libraries

First thing to do is load up the libraries we'll be using. For example, we load the MASS library to get access to the stepAIC function, and the dplyr library lets us use the piping operator %>%.

```r
library(ggplot2)
library(GGally)
library(dplyr)
library(BAS)
library(MASS)
```

Please note: I will be using "=" in place of "<-" when writing R code because WordPress has a bad habit of changing my < characters in code snippets.

## The Swiss dataset

The swiss dataset contains 47 observations on 6 variables.

```r
# Store the swiss dataframe in memory
data(swiss)

# Create a pairplot
ggpairs(swiss)
```

Each sample is for a province in Switzerland and we are given the fertility measure, the % of males in an agricultural occupation, the % of draftees receiving the highest mark on an army examination, the % of draftees with education beyond primary school, the % Catholic population, and the infant mortality rate. The data is from the year 1888, by the way.

We'll use Bayesian linear regression to model the fertility of the population, but first let's start with a frequentist approach: ordinary least squares (OLS).

## Ordinary least squares

For OLS we model $y$ as a function of the features $x_1, \dots, x_p$ with the equation

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p,$$

and solve for the parameters $\beta_0, \beta_1, \dots$ by minimizing the least squares objective function. In R this can be done as follows, where Fertility is modeled as a function of each feature (as indicated by the . in the model formula).

```r
swiss.lm_full = lm(formula = Fertility ~ ., data = swiss)
```

What will happen if we try and plot the resulting line of best fit?
```r
# Set up dataframe containing predictions
predict = data.frame(predict(swiss.lm_full))
predict$x = swiss$Agriculture
names(predict) = c('y', 'x')

# Plot data and predictions
p = ggplot() +
  geom_point(data = swiss, aes(Agriculture, Fertility, color='black'), size=3)
p = p + geom_line(data = predict, aes(x=x, y=y, color='red', alpha=0.8), size=1)
p + scale_colour_manual(name='', values=c('black', 'red'),
                        labels=c('y_true', 'y_predict'))
```

Expecting the line of best fit to be straight? We are fitting a model with 5 features, so we would need 5-dimensional space to illustrate the linear hyperplane. Since none of us have 5 dimensions lying around, we'll just have to trust the math on this one.

By now you may have already realized that the plot above is not even valid, because we are simply drawing lines between predicted points. The figure should look like this:

```r
p = ggplot() +
  geom_point(data = swiss, aes(Agriculture, Fertility, color='black'), size=3)
p = p + geom_point(data = predict, aes(x=x, y=y, color='red'), size=3, shape=1)
p + scale_colour_manual(name='', values=c('black', 'red'),
                        labels=c('y_true', 'y_predict'))
```

This is awful to look at and can better be interpreted as a residual plot, where we plot the differences between the black filled points and the red hollow ones.

The model above was trained on all of the features, but it may be better to use only a subset. One method of determining the optimal subset of features is the stepAIC function, which here attempts to minimize the Bayesian Information Criterion (BIC) metric. This metric ranks the models according to goodness of fit but includes a penalty for having more parameters that goes as $p \log(n)$, where $p$ is the number of parameters and $n$ is the number of samples.

```r
stepAIC(lm(Fertility ~ ., data = swiss), k = log(nrow(swiss)))
```

As can be seen, the BIC was reduced by removing the "Examination" feature. After this step it was found that no lower value could be achieved by removing additional features, and the algorithm ended.
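To make the $p \log(n)$ penalty concrete, we can reproduce the value R's built-in BIC() reports for the full model by hand. A minimal sketch, using the standard Gaussian log-likelihood identity for a fitted lm, $-2\log L = n(\log 2\pi + \log(\mathrm{RSS}/n) + 1)$:

```r
# Sanity check of the BIC penalty: reproduce R's BIC() for the full
# OLS model by hand
swiss.lm_full = lm(Fertility ~ ., data = swiss)

n   = nrow(swiss)
rss = sum(resid(swiss.lm_full)^2)
k   = length(coef(swiss.lm_full)) + 1   # 6 coefficients plus the error variance

manual_bic = n * (log(2 * pi) + log(rss / n) + 1) + k * log(n)
all.equal(manual_bic, BIC(swiss.lm_full))   # TRUE
```

Note that stepAIC with k = log(n) optimizes a quantity that differs from BIC() only by additive constants, so it selects the same model.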
## Bayesian linear regression

In Bayesian linear regression we write a similar equation to the OLS method:

$$y_i = \beta_0 + \beta_1 x_{i,1} + \dots + \beta_p x_{i,p} + \epsilon_i,$$

where $i$ represents the sample number and $\epsilon_i$ is the error of each sample. Before revealing how the parameters $\beta_0, \beta_1, \dots$ are determined [1], let's talk about the errors. By rearranging, we could calculate $\epsilon_i$ for a given sample by evaluating $\epsilon_i = y_i - (\beta_0 + \beta_1 x_{i,1} + \dots + \beta_p x_{i,p})$.

The errors are assumed to be normally distributed with mean 0. We can check this assumption for the OLS swiss dataset model by solving for each $\epsilon_i$ and plotting the distribution. In other words, we plot a histogram of the residuals:

```r
# Compute errors
errors = resid(swiss.lm_full)

# Plot histogram and fitted density line
as.data.frame(errors) %>%
  ggplot(aes(errors)) +
  geom_histogram(binwidth=1.5, aes(y=..density..)) +
  geom_density(adjust=1.2, size=1, color='red') +
  xlim(-23, 23)
```

Even with this small dataset of 47 samples we see the normal distribution beginning to take shape, as suggested by the red curve.
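Beyond eyeballing the histogram, base R also offers a formal check of the normality assumption, the Shapiro-Wilk test; a quick sketch (a large p-value means we cannot reject normality):

```r
# Shapiro-Wilk test of normality on the OLS residuals
swiss.lm_full = lm(Fertility ~ ., data = swiss)
shapiro.test(resid(swiss.lm_full))
```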

In Bayesian regression we assign prior probability distributions to the parameters $\beta_j$ and use a likelihood function to determine the posterior using Bayes' rule. For a given parameter $\beta$ this rule can be stated as:

$$P(\beta \mid \text{data}) = \frac{P(\text{data} \mid \beta)\,P(\beta)}{P(\text{data})},$$

where $P(\beta)$ is the prior distribution of $\beta$, $P(\beta \mid \text{data})$ is the posterior distribution given the data, and the remaining term $P(\text{data} \mid \beta)$ is the likelihood [2].

We can see how the posterior will in principle depend on the choice of both prior and likelihood, but in this post we never explicitly define any priors because they will be dominated by the likelihood under our BIC assumptions. For more details, check out the top answer to my Stack Exchange question.

Once we have determined the posterior distribution for each $\beta_j$ we can set the parameters for our linear model. Our choice should depend on the loss function we wish to minimize. For a linear (absolute) loss function we should take the posterior median, and for a quadratic loss function (the one used in OLS) we should take the posterior mean. In this post our posteriors are symmetric, so each choice is equivalent.
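This mean/median correspondence is easy to check numerically: minimizing the average quadratic loss over a sample recovers its mean, while minimizing the average absolute loss recovers its median. A quick sketch with a simulated skewed sample (so the two estimates differ):

```r
# Mean minimizes quadratic loss; median minimizes absolute (linear) loss
set.seed(42)
x = rexp(10000)   # skewed distribution, so mean and median differ

quad_loss = function(a) mean((x - a)^2)
abs_loss  = function(a) mean(abs(x - a))

optimize(quad_loss, c(0, 5))$minimum   # approximately mean(x)
optimize(abs_loss,  c(0, 5))$minimum   # approximately median(x)
```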

To implement this in R we'll import the BAS library and use the bas.lm function to evaluate a set of Bayesian models containing different combinations of features. We can then make predictions using various combinations of the resulting models.

```r
# The prior and modelprior arguments shown here are my assumptions,
# chosen to match the BIC discussion above: a BIC reference prior on
# coefficients and a uniform prior over models
swiss.lm_bay = bas.lm(Fertility ~ ., data = swiss,
                      prior = "BIC", modelprior = uniform())
```
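With the fit in hand, BAS provides summary, coef, and predict methods for inspecting the model set; a sketch of typical usage, continuing from the swiss.lm_bay object above:

```r
# Marginal posterior inclusion probability of each feature, plus the
# top models ranked by posterior probability
summary(swiss.lm_bay)

# Posterior means and standard deviations of the coefficients,
# averaged over models
coef(swiss.lm_bay)

# Predictions under Bayesian model averaging (BMA)
pred = predict(swiss.lm_bay, estimator = "BMA")
head(pred$fit)
```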
