
Bayesian Linear Regression on the Swiss dataset


Today we are again walking through a multivariate linear regression method (see my previous post on the topic here). This time, however, we discuss the Bayesian approach and carry out all analysis and modeling in R. My relationship with R has been tempestuous to say the least, but the more I use it the more enjoyable it becomes.

Import R libraries

First thing to do is load up the libraries we'll be using. For example, we load the MASS library to get access to the stepAIC function, and the dplyr library lets us use the piping operator %>%.

library(ggplot2)
library(GGally)
library(dplyr)
library(BAS)
library(MASS)

Please note: I will be using “=” in place of “<-” when writing R code because WordPress has a bad habit of changing my < characters in code snippets.

The Swiss dataset

The swiss dataset contains 47 observations on 6 variables.

# Store the swiss dataframe in memory
data(swiss)

# Create a pairplot
ggpairs(swiss)
[Figure: pairplot of the swiss dataset produced by ggpairs]

Each sample is for a province in Switzerland, and we are given the fertility measure, the % of males involved in an agricultural occupation, the % of draftees receiving the highest mark on an army examination, the % of draftees with education beyond primary school, the % Catholic population, and the infant mortality rate. The data is from the year 1888, by the way. We'll use Bayesian linear regression to model the fertility of the population, but first let's start with a Frequentist approach: Ordinary Least Squares (OLS).
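If you want to see the variable names and a few sample rows for yourself, a quick inspection works (this is plain base R, not part of the original walkthrough):

# Inspect the structure of the dataset: 47 observations of 6 numeric variables
str(swiss)

# Peek at the first few provinces
head(swiss)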

Ordinary least squares

For OLS we model the response $y$ (here, Fertility) as a function of the features $x_1, \ldots, x_p$ with the equation:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon$$

and solve for the parameters $\beta_0, \beta_1, \ldots, \beta_p$ by minimizing the least squares objective function.
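For reference, that objective is the residual sum of squares; this is the standard OLS criterion rather than anything specific to this post:

$$\hat{\beta} = \underset{\beta}{\arg\min} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{i,j} \Big)^2$$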

In R this can be done as follows, where Fertility is modeled as a function of every other feature (as indicated by the . in the model formula).

swiss.lm_full = lm(formula = Fertility ~ ., data = swiss)
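If you want to check what came out of the fit, the usual base-R accessors apply (a generic inspection step, not part of the original post):

# Estimated coefficients, standard errors, and R-squared
summary(swiss.lm_full)

# Just the fitted parameter values
coef(swiss.lm_full)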

What will happen if we try and plot the resulting line of best fit?

# Set up dataframe containing predictions
predict = data.frame(predict(swiss.lm_full))
predict$x = swiss$Agriculture
names(predict) = c('y', 'x')

# Plot data and predictions
p = ggplot() +
  geom_point(data = swiss, aes(Agriculture, Fertility, color='black'), size=3)
p = p + geom_line(data = predict, aes(x=x, y=y, color='red', alpha=0.8), size=1)
p + scale_colour_manual(name='', values=c('black', 'red'),
                        labels=c('y_true', 'y_predict'))
[Figure: Fertility vs. Agriculture with the model's predictions connected by a line]

Expecting the line of best fit to be straight? We are fitting a model with 5 features, so the fit is a hyperplane and we would need a 6-dimensional plot (5 features plus the target) to illustrate it. Since none of us have that many dimensions lying around, we'll just have to trust the math on this one. By now you may have already realized that the plot above is not even valid, because we are simply drawing lines between predicted points. The figure should look like this:

p = ggplot() +
  geom_point(data = swiss, aes(Agriculture, Fertility, color='black'), size=3)
p = p + geom_point(data = predict, aes(x=x, y=y, color='red'), size=3, shape=1)
p + scale_colour_manual(name='', values=c('black', 'red'),
                        labels=c('y_true', 'y_predict'))
[Figure: Fertility vs. Agriculture with observed (filled) and predicted (hollow) points]

This is awful to look at; the information is better conveyed by a residual plot, where we plot the differences between the black filled points and the red hollow ones.


[Figure: residual plot for the full OLS model]
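In case you want to reproduce something like it, here is a minimal sketch of my own (assuming, as in the figures above, that the residuals are plotted against Agriculture):

# Residuals of the full model vs. the Agriculture feature
residuals_df = data.frame(x = swiss$Agriculture, r = resid(swiss.lm_full))
ggplot(residuals_df, aes(x=x, y=r)) +
  geom_point(size=3) +
  geom_hline(yintercept=0, color='red') +
  labs(x='Agriculture', y='Residual')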
The model above was trained on all of the features, but it may be better to use only a subset. One method of determining the optimal subset of features is with the stepAIC function, here set up to minimize the Bayesian Information Criterion (BIC). This metric ranks the models according to goodness of fit but includes a penalty for having more parameters that goes as

$$k \ln(n)$$

where $k$ is the number of parameters and $n$ is the number of samples. Passing k=log(n) tells stepAIC to use the BIC penalty instead of its default AIC one:

stepAIC(lm(Fertility ~ ., data = swiss), k = log(nrow(swiss)))
[Figure: stepAIC console output showing the stepwise search]

As can be seen, the BIC was reduced by removing the “Examination” feature. After this step no lower value could be achieved by removing additional features, and the algorithm terminated.
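The selected model can then be refit directly; a small sketch of that step (the `- Examination` formula syntax is standard R, and the variable name swiss.lm_bic is my own):

# Refit using the BIC-selected subset, i.e. everything except Examination
swiss.lm_bic = lm(Fertility ~ . - Examination, data = swiss)
summary(swiss.lm_bic)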

Bayesian linear regression

In Bayesian linear regression we write a similar equation to the OLS method:

$$y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p} + \epsilon_i$$

where $i$ represents the sample number and $\epsilon_i$ is the error of each sample. Before revealing how the parameters $\beta_0, \beta_1, \ldots$ are determined [1], let's talk about the errors. By rearranging, we could calculate $\epsilon_i$ for a given sample by evaluating

$$\epsilon_i = y_i - (\beta_0 + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p}).$$

The errors are assumed to be normally distributed with a mean of 0. We can check this assumption for the OLS swiss dataset model by solving for each $\epsilon_i$ and plotting the distribution. In other words, we plot a histogram of the residuals:

# Compute errors
errors = resid(swiss.lm_full)

# Plot histogram and fitted density line
as.data.frame(errors) %>%
  ggplot(aes(errors)) +
  geom_histogram(binwidth=1.5, aes(y=..density..)) +
  geom_density(adjust=1.2, size=1, color='red') +
  xlim(-23, 23)
[Figure: histogram of the OLS residuals with a fitted density curve in red]

Even with this small dataset of 47 samples we see the normal distribution beginning to take shape, as suggested by the red curve.
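Another common way to eyeball normality, not used in the original analysis but cheap to add, is a normal Q-Q plot of the same residuals:

# Quantile-quantile plot: points near the line support the normality assumption
qqnorm(errors)
qqline(errors, col = 'red')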

In Bayesian regression we assign prior probability distributions to the parameters $\beta_j$ and use a likelihood function to determine the posterior using Bayes' rule. For a given parameter $\beta_j$ this rule can be stated as:

$$P(\beta_j \mid \text{data}) \propto P(\text{data} \mid \beta_j)\, P(\beta_j)$$

where $P(\beta_j)$ is the prior distribution of $\beta_j$, $P(\beta_j \mid \text{data})$ is the posterior distribution given the data, and the remaining term $P(\text{data} \mid \beta_j)$ is the likelihood [2].

We can see how the posterior will in principle depend on the choice of both prior and likelihood, but in this post we never explicitly define any priors because they will be dominated by the likelihood under our BIC assumptions. For more details, check out the top answer to my Stack Exchange question.

Once we have determined the posterior distribution for each $\beta_j$, we can set the parameters for our linear model. Our choice should depend on the loss function we wish to minimize: for a quadratic loss function (the one used in OLS) we should take the posterior mean, and for a linear (absolute) loss function we should take the posterior median. In this post our posteriors are symmetric, so the two choices are equivalent.
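A tiny self-contained check of that loss-function fact (my own illustration, not from the original post), using simulated draws standing in for a skewed posterior over a single coefficient:

# Simulated, deliberately skewed 'posterior' draws for one parameter
set.seed(42)
draws = rexp(10000, rate = 0.5)

# Candidate point estimates to compare
candidates = seq(0, 6, by = 0.01)

# Expected quadratic loss is minimized near the mean ...
quad_loss = sapply(candidates, function(b) mean((draws - b)^2))
candidates[which.min(quad_loss)]; mean(draws)

# ... while expected absolute loss is minimized near the median
abs_loss = sapply(candidates, function(b) mean(abs(draws - b)))
candidates[which.min(abs_loss)]; median(draws)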

To implement this in R we'll import the BAS library and use the bas.lm function to evaluate a set of Bayesian models containing different combinations of features. We can then make predictions using various combinations of the resulting models.

swiss.lm_bay = bas.lm(Fertility ~ ., data = swiss,
                      prior = "BIC",          # matches the BIC assumption discussed above
                      modelprior = uniform()) # note: these arguments are assumed; the original snippet was cut off here
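From there, the BAS package's own accessors can summarize the model space and generate predictions; a short sketch, assuming the fit above:

# Posterior inclusion probabilities and coefficient summaries
summary(swiss.lm_bay)
coef(swiss.lm_bay)

# Predictions averaged over all models (Bayesian Model Averaging)
swiss.pred = predict(swiss.lm_bay, estimator = "BMA")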
