These documents contain notes and completed exercises from the book An Introduction to Statistical Learning: with Applications in R.
All pages were completed in RMarkdown with code written in R and equations written in LaTeX. Pages were knitted into HTML using knitr.
Git is used for source control and GitHub Pages is used for hosting.
These pages were put together to maximise my learning from the book. As it is a learning exercise, there may be errors. If you see an error, please raise an issue referencing the chapter and the specific error. You are also most welcome to branch, fix the error, and open a pull request.
I have approached the labs and applied exercises from a different perspective than the book. The book uses base R concepts, whereas I have used concepts taken from the tidyverse.
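To give a flavour of that difference, here is a minimal sketch of the same summary written both ways. It is an illustration only and is not taken from any chapter: the built-in mtcars dataset, the 2,500 lb weight cut-off, and the variable names are all assumptions chosen for the example, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Base R: mean mpg by cylinder count for cars heavier than 2,500 lb
# (mtcars records wt in units of 1,000 lb).
heavy <- mtcars[mtcars$wt > 2.5, ]
base_result <- aggregate(mpg ~ cyl, data = heavy, FUN = mean)

# tidyverse (dplyr): the same summary written as a pipeline.
tidy_result <- mtcars %>%
  filter(wt > 2.5) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))

# Both approaches should give the same group means.
all.equal(base_result$mpg, tidy_result$mpg)
```

The base R version leans on indexing and a formula interface, while the pipeline reads top to bottom as a sequence of verbs. The underlying calculation is the same in both.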
The chapters were completed during my downtime over the period of a year. During this time I learnt not just about statistical techniques, but also about different approaches to wrangling data and different concepts within the R language and the tidyverse. As a result, the approach to a similar task may differ between the earlier and later chapters.
This chapter discusses statistical learning and assessing model accuracy.
This chapter discusses simple and multiple linear regression, qualitative predictors, and comparisons to K-nearest neighbours.
This chapter discusses logistic regression, linear and quadratic discriminant analysis, and K-nearest neighbours.
This chapter discusses cross-validation, including k-fold cross-validation, and bootstrapping.
This chapter discusses subset selection (best subset, stepwise selection), shrinkage methods (ridge regression and lasso), and dimension reduction methods (principal components regression and partial least squares).
This chapter discusses polynomials, splines, and generalised additive models (GAMs).
This chapter discusses regression and classification trees, as well as bagging, random forests, and boosting.
This chapter discusses classification using hyperplanes, leading to support vector machines.
This chapter discusses unsupervised learning techniques including PCA and hierarchical clustering.