Summary

These documents contain notes and completed exercises from the book An Introduction to Statistical Learning with Applications in R.

All pages were completed in RMarkdown with code written in R and equations written in LaTeX. Pages were knitted into HTML using knitr.

Git is used for source control and GitHub Pages for hosting.

Caveats

These pages were put together to maximise my learning from the book. As this is a learning exercise, there may be errors. If you see an error, please raise an issue referencing the chapter and the specific error. You are also most welcome to branch, fix the error, and open a pull request.

I have approached the labs and applied exercises from a different perspective than the book. The book uses base R, whereas I have used concepts taken from the tidyverse.
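As a rough illustration of that difference, the sketch below computes the same grouped summary in both styles. It is not taken from the book or from the exercises: the built-in mtcars data set, the weight cut-off, and the variable names are assumptions chosen only for the example, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Base R: mean mpg per cylinder count for cars with wt > 2.5
# (wt is recorded in units of 1,000 lb)
heavy <- mtcars[mtcars$wt > 2.5, ]
base_result <- aggregate(mpg ~ cyl, data = heavy, FUN = mean)

# tidyverse (dplyr): the same summary written as a pipeline
tidy_result <- mtcars %>%
  filter(wt > 2.5) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))

# Compare the group means produced by the two approaches
all.equal(base_result$mpg, tidy_result$mpg)
```

The base R version subsets with indexing and summarises with a formula, while the dplyr version chains one verb per step, which is the style used throughout these pages.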

The chapters were completed during my downtime over the period of a year. During this time I learnt not just about statistical techniques, but also about different approaches to wrangling data and different concepts within the R language and the tidyverse. As such, the approach to a similar task may differ between the earlier and later chapters.

Chapter 2 - Statistical Learning

This chapter discusses statistical learning and assessing model accuracy.

Exercises

Chapter 3 - Linear Regression

This chapter discusses simple and multiple linear regression, qualitative predictors, and comparisons to K-nearest neighbours.

Exercises

Chapter 4 - Classification

This chapter discusses logistic regression, linear and quadratic discriminant analysis, and K-nearest neighbours.

Exercises

Chapter 5 - Resampling Methods

This chapter discusses cross-validation, k-fold cross-validation, and bootstrapping.

Exercises

Chapter 6 - Linear Model Selection and Regularization

This chapter discusses subset selection (best subset, stepwise selection), shrinkage methods (ridge regression and lasso), and dimension reduction methods (principal components regression and partial least squares).

Exercises

Chapter 7 - Moving Beyond Linearity

This chapter discusses polynomials, splines, and generalized additive models (GAMs).

Exercises

Chapter 8 - Tree-based Methods

This chapter discusses regression and classification trees, as well as bagging, random forests, and boosting.

Exercises

Chapter 9 - Support Vector Machines

This chapter discusses classification using hyperplanes, leading to support vector machines.

Exercises

Chapter 10 - Unsupervised Learning

This chapter discusses unsupervised learning techniques including PCA and hierarchical clustering.

Exercises