Schedule

Calendar of resources

The material in this module is designed to be worked through in an intensive one-week format. It is followed by an assessment that showcases your data science skills (e.g. a GitHub project website that could form part of your CV). For enrolled students, the work is supported by several live sessions during the main week of delivery.


Which will you choose: R or Python (or both)?

Induction: welcome activity; Lab welcome exercise

Mon
Live sessions: am, pm
*Lecture videos password: data4life
Topics: 00 Module overview; 01 Introduction 1.1, 1.2; 02 Statistical learning 2.1, 2.2
R labs: Lab guidance; Lab 01 Linear algebra fun; Lab 02 R programming refresh
Python labs: Read Chapter 01 of Brown 2023 and install Python and Anaconda; Lab 02 stat learn
Readings: James et al. 2021 Ch 1, 2; Efron 2020

Tues
Live sessions: am, pm
Topics: 03 Linear regression 3.1, 3.2; 04 Classification 4.1, 4.2
R labs: Lab 03 Linear regression; Lab 04 Classification
Python labs: Lab 03 lin reg; Lab 04 Classification
Readings: James et al. 2021 Ch 3, 4; Melesse 2016

Wed
Live sessions: am, pm
Topics: 05 Bootstrapping 5.1, 5.2; 06 Model selection 6.1, 6.2, 6.3
R labs: Lab 05 Resampling; Lab 06 Model selection
Python labs: Lab 05 Resampling; Lab 06 Model selection
Readings: James et al. 2021 Ch 5, 6; Aho 2014

Thurs
Live sessions: am (no video), pm (no video)
Topics: 07 Non-linear models 7.1, 7.2; 08 Decision trees 8.1, 8.2, 8.3, 8.4
R labs: Lab 07 Non-linear models; Lab 08 Decision trees
Python labs: Lab 07 Nonlinear adventures; Lab 08 Decision trees
Readings: James et al. 2021 Ch 7, 8; Barnard 2019; Otukei 2010

Fri
Live sessions: am, pm
Topics: 09 Support vector machines 9.1, 9.2; 10 Unsupervised learning 10.1, 10.2, 10.3
R labs: Lab 09 SVM; Lab 10 (12) Unsupervised learning
Python labs: Lab 09 SVM; Lab 10 (12) Unsupervised
Readings: James et al. 2021 Ch 9, 12; Ebrahimi 2017; Howell 2020


References

Textbook: James et al. 2021. An Introduction to Statistical Learning: with Applications in R

Textbook: James et al. 2023. An Introduction to Statistical Learning: with Applications in Python

All references (zip archive)

Aho, K., Derryberry, D., Peterson, T., 2014. Model selection for ecologists: the worldviews of AIC and BIC. Ecology 95, 631–636.

Barnard, D.M., Germino, M.J., Pilliod, D.S., Arkle, R.S., Applestein, C., Davidson, B.E., Fisk, M.R., 2019. Cannot see the random forest for the decision trees: selecting predictive models for restoration ecology. Restoration Ecology 27, 1053–1063.

Ebrahimi, M.A., Khoshtaghaza, M.H., Minaei, S., Jamshidi, B., 2017. Vision-based pest detection based on SVM classification method. Computers and Electronics in Agriculture 137, 52–58.

Efron, B., 2020. Prediction, Estimation, and Attribution. Journal of the American Statistical Association 115, 636–655.

Howell, O., Cui, W., Marsland, R., Mehta, P., 2020. Machine learning as ecology. Journal of Physics A: Mathematical and Theoretical 53, 334001.

James, G., Witten, D., Hastie, T., Tibshirani, R., 2021. An Introduction to Statistical Learning: with Applications in R, 2nd ed. Springer Texts in Statistics. Springer-Verlag, New York.

Melesse, S., Sobratee, N., Workneh, T., 2016. Application of logistic regression statistical technique to evaluate tomato quality subjected to different pre- and post-harvest treatments. Biological Agriculture & Horticulture 32, 277–287.

Otukei, J.R., Blaschke, T., 2010. Land cover change assessment using decision trees, support vector machines and maximum likelihood classification algorithms. International Journal of Applied Earth Observation and Geoinformation, Supplement Issue on “Remote Sensing for Africa – A Special Collection from the African Association for Remote Sensing of the Environment (AARSE)” 12, S27–S31.



Harper Adams Data Science


This module is a part of the MSc in Data Science for Global Agriculture, Food, and Environment at Harper Adams University, led by Ed Harris.