STAT-H400 - Lab 6

Regression

Objectives

Note on the use of Generative AI tools (ChatGPT, Copilot, etc.): while they are not forbidden, I highly recommend not using GenAI tools until you really understand the code that you are trying to write. GenAI tools are unreliable and notoriously bad as a “learning” tool. They require a lot of micromanagement to be accurate. They can be a helpful productivity boost for skilled developers, but they will stop you from growing your own skills if used too soon.

For this lab, we will use the “Alzheimer” dataset from the previous labs. Our objective will be to see if we can predict the “ADL” (capacity to perform activities in daily life) from the “MMSE” and “FunctionalAssessment” variables (standardized cognitive tests).

Exercise 1

  1. On the Alzheimer data, train a LinearRegression model with ADL as the target and MMSE and FunctionalAssessment as the input variables. Hold out part of the data as a test set for step 4.
  2. Using cross-validation on the training data, compare the results of LinearRegression with those of Ridge regression.
  3. Check the coefficients of the linear regression to determine the influence of each input variable on the prediction.
  4. Using the RMSE metric, compare the results of LinearRegression and Ridge regression on the test set. Use an appropriate statistical test of the null hypothesis that both methods perform equally well.
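
As a rough orientation, the four steps above could be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution. It runs on synthetic data standing in for the Alzheimer dataset: the value ranges, the coefficients 0.1 and 0.5, and the noise level are all invented. The 80/20 split, the 5 folds, and the Ridge penalty alpha=1.0 are arbitrary choices, and the Wilcoxon signed-rank test on paired per-sample squared errors is only one possible reading of “appropriate statistical test”.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for the Alzheimer data (NOT the real dataset).
# Ranges and the linear relationship below are invented for illustration.
rng = np.random.default_rng(0)
n = 500
mmse = rng.uniform(0, 30, n)                  # hypothetical MMSE scores
func_assess = rng.uniform(0, 10, n)           # hypothetical FunctionalAssessment
X = np.column_stack([mmse, func_assess])
y = 0.1 * mmse + 0.5 * func_assess + rng.normal(0, 1, n)  # hypothetical ADL

# Step 1: hold out a test set, then define the models to train.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
models = {"linear": LinearRegression(), "ridge": Ridge(alpha=1.0)}

# Step 2: cross-validated RMSE on the training data only.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_rmse = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X_train, y_train, cv=cv,
        scoring="neg_root_mean_squared_error",
    )
    cv_rmse[name] = -scores                   # scorer returns negated RMSE
    print(f"{name}: CV RMSE = {cv_rmse[name].mean():.3f} "
          f"+/- {cv_rmse[name].std():.3f}")

# Step 3: fit on the full training set and inspect the coefficients.
# Coefficients are per unit of each input, so inputs on different scales
# are not directly comparable without standardizing them first.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: coef = {model.coef_}, intercept = {model.intercept_:.3f}")

# Step 4: test-set RMSE, plus a paired test on per-sample squared errors.
sq_err = {}
test_rmse = {}
for name, model in models.items():
    pred = model.predict(X_test)
    sq_err[name] = (y_test - pred) ** 2
    test_rmse[name] = float(np.sqrt(sq_err[name].mean()))
    print(f"{name}: test RMSE = {test_rmse[name]:.3f}")

# Both models are evaluated on the same test samples, so the errors are
# paired. The Wilcoxon signed-rank test is one non-parametric option.
stat, p_value = stats.wilcoxon(sq_err["linear"], sq_err["ridge"])
print(f"Wilcoxon signed-rank: statistic = {stat:.1f}, p-value = {p_value:.3f}")
```

With only two input variables and a small penalty, the two models may turn out nearly identical. On the real data, the column selection, the choice of alpha, and the choice of statistical test all deserve their own justification.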