Univariate statistics
Note on the use of Generative AI tools (ChatGPT, Copilot, etc.): while not forbidden, I strongly recommend not using GenAI tools until you really understand the code that you are trying to write. GenAI tools are unreliable and notoriously bad as a “learning” tool. They require a lot of micromanagement to be accurate. They can be a productivity boost for skilled developers, but will stop you from growing your own skills if used too soon.
Objectives: learn to quickly get a sense of the information presented in a dataset.
We will start by using the instrument_calibration.csv file (provided on the Virtual University). It contains a synthetic dataset about the calibration of different instruments, made at different hospitals and by different operators.
instrument_calibration.csv
file. What conclusions can you draw from this information? How can you handle the missing values in the data?
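A minimal pandas sketch of such a first look follows. It assumes the file can be read with `pandas.read_csv`. The column names `instrument`, `measured` and `target` and the helpers `summarize` and `handle_missing` are hypothetical placeholders, and the small frame below is toy data, not the real dataset. Dropping incomplete rows is only one possible strategy. Imputation (for example with `fillna`) is another, and which one is appropriate depends on why the values are missing.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column overview: dtype, number and percentage of missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": 100 * df.isna().mean(),
    })


def handle_missing(df: pd.DataFrame, subset: list[str]) -> pd.DataFrame:
    """One possible strategy: drop rows where any column in `subset` is missing."""
    return df.dropna(subset=subset)


# Toy frame standing in for instrument_calibration.csv.
# The column names are hypothetical: check df.columns on the real file.
df = pd.DataFrame({
    "instrument": ["A", "A", "B", "B", None],
    "measured": [10.1, np.nan, 9.7, 10.4, 10.0],
    "target": [10.0, 10.0, 10.0, 10.0, 10.0],
})
# With the real file: df = pd.read_csv("instrument_calibration.csv")

print(df.describe(include="all"))  # counts, means, quantiles, unique values
print(summarize(df))               # where the missing values are
clean = handle_missing(df, ["instrument", "measured"])
print(len(clean))  # 3 rows remain in this toy example
```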
Objectives: perform simple univariate hypothesis tests, perform simple operations on datasets.
instrument_calibration.csv file so that you get, for each instrument, the distribution of the difference between measured and target values. Which instrument(s) would you trust more? How could you take the biases into account?
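The following is a sketch of one way to do this. It assumes hypothetical columns `instrument`, `measured` and `target`, and it uses toy data. The bias of an instrument is estimated as the mean of `measured - target`, and its precision as the standard deviation of that difference. `scipy.stats.ttest_1samp` then tests H0: "the mean error is zero". The function `correct_bias` is a hypothetical helper that simply subtracts each instrument's estimated bias. This assumes the bias is constant over the measurement range, which may not hold. Operators or hospitals could also contribute their own bias.

```python
import pandas as pd
from scipy import stats


def error_by_instrument(df: pd.DataFrame) -> pd.DataFrame:
    """Bias (mean error), precision (std of error) and a one-sample
    t-test of H0: mean error == 0, computed for each instrument."""
    err = df["measured"] - df["target"]
    rows = []
    for name, e in err.groupby(df["instrument"]):
        t_stat, p_value = stats.ttest_1samp(e, popmean=0.0)
        rows.append({"instrument": name, "n": len(e), "bias": e.mean(),
                     "std": e.std(ddof=1), "t": t_stat, "p_value": p_value})
    return pd.DataFrame(rows).set_index("instrument")


def correct_bias(df: pd.DataFrame, bias: pd.Series) -> pd.Series:
    """Subtract each instrument's estimated bias from its measurements."""
    return df["measured"] - df["instrument"].map(bias)


# Toy data: A is precise but biased, B is unbiased but noisy.
df = pd.DataFrame({
    "instrument": ["A"] * 4 + ["B"] * 4,
    "measured": [10.2, 10.3, 10.1, 10.2, 9.5, 10.6, 10.1, 9.8],
    "target": [10.0] * 8,
})
summary = error_by_instrument(df)
print(summary)
corrected = correct_bias(df, summary["bias"])
print(corrected)
```

This toy example illustrates why "trust" has two sides. A constant bias can be corrected after the fact, whereas a large spread (poor precision) cannot.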
Objectives: compute confidence intervals on univariate data and test normality.
Hint: you can use the qqplot function from the statsmodels.api module.
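A minimal SciPy sketch of both steps follows. It uses a Student-t confidence interval for the mean error (the bias) and the Shapiro-Wilk test, whose null hypothesis is that the sample comes from a normal distribution. The helper names `mean_ci` and `normality_report` are hypothetical, and the errors are simulated rather than taken from the CSV. The t interval relies on approximate normality of the errors (or a large enough sample), which is why the two checks go together. The Q-Q plot mentioned in the hint can be drawn with `sm.qqplot(errors, line="s")` after `import statsmodels.api as sm`; this requires statsmodels and matplotlib.

```python
import numpy as np
from scipy import stats


def mean_ci(x, confidence=0.95):
    """Student-t confidence interval for the mean of x."""
    x = np.asarray(x, dtype=float)
    sem = stats.sem(x)  # standard error: s / sqrt(n), with ddof=1
    return stats.t.interval(confidence, df=len(x) - 1, loc=x.mean(), scale=sem)


def normality_report(x, alpha=0.05):
    """Shapiro-Wilk test; H0: x is drawn from a normal distribution."""
    stat, p_value = stats.shapiro(x)
    return {"W": stat, "p_value": p_value, "reject_normality": p_value < alpha}


rng = np.random.default_rng(0)
errors = rng.normal(loc=0.2, scale=0.1, size=50)  # simulated, not real data
low, high = mean_ci(errors)
print(f"95% CI for the bias: [{low:.3f}, {high:.3f}]")
print(normality_report(errors))
# Visual check (needs statsmodels and matplotlib):
# import statsmodels.api as sm; sm.qqplot(errors, line="s")
```

If the confidence interval for an instrument's bias excludes 0, the bias is unlikely to be due to chance alone. A small Shapiro-Wilk p-value suggests that the normal-based interval should be interpreted with care.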
Objectives: draw appropriate conclusions from experimental data.
We will use the instrument_experiment.csv file. It contains (synthetic) experimental data using the same hospital, observers and instruments as before (i.e. the biases, precisions, etc. that you computed still apply). This time, there are two measures: one before an operation, and one after. It is expected that, when the operation is successful, the measure decreases by a certain amount.
Try to determine, from the data:
Justify your choices and propose appropriate visualizations, statistical tests and/or confidence intervals.
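Because the two measures are taken on the same subject, a paired analysis of `after - before` is a natural starting point. Below is a sketch under several assumptions:

- Each row holds one operation, with hypothetical columns `before`, `after` and `instrument`.
- Both measures come from the same instrument, so a constant instrument bias cancels in the difference. If the instrument differs between the two measures, subtract each instrument's estimated bias first.

The hypothetical helper `paired_decrease` runs two tests:

- A one-sided paired t-test (`scipy.stats.ttest_rel` with `alternative="less"`).
- A Wilcoxon signed-rank test (`scipy.stats.wilcoxon`) as a non-parametric alternative when the differences are clearly not normal.

It also returns a t confidence interval for the mean change. The data are simulated.

```python
import numpy as np
import pandas as pd
from scipy import stats


def paired_decrease(before, after, confidence=0.95):
    """Paired analysis of the change d = after - before.

    t-test:   H0: mean(d) >= 0  vs  H1: mean(d) < 0 (the measure decreased).
    Wilcoxon: same one-sided question without assuming normal differences.
    Also returns a Student-t confidence interval for the mean change.
    """
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    diff = after - before
    t_stat, p_value = stats.ttest_rel(after, before, alternative="less")
    _, w_p = stats.wilcoxon(diff, alternative="less")
    ci = stats.t.interval(confidence, df=len(diff) - 1,
                          loc=diff.mean(), scale=stats.sem(diff))
    return {"mean_change": diff.mean(), "t": t_stat, "p_value": p_value,
            "wilcoxon_p": w_p, "ci": ci}


# Simulated data: a true decrease of 5 units plus measurement noise.
rng = np.random.default_rng(42)
n = 30
before = rng.normal(50.0, 3.0, size=n)
after = before - 5.0 + rng.normal(0.0, 2.0, size=n)
df = pd.DataFrame({"instrument": rng.choice(["A", "B"], size=n),
                   "before": before, "after": after})

overall = paired_decrease(df["before"], df["after"])
print(overall)
# Per-instrument breakdown, since instruments differ in precision.
for name, g in df.groupby("instrument"):
    print(name, paired_decrease(g["before"], g["after"]))
```

The paired test only speaks about the average change. Whether a single operation succeeded is a different question, which also depends on the precision of the instrument used for that patient.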