---
title: "08 Worksheet. Crash Course in Statistics (Summer 2025)"
subtitle: "Neuroscience Center Zurich, University of Zurich"
author: "Zofia Baranczuk"
date: "2025-08-25"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```



We consider the data set `fat`, which contains body measurements used to predict the percentage of body fat in males.

The data set `fat` is available in the package `AICcmodavg` and in the Data folder of the course.

We will use only the following variables:

- `Perc.body.fat.Siri` - Percent body fat using Siri's equation  
- `Age` - Age (years)  
- `Weight` - Weight (lbs)  
- `Height` - Height (inches)  
- `Neck.circ` - Neck circumference (cm)  
- `Chest.circ` - Chest circumference (cm)  
- `Abdomen.circ` - Abdomen circumference (cm)  
- `Hip.circ` - Hip circumference (cm)  
- `Thigh.circ` - Thigh circumference (cm)  
- `Knee.circ` - Knee circumference (cm)  
- `Ankle.circ` - Ankle circumference (cm)  
- `Biceps.circ` - Biceps circumference (cm)  
- `Forearm.circ` - Forearm circumference (cm)  
- `Wrist.circ` - Wrist circumference (cm)

## Load the data

Load the data. Inspect column names and basic structure.

```{r}
```
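
A minimal sketch of the loading step, assuming the `AICcmodavg` package is installed. If it is not, read the file from the course Data folder instead (its file name is not given here, so that variant is not shown).

```r
# Assumption: AICcmodavg is installed; otherwise load the file from the
# course Data folder (file name not specified in this worksheet).
if (requireNamespace("AICcmodavg", quietly = TRUE)) {
  data("fat", package = "AICcmodavg")  # creates the data frame `fat`
  print(names(fat))                    # column names
  str(fat)                             # variable types and first values
  print(dim(fat))                      # number of rows and columns
} else {
  message("Package AICcmodavg not found; load the data from the Data folder.")
}
```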

## EDA on selected variables

Create a data frame that only contains the variables listed above and perform an exploratory data analysis (summary statistics, plots).

```{r}

```
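
As a hint, the same kind of exploration can be sketched on R's built-in `mtcars` data, used here only as a stand-in so that the snippet runs without the course files. The chosen columns are arbitrary; replace them with the variables listed above.

```r
# Stand-in example on the built-in mtcars data (not the fat data).
vars <- c("mpg", "wt", "hp", "disp")   # illustrative choice of columns
d <- mtcars[, vars]                    # keep only the selected variables

summary(d)                             # five-number summary and mean per variable
round(cor(d), 2)                       # pairwise correlations

pairs(d)                               # scatterplot matrix
hist(d$mpg, main = "mpg", xlab = "mpg")
boxplot(scale(d), las = 2)             # standardised variables on a common scale
```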

## Multiple regression (full model)

Fit a multiple linear regression with `Perc.body.fat.Siri` as the response and the remaining variables as predictors.

```{r}

```
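
One way to fit a model with all remaining columns as predictors is the `.` shorthand in the formula. The sketch below again uses `mtcars` as a stand-in; with the worksheet data, the response would be `Perc.body.fat.Siri` and the data frame the one created in the EDA step.

```r
# Stand-in example: mpg plays the role of the response.
d <- mtcars[, c("mpg", "wt", "hp", "disp")]
fit_full <- lm(mpg ~ ., data = d)   # "." = all other columns of d
coef(fit_full)                      # intercept and one slope per predictor
```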

## Assumptions & influential points

Assuming independent errors, list the other assumptions of the linear model. Check them (linearity, normality, equal variance, multicollinearity, etc.) and check for influential points (e.g., leverage, Cook's distance).

```{r}

```
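
A possible set of diagnostics in base R, sketched on the stand-in `mtcars` model. The cut-offs `2p/n` for leverage and `4/n` for Cook's distance are common rules of thumb, not fixed thresholds, and the VIF is computed by hand here to avoid depending on an extra package.

```r
# Stand-in example on mtcars; thresholds below are conventions only.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Variance inflation factors: diagonal of the inverse correlation matrix
X <- model.matrix(fit)[, -1]        # predictors without the intercept column
vif <- diag(solve(cor(X)))
round(vif, 2)

# Leverage and Cook's distance
h  <- hatvalues(fit)
cd <- cooks.distance(fit)
p  <- length(coef(fit))
n  <- nobs(fit)
which(h  > 2 * p / n)               # observations with high leverage
which(cd > 4 / n)                   # observations with large Cook's distance
```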

## Model summary

Show the summary of the fitted model.

```{r}
```

## Significant predictors at 1%

Which predictors are statistically significant at the 1% significance level in this model? Which hypothesis is being tested for each predictor? What does the p-value mean here?

```{r}
```
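
The coefficient table can also be accessed programmatically. In the sketch below (stand-in `mtcars` model), each row reports the t-test of the null hypothesis that the coefficient is zero, given that the other predictors are in the model.

```r
# Stand-in example on mtcars.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
tab <- coef(summary(fit))          # estimate, std. error, t value, p-value
tab

pvals <- tab[-1, "Pr(>|t|)"]       # drop the intercept row
sig   <- names(pvals)[pvals < 0.01]
sig                                # predictors significant at the 1% level
```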

## Interpret the F-statistic

Explain the meaning of the F-statistic from the model summary: which hypothesis is being tested, and how should the value and its p-value be interpreted in this context?

```{r}

```
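
The overall F-test compares the fitted model with the intercept-only model, i.e. it tests the null hypothesis that all slope coefficients are zero. A sketch on the stand-in `mtcars` model shows where the numbers in the summary come from:

```r
# Stand-in example on mtcars.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
f   <- summary(fit)$fstatistic     # named vector: value, numdf, dendf
f

# p-value: upper tail of the F distribution with (numdf, dendf) df
p_F <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
p_F

# The same test as an explicit comparison with the intercept-only model
fit0 <- lm(mpg ~ 1, data = mtcars)
anova(fit0, fit)
```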

## Model selection via AIC

Use an AIC-based selection (hint: `step()` or similar) to choose the best model. Comment on the selected model using its summary. From now on, use this selected model.

```{r}
```
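
A sketch of backward selection with `step()`, again on a stand-in `mtcars` model. The choice `direction = "backward"` is one option among several (`"both"`, `"forward"`), and `trace = 0` only suppresses the step-by-step printout.

```r
# Stand-in example on mtcars.
d <- mtcars[, c("mpg", "wt", "hp", "disp", "qsec", "drat")]
fit_full <- lm(mpg ~ ., data = d)

# Drop predictors one at a time as long as the AIC decreases
fit_aic <- step(fit_full, direction = "backward", trace = 0)

formula(fit_aic)                   # predictors kept by the selection
AIC(fit_full, fit_aic)             # compare the two models
summary(fit_aic)
```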

## Effect of Wrist.circ difference

Holding all other predictors constant, what is the predicted difference in `Perc.body.fat.Siri` for a person with `Wrist.circ = 12` cm versus `Wrist.circ = 22` cm? 

```{r}
```
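
In a model without interactions, the predicted difference for a change of $\Delta$ units in one predictor, with all others held fixed, is $\Delta$ times that predictor's coefficient. The sketch below checks this on a stand-in `mtcars` model; the values 2, 3 and 150 are arbitrary illustrations.

```r
# Stand-in example on mtcars: compare wt = 3 with wt = 2, hp held fixed.
fit <- lm(mpg ~ wt + hp, data = mtcars)

delta     <- 3 - 2
diff_coef <- unname(coef(fit)["wt"]) * delta      # via the coefficient

nd        <- data.frame(wt = c(2, 3), hp = 150)   # same hp in both rows
diff_pred <- unname(diff(predict(fit, newdata = nd)))  # via predictions

all.equal(diff_coef, diff_pred)                   # both routes agree
```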

## Prediction & 95% PI

Compute the predicted value of `Perc.body.fat.Siri` for the following measurements (use only the ones that you need for the model), and report a 95% prediction interval.

- `Age = 25`  
- `Weight = 170`  
- `Height = 70`  
- `Neck.circ = 40`  
- `Chest.circ = 100`  
- `Abdomen.circ = 90`  
- `Hip.circ = 100`  
- `Thigh.circ = 60`  
- `Knee.circ = 40`  
- `Ankle.circ = 20`  
- `Biceps.circ = 30`  
- `Forearm.circ = 30`  
- `Wrist.circ = 20`

```{r}
```
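
`predict()` returns a prediction interval when called with `interval = "prediction"`. A sketch on a stand-in `mtcars` model, with illustrative new values; `newdata` must contain every predictor used in the model, and extra columns are ignored.

```r
# Stand-in example on mtcars; the new values are arbitrary.
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_obs <- data.frame(wt = 3, hp = 120, disp = 200)  # disp is not in the model

pi95 <- predict(fit, newdata = new_obs,
                interval = "prediction", level = 0.95)
pi95                               # columns: fit, lwr, upr
```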

## 99% CI for Wrist.circ coefficient; hypothesis

Using the selected model, compute the 99% confidence interval for the `Wrist.circ` coefficient. Interpret the p-value for the `Wrist.circ` coefficient and state the null and alternative hypotheses being tested.

```{r}
```
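
For the model-based interval, `confint()` accepts the coefficient name and the confidence level. Sketched on a stand-in `mtcars` model:

```r
# Stand-in example on mtcars.
fit  <- lm(mpg ~ wt + hp, data = mtcars)
ci99 <- confint(fit, parm = "wt", level = 0.99)   # t-based 99% CI
ci99

# Two-sided t-test of H0: beta_wt = 0 against H1: beta_wt != 0
coef(summary(fit))["wt", ]
```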

## Bootstrap CI

Compute the 99% confidence interval of the `Wrist.circ` coefficient in the selected model via bootstrapping. Compare it with the model-based CI from the previous task.

```{r}
```
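
One common variant is the case-resampling bootstrap with a percentile interval; other variants (e.g. residual resampling, BCa intervals) would also be valid answers. The sketch below uses a stand-in `mtcars` model, and both the number of replicates `B` and the seed are arbitrary choices.

```r
# Stand-in example on mtcars: case-resampling bootstrap, percentile CI.
set.seed(1)                        # arbitrary seed, for reproducibility
B <- 2000                          # number of bootstrap replicates (assumption)
n <- nrow(mtcars)

boot_coef <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)          # resample rows
  coef(lm(mpg ~ wt + hp, data = mtcars[idx, ]))["wt"]
})

ci_boot <- quantile(boot_coef, c(0.005, 0.995))    # 99% percentile interval
ci_boot

# Model-based interval for comparison
confint(lm(mpg ~ wt + hp, data = mtcars), parm = "wt", level = 0.99)
```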

