---
title: "06 Worksheet. Crash Course in Statistics (Summer 2025)"
subtitle: "Neuroscience Center Zurich, University of Zurich"
author: "Zofia Baranczuk"
date: "2025-08-25"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## 1. Anorexia example (Paired design).
Anorexia is an eating disorder characterized by low weight, food restriction, fear of gaining weight, and a strong desire to be thin. The dataset `MASS::anorexia` lists pre- and post-treatment weights for patients assigned to several treatments.
install.package

Focus on the CBT (cognitive behavioral therapy) group and test whether the treatment was effective.

a) State the hypotheses for a two-sided test at alpha= 0.05. 

H0: the true mean of the difference between Prewt and Postwd is equal 0. 

b) Create an exploratory plot of weight change (histogram and Q–Q plot), if useful other plots.  
c) Run a paired t-test; also run a Wilcoxon signed-rank test as a robustness check.  
d) Report the mean change, a 95% CI for the mean change, p-values, and the paired effect size. 
e) Interpret the results.

```{r}
#install.packages("MASS")
library(MASS)
data(anorexia)
CBT <- filter(anorexia, Treat == "CBT" ) #histogram and QQ plot of all CBT treated
qqnorm(CBT$Postwt-CBT$Prewt) # qq plot of the difference
# as we are looking at the paired plot, it would be sufficient if the difference in normally distributed
hist(CBT$Postwt-CBT$Prewt)
tt <- t.test(CBT$Postwt-CBT$Prewt, mu = 0) #testing H0: the mean of the 
#distribution of the difference is 0.
#equivalent to:
t.test(CBT$Postwt, CBT$Prewt, paired = TRUE)
wt <- wilcox.test(CBT$Postwt, CBT$Prewt, paired = TRUE)

# effect size 
dif <- CBT$Postwt-CBT$Prewt
ef <- mean(dif) / sd(dif)


list(
  tpval       = tt$p.value,
  wilcoxonpval     = wt$p.value,
  mean_change  = tt$estimate,
  mean_change1 = mean(dif),
  ci_95        = tt$conf.int,
  ef           = ef 
)

# We see a small data set with q-q plot indicating the normality assumption might b=not be fulfilled. The t.test returns a significant result, but Wilcoxon might be a correct test selection here. According to the Wilcoxon test, we would not reject H0, as the p-value is above the selected significance level (5%). 
```

## 2. T-test vs. Wilcoxon test.
To get more intuition about t-test vs. the Wilcoxon test: Construct (simulate or choose) two independent samples for which the two-sample t-test is significant at alpha = 0.05 while the Wilcoxon rank-sum test is not. Then run two permutation tests using (i) the difference in means and (ii) the difference in medians as the test statistic.
```{r}
#other examples:
#t.test(c(1,2,3,4,7), c(5,6,8,9))
#wilcox.test(c(1,2,3,4,7), c(5,6,8,9))

#t.test(c(1,2,3,4,8), c(5,6,7,9,10))
#wilcox.test(c(1,2,3,4,8), c(5,6,7,9,10))

a <- c(1.1,0.8,2.1,2.3,3.1,2.9,11,10,10.1)
b <- c(8.8,8.9,9,9.1,9.2,9.3,9.4,9.5,9.9)
     t.test(a, b)
wilcox.test(a, b)


# or, an example with a situation where the data actullay come form a normal distirbution:
# set.seed(18)
#  a <- rnorm(10, 12, 1)
#  b <- rnorm(10,5, 10)
#  a
#  b
#  t.test(a,b)
#  wilcox.test(a, b)
 
```

```{r perm_mean}
perm_test_mean <- function(x, y) { # defining a new function
R <- 1000
tobs <- mean(x) - mean(y) # Store the ’original’ sample mean
all.data <- c(x, y) # Store the data without the group labels
tsim <- array(0, R) # Pre-allocation
for (i in 1:R) {
index <- sample(1:length(all.data), length(x), replace = F) # random permutation
meanxA <- mean(all.data[index]) # Sample mean of group A
meanxB <- mean(all.data[-index]) # Sample mean of group B
tsim[i] <- meanxA - meanxB # Difference for the current iteration
}
hist(tsim)
abline(v=tobs)
return(sum(abs(tsim) >= abs(tobs))/R) # Sample p-value
}
perm_test_mean(a,b)
```

```{r perm_median}
perm_test_median <- function(x, y) { # defining a new function
R <- 1000
tobs <- median(x) - median(y) # Store the ’original’ sample median
all.data <- c(x, y) # Store the data without the group labels
tsim <- array(0, R) # Pre-allocation
for (i in 1:R) {
index <- sample(1:length(all.data), length(x), replace = F) # random permutation
medianxA <- median(all.data[index]) # Sample median of group A
medianxB <- median(all.data[-index]) # Sample median of group B
tsim[i] <- medianxA - medianxB # Difference for the current iteration
}
hist(tsim)
abline(v=tobs)
return(sum(abs(tsim) >= abs(tobs))/R) # Sample p-value
}
perm_test_median(a,b)
```