---
title: "07 Simulations – Crash Course in Statistics (Summer 2025)"
subtitle: "Neuroscience Center Zurich – University of Zurich"
author: "Dr. Zofia Baranczuk"
date: "2025-08-25"
output: pdf_document
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```



## 1. Multiple testing and type I error.
Run a simulation to show why multiple testing can lead to rejecting a true null hypothesis, and how the Bonferroni correction controls the family-wise error rate (FWER).


```{r multiple_testing}
# Idea: run many tests when H0 is true and check how often at least one is significant.
set.seed(1)
reps <- 1000  # number of simulation replicates
N <- 100      # number of tests per replicate
n <- 50       # sample size per group
sigs <- numeric(reps)             # 1 if any uncorrected test rejects
sigs_bonferroni <- numeric(reps)  # 1 if any Bonferroni-corrected test rejects
for (j in 1:reps) {
  p_vals <- numeric(N)  # p-values of all N tests in this replicate
  for (i in 1:N) {
    x <- rnorm(n)  # both groups come from the same distribution, so H0 is true
    y <- rnorm(n)
    tt <- t.test(x, y, var.equal = TRUE)
    p_vals[i] <- tt$p.value
  }
  sigs[j] <- any(p_vals < 0.05)
  sigs_bonferroni[j] <- any(p_vals < 0.05 / N)
}
mean(sigs)             # family-wise error rate without correction
mean(sigs_bonferroni)  # family-wise error rate with Bonferroni correction

```
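
Under independence, the probability of at least one false positive among N tests at level 0.05 is 1 - (1 - 0.05)^N; with the Bonferroni threshold 0.05/N it stays just below 0.05. As a quick sanity check (a minimal sketch; the closed form assumes independent tests, which holds here), the theoretical rates should match the simulated ones above:

```{r fwer_theory}
N <- 100
1 - (1 - 0.05)^N      # expected FWER without correction, about 0.994
1 - (1 - 0.05 / N)^N  # expected FWER with Bonferroni, just under 0.05
```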

## 2. Bonferroni correction, example 2.
The Bonferroni correction is overly conservative when the tests are not independent. Run a simulation that shows this.
```{r multiple_testing_correlated}
# Idea: as above, but now the tests share data and are therefore strongly correlated.
set.seed(1)
reps <- 1000
N <- 100  # number of tests per replicate
n <- 50   # sample size per group
sigs <- numeric(reps)
sigs_bonferroni <- numeric(reps)
for (j in 1:reps) {
  p_vals <- numeric(N)  # p-values of all N tests in this replicate
  x_0 <- rnorm(n)  # shared component reused by every test in this replicate
  y_0 <- rnorm(n)
  for (i in 1:N) {
    x <- x_0 + rnorm(n, 0, 0.3)  # same data plus a little independent noise
    y <- y_0 + rnorm(n, 0, 0.3)
    tt <- t.test(x, y, var.equal = TRUE)
    p_vals[i] <- tt$p.value
  }
  sigs[j] <- any(p_vals < 0.05)
  sigs_bonferroni[j] <- any(p_vals < 0.05 / N)
}
mean(sigs)             # uncorrected FWER: lower than with independent tests
mean(sigs_bonferroni)  # Bonferroni FWER: far below 0.05, i.e. overly strict

```
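
In practice, the Bonferroni correction is usually applied through R's built-in `p.adjust()`, which scales the p-values up rather than scaling the threshold down; the decision is the same as comparing against 0.05/N. A minimal sketch, reusing the `p_vals` vector left over from the last replicate of the simulation above:

```{r bonferroni_p_adjust}
p_adj <- p.adjust(p_vals, method = "bonferroni")  # equivalent to pmin(1, p_vals * N)
any(p_adj < 0.05)  # same decision as any(p_vals < 0.05 / N)
```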



## 3. Distribution of p-values.
Run a simulation to show the distribution of p-values when the null hypothesis is true and the assumptions of the test are met.

```{r p_value1}
set.seed(1)
N <- 10000  # number of simulated tests
pvals <- numeric(N)
for (i in 1:N) {
  tt <- t.test(rnorm(10), rnorm(10))  # both samples from the same normal distribution
  pvals[i] <- tt$p.value
}
hist(pvals)         # approximately uniform on [0, 1]
mean(pvals < 0.05)  # close to the nominal 0.05
```
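
The flat histogram reflects a general fact: when H0 is true and the test's assumptions hold, the p-value is uniformly distributed on [0, 1], so any cutoff alpha rejects with probability alpha. A Kolmogorov-Smirnov test against the uniform distribution makes this check formal (a minimal sketch using the `pvals` vector from the chunk above):

```{r uniformity_check}
ks.test(pvals, "punif")  # a large p-value here is consistent with uniformity
```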

## 4. Distribution of p-values, example 2.
Run a simulation to show the distribution of p-values when the null hypothesis is true but the assumptions of the test are not met.

```{r p_value2}

set.seed(1)
N <- 10000  # number of simulated tests
pvals <- numeric(N)
for (i in 1:N) {
  tt <- t.test(rexp(10), rexp(10))  # same skewed distribution in both groups: H0 true, normality violated
  pvals[i] <- tt$p.value
}
hist(pvals)         # compare with the uniform histogram above
mean(pvals < 0.05)  # compare with the nominal 0.05

```
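
The distortion comes from the small sample size: by the central limit theorem, the t-statistic becomes approximately correct as n grows, even for skewed data. A minimal sketch repeating the exponential simulation with n = 100 instead of 10 (the rejection rate should move back toward the nominal 0.05):

```{r p_value_large_n}
set.seed(1)
N <- 10000
pvals_100 <- numeric(N)
for (i in 1:N) {
  pvals_100[i] <- t.test(rexp(100), rexp(100))$p.value
}
mean(pvals_100 < 0.05)  # should be close to 0.05
```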

## 5. Simulate the power of a t-test.
Run a simulation to estimate the power of a two-sample t-test, i.e. the probability of rejecting H0 when the alternative is true (here: a true mean difference of 0.75 with n = 30 per group).
```{r power_simulation}
set.seed(1)
N <- 1000  # number of simulation replicates
n <- 30    # sample size per group
pvals <- numeric(N)
for (i in 1:N) {
  x <- rnorm(n)               # group 1: mean 0
  y <- rnorm(n, mean = 0.75)  # group 2: mean 0.75, so H0 is false
  pvals[i] <- t.test(x, y)$p.value
}
mean(pvals < 0.05)  # estimated power: the share of simulations that reject H0

```
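
The simulated power can be cross-checked against the closed-form calculation in `power.t.test()`, which assumes normal data with a known common standard deviation (sd = 1 here, matching the simulation). Its reported power should agree with the rejection rate above, up to Monte Carlo error and the Welch correction that `t.test()` applies by default:

```{r power_analytic}
power.t.test(n = 30, delta = 0.75, sd = 1, sig.level = 0.05)
```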
## 6. Central Limit Theorem.
Run a simulation to show that the mean of n = 200 exponential observations (mean 1, sd 1) is approximately normal with mean 1 and standard deviation 1 / sqrt(200).
```{r clt}
set.seed(1)
# Means of 10000 samples of size 200 from the exponential distribution
sample_means <- replicate(10000, mean(rexp(200)))

x <- seq(0, 2, length.out = 1000)
hist(sample_means, breaks = 30, main = "CLT: Sampling distribution of the mean", freq = FALSE)
# Normal density predicted by the CLT
lines(x, dnorm(x, mean = 1, sd = 1 / sqrt(200)), col = "red", lwd = 2)
```
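
A normal quantile-quantile plot is a sharper diagnostic than the histogram overlay: if the CLT approximation holds, the points should lie close to the reference line, with deviations at most in the extreme tails:

```{r clt_qqplot}
qqnorm(sample_means, main = "Normal Q-Q plot of the sample means")
qqline(sample_means, col = "red", lwd = 2)
```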