---
title: "03 Probability Distributions. Crash Course in Statistics (Summer 2025)"
subtitle: "Neuroscience Center Zurich, University of Zurich"
author: "Zofia Baranczuk"
date: "2025-08-25"
output: pdf_document
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## 1. Probability distributions

In R, distribution functions follow a **consistent naming convention**:

- `d*`: density function (continuous) or probability mass function (discrete)
- `p*`: cumulative distribution function (CDF)
- `q*`: quantile function (inverse CDF)
- `r*`: random sampling

Examples by family:

- Normal: `dnorm`, `pnorm`, `qnorm`, `rnorm`
- Binomial: `dbinom`, `pbinom`, `qbinom`, `rbinom`  
- Poisson: `dpois`, `ppois`, `qpois`, `rpois`  
- Exponential: `dexp`, `pexp`, `qexp`, `rexp`  
- Uniform: `dunif`, `punif`, `qunif`, `runif`

Look up the details in the R help, e.g. `?dnorm`, `?pbinom`, `?ppois`, `?rexp` (and the equivalents for other families).
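
The four prefixes are tied together: `q*` inverts `p*`, `p*` accumulates `d*` for a discrete distribution, and `r*` draws values whose sample proportions approximate `p*`. The chunk below is a minimal sketch that checks these relations numerically. The variable names, the evaluation point 1.5, the binomial parameters, and the seed are arbitrary choices made for illustration.

```{r}
# q* inverts p*: the quantile of a CDF value returns the original point
z <- 1.5
p_z <- pnorm(z)
z_back <- qnorm(p_z)
all.equal(z, z_back)

# p* is the running sum of d* for a discrete distribution
k <- 0:10
cdf_from_pmf <- cumsum(dbinom(k, size = 10, prob = 0.5))
all.equal(cdf_from_pmf, pbinom(k, size = 10, prob = 0.5))

# r* draws random values; the sample proportion at or below z approximates pnorm(z)
set.seed(42)
draws <- rnorm(1e5)
mean(draws <= z)
p_z
```

The last two printed values should be close but not identical, because the first is an estimate from a finite sample.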

Which results do you expect from the expressions below?

```{r}
pnorm(0)
pnorm(0, mean = 4, sd = 4) # R parameterizes the normal by sd, not by the variance
pnorm(0, 4, 4)

qnorm(0.5)
qnorm(0.999)

pbinom(q = 0, size = 2, prob = 0.5)

dbinom(x = 1, size = 2, prob = 0.8)
```


## 2. Example IQ. (03_Worksheet)
IQ scores are designed to follow a normal distribution with a mean of 100 and a standard deviation of 15. Using this, compute the following quantities. For each one, add a corresponding plot. (It doesn't have to be aesthetically pleasing, but it should help you build intuition for what you are computing.)

a) P(IQ < 100)
b) P(IQ = 100)
c) P(IQ > 130)
d) P(95 < IQ < 120)
e) P(IQ < X) = 0.8. Compute X.
f) P(|IQ - 100| > X) = 0.2. Compute X.

```{r }


```
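
As a hint for the pattern only (not a solution), the sketch below uses a standard normal variable Z instead of IQ. It shows an upper-tail probability computed in two ways, an interval probability as a difference of CDF values, and one way to shade the corresponding area under the density. The variable names and the cut-off 1 are illustrative assumptions.

```{r}
# Standard normal Z, used only as a stand-in to show the pattern
upper_tail <- 1 - pnorm(1)                      # P(Z > 1) via the complement
upper_tail_alt <- pnorm(1, lower.tail = FALSE)  # the same probability, computed directly
interval <- pnorm(1) - pnorm(-1)                # P(-1 < Z < 1) as a difference of CDF values
c(upper_tail, upper_tail_alt, interval)

# Shade the area under the density that corresponds to P(Z > 1)
grid <- seq(-4, 4, by = 0.01)
plot(grid, dnorm(grid), type = "l", xlab = "z", ylab = "density")
shade <- grid[grid >= 1]
polygon(c(min(shade), shade, max(shade)), c(0, dnorm(shade), 0),
        col = "mistyrose", border = NA)
abline(v = 1, lty = 2)
```

For the quantile-type questions, the same idea runs in reverse: `qnorm()` maps a probability back to a cut-off.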


## 3. Plotting: PMF, CDF, density

```{r}
# PMF of Binomial(size = 10, prob = 0.5)
x <- 0:10
p <- dbinom(x = x, size = 10, prob = 0.5)
plot(x, p)                                        # points only
plot(x, p, type = "h", lwd = 5, col = "darkred")  # vertical bars suit a PMF



# CDF
# First try: evaluate the CDF at the support points only.
cdf <- pbinom(q = x, size = 10, prob = 0.5)
plot(x, cdf) # shows the CDF only at the integers; we would like to see the whole function

# Second try: evaluate the CDF on a fine grid and join the points.
x2 <- seq(from = -1, to = 11, by = 0.01)
cdf2 <- pbinom(q = x2, size = 10, prob = 0.5)
plot(x2, cdf2, lwd = 1, type = "l", col = "darkmagenta")
# The jumps are drawn as near-vertical lines, so this is not strictly the graph
# of a function, but it is sufficient for intuition.

# Third try: a proper step-function plot of the CDF.
plot(stepfun(x, c(0, cdf)), ylab = bquote(F[X](x)), verticals = FALSE, main = "", pch = 20)

# a density:
x3 <- seq(from = -5, to = 5, by = 0.01)
d <- dnorm(x = x3, mean = 1, sd = 0.75)
plot(x3, d, type = "l", col = "mistyrose3", lwd = 3)

plot(x3, d, type = "l", col = "mistyrose3", lwd = 3)
abline(v=2, col ="red")
```
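
The vertical line at 2 splits the area under the density into two parts, and the area to its left is the CDF evaluated at 2. The chunk below is a small sketch that checks this numerically with base R's `integrate()`; the variable names are illustrative, and the result of numerical integration agrees with `pnorm()` only up to a small numerical error.

```{r}
# Area under the N(mean = 1, sd = 0.75) density to the left of x = 2,
# computed by numerical integration of the density
area_left <- integrate(dnorm, lower = -Inf, upper = 2, mean = 1, sd = 0.75)$value

# The same probability taken directly from the CDF
cdf_at_2 <- pnorm(2, mean = 1, sd = 0.75)

c(area_left, cdf_at_2)
```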

## 4. Poisson Distribution
a) Visualize the PMF and CDF of X ~ Pois(lambda), for lambda = 2.

```{r PoissonDistribution }

par(mfrow = c(1, 2))
lambda <- 2
xvals <- 0:10
probs <- dpois(xvals, lambda)
plot(xvals, probs, type = "h", main = paste0("PMF (lambda = ", lambda, ")"),
     ylab = "P(X = x)", xlab = "x", col = "blue", lwd = 2)
cdf <- ppois(xvals, lambda)
# plotmath expression instead of a literal Unicode symbol, which the PDF device may not render
plot(xvals, cdf, type = "s", main = paste0("CDF (lambda = ", lambda, ")"),
     ylab = expression(P(X <= x)), xlab = "x", col = "darkgreen", lwd = 2)

```

b) Draw a sample X1, ..., Xn ~ Pois(lambda) with n = 200 and plot a histogram. Compare the histogram with the PMF from a). What do you expect to happen as n increases?

```{r}
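set.seed(1) # fix the random seed so that the sample is reproducible
n <- 200
lambda <- 2 # same rate as in a); defined here so the chunk runs on its own
xs <- rpois(n, lambda = lambda)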

# to overlay with pmf
# breaks at half-integers => bars centered on integers
x_max <- max(xs)
breaks <- seq(-0.5, x_max + 0.5, by = 1)

hist(xs,
     breaks = breaks,
     probability = TRUE,          
     col = "mistyrose", border = "mistyrose4",
     xlab = "x", ylab = "Relative frequency",
     main = bquote("Sample histogram, n = " * .(n)))

# overlay the theoretical PMF on the same scale
x <- 0:x_max
pmf <- dpois(x, lambda)
segments(x0 = x, y0 = 0, x1 = x, y1 = pmf, lwd = 3, col = "deepskyblue4")
points(x, pmf, pch = 19, col = "deepskyblue4")

```
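
To explore the effect of n numerically, one option is to measure the largest gap between the relative frequencies and the theoretical PMF for several sample sizes. The chunk below is a sketch under stated assumptions: the sample sizes, the truncation of the support at 15, and the names `support`, `sizes`, and `max_gap` are illustrative choices, not part of the exercise.

```{r}
set.seed(1)
lambda <- 2
support <- 0:15              # Pois(2) puts almost all of its mass on 0..15
sizes <- c(20, 200, 20000)   # illustrative sample sizes

# For each n: largest absolute gap between relative frequencies and the PMF
max_gap <- sapply(sizes, function(n_i) {
  draws_i <- rpois(n_i, lambda)
  # tabulate() counts the values 1..nbins, so shift the draws by one
  rel_freq <- tabulate(draws_i + 1, nbins = length(support)) / n_i
  max(abs(rel_freq - dpois(support, lambda)))
})

data.frame(n = sizes, max_gap = max_gap)
```

The gap is itself random, so it need not shrink at every single step, but it should be clearly smaller for the largest sample than for the smallest.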

## 5. Q-Q Plots: Checking Normality and Shape

Generate samples from several distributions, draw a normal Q-Q plot for each, and discuss the shapes.

```{r}
set.seed(2025)
n <- 1000
x_norm <- rnorm(n)
x_exp  <- rexp(n)
x_pois <- rpois(n, lambda = 2)
x_pois10 <- rpois(n, lambda = 10)
x_t <- rt(n, df = 5)
x_unif <- runif(n)

# Plot Q-Q plots side by side
par(mfrow = c(2, 3))

# a) Normal vs. Exponential
qqnorm(x_norm, main = "Normal Q-Q (rnorm)")
qqline(x_norm, col = "blue")

qqnorm(x_exp, main = "Q-Q (rexp)")
qqline(x_exp, col = "red")

# b) Poisson with lambda = 2 and lambda = 10
# (plain "lambda" in the titles: the PDF device may not render the Greek letter)
qqnorm(x_pois, main = "Q-Q (Poisson, lambda = 2)")
qqline(x_pois, col = "darkgreen")

qqnorm(x_pois10, main = "Q-Q (Poisson, lambda = 10)")
qqline(x_pois10, col = "darkorange")

# c) t-distribution and Uniform
qqnorm(x_t, main = "Q-Q (t, df = 5)")
qqline(x_t, col = "purple")

qqnorm(x_unif, main = "Q-Q (Uniform)")
qqline(x_unif, col = "black")


# - Which distributions look approximately normal?
# - What do deviations from the line tell you?
# - How do tails and symmetry appear in Q-Q plots?

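## Plotting the empirical (kernel) density estimate:
par(mfrow = c(1, 1)) # reset the 2 x 3 layout so the plot uses the full figure
plot(density(x_exp))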

```
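
To see what a normal Q-Q plot actually shows, it can help to build one by hand: sort the sample and plot it against the standard normal quantiles at the probabilities given by `ppoints()`, which is the function `qqnorm()` uses for its plotting positions. The chunk below is a minimal sketch; the sample size of 200 and the names `y`, `theo`, and `samp` are illustrative assumptions.

```{r}
set.seed(2025)
y <- rexp(200)                 # illustrative right-skewed sample
n_y <- length(y)

theo <- qnorm(ppoints(n_y))    # theoretical standard normal quantiles
samp <- sort(y)                # ordered sample values

par(mfrow = c(1, 1))
plot(theo, samp, xlab = "Theoretical quantiles", ylab = "Sample quantiles",
     main = "Q-Q plot built by hand")

# qqnorm() returns the same coordinates, in the original order of y
qq <- qqnorm(y, plot.it = FALSE)
all.equal(sort(qq$x), theo)
all.equal(sort(qq$y), samp)
```

Points that bend away from a straight line indicate that the sample quantiles grow faster or slower than normal quantiles would, which is how skewness and heavy or light tails show up.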