---
title: "04-05 Worksheet. Crash Course in Statistics (Summer 2025)"
subtitle: "Neuroscience Center Zurich, University of Zurich"
author: "Zofia Baranczuk"
date: "2025-08-25"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

  
## 1. GDP and depression.
04:

We will work with the GDP, country, and depression data as in 02_DataSets and 04_DataWrangling.
Load the GDP, country, and depression data and merge them so that you can do the following:

- Compute the median, mean, minimum, and maximum depression rate per region, and the number of countries per region.

05:

- Plot GDP vs. depression for one selected year, using a different color for each region. You can choose any depression indicator.
- Check whether some countries are dropped because depression or GDP data are missing for the chosen year.
- What other plots could be good visualizations for the data? Add these plots.
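The GDP table stores one column per year (the plotting code below uses a column named `2021`). The chunk below is a minimal sketch, assuming that wide layout, of reshaping such a table with `tidyr::pivot_longer()` so that the year becomes a variable you can filter on. The table `toy_gdp`, its countries, and its values are hypothetical and only illustrate the mechanics.

```{r toy_pivot_year}
library(dplyr)
library(tidyr)

# Hypothetical toy table in the same wide layout as GDP.csv: one column per year
toy_gdp <- tibble(
  `Country Name` = c("A", "B", "C"),
  `2020` = c(100, 200, NA),
  `2021` = c(110, NA, 310)
)

# Reshape to long format: one row per country and year
toy_gdp_long <- toy_gdp %>%
  pivot_longer(cols = -`Country Name`, names_to = "year", values_to = "GDP") %>%
  mutate(year = as.integer(year))

# Keep a single year; countries with a missing GDP value for that year drop out here
selected_year <- 2021
toy_gdp_year <- toy_gdp_long %>%
  filter(year == selected_year, !is.na(GDP))

toy_gdp_year
```

Changing `selected_year` is then enough to redo the analysis for another year, instead of editing backticked column names throughout the code.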

```{r packages}
library(dplyr) 
library(tidyr) 
library(readr) 
library(here)
library(ggplot2)

```

```{r prep_data}
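# Read the three tables
GDP <- read_csv(here("Data", "GDP.csv"))
country <- read_csv(here("Data", "Country_region.csv"))
depression <- read_csv(here("Data", "depression-rates-by-country-2025.csv"))

# Add the region to each GDP row; without `by`, left_join() joins on all
# columns the two tables share and prints the key it used
GDP_region <- left_join(GDP, country)
head(GDP_region)

# Rename so that the country-name column matches the depression table
GDP_region <- GDP_region %>% rename(country = `Country Name`)

# Keep every country of the depression table and attach its GDP and region
GDP_depression <- left_join(depression, GDP_region)
head(GDP_depression)
# Rows for whole regions were present in the GDP table but not in the
# depression table, so starting the left_join() from the depression table
# skips them automatically.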
```
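To see which countries are lost in the merge, `dplyr::anti_join()` returns the rows of the first table that have no match in the second, and filtering on missing values shows matched countries that lack a value for the chosen year. The chunk below is a minimal sketch on hypothetical toy tables; the key `country` mirrors the renamed column above, while the country names and numbers are invented for illustration.

```{r toy_check_join}
library(dplyr)

# Hypothetical toy tables standing in for `depression` and `GDP_region`
toy_depression <- tibble(
  country = c("A", "B", "C"),
  RatePer100k_2021 = c(3000, 3500, 4000)
)
toy_gdp_region <- tibble(
  country = c("A", "B", "World"),
  Region = c("Europe", "Asia", NA),
  `2021` = c(110, NA, 500)
)

# Explicit key: join only on the country name
toy_merged <- left_join(toy_depression, toy_gdp_region, by = "country")

# Countries in the depression table with no match in the GDP table
no_gdp_match <- anti_join(toy_depression, toy_gdp_region, by = "country")

# Rows without a 2021 GDP value (unmatched countries show up here too)
no_gdp_2021 <- toy_merged %>% filter(is.na(`2021`))

no_gdp_match
no_gdp_2021
```

Passing `by` explicitly also guards against joining on an unintended shared column; the real column names should be checked with `names()` before doing so.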

```{r summary}
summary_depression <- GDP_depression %>%
  group_by(Region) %>% # group the countries by region
  summarise(
    median = median(RatePer100k_2021, na.rm = TRUE),
    mean = mean(RatePer100k_2021, na.rm = TRUE),
    min = min(RatePer100k_2021, na.rm = TRUE),
    max = max(RatePer100k_2021, na.rm = TRUE),
    n_of_countries = n() # number of countries (rows) in the region
  )

summary_depression
```
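Missing rates make `median()` and the other summaries return `NA` unless `na.rm = TRUE` is set, and countries without a match in the country table form an `NA` region group. The chunk below is a minimal sketch, on hypothetical toy data, of dropping those rows before summarising, so that `n()` counts only the countries that actually contribute to the statistics.

```{r toy_summary}
library(dplyr)

# Hypothetical toy data: one missing rate (D) and one country without a region (E)
toy_depr <- tibble(
  country = c("A", "B", "C", "D", "E"),
  Region = c("Europe", "Europe", "Asia", "Asia", NA),
  RatePer100k_2021 = c(3000, 4000, 3500, NA, 2500)
)

toy_summary <- toy_depr %>%
  filter(!is.na(Region), !is.na(RatePer100k_2021)) %>% # drop unusable rows first
  group_by(Region) %>%
  summarise(
    median = median(RatePer100k_2021),
    mean = mean(RatePer100k_2021),
    min = min(RatePer100k_2021),
    max = max(RatePer100k_2021),
    n_of_countries = n()
  )

toy_summary
```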

```{r gdp_depression_plot}
# Keep countries with both a 2021 GDP value and a 2021 depression rate
GDP_depression_clean <- GDP_depression %>%
  filter(!is.na(`2021`), !is.na(RatePer100k_2021))

ggplot(GDP_depression_clean, aes(x = `2021`, y = RatePer100k_2021, color = Region)) +
  geom_point(alpha = 0.7, size = 3) +
  # one linear fit per region (the color aesthetic already groups by Region)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_log10() + # GDP spans several orders of magnitude
  labs(title = "GDP vs. Depression Rate (2021)",
       x = "GDP (2021, log scale)",
       y = "Depression Rate (per 100k)") +
  theme_bw()

```
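One further visualization for comparing depression rates across regions is a boxplot per region with the individual countries overlaid as points. The chunk below is a minimal sketch on hypothetical toy data; with the real data, `GDP_depression_clean` could be passed in place of `toy_plot_data`.

```{r toy_boxplot}
library(ggplot2)

# Hypothetical toy data: depression rate per country, four countries per region
toy_plot_data <- data.frame(
  Region = rep(c("Europe", "Asia", "Africa"), each = 4),
  RatePer100k_2021 = c(3000, 3400, 3900, 4200,
                       2800, 3100, 3300, 3600,
                       2600, 2900, 3500, 3700)
)

p_box <- ggplot(toy_plot_data, aes(x = Region, y = RatePer100k_2021)) +
  geom_boxplot(outlier.shape = NA) +       # hide outlier dots; all points are drawn below
  geom_jitter(width = 0.15, alpha = 0.6) + # one point per country
  labs(title = "Depression rate by region (toy data)",
       x = "Region", y = "Depression rate (per 100k)") +
  theme_bw()

p_box
```

Overlaying the points keeps the number of countries per region visible, which matters here because some regions may contain only a few countries.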
