3/22/2018
Ethnicity | Asian | Black | Hispanic | White | Other | Total |
---|---|---|---|---|---|---|
Obs. count | 49 | 10 | 34 | 206 | 55 | 354 |
Exp. count | 15.22 | 7.08 | 44.25 | 272.58 | 14.87 | 354 |
\[ Z_{asian}^2 = (49 - 15.2)^2/15.2 = 75.16 \\ Z_{black}^2 = (10 - 7.08)^2/7.08 = 1.20 \\ Z_{hispanic}^2 = (34 - 44.25)^2/44.25 = 2.37 \\ Z_{white}^2 = (206 - 272.58)^2/272.58 = 16.26 \\ Z_{other}^2 = (55 - 14.87)^2/14.87 = 108.3 \]
\[ Z_{asian}^2 + Z_{black}^2 + Z_{hispanic}^2 + Z_{white}^2 + Z_{other}^2 = 203.29 = \chi^2_{obs} \]
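The arithmetic above can be reproduced in a few lines of R. This is a minimal sketch, assuming the observed counts from the table and the Oregon proportions stated with the null hypothesis; the variable names are illustrative. Because it keeps the expected counts unrounded, the total comes out near 203.1 rather than exactly 203.29, which reflects the rounded Asian expected count of 15.2 used above.

```r
# Observed first-year counts, in the order Asian, Black, Hispanic, White, Other
obs <- c(asian = 49, black = 10, hispanic = 34, white = 206, other = 55)

# Oregon proportions stated under the null hypothesis
p_oregon <- c(asian = .043, black = .02, hispanic = .125, white = .77, other = .042)

# Expected counts under H0: class size times each proportion
expected <- sum(obs) * p_oregon

# Squared standardized differences, one per category
z_sq <- (obs - expected)^2 / expected

# The chi-squared statistic is their sum
chisq_obs <- sum(z_sq)

round(expected, 2)
round(z_sq, 2)
chisq_obs
```

The "Other" and "Asian" categories contribute most of the total, which matches the hand calculation.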
Which of the following is an appropriate null hypothesis?
\(H_0\): The first-year class at Reed is sampled from a population that shares the same ethnic distribution as Oregon.
\[ p_{asian} = .043, \quad p_{black} = .02, \quad p_{hispanic} = .125, \quad p_{white} = .77, \quad p_{other} = .042 \]
\(H_A\): The first-year class at Reed is sampled from a population that has a different ethnic distribution than Oregon.
At least one \(p\) is different.
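If the observations are independent and every expected count is at least 5, then our statistic is well approximated by the \(\chi^2\) distribution with \(k - 1\) degrees of freedom, where \(k\) is the number of categories.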
1 - pchisq(203.29, df = 4)
## [1] 0
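The printed 0 is a floating-point artifact rather than a true zero: the lower-tail probability is so close to 1 that the subtraction loses the remainder. As a small sketch, the `lower.tail = FALSE` argument of `pchisq()` asks for the upper tail directly and should return a tiny positive number instead.

```r
# Upper-tail probability computed directly, not as 1 - (lower tail)
p_value <- pchisq(203.29, df = 4, lower.tail = FALSE)
p_value

# The subtraction form rounds the tiny tail away
1 - pchisq(203.29, df = 4)
```

Either way the conclusion is the same; the direct form only matters when the size of a very small p-value needs to be reported.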
We reject the hypothesis that the Reed first-year class represents a random sample from Oregon with respect to ethnicity.
The number of independent ways by which a dynamic system can move, without violating any constraint imposed on it. (Wikipedia)
The number of parameters that are free to vary, without violating any constraint imposed on them.
\[ p_{asian}, p_{black}, p_{hispanic}, p_{white}, p_{other} \]
Since \(\sum_{i = 1}^k p_i = 1\), one of our parameters is constrained, leaving \(k-1\) that are free to vary.
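To make the constraint concrete, here is a short sketch using the Oregon proportions from the null hypothesis: once four of the five proportions are chosen, the fifth is no longer free. The variable names are illustrative.

```r
# Four proportions chosen freely (values from the Oregon null hypothesis)
p_free <- c(asian = .043, black = .02, hispanic = .125, white = .77)

# The fifth is forced by the constraint that all proportions sum to 1
p_other <- 1 - sum(p_free)
p_other

# k categories and one constraint leave k - 1 degrees of freedom
k <- length(p_free) + 1
df <- k - 1
df
```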
These make no sense. Why not?
treatment <- rep(c("acu", "sham", "trad"), c(387, 387, 388))
pain <- c(rep(c("reduc", "noreduc"), c(184, 203)),
          rep(c("reduc", "noreduc"), c(171, 216)),
          rep(c("reduc", "noreduc"), c(106, 282)))
table(pain, treatment)
##          treatment
## pain      acu sham trad
##   noreduc 203  216  282
##   reduc   184  171  106
\(H_0\) implies that the associations between these two vectors are just due to chance, so we mirror that by randomizing the vectors to get another possible data set under the \(H_0\).
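The shuffling idea can be sketched in base R before reaching for a package. This is a minimal illustration, not necessarily how the lecture computed it: the helper `chisq_stat()` is a hypothetical name, the seed is arbitrary, and the vectors are rebuilt from the counts in the table so the block runs on its own.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Rebuild the two vectors from the counts in the table
treatment <- rep(c("acu", "sham", "trad"), c(387, 387, 388))
pain <- c(rep(c("reduc", "noreduc"), c(184, 203)),
          rep(c("reduc", "noreduc"), c(171, 216)),
          rep(c("reduc", "noreduc"), c(106, 282)))

# Hypothetical helper: chi-squared statistic for a pair of vectors
chisq_stat <- function(x, y) unname(chisq.test(table(x, y))$statistic)

# Statistic for the data as observed
chisq_obs <- chisq_stat(pain, treatment)

# Shuffling treatment breaks any real link with pain, as H0 claims
null_stats <- replicate(1000, chisq_stat(pain, sample(treatment)))

# Share of shuffled data sets at least as extreme as the observed one
mean(null_stats >= chisq_obs)
```

Each shuffle keeps the row and column totals fixed and changes only which outcome is paired with which treatment.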
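library(infer)

# Assumption: acu is a data frame holding the two vectors built above
acu <- data.frame(pain, treatment)

null_dist <- acu %>%
  specify(response = pain, explanatory = treatment) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "Chisq")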
The proportion of simulated \(\chi^2\) statistics under \(H_0\) that are greater than \(\chi^2_{obs}\) is 0 \(\rightarrow\) we reject the idea that pain is independent of treatment mode.
The mathematical approximation is good enough when the observations are independent and every cell has an expected count of at least 5.
\[ df = (R - 1) \times (C - 1) \]
1 - pchisq(38.05, df = 2)
## [1] 5.46e-09
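As a cross-check, the statistic, degrees of freedom, and expected counts can all be pulled from `chisq.test()` in one call. This is a sketch that re-enters the acupuncture counts from the table as a matrix; the object names are illustrative.

```r
# Two-way table of counts: rows are pain outcomes, columns are treatments
counts <- matrix(c(203, 216, 282,
                   184, 171, 106),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(pain = c("noreduc", "reduc"),
                                 treatment = c("acu", "sham", "trad")))

res <- chisq.test(counts)

res$expected   # expected counts under independence, to check the condition
res$statistic  # chi-squared statistic
res$parameter  # degrees of freedom: (2 - 1) * (3 - 1) = 2
res$p.value    # upper-tail probability
```

Inspecting `res$expected` is a quick way to confirm that every cell is large enough for the approximation to be reasonable.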