Ethnicity | Asian | Black | Hispanic | White | Other | Total |
---|---|---|---|---|---|---|
Reed count | 49 | 10 | 34 | 206 | 55 | 354 |
Oregon % | .043 | .02 | .125 | .77 | .042 | 1 |
If the students at Reed were drawn from a population with these proportions, how many counts would we expect in each group?
\[\textrm{exp. count} = n \times p_i\]
Ethnicity | Asian | Black | Hispanic | White | Other | Total |
---|---|---|---|---|---|---|
Obs. count | 49 | 10 | 34 | 206 | 55 | 354 |
Exp. count | 15.22 | 7.08 | 44.25 | 272.58 | 14.87 | 354 |
n <- 354 p <- c(.043, .02, .125, .77, .042) samp <- factor(sample(c("asian", "black", "hispanic", "white", "other"), size = n, replace = TRUE, prob = p), levels = c("asian", "black", "hispanic", "white", "other")) table(samp)
## samp ## asian black hispanic white other ## 20 11 39 272 12
obs <- c(49, 10, 34, 206, 55)
obs <- c(49, 10, 34, 206, 55) samp <- factor(sample(c("asian", "black", "hispanic", "white", "other"), size = n, replace = TRUE, prob = p), levels = c("asian", "black", "hispanic", "white", "other")) table(samp)
## samp ## asian black hispanic white other ## 15 8 40 276 15
samp <- factor(sample(c("asian", "black", "hispanic", "white", "other"), size = n, replace = TRUE, prob = p), levels = c("asian", "black", "hispanic", "white", "other")) table(samp)
## samp ## asian black hispanic white other ## 17 7 38 278 14
reed_demos %>% specify(response = ethnicity) %>% hypothesize(null = "point", p = c("asian" = .043, "black" = .02, "hispanic" = .125, "white" = .77, "other" = .042)) %>% generate(reps = 500, type = "simulate")
We could do a tests/CIs on \(p_{reed} - p_{oregon}\) for each group, however:
For each of \(k\) categories:
Then add them all up.
\[\chi^2 = \sum_{i = 1}^k \frac{(obs - exp)^2}{exp}\]
Ethnicity | Asian | Black | Hispanic | White | Other | Total |
---|---|---|---|---|---|---|
Obs. count | 49 | 10 | 34 | 206 | 55 | 354 |
Exp. count | 15.22 | 7.08 | 44.25 | 272.58 | 14.87 | 354 |
\[ Z_{asian}^2 = (49 - 15.22)^2/15.22 = 74.97 \\ Z_{black}^2 = (10 - 7.08)^2/7.08 = 1.20 \\ Z_{hispanic}^2 = (34 - 51.5)^2/51.5 = 5.95 \\ Z_{white}^2 = (206 - 272.58)^2/272.58 = 16.26 \\ Z_{other}^2 = (55 - 14.87)^2/14.87 = 108.30 \]
\[ Z_{asian}^2 + Z_{black}^2 + Z_{hispanic}^2 + Z_{white}^2 + Z_{other}^2 = 206.68 = \chi^2_{obs} \]
null_dist <- reed_demos %>% specify(response = ethnicity) %>% hypothesize(null = "point", p = c("asian" = .043, "black" = .02, "hispanic" = .125, "white" = .77, "other" = .042)) %>% generate(reps = 500, type = "simulate") %>% calculate(stat = "Chisq")
What is the probability of observing our data or more extreme (\(\chi^2 = 206.68\)) under the null hypothesis that Reedies share the same ethnicity proportions as Oregon?
About zero.
If…
then our statistic can be well-approximated by the \(\chi^2\) distribution with \(k - 1\) degrees of freedom.
1 - pchisq(206.68, df = 4)
## [1] 0