In general, how do you choose which variable to stratify your sample over?
More specifically, what should the proportions look like on the variable you're stratifying over?
For example, I have data that I can stratify based on either age (15-90 years old), race, sex, or marital status. With this data, I am trying to estimate total income for the population.
When I stratify over race, I get
> stratsizes = table(ipums_data$Race)
> prop.table(stratsizes)
          1           2           3           4           5
0.863919493 0.109874488 0.006079198 0.016572829 0.003553993
where the numbers 1 through 5 represent different races.
When I stratify over marital status, I get
> stratsizes = table(ipums_data$Marstat)
> prop.table(stratsizes)
         1          2          3          4          5
0.58285479 0.02218440 0.06168983 0.07685977 0.25641122
where again, the numbers 1 through 5 represent the different marital statuses.
Stratifying by age gives a fairly even proportion across all ages 15-90, and stratifying by sex gives a 50-50 split.
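For reference, the age and sex proportions can be tabulated the same way as above. This is only a minimal sketch: the column names `Age` and `Sex` are assumed, the data frame `toy` is a simulated stand-in for `ipums_data`, and the age bands are arbitrary cut points chosen for illustration.

```r
# Simulated stand-in for ipums_data; column names Age and Sex are assumptions,
# and the values are random, not real survey data.
set.seed(1)
toy <- data.frame(
  Age = sample(15:90, 1000, replace = TRUE),
  Sex = sample(1:2, 1000, replace = TRUE)
)

# Proportion of the sample in each sex stratum
sex_props <- prop.table(table(toy$Sex))

# Age has 76 distinct values, so group it into bands before tabulating
# (the band boundaries here are arbitrary)
age_bands <- cut(toy$Age,
                 breaks = c(15, 30, 45, 60, 75, 90),
                 include.lowest = TRUE)
age_props <- prop.table(table(age_bands))

print(sex_props)
print(age_props)
```

With the real data, `toy` would be replaced by `ipums_data` and the column names adjusted to match.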
Which variable would be best to stratify over, and why?