4
$\begingroup$

Imagine I'm polling a random sample from the population and it asks them if they approve of the President or not. I also ask them some categorical demographic questions (age-bracket, race, gender, income-bracket).

Now given a new randomly-selected person from the population, I want to know the aposteriori probability distribution that he approves.

I take it that if this is all I know, the answer is just $Beta[approvers+1, nonapprovers+1]$ (assuming a uniform prior).

But I happen to know all the demographic information for this person too -- it's a 24-to-34-year-old white man in the lowest income bracket. Now I could just look at the 24-to-34-year-old low-income white men I polled, but I only polled one (or no) other person like that, leaving me essentially just with my prior. How do I appropriately combine all the information I have about different demographics and sub-demographics?

  • 0
    @joriki Yes that should be in the title also.2012-09-12

1 Answers 1

2

You have a binary response (approve/disapprove) and a set of covariates (age-bracket, race, gender, income-bracket). You can fit a logistic regression model to the data using the classical frequentist approach. Then this model will give a prediction for a future person given the set of demographic conditions (values for the covariates). If you do not have a lot of data for a particular combination of the covariates that is associated with the person you want to make the prediction on, the prediction will probably not be very accurate. To take a Bayesian approach, you could put prior distributions on the parameters of the model and combine that with the likelihood function for the observed responses. Given the model form you could get a Bayesian posterior joint distribution for the parameters from which you could select parameter estimates.