
Imagine I'm polling a random sample from the population, asking them whether they approve of the President or not. I also ask them some categorical demographic questions (age bracket, race, gender, income bracket).

Now given a new randomly-selected person from the population, I want to know the aposteriori probability distribution that he approves.

I take it that if this is all I know, the answer is just $\mathrm{Beta}(\text{approvers}+1,\ \text{nonapprovers}+1)$ (assuming a uniform prior).
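
To make the uniform-prior update concrete, here is a minimal Python sketch. The poll counts (60 approve, 40 do not) are made up for illustration, and the helper names `beta_posterior`, `beta_mean` and `beta_variance` are hypothetical. It only uses the standard facts that a $\mathrm{Beta}(a, b)$ prior combined with binomial counts gives a $\mathrm{Beta}(a + \text{approvers},\ b + \text{nonapprovers})$ posterior, and that the posterior mean $a/(a+b)$ is the predictive probability that a new person approves.

```python
from fractions import Fraction

def beta_posterior(approvers, nonapprovers, prior_a=1, prior_b=1):
    """Parameters (a, b) of the Beta posterior for the approval probability.

    prior_a = prior_b = 1 is the uniform prior assumed in the question.
    """
    return prior_a + approvers, prior_b + nonapprovers

def beta_mean(a, b):
    """Posterior mean; also the predictive probability that a new person approves."""
    return Fraction(a, a + b)

def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return Fraction(a * b, (a + b) ** 2 * (a + b + 1))

# Hypothetical poll counts, for illustration only.
a, b = beta_posterior(approvers=60, nonapprovers=40)
print("posterior: Beta(%d, %d)" % (a, b))
print("posterior mean:", float(beta_mean(a, b)))
print("posterior sd:", float(beta_variance(a, b)) ** 0.5)
```

With no data at all, `beta_posterior(0, 0)` returns the prior itself, which is the situation a nearly empty demographic cell leaves you in.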

But I happen to know all the demographic information for this person too -- it's a 24-to-34-year-old white man in the lowest income bracket. Now I could just look at the 24-to-34-year-old low-income white men I polled, but I only polled one (or no) other person like that, leaving me essentially just with my prior. How do I appropriately combine all the information I have about different demographics and sub-demographics?

  • 0
    @Michael: I agree there was something wrong with the question, since the assumptions behind the "probability distribution" hadn't been explicated, but simply replacing that by "probability" has made things worse -- now the beta distribution makes no sense. 2012-09-12
  • 0
    Okay @joriki, I see I need to change it to posterior probability distribution for the outcome. 2012-09-12
  • 0
    @Michael: I think it should be "the aposteriori probability distribution for the probability that he approves", no? There are some questionable implicit assumptions behind this, but at least it makes sense, whereas I don't understand what "the aposteriori probability distribution that he approves" means. 2012-09-12
  • 0
    @joriki Yes that should be in the title also. 2012-09-12

1 Answer


You have a binary response (approve/disapprove) and a set of categorical covariates (age bracket, race, gender, income bracket). One option is to fit a logistic regression model to the data using the classical frequentist approach. The fitted model then gives a predicted approval probability for a future person with any given combination of covariate values. If you have little data for the particular combination of covariates that describes the person you want to predict for, the prediction will probably not be very accurate.

To take a Bayesian approach instead, put prior distributions on the parameters of the model and combine them with the likelihood function for the observed responses. Given the model form, this yields a joint posterior distribution for the parameters, from which you can obtain parameter estimates.
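
As a rough illustration of the Bayesian variant, the following pure-Python sketch fits a main-effects logistic regression by maximising the log posterior under independent $N(0, \sigma^2)$ priors on the coefficients. Several things here are assumptions rather than part of the approach described above: the Gaussian prior and its scale `prior_sd`, the use of the posterior mode (MAP) as the "parameter estimate" instead of the full posterior (which would need MCMC or a similar method), the plain gradient-ascent fitter, and the toy data with three 0/1 indicators (`young`, `male`, `low_income`). The function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Approval probability for covariate vector x under coefficients w."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def fit_map_logistic(X, y, prior_sd=2.0, lr=0.1, steps=5000):
    """Posterior mode for logistic regression with N(0, prior_sd^2) priors.

    Plain gradient ascent on log-likelihood + log-prior; a sketch, not tuned.
    """
    w = [0.0] * len(X[0])
    for _ in range(steps):
        # Gradient of the Gaussian log-prior.
        grad = [-wj / prior_sd ** 2 for wj in w]
        # Gradient of the Bernoulli log-likelihood.
        for xi, yi in zip(X, y):
            p = predict(w, xi)
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Toy data (made up). Columns: intercept, young, male, low_income.
X = [
    [1, 1, 0, 0], [1, 1, 0, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0],
    [1, 0, 1, 1], [1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 1, 1], [1, 0, 0, 0],
]
y = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]

w = fit_map_logistic(X, y)

# Nobody in the toy sample is young AND male AND low-income, yet the
# main-effects model still produces a prediction for that combination.
target = [1, 1, 1, 1]
p_target = predict(w, target)
print("coefficients:", [round(wj, 3) for wj in w])
print("predicted approval probability:", round(p_target, 3))
```

The prediction for the unobserved combination comes from the separate age, gender and income effects, each estimated from everyone who shares that single attribute. That pooling is what lets the model say something about a sparsely sampled cell, at the cost of assuming the effects are additive on the log-odds scale.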