
I was a bit hesitant about whether to post this on Cross Validated SE or here, but decided on Mathematics SE due to the primarily mathematical nature of my question.


I am trying to grasp the "missing data notation" used in the statistical literature for example by Rubin (1978) or Gelman and colleagues (2013, pp. 199-201) when describing the probability distributions of partly observed data $y$.

Following these sources, let $y$ be a matrix of potential data and $I$ a matrix with the same dimensions as $y$ indicating whether $y_{ij}$ is observed ($I_{ij}=1$) or not ($I_{ij}=0$). The authors then state "For notational convenience, let 'obs'$=\{(i,j) \ : \ I_{ij} = 1 \} $ index the observed components of $y$ and 'mis'$=\{(i,j) \ : \ I_{ij} = 0 \} $ index the missing components." Variable $I$ is always observed.

Later the authors then heavily rely on integrals of the type

$$p(y_{obs},I|\theta,\phi) = \int p(y,I|\theta, \phi) \ dy_{mis},$$

where $\theta$ parameterizes a model of $y$ and $\phi$ the model of $I$.

My question is: I basically do not understand what is happening in the integral (i.e. why this operation is admissible). It seems the authors use $dy_{mis}$ as some sort of selector, integrating $p(y,I|\ldots)$ over only some of the elements of $y$. Is this consistent with the definitions of integrals and probability densities?


References:

Gelman, A. et al. (2013). Bayesian Data Analysis. CRC Press.

Rubin, D. B. (1978). Bayesian Inference for Causal Effects: The Role of Randomization. The Annals of Statistics, 6(1), 34–58.

1 Answer


What this is actually saying is that $$ p(y_{\mbox{obs}},I \mid \theta,\phi) = \int p(y_{\mbox{obs}},I \mid y_{\mbox{mis}}, \theta,\phi)\, p(y_{\mbox{mis}} \mid \theta,\phi) \,dy_{\mbox{mis}} $$ or in words,

The probability of a given set of values for the observed $y$'s (together with $I$) is the integral, over all possible values of the unobserved $y$'s, of the conditional probability of those observed values given the unobserved values being integrated over, weighted by the probability of those unobserved values.

So your question boils down to, "why can we say $p(y)$ when we mean $p(y_{\mbox{obs}}| y_{\mbox{mis}})p(y_{\mbox{mis}})$?"

And we can say that because the product rule of probability (the identity behind Bayes' theorem) says precisely that.
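To see concretely that integrating a joint density over only the missing coordinates yields a marginal density, here is a minimal numerical sketch in Python. The bivariate-normal setup and all names in it are my own illustration, not from Rubin or Gelman et al.:

```python
import numpy as np

# Toy example: let y = (y_obs, y_mis) be bivariate normal.
# Integrating the joint density over y_mis alone -- the role of
# "dy_mis" -- recovers the marginal density of y_obs, here N(mu1, s1^2).

mu1, mu2 = 0.0, 1.0            # means of (y_obs, y_mis)
s1, s2, rho = 1.0, 2.0, 0.6    # standard deviations and correlation
cov = np.array([[s1**2,         rho * s1 * s2],
                [rho * s1 * s2, s2**2        ]])
cov_inv = np.linalg.inv(cov)
norm_const = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

def joint(y_obs, y_mis):
    """Bivariate normal density p(y_obs, y_mis); broadcasts over y_mis."""
    a, b = y_obs - mu1, y_mis - mu2
    quad = (cov_inv[0, 0] * a * a
            + 2 * cov_inv[0, 1] * a * b
            + cov_inv[1, 1] * b * b)
    return norm_const * np.exp(-0.5 * quad)

# Hold y_obs fixed and integrate out y_mis on a fine grid
# (trapezoidal rule over a range wide enough to capture the mass).
y0 = 0.7
grid = np.linspace(mu2 - 10 * s2, mu2 + 10 * s2, 20001)
vals = joint(y0, grid)
marginal_numeric = float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)))

# Analytic marginal of y_obs: N(mu1, s1^2).
marginal_exact = np.exp(-0.5 * ((y0 - mu1) / s1) ** 2) / (s1 * np.sqrt(2 * np.pi))

print(marginal_numeric, marginal_exact)  # the two should agree closely
```

The point is that $dy_{\mbox{mis}}$ denotes an ordinary integral over the missing coordinates only; the observed coordinates are held fixed, so the result is the usual marginal density of the observed components.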

  • Thanks, but wait. When you say $p(y_{obs}|y_{mis})p(y_{mis})$, don't you say then that there are two different $y$ which can have a joint density $p(y_{obs},y_{mis})$? And what is the relation between $p(y_{obs},y_{mis})$ and $p(y)$? They are not equivalent, are they? (2017-02-16)
  • My confusion is perhaps rooted in the fact that they say these variables are indexed by obs and mis, but maybe what they mean is that there are two versions of $y$, of which we only see one. (2017-02-16)