I was unsure whether to post this on Cross Validated SE or here, but settled on Mathematics SE because my question is primarily mathematical in nature.
I am trying to grasp the "missing data notation" used in the statistical literature, for example by Rubin (1978) or Gelman and colleagues (2013, pp. 199–201), when describing the probability distributions of partly observed data $y$.
Following these sources, let $y$ be a matrix of potential data and $I$ a matrix of the same dimensions as $y$ indicating whether $y_{ij}$ is observed ($I_{ij}=1$) or not ($I_{ij}=0$). The authors then state: "For notational convenience, let 'obs'$=\{(i,j) \ : \ I_{ij} = 1 \} $ index the observed components of $y$ and 'mis'$=\{(i,j) \ : \ I_{ij} = 0 \} $ index the missing components." The variable $I$ is always observed.
Later, the authors rely heavily on integrals of the type
$$p(y_{obs},I|\theta,\phi) = \int p(y,I|\theta, \phi) \ dy_{mis},$$
where $\theta$ parameterizes a model of $y$ and $\phi$ the model of $I$.
My question is: I basically do not understand what is happening in this integral (i.e. why this operation is admissible). It seems the authors use $dy_{mis}$ as some sort of selector to integrate $p(y,I|\ldots)$ over only some of the elements of $y$. Is this consistent with the definition of integrals and probability densities?
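To make my confusion concrete, here is a minimal instance of what I *think* the notation means (my own construction, not taken from the cited sources), assuming $y$ is a $1\times 2$ matrix with the second entry missing:

```latex
% Suppose y = (y_{11}, y_{12}) with I_{11} = 1 and I_{12} = 0,
% so obs = {(1,1)} and mis = {(1,2)},
% hence y_obs = y_{11} and y_mis = y_{12}.
% My reading of the integral in this case:
\[
p(y_{11}, I \mid \theta, \phi)
  = \int_{-\infty}^{\infty} p(y_{11}, y_{12}, I \mid \theta, \phi)\, dy_{12},
\]
% i.e. dy_mis would mean "integrate over every coordinate y_{ij}
% with I_{ij} = 0, holding the observed coordinates fixed".
```

If this reading is correct, the operation looks like ordinary marginalization of a joint density over a subset of its arguments, but I would like confirmation that the $dy_{mis}$ notation really denotes this and nothing more subtle.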
References:
Gelman, A. et al. (2013). Bayesian Data Analysis. CRC Press.
Rubin, D. B. (1978). Bayesian Inference for Causal Effects: The Role of Randomization. The Annals of Statistics, 6(1), 34–58.