
Mathematical description of a random sample: which one is it and why?

  1. $X_1(\omega), X_2(\omega), ..., X_n(\omega)$, where $X_1, ..., X_n$ are different but i.i.d. random variables.

  2. $X(\omega_1), X(\omega_2), ..., X(\omega_n)$, where $X$ is a (single) random variable.

  • Then, for $X(t_i)$ to be random one needs to pick $t_i$ randomly in the set $\{\omega_1,\ldots,\omega_6\}$ (otherwise $X(t_i)$ is a number, not a random variable). Question: how do you choose $t_i$? The answer seems to be that you choose $t_i$ **at random**; in other words, $t_i$ is a map from an unspecified probability space $S$ to $\{\omega_1,\ldots,\omega_6\}$... in other words, you have simply encoded each random variable $X_i$ of option 1 as a random variable $X\circ t_i$ defined on $S$. In the end, option 2 does not exist. (Unrelated: please use the @ sign to notify your comments.) — 2011-07-31

3 Answers


Let's say that the result of an experiment is an n-tuple of real numbers. When we accept 1. as a model of our experiment, we have a probability space $\Omega$ and a random variable $X: \Omega \to \mathbb{R}^n$. The outcome of an experiment corresponds to an $\omega \in \Omega$ and therefore to an n-tuple $(X_1(\omega), ..., X_n(\omega))$. This model allows us to ask whether the elements of this n-tuple are independent and, if not, what their joint distribution is.
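As a rough illustration (the names and the concrete space are my own, not from the thread), model 1 can be sketched as a single random draw of one point $\omega$ from a product space, with each $X_i$ a coordinate projection:

```python
import random

# Sketch of model 1 (illustrative names, my own choice of Omega):
# one probability space Omega = {1,...,6}^n, one randomly drawn point
# omega, and n random variables X_1..X_n defined on that same space.
n = 5

def draw_omega(rng):
    """One outcome omega in Omega = {1,...,6}^n (five die throws)."""
    return tuple(rng.randint(1, 6) for _ in range(n))

def X(i, omega):
    """X_i(omega): the i-th coordinate projection of omega."""
    return omega[i]

rng = random.Random(0)
omega = draw_omega(rng)
sample = [X(i, omega) for i in range(n)]  # (X_1(omega), ..., X_n(omega))
print(sample)
```

The point of the sketch is that all five $X_i$ are evaluated at the *same* $\omega$; the randomness enters exactly once, through the draw of $\omega$.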

If we accept 2. as a model, we have a probability space $\Omega$ and a single random variable $X: \Omega \to \mathbb{R}$ evaluated at $n$ points $\omega_1, ..., \omega_n$, so that the n-tuple $(X(\omega_1), ..., X(\omega_n))$ is a random variable on the product probability space $\Omega^n$ (Cartesian product with the product measure). In this case, the independence of the elements of the tuple is built into the model. So, if the elements of the tuple are supposed to be independent anyway, it does not matter which model we choose.

Note that in the first case we can set $X_i = X_j$, either strictly or modulo a null set; in that case we obtain a tuple of identically distributed random variables. So choice no. 1 does not necessarily imply that the elements of the n-tuple are different random variables (whether strictly different or different modulo a null set).


Thanks to the stimulating discussion with @Didier, I've clarified something for myself. From the technical standpoint, we have $n$ random variables in option 1, and $n$ numbers in option 2. The problem with this can be illustrated by the following example. Consider 3 different people, each producing their own random sample of size 5 by throwing a die. Here is what they get:

Person 1: 1, 3, 1, 4, 2 $\;\;\;\rightarrow$ $X_1(\omega'), X_2(\omega'), ..., X_5(\omega')$
Person 2: 2, 2, 1, 6, 3 $\;\;\;\rightarrow$ $X_1(\omega''), X_2(\omega''), ..., X_5(\omega'')$
Person 3: 1, 4, 2, 1, 3 $\;\;\;\rightarrow$ $X_1(\omega'''), X_2(\omega'''), ..., X_5(\omega''')$

On the right side, I used option 1 to code these outcomes. How to code them using option 2? We can try this:

Person 1: 1, 3, 1, 4, 2 $\;\;\;\rightarrow$ $X(t_1), X(t_2), ..., X(t_5)$
Person 2: 2, 2, 1, 6, 3 $\;\;\;\rightarrow$ $X(t_1), X(t_2), ..., X(t_5)$
Person 3: 1, 4, 2, 1, 3 $\;\;\;\rightarrow$ $X(t_1), X(t_2), ..., X(t_5)$

($t$ stands for "trial number".) This clearly doesn't work: the three people obtained different outcomes, yet the notation assigns all of them the same expression. What about this:

Person 1: 1, 3, 1, 4, 2 $\;\;\;\rightarrow$ $X(\omega_1), X(\omega_3), ..., X(\omega_2)$
Person 2: 2, 2, 1, 6, 3 $\;\;\;\rightarrow$ $X(\omega_2), X(\omega_2), ..., X(\omega_3)$
Person 3: 1, 4, 2, 1, 3 $\;\;\;\rightarrow$ $X(\omega_1), X(\omega_4), ..., X(\omega_3)$

This seems to work, but how do we write it in a general manner?

person ?: ?, ?, ?, ?, ? $\;\;\;\rightarrow$ $X(?), X(?), X(?), X(?), X(?)$

I think it is at this point that we arrive at the appropriate general notation specified in option 1.
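The two encodings for one person's sample can be sketched in code (a sketch with my own illustrative names, under the assumption $\Omega = \{1,\ldots,6\}$ and $X$ the identity map):

```python
import random

rng = random.Random(42)

# Option 1: five random variables X_1..X_5 on the product space Omega^5.
# A single outcome omega is a 5-tuple of throws; X_i is the i-th projection.
omega = tuple(rng.randint(1, 6) for _ in range(5))
option1 = [omega[i] for i in range(5)]           # X_i(omega)

# Option 2: one variable X on Omega = {1,...,6} (here the identity map).
# For X(omega_i) to be random, each point omega_i must itself be chosen
# at random -- which is exactly a random variable in disguise.
X = lambda w: w                                  # X(omega) = omega
omegas = [rng.randint(1, 6) for _ in range(5)]   # randomly chosen omega_i
option2 = [X(w) for w in omegas]                 # X(omega_i)

print(option1, option2)
```

Both produce five numbers in $\{1,\ldots,6\}$; the point of the discussion above is that option 2's "random choice of $\omega_i$" smuggles option 1's random variables back in.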

  • @Didier, yes, the answer that I posted is good enough for me. I hope it may be helpful for other people as well. — 2011-08-02

The first line is a notation for "$n$ given functions $X_i$, all computed on the same randomly chosen input $\omega$". For example, $\sin(t), \cos(t), \exp(t), \dots$ where $t$ is a randomly chosen integer between 1 and 128. The total information in this set of random numbers is 7 bits (since $128 = 2^7$) no matter how large $n$ is. The sine, cosine, and exponential of $t$ are not independent.

The second line describes: "one function $X$ computed on $n$ different random inputs $\omega_i$". For example, $X(t)$ is the $t^{\rm th}$ user in the math.SE userlist where $t$ is an integer from 1 to 128. If you make $n$ random choices of $t_i$ the amount of information in the list $X(t_1), \dots, X(t_n)$ is $7n$ bits; each $X(t_i)$ is independent of the others.
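The information-counting argument can be sketched as follows (my own illustration of the idea, with an arbitrary family of functions standing in for $\sin$, $\cos$, $\exp$):

```python
import math
import random

n = 10
bits_per_draw = math.log2(128)   # 7.0 bits for one uniform choice in {1,...,128}

rng = random.Random(1)

# (1)-style tuple: every entry is a deterministic function of one shared t.
# Knowing t pins down the whole tuple, so it carries at most 7 bits,
# no matter how large n is.
t = rng.randint(1, 128)
shared = [(t * k) % 128 for k in range(1, n + 1)]

# (2)-style tuple: n independent choices t_1..t_n, each worth 7 bits,
# for 7n bits in total.
ts = [rng.randint(1, 128) for _ in range(n)]
independent = list(ts)

print(bits_per_draw, n * bits_per_draw)
```

The contrast is exactly the one the answer draws: the shared-$t$ tuple is a single 7-bit secret dressed up as $n$ numbers, while the independent tuple genuinely contains $7n$ bits.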

Only the second corresponds to an i.i.d. random sample of $n$ objects. To describe such a sample in the notation used in (1), $\omega$ must depend on $n$, e.g. $\omega = (\omega_1, \dots, \omega_n)$. The notation as given in the question does not explicitly include any such dependence.


[added in light of the comments discussion]

The original question did not specify precisely what $X$ and $\omega$ mean:

Mathematical description of a random sample: which one is it and why?

  1. $X_1(\omega), X_2(\omega), ..., X_n(\omega)$, where $X_1, ..., X_n$ are different but i.i.d. random variables.

  2. $X(\omega_1), X(\omega_2), ..., X(\omega_n)$, where $X$ is a (single) random variable.

I think the trouble is that (1) should be written as:

" 0. $X_1, X_2, \dots X_n$ where the $X_i$ are different but i.i.d random variables." (Each $X_i$ is a selection from the distribution of one sample.)

In the original schemes 1-2, writing both "$X$" and "$\omega$" expressions implies that they play different roles: the $\omega$ are the random choices made in constructing the samples, and the $X$ or $X_i$ are deterministic functions describing how to convert those random choices into particular samples. For example, if the random samples are single random bits generated using dice, $\omega$ or $\omega_i$ can denote random throws of dice and $X$ a method of converting the 1/2/3/4/5/6 outcomes into 0/1 bits. Then the $\omega_i$ themselves form a sequence of i.i.d. random variables, and consequently so do the $X(\omega_i)$, as in formulation (2).
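The dice-to-bits example can be sketched directly (my own implementation of the idea; the parity rule is an arbitrary choice of conversion method):

```python
import random

def X(throw: int) -> int:
    """Deterministic conversion rule: odd throw -> 1, even throw -> 0."""
    return throw % 2

rng = random.Random(7)
omegas = [rng.randint(1, 6) for _ in range(20)]   # i.i.d. die throws omega_i
bits = [X(w) for w in omegas]                     # i.i.d. bits X(omega_i)
print(bits)
```

Here $X$ is entirely deterministic; all the randomness lives in the i.i.d. sequence $\omega_i$, and the bits $X(\omega_i)$ inherit the i.i.d. property from it, exactly as in formulation (2).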

The standard mathematical descriptions are (0), if the details of the sample-generation process are suppressed, and (2), if they are made more explicit. It would be interesting to examine the literature on this, but I think (1) is much less common, and where it is used it may involve strange conventions such as a "prophetic" $\omega$ that includes all future randomness used in generating other random variables combined with the $X_i$ in subsequent calculations. Be that as it may, there is also a notational or conceptual inconsistency in scheme (1): the sampling results $X$ are individuated into particular $X_i$, but the underlying sequence of i.i.d. random choices is aggregated into one collective $\omega$. The definition of $\omega$ is either left nebulous ("all randomness used to generate the samples") or, if it is made more explicit via the sequence $\omega_i$, then $\omega$ itself is superfluous and the situation would normally be expressed using scheme (2).

The conclusion (mine, at least) is that (1) is defective, but (2) and (0) make sense.

  • Displaying an example of (1) in the literature would be helpful, with several X's but one global omega. There was no reference to empirical samples, though of course one can consider realizations of any random variable that was mentioned (maybe you are objecting to the words "different [iid rv's]", which I retained from the OP's text to clarify the difference between (0) and (1); it is superfluous per se and can be dropped). Each $\omega_i$ is a random variable, and the random variable $Y_i$ is its pushforward (image directe) under the measurable function $X$. Note the deterministic character of $X$. — 2011-08-03