
Take the joint probability of a complex algebraic expression consisting of a sequence of $n$ variables, $P(x_1, x_2, \dots, x_n)$. My goal is to calculate the product of the probabilities of all contiguous, order-preserving subsequences of these $n$ variables that are shorter than the full sequence.

For example, take the sequence $S = x_1, x_2, x_3, x_4$, which occurs with the joint probability $P_S = P(x_1, x_2, x_3, x_4)$. The probabilities of all contiguous, order-preserving subsequences of $S$ that are shorter than $S$ itself are as follows:

slider size = 1: $P(x_1)$, $P(x_2)$, $P(x_3)$, $P(x_4)$

slider size = 2: $P(x_1, x_2)$, $P(x_2, x_3)$, $P(x_3, x_4)$

slider size = 3: $P(x_1, x_2, x_3)$, $P(x_2, x_3, x_4)$

The slider can be thought of as a unit of a certain size sliding through the sequence and "cutting out" subsequences of that size until it slides all the way to the end. In general, we can say that if we have a sequence of $n$ elements, the set of all its order-preserving subsequences will have the following properties:

  • there will be $n-1$ slider sizes
  • the smallest slider size will have $n$ subsequences
  • the largest slider size will have $n-(n-2) = 2$ subsequences
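The three properties above can be checked with a short sketch. The function name `sliding_windows` and the string labels `"x1"`, `"x2"`, ... are purely illustrative assumptions; the code only enumerates the subsequences and does not compute any probabilities.

```python
def sliding_windows(seq):
    """Map each slider size (1 .. n-1) to the list of contiguous windows of that size."""
    n = len(seq)
    return {
        size: [tuple(seq[start:start + size]) for start in range(n - size + 1)]
        for size in range(1, n)  # the full sequence (size n) is deliberately excluded
    }


# Hypothetical labels standing in for the variables x_1 .. x_4.
windows = sliding_windows(["x1", "x2", "x3", "x4"])
for size, subsequences in windows.items():
    print(size, subsequences)
```

For the four-element example this gives three slider sizes, with four windows at size 1 and two windows at size 3, matching the list above.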

As I mentioned earlier, my goal is to calculate the product of all these probabilities. In the above example, the product would be:

$P(x_1)$ $P(x_2)$ $P(x_3)$ $P(x_4)$ $P(x_1, x_2)$ $P(x_2, x_3)$ $P(x_3, x_4)$ $P(x_1, x_2, x_3)$ $P(x_2, x_3, x_4)$

Put into a matrix-like format where the vertical axis is the slider size and the horizontal axis is the slider position, i.e. the specific subsequence within the larger sequence at a given slider size, we would get the following representation:

$P(x_1)$ $P(x_2)$ $P(x_3)$ $P(x_4)$

$P(x_1, x_2)$ $P(x_2, x_3)$ $P(x_3, x_4)$

$P(x_1, x_2, x_3)$ $P(x_2, x_3, x_4)$

I am trying to come up with a succinct formal generalization of this example for sequences of length $n$. The best solution I could come up with so far is the rather illegible expression below, which - it has been suggested to me - can be simplified to the form $\prod\prod P(x)$, but I don't know how.

$\prod_{i=1}^{n} P(x_i) \prod_{j=1}^{n-1} P(x_j, x_{j+1}) \dots \prod_{z=1}^{2} P(x_z, x_{z+1}, \dots, x_{z+n-2})$

where $n$ represents the number of elements in the sequence, $i$ is the index of the initial element of the subchunk of the smallest slider size, $j$ is the index of the initial element of the subchunk of the next smallest slider size, etc.

Is there a way to simplify this expression further? I would really appreciate some help with this from those who read math like their morning newspaper.

1 Answer


I would probably try to define it "recursively". By that, I mean something similar to the following.

Define:

$$S_{n,k} = S_{n,k-1}\prod_{z = 1}^{n-(k-2)}P(x_z,\dots,x_{z+(k-2)})$$

Then, for $S_{4,4}$, we have that:

\begin{align*} S_{4,4} & = S_{4,3}\prod_{z = 1}^{4-2} P(x_z,x_{z+1},x_{z+2}) \\ S_{4,3} & = S_{4,2} \prod_{z = 1}^{4-1}P(x_z,x_{z+1}) \\ S_{4,2} & = S_{4,1}\prod_{z = 1}^{4-0}P(x_z) \end{align*}

Here, we need to make the definition that $S_{n,1} = 1$. Then, you just want to find $S_{n,n}$, which expands to be:

\begin{align*} S_{4,4} & = S_{4,3}\prod_{z = 1}^{4-2}P(x_z,x_{z+1},x_{z+2}) = S_{4,2}\prod_{z = 1}^{4-1}P(x_z,x_{z+1})\prod_{z = 1}^{4-2}P(x_z,x_{z+1},x_{z+2}) \\ & = S_{4,1}\prod_{z = 1}^{4-0}P(x_z)\prod_{z = 1}^{4-1}P(x_z,x_{z+1})\prod_{z = 1}^{4-2}P(x_z,x_{z+1},x_{z+2}) \\ & = \prod_{z = 1}^{4-0}P(x_z)\prod_{z = 1}^{4-1}P(x_z,x_{z+1})\prod_{z = 1}^{4-2}P(x_z,x_{z+1},x_{z+2}) \end{align*}

I would caution you that, while this "simplifies" the expression, I don't personally find it easier to understand. Of course, this might be lessened if you work on your exposition of the problem overall.
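Unrolling the recursion for general $n$ suggests a double product, which may be the $\prod\prod P(x)$ form mentioned in the question. In this sketch I assume that $k$ denotes the slider size directly, so it runs from $1$ to $n-1$ and is one less than the index used in $S_{n,k}$, and that $z$ is the position of the first element of each subsequence:

$$\prod_{k=1}^{n-1}\ \prod_{z=1}^{n-k+1} P(x_z, \dots, x_{z+k-1})$$

For $n = 4$ the inner product has $4$, $3$ and $2$ factors for $k = 1, 2, 3$ respectively, which reproduces the nine factors of the example.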

  • Thank you for your proposal, Mark! Going about it recursively seems a way to go, because as the grain/slider size gets progressively larger (all the way up to $n-1$) the number of subchunks gets progressively smaller (all the way down to 2). What do $n$ and $k$ in your formulation stand for? So if I have a sequence consisting of four elements, then I will have $S_{4,4}$ and if I have a sequence consisting of three elements, then I will have $S_{3,3}$? (2017-01-24)
  • I used $n$ to denote the maximum number of variables, and $k$ as a sort of "index" for the size of the "slider". It may not be exactly the size of the slider (it might be that $k = \text{slider} + 1$ or something like that). (2017-01-24)