1
$\begingroup$

Suppose $r$ is a set of attributes with probabilities while $p$ is a set of attributes without probabilities. For example, say that $r$ = {$a$:0.4, $b$:1.0} and $p$ = {$a$, $c$}. (Here, $a.prob = 0.4$ and $b.prob = 1.0$.) There are two possible worlds for $r$: {$a$, $b$} with probability 0.4*1 = 0.4 and {$b$} with probability (1-0.4)*1 = 0.6.

I would like to efficiently compute the following equation:

\displaystyle F(r,p) = \sum_{r' \in 2^r - \mathrm{\{\emptyset\}}} (\Pi_{a \in r'} a.c)(\Pi_{a \in r-r'} 1 - a.c)\frac{|W(r')\cap p|}{|W(r')|}

where W(r') strips off the probabilities from r' (e.g., $W$({$b$:1.0}) = {$b$}).
Hence, $F(r,p) = 0.4 * \frac 1 2 + 0.6 * 0 = 0.2.$

The above equation is clearly inefficient to compute because of the exponential size of the power set $2^r$. The approximation

\displaystyle F'(r,p) = \frac{\sum_{a \in r \cap p} a.prob}{\sum_{a \in r} a.prob}

does not seem accurate because F'(r,p) = 0.4 / 0.4+1 = 0.285 $\neq$ 0.2 = $F(r, p)$.

Is F' still an theoretically approximate result or is there some better way to efficiently compute $F$ (say in polynomial time)? What are the related math problems?

1 Answers 1

2

Let's calculate the full sum instead (without removing $\emptyset$) - you can always recover the original (you have to decide how to interpret the $0/0$ quantity - if you interpret it as a zero then the calculation below will just work). What you have is a random set $W$ where for each $x \in r$, the probability that $x \in W$ is some $c(x)$. You want to calculate the expectation of $|W \cap p|/|W|$. Since expectation is linear, this reduces to the sum $ \sum_{x \in p} E\left[\frac{|W \cap \{x\}|}{|W|}\right].$ Now pick some $x \in p$. The variable $|W \cap \{x\}|{|W|}$ is non-zero only if $x \in W$, which happens with probability $c(x)$. Now let W' = W \setminus \{x\}. The entire summand is equal to c(x) E\left[\frac{1}{1+|W'|}\right]. I'm not sure how to calculate this entity, but now we're in more familiar grounds, and maybe someone else knows how to calculate it, since |W'| is just a sum of independent Bernoulli random variables (with various probabilities).

  • 0
    Sure, that's a good idea.2011-05-12