0
$\begingroup$

Given a known Set x . Given a know size y. User choose a random subset from x of size y. We need to guess the sum of the new subset. both x and y are less then 5000. After searching on net this problem seem to be related like Variance (one example is :Fair dice ).

But I am not able to find the solution for this problem. As number can't be repeated in the set. So what will the formula for this problem ?

  • 0
    One can generalize somewhat by allowing some of the $a_i$ to be equal. The wording has to be changed, but the answer doesn't.2011-09-01

2 Answers 2

1

Let $B$ be a set of $n$ real numbers. We choose at random an $m$-element subset $A$ of $B$. We want to know the expected value (mean) of the sum of the chosen numbers.

Let $B=\{b_1,b_2,\dots, b_n\}$. Imagine that the choosing is done sequentially. Let the random variable $X_i$ be the $i$-th number chosen, and let $X=\sum_{i=1}^m X_i.$ Then $X$ is the sum of the chosen numbers, and we want to find $E(X)$, the expected value of this sum. For any random variables $X_1, X_2, \dots, X_m$ we have $E\left(\sum_{i=1}^m X_i\right)=\sum_{i=1}^m E(X_i).$ Now we need to compute $E(X_i)$. But $X_i$ takes on the values $b_1, b_2,\dots, b_n$, each with probability $1/n$. It follows that $E(X_i)=\frac{\sum_{i=1}^n b_i}{n},$ and therefore $E(X)=\frac{m}{n}\sum_{i=1}^nb_i.$

Comments:

$1$. The OP specified that the numbers chosen are distinct. In standard terminology, this means that the choosing is done without replacement. The above argument does not depend on whether the choosing is done with replacement or without replacement: the answer under each assumption is the same.

$2$. In an answer to one of the comments, the OP seemed to say that the set $B$ consists of the first $n$ positive integers, though that wasn't entirely clear. If $B=\{1,2,\dots,n\}$, then $\sum_{i=1}^n i=\frac{n(n+1)}{2}\quad\text{and therefore}\quad E(X)=\frac{m(n+1)}{2}.$

$3$. The same argument goes through if $B$ is a multiset (some of the $b_i$ are allowed to be equal), and $A$ is a (multi)subset of $B$.

$4$. As Yuval Filmus has pointed out in a comment, we can also find the variance of $X$ by an argument that exploits the linearity of expectation.

0

The expected value will be indeed better estimation for the sum, while the variance corresponds to the error of this estimation. Let me denote $x = \{x_1,...,x_n\}$. To model the random choice you should construct a probability space and there are several possibilities to do it.

The one is to pick up a random combination $s = \{s_1,...,s_y\}\subset x$ of $y$ elements from $x$. E.g. $s_1 = x_2,s_2 = x_3,s_3=x_4$ and so on. Let $S$ denote the set of all possible such combinations. Since numbers cannot repeat, $S$ has $C^y_n$ elements. To model a random choice you can assume that each combination has the same probability which is $p(n,y) = \displaystyle{\frac{1}{C^y_n}}.$

The formula for the expectation of the sum will be $ E = p(n,y)\sum\limits_{s\in S}\sum\limits_{i=1}^ys_i $ and for the variance $ V = p(n,y)\sum\limits_{s\in S}\left(E-\sum\limits_{i=1}^ys_i\right)^2. $

Although these formulas look weird, unfortunately if you don't have any special information about $x$ (e.g. $x =1,2,...,n$) then I believe you should deal with all possbile combinations. If you $x$ has a nice structure, then there can be nice formulas for $E$ and $V$.

  • 0
    There is no need to delete. Just modify your absolute assertion that nothing further can be done. You have given a valid expression for the answer.2011-09-02