
I would like pointers on how to prove that a function $f(x)$ is always greater than a function $g(x)$ on an interval $(a,b)$; in other words, how to show that $f(x) > g(x)$ for all $x$ in $(a,b)$.

Their values at the endpoints are the same:

$$f(a) = g(a)$$ $$f(b) = g(b)$$

$f(x)$ and $g(x)$ are functions used in decision tree induction.

$f(x)$ is the entropy (used to compute information gain):

$$ f(x) = - \sum p(x)\log_2(p(x)) $$

$g(x)$ is the Gini index:

$$ g(x) = 1 - \sum p(x)^2 $$

The $p(x)$ are probabilities, so $\sum p(x) = 1$.
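For concreteness, here is a minimal Python sketch of the two measures as I understand them (the function names and the example distribution are my own, not taken from any library):

```python
import math

def entropy(probs):
    """f = -sum p * log2(p); terms with p == 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """g = 1 - sum p^2."""
    return 1.0 - sum(p * p for p in probs)

# Hypothetical class distribution at a tree node with 3 classes.
probs = [0.5, 0.3, 0.2]
assert abs(sum(probs) - 1.0) < 1e-12
print(entropy(probs))  # ~1.485
print(gini(probs))     # 0.62
```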

I tried to show that $f(x) - g(x) > 0$ always holds, but I think I need the formal name of this approach, or of other approaches.

I am an engineer by profession, so I would like to learn the full names of such approaches so that I can search for them on the internet.

I have added the functions and more information about this problem.

A good explanation of my problem is given here; the example is about spam classification: http://people.cs.vt.edu/~ramakris/Courses/CS6604/lectures/lec9/lec9.pdf

$f(x)$ and $g(x)$ are discrete functions; the probabilities are computed from the given data. The link above has a figure showing that on $(0,1)$, $f(x)$ is always greater than $g(x)$. A small numerical check of that comparison is sketched below.
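Here is a quick self-check I wrote for the two-class case only (the grid of 99 points for $p$ is an arbitrary choice of mine):

```python
import math

# Two-class distribution (p, 1-p) for p in (0, 1).
for i in range(1, 100):
    p = i / 100.0
    f = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # entropy
    g = 1.0 - (p * p + (1 - p) * (1 - p))                  # Gini index
    assert f > g, (p, f, g)

print("entropy > Gini for every p in the grid")
```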

I thought that a generic answer was possible, but it seems that I was mistaken. Every problem is somewhat unique.

  • Nothing special about $f$ and $g$? (2011-12-12)
  • Neither $f$ nor $g$ depends on $x$. Since they are both constants and their boundary values coincide, they're identical, so it can't be that one is greater than the other. (2011-12-12)
  • @joriki corrected $p(i)$ as $p(x)$. (2011-12-12)
  • Ah, I thought the sum was over $i$. Now what's the sum over? (2011-12-12)
  • I'm guessing it's really an integral from $a$ to $b$ rather than a sum, but that some probability textbook (for engineers, no doubt) didn't want any scary $\int$ signs in it. (2011-12-12)
  • Thanks for the comments. These functions are discrete and the probabilities are discrete too. I am sorry about the misunderstanding; I am learning to give more information. (2011-12-12)
  • You might want to look at the inequalities for entropy (your $f(x)$) in information theory, or the Wikipedia page on decision tree learning: http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity (2011-12-12)
  • The point in @joriki's first comment still fully applies. There is no occurrence in the document you link to of this confusion between $x$ the argument of $f$ and $g$ and $x$ the mute variable in the summations. (2011-12-12)

1 Answer

I am interpreting the question as follows:

Let $X$ be a finite set and let $p : X \to [0, 1]$ be a probability distribution over $X$. Define the entropy to be $H(p) = - \sum \limits_{x \in X} p(x) \ln p(x)$ and the Gini index to be $g(p) = 1 - \sum \limits_{x \in X} p(x)^2 $. Show that the entropy always dominates the Gini index; i.e., show that $H(p) \geqslant g(p)$ for all distributions $p$.

Proof. Assume without loss of generality that $p(x) > 0$ for all $x \in X$ (terms with $p(x) = 0$ contribute zero to both sums). The Gini index can be rewritten equivalently as $$ g(p) = 1 - \sum_{x \in X} p(x)^2 = \sum_{x \in X} p(x) - \sum_{x \in X} p(x)^2 = \sum_{x \in X} p(x) (1-p(x)). $$ From the standard inequality $1+z \leqslant \mathrm e^z$, we have $$ 1 + \ln p(x) \leqslant \exp(\ln p(x)) = p(x), $$ which gives $1 - p(x) \leqslant - \ln p(x)$. Multiplying by $p(x) \gt 0$, we have $$ - p(x) \ln p(x) \geqslant p(x) (1 - p(x)). $$ Finally, summing over all $x \in X$, we get the claim. Since $-\log_2 p(x) = -\ln p(x)/\ln 2 \geqslant -\ln p(x)$, the inequality holds a fortiori for the base-2 entropy used in the question.
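As an informal sanity check of the inequality in the natural-log form above, here is a small Python sketch (the distribution sizes, sample count, and seed are arbitrary choices of mine):

```python
import math
import random

random.seed(0)

def entropy_nat(probs):
    """H(p) = -sum p * ln p, with the convention 0 * ln 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def gini(probs):
    """g(p) = 1 - sum p^2."""
    return 1.0 - sum(p * p for p in probs)

for _ in range(10000):
    n = random.randint(2, 6)
    weights = [random.random() for _ in range(n)]
    total = sum(weights)
    probs = [w / total for w in weights]  # random point on the probability simplex
    assert entropy_nat(probs) >= gini(probs) - 1e-12

print("H(p) >= g(p) held on all sampled distributions")
```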