"Pattern Classification" book I study that in nonparametric methods we need to estimate $p(x)$, and we don't want just the averaged version of it. They give an example of theoretical procedure to achieve estimation of $p(x)$ but don't understand it.
Suppose we use the following procedure. To estimate the density at $x$, we form a sequence of regions $R_1, R_2, \ldots$ containing $x$ - the first region to be used with one sample, the second with two, and so on. Let $V_n$ be the volume of $R_n$, $k_n$ be the number of samples falling in $R_n$, and $p_n(x)$ be the $n$-th estimate for $p(x)$:
$$p_n(x) = \frac{k_n/n}{V_n} \qquad \text{[Eq.7]}$$
If $p_n(x)$ is to converge to $p(x)$, three conditions appear to be required:
- $\lim\limits_{n\to\infty} V_n = 0$;
- $\lim\limits_{n\to\infty} k_n = \infty$;
- $\lim\limits_{n\to\infty} k_n/n = 0$.
The first condition assures us that the space-averaged $P/V$ will converge to $p(x)$, provided that the regions shrink uniformly and that $p(\cdot)$ is continuous at $x$. The second condition, which only makes sense if $p(x) \neq 0$, assures us that the frequency ratio will converge (in probability) to the probability $P$. The third condition is clearly necessary if $p_n(x)$ given by [Eq.7] is to converge at all. It also says that although a huge number of samples will eventually fall within the small region $R_n$, they will form a negligibly small fraction of the total number of samples.
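Here is how I have tried to picture the procedure numerically. This sketch is my own, not from the book: in particular the choice $k_n = \lceil\sqrt{n}\rceil$ and the 1-D standard normal are arbitrary assumptions, and I take $V_n$ to be the length of the smallest interval around $x$ that contains $k_n$ samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_n(x, samples, k_n):
    """Estimate p_n(x) = (k_n/n) / V_n in 1-D, where V_n is the length of
    the smallest interval centred at x that contains k_n of the samples."""
    n = len(samples)
    dists = np.sort(np.abs(samples - x))
    V_n = 2.0 * dists[k_n - 1]   # interval "volume" = twice the distance to the k_n-th nearest sample
    return (k_n / n) / V_n

x = 0.0                                # point where the density is estimated
true_p = 1.0 / np.sqrt(2.0 * np.pi)    # N(0,1) density at 0 is about 0.3989

for n in [100, 1_000, 10_000, 100_000]:
    samples = rng.standard_normal(n)
    k_n = int(np.ceil(np.sqrt(n)))     # grows without bound, yet k_n/n shrinks to 0
    print(f"n={n:>6}  k_n={k_n:>4}  k_n/n={k_n / n:.4f}  p_n(0)={p_n(x, samples, k_n):.4f}")
```

The printed estimates do seem to settle near the true value $1/\sqrt{2\pi} \approx 0.399$, but I still don't follow the reasoning behind the three conditions.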
My problem:
- I don't understand why these conditions are necessary.
- If the 2nd condition holds, how can it coexist with the 3rd? I mean, if $k_n \to \infty$, doesn't $k_n/n$ become $\infty/n$, which cannot be $0$? (I try to make this concrete below.)
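To spell out the second point (the choice $k_n = \sqrt{n}$ here is my own example, not from the book): for that sequence
$$\lim_{n\to\infty} k_n = \lim_{n\to\infty} \sqrt{n} = \infty \qquad\text{and}\qquad \lim_{n\to\infty} \frac{k_n}{n} = \lim_{n\to\infty} \frac{1}{\sqrt{n}} = 0,$$
so it looks as if both limits can hold for one and the same sequence. Is that the intended reading of conditions 2 and 3, or am I misinterpreting what "$k_n = \infty$" means?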