
I am reading a paper (linked below). In it, the computational complexity of the following algorithm is stated as $\mathcal{O}(k+h^{-d})$, but I think the complexity should be $\mathcal{O}(k+kh^{-d})$. I don't see where my mistake could be, so I would be very grateful for any help.

The algorithm basically solves a PDE on a grid $G_h$ with stepsize $0 < h < 1$.

Given a set $S_n$ of $n$ i.i.d. $d$-dimensional random vectors:

  1. Choose $k < n$ samples $Y_1,\dots,Y_k$ from $S_n$.

  2. Solve the PDE numerically on the grid (this only involves computing a backward difference at each grid point $x$ and evaluating $\hat{f}_h(x)$ at the same point).

Here $\hat{f}_h(x)$ is given by $\hat{f}_h(x)=\frac{1}{kh^d} \sum_{i=1}^k { \chi_{[x,x+h]} (Y_i) }$.

So choosing the $k$ samples in step 1 is $\mathcal{O}(k)$, and the complexity of step 2 is dominated by the computation of $\hat{f}_h(x)$ at each grid point. Assuming the grid has $h^{-d}$ points, the total complexity in my opinion is $\mathcal{O}(k+kh^{-d})$.

1 Answer


It depends on how you compute the density estimation. If at each grid point you evaluate the sum $\hat{f}_h(x)=\frac{1}{kh^d} \sum_{i=1}^k { \chi_{[x,x+h]} (Y_i) }$, then the complexity of the density estimation is $O(kh^{-d})$. But this is terribly inefficient.
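For concreteness, here is a minimal sketch (not from the paper) of this naive evaluation in pure Python, assuming the samples live in the unit cube and the grid points are the lower corners of the cells; `f_hat_naive` is a hypothetical name:

```python
def f_hat_naive(grid_points, samples, h):
    # Naive evaluation of the histogram estimator: for every grid point x,
    # scan all k samples and count those landing in the cell [x, x+h)^d.
    # Total cost: O(k * h^{-d}) -- one O(k) pass per grid point.
    k = len(samples)
    d = len(samples[0])
    estimates = []
    for x in grid_points:
        count = sum(
            1 for y in samples
            if all(x[j] <= y[j] < x[j] + h for j in range(d))
        )
        estimates.append(count / (k * h**d))
    return estimates
```

The inner loop over all $k$ samples is repeated at every one of the $h^{-d}$ grid points, which is exactly where the $O(kh^{-d})$ count in the question comes from.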

Instead you should initialize $\hat{f}_h$ to be zero at each grid point, and then visit $Y_1,\dots,Y_k$ one at a time, compute which grid cell each sample falls in, and increment $\hat{f}_h$ at that grid cell by $1/(kh^d)$. This takes $O(h^{-d})$ operations to initialize $\hat{f}_h$ and then $O(k)$ operations to loop through $Y_1,\dots,Y_k$ and update the density. Total complexity is $O(h^{-d}+k)$ for the density estimation.
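A sketch of this binning pass, under the same unit-cube assumption as above (`f_hat_binned` and the dict-of-cells representation are my own choices, not the paper's):

```python
from itertools import product

def f_hat_binned(samples, h, d):
    # Dense binning: zero-initialize every grid cell (O(h^{-d})), then make
    # a single O(k) pass over the samples; each sample's cell index is
    # computed in O(1) by integer division of its coordinates by h.
    k = len(samples)
    m = round(1.0 / h)                                      # cells per axis
    f = {cell: 0.0 for cell in product(range(m), repeat=d)}  # O(h^{-d}) init
    w = 1.0 / (k * h**d)
    for y in samples:                                        # O(k) pass
        cell = tuple(min(int(y[j] / h), m - 1) for j in range(d))
        f[cell] += w
    return f
```

Each sample now touches exactly one cell instead of being tested against every grid point, giving the stated $O(h^{-d}+k)$ total.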

Technically you could get just $O(k)$ complexity for the density estimation step if you only store the nonzero values of $\hat{f}_h$ (this is a sparse representation). However, there is not much advantage to this since you have to solve the PDE at each grid point anyway.
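The sparse variant can be sketched with a `defaultdict`, so no upfront initialization over the grid is needed (again a hypothetical helper, not code from the paper):

```python
from collections import defaultdict

def f_hat_sparse(samples, h, d):
    # Sparse binning: store only the nonzero cells. There is no O(h^{-d})
    # initialization, so building the density estimate costs just O(k).
    k = len(samples)
    w = 1.0 / (k * h**d)
    f = defaultdict(float)
    for y in samples:                         # O(k) pass over the samples
        f[tuple(int(y[j] / h) for j in range(d))] += w
    return f  # any cell absent from f has estimate 0
```

As noted above, this only speeds up the density-estimation step; the PDE solve still visits all $h^{-d}$ grid points, so the overall complexity is unchanged.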