I want to estimate the quantile of some data. The data are so huge that they can not be accommodated in the memory. And data are not static, new data keep coming. Does anyone know any algorithm to monitor the quantile(s) of the data observed so far with very limited memory and computation? I find P2 algorithm useful. But it does not work very well for my data, which are extremely heavy-tailed distributed.
algorithm to dynamically monitor quantile(s)
4
$\begingroup$
algorithms
-
1**Crossposted**: http://stats.stackexchange.com/questions/7959/algorithm-to-d$y$namicall$y$-monitor-quantiles – 2011-03-08
1 Answers
4
The state of the art in this area has advanced some ways since the mid-1980s. The keywords you should be using in your search are "stream" and "online", both terms used to denote the situation when the size of the input data set (or stream) is far too large to store in memory.
Specifically, I would suggest starting with "Space-Efficient Online Computation of Quantile Summaries" by Greenwald & Khanna and "How to Summarize the Universe" by Gilbert, et. al.
-
0Correct. Seems ["Sketch"](http://www.eecs.harvard.edu/~michaelm/CS222/countmin.pdf) is one of the latest work. – 2011-03-09