I have a dataset recording how many websites each user visited in a year (every user visited at least 1 website, so there are no zeros). Half of the users visited 7 or fewer sites, while the top user visited 9384. I want to find a count distribution that fits the data well, but this is proving challenging.
Here is the data summary:
- 46285 observations
- Mean: 33.1
- Std. Dev.: 138.5
- Skewness: 20.0
- Kurtosis: 808.1
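
Percentiles, with the four smallest and four largest observations listed alongside:

| Percentile | Value | Four smallest values | Four largest values |
|---|---|---|---|
| 1% | 1 | 1 | |
| 5% | 1 | 1 | |
| 10% | 1 | 1 | |
| 25% | 1 | 3 | |
| 50% (median) | 7 | | |
| 75% | 19 | | 4947 |
| 90% | 53 | | 5281 |
| 95% | 116 | | 7111 |
| 99% | 522 | | 9384 |
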
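I tried a Poisson distribution, which clearly doesn't work: a Poisson has variance equal to its mean, but here the mean (33.1) is far smaller than the standard deviation (138.5), so the data are heavily overdispersed. The negative binomial does not do much better.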
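
To put a number on the mismatch, here is a rough sketch that uses only the summary statistics listed above. The variable names are illustrative, and the negative binomial size parameter `k` is a method-of-moments estimate under the parameterization var = mean + mean²/k. It ignores the fact that the data contain no zeros, so treat it as an order-of-magnitude check rather than a proper fit.

```python
# Summary statistics reported in the question
mean = 33.1
sd = 138.5

variance = sd ** 2

# A Poisson distribution has variance / mean == 1.
dispersion = variance / mean

# Method-of-moments negative binomial size parameter (assumption:
# zero truncation is ignored): var = mean + mean**2 / k
k = mean ** 2 / (variance - mean)

print(f"variance        = {variance:.2f}")
print(f"variance / mean = {dispersion:.1f}")
print(f"NB size k       = {k:.3f}")
```

The variance-to-mean ratio comes out in the hundreds, and the implied `k` is well below 1, which corresponds to a negative binomial with most of its mass at or near zero and a long right tail.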
Any suggestions?
Thanks!