
I have a dataset of counts: for each user, the number of websites (out of a given set) that the user visited in a year. Every user in my data visits at least one website. Half of the users visit 7 or fewer sites, while the top user visits 9384 sites. I want to find a count distribution that fits the data well, but this seems challenging.

Here is the data summary:

  • Observations: 46285
  • Mean: 33.1
  • Std. Dev.: 138.5
  • Skewness: 20.0
  • Kurtosis: 808.1

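Percentiles, with the "Smallest" and "Largest" values reported alongside them:

| Percentile | Value | Smallest | Largest |
|---|---|---|---|
| 1% | 1 | 1 | |
| 5% | 1 | 1 | |
| 10% | 1 | 1 | |
| 25% | 1 | 3 | |
| 50% (median) | 7 | | |
| 75% | 19 | | 4947 |
| 90% | 53 | | 5281 |
| 95% | 116 | | 7111 |
| 99% | 522 | | 9384 |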

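I tried a Poisson distribution, which obviously doesn't work: a Poisson has variance equal to its mean, whereas here the variance ($138.5^2 \approx 19182$) is far larger than the mean (33.1). A negative binomial does not do much better.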
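
As a rough check of how overdispersed the data are, the summary statistics alone are enough. The sketch below (variable names are illustrative) computes the variance-to-mean ratio, which a Poisson would put near 1, and a method-of-moments size parameter $r$ for a negative binomial, assuming the parameterisation $\mathrm{Var} = \mu + \mu^2/r$. It gives an index of dispersion of roughly 580 and $r$ of about 0.06. It ignores that the data are zero-truncated (every user has at least one site), so treat the numbers as a ballpark only.

```python
# Summary statistics quoted in the question; the raw counts are not reproduced here.
mean = 33.1
sd = 138.5
variance = sd ** 2

# Poisson: Var = mean, so the variance-to-mean ratio (index of dispersion) should be near 1.
dispersion_index = variance / mean

# Negative binomial with mean mu and size r: Var = mu + mu**2 / r.
# Solving for r gives the method-of-moments estimate r = mu**2 / (Var - mu).
# Caveat: this ignores that the data are zero-truncated (every user has >= 1 site).
r_mom = mean ** 2 / (variance - mean)

print(f"variance         = {variance:.2f}")
print(f"dispersion index = {dispersion_index:.1f}")
print(f"NB size r (MoM)  = {r_mom:.4f}")
```

A size parameter this close to 0 means the negative binomial has to stretch a great deal to match the tail, which is consistent with it fitting poorly.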

Any suggestions?

Thanks!

  • For things like this, the received wisdom seems to be a power law (http://en.wikipedia.org/wiki/Power_law). How does that fit? (2012-07-28)
  • That's not a count model, though, is it? The values here have to be counts. (2012-07-28)
  • You treat your data as continuous, so the probability of a given number goes as $n^{-k}$ for some $k$ chosen to fit the data. (2012-07-28)
  • That's not even a distribution, is it? I can't ask for the probability that the number of visits is $k$. For what it's worth, I tried the exponential distribution (http://en.wikipedia.org/wiki/Exponential_distribution) and it's also a bad fit. Again, it's not a discrete distribution. (2012-07-28)
