1

A given user can interact with a website in multiple ways. Let's simplify a bit and say a user can:

  • Post a message
  • Comment on a message
  • "like" something on the website via Facebook

(Later we could add following the site on Twitter, buying something on the site, and so on, but for readability's sake let's stick to these 3 cases.)

I'm trying to find a formula that gives a number between 0 and 100 that accurately reflects the user's interaction with the given website.

It has to take the following into account:

  • A user with 300 posts and one with 400 should have almost the same score, very close to the maximum

  • A user should see his number increase faster at the beginning. For instance a user with 1 post would have 5/100, a user with 2 would have 9/100, one with 3 would have 12/100 and so on.

  • Each of these interactions has a different weight because they do not imply the same level of involvement. It would go this way: Post > Comment > Like

  • In the end, the distribution of scores should look a bit like the following, meaning a lot of users around 0-50, and then the users who really interact with the website.

[image: sketch of the desired score distribution]


This is quite specific and data-dependent, so I'm not looking for the perfect formula, but rather for how to approach this problem.

  • 0
    @Willie True, I didn't even know there was a stats stackexchange ^^ (2011-07-27)

2 Answers

2

Well, one approach might be to just assign a fixed score for each action, sum the scores of all actions taken by the user, and then apply a saturating function like $f(x) = 1-\exp(-x)$ to the result. Of course, it may be easier to store the raw sum of scores internally and just apply $f$ when displaying it.

To elaborate a little, let's say you use the saturating function $f(x) = 100(1-\exp(-x/100))$. This function is close to the identity when $x$ is small, so that e.g. $f(5) \approx 4.9$, $f(10) \approx 9.5$, $f(15) \approx 13.9$ and so on. It saturates at 100, so that e.g. $f(250) \approx 91.8$, $f(500) \approx 99.3$ and $f(1500) \approx f(2000) \approx 100$. If you internally award 5 points for each post, the adjusted score should look pretty much like your examples.
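As a rough illustration, here is a minimal Python sketch of this approach. The 5 points per post come from the paragraph above; the weights for comments and likes, and the names `WEIGHTS`, `raw_score` and `display_score`, are assumptions made up for the example, chosen only to respect Post > Comment > Like.

```python
import math

# Points per action. 5 per post follows the answer above; the comment and
# like values are illustrative assumptions (only Post > Comment > Like matters).
WEIGHTS = {"post": 5.0, "comment": 2.0, "like": 1.0}

# Scale of the saturating function f(x) = 100 * (1 - exp(-x / 100)).
SCALE = 100.0


def raw_score(counts):
    """Sum the fixed scores of all actions; this is the value to store."""
    return sum(WEIGHTS[action] * n for action, n in counts.items())


def display_score(counts):
    """Map the raw sum into [0, 100) with the saturating function."""
    x = raw_score(counts)
    return 100.0 * (1.0 - math.exp(-x / SCALE))


if __name__ == "__main__":
    for posts in (1, 2, 3, 300, 400):
        print(posts, round(display_score({"post": posts}), 1))
```

With these assumed weights, 1, 2 and 3 posts give roughly 4.9, 9.5 and 13.9, while 300 and 400 posts both land at essentially 100. Changing `SCALE` controls how quickly the score saturates.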

  • 0
    (I won't accept your answer just yet, I plan on offering a bounty when I can) (2011-07-19)
0

Just use fixed scores that add up for each action. The data distribution will likely follow a power law. If you want to prevent people from spamming the system, reward less and less as the number of actions grows (like XP in video games). You can use any $o(n)$ function (e.g. $\log$, $\mathrm{sqrt}$) with $n$ the number of actions so far.
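One possible reading of this suggestion, sketched in Python: the total reward for $n$ actions of one kind grows like a sublinear function of $n$, so each additional action earns less than the one before. The function names and the choice of `math.sqrt` as the default are assumptions for illustration, not something the answer prescribes.

```python
import math


def diminishing_total(n_actions, weight, growth=math.sqrt):
    """Total reward after n_actions, growing sublinearly (o(n)) in n_actions."""
    return weight * growth(n_actions)


def marginal_reward(n, weight, growth=math.sqrt):
    """Reward earned by the n-th action alone (n starts at 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return weight * (growth(n) - growth(n - 1))


if __name__ == "__main__":
    # Each further post is worth less than the previous one.
    for n in (1, 2, 3, 10, 100):
        print(n, marginal_reward(n, weight=5.0))
```

Passing `growth=math.log1p` gives a logarithmic variant that still starts from zero at $n = 0$. Note that a sublinear total is unbounded, so it would still need a saturating map if the final score must stay below 100.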