3

I want a value between 0 and 1 for each user that shows how much they like movies of a specific type (comedy, etc.), based on how many movies of that type they have watched. I have data on which movies users watched over several years.

data format

  • user - movie - year - type

I was thinking of something like

user1_comedy = number of comedy movies watched / total number of movies watched

How can I use the years in the equation to make recent years more important?

What is this called in math? I didn't know which tags to use.

Thanks

  • 0
    First, welcome to math.SE! It sounds like you want to find weights for modelling a [time series](http://en.wikipedia.org/wiki/Time_series). 2012-03-29

2 Answers

1

Long comment:

Let's focus on one particular user, Alice, and on comedy movies, during a period of years $\{y_1, y_2, \ldots, y_n\},$ where $y_n$ is the most recent year. Assume Alice watched a total of $t_i$ movies during year $y_i.$ Furthermore, $c_i$ of those movies were comedies.

To model Alice's preference for comedy movies, we can use the fraction $$ \frac{\sum_{i = 1}^{n} c_i}{\sum_{i = 1}^{n} t_i} = \frac{\text{total number of comedy movies watched}}{\text{total number of movies watched}} $$

To add more weight to recent years, you can use a weighted sum, and assign higher weights to recent years. In other words, Alice's comedy score would be: $$ \frac{\sum_{i = 1}^{n} w_i c_i}{\sum_{i = 1}^{n} t_i} $$ where, for example, $$w_n = n, w_{n-1} = n-1, \ldots, w_1 = 1.$$ This weight assignment intuitively says that if Alice watched a comedy movie recently then we are going to count it more than once.

You can use different weighting schemes. For example, the Wikipedia article on moving averages describes exponential weights.
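A minimal Python sketch of the linearly weighted score, in case it helps to see it on data. It assumes the records for one user are a list of `(year, movie_type)` pairs and that the period covers consecutive calendar years, so the weight of a year is its position in the period. The function name `linear_weighted_score` is made up for illustration. Note that this sketch divides by the weighted total $\sum_i w_i t_i$ rather than the plain total, which is an assumption made here so that the result stays in $[0, 1]$.

```python
def linear_weighted_score(records, genre):
    """Share of `genre` among one user's movies, with recent years counted more.

    records: list of (year, movie_type) pairs for a single user (assumed format).
    """
    if not records:
        return 0.0
    first_year = min(year for year, _ in records)

    def weight(year):
        # Linear weights: the earliest year gets 1, the next year 2, and so on.
        return year - first_year + 1

    # Numerator: sum of w_i over the movies of the requested type.
    weighted_genre = sum(weight(y) for y, t in records if t == genre)
    # Denominator: sum of w_i over all movies, which keeps the score in [0, 1].
    weighted_total = sum(weight(y) for y, _ in records)
    return weighted_genre / weighted_total


records = [
    (2010, "comedy"), (2010, "drama"),
    (2011, "drama"),
    (2012, "comedy"), (2012, "comedy"),
]
print(linear_weighted_score(records, "comedy"))
```

In this toy data the unweighted fraction is 3/5, while the weighted one is 7/10, because two of the three comedies were watched in the most recent year.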

  • 0
    I was wondering about the min and max of the formula with weighted sums. It looks like it is not going to be between 0 and 1 as in the OP. 2012-03-29
  • 0
    @EmmadKareem The current range is $[0, x]$ (too lazy to figure out $x$). You can always rescale/normalize to $[0, 1].$ 2012-03-29
  • 0
    In fact, if we change the denominator to $\sum_{i=1}^{n} w_i t_i,$ then it should be well within $[0, 1].$ 2012-03-29
  • 0
    Thanks for the clarification. 2012-03-29
  • 0
    Do we call this function/equation a model, and the answer below another model? Is a model a function that can be simple or complicated? 2012-03-30
  • 0
    @tnaser We would call this a model. Models are composed of functions, constants, and mathematical expressions. There could be different models to solve the same problem. For example, the answer by Emre presents a different model. 2012-03-30
  • 0
    How can we increase the weight? I am trying this on some data and get very small numbers like 0.00004. 2012-04-05
  • 0
    @tnaser I guess you need to post a new question, show us your model, your equations, and your sample data. 2012-04-05
  • 0
    But I guess (without seeing your model) a temporary solution would be to scale all weights, i.e. using $\dfrac{w_i}{\displaystyle\max_j {w_j}}$ should be fine. 2012-04-05
  • 0
    I didn't get that. What is $j$? 2012-04-05
  • 0
    $\max_j w_j$ is another way to say: find the maximum weight among all the weights you computed. 2012-04-05
  • 0
    What about using log, square root, 2^, exp, or e? I see them a lot but don't know why and how they were selected. 2012-04-05
0

A simple weighting function is $w(\text{movie})=\exp(t_{\text{movie}}-t_0)$, where $t_0$ is the present year and $t_{\text{movie}}$ is the year of the movie. For user $i$:

$$u_i(\text{comedy}) \equiv \frac{1}{T_i} \sum_{k \in \text{comedies}} w(\text{movie}_k)$$

where $T_i=\sum_{n \in \text{movies}} w(\text{movie}_n)$ is the total weight of all movies viewed by user $i$.

This will give you a number between 0 and 1.
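A rough Python sketch of this score, assuming one user's records are a list of `(year, movie_type)` pairs. The function name `exp_weighted_score` and the `rate` parameter are illustrative additions: with `rate=1.0` the weight is exactly $\exp(t_{\text{movie}}-t_0)$, while other values change how quickly older years fade.

```python
import math


def exp_weighted_score(records, genre, present_year, rate=1.0):
    """Exponentially weighted share of `genre` among one user's movies.

    records: list of (year, movie_type) pairs for a single user (assumed format).
    rate: hypothetical decay knob; rate=1.0 gives the weight exp(year - present_year).
    """
    # One weight per movie: 1 for the present year, smaller for older years.
    weighted = [
        (math.exp(-rate * (present_year - year)), movie_type)
        for year, movie_type in records
    ]
    total = sum(w for w, _ in weighted)  # T_i: sum over all movies viewed
    if total == 0.0:
        return 0.0  # no records, or every weight underflowed to zero
    # Sum the exponentials over the movies of the requested type only.
    genre_weight = sum(w for w, t in weighted if t == genre)
    return genre_weight / total


records = [(2012, "comedy"), (2011, "drama"), (2009, "comedy")]
print(exp_weighted_score(records, "comedy", present_year=2012))
```

Each movie contributes its own exponential to the sums, so a movie from the present year counts as 1 and one from the previous year as $e^{-1}$. Since the numerator is a sub-sum of the denominator and all weights are positive, the result cannot leave $[0, 1]$.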

  • 0
    Why use exp? 2012-03-30
  • 1
    Because it is a simple function that smoothly tends to zero, and works with arbitrarily distant dates. If you want to tweak the attenuation rate, you can use $\exp(-a\Delta t)$ instead of just $\exp(-\Delta t)$, where $a$ is your tweaking knob. 2012-03-30
  • 0
    What is the difference between calculating $\exp(-1) + \exp(-2)$ and calculating $\exp(-3)$? 2012-03-30
  • 0
    Should I sum the years and then take the exp, or sum the exps? 2012-03-30
  • 0
    Sum the exponentials. The numerator is taken over the set of comedies. The denominator is taken over the set of all movies viewed by a particular user. 2012-03-30