3

I want to have a value between 0 and 1 for each user that shows how much they like movies of a specific type (comedy, etc.), depending on how many movies of that type they have watched. I have data on users watching movies over several years.

data format

  • user - movie - year - type

I was thinking of something like

user1_comedy = number of comedy movies / number of watched movies

How can I use the years in the equation to make recent years more important?

What is this called in math? I didn't know which tags would be best to use.

Thanks

  • 0
    First, welcome to math.SE! It sounds like you want to find weights for modelling a [time series](http://en.wikipedia.org/wiki/Time_series). 2012-03-29

2 Answers

1

Long comment:

Let's focus on one particular user, Alice, and on comedy movies, over a period of years $\{y_1, y_2, \ldots, y_n\}$, where $y_n$ is the most recent year. Assume Alice watched a total of $t_i$ movies during year $y_i$, of which $c_i$ were comedies.

To model Alice's preference for comedy movies, we can use the fraction $\frac{\sum_{i = 1}^{n} c_i}{\sum_{i = 1}^{n} t_i} = \frac{\text{total number of comedy movies watched}}{\text{total number of movies watched}}$

To add more weight to recent years, you can use a weighted sum and assign higher weights to recent years. In other words, Alice's comedy score would be $\frac{\sum_{i = 1}^{n} w_i c_i}{\sum_{i = 1}^{n} t_i}$ where, for example, $w_n = n, w_{n-1} = n-1, \ldots, w_1 = 1.$ This weight assignment intuitively says that if Alice watched a comedy movie recently, then we count it more than once. Note that this score can exceed 1; to keep it between 0 and 1, weight the denominator in the same way, i.e. divide by $\sum_{i = 1}^{n} w_i t_i$ instead.

You can use different weighting schemes. For example, you can read about exponential weights in the Wikipedia article on moving averages.
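As a rough illustration of the linear weights $w_i = i$, here is a minimal Python sketch. The function name `comedy_score`, the `normalize` flag, and the sample counts are made up for this example; with `normalize=True` the denominator is weighted as well, which is an assumption added here so that the result stays between 0 and 1.

```python
def comedy_score(comedy_counts, total_counts, normalize=True):
    """Linearly weighted comedy score for one user.

    Both lists are ordered from the oldest year to the most recent one:
    comedy_counts[i] is c_i and total_counts[i] is t_i.
    """
    n = len(total_counts)
    weights = range(1, n + 1)  # w_1 = 1, ..., w_n = n
    numerator = sum(w * c for w, c in zip(weights, comedy_counts))
    if normalize:
        # Weighted denominator: keeps the score in [0, 1].
        denominator = sum(w * t for w, t in zip(weights, total_counts))
    else:
        # Unweighted denominator: the score may exceed 1.
        denominator = sum(total_counts)
    return numerator / denominator if denominator else 0.0


# Hypothetical data: 4 movies per year, with 1, 2 and 3 comedies.
print(comedy_score([1, 2, 3], [4, 4, 4]))                   # 14 / 24
print(comedy_score([1, 2, 3], [4, 4, 4], normalize=False))  # 14 / 12
```

In this made-up example the unweighted fraction would be 6/12, so both variants rate Alice's recent comedy viewing more highly, but only the normalized one is guaranteed to stay at or below 1.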

  • 0
    What about using log, square root, 2^, exp, or e? I see them a lot but don't know why and how they are selected. 2012-04-05
0

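A simple weighting function is $w(\text{movie}) = \exp(t_{\text{movie}} - t_0)$, where $t_0$ is the present year and $t_{\text{movie}}$ is the year in which the movie was watched. For user $i$:

$u_i(\text{comedy}) \equiv \frac{1}{T_i} \sum_{k \in \text{comedies}} w(\text{movie}_k)$

where $T_i = \sum_{n \in \text{movies}} w(\text{movie}_n)$ is the same sum taken over all movies watched by user $i$.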

Since the comedies are a subset of all the movies the user watched and every weight is positive, this will give you a number between 0 and 1.
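The exponential weighting can be sketched in Python as follows. This is only an illustration: the function name `exp_weighted_score`, the `(year, type)` tuple layout, and the `rate` parameter are assumptions made for the example (`rate=1.0` corresponds to the formula as written; smaller values make old movies fade more slowly).

```python
import math


def exp_weighted_score(movies, genre, present_year, rate=1.0):
    """Exponentially weighted share of `genre` among one user's movies.

    movies: list of (year, movie_type) pairs for a single user.
    """
    genre_total = 0.0  # numerator: sum of weights over movies of `genre`
    total = 0.0        # denominator T_i: sum of weights over all movies
    for year, movie_type in movies:
        weight = math.exp(rate * (year - present_year))
        total += weight
        if movie_type == genre:
            genre_total += weight
    return genre_total / total if total else 0.0


# Hypothetical viewing history: one recent comedy, one older drama.
history = [(2012, "comedy"), (2011, "drama")]
print(exp_weighted_score(history, "comedy", present_year=2012))
```

Here the comedy gets weight $e^{0} = 1$ and the drama gets $e^{-1}$, so the score is $1 / (1 + e^{-1})$, noticeably above the unweighted value of 1/2.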

  • 0
    Sum the exponentials. The numerator is taken over the set of comedies. The denominator is taken over the set of all movies viewed by a particular user. 2012-03-30