
I am trying to derive a meaningful statistic from a survey in which I asked each respondent to put objects in a certain order. The respondent's order is compared to a correct order, and I want to calculate the error.

For example:

User's order: 1, 3, 4, 5, 2

Correct order: 3, 2, 1, 5, 4

I have come up with a method for finding an error measure: for each object in the sequence, I calculate how many places it is from its correct place (not wrapping around at the ends) and divide by the number of alternative places. For object 3, this measure would be 1/4. For object 2, this measure would be 3/4. Then I average these measures and divide by the value I would get for the sequence that maximizes the total number of places of error.
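The measure described above can be sketched in Python as follows. This is only an illustration: the function name `displacement_error` is hypothetical, and the worst case is assumed to be the fully reversed sequence.

```python
def displacement_error(user_order, correct_order):
    """Normalized displacement error: 0 for a perfect match, 1 for the assumed worst case."""
    n = len(correct_order)
    if n < 2:
        return 0.0
    # Position of each object in the correct order.
    correct_pos = {obj: i for i, obj in enumerate(correct_order)}
    # Per-object measure: places away from the correct place,
    # divided by the n - 1 alternative places.
    per_object = [abs(i - correct_pos[obj]) / (n - 1)
                  for i, obj in enumerate(user_order)]
    mean_measure = sum(per_object) / n
    # Assumption: reversing the correct order maximizes the total displacement.
    worst_total = sum(abs(i - (n - 1 - i)) for i in range(n))
    worst_mean = worst_total / (n - 1) / n
    return mean_measure / worst_mean


print(displacement_error([1, 3, 4, 5, 2], [3, 2, 1, 5, 4]))
```

For the example orders, the per-object displacements are 2, 1, 2, 0 and 3 places (8 in total), and the reversed sequence gives 12, so the normalized error is 8/12.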

I have found I can calculate this maximum with the following algorithm:

// Number of places is 5 in example. int sum = 0; int i = 1; while(i

How would one write this as an equation? Is this the most meaningful measure I can use to quantify the error?

  • 0
    It was suggested to me that Kendall's rank correlation coefficient may also be useful for this task: http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient (2012-09-11)
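As a rough sketch of how Kendall's coefficient could be applied here, the user's order can be converted to positions in the correct order and the discordant pairs counted. The function name `kendall_tau` is hypothetical, and the code assumes at least two objects with no ties.

```python
from itertools import combinations


def kendall_tau(user_order, correct_order):
    """Kendall's tau between two orderings of the same distinct objects."""
    correct_pos = {obj: i for i, obj in enumerate(correct_order)}
    # Correct-order positions of the objects, listed in the user's order.
    ranks = [correct_pos[obj] for obj in user_order]
    n = len(ranks)
    pairs = n * (n - 1) // 2
    # A pair is discordant when the user placed the objects in the wrong relative order.
    discordant = sum(1 for a, b in combinations(ranks, 2) if a > b)
    concordant = pairs - discordant
    return (concordant - discordant) / pairs


print(kendall_tau([1, 3, 4, 5, 2], [3, 2, 1, 5, 4]))
```

The result is 1 for identical orders and -1 for a fully reversed order. For the example in the question, 5 of the 10 pairs are discordant, which gives 0.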

2 Answers

1

Everything depends on your definition of the error function, which is subjective. Once you have defined it, everything becomes objective.

Normally you define a global cost function for this kind of problem, that is, a combination of the costs of individual events. You then choose the individual costs so that they jointly maximize or minimize your expectations, which are defined on the global cost function.

If I focus on your approach, these individual costs are already defined. For example, the cost of an error event is defined as the distance from the true location, which is later normalized by the maximum deviation.

Another measure could be the number of correct values, without regard to any distance criterion. In this case I am only interested in the number of correct positions. It doesn't matter how far a value deviates from its true position.
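A minimal sketch of this position-only measure, with the hypothetical name `fraction_misplaced`, could look like this:

```python
def fraction_misplaced(user_order, correct_order):
    """Fraction of positions holding the wrong object; distance is ignored."""
    n = len(correct_order)
    correct = sum(u == c for u, c in zip(user_order, correct_order))
    return (n - correct) / n


print(fraction_misplaced([1, 3, 4, 5, 2], [3, 2, 1, 5, 4]))
```

In the question's example only object 5 is in its correct position, so four of the five positions count as errors.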

In another case, one can weight the cost of an error differently, for example by the square of the distance from the true value. This gives a very low tolerance for deviations from the true value. Any linear combination of the above schemes can also be meaningful, since the choice is subjective and depends on your definition.
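To illustrate, the sketch below computes the total cost for an arbitrary power of the distance and then forms one possible linear combination. The name `position_cost` and the equal weights of 0.5 are assumptions chosen only for the example.

```python
def position_cost(user_order, correct_order, power=2):
    """Sum over objects of |distance from the true position| ** power."""
    correct_pos = {obj: i for i, obj in enumerate(correct_order)}
    return sum(abs(i - correct_pos[obj]) ** power
               for i, obj in enumerate(user_order))


user = [1, 3, 4, 5, 2]
correct = [3, 2, 1, 5, 4]
linear = position_cost(user, correct, power=1)     # distances 2, 1, 2, 0, 3
quadratic = position_cost(user, correct, power=2)  # squares 4, 1, 4, 0, 9
# One possible linear combination of the two schemes (weights are arbitrary).
combined = 0.5 * linear + 0.5 * quadratic
print(linear, quadratic, combined)
```

With the squared cost, the object that is three places off contributes half of the total, whereas with the linear cost it contributes only three eighths.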

1

If $n = $ NUMBER_OF_PLACES, your maximum is $\sum_{j=0}^{\lfloor n/2 \rfloor - 1} 2(n-2j-1) = 2 \left\lfloor \frac{n}{2} \right\rfloor \left(n - \left\lfloor \frac{n}{2} \right\rfloor \right)$. This is $n^2/2$ if $n$ is even and $(n^2-1)/2$ if $n$ is odd. Thus for $n=5$, the maximum is $(5^2-1)/2 = 12$.
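The closed form can be checked numerically against an exhaustive search over all orderings for small $n$. The following is a sketch with hypothetical function names; the brute-force search is only practical for small $n$ because it enumerates $n!$ permutations.

```python
from itertools import permutations


def max_displacement_closed_form(n):
    """2 * floor(n/2) * (n - floor(n/2))."""
    half = n // 2
    return 2 * half * (n - half)


def max_displacement_brute_force(n):
    """Largest total displacement sum(|i - p[i]|) over all permutations p."""
    return max(sum(abs(i - p) for i, p in enumerate(perm))
               for perm in permutations(range(n)))


for n in range(1, 8):
    assert max_displacement_closed_form(n) == max_displacement_brute_force(n)

print(max_displacement_closed_form(5))
```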