Say I have a set of non-negative numbers (0 is allowed, but if 0 is the maximum then no further operation is done, so the special case of division by zero never arises). These are essentially ratings, and each set could be on a different scale.
Now I wish to 'normalize' the set so that it lies in $[0,1]$ (or $[0,10]$, $[0,100]$, etc.). Here are three ways of doing it. I'm really confused about when to choose one over another, and about whether they are equivalent.
Given:
$\qquad S = \{1, 3, 7, 9, 11, 12, 34\}$
Possible ways of normalizing (I may be incorrect, hence the question):
First way: divide each element by the sum of all elements in the set:
$$s_i' = \frac{s_i}{\sum_{j=1}^{n} s_j}, \qquad i = 1, \dots, n, \quad n = |S|,$$
where $s_i'$ is the normalized quantity. Example: $\frac{1}{77}, \frac{3}{77}, \dots, \frac{34}{77}$.
(To scale to a different interval, say $[0,10]$, just multiply $s_i'$ by 10.)
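A minimal Python sketch of the first way, assuming the values come as a plain list; `normalize_by_sum` and its `upper` parameter (the right end of the target interval $[0, \text{upper}]$) are hypothetical names for illustration. It also makes visible that the largest element maps to $34/77 \approx 0.44$ rather than to 1, so the results lie inside $[0,1]$ without spanning it.

```python
def normalize_by_sum(values, upper=1.0):
    """First way: divide each value by the total, then scale to [0, upper].

    Assumes non-negative values. If they are all 0 there is nothing to
    normalize, so zeros are returned.
    """
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [upper * v / total for v in values]


S = [1, 3, 7, 9, 11, 12, 34]
print(normalize_by_sum(S))            # 1/77, 3/77, ..., 34/77
print(sum(normalize_by_sum(S)))       # 1, up to floating-point rounding
print(normalize_by_sum(S, upper=10))  # the same values scaled by 10
```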
Second way: find the maximum of the set and divide each element by it:
$$s_i' = \frac{s_i}{\max(S)}$$
Example: $\frac{1}{34}, \frac{3}{34}, \dots, \frac{34}{34} = 1$.
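The second way in the same hypothetical style (`normalize_by_max` is an illustrative name, not a library function). Here the maximum always maps exactly to `upper`, and returning the set unchanged when the maximum is 0 follows the convention stated at the top of this post.

```python
def normalize_by_max(values, upper=1.0):
    """Second way: divide each value by max(values), then scale to [0, upper]."""
    m = max(values)
    if m == 0:
        # As stated above: if 0 is the max, no further operation is done.
        return list(values)
    return [upper * v / m for v in values]


S = [1, 3, 7, 9, 11, 12, 34]
print(normalize_by_max(S))             # 1/34, 3/34, ..., 34/34 = 1
print(normalize_by_max(S, upper=100))  # the same values scaled to [0, 100]
```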
Another way: the same as above, but this time depending on interpretation, i.e. when a smaller number is a better 'rating' than a bigger one (e.g., 1 is better than 9). In this case the order simply reverses:
$$s_i' = \frac{\min(S)}{s_i}$$
Example: $\frac{1}{1} = 1, \frac{1}{3}, \frac{1}{7}, \dots, \frac{1}{34}$.
(Here too, to scale to a different interval such as $[0,10]$, just multiply $s_i'$ by 10.)
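A sketch of the 'lower is better' variant, assuming all values are strictly positive (a 0 would make $\min(S)/s_i$ undefined for that element); `normalize_inverse` is again a hypothetical name. Note that this mapping is not linear: 1, 3 and 7 become 1, 1/3 and 1/7, so the gap between the two best ratings is stretched while the gaps among the worse ones shrink.

```python
def normalize_inverse(values, upper=1.0):
    """'Lower is better' variant: min(values) / v, scaled to (0, upper].

    Assumes every value is strictly positive; a 0 would make min(values) / v
    undefined for that element.
    """
    if any(v <= 0 for v in values):
        raise ValueError("all values must be strictly positive")
    m = min(values)
    return [upper * m / v for v in values]


S = [1, 3, 7, 9, 11, 12, 34]
print(normalize_inverse(S))  # 1/1 = 1, 1/3, 1/7, ..., 1/34
```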
Third way:
$$s_i' = \text{rating} \cdot \frac{\operatorname{worst}(S) - s_i}{\operatorname{worst}(S) - \operatorname{best}(S)}$$
where:
$$\text{rating} = 1 \text{ if the required interval is } [0,1] \text{ (more generally } n \text{ for } [0,n]\text{)},$$
$$\operatorname{worst}(S) = \begin{cases} \max(S) & \text{if a lower rating is better (i.e., 1 is better than 9)}, \\ \min(S) & \text{otherwise}, \end{cases}$$
$$\operatorname{best}(S) = \begin{cases} \min(S) & \text{if a lower rating is better}, \\ \max(S) & \text{otherwise (basically the opposite of } \operatorname{worst}(S)\text{)}. \end{cases}$$
Example: assume a higher rating is better, i.e. $\operatorname{best}(S) = 34$ and $\operatorname{worst}(S) = 1$. The normalization interval is $[0,1]$, so $\text{rating} = 1$:
$$s_1' = 1 \cdot \frac{1 - 1}{1 - 34} = 0 \quad \text{(notice that this method produces a zero value!)}$$
$$s_2' = 1 \cdot \frac{1 - 3}{1 - 34} = \frac{2}{33} \approx 0.0606$$
$$s_7' = 1 \cdot \frac{1 - 34}{1 - 34} = 1$$
So with this technique the worst value gets a score of 0 and the best gets a score of 1. Depending on the interval, the value of $\text{rating}$ is chosen as the multiplying factor (e.g., 10 for $[0,10]$).
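A sketch of the third way (min-max scaling with a direction), with hypothetical names `normalize_min_max`, `upper` (playing the role of $\text{rating}$) and `lower_is_better`. Returning all zeros when every value is equal, where the formula would divide by zero, is my own assumption and not part of the method as stated.

```python
def normalize_min_max(values, upper=1.0, lower_is_better=False):
    """Third way: upper * (worst - v) / (worst - best).

    `upper` plays the role of `rating` (1 for [0, 1], 10 for [0, 10], ...).
    """
    if lower_is_better:
        worst, best = max(values), min(values)
    else:
        worst, best = min(values), max(values)
    if worst == best:
        # All values equal: the formula would divide by zero.
        # Returning zeros here is an assumption, not part of the method.
        return [0.0 for _ in values]
    # worst is an extreme of the set, so (worst - v) and (worst - best) never
    # have opposite signs; abs() leaves the ratio unchanged and avoids -0.0.
    return [upper * abs(worst - v) / abs(worst - best) for v in values]


S = [1, 3, 7, 9, 11, 12, 34]
print(normalize_min_max(S))                        # higher is better: 1 -> 0, 34 -> 1
print(normalize_min_max(S, lower_is_better=True))  # lower is better: 1 -> 1, 34 -> 0
```

For $S$ with higher-is-better, this gives 0 for $s_1 = 1$, $2/33 \approx 0.0606$ for $s_2 = 3$ and 1 for $s_7 = 34$.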
Context of usage: assume each such set of numbers has an associated numeric weight $w_s$, the weight of set $S$. The score of each alternative $a_i = s_i \in S$ is then $w_s \cdot s_i'$, i.e. $w_s$ times the normalized value of $s_i$ (basically I want to perform simple additive weighting, to give the process a name).
Question: depending on the situation, I want to normalize within some interval, specifically $[0,n]$ for some $n$ (usually 1, 10 or 100). Which method should I use for normalization? Which one is 'correct' (if 'correct' can be sensibly defined in this context)? Are they all equivalent? (Unlike the other two, the first one has no max/min or best/worst associated with it, which suggests there probably IS a difference!)
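One way to probe the equivalence question numerically, as a sketch on the example set $S$ (higher is better assumed for the third way). The first two ways differ only by the constant factor $\max(S)/\sum S = 34/77$, so they preserve both the ordering and the ratios between elements; the third way preserves the ordering but not the ratios (9 is three times 3, yet after min-max scaling it scores four times as much). Within one set the constant factor is harmless, but in a weighted sum across several sets, using the first way instead of the second amounts to multiplying each set's weight by its own $\max/\sum$ ratio. `ranking` is a hypothetical helper.

```python
S = [1, 3, 7, 9, 11, 12, 34]

first = [v / sum(S) for v in S]            # first way
second = [v / max(S) for v in S]           # second way
lo, hi = min(S), max(S)
third = [(v - lo) / (hi - lo) for v in S]  # third way, higher is better

# First vs second: the same constant factor max(S)/sum(S) for every element.
factor = max(S) / sum(S)
print(all(abs(a - factor * b) < 1e-12 for a, b in zip(first, second)))  # True


def ranking(xs):
    """Indices of xs ordered from smallest to largest value."""
    return sorted(range(len(xs)), key=lambda i: xs[i])


# All three keep the same ordering of the elements...
print(ranking(first) == ranking(second) == ranking(third))  # True

# ...but only the first two keep ratios: 9 is three times 3.
i3, i9 = S.index(3), S.index(9)
print(second[i9] / second[i3])  # about 3
print(third[i9] / third[i3])    # about 4, i.e. (9 - 1) / (3 - 1)
```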
UPDATE: I'm looking to perform simple additive weighting, where scores/ratings that are on different scales are normalized before proceeding. Given a set $C$ of criteria with weights $w_i$, each alternative $a_j$ is rated on each criterion. The overall score of alternative $a_j$ is given by $\sum_{i=1}^{n} w_i \, r_{ij}$, where $r_{ij}$ is the (normalized) rating of $a_j$ on criterion $C_i$ and $n = |C|$.

For example, with $W(C_1) = 2$ and $W(C_2) = 4$:

|    | C1 | C2 |
|----|----|----|
| A1 | 20 | 7  |
| A2 | 4  | 17 |
| A3 | 12 | 3  |

The overall scores for alternatives A1 to A3 would be calculated with the formula above. The question is which normalization approach (for the ratings) is best for this purpose. Does it matter?
(Note: the weight $w_i$ of each criterion is normalized using the first way above.)
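A sketch of simple additive weighting on the table above, assuming both criteria are 'higher is better' (the post does not say) and normalizing the weights with the first way as noted. `saw_scores`, `by_sum`, `by_max` and `by_min_max` are hypothetical helpers, and `by_min_max` assumes each column has at least two distinct values.

```python
def by_sum(col):
    total = sum(col)
    return [v / total for v in col]


def by_max(col):
    m = max(col)
    return [v / m for v in col]


def by_min_max(col):
    # Third way with "higher is better": worst = min, best = max.
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]


def saw_scores(table, weights, normalize):
    """Simple additive weighting: score_j = sum_i w_i * normalize(C_i)[j]."""
    total_w = sum(weights)
    w = [x / total_w for x in weights]  # weights normalized the "first way"
    columns = list(zip(*table))         # one tuple of ratings per criterion
    norm_cols = [normalize(list(c)) for c in columns]
    return [sum(w[i] * norm_cols[i][j] for i in range(len(w)))
            for j in range(len(table))]


# Rows: A1, A2, A3; columns: C1, C2 (ratings from the table above).
table = [[20, 7], [4, 17], [12, 3]]
weights = [2, 4]  # W(C1), W(C2)

for name, fn in [("sum", by_sum), ("max", by_max), ("min-max", by_min_max)]:
    scores = saw_scores(table, weights, fn)
    ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
    print(name, [round(s, 3) for s in scores], ["A%d" % (j + 1) for j in ranking])
```

For this particular table all three normalizations yield the ranking A2, A1, A3, but the scores differ (A1 gets $29/81 \approx 0.358$, $31/51 \approx 0.608$ and $11/21 \approx 0.524$ respectively), and with other data the rankings themselves can differ.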