
I've never posted on this forum, so I hope this question is valid...

I have sets of data composed of these values (simplified):

data1: 1396.53, 1, 1396.53, 106.85, 1, 55.6949
data2: 370.155, 1, 370.155, 16.2414, 1, 13.3966

The idea is to take the average of each column and then derive a per-column weight from those averages. The averages come out to be:

Averages: 883.3425, 1, 883.3425, 61.5457, 1, 34.54575 

Then the goal would be to apply weights to put each column on roughly the same scale:

(883.3425 * 0.01) + (1) + (883.3425 * 0.01) + (61.5457 * 0.1) + (1) + (34.54575 * 0.1)
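For concreteness, a minimal sketch of this setup, assuming Python; the rows and the hand-picked weights are the ones from above:

    data1 = [1396.53, 1, 1396.53, 106.85, 1, 55.6949]
    data2 = [370.155, 1, 370.155, 16.2414, 1, 13.3966]

    # Column averages of the two rows.
    averages = [(a + b) / 2 for a, b in zip(data1, data2)]
    # -> [883.3425, 1.0, 883.3425, 61.5457, 1.0, 34.54575]

    # Hand-picked weights that bring each column onto roughly the same scale.
    weights = [0.01, 1, 0.01, 0.1, 1, 0.1]
    scaled = [a * w for a, w in zip(averages, weights)]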

I'm rather poor at thinking up algorithms, so I can't quite find a systematic way of calculating the weights. I hope this question makes sense, and thank you for your help!


2 Answers


If the weights must always be powers of ten, then a natural choice would be to use $10^{-n}$ for the column whose average is between $10^n$ and $10^{n+1}$. That is, $n=\lfloor \log_{10} a \rfloor$, where $a$ is the average value of the column.
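A minimal sketch of that rule, assuming Python; `math.floor` and `math.log10` compute $n$ from each column average (which must be positive):

    import math

    def power_of_ten_weight(avg):
        """Weight 10**(-n) with n = floor(log10(avg)), so avg * weight lands in [1, 10)."""
        n = math.floor(math.log10(avg))
        return 10 ** -n

    averages = [883.3425, 1, 883.3425, 61.5457, 1, 34.54575]
    weights = [power_of_ten_weight(a) for a in averages]
    # -> [0.01, 1, 0.01, 0.1, 1, 0.1], matching the weights chosen by eye in the question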


You could subtract the average of each column from each data point, so that every column has average $0$. If you then want each column to have the same standard deviation, compute the standard deviation of each column and divide its entries by it. There is nothing special about powers of $10$; that is an artifact of our notation.
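A minimal sketch of that, assuming Python; `statistics.pstdev` gives the population standard deviation, and constant columns (such as the all-$1$ columns here) are left merely centered, since their standard deviation is $0$:

    import statistics

    def standardize(col):
        """Center the column on 0, then scale it to unit standard deviation."""
        mean = statistics.fmean(col)
        sd = statistics.pstdev(col)
        centered = [x - mean for x in col]
        if sd == 0:                   # constant column, e.g. the all-1 columns
            return centered
        return [x / sd for x in centered]

    rows = [
        [1396.53, 1, 1396.53, 106.85, 1, 55.6949],
        [370.155, 1, 370.155, 16.2414, 1, 13.3966],
    ]
    columns = list(zip(*rows))
    standardized = [standardize(list(col)) for col in columns]

Each resulting value is the standard score mentioned in the comment below: how many standard deviations the point sits from its column's mean.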

  • @kurisukun: Maybe you want standard scores, as in http://en.wikipedia.org/wiki/Standard_score. Each data point is scored by how many standard deviations it is from the average, so this rescales both the average and the standard deviation. The bad news is that it overweights points far from the mean if the tails are heavier than Gaussian (which is the usual case for real-world data). 2011-10-19