
Given a set of numbers

e.g. {1, 2, 3, 4, 5} or {50, 100} or {50000, 50001}

I want to normalize these into a range with a min and max, e.g. $2 \le x \le 50$.

My current algorithm is $$ ((range_{max} - range_{min}) / (x_{max} - x_{min})) * (x - x_{min}) + range_{min} $$

This does result in numbers within the range; however, a set of numbers like {50000, 50001} will map 50000 to 2 and 50001 to 50, which is too skewed. In this case I would like a result still in the range but with the 2 numbers closer together, e.g. 2 & 3 or 30 & 31.

What formula could I use to do this? I'm guessing I need to use $\log(x_{max} / x_{min})$ somewhere but I'm not sure how to work it into the equation.
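
For reference, the linear mapping described above can be reproduced with a short sketch. The function name `min_max_scale`, its default range of 2 to 50, and the handling of a constant data set are assumptions made for illustration, not part of the original question.

```python
def min_max_scale(values, range_min=2.0, range_max=50.0):
    """Linearly map values onto [range_min, range_max] (the question's formula)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant data set maps to the middle of the range,
        # since the formula would otherwise divide by zero.
        return [(range_min + range_max) / 2.0 for _ in values]
    scale = (range_max - range_min) / (x_max - x_min)
    return [scale * (x - x_min) + range_min for x in values]


print(min_max_scale([1, 2, 3, 4, 5]))   # [2.0, 14.0, 26.0, 38.0, 50.0]
print(min_max_scale([50000, 50001]))    # [2.0, 50.0]
```

The second call shows the skew: two nearly equal values land at opposite ends of the range, because the mapping only looks at differences relative to $x_{max} - x_{min}$.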

  • That depends a lot on the domain of what your $x$ can be and also on what you want to do later with the result. 2012-08-23
  • You could try pretending that zero is also in your data set, *i.e.* replace $x_\min$ with $0$. 2012-08-23
  • $x$ could be any value. I'm generating a scatter chart from the data and the normalized value will be the radius of the point. 2012-08-23
  • Using 0 in the data set might work better; I will have a think about it. However, it would produce a bias towards larger numbers in the range, which I would need to tweak, as a bias towards smaller numbers is better for my purposes. 2012-08-23
  • Then pretend that some very big number is in your given set. 2012-08-23

1 Answer


If what you're trying to do is pick radii of points for visualization, I would just use $$\max\left(2, 50\sqrt{x/x_\max}\right).$$ This accomplishes two things: the area of the point becomes proportional to the value, and extremely small values get clamped so their points are not too small to be visible.
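
A minimal sketch of this square-root scaling follows. The function name `sqrt_radius`, the parameter names, and the choice to reject negative input outright are assumptions for illustration; the constants 2 and 50 come from the formula above.

```python
import math


def sqrt_radius(values, r_min=2.0, r_max=50.0):
    """Radius max(r_min, r_max * sqrt(x / x_max)) for each value x."""
    x_max = max(values)
    if min(values) < 0 or x_max <= 0:
        # Assumption: negative (or all-zero) data is rejected rather than
        # handled, since the square root is not defined for it here.
        raise ValueError("values must be non-negative with a positive maximum")
    return [max(r_min, r_max * math.sqrt(x / x_max)) for x in values]


print(sqrt_radius([25, 100]))        # area ratio 1:4 matches the value ratio
print(sqrt_radius([50000, 50001]))   # both radii end up close to 50
print(sqrt_radius([1, 10000]))       # the tiny value is clamped to r_min
```

Because the radius grows with $\sqrt{x}$, the circle's area grows with $x$, and nearly equal values such as 50000 and 50001 get nearly equal radii instead of being pushed to opposite ends of the range.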

  • That is quite nice. The only issue is that it can be biased towards larger radii for smaller data sets; however, I could use $2 \cdot (x/x_{min})$ when $2 \cdot (x_{max}/x_{min}) \le 50$ as well, or @Gerry Myerson's suggestion above. Thanks. 2012-08-23
  • However, I don't think this will work when $x < 0$. 2012-08-23
  • 1. If the radii are too big then you should replace $50$ with a smaller value. 2. Yes, this won't work when the data values are negative. 2012-08-23
  • I think what you should think about first is whether you want the results to represent the relative *magnitudes* of the data, or just their relative *differences*. That is, do you consider $\{1,2,3\}$ to be the same as $\{5,10,15\}$ and $\{500,1000,1500\}$? If so, then you're going to have to do something special about negative values. On the other hand, if you want to treat $\{-5,0,5\}$ the same as $\{5,10,15\}$, then it'll also be the same as $\{4995,5000,5005\}$, and you can't complain about the latter getting mapped to too wide a range. 2012-08-23
  • Representing the magnitudes would be better for display as the radius in a scatter chart, so yes, I am going to need to do something special for negative values. 2012-08-23
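
The magnitudes-versus-differences distinction raised in the comments can be checked numerically. The sketch below re-declares both mappings under the hypothetical names `min_max_scale` and `sqrt_radius`; it assumes non-constant data for the first and non-negative data with a positive maximum for the second.

```python
import math


def min_max_scale(values, lo=2.0, hi=50.0):
    # Depends only on differences; assumes x_max != x_min.
    x_min, x_max = min(values), max(values)
    scale = (hi - lo) / (x_max - x_min)
    return [scale * (x - x_min) + lo for x in values]


def sqrt_radius(values, r_min=2.0, r_max=50.0):
    # Depends only on ratios; assumes non-negative values and x_max > 0.
    x_max = max(values)
    return [max(r_min, r_max * math.sqrt(x / x_max)) for x in values]


# Shifting the data leaves the min-max result unchanged.
for data in ([-5, 0, 5], [5, 10, 15], [4995, 5000, 5005]):
    print(data, [round(r, 2) for r in min_max_scale(data)])

# Multiplying the data by a constant leaves the square-root result unchanged.
for data in ([1, 2, 3], [5, 10, 15], [500, 1000, 1500]):
    print(data, [round(r, 2) for r in sqrt_radius(data)])
```

The first loop prints the same radii for all three shifted sets, while the second prints the same radii for all three rescaled sets. Passing $\{4995, 5000, 5005\}$ to `sqrt_radius` instead gives three radii that are all close to 50, which is the "closer together" behaviour the question asks for.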