
I have a set of data in which each data point has two values:

- Value 1: Birthday
- Value 2: Height

Now, the problem I am faced with is finding the optimal grouping of data points that minimizes the percent variance of Birthday and Height within each group (there are additional constraints that I won't go into here). Finding the percent variance of Height is very straightforward, but for Birthday it is more complex.

To calculate the percent variance of Birthday, my strategy is to count the number of days between January 1st, 0 AD and each Birthday, compute the mean of these counts, and then average the squared percent difference of each count from that mean to get the percent variance. That works, but because the number of days since January 1st, 0 AD is so large (roughly 700,000 or more for modern birthdays), every percent difference is tiny, so the percent variance of Birthday is tiny compared to the percent variance of Height.
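A minimal sketch of the computation described above, to make the size mismatch concrete. Assumptions: the birthdays and heights are made-up sample values, `percent_variance` is a hypothetical helper name, and because Python's `datetime.date` cannot represent year 0, `date.toordinal()` (which counts days with 0001-01-01 as day 1) stands in for "days after January 1st, 0 AD".

```python
from datetime import date
from statistics import mean


def percent_variance(values):
    """Mean squared percent deviation from the mean (equals variance / mean**2)."""
    m = mean(values)
    return mean(((x - m) / m) ** 2 for x in values)


# Hypothetical sample group (not data from the question).
birthdays = [date(1990, 1, 15), date(1992, 6, 1), date(1995, 3, 20), date(1988, 11, 5)]
heights_in = [62.0, 68.5, 71.0, 65.0]

# date.toordinal(): proleptic Gregorian day count with 0001-01-01 as day 1,
# used here as a stand-in for "days after January 1st, 0 AD".
birthday_days = [d.toordinal() for d in birthdays]

bday_pv = percent_variance(birthday_days)
height_pv = percent_variance(heights_in)
print(f"birthday percent variance: {bday_pv:.2e}")
print(f"height percent variance:   {height_pv:.2e}")
```

Since the mean squared percent deviation is just the ordinary variance divided by the squared mean (the squared coefficient of variation), a mean of roughly 727,000 days shrinks the birthday value drastically: on this made-up data it comes out more than a thousand times smaller than the height value, even though the birthdays span several years.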

I use percent variance to normalize the variances of Birthday and Height, so that I can measure the quality of a grouping by comparing them. But because the Birthday percent variance is so small, it is hard to tell when a grouping has genuinely improved with respect to Birthday variance.
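One way to see why the percent variance does not really normalize birthdays: its value depends on the arbitrary reference date the days are counted from, while the ordinary variance does not. The sketch below assumes made-up birthdays and three arbitrarily chosen reference dates; `percent_variance` is a hypothetical helper name.

```python
from datetime import date
from statistics import mean, pvariance


def percent_variance(values):
    """Mean squared percent deviation from the mean."""
    m = mean(values)
    return mean(((x - m) / m) ** 2 for x in values)


# Hypothetical sample birthdays (not data from the question).
birthdays = [date(1990, 1, 15), date(1992, 6, 1), date(1995, 3, 20), date(1988, 11, 5)]

# Arbitrary reference dates ("epochs") to count days from.
epochs = [date(1, 1, 1), date(1900, 1, 1), date(1980, 1, 1)]

percent = {}  # percent variance per epoch
plain = {}    # ordinary population variance per epoch (in days squared)
for epoch in epochs:
    days = [(d - epoch).days for d in birthdays]
    percent[epoch] = percent_variance(days)
    plain[epoch] = pvariance(days)
    print(epoch, f"percent variance={percent[epoch]:.2e}", f"variance={plain[epoch]:.0f}")
```

The ordinary variance is identical for every epoch, whereas the percent variance grows as the reference date moves closer to the birthdays, so its size reflects the choice of origin rather than the actual spread of the group.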

Is there another way to calculate a percent variance for Birthday so that it is more comparable to the percent variance of Height? Are there other strategies I could use to improve my results?

    @Jay, in percentage terms, this still has the same problems as just measuring birthdays from the mean of the birthdays in a particular group. (2011-06-30)


What if you first scale all birthdays and heights so that they fall between 0 and 1? E.g., if all the heights are between, say, 60 inches and 75 inches, then replace each height $h$ with $(h-60)/15$?
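The rescaling suggested above is min-max scaling, $x \mapsto (x - x_{\min})/(x_{\max} - x_{\min})$, with the heights example being the case $x_{\min} = 60$, $x_{\max} = 75$. A minimal sketch, assuming made-up data, birthdays converted to day counts with `date.toordinal()`, and a hypothetical `grouping_score` objective that weights the birthday and height variances equally (the equal weighting is an assumption, not part of the answer):

```python
from datetime import date
from statistics import pvariance


def min_max_scale(values):
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # all values equal: nothing to spread out
    return [(x - lo) / span for x in values]


# Hypothetical dataset of (birthday, height in inches) pairs.
people = [
    (date(1990, 1, 15), 62.0),
    (date(1992, 6, 1), 68.5),
    (date(1995, 3, 20), 71.0),
    (date(1988, 11, 5), 65.0),
    (date(1991, 9, 30), 60.0),
    (date(1993, 12, 12), 75.0),
]

# Scale over the whole dataset once, so every group is measured on the same scale.
scaled_bdays = min_max_scale([b.toordinal() for b, _ in people])
scaled_heights = min_max_scale([h for _, h in people])


def grouping_score(groups):
    """Hypothetical objective (lower is better): sum over groups of the variance
    of scaled birthdays plus the variance of scaled heights.
    `groups` is a list of non-empty lists of indices into `people`."""
    total = 0.0
    for idx in groups:
        total += pvariance([scaled_bdays[i] for i in idx])
        total += pvariance([scaled_heights[i] for i in idx])
    return total


print(grouping_score([[0, 1, 2], [3, 4, 5]]))
print(grouping_score([[0, 3, 4], [1, 2, 5]]))
```

Because the minimum is subtracted before dividing, the choice of reference date for the birthdays cancels out, and both attributes land on the same 0 to 1 range. Min-max scaling is sensitive to outliers (one extreme height compresses everyone else), and the two terms can be given different weights if one attribute matters more.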