
I have an application where I get a small number of scores from medical tests (the same test repeated a few times) and I need to combine them. Normally the highest score would "win" in this combination, but not all the tests have the same "quality", so higher quality tests should be weighted higher, even if they have a lower score. And it wouldn't hurt if there's a little bit of averaging thrown in, so long as it's not overwhelming.

Typically there would be 2-5 repetitions of the test, producing open-ended scores from zero to about 100 (typical scores running 2-20), and with a "quality" metric that ranges from 0.0 to 1.0, with typical values from about 0.05 to 0.8 ("fair" being maybe 0.15 and "good" being 0.3).

I should add: Ultimately the combined score will be compared to a relatively fixed threshold of about 10.0 to produce a "positive" (i.e., "perform next level of testing") or "negative" ("no further testing required") result. (There is room for "physician's judgment" to sway the decision one way or the other when the score is near that threshold.)

Adding more: The "quality" metric is a fairly fuzzy number. If I had scores/quality metrics of 5.0/0.4 and 6.0/0.3, I'd still tend to favor the 6.0 value, since both quality metrics are in the "pretty good" range. But when "quality" drops much below about 0.2 then the score becomes more suspect.

I could of course combine the scores using some sort of decision tree, but I'd prefer to use something resembling a continuous function (if that makes any sense at all where one is taking the "max" of several values).

(One general approach that comes to mind is to raise the scores to the Nth power, then average with quality weighting and take the Nth root of the result. But I thought someone else might have a better idea, or at least ideas to embellish that one. In particular, it's not clear how to apply the quality weighting when it's not a nice clean quality scale.)
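Written out, the idea in the previous paragraph is a weighted power mean. The symbols below are my own labels, not anything standard for this test: $s_i$ is the score of repetition $i$, $w_i \ge 0$ is whatever weight ends up being derived from its quality, and $N$ is the power.

```latex
M_N \;=\; \left( \frac{\sum_{i} w_i \, s_i^{N}}{\sum_{i} w_i} \right)^{1/N}
```

With $N = 1$ this is the ordinary weighted average, and as $N \to \infty$ it approaches the largest $s_i$ among the repetitions with $w_i > 0$, so $N$ controls how much "max" versus how much "averaging" goes into the result.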

Here's what I've got so far (Java code):

public class CombineScores {
    public static void main(String[] argv) {

        // These are test inputs
        double[] scores = {5.0, 10.0, 15.0, 40.0, 50.0};
        double[] qualities = {0.5, 0.4, 0.3, 0.1, 0.05};

        double sum = 0.0;
        double weightedSum = 0.0;
        double weightedQuality = 0.0;
        double sumOfWeights = 0.0;

        // Raising scores to this power emphasizes larger scores
        double power = 6.0;  // 6 just seems to fit expectations
        double root = 1.0 / power;  // For taking the matching Nth root

        for (int i = 0; i < scores.length; i++) {

            // Unweighted sum of the powered scores (for comparison only)
            sum += Math.pow(scores[i], power);

            // Convert "quality" into a "weight" that multiplies the powered score
            double weight = Math.pow(Math.tanh(qualities[i] / 0.2), 4.0);
            System.out.println("Quality = " + qualities[i] + ", weight = " + weight);

            // This is the sum of weighted scores that is expected to produce my main result
            weightedSum += Math.pow(scores[i], power) * weight;

            // Attempt to also weight quality.  (This seems suspect, but I don't know of a better scheme.)
            weightedQuality += qualities[i] * weight;

            // Use this to properly scale the "weighted average"
            sumOfWeights += weight;
        }
        System.out.println("Sum = " + sum);
        System.out.println("Sum root = " + Math.pow(sum, root));
        System.out.println("Weighted sum = " + weightedSum);
        System.out.println("Weighted sum root = " + Math.pow(weightedSum, root));
        System.out.println("Sum of weights = " + sumOfWeights);
        System.out.println("Average = " + Math.pow(sum / scores.length, root));
        System.out.println("Weighted average = " + Math.pow(weightedSum / sumOfWeights, root));
        System.out.println("Weighted quality average = " + weightedQuality / sumOfWeights);
    }
}

It raises the scores to the 6th power before combining, and uses tanh to softly "clip" the quality values with a "knee" at 0.20; the clipped values are then raised to the 4th power to give the weights. (I'm still playing with the parameters.) The "weighted average" value is what I would presumably use as my final result.
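To make the quality-to-weight mapping easier to look at on its own, here is a minimal sketch that pulls it out of the loop and tabulates it. The class name, method name, and constants are just labels chosen for this illustration; the formula is the same `tanh(quality / 0.2)` raised to the 4th power used above.

```java
final class QualityWeightSketch {

    // Parameters taken from the code above; both are still being tuned.
    static final double KNEE = 0.2;
    static final double EXPONENT = 4.0;

    /** Maps a quality in [0, 1] to a weight in [0, 1). */
    static double weight(double quality) {
        return Math.pow(Math.tanh(quality / KNEE), EXPONENT);
    }

    public static void main(String[] args) {
        // A few qualities spanning the typical range, including "fair" and "good"
        double[] qualities = {0.05, 0.15, 0.2, 0.3, 0.4, 0.8};
        for (double q : qualities) {
            System.out.printf("quality = %.2f -> weight = %.4f%n", q, weight(q));
        }
    }
}
```

With these parameters the weight is roughly 0.16 at a "fair" quality of 0.15, 0.34 at the knee, and 0.67 at a "good" quality of 0.3, so the mapping is still fairly steep across the "pretty good" range.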

This seems to roughly match my expectations, though the conversion of "quality" to "weight" seems a bit flaky, and the result produced as "Weighted quality average" seems higher than it should be.

Can anyone suggest any improvements, or (perhaps more importantly) identify any instabilities or boundary conditions that could lead the algorithm awry?
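One boundary case is visible from the code itself: if every quality is 0.0, `sumOfWeights` is 0.0, the division is 0.0 / 0.0, and the result is NaN. In Java a comparison such as `NaN >= 10.0` is false, so the case would silently read as "negative". Below is a minimal sketch of the same weighted power mean with that case made explicit. The class and method names are hypothetical, throwing an exception is only an assumed policy, scores are assumed non-negative as described above, and dividing by the largest score before raising to the power is an optional extra that keeps intermediate values at or below 1.

```java
final class GuardedCombineSketch {

    // Same quality-to-weight mapping as in the code above: tanh knee at 0.2, 4th power.
    static double weight(double quality) {
        return Math.pow(Math.tanh(quality / 0.2), 4.0);
    }

    /**
     * Weighted power mean of non-negative scores.
     * Throwing when no repetition carries any weight is an assumed policy.
     */
    static double combine(double[] scores, double[] qualities, double power) {
        if (scores.length == 0 || scores.length != qualities.length) {
            throw new IllegalArgumentException("need matching, non-empty arrays");
        }
        double maxScore = 0.0;
        double sumOfWeights = 0.0;
        for (int i = 0; i < scores.length; i++) {
            maxScore = Math.max(maxScore, scores[i]);
            sumOfWeights += weight(qualities[i]);
        }
        if (sumOfWeights == 0.0) {
            // Without this check the result would be NaN
            throw new IllegalArgumentException("no repetition has a usable quality");
        }
        if (maxScore == 0.0) {
            return 0.0; // every score is zero, so the combined score is zero
        }
        double weightedSum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            // Scale by the largest score so each powered term is at most 1
            weightedSum += weight(qualities[i]) * Math.pow(scores[i] / maxScore, power);
        }
        return maxScore * Math.pow(weightedSum / sumOfWeights, 1.0 / power);
    }

    public static void main(String[] args) {
        // Same test inputs as the code above
        double[] scores = {5.0, 10.0, 15.0, 40.0, 50.0};
        double[] qualities = {0.5, 0.4, 0.3, 0.1, 0.05};
        System.out.println("Combined = " + combine(scores, qualities, 6.0));
    }
}
```

Because the power mean scales linearly with the scores, the rescaling does not change the result beyond rounding; it only matters if the power is pushed much higher than 6.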

  • Need it be complicated? I would weigh each test so that I could multiply their score by its weight. Take the maximum value of the weighted scores and see if it is higher than the target value to determine success or failure. This is akin to giving the tester a few points for taking a harder version of the test, however, and I'm not sure if I understand the objective completely... (2012-08-17)
  • @JoshuaShaneLiberman -- I'm not sure I understand the objectives completely, which is part of the problem. But there's an issue with the quality metrics I probably didn't explain very well -- I'll do that. (2012-08-17)
