I'm trying to find the best way to assign a "frequency score" to sentences in Chinese. I have a database that tells me how frequent each Chinese character is. From that, I would like to evaluate how "easy" a sentence is compared to other sentences, i.e. how easy it might be for a beginner to understand.
My first approach was to average the frequencies of all the characters in the sentence. However, I found that some sentences containing very uncommon characters end up with a higher frequency score than sentences made up entirely of moderately common characters. I think this is because it only takes a few very common characters to pull the average up.
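For example, here are two sentences of six characters each. Every number is the frequency score of one character, with higher meaning more common. Since both sentences are the same length, comparing the totals is equivalent to comparing the averages: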
10 1 2 0 9 1 = 23
3 2 4 3 3 5 = 20
In this example, the second sentence is likely to be easier because all of its characters are reasonably frequent. The first sentence has a higher score because of the "10" and "9" characters. However, its "0" character and two "1" characters will make it hard for beginners to understand.
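To make the arithmetic concrete, here is a minimal Python sketch of the averaging approach applied to the two example sentences. The function name `average_score` and the variable names are made up for illustration. The scores are the ones listed above, and the "rarest" column is only there to show what the average hides.

```python
from statistics import mean

def average_score(char_scores):
    """Arithmetic mean of the per-character frequency scores (hypothetical helper)."""
    return mean(char_scores)

# Per-character frequency scores from the example; higher means more common.
sentence_a = [10, 1, 2, 0, 9, 1]
sentence_b = [3, 2, 4, 3, 3, 5]

for name, scores in (("A", sentence_a), ("B", sentence_b)):
    print(
        name,
        "total:", sum(scores),
        "average:", round(average_score(scores), 2),
        "rarest:", min(scores),
    )
```

Sentence A comes out ahead on both total and average, even though its rarest character scores 0 while the rarest character in sentence B scores 2.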
So my question is: what would be the best way to calculate the frequency score in this case?