I've analysed newspapers by computing the language distribution of the articles.
The results look like this:
            Day 1   Day 2   Day 3
Economy
language 1: 0.35    0.30    0.90
language 2: 0.11    0.10    0.00
language 3: 0.54    0.60    0.10
Sports
language 1: 0.40    0.30    1.00
language 2: 0.20    0.20    0.00
language 3: 0.40    0.50    0.00
I have already posted another question on this topic (Remove statistical outliers), but here is my second problem. First, I want to remove all statistical outliers from the data (e.g. day 3) to make it "clean" (see my other question). After that, I want to determine which changes in my data are just "noise" and which are significant changes, but I'm not sure how to do that.
I was thinking of the following approach:
I could calculate the standard deviation (as in my other post) and treat every value that lies more than one standard deviation from the mean as a "significant change". But I think this will give wrong results if all my values are slowly increasing or decreasing, because a gradual trend would eventually be flagged as well.
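To make the idea concrete, here is a minimal Python/NumPy sketch of that approach; the numbers and the one-standard-deviation threshold are just placeholders, not my real measurements:

```python
import numpy as np

# Placeholder data: share of "language 3" in Economy articles per day
# (made-up values, not my actual data).
values = np.array([0.54, 0.60, 0.52, 0.57, 0.55, 0.80, 0.58])

mean, std = values.mean(), values.std()

# Flag every value lying more than one standard deviation from the mean
# as a "significant change" -- exactly the approach described above.
# Note: a slow upward or downward drift would eventually get flagged too,
# which is the problem I'm worried about.
significant = np.abs(values - mean) > std

for day, (v, flag) in enumerate(zip(values, significant), start=1):
    print(f"Day {day}: value = {v:.2f}, significant = {flag}")
```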
Is there any mathematical technique to find the significant changes in my data?
Thanks in advance.
(PS: I have only a few samples (~ 150 days).)