I'm not very mathematically oriented (working on fixing that) so excuse me if this is basic :)
I need to calculate a "fragmentation ratio", which is the best name I could come up with for the thing I'm trying to achieve.
I have 400 different lists, each containing 20 named items (with some metadata). An item with the same name can appear in multiple lists. I calculated that in my dataset there are 326 unique items among those 8000 items (400*20).
Where I stumble is this: how do I calculate a meaningful value for the fragmentation of items across those lists?
--- edit ---
It looks like I was even more vague and hard to understand than I thought.
I researched this a bit more, and I think a better term for this would be similarity between lists, calculated across the whole collection.
Here's a small example of the data structures:
List 1: ['this','is','a','list']
List 2: ['is','it']
List 3: ['yes','it','is','list']
Now what I would like to quantify is how much similarity these lists share.
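To make it more concrete, here is a rough sketch of the kind of number I'm after. It uses the Jaccard index (shared items divided by all distinct items in a pair of lists), which is just one candidate I came across and not necessarily the right measure. The function names are made up for illustration, and I'm assuming that order and duplicates within a list don't matter.

```python
from itertools import combinations

# The three example lists from above.
lists = {
    "List 1": ["this", "is", "a", "list"],
    "List 2": ["is", "it"],
    "List 3": ["yes", "it", "is", "list"],
}


def jaccard(a, b):
    """Shared items divided by all distinct items across the two lists."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty lists count as having no similarity
    return len(set_a & set_b) / len(union)


def pairwise_similarities(named_lists):
    """Jaccard score for every unordered pair of lists, keyed by the two names."""
    return {
        (name_a, name_b): jaccard(named_lists[name_a], named_lists[name_b])
        for name_a, name_b in combinations(named_lists, 2)
    }


def mean_similarity(named_lists):
    """Average of all pairwise scores: one number for the whole collection."""
    scores = list(pairwise_similarities(named_lists).values())
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    for pair, score in pairwise_similarities(lists).items():
        print(pair, round(score, 3))
    print("mean:", round(mean_similarity(lists), 3))
```

For the example this gives 0.2 for lists 1 and 2, about 0.333 for lists 1 and 3, and 0.5 for lists 2 and 3, so a mean of roughly 0.344. With 400 lists that would be 79,800 pairs, which is still cheap to compute, but I don't know whether averaging pairwise scores is a meaningful way to summarize the collection.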
Does this explain my problem any better? :)