15
$\begingroup$

Tonight a game of Scrabble ended in what I consider a very unusual graph structure, unlike this generic web image, which seems more typical:
            
Let us call the Scrabble graph the one whose nodes are tiles, with two tiles connected by an arc in the graph if they share a side. The unusual graph that occurred contained a long snaking chain that consumed most of the board. This made me wonder if anyone has performed a statistical analysis of "average" Scrabble graphs; for example, their diameter, or their cycle structure? No doubt these properties depend on the skill of the players, but it seems within some reasonable parameters, there are perhaps answers of sorts. I would be curious to hear of any data in this direction. Thanks!

Afterthought. Maybe equally interesting: What is the largest diameter legal, achievable Scrabble graph?

  • 1
    It would also be interesting to consider the following graph: The vertices are the words on the final board, and two vertices are joined by an edge, iff they share a "letter piece". Of course, both graphs are somehow related.2012-07-23

3 Answers 3

25

Methods

Starting from the base url http://www.cross-tables.com/annotated.php?a=1 I used a combination of Python's urllib, multiprocessing and BeautifulSoup to extract the first 10000 games. The games were parsed and turned into numpy 15x15 Boolean matrices. The matrices were then turned into graphs making an edge if two adjacent cells in the matrix were both active. Graph properties were then analysed with networkx

Of the 10000 games, only 9966 were usable. Some games did not start on the center title, while others ended so quickly and strangely they did not behave properly. Fortunately these games were rare enough that the sample should give a robust estimation of the true distributions.

Methods (Update)

There was a bit more data cleanup needed. I had not taken into consideration challenged moves, leading to games that had >100 tiles used. In the process, I noted false moves and fake games. We may have to live with a bit of uncertainty in the data, such is the cost of true empirical data.

Results

The first interesting piece of information is the board frequency - this gives a nice spatial connection to the graphs we are about to study. Notice that, due to how the game is played (and how we read left to right and top to bottom) the board is asymmetric.

Board frequency

From here we can answer the question,

"What is the distribution of graph diameters and radii for an average Scrabble game?"

enter image description here

A scatter plot versus the graph size reveals a bit more information for the smaller $N$ values:

enter image description here

Results (Update)

Based of a comment, I plotted the radius vs the diameter, giving a mostly linear relationship of 1 to 2 except for a range of games with some variance. Feel free to make some observations on the significance of this in the comments.

enter image description here

Quick Conclusion (TLDR)

From the data studied, there were mostly full games played ~100 tiles with an average graph radius 18 and a diameter of 36. Further work is needed to compare these results to random graphs with the same size and edge counts but different edge distribution.

  • 1
    @aromero All plots were made with `matplotlib` (`pylab`) on Python. As suggested by Rahul, a hexplot in the pylab library might be better for the scatter plots.2012-07-29
9

It turns out that the maximum possible diameter uses the full hundred tiles! My friend Carl illustrated this in 2008 in a game he made up, here: http://www.cross-tables.com/annotated.php?u=2493#0# .

Thanks for the question!

  • 0
    Very clever!!! $\;$2012-07-29
2

If you require the viable dictionaries that are allowable for scrabble, they are available on the http://www.lexicalwordfinder.com/about/ site (Lexical Word Finder website).

An interesting experiment: Assuming most people do not play expertly, you can use the generated list from Lexical Word finder to play medium words to see what the average graphs look like. It would be nice to also see what the "best" moves at each turn produce as well.

Interesting use of networkx.