Suppose that we have access to only the clues of a crossword puzzle along with the number of letters that the answers are supposed to be. Is there an algorithm that we can use to reconstruct the crossword puzzle? Apart from the clues and the length of answers we also know the grid size of the crossword.
Is there an algorithm to recover a crossword grid based on the clues alone?
-
0You might need more information to make the result of the "reconstruction" unique. – 2011-10-28
-
0The only other information I can think of that may be relevant is the grid size. So, let us assume that we know the grid size. – 2011-10-28
-
0transpositions will produce a second distinct puzzle for any given solution, so perhaps you should add which are the "across" and which are the "down" clues (in order), but it still doesn't seem to fit uniqueness, especially considering multiple reasonable words matching each clue. – 2011-10-28
-
1Ah, you must mean, reconstruct the empty slots for a puzzle, no? Why are the contents of the clues relevant, then? – 2011-10-28
-
0Yes, I simply want to recover the empty slots of the puzzle. We do know which ones are across and which ones are down along with the corresponding 'clue number'. Well the contents do matter as I wish to solve the crossword puzzle after reconstructing the grid. :-) – 2011-10-28
-
0Different answers to the clues could result in different grids. – 2011-10-28
-
0But, the answers are constrained by the length so the answers per se do not matter for reconstructing the grid. – 2011-10-28
-
0Isn't it possible for two words of equal length to answer the same clue? – 2011-10-28
-
0@TheChaz I do not see why multiple answers matter for the purpose of reconstructing the *empty* grid. – 2011-10-28
4 Answers
Edit: Here are two American crossword shapes which have the same clue numbers and lengths. They are rotationally symmetric, every letter is "checked", the white squares are connected, and the board is a square with odd side length (7).
- Across 1(3), 4(6), 7(6), 8(3), 9(6), 12(6), 13(3)
- Down 1(7), 2(7), 3(7), 4(2), 5(2), 6(2), 9(2), 10(2), 11(2)
$$\left[\begin{array}{rrrrrrr} ◼& ◼& 1 & 2 & 3 & ◼ & ◼ \\ ◼& 4& ◻ & ◻ & ◻ & 5 & 6 \\ ◼& 7& ◻ & ◻ & ◻ & ◻ & ◻ \\ ◼& ◼& 8 & ◻ & ◻ & ◼ & ◼ \\ 9&10& ◻ & ◻ & ◻ &11 & ◼ \\ 12& ◻& ◻ & ◻ & ◻ & ◻ & ◼ \\ ◼& ◼&13 & ◻ & ◻ & ◼ & ◼ \\ \end{array}\right]\qquad \left[\begin{array}{rrrrrrr} ◼ & ◼ & 1 & 2 & 3 & ◼ & ◼ \\ 4 & 5 & ◻ & ◻ & ◻ & 6 & ◼ \\ 7 & ◻ & ◻ & ◻ & ◻ & ◻ & ◼ \\ ◼ & ◼ & 8 & ◻ & ◻ & ◼ & ◼ \\ ◼ & 9 & ◻ & ◻ & ◻ &10 &11 \\ ◼ &12 & ◻ & ◻ & ◻ & ◻ & ◻ \\ ◼ & ◼ &13 & ◻ & ◻ & ◼ & ◼ \\ \end{array}\right]$$
I think the result is not unique in general. You might want to look for some smaller-but-realistic crosswords and see to what extent "reasonable" examples are unique. Here is a very short crossword:
The crossword is 2×3
- 1 Across [3 letters]
- 2 Down [2 letters]
- 3 Across [1 letters]
Solution (empty grid) #1: $\begin{array}{r} \square\square\square\\ \blacksquare\square\blacksquare\\ \end{array}$
Solution (empty grid) #2: $\begin{array}{r} \square\square\square\\ \blacksquare\blacksquare\square\\ \end{array}$
You might like a similar type of puzzle: Griddlers.
There they tell you which row the across clues are in and which column the down clues are in. If we knew that for the 2×3 example, then the answer would be unique. A "griddler" wouldn't have to be unique, but I believe the website requires posted puzzles to have unique solutions.
Apparently I don't know much about how crosswords are supposed to look, btu in my defense it looks like there are some serious cultural differences in crosswords. See the wikipedia article and a bit lower on the diagramless rules.
-
0Your first grid should be: 1 Across [2 letters], 2 Across [1 letter] and 1 Down [*2* letters] which is not the same as the second one which would be: 1 Across [2 letters], 2 Down [2 letters] and 3 Across [1 letter]. Am I missing something? – 2011-10-28
-
0@crisscross: Oh, that is a good point. I can make the example a little bigger to fix it. – 2011-10-28
-
0@crisscross: is that ok? I assume you only number squares that start clues. If you number all the white squares, then there might still be problems, but it'll take me a bit to make them (I am thinking more like 5x5). – 2011-10-28
-
0hmm, that seems to work a bit better but the crossword is not square. :-) Anyway, I guess you are right about the lack of a unique solution. – 2011-10-28
Kathryn Lybarger constructed a small example to show that having the clues and the grid size, but not the clue lengths, is not enough information to reconstruct the grid for an American crossword puzzle.
Consider the following two grids:
$$ \begin{array}{rrrrr} \blacksquare \square \square \square \blacksquare \\ \square \square \square \square \square \\ \square \square \blacksquare \square \square \\ \square \square \square \square \square \\ \blacksquare \square \square \square \blacksquare \\ \end{array} \quad \text{and}\quad \begin{array}{rrrrr} \blacksquare \blacksquare \square \square \square \\ \square \square \square \square \square \\ \square \square \blacksquare \square \square \\ \square \square \square \square \square \\ \square \square \square \blacksquare \blacksquare \\ \end{array} $$
Then one can confirm that for both puzzles, the Across clues are 1, 4, 6, 7, 8, and 10, while the Down clues are 1, 2, 3, 4, 5, and 9. Of course, the clue lengths differ in this example.
Not an answer, but too much for a comment: This is similar to (but not the same as) the problem of doing diagramless crossword puzzles. In that case, you are given the size of the grid and the numbers of each clue. The number of the second down clue gives you the length of the first across clue, and the clue numbering continues to be important as you solve the puzzle. It is very important in that case to specify that you are playing by crossword rules: diagram symmetric under 180 degree rotation, all letters checked, no two letter words. One thing to note is that puzzles get harder as the number of black squares increases. I haven't done a study, but strongly suspect that you cannot recover the diagram just from the clue number information. If you can, it isn't obvious.
Providing the number of letters in the answer is in the spirit of British or cryptic crosswords, though in those cases the information is also available from the diagram. Many variety cryptics provide that information and it is an important clue.
I've done a few crosswords under such conditions before (clues with word-lengths, missing grid) and I would make a few points:
(1) Uniqueness of grid is certainly not guaranteed - this is especially obvious if you are also in the position of not knowing the size of the grid.
(2) Therefore any program to produce the right grid will probably require at least some human intervention to narrow down the possible solutions.
(3) In practical terms, you may as well enjoy the combined puzzle of solving the crossword and reconstructing the grid simultaneously - once you are reasonably confident of a couple of the answers, the need for intersecting letters to agree will rapidly reduce the range of possibilities. (And after all, the grid mostly only helps solve the clues once you have made a start). But use a pencil - you'll have to move some of the words around.
(4) I'll probably now go off and try to write a computer program to make a 'best guess' at the grid for a given set of clues, so if you're really keen to have such a program, email me sometime on i.bendr@gmail.com and I'll report on my progress.