2019-07-23, 11:00–11:30, Elm B
If you like using serious scientific tools to do silly things, then this talk is for you. Join me as I explore the intersection of computational linguistics, algorithm design, and machine learning in an effort to seriously overthink cryptic crossword clues.
Cryptic (or British-style) crosswords are intentionally vague, misleading, or ambiguous. Each clue combines a standard crossword definition with wordplay elements like anagrams, reversals, or homophones, so solving a clue requires understanding both crossword definitions and a combinatorial explosion of possible wordplays. Here are a couple of easy examples:
Clue: "Spin broken shingle"
Explanation: "broken" means to take an anagram of "shingle", which produces "english", and "english" can mean "spin" (at least in billiards).
Clue: "Initially babies are naked"
Explanation: "initially" means to take the first letter of "babies", giving "b". Combining "b" and "are" gives "bare", which means "naked".
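The two wordplay operations used in these examples can be sketched in a few lines. This is only an illustrative Python sketch, not code from the actual solver (which is written in Julia); the toy word list `WORDS` and the function names `anagrams` and `initial` are assumptions made up for this example.

```python
# Hypothetical toy vocabulary; a real solver would use a full dictionary.
WORDS = {"english", "bare", "shingle"}

def anagrams(letters, vocabulary):
    """Return vocabulary words that are rearrangements of `letters`,
    excluding the input word itself."""
    key = sorted(letters)
    return sorted(w for w in vocabulary if sorted(w) == key and w != letters)

def initial(word):
    """Return the first letter of `word` (the "initially" wordplay)."""
    return word[0]

print(anagrams("shingle", WORDS))   # ['english']
print(initial("babies") + "are")    # bare
```

Each operation is trivial on its own; the difficulty comes from not knowing in advance which words are indicators, which are wordplay material, and which form the definition.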
We could try to enumerate every possible thing a word might mean, and every way those meanings might combine, but doing so would result in billions of possibilities, most of which are nonsense. Instead, I'll show how we can use tools from computational linguistics to attack this silly problem in a serious way, and I'll show how Julia makes doing so even easier.
In particular, I will talk about:
- Developing a formal grammar for cryptic crossword clues
- Implementing probabilistic parsers which can parse cryptic crossword grammars (or any other grammar, I suppose)
- Squeezing as much performance as possible out of string manipulation in Julia
- Analyzing the meaning of words and phrases with WordNet.jl and machine learning
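To give a flavor of what a grammar for clues might look like, here is a deliberately tiny Python sketch covering only the two example clues above. It is not the grammar or parser from the talk and does not use the ChartParsers.jl API: it is a brute-force, non-probabilistic enumeration, and the indicator sets, the `SYNONYMS` table (a stand-in for a WordNet lookup), and all function names are assumptions invented for illustration.

```python
# Toy grammar (all tables below are hypothetical stand-ins):
#   Clue     -> Definition Wordplay | Wordplay Definition
#   Wordplay -> AnagramIndicator Fodder
#             | InitialIndicator Fodder Literal
ANAGRAM_INDICATORS = {"broken"}
INITIAL_INDICATORS = {"initially"}
SYNONYMS = {"spin": {"english"}, "naked": {"bare"}}  # stand-in for WordNet
VOCAB = {"english", "bare"}

def wordplay(tokens):
    """Yield (answer, derivation) pairs for every wordplay rule that matches."""
    if len(tokens) == 2 and tokens[0] in ANAGRAM_INDICATORS:
        fodder = tokens[1]
        for word in sorted(VOCAB):
            if word != fodder and sorted(word) == sorted(fodder):
                yield word, f"anagram({fodder})"
    if len(tokens) == 3 and tokens[0] in INITIAL_INDICATORS:
        word = tokens[1][0] + tokens[2]
        if word in VOCAB:
            yield word, f"initial({tokens[1]}) + {tokens[2]}"

def solve(clue):
    """Try every split of the clue into a definition and a wordplay part."""
    tokens = clue.lower().split()
    solutions = []
    for i in range(1, len(tokens)):
        # The definition may come first or last in the clue.
        for definition, rest in ((tokens[:i], tokens[i:]),
                                 (tokens[i:], tokens[:i])):
            targets = SYNONYMS.get(" ".join(definition), set())
            for answer, derivation in wordplay(rest):
                if answer in targets:
                    solutions.append((answer, derivation))
    return solutions

print(solve("Spin broken shingle"))
print(solve("Initially babies are naked"))
```

Even this toy version has to try every split of the clue in both orders; with realistic indicator lists and a full dictionary, the number of candidate derivations grows quickly, which is the motivation for a probabilistic parser that explores the most promising parses first.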
To learn more, check out the code, all of which is available online right now. You can find the parsing code at https://github.com/rdeits/ChartParsers.jl and the solver itself at https://github.com/rdeits/CrypticCrosswords.jl