The Hangman Advisor provides advice to someone who is trying to guess the hidden word in Hangman. It analyzes the template and recommends the next letter to be guessed, using four different algorithms described below. Note that each algorithm may produce multiple recommendations, and that the number that accompanies each recommendation may be interpreted as a level of confidence in that recommendation (higher numbers are better).
The program accepts the following inputs:
a word template consisting of a sequence of letters and underscores. Each underscore represents a single unknown letter.
a (possibly empty) list of excluded characters. These are letters that have already been guessed and that do not appear in the hidden word.
The program finds all English words that (A) appear in the spelling dictionary, (B) match the template, and (C) do not contain any excluded characters. These words (up to the first 250) are reported by the program.
The program generates a list of candidate letters for subsequent guesses. These are letters that appear in words that match the template but that do not appear in the template or in the list of excluded characters.
If no letters have been guessed, the program will recommend the letter e.
Otherwise, the following four algorithms are exercised:
Algorithm #1: This algorithm, which uses data from a depth-1 search, recommends letters based on the expected number of (new) positions filled (per word that matched the original template).
Algorithm #2: This algorithm, which uses data from the same depth-1 search, recommends letters based on the ratio of the expected number of positions filled to the expected number of lives lost.
Algorithm #3: This algorithm is similar to Algorithm #1, but uses data from a depth-2 search.
Algorithm #4: This algorithm is similar to Algorithm #2, but uses data from a depth-2 search.
Note:
In the above, expected means average (the probabilisitic mean), with all words that match the specified template being treated as equally likely.
Algorithm #4 appears to be the best of these. I believe that algorithm #4 is a nearly optimal indicator of the best guess, but further work must be done to establish this.
There is no mechanism for specifying the number of lives remaining, i.e., the number of wrong guesses that would cause one to lose the game. A somewhat different algorithm would be needed if one is in imminent danger of losing—in particular, if one has only a single life remaining.
Results are sensitive to the choice of dictionary. There is currently no mechanism for a user to provide his/her own dictionary.
The following pseudo-code explains the operation of the depth-1 search, which generates the data for Algorithms 1 and 2:
Find all words that match the template and that do not contain any excluded letters. Store these as `words`. Find all letters that appear in `words` but not in the template. Store these as `candidate_letters`. for letter in candidate_letters: for word in words: if the candidate letter appears in the word: Update the count of template positions filled (underscores replaced by letters) with this candidate letter. Note that the same letter may appear in more than one position. else: Update the count of words on which a life was lost with this candidate letter. Report results.
The following pseudo-code explains the operation of the depth-2 search, which generates the data for Algorithms 3 and 4:
Find all words that match the template and that do not contain any excluded letters. Store these as `words`. Find all letters that appear in `words` but not in the template. Store these as `candidate_letters`. for letter1 in candidate_letters: candidate_letters2= candidate_letters with letter1 removed for word in words: # (all words that match the original template) if the candidate letter appears in the word: Update the count of template positions filled (underscores replaced by letters) with this candidate letter. Note that the same letter may appear in more than one position. else: Update the count of words on which a life was lost with this candidate letter. for letter2 in candidate_letters2: Repeat above counting process for letter2. For letter1, record best results over all possible letter2s. Report results.
Expectations are taken over all words that match the original template.
The number of positions filled and the number of lives lost must count letter 1 and letter 2 together.
If a given letter 2 did not fill a position in any word, it represents a wasted guess that would not have been made by an intelligent player having knowledge of the dictionary against which he/she is playing.
I ran the program with the template _a___
, with the single
excluded letter e
, using the small spelling dictionary. The
program reported 221 matching words. Results from algorithms 3 and 4 are
moderately interesting.
Algorithm #3: Considering the expected number of template positions filled in after two guesses, the following letter(s) are recommended: i (0.92), l (0.97), n (0.93), r (1.00), s (1.00), t (0.99), Algorithm #4: Considering the expected number of template positions filled in divided by the expected number of lives lost, the following letter(s) are recommended: l (0.85), r (0.93), s (0.93), t (0.89),
The seven English letters used with the greatest frequency are e, t, a, o, i, n, and s. e and a have already been guessed, so this leaves t, o, i, n, and s. Note that although o appears second in this list, algorithms 3 and 4 do not recommend it, which is consistent with the fact that o does not occur with high frequency in the words that match the template.
This example demonstrates that English letter frequencies are a reasonable starting point for deciding what letters to guess in Hangman, but that one can do better by performing a search against the dictionary of allowed words (assuming that such a dictionary exists and that the hidden word has been chosen at random from this dictionary).
Last update: 30 Mar., 2014