February 13th 2022
(tl;dr: The best 5 words are Nares, Lares, Kaies, Tares, and Canes, in that order.)
I grabbed a table of letter frequency (‘General Text’) from Wikipedia, but quickly realized that the letter frequencies for five letter words might be completely different than for general text. So I ran my own analysis on five letter words (‘5 Letter Words’).
The most common letter in five letter words is “S”, but “E” is still very popular!
For the first word you wouldn’t want any words with double or triple letters (lookin’ at you, Ninny, Tatty, etc.) because repeated letters reduce the total amount of information you’d get. Re-running the analysis on the list of five letter words without double and triple letter words reveals a slightly different frequency of letters (‘No Dupes’).
From the No Dupes list, the five letters with the highest frequency are S, E, A, R, & O. Since I’m no good at rearranging letters in my head, I turned to the Internet Anagram Server, which reveals one word: “Arose”. That seems to track pretty well with some other people’s picks.
“Arose” does an excellent job of meeting the letter frequencies, which gives it a high probability of returning green or yellow letters.
Letter | General Text | 5 Letter Words | No Dupes |
---|---|---|---|
A | 8.20% | 9.24% | 8.84% |
B | 1.50% | 2.51% | 2.44% |
C | 2.80% | 3.13% | 3.42% |
D | 4.30% | 3.78% | 3.95% |
E | 13.00% | 10.27% | 9.31% |
F | 2.20% | 1.72% | 1.56% |
G | 2.00% | 2.53% | 2.58% |
H | 6.10% | 2.71% | 2.96% |
I | 7.00% | 5.80% | 6.31% |
J | 0.15% | 0.45% | 0.46% |
K | 0.77% | 2.32% | 2.43% |
L | 4.00% | 5.20% | 5.20% |
M | 2.50% | 3.05% | 3.06% |
N | 6.70% | 4.55% | 4.91% |
O | 7.50% | 6.84% | 6.51% |
P | 1.90% | 3.11% | 3.11% |
Q | 0.10% | 0.17% | 0.22% |
R | 6.00% | 6.41% | 6.76% |
S | 6.30% | 10.28% | 9.44% |
T | 9.10% | 5.08% | 5.11% |
U | 2.80% | 3.87% | 4.32% |
V | 0.98% | 1.07% | 1.11% |
W | 2.40% | 1.60% | 1.77% |
X | 0.15% | 0.44% | 0.52% |
Y | 2.00% | 3.20% | 3.12% |
Z | 0.07% | 0.67% | 0.57% |
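The two five-letter columns above could be reproduced along these lines. This is only a minimal sketch, not the script actually used for the post: the function name `letter_frequencies`, the `skip_repeats` flag, and the toy word list are all made up for illustration, and a real run would load a full five letter word list instead.

```python
from collections import Counter

def letter_frequencies(words, skip_repeats=False):
    """Return {letter: share of all counted letters} for a list of words.

    With skip_repeats=True, words containing any repeated letter
    (e.g. "ninny", "tatty") are left out, as in the 'No Dupes' column.
    """
    counts = Counter()
    for word in words:
        word = word.lower()
        if skip_repeats and len(set(word)) < len(word):
            continue  # word has a double or triple letter
        counts.update(word)  # adds one count per letter occurrence
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

# Toy list for illustration only (hypothetical, not the real word list).
sample = ["arose", "ninny", "tatty", "fumes", "cares"]
all_words = letter_frequencies(sample)
no_dupes = letter_frequencies(sample, skip_repeats=True)
```

Multiplying each share by 100 gives percentages in the same form as the table.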
Positional Frequency:
Letter | 1 | 2 | 3 | 4 | 5 | Letter |
---|---|---|---|---|---|---|
A | 4.75% | 17.44% | 10.45% | 7.95% | 3.6% | A |
B | 7.37% | 0.5% | 2.34% | 1.49% | 0.5% | B |
C | 7.8% | 1.45% | 3.32% | 3.36% | 1.15% | C |
D | 5.46% | 0.58% | 2.61% | 3.82% | 7.27% | D |
E | 1.67% | 10.11% | 5.09% | 18.29% | 11.39% | E |
F | 5.3% | 0.13% | 0.83% | 1.11% | 0.42% | F |
G | 5.35% | 0.41% | 2.58% | 3.14% | 1.43% | G |
H | 3.8% | 4.7% | 0.94% | 2.13% | 3.26% | H |
I | 1.2% | 10.98% | 9.58% | 7.93% | 1.87% | I |
J | 1.56% | 0.07% | 0.38% | 0.28% | 0.01% | J |
K | 2.67% | 0.5% | 2.27% | 4.43% | 2.3% | K |
L | 4.78% | 5.97% | 6.37% | 5.47% | 3.41% | L |
M | 5.58% | 1.33% | 3.89% | 2.96% | 1.54% | M |
N | 2.25% | 2.69% | 8.05% | 6.81% | 4.73% | N |
O | 2.24% | 16.5% | 6.44% | 4.78% | 2.58% | O |
P | 7.01% | 1.81% | 2.58% | 2.96% | 1.21% | P |
Q | 0.78% | 0.14% | 0.13% | 0.02% | 0.02% | Q |
R | 4.67% | 8.11% | 10.24% | 5.42% | 5.35% | R |
S | 10.05% | 0.64% | 2.75% | 3.51% | 30.25% | S |
T | 6.08% | 1.66% | 4.64% | 6.61% | 6.57% | T |
U | 1.77% | 9.59% | 6.19% | 3.53% | 0.52% | U |
V | 2.09% | 0.34% | 1.98% | 1.12% | 0.04% | V |
W | 3.51% | 1.21% | 2.4% | 1.17% | 0.55% | W |
X | 0.11% | 0.47% | 1.21% | 0.12% | 0.67% | X |
Y | 1.44% | 2.4% | 1.8% | 0.87% | 9.11% | Y |
Z | 0.75% | 0.25% | 0.91% | 0.73% | 0.23% | Z |
If we want to maximize the number of green letters we could look at the letter frequency for each position in a 5 letter word.
“Arose” doesn’t have the best positional frequency. Unsurprisingly, the table reveals lots of words end in “ES”.
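A positional table like this one can be built by counting letters one column at a time. The sketch below is an assumption about how that might be done, not the original analysis: `positional_frequencies` and the four-word sample are hypothetical, and the post doesn’t say whether words with repeated letters were included, so this version keeps every word of the right length.

```python
from collections import Counter

def positional_frequencies(words, length=5):
    """Return a list with one {letter: share} dict per letter position."""
    columns = [Counter() for _ in range(length)]
    used = 0
    for word in words:
        word = word.lower()
        if len(word) != length:
            continue  # ignore anything that is not a five letter word
        used += 1
        for position, letter in enumerate(word):
            columns[position][letter] += 1
    # Each column is divided by the number of words counted,
    # so every column sums to 1.
    return [
        {letter: n / used for letter, n in column.items()}
        for column in columns
    ]

# Toy list for illustration only (hypothetical).
sample = ["arose", "fumes", "cares", "caves"]
table = positional_frequencies(sample)
last_letter_s = table[4]["s"]  # share of sample words ending in "s"
```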
We can score a given word using the positional frequency table. Take “Fumes” for example. We’ll take each letter’s percentage for its position (e.g. F’s percentage in column 1 is 5.3%), add them together, and divide by 5.
F | U | M | E | S | Total | Total/5 |
---|---|---|---|---|---|---|
5.3% | 9.59% | 3.89% | 18.29% | 30.25% | 67.32% | 13.46% |
This gives us a score of how well each word’s letters match the positional frequency.
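The “Fumes” arithmetic can be checked with a few lines of code. This is a sketch under stated assumptions: `positional_score` is a hypothetical helper, and `POSITIONAL` holds only the five percentages needed for this one word, copied from the positional frequency table.

```python
# One dict per position (columns 1 to 5). Only the entries needed for
# "fumes" are filled in; the values come from the positional table.
POSITIONAL = [
    {"f": 5.3},
    {"u": 9.59},
    {"m": 3.89},
    {"e": 18.29},
    {"s": 30.25},
]

def positional_score(word, table):
    """Average of each letter's percentage at the position it occupies."""
    total = sum(
        table[position].get(letter, 0.0)
        for position, letter in enumerate(word.lower())
    )
    return total / len(word)

fumes_score = positional_score("fumes", POSITIONAL)
print(round(fumes_score, 2))  # 13.46
```

Ranking the whole list is then a matter of scoring every word against the full table and sorting from highest to lowest.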
Here are the top 10 words generated with this method:
Looks like other people score “Cares” pretty well too. Scrolling way down the rankings, “Arose” is 5,891st.
I had one other thought about word ranking. If you guessed a word and none of the letters matched (all gray) then the list of words that remain can’t contain those letters. Words with more common letters will eliminate more words, whereas words with less common letters will eliminate fewer words.
If every letter in “Cares” was gray that would reduce the total number of possible words by 93.09%.
I looked at how many words would be eliminated if each word in the list was all gray, which, by the way, is 168,272,784 comparisons.
Some pretty uncommon words float to the top 10 of this list, but they all look like high quality starting words. Also “Arose” rose close to the top!
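The all-gray elimination idea could be sketched roughly like this. It is an illustration rather than the code behind the numbers above: `elimination_share` and the sample list are hypothetical, and the guess itself is counted as eliminated, which seems consistent with comparing every word against every word.

```python
def elimination_share(guess, words):
    """Share of words ruled out if every letter of guess comes back gray.

    A word is ruled out when it contains at least one letter of the guess.
    """
    guess_letters = set(guess.lower())
    eliminated = sum(1 for word in words if guess_letters & set(word.lower()))
    return eliminated / len(words)

# Toy list for illustration only (hypothetical).
sample = ["cares", "ninny", "tatty", "fumes", "moody"]
cares_share = elimination_share("cares", sample)

# Rank every word by how much of the list it would eliminate.
# sorted() is stable, so tied words keep their original order.
elimination_order = sorted(
    sample, key=lambda word: elimination_share(word, sample), reverse=True
)
```

Running this for every word against every other word is where the quadratic number of comparisons comes from.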
Now that we’ve got two solid ranking systems, let’s combine them and see what floats to the top. For this I’m taking a word’s position in each list, adding the two positions together, and sorting. For example, “Caves” is in position 81 of the positional frequency list and in position 853 of the elimination list, giving it a score of 934. After adding each word’s two positions together and sorting the list, we get the final combined scores list.
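Combining the two rankings by position can be sketched as follows. The function name `combined_ranking` and the three-word lists are made up for illustration, and the post doesn’t say how ties are broken, so this version simply leaves tied words in the order they appear in the first list.

```python
def combined_ranking(positional_order, elimination_order):
    """Add each word's 1-based position in both lists; lower is better."""
    positional_rank = {w: i for i, w in enumerate(positional_order, start=1)}
    elimination_rank = {w: i for i, w in enumerate(elimination_order, start=1)}
    combined = {
        word: positional_rank[word] + elimination_rank[word]
        for word in positional_rank
        if word in elimination_rank
    }
    # Sort by combined score, best (smallest) first.
    return sorted(combined.items(), key=lambda item: item[1])

# Toy rankings for illustration only (hypothetical, best word first).
positional_order = ["cares", "fumes", "arose"]
elimination_order = ["arose", "cares", "fumes"]
ranking = combined_ranking(positional_order, elimination_order)
print(ranking)  # [('cares', 3), ('arose', 4), ('fumes', 5)]
```

With the real lists, “Caves” would come out as 81 + 853 = 934, matching the example above.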
Since we’ve built out this system, we might as well look at the worst rated starting words.