Greg.Randall

Wordle: Best Starting Word

February 13th 2022

(tl;dr: The best 5 words are Nares, Lares, Kaies, Tares, and Canes, in that order.)

Letter Frequency

I grabbed a table of letter frequency (‘General Text’) from Wikipedia, but quickly realized that the letter frequencies for five letter words might be completely different than for general text. So I ran my own analysis on five letter words (‘5 Letter Words’).

The most common letter in five letter words is “S”, but “E” is still very popular!
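A minimal sketch of how a column like ‘5 Letter Words’ could be computed: count every letter across every word and express each count as a share of all letters. (That column sums to 100%, which fits this definition.) The five-word sample list is hypothetical; the post doesn’t say which word list it used.

```python
from collections import Counter


def letter_frequency(words):
    """Each letter's share of all letters in `words`, as a percentage."""
    counts = Counter(ch for word in words for ch in word.lower())
    total = sum(counts.values())
    return {letter: 100 * n / total for letter, n in counts.items()}


# Hypothetical sample; the real analysis would run over a full five-letter word list.
sample = ["arose", "cares", "fumes", "ninny", "tatty"]
freq = letter_frequency(sample)
for letter, pct in sorted(freq.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{letter.upper()} {pct:.2f}%")
```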

For the first word you wouldn’t want any double (or triple!) letters, lookin’ at you Ninny, Tatty, etc., because a repeated letter tests the same letter twice and reduces the total amount of information you get. Re-running the analysis on the list of five letter words without double and triple letter words reveals a slightly different frequency of letters (‘No Dupes’).
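The ‘No Dupes’ filter can be sketched as “keep a word only if it has as many distinct letters as it has letters,” then re-run the same frequency count. The helper names and the sample list are illustrative assumptions, not the author’s code.

```python
from collections import Counter


def has_repeated_letter(word):
    """True if any letter appears more than once (e.g. 'ninny', 'tatty')."""
    return len(set(word.lower())) < len(word)


def letter_frequency(words):
    """Each letter's share of all letters in `words`, as a percentage."""
    counts = Counter(ch for w in words for ch in w.lower())
    total = sum(counts.values())
    return {c: 100 * n / total for c, n in counts.items()}


words = ["arose", "cares", "fumes", "ninny", "tatty"]  # hypothetical sample
no_dupes = [w for w in words if not has_repeated_letter(w)]
print(no_dupes)  # ['arose', 'cares', 'fumes']
no_dupes_freq = letter_frequency(no_dupes)
```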

From the No Dupes list the five letters with the highest frequency are S, E, A, R, & O. Since I’m no good at rearranging letters in my head, I used the Internet Anagram Server, which reveals one word: “Arose”. That seems to track pretty well with what some other people have found.
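The post used the Internet Anagram Server for this step. As a swapped-in alternative (not the author’s method), you could pick the top five letters and search your own word list for words that use exactly those letters. The frequencies below are copied from the ‘No Dupes’ column; the candidate word list is hypothetical. Depending on the dictionary, more than one match can turn up: the elimination ranking later in this post includes Aeros and Soare, which use the same letters as Arose.

```python
def top_letters(freq, k=5):
    """The k letters with the highest frequency, most frequent first."""
    return sorted(freq, key=freq.get, reverse=True)[:k]


def anagrams(letters, words):
    """Words in `words` that use exactly the given letters, in any order."""
    target = sorted(letters)
    return [w for w in words if sorted(w) == target]


# A few percentages copied from the 'No Dupes' column.
no_dupes = {"s": 9.44, "e": 9.31, "a": 8.84, "r": 6.76, "o": 6.51, "i": 6.31, "t": 5.11}
best = top_letters(no_dupes)
print(best)  # ['s', 'e', 'a', 'r', 'o']

# Hypothetical candidate list; a real search would use the full dictionary.
matches = anagrams(best, ["arose", "aeros", "soare", "cares", "stoae"])
print(matches)  # ['arose', 'aeros', 'soare']
```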

“Arose” does an excellent job of matching the most frequent letters, which gives it a high probability of returning green or yellow letters.

Letter  General Text  5 Letter Words  No Dupes
A 8.20% 9.24% 8.84%
B 1.50% 2.51% 2.44%
C 2.80% 3.13% 3.42%
D 4.30% 3.78% 3.95%
E 13.00% 10.27% 9.31%
F 2.20% 1.72% 1.56%
G 2.00% 2.53% 2.58%
H 6.10% 2.71% 2.96%
I 7.00% 5.80% 6.31%
J 0.15% 0.45% 0.46%
K 0.77% 2.32% 2.43%
L 4.00% 5.20% 5.20%
M 2.50% 3.05% 3.06%
N 6.70% 4.55% 4.91%
O 7.50% 6.84% 6.51%
P 1.90% 3.11% 3.11%
Q 0.10% 0.17% 0.22%
R 6.00% 6.41% 6.76%
S 6.30% 10.28% 9.44%
T 9.10% 5.08% 5.11%
U 2.80% 3.87% 4.32%
V 0.98% 1.07% 1.11%
W 2.40% 1.60% 1.77%
X 0.15% 0.44% 0.52%
Y 2.00% 3.20% 3.12%
Z 0.07% 0.67% 0.57%

Positional Frequency
Letter  Position 1  Position 2  Position 3  Position 4  Position 5
A 4.75% 17.44% 10.45% 7.95% 3.6%
B 7.37% 0.5% 2.34% 1.49% 0.5%
C 7.8% 1.45% 3.32% 3.36% 1.15%
D 5.46% 0.58% 2.61% 3.82% 7.27%
E 1.67% 10.11% 5.09% 18.29% 11.39%
F 5.3% 0.13% 0.83% 1.11% 0.42%
G 5.35% 0.41% 2.58% 3.14% 1.43%
H 3.8% 4.7% 0.94% 2.13% 3.26%
I 1.2% 10.98% 9.58% 7.93% 1.87%
J 1.56% 0.07% 0.38% 0.28% 0.01%
K 2.67% 0.5% 2.27% 4.43% 2.3%
L 4.78% 5.97% 6.37% 5.47% 3.41%
M 5.58% 1.33% 3.89% 2.96% 1.54%
N 2.25% 2.69% 8.05% 6.81% 4.73%
O 2.24% 16.5% 6.44% 4.78% 2.58%
P 7.01% 1.81% 2.58% 2.96% 1.21%
Q 0.78% 0.14% 0.13% 0.02% 0.02%
R 4.67% 8.11% 10.24% 5.42% 5.35%
S 10.05% 0.64% 2.75% 3.51% 30.25%
T 6.08% 1.66% 4.64% 6.61% 6.57%
U 1.77% 9.59% 6.19% 3.53% 0.52%
V 2.09% 0.34% 1.98% 1.12% 0.04%
W 3.51% 1.21% 2.4% 1.17% 0.55%
X 0.11% 0.47% 1.21% 0.12% 0.67%
Y 1.44% 2.4% 1.8% 0.87% 9.11%
Z 0.75% 0.25% 0.91% 0.73% 0.23%

If we want to maximize the number of green letters we could look at the letter frequency for each position in a 5 letter word.

“Arose” doesn’t have the best positional frequency. Unsurprisingly, the table reveals lots of words end in “ES”.
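A positional table like the one above can be sketched by counting, for each position, how many words have each letter there and dividing by the number of words. The post doesn’t say whether it built this from all five letter words or from the No Dupes list, so this sketch just takes whatever list you pass in; the sample list is hypothetical. Positions are 0-based in the code (column 1 in the table is index 0).

```python
from collections import Counter


def positional_frequency(words, length=5):
    """For each position, each letter's share of words with that letter there, in %."""
    table = []
    for pos in range(length):
        counts = Counter(w[pos].lower() for w in words)
        table.append({c: 100 * n / len(words) for c, n in counts.items()})
    return table


words = ["arose", "cares", "fumes", "canes", "tares"]  # hypothetical sample
pos_freq = positional_frequency(words)
print(pos_freq[4]["s"])  # 80.0  (4 of the 5 sample words end in S)
```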

We can score a given word using the positional frequency table. Take “Fumes” for example. We’ll take each letter’s percentage for its position (e.g. F’s percentage in column 1 is 5.3%), add them together, and divide by 5.

F  U  M  E  S  Total  Total/5
5.3%  9.59%  3.89%  18.29%  30.25%  67.32%  13.46%

This gives us a score of how well each word’s letters match the positional frequency.
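The scoring rule is just an average of table lookups. This sketch copies only the table entries needed for “Fumes” and “Cares” and reproduces both scores; a letter missing from the table contributes 0 (an assumption for illustration).

```python
# Positional percentages copied from the table above (only the letters needed here).
POSITIONAL = [
    {"f": 5.30, "c": 7.80},   # position 1
    {"u": 9.59, "a": 17.44},  # position 2
    {"m": 3.89, "r": 10.24},  # position 3
    {"e": 18.29},             # position 4
    {"s": 30.25},             # position 5
]


def positional_score(word, table=POSITIONAL):
    """Average of each letter's frequency in its own position."""
    return sum(table[i].get(ch, 0.0) for i, ch in enumerate(word.lower())) / len(word)


print(f"{positional_score('Fumes'):.2f}%")  # 13.46%
print(f"{positional_score('Cares'):.2f}%")  # 16.80%
```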

Here are the top 10 words generated with this method:

  1. Cares – 16.80%
  2. Bares – 16.71%
  3. Pares – 16.64%
  4. Cores – 16.61%
  5. Bores – 16.53%
  6. Tares – 16.46%
  7. Pores – 16.46%
  8. Canes – 16.36%
  9. Mares – 16.36%
  10. Dares – 16.33%

Looks like other people score “Cares” pretty well too. Scrolling way down the rankings, “Arose” is 5,891st.
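Ranking every word (and finding a position like Arose’s) is a sort on that score. This self-contained sketch builds the positional counts from a hypothetical five-word sample, scores each word, and returns 1-based ranks; Python’s sort is stable, so tied words keep their input order.

```python
from collections import Counter


def rank_by_positional_score(words):
    """Rank words best-first by average positional frequency; returns {word: rank}."""
    n = len(words)
    # One Counter per position: how many words have each letter there.
    table = [Counter(w[i] for w in words) for i in range(5)]

    def score(word):
        return sum(100 * table[i][ch] / n for i, ch in enumerate(word)) / 5

    ordered = sorted(words, key=score, reverse=True)  # ties keep input order
    return {w: rank for rank, w in enumerate(ordered, start=1)}


words = ["arose", "cares", "bares", "fumes", "canes"]  # hypothetical sample
ranks = rank_by_positional_score(words)
print(ranks)  # {'cares': 1, 'bares': 2, 'canes': 3, 'fumes': 4, 'arose': 5}
```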


Elimination of Other Words

I had one other thought about word ranking. If you guessed a word and none of the letters matched (all gray) then the list of words that remain can’t contain those letters. Words with more common letters will eliminate more words, whereas words with less common letters will eliminate fewer words.

If every letter in “Cares” came back gray, that would reduce the total number of possible words by 93.09%.
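A minimal sketch of the all-gray check: any word that shares at least one letter with the guess is ruled out, so the eliminated share is the fraction of words whose letter set intersects the guess’s. The word list is a hypothetical sample, so the percentage won’t match the 93.09% above.

```python
def eliminated_if_all_gray(guess, words):
    """Percent of `words` sharing at least one letter with `guess`: the words
    ruled out if every letter of `guess` came back gray."""
    gray = set(guess.lower())
    removed = sum(1 for w in words if gray & set(w.lower()))
    return 100 * removed / len(words)


words = ["arose", "cares", "fumes", "ninny", "tatty", "whizz", "kudzu", "oxbow"]
print(eliminated_if_all_gray("cares", words))  # 50.0
```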

I looked at how many words would be eliminated if each word in the list came back all gray. That, by the way, is 168,272,784 comparisons (every one of the 12,972 words checked against all 12,972 words).
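Scoring the whole list this way is a brute-force loop of every guess against every word, which is why the comparison count grows with the square of the list size. This sketch precomputes each word’s letter set once so each comparison is a single set intersection; the sample list is hypothetical.

```python
def elimination_ranking(words):
    """Score every word by the percent of `words` an all-gray result would rule out.

    Brute force: each candidate guess is compared with every word in the list,
    so this does len(words) ** 2 set intersections.
    """
    letter_sets = [set(w) for w in words]  # build each word's letter set once
    scores = {}
    for guess, guess_letters in zip(words, letter_sets):
        removed = sum(1 for letters in letter_sets if guess_letters & letters)
        scores[guess] = 100 * removed / len(words)
    ordered = sorted(words, key=scores.get, reverse=True)  # most eliminated first
    return ordered, scores


words = ["arose", "cares", "ninny", "kudzu"]  # hypothetical sample
ordered, scores = elimination_ranking(words)
print(ordered)  # ['arose', 'cares', 'ninny', 'kudzu']
```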

Some pretty uncommon words float to the top 10 of this list, but they all look like high-quality starting words. Also “Arose” rose close to the top!

  1. Toeas – 95.72%
  2. Stoae – 95.72%
  3. Aloes – 95.68%
  4. Aeons – 95.64%
  5. Arose – 95.55%
  6. Aeros – 95.55%
  7. Soare – 95.55%
  8. Aesir – 95.17%
  9. Reais – 95.17%
  10. Serai – 95.17%

Best Rated Words

Now that we’ve got two solid ranking systems, let’s combine them and see what floats to the top. For this I’m adding together a word’s position in each list and sorting by that total, lowest first. For example, “Caves” is in position 81 of the positional frequency list and position 853 of the elimination list, giving it a score of 81 + 853 = 934. Sorting every word by its combined score gives the final list.

  1. Nares
  2. Lares
  3. Kaies
  4. Tares
  5. Canes
  6. Cares
  7. Lanes
  8. Rales
  9. Rates
  10. Tales
  11. Cates
  12. Hares
  13. Lores
  14. Nates
  15. Taces
  16. Manes
  17. Rones
  18. Mares
  19. Races
  20. Yates
  21. Panes
  22. Pares
  23. Gares
  24. Aures
  25. Roles
  26. Yales
  27. Dates
  28. Roues
  29. Aunes
  30. Dares
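The rank-sum combination can be sketched in a few lines. The positions below are the ones quoted in this post (Caves: 81st and 853rd; Arose: 5,891st and 5th). Breaking ties alphabetically is an assumption; the post doesn’t say how equal totals are ordered.

```python
def combined_ranking(positional_rank, elimination_rank):
    """Add each word's position in the two rankings; a lower total is better."""
    common = positional_rank.keys() & elimination_rank.keys()
    totals = {w: positional_rank[w] + elimination_rank[w] for w in common}
    # Tie-break alphabetically (an assumption; the post doesn't specify).
    ordered = sorted(totals, key=lambda w: (totals[w], w))
    return ordered, totals


# Positions quoted in the post.
positional_rank = {"caves": 81, "arose": 5891}
elimination_rank = {"caves": 853, "arose": 5}
ordered, totals = combined_ranking(positional_rank, elimination_rank)
print(ordered, totals["caves"])  # ['caves', 'arose'] 934
```

Reading the same sorted list from the bottom instead of the top gives the worst-rated words.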

Worst Rated Words

Since we’ve built out this system, we might as well look at the worst rated starting words.

  1. Oxbow
  2. Xylyl
  3. Immix
  4. Infix
  5. Fluff
  6. Ungum
  7. Undug
  8. Whizz
  9. Urubu
  10. Uhuru
  11. Cwtch
  12. Ictic
  13. Chuff
  14. Whiff
  15. Jugum
  16. Kudzu
  17. Whump
  18. Phpht
  19. Zhomo
  20. Gyppo
  21. Ghyll