So, it turns out there’s a new text-to-speech engine on the block, and it’s really good. The voices sound pretty human: they take little breaths, make sensible pauses, and generally produce well-formed speech. I wouldn’t say they’re as good as a great audiobook reader, but they’re better than the average one. And you get to run it on your local machine.
There are also some new Edge-TTS voices which, if you haven’t played with them, are really quite good, free, and super-fast. There are 300+ voices in many languages, including about 50 for English. Most of the voices’ “VoicePersonalities” are set to “Friendly, Positive”, but some new ones list things like warm, confident, authentic, honest, and rational.
I wanted to compare the voices, so I wrote a little tool that swaps between voices (code) but keeps your place in the recording. I had to spend a while getting the volume of all the recordings to be identical; I noticed that if a voice was slightly louder, I strongly preferred it, so I wrote a bit of code to fix that. I also struggled to find a piece of text that made a good test of the TTS engines. I picked a passage from The Fall of the House of Usher, which is out of copyright and has some uncommon words.
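A simple way to even out volume differences is to rescale every clip to the same RMS level. This is a minimal sketch with NumPy, not the tool’s actual code, and plain RMS is only a rough stand-in for perceived loudness:

```python
import numpy as np

def normalize_rms(samples: np.ndarray, target_rms: float = 0.1) -> np.ndarray:
    """Scale a mono float waveform so its RMS level equals target_rms."""
    rms = np.sqrt(np.mean(samples ** 2))
    if rms == 0:
        return samples  # silence: nothing to scale
    return samples * (target_rms / rms)

# Two clips recorded at very different levels end up equally "loud":
quiet = 0.01 * np.sin(np.linspace(0, 100, 16000))
loud = 0.5 * np.sin(np.linspace(0, 100, 16000))
quiet_leveled = normalize_rms(quiet)
loud_leveled = normalize_rms(loud)
```

A fancier version would use a perceptual loudness measure like LUFS, but matching RMS already removes the “louder sounds better” bias.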
Then I decided what I really needed was a blind A/B comparison of the voices to see which one was best, which in turn required some way to rank the choices. Initially I wrote some code to count the win/loss ratio for each voice, but that seemed like not the best way, so I worked up an Elo chess-style rating system to sort the results.
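The Elo update itself is only a few lines. Here’s a minimal sketch of the standard formula, seeding every voice at 1000; the K-factor of 32 is my assumption, not necessarily what the tool uses:

```python
def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated (r_a, r_b) after one A/B comparison."""
    # Standard Elo logistic expectation for player A.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    return (r_a + k * (score_a - expected_a),
            r_b + k * ((1.0 - score_a) - (1.0 - expected_a)))

# Fold one blind-test result into the ratings:
ratings = {"af_bella": 1000.0, "bm_lewis": 1000.0}
ratings["af_bella"], ratings["bm_lewis"] = elo_update(
    ratings["af_bella"], ratings["bm_lewis"], a_won=True)
```

With equal starting ratings the winner gains exactly k/2 points, and upsets against higher-rated voices move the numbers more than expected wins do.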
I’ve run almost 200 blind A/B tests so far. I find the results pretty believable, but more would probably be better. If you want the quick TL;DR: try af_bella and bm_lewis.
I hope to share the code for the testing/ranking soon, but it’s all still in flux. It’s actually really hard to decide which pairing of voices will give you the most information.
I’ve been interested in converting photographic/video imagery into something that looks more like a drawing or painting for many years. About 10 years ago I started experimenting with piping video, frame by frame, through Photoshop to give it a variety of non-photographic effects.
The issue with the Photoshop manipulations is that there’s no way for Photoshop to understand the content of the images in a meaningful way: the pixels that make up a face are as relevant to the output as the trees in the background. For the drawing experiments I got around that to some degree by using a white backdrop, but the issue remains. (I know, I know, I need better testing footage, but sometimes you just want to start, so you shoot something.)
I recently started looking back into image translation while playing around with making an E-paper picture frame. The E-paper display I picked out can only display black or white, so you’re stuck with some kind of dithering to emulate grays; but if you could create a drawing, you’d be able to use just the black and white.
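For the dithering route, Floyd-Steinberg error diffusion is the classic choice. Here’s a minimal NumPy sketch, not tied to any particular E-paper driver:

```python
import numpy as np

def floyd_steinberg(gray: np.ndarray) -> np.ndarray:
    """Dither a grayscale image (floats in 0..1) to pure black/white.

    The quantization error at each pixel is diffused to unvisited
    neighbors with the classic 7/16, 3/16, 5/16, 1/16 weights.
    """
    img = gray.astype(float).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out

# A mid-gray patch comes out roughly half black, half white:
mid_gray = np.full((32, 32), 0.5)
dithered = floyd_steinberg(mid_gray)
```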
I played with a few online tools, and they’ve definitely gotten better. There are some that respond to the content of the image in a way that is pretty interesting, though the styles they have available are pretty limited, and of course you don’t really have much control over what they are doing.
I was flipping through Reddit posts about creating drawings from photos, but they were mostly just “use website x, y, or z”. I did find a post that mentioned Automatic1111, but it’s kind of a mess to work with. Still, reading more about Automatic1111 brought me to InstructPix2Pix, and their GitHub has info on how to run it in Python!
Now I had everything I needed to run some initial experiments, so I wrote a bit of code to take art styles (Expressionism, Cubism, Art Nouveau, Technical Drawing, Scientific Illustration, Pointillism, Hatching, Chiaroscuro, and on and on) and mash them up with a list of art adjectives (textured, smooth, rough, dreamy, mysterious, whimsical, flowing, static, rhythmic, and on and on) so I could give InstructPix2Pix a prompt like “Make this photograph into an Art Nouveau work that’s textured and whimsical.” I ran 500 variations and looked them over. Some were interesting; most were not.
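The mashup step can be sketched like this. The lists here are short excerpts of the ones above, the two-adjective pairing is just one reasonable choice, and the real run sampled 500 prompts rather than 5:

```python
import itertools
import random

styles = ["Expressionism", "Cubism", "Art Nouveau", "Pointillism", "Hatching"]
adjectives = ["textured", "smooth", "dreamy", "whimsical", "flowing"]

def build_prompts(n: int, seed: int = 0) -> list[str]:
    """Pair every style with every two-adjective combo, then sample n prompts."""
    combos = [
        (style, a1, a2)
        for style in styles
        for a1, a2 in itertools.combinations(adjectives, 2)
    ]
    random.seed(seed)
    picked = random.sample(combos, min(n, len(combos)))
    return [
        f"Make this photograph into a {style} work that's {a1} and {a2}."
        for style, a1, a2 in picked
    ]

prompts = build_prompts(5)
```

Each prompt then gets fed to InstructPix2Pix along with the source photo.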
I have found some really odd and funny issues: if you prompt InstructPix2Pix with something like “Make this in the style of van Gogh”, a lot of the time it’ll turn the people in the picture into van Gogh.
I took some interesting style and adjective variations and started iterating on them, but the process wasn’t easy. So I decided to write a bit of code to create a sort of prompt tree. The user enters a prompt, and with a call to OpenAI we transform it into 5 similar prompts, each generating a picture; the user selects their favorite picture and maybe adds a bit more direction, then 5 more pictures are generated, and so on. This part is still a work in progress, but hopefully there will be some code on GitHub soon.
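The tree part of the loop might look something like this sketch, where the OpenAI call is replaced by a hypothetical `make_variations` stub so the logic stands alone:

```python
from dataclasses import dataclass, field

@dataclass
class PromptNode:
    """One node in the prompt tree: a prompt plus its child variations."""
    prompt: str
    children: list["PromptNode"] = field(default_factory=list)

def make_variations(prompt: str, extra: str = "") -> list[str]:
    """Stub for the OpenAI call that would rewrite a prompt five ways.

    The real version would ask a chat model for five paraphrases; here
    we just tag the prompt so the tree logic is testable offline.
    """
    base = f"{prompt} {extra}".strip()
    return [f"{base} (variation {i + 1})" for i in range(5)]

def expand(node: PromptNode, extra_direction: str = "") -> PromptNode:
    """Add five child prompts; the user then picks one to expand next round."""
    for variant in make_variations(node.prompt, extra_direction):
        node.children.append(PromptNode(variant))
    return node

root = expand(PromptNode("a whimsical Art Nouveau portrait"))
```

Each child prompt would be rendered to an image, and the user’s pick (plus any extra direction) becomes the node that gets expanded next.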
Finally, I’ve tried applying some of these styles to video. The results are interesting but not consistent across frames. I think with more work I might be able to increase the consistency: very tight, direct prompting generated by the prompt tree above should help, as should some additional preprocessing of frames to remove noise and increase contrast.
Tell the tool what factors are important to you, and it shows you the best cities based on those factors. You can adjust how much each factor matters by changing the percentages. For example, if having lots of parks nearby is important to you, you can give that a higher percentage. If you don’t care about population density, you can set that one to zero.
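Under the hood this kind of ranking is just a weighted average. Here’s a minimal sketch; the city names and factor scores are invented for illustration (the real tool uses the data sources listed below):

```python
# Hypothetical factor scores on a 0-100 scale.
cities = {
    "Minneapolis": {"parks": 87, "cost": 62, "arts": 70},
    "Austin":      {"parks": 55, "cost": 58, "arts": 75},
    "Pittsburgh":  {"parks": 70, "cost": 80, "arts": 60},
}

def rank_cities(weights: dict[str, float]) -> list[tuple[str, float]]:
    """Score each city as a weighted average of its factors, best first."""
    total = sum(weights.values())
    results = [
        (name, sum(scores[f] * w for f, w in weights.items()) / total)
        for name, scores in cities.items()
    ]
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Put all the weight on parks; zeroed factors simply drop out.
best = rank_cities({"parks": 100, "cost": 0, "arts": 0})
```

Setting a factor’s weight to zero removes it from the ranking, matching the behavior described above.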
The URL changes as you change your numbers so if you want to share, just copy the URL!
Here’s a quick rundown of where the data comes from:
Cost of Living: Compiled from multiple sources, in order of preference: Economic Policy Institute, Payscale, AdvisorSmith, Numbeo, and BestPlaces.
Food Awards: Based on the James Beard Awards, which recognize excellent restaurants and chefs.
Arts Vibrancy: Data sourced from SMU DataArts’ Arts Vibrancy Map, based on counties. If a city lies in multiple counties, the score is proportionally assigned.
Park Walkability: Percentage of residents within a 10-minute walk to a park, sourced from the Trust for Public Land (TPL). Initially tried Walk Score, but it overemphasized downtown areas.
Climate Risk: Based on the table in the ProPublica article. The score is calculated by summing six different scores for each county, then converting from counties to cities proportionally, as mentioned above.
Days Above 90F & Days Below 32F: Counted the number of days above/below the threshold over the last 10 years, then divided by 10 to get an average number of days per year.
(TL;DR: the best 5 words are Nares, Lares, Kaies, Tares, and Canes, in that order.)
I grabbed a table of letter frequency (‘General Text’) from Wikipedia, but quickly realized that the letter frequencies for five letter words might be completely different than for general text. So I ran my own analysis on five letter words (‘5 Letter Words’).
The most common letter in five letter words is “S”, but “E” is still very popular!
For the first word you wouldn’t want any double (or triple) letters (lookin’ at you, Ninny, Tatty, etc.) because they reduce the total amount of information you’d get. Re-running the analysis on the list of five-letter words without double and triple letters reveals a slightly different frequency of letters (‘No Dupes’).
From the No Dupes list, the five letters with the highest frequency are S, E, A, R, & O. Since I’m no good at rearranging letters in my head, I turned to the Internet Anagram Server, which reveals one word: “Arose”. That seems to track pretty well with some other people.
“Arose” does an excellent job of meeting the letter frequencies, which gives it a high probability of returning green or yellow letters.
Letter | General Text | 5 Letter Words | No Dupes |
---|---|---|---|
A | 8.20% | 9.24% | 8.84% |
B | 1.50% | 2.51% | 2.44% |
C | 2.80% | 3.13% | 3.42% |
D | 4.30% | 3.78% | 3.95% |
E | 13.00% | 10.27% | 9.31% |
F | 2.20% | 1.72% | 1.56% |
G | 2.00% | 2.53% | 2.58% |
H | 6.10% | 2.71% | 2.96% |
I | 7.00% | 5.80% | 6.31% |
J | 0.15% | 0.45% | 0.46% |
K | 0.77% | 2.32% | 2.43% |
L | 4.00% | 5.20% | 5.20% |
M | 2.50% | 3.05% | 3.06% |
N | 6.70% | 4.55% | 4.91% |
O | 7.50% | 6.84% | 6.51% |
P | 1.90% | 3.11% | 3.11% |
Q | 0.10% | 0.17% | 0.22% |
R | 6.00% | 6.41% | 6.76% |
S | 6.30% | 10.28% | 9.44% |
T | 9.10% | 5.08% | 5.11% |
U | 2.80% | 3.87% | 4.32% |
V | 0.98% | 1.07% | 1.11% |
W | 2.40% | 1.60% | 1.77% |
X | 0.15% | 0.44% | 0.52% |
Y | 2.00% | 3.20% | 3.12% |
Z | 0.07% | 0.67% | 0.57% |
Positional Frequency

Letter | 1 | 2 | 3 | 4 | 5 | Letter |
---|---|---|---|---|---|---|
A | 4.75% | 17.44% | 10.45% | 7.95% | 3.6% | A |
B | 7.37% | 0.5% | 2.34% | 1.49% | 0.5% | B |
C | 7.8% | 1.45% | 3.32% | 3.36% | 1.15% | C |
D | 5.46% | 0.58% | 2.61% | 3.82% | 7.27% | D |
E | 1.67% | 10.11% | 5.09% | 18.29% | 11.39% | E |
F | 5.3% | 0.13% | 0.83% | 1.11% | 0.42% | F |
G | 5.35% | 0.41% | 2.58% | 3.14% | 1.43% | G |
H | 3.8% | 4.7% | 0.94% | 2.13% | 3.26% | H |
I | 1.2% | 10.98% | 9.58% | 7.93% | 1.87% | I |
J | 1.56% | 0.07% | 0.38% | 0.28% | 0.01% | J |
K | 2.67% | 0.5% | 2.27% | 4.43% | 2.3% | K |
L | 4.78% | 5.97% | 6.37% | 5.47% | 3.41% | L |
M | 5.58% | 1.33% | 3.89% | 2.96% | 1.54% | M |
N | 2.25% | 2.69% | 8.05% | 6.81% | 4.73% | N |
O | 2.24% | 16.5% | 6.44% | 4.78% | 2.58% | O |
P | 7.01% | 1.81% | 2.58% | 2.96% | 1.21% | P |
Q | 0.78% | 0.14% | 0.13% | 0.02% | 0.02% | Q |
R | 4.67% | 8.11% | 10.24% | 5.42% | 5.35% | R |
S | 10.05% | 0.64% | 2.75% | 3.51% | 30.25% | S |
T | 6.08% | 1.66% | 4.64% | 6.61% | 6.57% | T |
U | 1.77% | 9.59% | 6.19% | 3.53% | 0.52% | U |
V | 2.09% | 0.34% | 1.98% | 1.12% | 0.04% | V |
W | 3.51% | 1.21% | 2.4% | 1.17% | 0.55% | W |
X | 0.11% | 0.47% | 1.21% | 0.12% | 0.67% | X |
Y | 1.44% | 2.4% | 1.8% | 0.87% | 9.11% | Y |
Z | 0.75% | 0.25% | 0.91% | 0.73% | 0.23% | Z |
If we want to maximize the number of green letters we could look at the letter frequency for each position in a 5 letter word.
“Arose” doesn’t have the best positional frequency. Unsurprisingly, the table reveals lots of words end in “ES”.
We can score a given word using the positional frequency table. Take “Fumes” for example. We’ll take each positional percentage (i.e. F’s percentage in column 1 is 5.3%), add them together, and divide by 5.
F | U | M | E | S | Total | Total/5 |
---|---|---|---|---|---|---|
5.3% | 9.59% | 3.89% | 18.29% | 30.25% | 67.32% | 13.46% |
This gives us a score of how well each word’s letters match the positional frequency.
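The scoring drops straight out of the positional table. The dictionary below carries only the letters needed for the “Fumes” example:

```python
# A slice of the positional-frequency table above (percent, positions 1-5).
POS_FREQ = {
    "f": [5.30, 0.13, 0.83, 1.11, 0.42],
    "u": [1.77, 9.59, 6.19, 3.53, 0.52],
    "m": [5.58, 1.33, 3.89, 2.96, 1.54],
    "e": [1.67, 10.11, 5.09, 18.29, 11.39],
    "s": [10.05, 0.64, 2.75, 3.51, 30.25],
}

def positional_score(word: str) -> float:
    """Average the per-position frequencies of a word's letters."""
    return sum(POS_FREQ[ch][i] for i, ch in enumerate(word.lower())) / len(word)

score = positional_score("fumes")  # about 13.46, matching the table above
```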
Here are the top 10 words generated with this method:
Looks like other people score “Cares” pretty well too. Scrolling way down the rankings, “Arose” is 5,891st.
I had one other thought about word ranking. If you guessed a word and none of the letters matched (all gray) then the list of words that remain can’t contain those letters. Words with more common letters will eliminate more words, whereas words with less common letters will eliminate fewer words.
If every letter in “Cares” was gray that would reduce the total number of possible words by 93.09%.
I looked at how many words would be eliminated if each word in the list came back all gray. That, by the way, is 168,272,784 comparisons.
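The elimination count for each guess can be sketched like this; the word list is a tiny stand-in for the full dictionary, which is what pushes the real comparison count into the hundreds of millions:

```python
# Toy word list; the real run used the full five-letter dictionary.
words = ["cares", "fumes", "ninny", "ghost", "lumpy"]

def eliminated_if_all_gray(guess: str) -> int:
    """Count words sharing at least one letter with the guess.

    If every letter in the guess comes back gray, any word containing
    one of those letters is impossible and gets eliminated.
    """
    letters = set(guess)
    return sum(1 for w in words if w != guess and letters & set(w))

counts = {w: eliminated_if_all_gray(w) for w in words}
```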
Some pretty uncommon words float to the top 10 of this list, but they all look like high quality starting words. Also “Arose” rose close to the top!
Now that we’ve got two solid ranking systems, let’s combine them and see what floats to the top. For this I’m taking each word’s position in the two lists, adding them together, and sorting. For example, “Caves” is in position 81 of the positional frequency list and position 853 of the elimination list, giving it a score of 934. After scoring all the words and sorting the list, we get the final combined scores list.
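The combining step is just rank addition. This sketch uses “Caves”’s positions and “Arose”’s positional rank from the post; the remaining numbers are invented for illustration:

```python
# Rank positions in each list (1 = best).
positional_rank = {"caves": 81, "cares": 3, "arose": 5891}
elimination_rank = {"caves": 853, "cares": 40, "arose": 12}

def combined_scores() -> list[tuple[str, int]]:
    """Sum each word's position in both lists; lower is better."""
    combined = {
        w: positional_rank[w] + elimination_rank[w] for w in positional_rank
    }
    return sorted(combined.items(), key=lambda pair: pair[1])

ranking = combined_scores()  # "caves" scores 81 + 853 = 934
```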
Since we’ve built out this system, we might as well look at the worst rated starting words.
This is the third part of the reverse engineering Oatly Project (Part 1 & Part 2). I’ve done more experimentation and simplified the recipe. Make sure to read the first two parts for more context!
I bought a Brix Refractometer to measure the amount of sugar over time, and found that I could do a single cooking temperature for one hour and get similar sugar levels to the previous recipe.
Also, I made corn milk with this recipe. It wasn’t great. Maybe some more experimentation would help. Weirdly, it smelled exactly like the oat milk.
Ingredient | Amount |
---|---|
Water | 680 g |
Rolled Oats | 80 g |
Malted Barley | 8 g |
Canola Oil | 22 g |
Salt | 1 g |
*Temperature is critical here; this recipe will be hard to reproduce without an immersion circulator. If the temperature goes much beyond 150F (65C) the enzymes will denature and stop converting the starches in the oats to sugars.
**The toasting is different for different oats. Preheat your oven to 250F (121C), put some oats on a cookie sheet, and set a timer for 4 minutes. At 4 minutes, pull a few oats out to taste. The oats are done when they have a hint of roastiness with no bitter/burnt/off flavor. If they aren’t done, give them another 4 minutes. Different brands of oats we’ve tested have required between 4 and 12 minutes. Instant oats seem to need longer, while fancier non-instant oats need less time.