---
title: 'Translations of Photographs with “timbrooks/instruct-pix2pix”'
date: '2025-01-02T19:59:56+00:00'
type: post
word_count: 685
char_count: 4176
tokens: 891
categories:
  - Uncategorized
---

# Translations of Photographs with “timbrooks/instruct-pix2pix”

I’ve been interested for many years in converting photographic and video imagery into something that looks more like a drawing or painting. About 10 years ago I started experimenting with piping video through Photoshop frame by frame, manipulating it to produce a variety of non-photographic effects.





The issue with the Photoshop manipulations is that Photoshop has no way to understand the content of the images in a meaningful way: the pixels that make up a face are as relevant to the output as the trees in the background. For the drawing experiments I got around that to some degree by using a white backdrop, but the issue remains. (I know, I know, I need better testing footage, but sometimes you just want to start, and you shoot something.)





![](https://gregr.org/wp-content/uploads/2025/01/Screenshot-2025-01-02-101734.png)Epaper Display

I recently started looking back into image translation while playing around with making an Epaper picture frame. The Epaper display I picked out can only display black or white, so you’re stuck with some kind of dithering to emulate grays. But if you could turn the photo into a drawing, you’d be able to use just the black and white.
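
To make the dithering trade-off concrete, here is a small sketch that compares a hard threshold with Floyd-Steinberg error diffusion on a grayscale image stored as a plain list of rows. The function names and the flat test image are made up for illustration; in practice a library such as Pillow would handle this (its `Image.convert("1")` dithers by default).

```python
def threshold(gray, cutoff=128):
    """Hard split: every pixel becomes pure black (0) or pure white (255)."""
    return [[255 if p >= cutoff else 0 for p in row] for row in gray]


def floyd_steinberg(gray):
    """1-bit Floyd-Steinberg dithering on a 2D list of 0-255 values."""
    h, w = len(gray), len(gray[0])
    buf = [[float(p) for p in row] for row in gray]  # working copy that absorbs error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            err = old - new
            # Push the rounding error onto pixels that have not been visited yet.
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out


# A flat, slightly dark gray patch (hypothetical test data).
flat_gray = [[100] * 8 for _ in range(8)]
hard = threshold(flat_gray)          # collapses to solid black
dithered = floyd_steinberg(flat_gray)  # mixes black and white to fake the gray
```

The threshold throws the gray away entirely, while the dithered version keeps it as a speckle pattern. A line drawing sidesteps both problems because it is already black and white.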

![](https://gregr.org/wp-content/uploads/2025/01/unnamed-2.png)<https://portraitart.app/>

I played with a few online tools, and they’ve definitely gotten better. There are some that respond to the content of the image in a way that is pretty interesting, though the styles they have available are pretty limited, and of course you don’t really have much control over what they are doing.

I was flipping through Reddit posts about creating drawings from photos, but they were mostly just “use website x, y, or z”. I did find a post that mentioned [Automatic1111](https://www.reddit.com/r/StableDiffusion/comments/10jqkd5/sketch_function_in_automatic1111/), but it’s kind of a mess to work with. Reading more about Automatic1111 did bring me to [InstructPix2Pix](https://www.reddit.com/r/MachineLearning/comments/10nxqfg/r_instructpix2pix_learning_to_follow_image/), though, and their GitHub has info about [how to run it in Python](https://github.com/timothybrooks/instruct-pix2pix?tab=readme-ov-file#instructpix2pix-in--diffusers)!!
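
Going by the diffusers section of that README, a minimal run looks roughly like the sketch below. The helper names `load_pipeline` and `stylize`, the step count, and the guidance value are illustrative placeholders rather than anything from the repo, and the sketch assumes `torch`, `diffusers`, and Pillow are installed and a CUDA GPU is available.

```python
MODEL_ID = "timbrooks/instruct-pix2pix"


def load_pipeline(device="cuda"):
    """Load the InstructPix2Pix pipeline (downloads the weights on first use)."""
    # Imported here so the heavy dependencies load only when actually needed.
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, safety_checker=None
    )
    return pipe.to(device)


def stylize(pipe, photo_path, prompt, steps=20, image_guidance_scale=1.5):
    """Apply one text instruction to one photo and return the edited PIL image."""
    from PIL import Image

    image = Image.open(photo_path).convert("RGB")
    result = pipe(
        prompt,
        image=image,
        num_inference_steps=steps,
        # Higher values keep the output closer to the input photo.
        image_guidance_scale=image_guidance_scale,
    )
    return result.images[0]
```

Usage would be something like `stylize(load_pipeline(), "photo.jpg", "Make this a pencil sketch").save("sketch.png")`, where the file names are placeholders.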

Now I had everything I needed to run some initial experiments, so I wrote a bit of code that takes art styles (Expressionism, Cubism, Art Nouveau, Technical Drawing, Scientific Illustration, Pointillism, Hatching, Chiaroscuro, and on and on) and mashes the words up with a list of art adjectives (textured, smooth, rough, dreamy, mysterious, whimsical, flowing, static, rhythmic, and on and on) so I could give InstructPix2Pix a prompt like “Make this photograph into an **Art Nouveau** work that’s **textured** and **whimsical**.” I ran 500 variations and looked them over. Some were interesting, most were not.
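
A minimal sketch of that kind of style-and-adjective mashup might look like this. The lists are shortened, and `article` and `build_prompts` are names invented for the example; it is not the original script.

```python
import itertools
import random

STYLES = ["Expressionism", "Cubism", "Art Nouveau", "Pointillism", "Chiaroscuro"]
ADJECTIVES = ["textured", "smooth", "dreamy", "mysterious", "whimsical", "rhythmic"]


def article(word):
    """Pick 'a' or 'an' so the sentence reads naturally."""
    return "an" if word[0].lower() in "aeiou" else "a"


def build_prompts(styles, adjectives, count, seed=0):
    """Return up to `count` unique prompts, each pairing a style with two adjectives."""
    combos = [
        (style, first, second)
        for style in styles
        for first, second in itertools.combinations(adjectives, 2)
    ]
    # Seeded shuffle so a batch can be regenerated and compared later.
    random.Random(seed).shuffle(combos)
    return [
        f"Make this photograph into {article(style)} {style} work "
        f"that's {first} and {second}."
        for style, first, second in combos[:count]
    ]


prompts = build_prompts(STYLES, ADJECTIVES, count=10)
```

With these short lists there are only 75 possible prompts, so getting to 500 variations needs longer lists (or more adjectives per prompt).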

![](https://gregr.org/wp-content/uploads/2025/01/Screenshot-2025-01-02-103531.png)Sample Output from InstructPix2Pix

![van Goghes](https://gregr.org/wp-content/uploads/2025/01/van-gogh.jpg)

I have found some really odd and funny issues. If you prompt InstructPix2Pix with something like “Make this in the style of van Gogh”, a lot of the time it’ll make the people in the picture **into van Gogh**.

I took some interesting style and adjective variations and started iterating on them, but the process wasn’t easy. So, I decided to write a bit of code to create a sort of prompt tree. The user enters a prompt, and a call to OpenAI transforms it into 5 similar prompts, each generating a picture. The user selects their favorite picture and maybe adds a bit more direction, then 5 more pictures are generated, and so on. This part is still a work in progress, but hopefully there will be some code on GitHub soon.
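
The shape of that prompt tree can be sketched without committing to any particular API. In the sketch below, `vary` stands in for the language-model call that rewrites a prompt and `render` stands in for the InstructPix2Pix call; both are passed in as plain functions, and the stubs and all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromptNode:
    prompt: str
    image: Optional[object] = None
    children: List["PromptNode"] = field(default_factory=list)


def expand(node, vary, render, direction="", fanout=5):
    """Generate `fanout` child prompts from a node, rendering one image per child."""
    # Fold the user's extra direction (if any) into the prompt before varying it.
    base = f"{node.prompt} {direction}".strip()
    node.children = [
        PromptNode(prompt=p, image=render(p)) for p in vary(base, fanout)
    ]
    return node.children


# Stand-ins so the sketch runs without a model or an API key.
def stub_vary(prompt, n):
    return [f"{prompt} (variation {i + 1})" for i in range(n)]


def stub_render(prompt):
    return f"<image for: {prompt}>"


root = PromptNode("Make this a pen-and-ink drawing.")
first_round = expand(root, stub_vary, stub_render)
favorite = first_round[2]  # the user's pick
second_round = expand(
    favorite, stub_vary, stub_render, direction="Use heavier hatching."
)
```

Keeping every node in the tree, rather than only the current favorite, makes it possible to back up and try a different branch later.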

Finally, I’ve tried applying some of these styles to video. Results are interesting but not consistent across frames. I think with more work I might be able to increase consistency: very tight and direct prompting generated by the prompt tree above should help, as should some additional preprocessing of frames to remove noise and increase contrast.
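
As a rough idea of what that preprocessing could be, here is a sketch of a 3x3 median filter followed by a linear contrast stretch on a grayscale frame stored as a list of rows. This is one possible choice of filters, not a tested recipe for frame consistency, and the function names are invented for the example. On real frames, Pillow's `ImageFilter.MedianFilter` and `ImageOps.autocontrast` do the same kind of job much faster.

```python
from statistics import median


def median_filter(gray):
    """3x3 median filter; edge pixels use whichever neighbours exist."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def stretch_contrast(gray):
    """Linearly rescale so the darkest pixel is 0 and the brightest is 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat frame: nothing to stretch
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray]


def preprocess(gray):
    """Denoise first, then stretch, so a stray speck does not set the range."""
    return stretch_contrast(median_filter(gray))


# A single bright speck on a flat background (hypothetical test data).
speck = [[100, 100, 100], [100, 255, 100], [100, 100, 100]]
cleaned = median_filter(speck)
stretched = stretch_contrast([[50, 100], [150, 200]])
```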
