One of my most-used shortcuts, and the one that gets mentioned the most online, is my Generate Alt Text with OpenAI Vision shortcut. Earlier today I received a request to build in support for generating alt text for multiple images at once. It was already on my feature roadmap, and I knew it wouldn’t be hard to do, so I set to work and made it happen. Now if you share multiple images into the shortcut, it’ll generate them all as a block of text, each one separated by a blank line. Check!
But then as I readied the new version to update in the Shortcuts Library, I remembered that there was something else I’d been meaning to try. See, I’ve written a custom prompt for the LLM to follow to generate the alt text. It asks the AI to follow an object-action-context framework, keep the word count reasonable, be judicious about punctuation, and more. But someone (Matt Birchler? Chris Lawley?) mentioned recently that LLMs already have a good understanding of quality Alt Text and do a pretty fine even if you simply ask it to “generate alt text.” So duplicated the shortcut, simplified the prompt in one of them, and ran some tests.
I was impressed!
Testing
Here are the two prompts I used.
Simple prompt: Please create alt text for this image.
Complex prompt:
Please provide a functional, objective description of the provided image in no more than around 30 words so that someone who could not see it would be able to imagine it. If possible, follow an “object-action-context” framework. The object is the main focus. The action describes what’s happening, usually what the object is doing. The context describes the surrounding environment.
If there is text found in the image, do your best to transcribe the important bits, even if it extends the word count beyond 30 words. It should not contain quotation marks, as those tend to cause issues when rendered on the web.
If there is no text found in the image, then there is no need to mention it.
You should not begin the description with any variation of “The image”.
Example A
Simple: Three people are standing in deep snow at night in front of the Adirondack Mountain Club Cascade Welcome Center sign. They are dressed in winter gear and appear to be engaging in a winter activity. Two people are standing with ski poles, and one is sitting with ski equipment. The scene is illuminated by artificial light, with snow falling around them.
Complex: Three people smile, posing with ski gear amidst deep snow. Behind them, a sign reads Adirondack Mountain Club, Cascade Welcome Center, Cross Country Ski Trails & Shop. It’s nighttime, with snow falling.
Example B
Simple: A golden retriever lying in deep snow on a wooden deck. The dog is playfully chewing on a piece of wood, and snowflakes are gently falling around it. The deck is partially covered in snow, and there is a wooden railing and post visible in the background.
Complex: Dog chewing a stick, lying on thick snow, surrounded by a wooden porch and snowflakes falling gently.
Example C
Simple: Smartphone screen showing a reply message being composed with the text: “@Annie @Miraz Smart replies are so great!” The interface includes buttons for formatting text and sending the message. The top bar shows a clock icon indicating 2:53 and a snowflake symbol.
Complex: Phone screen displays a messaging app interface, typing a reply mentioning Annie and Miraz. Message says Smart replies are so great. Keyboard and send button at the bottom.
Example D
Simple: The image shows a sign titled “Ski the Face: A Whiteface Lake Placid Snowsports School” mounted on a wooden wall. The sign helps skiers identify their ability level, ranging from “1. I have never skied” to “9. I ski all terrain with confidence,” categorized as Easy, Intermediate, and Advanced. It also includes “Park Smart” guidelines such as “Start Small,” “Make a Plan,” “Always Look,” “Respect,” and “Take it Easy.” The sign features various icons related to skiing.
Complex: Signboard displays ski levels, safety tips at a snowsports school on a wooden wall. Text reads: Ski the Face, Whiteface Lake Placid, What is your ski ability level, and Park Smart guidelines.
Example E
Simple: Cover image and review of the movie “The Wild Robot” directed by Chris Sanders, set to release in 2024. The cover features a large, round robot with glowing blue eyes peeking through the branches of a tree. Below the image is the title “The Wild Robot.” The review describes the movie as “cute” and “enjoyable,” mentioning that it explores the predator/prey dynamic of wild animals without sugarcoating. The reviewer notes some effective emotional moments, though finds them predictable, and gives a moderate thumbs-up rating.
Complex: Robot with glowing eyes sits surrounded by trees. Text: The Wild Robot, 2024, Chris Sanders. Review mentions the predator/prey dynamic, heartwarming moments, and mixed expectations. Rating options include thumbs up and down.
What do you think?
Personally, I’m pretty happy with the results of the simplified prompt. It’s certainly more wordy — probably because it includes no word limit — but not overly so. It also seems to do a better job of pulling text out of images with a good mix of relevant quotes and summaries.
One thing that gives me pause is that uses more quotation marks within the description. I’ve sometimes had issues with that, particularly when inserting the alt text into raw HTML (not Markdown syntax). But I’m willing to give it the benefit of the doubt for now, and can always add specific instructions around punctuation back in if needed.
While I was rather proud of the custom prompt that I wrote and how it referenced a framework that I read from Alex Chen about writing good alt text, the idea of not having to maintain it is certainly appealing. I’m also optimistic that by being more generic with the prompt but specifically telling it that it’s for alt text, the results will improve over time as it better understands what “good” alt text should look like. And as those trends and methods change over time, my shortcut will adapt alongside them.
The new version (1.3) with the simplified prompt and bulk image support is live in the Shortcuts Library. Please let me know what you think about the output of the new prompt, if you experience any issues with it, or have suggestions about how I could improve it.