Generate Alt Text with OpenAI Vision
Use Apple Shortcuts to automatically generate an image description using OpenAI’s GPT-4.
This shortcut helps you generate a quality image description to use as alt text for images you upload to the internet. Alt text helps people using screen readers understand the context of an image, even if they can’t see it.
You can pass an image file or URL as input to this shortcut, and it will send it through OpenAI’s Vision API along with a specific prompt to generate a unique description for the image, including any text found in it.
The generated description is intended as a starting point for your alt text. Review it, and edit or add to it as needed.
The shortcut works particularly well as a function within another shortcut that formats the image’s description into the HTML or Markdown text for the uploaded image.
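For instance, a downstream shortcut (or any script) might wrap the generated description like this. This is a minimal sketch in Python; the function names and the escaping choices are mine, not part of the shortcut itself:

```python
import html


def to_markdown_image(alt_text: str, url: str) -> str:
    """Wrap a generated description in Markdown image syntax."""
    # Square brackets inside the alt text would break the Markdown syntax.
    safe_alt = alt_text.replace("[", "\\[").replace("]", "\\]")
    return f"![{safe_alt}]({url})"


def to_html_image(alt_text: str, url: str) -> str:
    """Wrap a generated description in an HTML <img> tag."""
    return f'<img src="{url}" alt="{html.escape(alt_text)}">'
```

Either helper takes the shortcut’s output as its input, so the formatting step stays out of the alt-text generation itself.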
Please Note: This shortcut requires GPT-4 with vision, which “is currently available to all developers who have access to GPT-4 via the gpt-4-vision-preview model and the Chat Completions API which has been updated to support image inputs. This includes Pay-As-You-Go customers who have made a successful payment of $1 or more, or customers who subscribe to ChatGPT Plus.” Learn more: https://platform.openai.com/docs/guides/vision
You may have to buy at least $1 worth of credit and then wait about 30 minutes for the API access to kick in. In my experience, it costs about $0.01 per generated image description on the pay-as-you-go plan.
Here’s the prompt that I use to coax Vision into creating image descriptions to my liking:
Please provide a functional, objective description of the provided image in no more than around 30 words so that someone who could not see it would be able to imagine it. If possible, follow an “object-action-context” framework: The object is the main focus. The action describes what’s happening, usually what the object is doing. The context describes the surrounding environment.
If there is text found in the image, it is very important that you transcribe all of it, even if it extends the word count beyond 30 words.
If there is no text found in the image, then there is no need to mention it.
You should not begin the description with any variation of “The image”.
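Outside of Shortcuts, the request body the shortcut assembles can be sketched in Python. This is a minimal, stdlib-only sketch that stops short of actually sending the request; the helper name is mine, but the payload shape follows OpenAI’s documented image-input format for the Chat Completions API:

```python
import base64
import json

# The full alt-text prompt quoted above goes here.
PROMPT = "Please provide a functional, objective description of the provided image..."


def build_vision_payload(image_bytes: bytes, prompt: str = PROMPT) -> dict:
    """Build a Chat Completions request body with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4-vision-preview",
        "max_tokens": 300,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
    }


# The JSON-encoded payload would then be POSTed to
# https://api.openai.com/v1/chat/completions with an
# "Authorization: Bearer <your API key>" header.
```

In the shortcut, the same structure is built with dictionary actions and sent with Get Contents of URL.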
If you have any suggestions for improvements, please let me know.
See It in Action
For example, it took this image:
and generated this description:
Two men stand arms crossed in front of a red wall with “LETTERKENNY” written on it. Text: “Letterkenny 2016-2023 — Jared Keeso & Jacob Tierney A wonderfully fast-paced, quick-witted, and surprisingly charming production. Great show that was well-executed and satisfyingly ended. Over and out.” Buttons below read “Poor,” “Okay,” and “Good,” with “Great” selected.
Requirements
- An OpenAI API key with access to the GPT-4 with vision model
Thanks to TeamData on YouTube for step-by-step instructions on how to successfully organize the data to pass through this shortcut.
Thanks to Alex Chen for the object-action-context framework for writing a good image description.
Latest Release Notes
Version 1.1 - 2024-01-12
- Added an option to show the generated description at the end — a “test mode”, if you will.
- Added a step to convert .heic images to .jpeg before uploading them since the Vision API doesn’t support those. (Thanks @firstname.lastname@example.org)
- Download version 1.1
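Inside Shortcuts, that conversion uses the built-in Convert Image action; if you were reproducing the step elsewhere, the first task is just recognizing HEIC data. Here is a small sketch (the function name is mine, and the brand list is a non-exhaustive heuristic) that checks the ISO container signature HEIC files carry:

```python
def looks_like_heic(data: bytes) -> bool:
    """Heuristic check for HEIC/HEIF data via its ISO-BMFF 'ftyp' box.

    Bytes 4-8 of the file spell "ftyp", followed by a brand code.
    Only a few common brand codes are checked here.
    """
    if len(data) < 12 or data[4:8] != b"ftyp":
        return False
    return data[8:12] in (b"heic", b"heix", b"mif1", b"msf1")
```

A file that passes this check would need converting to JPEG or PNG before being sent to the Vision API.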
Version 1.0 - 2024-01-08
- Introducing ‘Generate Alt Text with OpenAI Vision’
- Download version 1.0
Thanks for checking out this shortcut! It’s part of the HeyDingus Shortcuts Library. If you’re sharing my shortcuts or modifying them (or see a bug or have a feature request), I’d love to hear from you — please give me a shout! And maybe consider a donation if you find this shortcut fun or useful. Thank you. ✌️