Captions
The caption feature renders visible text directly onto generated images — useful for posters, banners, social media graphics, and book covers where the title or tagline needs to appear in the image itself.
Enabling captions
In Step 3 (Generate), toggle Include Captions on. When enabled, any caption field in your image list will be composited onto the image after generation.
If an image row has no caption value, no overlay is applied — the image is saved as-is.
Adding captions to your image list
Add a caption column to your CSV or YAML file:
title,description,caption
"Summer Festival","Vibrant outdoor music festival at sunset","Summer Beats 2025"
"Product Launch","Clean minimal product shot on white","Now Available"
"Team Photo","Studio portrait of creative team",
Rows with an empty caption field generate normally without an overlay.
The caption field also accepts these aliases: text, overlay, label, tagline.
Caption options
When Include Captions is on, four additional controls appear:
Position — where the caption bar is placed on the image:
| Option | Description |
|---|---|
| Bottom | Bar along the bottom edge (default) |
| Top | Bar along the top edge |
| Middle | Lower-third placement |
| Diagonal | Angled band across the centre |
| Mix (all) | Cycles through all four positions across the batch |
Font — the typeface used for the caption text:
| Option | Typeface |
|---|---|
| Serif | Georgia — classic, editorial feel |
| Sans-serif | Arial — clean and modern |
| Impact | Impact — bold, poster-style |
Size — font size relative to the image width:
| Option | Approximate size on 1024px image |
|---|---|
| Normal | ~32px |
| Large | ~46px |
| X-Large | ~64px |
All variants — saves four copies of each image, one per position (Bottom, Top, Middle, Diagonal). No extra API cost — the image is generated once and composited four times.
Also save without caption — saves an additional plain copy of each image alongside the captioned version. Useful when you want both versions.
How it works
After the AI generates each image, the app reads the caption field and composites an SVG text overlay using sharp. The bar colour (dark or light) is chosen automatically by sampling the luminance of the region the bar will cover — so white text appears on dark areas and dark text on light areas.
The original API image is not modified. The captioned version is a new PNG written to your output folder.