Video Captioning API
Upload a video, get a captioned video back
A captioning service you call instead of building. Upload a clip, create a task, then poll the task status or get a webhook when the captioned MP4 is ready. ZapCap runs the transcription, caption layout, styling, and render so your product or CMS pipeline does not have to.
What you would build vs what you call
A captioning feature isn't one API call when you build it yourself — it's ASR, layout, styling, a render farm, a job queue, and retries. ZapCap is the whole step behind one task endpoint.
OPTION 01
- Returns text / SRT
- You build caption layout
- You build the render farm
- You build the job queue
OPTION 02
- Full control
- GPU / worker ops to run
- You own retries and scaling
- Months to ship and maintain
OPTION 03 · YOU ARE HERE
- Transcribe + style + render included
- Async tasks, webhook or poll
- Finished MP4 (or overlay) returned
- No ASR or render infra to run
OPTION 04
- Full video generation
- Captions are one element
- You assemble the timeline
- Heavier to add captioning
Async captioning in a few calls
POST the video, POST a task, then either poll the task status with GET or attach a webhook. When the render completes, the task status carries a downloadUrl, so you can drop the captioned MP4 straight into your pipeline.
1. Upload your video
   POST the file to /videos. We stream it to storage and hand you back a videoId.
   POST /videos
2. Create the captioning task
   One POST starts transcription, styling and rendering with your chosen template. Add a notification webhook to skip polling.
   POST /videos/:id/task
3. Receive the webhook
   We POST status updates to your endpoint as the render moves through transcribing → rendering → completed.
   POST → your URL
4. Download the finished render
   Burned-in subtitles, served from a global CDN. No watermark. MP4 ready for any social platform.
   GET renderUrl
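If you skip the webhook, the polling path can be sketched as a small shell loop. This is a minimal sketch under stated assumptions, not a reference client: it assumes the task status response has a top-level `status` field, that `completed` (named in step 3) is the success value, and that `failed` is the error value (an assumption). `fetch_task_status`, `wait_for_task`, `POLL_INTERVAL`, and `MAX_POLLS` are hypothetical names, and `curl` and `jq` must be installed.

```shell
# Sketch of the polling path. Assumptions (not confirmed by this page):
# the status response has a top-level "status" field, and "failed" is the
# terminal error value. "completed" comes from the steps above.

POLL_INTERVAL="${POLL_INTERVAL:-5}"   # seconds to wait between polls
MAX_POLLS="${MAX_POLLS:-120}"         # give up after this many attempts

# Print the current status string for one task (requires curl and jq).
# Usage: fetch_task_status VIDEO_ID TASK_ID
fetch_task_status() {
  curl -s -H "x-api-key: $ZAPCAP_KEY" \
    "https://api.zapcap.ai/videos/$1/task/$2" | jq -r '.status'
}

# Poll until the task completes, fails, or the attempt budget runs out.
# Returns 0 on completed, 1 on failed, 2 on timeout.
wait_for_task() {
  wft_attempt=0
  while [ "$wft_attempt" -lt "$MAX_POLLS" ]; do
    wft_status="$(fetch_task_status "$1" "$2")"
    case "$wft_status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    wft_attempt=$((wft_attempt + 1))
    sleep "$POLL_INTERVAL"
  done
  return 2
}
```

A caller might run `wait_for_task "$VIDEO_ID" "$TASK_ID" && echo "render ready"`, then read the downloadUrl from the task status. Capping the attempts keeps a stuck task from blocking the pipeline indefinitely.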
Same upload, multiple caption styles
Pick the templateId that fits your product. The captioning task returns a fully rendered MP4 — no client-side compositing, no subtitle track for the player to honor.
Each output is the same source rendered with a different template. The render runs server-side as part of the task; you fetch the result by polling or webhook.
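Rendering one upload in several styles could look like the sketch below: one task per templateId against the same video. The `templateId` and `autoApprove` fields and the `.taskId` response field are taken from this page's own request example; `build_task_body` and `create_variants` are hypothetical helper names, and the sketch assumes `curl` and `jq` are installed.

```shell
# Build a minimal task body for one template. Only fields that appear in
# this page's request example are set.
build_task_body() {
  printf '{"templateId":"%s","autoApprove":true}' "$1"
}

# Create one captioning task per template for the same uploaded video.
# Usage: create_variants VIDEO_ID TEMPLATE_ID...
# Prints one "templateId taskId" pair per line.
create_variants() {
  cv_video_id="$1"
  shift
  for cv_template in "$@"; do
    cv_task_id="$(curl -s -X POST \
      "https://api.zapcap.ai/videos/$cv_video_id/task" \
      -H "x-api-key: $ZAPCAP_KEY" \
      -H "Content-Type: application/json" \
      -d "$(build_task_body "$cv_template")" | jq -r '.taskId')"
    printf '%s %s\n' "$cv_template" "$cv_task_id"
  done
}
```

For example, `create_variants "$VIDEO_ID" 21327a45-df89-46bc-8d56-34b8d29d3a0e d46bb0da-cce0-4507-909d-fa8904fb8ed7` would start two renders of the same source using the two template IDs shown on this page.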
Default look with one field
Let users tweak the rest.
Ship a single templateId as your product default, or expose individual renderOptions so your end users can pick animation, emphasis, font, and position. Each option toggles independently in the task body.
- Templates — Beast, Hormozi, Tracy, Devin, plus 25 more (29 presets) — wire one as your default caption look.
- Animation — word-by-word pops, karaoke fill, fade in/out — toggle per task.
- Keyword emphasis — flag punchwords; ZapCap colors / scales / boxes them automatically.
- Layout — font, color, stroke, shadow, words per cue, vertical position with safe-zone math.
- Aspect ratios — render 9:16, 1:1, 16:9 from one uploaded source.
{
  "templateId": "21327a45-df89-46bc-8d56-34b8d29d3a0e",
  "renderOptions": {
    "subsOptions": {
      "emphasizeKeywords": true,
      "animation": true,
      "displayWords": 3
    },
    "styleOptions": {
      "fontUppercase": true,
      "fontShadow": "m"
    }
  }
}
Auto-transcribe, or bring your own
Auto-transcribe
For most pipelines, let ZapCap transcribe, split, and time the captions. Surface the cues to your users for review, or render straight through.
- Inspect the transcript on the task status and edit it via PUT /videos/:id/task/:taskId/transcript
- Let end users approve or correct cues before render
- Reuse one transcript across multiple template variants
Bring your own transcript
Already have cues from your own ASR, CMS, or translation vendor? Send them and skip transcription, so existing copy stays exactly as approved.
- Supported via the SRT-to-burned-in workflow
- Preserve approved product names, claims, disclaimers
- Render the same transcript into N styles for variants
Return the format your product needs
Burned-in MP4
Captions rendered into the frames — the default for clips you serve back to users for social, ads, and sharing.
Transparent overlay
A caption-only layer with alpha preserved, for products that composite captions over their own edits.
Green-screen layer
For downstream tools without alpha support. Caption layer on a #04F404 canvas you key out.
Burned-in is best for distribution-ready clips inside your product. If you also need an accessibility track, expose the underlying transcript and store an SRT/VTT alongside the MP4.
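If you keep the transcript as SRT and your player expects WebVTT for the accessibility track, the conversion can be sketched in a few lines of shell. This is a minimal sketch, not part of the ZapCap API: `srt_to_vtt` is a hypothetical helper name, and it handles only the two differences that matter for plain cues (the `WEBVTT` header and the millisecond separator), not styling tags or byte-order marks.

```shell
# Convert SRT on stdin to WebVTT on stdout.
# WebVTT needs a "WEBVTT" header line and "." instead of "," as the
# millisecond separator; SRT cue numbers remain valid as cue identifiers.
# Only timing lines (those containing "-->") are rewritten, so commas in
# the caption text are left alone. Carriage returns are stripped first.
srt_to_vtt() {
  printf 'WEBVTT\n\n'
  tr -d '\r' |
    sed -E '/-->/s/([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3})/\1.\2/g'
}
```

Usage: `srt_to_vtt < captions.srt > captions.vtt`, then store both files next to the rendered MP4.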
Caption every user, in their language
Set a target language on the task and ZapCap transcribes, translates, and lays out captions for that language. CJK and Thai use language-aware line-breaking — not whitespace splitting.
- Caption uploads from a global user base in their own language
- A brand-term dictionary biases transcription toward your product names
- Language-aware layout for Chinese, Japanese, Thai
Per-minute, usage-based credits
Pay for the minutes you caption. Pass the cost through to your users or absorb it — credits scale with the videos that flow through your pipeline.
- Top up credits to keep tasks flowing in production
- Volume credits available at scale
- No per-seat fee — pay for renders, not users
Indicative starting rate. Final pricing depends on render mode and output format. API access requires a Pro plan plus credits.
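Because billing is per minute, a rough budget for a batch is total minutes multiplied by your rate. The sketch below only does that arithmetic: `estimate_cost` is a hypothetical helper, the rate is whatever your plan quotes (no real ZapCap price is assumed), and it applies no per-video rounding, which actual billing may do.

```shell
# Estimate spend for a batch: total minutes times a caller-supplied rate.
# Usage: estimate_cost RATE_PER_MINUTE DURATION_SECONDS...
# Assumptions: the rate comes from your own plan, and durations are summed
# without rounding each video up to a whole minute.
estimate_cost() {
  ec_rate="$1"
  shift
  printf '%s\n' "$@" | awk -v rate="$ec_rate" '
    { total += $1 }
    END { printf "%.2f minutes, %.2f cost\n", total / 60, total / 60 * rate }'
}
```

With a hypothetical rate of 0.10 per minute, `estimate_cost 0.10 90 30` sums 120 seconds and prints `2.00 minutes, 0.20 cost`.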
A video SaaS shipped captioning as a feature in days by calling the ZapCap API instead of building ASR, a render farm, and a job queue
User uploads create a task; a webhook returns the captioned MP4 into their app. The team skipped owning transcription models, ffmpeg workers, and retry logic, and focused on their product.
Create a captioning task in plain HTTP
Upload a video, create a task with your templateId, then poll the task status or receive a signed webhook when the captioned MP4 is ready.
- POST /videos for file uploads or POST /videos/url for hosted sources
- POST /videos/{videoId}/task starts transcription, styling, and rendering
- GET /videos/{videoId}/task/{taskId} returns status and downloadUrl
# 1. Upload the video and capture its ID.
VIDEO_ID=$(curl -s -X POST "https://api.zapcap.ai/videos" \
  -H "x-api-key: $ZAPCAP_KEY" \
  -F "file=@clip.mp4" | jq -r .id)

# 2. Create the captioning task with a webhook notification.
TASK_ID=$(curl -s -X POST "https://api.zapcap.ai/videos/$VIDEO_ID/task" \
  -H "x-api-key: $ZAPCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "d46bb0da-cce0-4507-909d-fa8904fb8ed7",
    "autoApprove": true,
    "language": "en",
    "notification": {
      "type": "webhook",
      "notificationsFor": ["render"],
      "recipient": "https://your.app/api/zapcap-webhook"
    }
  }' | jq -r .taskId)

# 3. Check the task status; it carries the downloadUrl once the render completes.
curl -H "x-api-key: $ZAPCAP_KEY" \
  "https://api.zapcap.ai/videos/$VIDEO_ID/task/$TASK_ID"

About the Video Captioning API
The Video Captioning API is an HTTP interface for adding captions to video programmatically. With ZapCap you upload a video, create a task, and receive a finished captioned video; transcription, caption layout, styling, and rendering all happen server-side. That is different from a speech-to-text API, which returns only text or an SRT for you to render yourself.
Related caption rendering APIs
Subtitle API
The umbrella styled subtitle rendering API: video in, captioned video out.
Webhook Video Captioning
Async caption workflows with signed events, retries, and eventId-based dedupe.
Burned-In Subtitles API
Permanent, styled captions baked into an MP4 for social and ads.
Animated Captions API
TikTok / Reels / Shorts caption styles, rendered into the frames.
For AI video SaaS
Add caption rendering to your product without building ASR, workers, and ffmpeg.
vs Creatomate API
Captioning service vs. general video automation: when each fits your pipeline.
Add captioning to your product through the API
Create a key on a Pro plan and buy credits in the dashboard.