Video Captioning API

Upload a video, get a captioned video back

A captioning service you call instead of building. Upload a clip, create a task, then poll the task status or get a webhook when the captioned MP4 is ready. ZapCap runs the transcription, caption layout, styling, and render so your product or CMS pipeline does not have to.

Create API key Read API docs

Async tasks · webhook or poll · usage-based credits · no ASR or render infra to run


Buy the captioning step

What you would build vs what you call

A captioning feature isn't one API call when you build it yourself — it's ASR, layout, styling, a render farm, a job queue, and retries. ZapCap is the whole step behind one task endpoint.

OPTION 01

Speech-to-text only

Whisper · Deepgram · AssemblyAI

Returns text / SRT
You build caption layout
You build the render farm
You build the job queue

OPTION 02

Self-hosted pipeline

ASR + ffmpeg + queue workers

Full control
GPU / worker ops to run
You own retries and scaling
Months to ship and maintain

OPTION 03 · YOU ARE HERE

ZapCap · captioning service

One task endpoint, finished video out

Transcribe + style + render included
Async tasks, webhook or poll
Finished MP4 (or overlay) returned
No ASR or render infra to run

OPTION 04

Video automation

Creatomate · Shotstack · JSON2Video

Full video generation
Captions are one element
You assemble the timeline
Heavier to add captioning

API workflow

Async captioning in a few calls

POST the video, POST a task, then either poll the task status with GET or attach a webhook. When the render completes, the task status carries a downloadUrl, so you can drop the captioned MP4 straight into your pipeline.

1. Upload your video
POST the file to /videos. We stream it to storage and hand you back a videoId.
POST /videos

2. Create the captioning task
One POST starts transcription, styling, and rendering with your chosen template. Add a notification webhook to skip polling.
POST /videos/:id/task

3. Receive the webhook
We POST status updates to your endpoint as the render moves through transcribing → rendering → completed.
POST → your URL

4. Download the finished render
Burned-in subtitles, served from a global CDN. No watermark. MP4 ready for any social platform.
GET renderUrl

POST /videos · Upload your video

import { readFileSync } from "node:fs";

// Wrap the local file in multipart form data under the "file" field.
const form = new FormData();
form.append(
  "file",
  new Blob([readFileSync("clip.mp4")]),
  "clip.mp4",
);

// Upload the file; the "id" in the JSON response is the videoId for later calls.
const { id: videoId } = await fetch(
  "https://api.zapcap.ai/videos",
  {
    method: "POST",
    headers: { "x-api-key": process.env.ZAPCAP_KEY! },
    body: form,
  },
).then(r => r.json());
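
Step 2 can be sketched in the same TypeScript style. This is a minimal sketch, not an official SDK: the body fields mirror the curl quickstart on this page (templateId, autoApprove, language, notification), and the taskId response field is taken from that quickstart's jq filter. The names buildTaskBody, createTask, and FetchLike are illustrative and not part of the ZapCap API.

```typescript
// Task body, limited to the fields that appear in the curl quickstart.
interface TaskBody {
  templateId: string;
  autoApprove: boolean;
  language: string;
  notification?: {
    type: "webhook";
    notificationsFor: string[];
    recipient: string;
  };
}

// Minimal structural type so the global fetch or a test double can be passed in.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

// Hypothetical helper: builds the JSON body; the webhook block is optional.
function buildTaskBody(templateId: string, language: string, webhookUrl?: string): TaskBody {
  const body: TaskBody = { templateId, autoApprove: true, language };
  if (webhookUrl) {
    body.notification = {
      type: "webhook",
      notificationsFor: ["render"],
      recipient: webhookUrl,
    };
  }
  return body;
}

// Hypothetical helper: POST /videos/:id/task and return the task id.
async function createTask(
  fetchImpl: FetchLike,
  apiKey: string,
  videoId: string,
  body: TaskBody,
): Promise<string> {
  const url = `https://api.zapcap.ai/videos/${encodeURIComponent(videoId)}/task`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Task creation failed with HTTP ${res.status}`);
  const json = await res.json();
  return json.taskId; // field name assumed from the quickstart's jq filter
}
```

In a Node 18+ script this would be called as createTask(fetch, process.env.ZAPCAP_KEY!, videoId, buildTaskBody(templateId, "en", webhookUrl)). Passing fetch in as a parameter keeps the helper easy to exercise without network access.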

Live render

Same upload, multiple caption styles

Pick the templateId that fits your product. The captioning task returns a fully rendered MP4 — no client-side compositing, no subtitle track for the player to honor.

Tracy preset

Beast preset

Hormozi preset

Devin preset

Each output is the same source rendered with a different template. The render runs server-side as part of the task; you fetch the result by polling or webhook.
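
Rendering one upload in several styles comes down to creating one task per templateId against the same videoId. The sketch below only plans those requests; planVariantTasks is a hypothetical helper, and the template IDs are placeholders to replace with real IDs from your account.

```typescript
interface VariantTask {
  path: string; // endpoint to POST to
  body: { templateId: string; autoApprove: boolean };
}

// Hypothetical helper: one task request per template for the same uploaded video.
function planVariantTasks(videoId: string, templateIds: string[]): VariantTask[] {
  // De-duplicate so the same style is not rendered (and billed) twice.
  const unique = Array.from(new Set(templateIds));
  return unique.map((templateId) => ({
    path: `/videos/${videoId}/task`,
    body: { templateId, autoApprove: true },
  }));
}

// Placeholder IDs; the duplicate is dropped, leaving two planned tasks.
const variantPlan = planVariantTasks("VIDEO_ID", [
  "TEMPLATE_ID_A",
  "TEMPLATE_ID_B",
  "TEMPLATE_ID_A",
]);
console.log(variantPlan.map((t) => t.body.templateId));
```

Each planned entry can then be sent with the same x-api-key header used for the upload.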

Styling as an API primitive

Default look with one field. Let users tweak the rest.

Ship a single templateId as your product default, or expose individual renderOptions so your end users can pick animation, emphasis, font, and position. All toggle independently on the task body.

Templates — Beast, Hormozi, Tracy, Devin, plus 25 more (29 presets) — wire one as your default caption look.
Animation — word-by-word pops, karaoke fill, fade in/out — toggle per task.
Keyword emphasis — flag punchwords; ZapCap colors / scales / boxes them automatically.
Layout — font, color, stroke, shadow, words per cue, vertical position with safe-zone math.
Aspect ratios — render 9:16, 1:1, 16:9 from one uploaded source.


Render options

{
  "templateId": "21327a45-df89-46bc-8d56-34b8d29d3a0e",
  "renderOptions": {
    "subsOptions": {
      "emphasizeKeywords": true,
      "animation": true,
      "displayWords": 3
    },
    "styleOptions": {
      "fontUppercase": true,
      "fontShadow": "m"
    }
  }
}
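
If you expose these options to end users, one way to keep a product default while accepting per-user tweaks is to merge the two before creating the task. This is a sketch under assumptions: the types cover only the option names in the sample above (the full option set and allowed values should be checked against the API docs), and mergeRenderOptions is an illustrative helper.

```typescript
// Only the options shown in the sample JSON; not an exhaustive schema.
interface SubsOptions {
  emphasizeKeywords?: boolean;
  animation?: boolean;
  displayWords?: number;
}
interface StyleOptions {
  fontUppercase?: boolean;
  fontShadow?: string;
}
interface RenderOptions {
  subsOptions?: SubsOptions;
  styleOptions?: StyleOptions;
}

// Hypothetical helper: layer a user's choices over the product default,
// group by group, so an override of one field leaves the others intact.
function mergeRenderOptions(defaults: RenderOptions, overrides: RenderOptions): RenderOptions {
  return {
    subsOptions: { ...defaults.subsOptions, ...overrides.subsOptions },
    styleOptions: { ...defaults.styleOptions, ...overrides.styleOptions },
  };
}

const productDefault: RenderOptions = {
  subsOptions: { emphasizeKeywords: true, animation: true, displayWords: 3 },
  styleOptions: { fontUppercase: true, fontShadow: "m" },
};

// A user who wants one word per cue keeps every other default.
const userChoice = mergeRenderOptions(productDefault, {
  subsOptions: { displayWords: 1 },
});
console.log(JSON.stringify(userChoice));
```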

Two transcript paths

Auto-transcribe — or bring your own

Auto-transcribe

For most pipelines, let ZapCap transcribe, split, and time the captions. Surface the cues to your users for review, or render straight through.

Inspect the transcript on the task status and edit it via PUT /videos/:id/task/:taskId/transcript
Let end users approve or correct cues before render
Reuse one transcript across multiple template variants

Bring your own transcript

Already have cues from your own ASR, CMS, or translation vendor? Send them and skip transcription, so existing copy stays exactly as approved.

Supported via the SRT-to-burned-in workflow
Preserve approved product names, claims, disclaimers
Render the same transcript into N styles for variants
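
Before sending your own cues, it can help to parse and validate the SRT locally. The sketch below reads standard SRT into timed cues; it does not show how cues are submitted to ZapCap, and the Cue shape is an assumption made for illustration rather than the API's transcript format.

```typescript
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Convert an SRT timestamp such as "00:01:02,500" to milliseconds.
function srtTimeToMs(stamp: string): number {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!m) throw new Error(`Bad SRT timestamp: ${stamp}`);
  const [h, min, s, ms] = m.slice(1).map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// Parse SRT text into cues; throws on malformed timestamps.
function parseSrt(srt: string): Cue[] {
  return srt
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n{2,}/)
    .filter((block) => block.trim() !== "")
    .map((block) => {
      const lines = block.split("\n");
      // The first line is normally the numeric index; skip it when present.
      const timingAt = lines[0].includes("-->") ? 0 : 1;
      const [start, end] = (lines[timingAt] ?? "").split("-->");
      if (end === undefined) throw new Error(`Missing timing line in block: ${block}`);
      return {
        startMs: srtTimeToMs(start),
        // Ignore any positioning hints that follow the end timestamp.
        endMs: srtTimeToMs(end.trim().split(/\s+/)[0]),
        text: lines.slice(timingAt + 1).join("\n"),
      };
    });
}
```

A quick check such as cues.every(c => c.endMs > c.startMs) catches most export problems before any credits are spent on a render.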

Output modes

Return the format your product needs

Most common

Burned-in MP4

Captions rendered into the frames — the default for clips you serve back to users for social, ads, and sharing.

.mp4 · h264

Editor-friendly

Transparent overlay

A caption-only layer with alpha preserved, for products that composite captions over their own edits.

.mov ProRes 4444 · .webm VP9 alpha

Compatibility

Green-screen layer

For downstream tools without alpha support. Caption layer on a #04F404 canvas you key out.

.mp4 · #04F404 backdrop
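
For the green-screen layer, the keying happens in your own tooling. As a sketch, assuming ffmpeg is installed and that the caption layer matches the base video's resolution and timing, the helper below builds arguments for ffmpeg's chromakey and overlay filters. The function name, file names, and the similarity and blend values are illustrative starting points, not ZapCap recommendations.

```typescript
// Build ffmpeg arguments that key out the green canvas and overlay the
// remaining caption pixels on a base video.
function chromaKeyArgs(
  baseVideo: string,
  captionLayer: string,
  output: string,
  keyColor = "0x04F404", // the backdrop color stated for this output mode
  similarity = 0.1,      // how close a pixel must be to the key color
  blend = 0.05,          // softness of the keyed edge
): string[] {
  const filter =
    `[1:v]chromakey=${keyColor}:${similarity}:${blend}[captions];` +
    `[0:v][captions]overlay[out]`;
  return [
    "-i", baseVideo,
    "-i", captionLayer,
    "-filter_complex", filter,
    "-map", "[out]",
    "-map", "0:a?", // keep the base video's audio if it has any
    "-c:a", "copy",
    output,
  ];
}

console.log(["ffmpeg", ...chromaKeyArgs("edit.mp4", "captions-green.mp4", "final.mp4")].join(" "));
```

The returned array can be passed to spawn("ffmpeg", args) from node:child_process. Where your tooling supports alpha, the transparent overlay mode avoids keying artifacts altogether.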

Burned-in is best for distribution-ready clips inside your product. If you also need an accessibility track, expose the underlying transcript and store an SRT/VTT alongside the MP4.
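
Producing that sidecar track is a small transformation once you hold timed cues. The sketch below writes WebVTT from a list of cues; the Cue shape is an assumption for illustration, so map whatever transcript structure you store into it first.

```typescript
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Format milliseconds as a WebVTT timestamp, e.g. 2500 -> "00:00:02.500".
function msToVttTime(ms: number): string {
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1000, 3)}`;
}

// Serialize cues as a WebVTT document: header, then blank-line separated cues.
function cuesToVtt(cues: Cue[]): string {
  const blocks = cues.map(
    (c) => `${msToVttTime(c.startMs)} --> ${msToVttTime(c.endMs)}\n${c.text}`,
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

console.log(cuesToVtt([{ startMs: 1000, endMs: 2500, text: "Captioned for everyone" }]));
```

Storing the .vtt next to the MP4 lets a player offer a toggleable track for viewers who need one, while the burned-in render stays the default for sharing.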

Multilingual captioning

Caption every user, in their language

Set a target language on the task and ZapCap transcribes, translates, and lays out captions for that language. CJK and Thai use language-aware line-breaking — not whitespace splitting.

Caption uploads from a global user base in their own language
A brand-term dictionary biases transcription toward your product names
Language-aware layout for Chinese, Japanese, Thai
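
To see why whitespace splitting fails for these scripts, here is a small illustration using the standard Intl.Segmenter API available in recent Node.js and browsers. It is not ZapCap's layout engine, only a sketch of the general idea; breakLines is a hypothetical helper, and the exact segments depend on the runtime's ICU data.

```typescript
// Split caption text into lines of at most maxChars, breaking only at
// word boundaries reported by Intl.Segmenter for the given locale.
// A single segment longer than maxChars is kept whole on its own line.
function breakLines(text: string, locale: string, maxChars: number): string[] {
  // Cast because older TypeScript lib targets lack Intl.Segmenter typings.
  const segmenter = new (Intl as any).Segmenter(locale, { granularity: "word" });
  const lines: string[] = [];
  let current = "";
  const flush = () => {
    const trimmed = current.trim();
    if (trimmed) lines.push(trimmed);
    current = "";
  };
  for (const { segment } of segmenter.segment(text) as Iterable<{ segment: string }>) {
    if (current.length > 0 && current.length + segment.length > maxChars) flush();
    current += segment;
  }
  flush();
  return lines;
}

// Splitting on spaces finds nothing to split in the Chinese sample.
console.log("为每个人加字幕".split(" ").length);
// Locale-aware segmentation can still find break points.
console.log(breakLines("为每个人加字幕", "zh", 4));
console.log(breakLines("Captioned for everyone", "en", 12));
```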

Captioned for everyone

为每个人加字幕

คำบรรยายสำหรับทุกคน

Simple API credits

Per-minute, usage-based credits

Pay for the minutes you caption. Pass the cost through to your users or absorb it — credits scale with the videos that flow through your pipeline.

Top up credits to keep tasks flowing in production
Volume credits available at scale
No per-seat fee — pay for renders, not users

$0.10 / min

Indicative starting rate. Final pricing depends on render mode and output format. API access requires a Pro plan plus credits.
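
For budgeting, a rough estimate at the indicative rate is simple arithmetic. The sketch below assumes billing proportional to duration at $0.10 per minute, rounded up to the next cent; actual rounding rules and per-mode rates are not stated here, so treat the result as a planning figure. estimateCostCents is an illustrative helper.

```typescript
// Estimate cost in whole cents. Assumes proportional per-minute billing;
// the default of 10 cents per minute is the indicative starting rate.
function estimateCostCents(durationSeconds: number, rateCentsPerMinute = 10): number {
  if (durationSeconds < 0) throw new RangeError("durationSeconds must be non-negative");
  return Math.ceil((durationSeconds * rateCentsPerMinute) / 60);
}

// 1,000 clips of 45 seconds each = 45,000 seconds = 750 minutes.
const monthlyCents = estimateCostCents(1000 * 45);
console.log(`$${(monthlyCents / 100).toFixed(2)}`); // $75.00
```

Working in cents keeps the arithmetic in integers, which avoids floating-point drift when you sum many small renders.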

View full pricing Talk to us about volume

Customer · Anonymized

A video SaaS shipped captioning as a feature in days by calling the ZapCap API instead of building ASR, a render farm, and a job queue

User uploads create a task; a webhook returns the captioned MP4 into their app. The team skipped owning transcription models, ffmpeg workers, and retry logic, and focused on their product.

Read case study

days · To ship captioning, not months
webhook · Async delivery into the app
no infra · No ASR or render farm to run
usage · Credits scale with uploads

Developer quickstart

Create a captioning task in plain HTTP

Upload a video, create a task with your templateId, then poll the task status or receive a signed webhook when the captioned MP4 is ready.

POST /videos for file uploads or POST /videos/url for hosted sources
POST /videos/{videoId}/task starts transcription, styling, and rendering
GET /videos/{videoId}/task/{taskId} returns status and downloadUrl

VIDEO_ID=$(curl -s -X POST "https://api.zapcap.ai/videos" \
  -H "x-api-key: $ZAPCAP_KEY" \
  -F "file=@clip.mp4" | jq -r .id)

TASK_ID=$(curl -s -X POST "https://api.zapcap.ai/videos/$VIDEO_ID/task" \
  -H "x-api-key: $ZAPCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "d46bb0da-cce0-4507-909d-fa8904fb8ed7",
    "autoApprove": true,
    "language": "en",
    "notification": {
      "type": "webhook",
      "notificationsFor": ["render"],
      "recipient": "https://your.app/api/zapcap-webhook"
    }
  }' | jq -r .taskId)

curl -H "x-api-key: $ZAPCAP_KEY" \
  "https://api.zapcap.ai/videos/$VIDEO_ID/task/$TASK_ID"
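
If you poll instead of using the webhook, the final GET usually sits in a loop. Here is a minimal TypeScript sketch of that loop, assuming the status response carries a status string and, once finished, a downloadUrl as described above. The "completed" value comes from the workflow section; "failed" is an assumed status name, and waitForRender, statusFetcher, and the timing defaults are illustrative.

```typescript
interface TaskStatus {
  status: string;
  downloadUrl?: string;
}

interface PollOptions {
  intervalMs?: number;
  maxAttempts?: number;
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the task reports a finished render, then return its URL.
async function waitForRender(
  getStatus: () => Promise<TaskStatus>,
  options: PollOptions = {},
): Promise<string> {
  const { intervalMs = 3000, maxAttempts = 100, sleep = defaultSleep } = options;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const task = await getStatus();
    if (task.status === "completed" && task.downloadUrl) return task.downloadUrl;
    // "failed" is an assumed status name; adjust to the documented values.
    if (task.status === "failed") throw new Error("Captioning task failed");
    await sleep(intervalMs);
  }
  throw new Error("Timed out waiting for the render");
}

// Wiring for real use (assumes Node 18+ with a global fetch).
function statusFetcher(videoId: string, taskId: string, apiKey: string) {
  return async (): Promise<TaskStatus> => {
    const res = await fetch(
      `https://api.zapcap.ai/videos/${videoId}/task/${taskId}`,
      { headers: { "x-api-key": apiKey } },
    );
    if (!res.ok) throw new Error(`Status check failed with HTTP ${res.status}`);
    return (await res.json()) as TaskStatus;
  };
}
```

A call then looks like waitForRender(statusFetcher(videoId, taskId, apiKey)). Capping the attempts keeps a stuck task from holding a worker forever; in production, the webhook path avoids the loop entirely.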

About the Video Captioning API

A video captioning API is an HTTP interface for adding captions to video programmatically. With ZapCap you upload a video, create a task, and receive a finished captioned video; transcription, caption layout, styling, and rendering all happen server-side. That is different from a speech-to-text API, which returns only text or an SRT for you to render yourself.


Add captioning to your product through the API

Create a key on a Pro plan and buy credits in the dashboard.

Create API key Read API docs
