Video Captioning API

Upload a video, get a captioned video back

A captioning service you call instead of building one. Upload a clip, create a task, then poll the task status or receive a webhook when the captioned MP4 is ready. ZapCap runs the transcription, caption layout, styling, and render so your product or CMS pipeline does not have to.

Async tasks · webhook or poll · usage-based credits · no ASR or render infra to run
Buy the captioning step

What you would build vs what you call

A captioning feature isn't one API call when you build it yourself — it's ASR, layout, styling, a render farm, a job queue, and retries. ZapCap is the whole step behind one task endpoint.

OPTION 01

Speech-to-text only
Whisper · Deepgram · AssemblyAI
  • Returns text / SRT
  • You build caption layout
  • You build the render farm
  • You build the job queue

OPTION 02

Self-hosted pipeline
ASR + ffmpeg + queue workers
  • Full control
  • GPU / worker ops to run
  • You own retries and scaling
  • Months to ship and maintain

OPTION 03 · YOU ARE HERE

ZapCap · captioning service
One task endpoint, finished video out
  • Transcribe + style + render included
  • Async tasks, webhook or poll
  • Finished MP4 (or overlay) returned
  • No ASR or render infra to run

OPTION 04

Video automation
Creatomate · Shotstack · JSON2Video
  • Full video generation
  • Captions are one element
  • You assemble the timeline
  • Heavier to add captioning
API workflow

Async captioning in a few calls

POST the video, POST a task, then either poll the task status with GET or attach a webhook. When the render completes, the task status carries a downloadUrl, so you can drop the captioned MP4 straight into your pipeline.

  1. Upload your video

    POST the file to /videos. We stream it to storage and hand you back a videoId.

    POST /videos
  2. Create the captioning task

    One POST starts transcription, styling and rendering with your chosen template. Add a notification webhook to skip polling.

    POST /videos/:id/task
  3. Receive the webhook

    We POST status updates to your endpoint as the render moves through transcribing → rendering → completed.

    POST → your URL
  4. Download the finished render

    Burned-in subtitles, served from a global CDN. No watermark. MP4 ready for any social platform.

    GET downloadUrl

import { readFileSync } from "node:fs";

// Wrap the file in a Blob so FormData sends it as a multipart file part.
const form = new FormData();
form.append(
  "file",
  new Blob([readFileSync("clip.mp4")]),
  "clip.mp4",
);

// The upload response carries the video id used by every later call.
const { id: videoId } = await fetch(
  "https://api.zapcap.ai/videos",
  {
    method: "POST",
    headers: { "x-api-key": process.env.ZAPCAP_KEY! },
    body: form,
  },
).then(r => r.json());
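
Steps 2 and 4 can be sketched in the same style as the upload call. This is a minimal sketch, not an official client. The request fields (templateId, autoApprove) and the taskId, status, and downloadUrl response fields are the ones shown elsewhere on this page. The function names, the FetchLike type, the polling interval, and the "failed" status name are assumptions made for illustration.

```typescript
// Minimal structural type so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface TaskStatus {
  status: string;       // e.g. "transcribing", "rendering", "completed"
  downloadUrl?: string; // expected once the render has completed
}

const API = "https://api.zapcap.ai";

// Step 2: start transcription, styling, and rendering for an uploaded video.
async function createTask(
  fetchImpl: FetchLike,
  apiKey: string,
  videoId: string,
  templateId: string,
): Promise<string> {
  const res = await fetchImpl(`${API}/videos/${videoId}/task`, {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ templateId, autoApprove: true }),
  });
  if (!res.ok) throw new Error(`task creation failed: HTTP ${res.status}`);
  const { taskId } = (await res.json()) as { taskId: string };
  return taskId;
}

// Step 4, polling variant: re-check the task until it reports "completed".
async function waitForRender(
  fetchImpl: FetchLike,
  apiKey: string,
  videoId: string,
  taskId: string,
  { intervalMs = 3000, maxAttempts = 100 } = {},
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(`${API}/videos/${videoId}/task/${taskId}`, {
      headers: { "x-api-key": apiKey },
    });
    if (!res.ok) throw new Error(`status check failed: HTTP ${res.status}`);
    const task = (await res.json()) as TaskStatus;
    if (task.status === "completed" && task.downloadUrl) return task.downloadUrl;
    // "failed" is an assumed status name; check the API reference for the real one.
    if (task.status === "failed") throw new Error("render failed");
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("timed out waiting for render");
}
```

On Node 18+ you can pass the global fetch as fetchImpl. Injecting it keeps the loop easy to exercise with a stub, and if you register a webhook you can skip waitForRender entirely.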

Live render

Same upload, multiple caption styles

Pick the templateId that fits your product. The captioning task returns a fully rendered MP4 — no client-side compositing, no subtitle track for the player to honor.

Tracy preset
Beast preset
Hormozi preset
Devin preset

Each output is the same source rendered with a different template. The render runs server-side as part of the task; you fetch the result by polling or webhook.

Styling as an API primitive

Default look with one field

Let users tweak the rest.

Ship a single templateId as your product default, or expose individual renderOptions so your end users can pick animation, emphasis, font, and position. All toggle independently on the task body.

  • Templates — Beast, Hormozi, Tracy, Devin, plus 25 more (29 presets) — wire one as your default caption look.
  • Animation — word-by-word pops, karaoke fill, fade in/out — toggle per task.
  • Keyword emphasis — flag punchwords; ZapCap colors / scales / boxes them automatically.
  • Layout — font, color, stroke, shadow, words per cue, vertical position with safe-zone math.
  • Aspect ratios — render 9:16, 1:1, 16:9 from one uploaded source.
Render options
{
  "templateId": "21327a45-df89-46bc-8d56-34b8d29d3a0e",
  "renderOptions": {
    "subsOptions": {
      "emphasizeKeywords": true,
      "animation": true,
      "displayWords": 3
    },
    "styleOptions": {
      "fontUppercase": true,
      "fontShadow": "m"
    }
  }
}
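
One way to combine a product default with per-user tweaks is a small merge helper. The sketch below uses only the field names visible in the sample body above. The type names, the PRODUCT_DEFAULT constant, and the withOverrides helper are hypothetical, and the real API may accept more options than these.

```typescript
interface SubsOptions {
  emphasizeKeywords?: boolean;
  animation?: boolean;
  displayWords?: number;
}
interface StyleOptions {
  fontUppercase?: boolean;
  fontShadow?: string; // only "m" appears in the sample above
}
interface RenderOptions {
  subsOptions?: SubsOptions;
  styleOptions?: StyleOptions;
}
interface TaskBody {
  templateId: string;
  renderOptions: RenderOptions;
}

// Product-wide default look (values copied from the sample body).
const PRODUCT_DEFAULT: TaskBody = {
  templateId: "21327a45-df89-46bc-8d56-34b8d29d3a0e",
  renderOptions: {
    subsOptions: { emphasizeKeywords: true, animation: true, displayWords: 3 },
    styleOptions: { fontUppercase: true, fontShadow: "m" },
  },
};

// Layer a user's choices over the default without mutating either object.
function withOverrides(base: TaskBody, user: RenderOptions = {}): TaskBody {
  return {
    templateId: base.templateId,
    renderOptions: {
      subsOptions: { ...base.renderOptions.subsOptions, ...user.subsOptions },
      styleOptions: { ...base.renderOptions.styleOptions, ...user.styleOptions },
    },
  };
}
```

For example, withOverrides(PRODUCT_DEFAULT, { subsOptions: { displayWords: 1 } }) keeps the default template and styling and changes only the words shown per cue.
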
Two transcript paths

Auto-transcribe — or bring your own

Auto-transcribe

For most pipelines, let ZapCap transcribe, split, and time the captions. Surface the cues to your users for review, or render straight through.

  • Inspect the transcript on the task status, then edit it via PUT /videos/:id/task/:taskId/transcript
  • Let end users approve or correct cues before render
  • Reuse one transcript across multiple template variants

Bring your own transcript

Already have cues from your own ASR, CMS, or translation vendor? Send them and skip transcription, so existing copy stays exactly as approved.

  • Supported via the SRT-to-burned-in workflow
  • Preserve approved product names, claims, disclaimers
  • Render the same transcript into N styles for variants
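
If your approved cues live in a CMS as start and end times, you will likely need to serialize them to SRT first. The helper below is a sketch of that serialization only. The Cue shape and function names are hypothetical, and how the SRT file is then submitted is not covered on this page, so that step is left out.

```typescript
interface Cue {
  startMs: number; // cue start, in milliseconds from the beginning of the video
  endMs: number;   // cue end, in milliseconds
  text: string;    // approved caption text, used verbatim
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(ms: number): string {
  const t = Math.round(ms);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(t / 3_600_000);
  const m = Math.floor(t / 60_000) % 60;
  const s = Math.floor(t / 1000) % 60;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(t % 1000, 3)}`;
}

// Number the cues from 1 and separate them with a blank line, as SRT expects.
function toSrt(cues: Cue[]): string {
  return cues
    .map(
      (cue, i) =>
        `${i + 1}\n${srtTimestamp(cue.startMs)} --> ${srtTimestamp(cue.endMs)}\n${cue.text}\n`,
    )
    .join("\n");
}
```

Because the text field is copied through untouched, product names, claims, and disclaimers stay exactly as approved.
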
Output modes

Return the format your product needs

Most common

Burned-in MP4

Captions rendered into the frames — the default for clips you serve back to users for social, ads, and sharing.

.mp4 · h264
Editor-friendly

Transparent overlay

A caption-only layer with alpha preserved, for products that composite captions over their own edits.

.mov ProRes 4444 · .webm VP9 alpha
Compatibility

Green-screen layer

For downstream tools without alpha support. Caption layer on a #04F404 canvas you key out.

.mp4 · #04F404 backdrop
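
Keying out the green canvas is a job for your own compositing tool. As one possibility, the sketch below builds arguments for ffmpeg's chromakey and overlay filters. ffmpeg is not part of the ZapCap API, the keyOutArgs helper is hypothetical, and the similarity and blend values (0.10 and 0.05) are starting guesses you would tune against real footage.

```typescript
// Build ffmpeg arguments that key #04F404 out of the caption layer
// and overlay the result on top of the base video.
function keyOutArgs(
  basePath: string,
  captionLayerPath: string,
  outPath: string,
): string[] {
  const filter =
    "[1:v]chromakey=0x04F404:0.10:0.05[captions];" + // make the green canvas transparent
    "[0:v][captions]overlay[out]";                   // draw captions over the base video
  return [
    "-i", basePath,
    "-i", captionLayerPath,
    "-filter_complex", filter,
    "-map", "[out]",
    "-map", "0:a?", // keep the base video's audio if it has any
    "-c:a", "copy",
    outPath,
  ];
}
```

You would pass the array to ffmpeg through your process runner of choice. If your tool supports alpha, the transparent overlay output avoids the keying step and the edge fringing that can come with it.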

Burned-in is best for distribution-ready clips inside your product. If you also need an accessibility track, expose the underlying transcript and store an SRT/VTT alongside the MP4.
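
If you already store the transcript as SRT, a WebVTT copy for web players is a small text transform. This is a sketch that assumes well-formed SRT input. The srtToVtt name is hypothetical and the conversion is generic, not a ZapCap feature.

```typescript
// Convert SRT text to WebVTT: add the header and switch the
// millisecond separator from "," to "." on timing lines only.
function srtToVtt(srt: string): string {
  const body = srt
    .replace(/\r\n?/g, "\n") // normalize line endings
    .split("\n")
    .map((line) =>
      line.includes("-->") ? line.replace(/,(\d{3})/g, ".$1") : line,
    )
    .join("\n")
    .trim();
  return `WEBVTT\n\n${body}\n`;
}
```

Restricting the replacement to lines that contain the arrow leaves commas in the caption text itself alone.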

Multilingual captioning

Caption every user, in their language

Set a target language on the task and ZapCap transcribes, translates, and lays out captions for that language. CJK and Thai use language-aware line-breaking — not whitespace splitting.

  • Caption uploads from a global user base in their own language
  • A brand-term dictionary biases transcription toward your product names
  • Language-aware layout for Chinese, Japanese, Thai
Captioned for everyone
每个人加字幕
คำบรรยายสำหรับทุกคน
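
To see why whitespace splitting falls short, consider that the Thai and Chinese samples above contain no spaces at all. The sketch below uses the standard Intl.Segmenter API to find word boundaries and wraps lines only at those boundaries. It illustrates the general idea and is not ZapCap's layout engine. The function names are hypothetical, results depend on the ICU data in your runtime, and string length is only a rough stand-in for rendered width.

```typescript
// Split text into segments (words, spaces, punctuation) for a given locale.
function segmentWords(text: string, locale: string): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Greedy wrap: start a new line when the next segment would exceed maxChars.
function wrapCaption(text: string, locale: string, maxChars: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const piece of segmentWords(text, locale)) {
    if (current.length > 0 && current.length + piece.length > maxChars) {
      lines.push(current.trimEnd());
      current = piece.trimStart(); // do not begin a line with whitespace
    } else {
      current += piece;
    }
  }
  if (current.trim().length > 0) lines.push(current.trimEnd());
  return lines;
}
```

Splitting the Thai sample on whitespace yields a single unbreakable token, while wrapCaption can break it between words when the runtime ships dictionary data for Thai.
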
Simple API credits

Per-minute, usage-based credits

Pay for the minutes you caption. Pass the cost through to your users or absorb it — credits scale with the videos that flow through your pipeline.

  • Top up credits to keep tasks flowing in production
  • Volume credits available at scale
  • No per-seat fee — pay for renders, not users
$0.10 / min

Indicative starting rate. Final pricing depends on render mode and output format. API access requires a Pro plan plus credits.
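
For budgeting, a rough estimate is total minutes multiplied by the per-minute rate. The sketch below assumes the indicative $0.10 per minute and billing proportional to duration with no per-clip rounding. Both are assumptions, since final pricing depends on render mode and output format, and the estimateCostUsd name is hypothetical.

```typescript
const RATE_CENTS_PER_MIN = 10; // $0.10 / min, the indicative starting rate

// Estimate spend in USD for a batch of clips, given durations in seconds.
// Works in cents and rounds once at the end to avoid floating-point drift.
function estimateCostUsd(
  durationsSec: number[],
  rateCentsPerMin: number = RATE_CENTS_PER_MIN,
): number {
  const totalSec = durationsSec.reduce((sum, d) => sum + d, 0);
  const cents = (totalSec / 60) * rateCentsPerMin;
  return Math.round(cents) / 100;
}
```

Under these assumptions, three clips of 90, 30, and 60 seconds add up to 3 minutes, or about $0.30.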

Customer · Anonymized

A video SaaS shipped captioning as a feature in days by calling the ZapCap API instead of building ASR, a render farm, and a job queue

User uploads create a task; a webhook returns the captioned MP4 into their app. The team skipped owning transcription models, ffmpeg workers, and retry logic, and focused on their product.

  • Days: to ship captioning, not months
  • Webhook: async delivery into the app
  • No infra: no ASR or render farm to run
  • Usage: credits scale with uploads
Developer quickstart

Create a captioning task in plain HTTP

Upload a video, create a task with your templateId, then poll the task status or receive a signed webhook when the captioned MP4 is ready.

  • POST /videos for file uploads or POST /videos/url for hosted sources
  • POST /videos/{videoId}/task starts transcription, styling, and rendering
  • GET /videos/{videoId}/task/{taskId} returns status and downloadUrl
VIDEO_ID=$(curl -s -X POST "https://api.zapcap.ai/videos" \
  -H "x-api-key: $ZAPCAP_KEY" \
  -F "file=@clip.mp4" | jq -r .id)

TASK_ID=$(curl -s -X POST "https://api.zapcap.ai/videos/$VIDEO_ID/task" \
  -H "x-api-key: $ZAPCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "d46bb0da-cce0-4507-909d-fa8904fb8ed7",
    "autoApprove": true,
    "language": "en",
    "notification": {
      "type": "webhook",
      "notificationsFor": ["render"],
      "recipient": "https://your.app/api/zapcap-webhook"
    }
  }' | jq -r .taskId)

curl -H "x-api-key: $ZAPCAP_KEY" \
  "https://api.zapcap.ai/videos/$VIDEO_ID/task/$TASK_ID"

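The quickstart task registers a webhook recipient, so your app needs a handler at that URL. The payload schema and the signature scheme are not shown on this page, so the sketch below assumes the notification mirrors the task status fields (taskId, status, downloadUrl) and leaves signature verification out. Treat every field and type name here as hypothetical until checked against the API reference.

```typescript
type WebhookAction =
  | { kind: "ready"; taskId: string; downloadUrl: string }
  | { kind: "pending"; taskId: string; status: string }
  | { kind: "ignored"; reason: string };

// Decide what to do with a parsed webhook body without trusting its shape.
function interpretWebhook(payload: unknown): WebhookAction {
  if (typeof payload !== "object" || payload === null) {
    return { kind: "ignored", reason: "payload is not an object" };
  }
  // Assumed field names, mirroring the GET task status response.
  const { taskId, status, downloadUrl } = payload as Record<string, unknown>;
  if (typeof taskId !== "string" || typeof status !== "string") {
    return { kind: "ignored", reason: "missing taskId or status" };
  }
  if (status === "completed" && typeof downloadUrl === "string") {
    return { kind: "ready", taskId, downloadUrl };
  }
  return { kind: "pending", taskId, status };
}
```

In a route handler you would verify the signature first, parse the JSON body, call interpretWebhook, and fetch the MP4 only on a "ready" result. Keeping the decision logic in a pure function makes it testable without an HTTP server.
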
About the Video Captioning API

A video captioning API is an HTTP interface for adding captions to video programmatically. With ZapCap you upload a video, create a task, and receive a finished captioned video: transcription, caption layout, styling, and rendering all happen server-side. That is different from a speech-to-text API, which returns only text or an SRT file for you to render yourself.

Add captioning to your product through the API

Create a key on a Pro plan and buy credits in the dashboard.