Subtitle API

Video in, styled captioned video out

ZapCap renders finished, styled captioned videos through an API. Send a video URL, choose a caption style, and get a burned-in MP4 — or a transparent overlay — back. Not a transcript API. Not a generic video automation API.


Webhook-native, usage-based credits, bring your own transcript


Not just transcription

Four APIs you might be choosing between

If you're shipping captions to viewers, only one of these returns a video they can actually watch.

OPTION 01

Speech-to-text

Whisper · Deepgram · AssemblyAI

Returns text or SRT
You handle timing
You handle styling
You handle rendering

OPTION 02

Subtitle file

SRT / VTT generators

Returns timed text file
No styling
Ignored on social
You handle burn-in

OPTION 03 · YOU ARE HERE

ZapCap · styled subtitle rendering

Finished video, not text

Returns finished MP4
Styled captions baked in
Transparent / green-screen
Webhook-native

OPTION 04

Video automation

Creatomate · Shotstack · JSON2Video

Full video generation
Build templates
JSON timeline assembly
Captions are one element

API workflow

How the Subtitle API works

Upload a clip, fire a task, get a webhook. We handle transcription, styling, and rendering — you get the MP4 when it's ready.

1. Upload your video
POST the file to /videos. We stream it to storage and hand you back a videoId.
POST /videos

2. Create the captioning task
One POST starts transcription, styling and rendering with your chosen template. Add a notification webhook to skip polling.
POST /videos/:id/task

3. Receive the webhook
We POST status updates to your endpoint as the render moves through transcribing → rendering → completed.
Webhook: POST → your URL

4. Download the finished render
Burned-in subtitles, served from a global CDN. No watermark. MP4 ready for any social platform.
GET renderUrl

Step 1 of 4: upload your video (POST /videos)

import { readFileSync } from "node:fs";

// Wrap the local file in multipart form data.
const form = new FormData();
form.append(
  "file",
  new Blob([readFileSync("clip.mp4")]),
  "clip.mp4",
);

// Upload the file; the JSON response carries the new video's id.
const { id: videoId } = await fetch(
  "https://api.zapcap.ai/videos",
  {
    method: "POST",
    headers: { "x-api-key": process.env.ZAPCAP_KEY! },
    body: form,
  },
).then(r => r.json());
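Step 2 of the workflow might look like the sketch below. It reuses the templateId, autoApprove, and language fields shown in the quickstart. The notification field names (type, recipient) are assumptions based on the description of a notification object with type "webhook" and a recipient URL, so check the API docs for the exact shape before relying on them.

```typescript
// Sketch only: the notification field names below are assumed, not confirmed.
interface TaskRequest {
  templateId: string;
  autoApprove: boolean;
  language?: string;
  notification?: { type: "webhook"; recipient: string };
}

// Build the JSON body for POST /videos/:id/task.
function buildTaskRequest(templateId: string, webhookUrl?: string): TaskRequest {
  const body: TaskRequest = { templateId, autoApprove: true, language: "en" };
  if (webhookUrl) {
    body.notification = { type: "webhook", recipient: webhookUrl };
  }
  return body;
}

// Send the request and return the taskId from the response.
async function createTask(
  videoId: string,
  apiKey: string,
  body: TaskRequest,
): Promise<string> {
  const res = await fetch(`https://api.zapcap.ai/videos/${videoId}/task`, {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Task creation failed: HTTP ${res.status}`);
  const { taskId } = (await res.json()) as { taskId: string };
  return taskId;
}
```

Keeping the body builder separate from the network call makes the request shape easy to unit-test without hitting the API.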

Live render

Same clip, four caption styles, one field change

Switch the templateId in the request body. ZapCap returns a fully rendered MP4 — no client-side compositing.

Beast preset

Hormozi preset

Tracy preset

Devin preset

Source is a 9:16 vertical clip. Captions render into the final frames — downstream platforms don't need subtitle-track support.
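As a rough illustration of the "one field change" idea, the sketch below builds one task body per preset for the same source video. The helper name and the template IDs you pass in are placeholders; only templateId, autoApprove, and language come from the quickstart.

```typescript
// Sketch: several renders of one source that differ only in templateId.
// The IDs passed in are placeholders; use real preset IDs from the API docs.
interface VariantBody {
  templateId: string;
  autoApprove: boolean;
  language: string;
}

function buildVariantBodies(templateIds: string[], language = "en"): VariantBody[] {
  return templateIds.map((templateId) => ({
    templateId,
    autoApprove: true,
    language,
  }));
}

// Each body would be sent as its own POST /videos/:id/task request.
const bodies = buildVariantBodies(["preset-a-id", "preset-b-id"]);
console.log(bodies.map((b) => b.templateId).join(", "));
```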

Styling as an API primitive

Style with one field

Override at the leaf.

Send a templateId for one of our presets, or override individual renderOptions for full control. Animation, emoji, keyword highlighting, position, font and color all toggle independently.

Templates — Beast, Hormozi, Tracy, Devin, plus 25 more (29 presets). Each preset captures a complete look.
Animation — word-by-word pops, karaoke fill, fade in/out, scale bumps.
Keyword emphasis — flag punchwords; ZapCap colors / scales / boxes them automatically.
Layout — font, color, stroke, shadow, max words per cue, vertical position with safe-zone math.
Aspect ratios — render 9:16, 1:1, 16:9 from one source.


Render options

{
  "templateId": "46d20d67-255c-4c6a-b971-31fddcfea7f0",
  "renderOptions": {
    "subsOptions": {
      "emphasizeKeywords": true,
      "animation": true,
      "displayWords": 3
    },
    "styleOptions": {
      "fontUppercase": true,
      "fontShadow": "l"
    }
  }
}
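If you keep a house style in code and tweak a field or two per campaign, a small client-side merge can help. This is a sketch under assumptions: the types cover only the five fields shown in the example above, and the merge happens in your code before the request is sent. It says nothing about how ZapCap combines a preset with renderOptions on the server.

```typescript
// Types cover only the fields shown in the example body; the real API may accept more.
interface SubsOptions {
  emphasizeKeywords?: boolean;
  animation?: boolean;
  displayWords?: number;
}
interface StyleOptions {
  fontUppercase?: boolean;
  fontShadow?: string;
}
interface RenderOptions {
  subsOptions?: SubsOptions;
  styleOptions?: StyleOptions;
}

// Merge per-render overrides into a base style, one leaf field at a time.
function withOverrides(base: RenderOptions, overrides: RenderOptions): RenderOptions {
  return {
    subsOptions: { ...base.subsOptions, ...overrides.subsOptions },
    styleOptions: { ...base.styleOptions, ...overrides.styleOptions },
  };
}

const houseStyle: RenderOptions = {
  subsOptions: { emphasizeKeywords: true, animation: true, displayWords: 3 },
  styleOptions: { fontUppercase: true, fontShadow: "l" },
};

// Only displayWords changes; every other field keeps the house value.
const twoWordCues = withOverrides(houseStyle, { subsOptions: { displayWords: 2 } });
console.log(JSON.stringify(twoWordCues));
```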

Two transcript paths

Subtitle API transcript options

Auto-transcribe

ZapCap transcribes, splits, and times captions automatically. Edit any cue, approve the transcript, and trigger a render against it.

Inspect the transcript on the task status, edit via PUT /videos/:id/task/:taskId/transcript
Edit cues before render, no extra credit cost
Reuse one transcript across multiple template variants
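An edit pass before render could be as simple as correcting brand terms and saving the result. The sketch below is illustrative only: the cue shape (text, startMs, endMs) and the JSON body sent to the PUT endpoint are assumptions, and fixTerms and saveTranscript are hypothetical helper names.

```typescript
// ASSUMPTION: cue shape is illustrative; check the API docs for the real transcript format.
interface Cue {
  text: string;
  startMs: number;
  endMs: number;
}

// Replace mis-transcribed terms (case-insensitive, whole words) in every cue.
function fixTerms(cues: Cue[], corrections: Record<string, string>): Cue[] {
  return cues.map((cue) => {
    let text = cue.text;
    for (const [wrong, right] of Object.entries(corrections)) {
      const escaped = wrong.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      text = text.replace(new RegExp(`\\b${escaped}\\b`, "gi"), right);
    }
    return { ...cue, text };
  });
}

// ASSUMPTION: the endpoint accepts the cue array as a JSON body.
async function saveTranscript(
  videoId: string,
  taskId: string,
  apiKey: string,
  cues: Cue[],
): Promise<void> {
  const url = `https://api.zapcap.ai/videos/${videoId}/task/${taskId}/transcript`;
  const res = await fetch(url, {
    method: "PUT",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(cues),
  });
  if (!res.ok) throw new Error(`Transcript update failed: HTTP ${res.status}`);
}
```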

Bring your own transcript

Send approved cues — from your translation vendor, internal CMS, or Whisper pipeline — and skip transcription entirely.

Supported via the SRT-to-burned-in workflow
Preserve approved product names, claims, disclaimers
Render the same transcript into N styles for A/B variants
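Before handing an approved SRT to a render pipeline, it can be worth parsing it locally to catch malformed cues. The parser below is a minimal sketch of standard SRT parsing; the SrtCue shape is a local convenience type, not ZapCap's transcript format, and how the transcript is submitted is left to the API docs.

```typescript
// Local convenience type, not an API type.
interface SrtCue {
  index: number;
  startMs: number;
  endMs: number;
  text: string;
}

// Convert "HH:MM:SS,mmm" into milliseconds.
function toMs(stamp: string): number {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!m) throw new Error(`Bad SRT timestamp: ${stamp}`);
  const [h, min, s, ms] = m.slice(1).map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// Split the file into blank-line-separated blocks and parse each one.
function parseSrt(srt: string): SrtCue[] {
  const normalized = srt.replace(/\r\n/g, "\n").trim();
  if (normalized === "") return [];
  return normalized.split(/\n{2,}/).map((block) => {
    const [indexLine, timeLine, ...textLines] = block.split("\n");
    if (!timeLine || !timeLine.includes("-->")) {
      throw new Error(`Missing timing line in cue: ${indexLine}`);
    }
    const [start, end] = timeLine.split("-->");
    return {
      index: Number(indexLine),
      startMs: toMs(start),
      endMs: toMs(end),
      text: textLines.join("\n"),
    };
  });
}
```

Failing fast on a bad timestamp here is cheaper than discovering it after a render has been queued.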

Output modes

Subtitle API output modes

Most common

Burned-in MP4

Captions permanently rendered into the source frames. The right choice for TikTok, Reels, Shorts, ad creative, and anywhere subtitle tracks get ignored.

.mp4 · h264

Editor-friendly

Transparent overlay

Caption layer only, alpha-channel preserved. Drop it on a timeline in Premiere, Final Cut, DaVinci, or CapCut — no chroma key needed.

.mov ProRes 4444 · .webm VP9 alpha

Compatibility

Green-screen layer

When the editor or live tool doesn't support alpha. Key out the green in OBS, switcher, or NLE.

.mp4 · #04F404 backdrop
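The choice between the three modes can be written down as a small decision helper. The mode names and the Destination fields below are hypothetical labels for this sketch, not API values; the logic simply restates the guidance above.

```typescript
// Hypothetical labels for this sketch; not values accepted by the API.
type OutputMode = "burned-in" | "transparent-overlay" | "green-screen";

interface Destination {
  // True when captions are composited later in an editor or live tool.
  editable: boolean;
  // True when that tool can read an alpha channel.
  supportsAlpha: boolean;
}

function pickOutputMode(d: Destination): OutputMode {
  if (!d.editable) return "burned-in"; // social posts, ad creative
  return d.supportsAlpha ? "transparent-overlay" : "green-screen";
}
```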

Need an accessibility track too? Burned-in captions are best for social distribution; pair with SRT/VTT export where viewer controls are required.

Multilingual rendering

Translate, restyle, re-render

Set language on the task and ZapCap transcribes, translates, and lays out captions for that target. CJK and Thai use language-aware line-breaking — not whitespace splitting.

Source video in one language, output captions in another
A dictionary of brand terms biases transcription toward your product names
Language-specific guides: Chinese, Japanese, Thai
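To see why whitespace splitting falls short for CJK and Thai, here is a sketch of the general technique using the standard Intl.Segmenter API. It illustrates language-aware segmentation in principle and is not ZapCap's implementation; segmentWords and wrapCaption are hypothetical helpers, and exact segment boundaries depend on the runtime's ICU data.

```typescript
// Split text into word-like units using the runtime's locale data.
function segmentWords(text: string, locale: string): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Greedy packing: add units to a line until maxChars would be exceeded.
function wrapCaption(text: string, locale: string, maxChars: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const unit of segmentWords(text, locale)) {
    if (current !== "" && (current + unit).trimEnd().length > maxChars) {
      lines.push(current.trimEnd());
      current = "";
    }
    // Never start a line with whitespace.
    if (current === "" && unit.trim() === "") continue;
    current += unit;
  }
  if (current.trimEnd() !== "") lines.push(current.trimEnd());
  return lines;
}

// Whitespace splitting sees a single token in unspaced scripts.
console.log("两分钟就好".split(/\s+/).length);
// The segmenter finds word boundaries without relying on spaces.
console.log(segmentWords("两分钟就好", "zh"));
console.log(wrapCaption("Ready in two mins", "en", 8));
```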

Ready in two mins

两分钟就好

พร้อมในสองนาที

Simple API credits

Per-minute, usage-based credits

Pay for what you render. Transparent overlays and high-resolution exports use different multipliers — see pricing for the full table.

Top up credits to keep tasks flowing in production
Volume credits available at scale
No per-seat fee — pay for renders, not users

$0.10 / min

Indicative starting rate. Final pricing depends on render mode and output format.
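For budgeting, the arithmetic is minutes times rate times multiplier. The sketch below works in cents to avoid floating-point drift. It assumes pro-rata billing by the second with no minimum charge, which may not match actual billing, and the multiplier of 2 in the usage line is a made-up figure, not a published one.

```typescript
// ASSUMPTIONS: pro-rata per-second billing, no minimum charge, caller supplies the multiplier.
function estimateCostCents(
  durationSeconds: number,
  rateCentsPerMinute = 10, // the indicative $0.10 / min starting rate
  multiplier = 1,
): number {
  if (durationSeconds < 0) throw new RangeError("durationSeconds must be non-negative");
  return Math.round((durationSeconds / 60) * rateCentsPerMinute * multiplier);
}

console.log(estimateCostCents(90)); // 90 s at 10 cents/min -> 15
console.log(estimateCostCents(600, 10, 2)); // 10 min with a hypothetical 2x multiplier -> 200
```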


Customer · Anonymized

A European e-commerce group cut localized caption variant production from days to hours by moving captioning to the ZapCap API

Multi-market product videos now flow through one webhook-driven pipeline. Approved transcripts in, brand-styled MP4s out — across markets, with consistent caption styling.


days → hrs: localized variant lead time
1 source: re-rendered into multiple markets
webhook: no polling, no manual exports
consistent: brand style preserved across markets

Developer quickstart

Render styled subtitles from one task endpoint

ZapCap is plain HTTP: upload a video, create a captioning task, then collect the burned-in MP4, transparent overlay, or green-screen layer from status or webhook delivery.

POST /videos or /videos/url creates the source video record
POST /videos/{videoId}/task accepts templateId, language, transcript, and renderOptions
GET /videos/{videoId}/task/{taskId} returns status, transcript, and downloadUrl

VIDEO_ID=$(curl -s -X POST "https://api.zapcap.ai/videos/url" \
  -H "x-api-key: $ZAPCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://cdn.example.com/source.mp4"}' | jq -r .id)

TASK_ID=$(curl -s -X POST "https://api.zapcap.ai/videos/$VIDEO_ID/task" \
  -H "x-api-key: $ZAPCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "templateId": "d46bb0da-cce0-4507-909d-fa8904fb8ed7",
    "autoApprove": true,
    "language": "en",
    "renderOptions": {
      "subsOptions": { "animation": true, "displayWords": 3 }
    }
  }' | jq -r .taskId)
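If you are not using webhooks, the last step is to poll the task status endpoint until the render finishes. The following is a sketch rather than a reference client: the status and downloadUrl fields come from the endpoint summary above, "failed" is an assumed terminal status (only transcribing, rendering, and completed are named on this page), and the interval and attempt limit are arbitrary defaults.

```typescript
// ASSUMPTION: "failed" is a guess at the error status.
function isTerminal(status: string): boolean {
  return status === "completed" || status === "failed";
}

async function waitForRender(
  videoId: string,
  taskId: string,
  apiKey: string,
  intervalMs = 3000,
  maxAttempts = 100,
): Promise<string> {
  const url = `https://api.zapcap.ai/videos/${videoId}/task/${taskId}`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url, { headers: { "x-api-key": apiKey } });
    if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
    const task = (await res.json()) as { status: string; downloadUrl?: string };
    if (isTerminal(task.status)) {
      if (task.status === "completed" && task.downloadUrl) return task.downloadUrl;
      throw new Error(`Task ended with status ${task.status}`);
    }
    // Wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for render");
}
```

In production a webhook avoids the polling loop entirely; this fallback is mainly useful for scripts and local testing.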

About the Subtitle API

A Subtitle API is an HTTP interface for adding subtitles to video programmatically. ZapCap is a styled subtitle rendering API — it accepts a video and returns a captioned MP4 (or a transparent overlay), with the captions baked into the frames. That is different from a transcription API (which returns text) or a subtitle-file API (which returns SRT/VTT). It is the subtitle-focused surface of the same video captioning API.

ZapCap can return a rendered video, a subtitle file, or both, depending on the task. The default output is a rendered video: MP4 burned-in, MOV ProRes 4444 with alpha, or WebM VP9 with alpha. We can also expose the underlying transcript so you can store an SRT/VTT alongside the video for accessibility or closed-caption use.

You can also bring your own transcript. Upload approved cues from your translation vendor, your internal CMS, or your own Whisper pipeline, and ZapCap will render captions against them without retranscribing.

Attach a webhook notification on the task — a notification object with type "webhook" and your recipient URL. ZapCap POSTs a signed JSON payload to that URL when transcription and render complete. Verify the x-signature header with your shared secret, then fetch the renderUrl. Failed webhooks retry up to 5 times.
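Signature checking on the receiving side might look like the sketch below. The scheme is an assumption: this page does not say how x-signature is computed, so the code guesses a hex-encoded HMAC-SHA256 of the raw request body. Confirm the algorithm and encoding in the API docs before using it.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: x-signature = hex(HMAC-SHA256(sharedSecret, rawBody)). Not confirmed by this page.
function isValidSignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, "hex");
  // Check lengths first: timingSafeEqual throws on buffers of different sizes.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Verify against the raw body bytes, before any JSON parsing or re-serialization, since reformatting the payload would change the digest.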

An STT API returns text or a timed transcript. You then have to handle caption layout, styling, font rendering, line-break logic, animation, burn-in, and delivery yourself. ZapCap returns a finished captioned video — those steps are part of the API.

Burned-in (open) captions are best for social and ad distribution, where the player will not surface a subtitle track. For accessibility compliance, where viewers need to turn captions on/off, pair the burned-in MP4 with an SRT or VTT file.
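If you hold the transcript as timed cues, producing the companion caption file is a small formatting job. The sketch below writes WebVTT; the Cue shape is a local assumption, not an API type, and toWebVtt is a hypothetical helper.

```typescript
// Local assumption: cues carry millisecond timings.
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Format milliseconds as a WebVTT timestamp, HH:MM:SS.mmm.
function formatVttTime(ms: number): string {
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1000, 3)}`;
}

// Serialize cues as a WebVTT file: header, then blank-line-separated cues.
function toWebVtt(cues: Cue[]): string {
  const blocks = cues.map(
    (c) => `${formatVttTime(c.startMs)} --> ${formatVttTime(c.endMs)}\n${c.text}`,
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}
```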


Start rendering captions through the API

Create a key on a Pro plan and buy credits in the dashboard.
