Animated Captions API

The word-by-word caption look, rendered at scale

Generate the bouncing, popping, karaoke-style captions that hold attention on TikTok, Reels, and Shorts programmatically. They are often searched for as dynamic captions or dynamic subtitles. Send a clip and a style, and ZapCap renders the animation into the frames. No frame-by-frame keyframing, no After Effects, no template-builder UI.

Word-pop · karaoke fill · keyword emphasis · animation baked into the MP4

Animation, not just text

How creators get this look — and why an API beats it at scale

The animated caption style is hand-built in editing apps one clip at a time. To produce it across hundreds of videos, you need it as a render parameter, not a manual edit.

OPTION 01

Manual caption editor
CapCut · Submagic · creator apps
  • Great for a single clip
  • Hand-edit every video
  • No programmatic access
  • Doesn't scale to a pipeline

OPTION 02

After Effects / keyframing
Motion graphics by hand
  • Total creative control
  • Slow, expensive per clip
  • No API to drive it
  • Hard to standardize a look

OPTION 03 · YOU ARE HERE

ZapCap · animated caption rendering
Animation as a render parameter
  • Word-pop, karaoke, scale bumps
  • Toggle animation per task
  • Rendered into the frames
  • Same look across every clip

OPTION 04

Static subtitle file
SRT / VTT generators
  • Timed text only
  • No animation
  • No styling
  • Off by default on social

API workflow

Animated captions in a few calls

Upload a clip, create a task with animation enabled and a templateId, then poll or get a webhook. ZapCap renders the word-by-word animation into the frames and hands you a finished MP4.

  1. Upload your video

    POST the file to /videos. We stream it to storage and hand you back a videoId.

    POST /videos

  2. Create the captioning task

    One POST starts transcription, styling and rendering with your chosen template. Add a notification webhook to skip polling.

    POST /videos/:id/task

  3. Receive the webhook

    We POST status updates to your endpoint as the render moves through transcribing → rendering → completed.

    HOOK · POST → your URL

  4. Download the finished render

    Burned-in subtitles, served from a global CDN. No watermark. MP4 ready for any social platform.

    GET renderUrl

Step 1 / 4 · ~2s

import { readFileSync } from "node:fs";

// Wrap the clip in multipart form data under the "file" field.
const form = new FormData();
form.append(
  "file",
  new Blob([readFileSync("clip.mp4")]),
  "clip.mp4",
);

// Upload the clip; the JSON response carries the new video's id.
const { id: videoId } = await fetch(
  "https://api.zapcap.ai/videos",
  {
    method: "POST",
    headers: { "x-api-key": process.env.ZAPCAP_KEY! },
    body: form,
  },
).then(r => r.json());

POST /videos·Upload your video
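Steps 2 and 3 could be sketched along the same lines. The task path, the x-api-key header, the three statuses, and renderUrl come from the steps above. The webhookUrl body field and the shape of the status payload are assumptions made for illustration, so check the API reference for the real names.

```typescript
// Sketch only: anything marked "assumed" is illustrative, not a confirmed API name.
const API_BASE = "https://api.zapcap.ai";

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Step 2: describe the POST /videos/:id/task call without sending it.
function buildTaskRequest(
  videoId: string,
  templateId: string,
  apiKey: string,
  webhookUrl?: string, // assumed: how the notification webhook is passed
): PreparedRequest {
  const body: Record<string, unknown> = { templateId };
  if (webhookUrl) body["webhookUrl"] = webhookUrl; // assumed field name
  return {
    url: `${API_BASE}/videos/${encodeURIComponent(videoId)}/task`,
    method: "POST",
    headers: { "x-api-key": apiKey, "content-type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Step 3: the statuses named above, in the order the render moves through them.
type RenderStatus = "transcribing" | "rendering" | "completed";

interface StatusUpdate {
  status: RenderStatus; // assumed payload field
  renderUrl?: string;   // assumed to be present once status is "completed"
}

// Returns the download URL when the render is done, otherwise null.
function finishedRenderUrl(update: StatusUpdate): string | null {
  return update.status === "completed" && update.renderUrl
    ? update.renderUrl
    : null;
}
```

To send the request, pass url to fetch together with the method, headers, and body fields. Keeping the builder free of network calls makes it easy to unit-test the payload before any credits are spent.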

Live render

Same hook, four animated styles

Each clip is the same source with animation on and a different templateId. The motion is rendered into the pixels — it plays the same in every feed, with no client-side animation engine.

Beast preset
Hormozi preset
Tracy preset
Devin preset

Source is a 9:16 vertical clip. Animated captions are baked into the frames, so the pop, fill, and emphasis survive download and re-upload to any platform.

Styling as an API primitive

Pick the motion

Tune the emphasis.

Send a templateId for a complete animated look, or control the motion directly. Animation, word-by-word reveal count, keyword emphasis, font, and position all toggle independently on the task body.

  • Animated templates — Beast, Hormozi, Tracy, Devin, plus 25 more (29 presets) — each captures a complete animated caption look.
  • Animation — word-by-word pops, karaoke fill, fade in/out, scale bumps — switched on per task.
  • displayWords — control how many words appear per cue so the reveal pacing fits the edit.
  • Keyword emphasis — flag punchwords; ZapCap colors / scales / boxes them so the hook lands.
  • Layout — font, color, stroke, shadow, vertical position with safe-zone math.
Render options
{
  "templateId": "46d20d67-255c-4c6a-b971-31fddcfea7f0",
  "renderOptions": {
    "subsOptions": {
      "emphasizeKeywords": true,
      "animation": true,
      "displayWords": 2
    },
    "styleOptions": {
      "fontUppercase": true,
      "fontShadow": "l"
    }
  }
}
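The same options can be assembled in code. This is a minimal TypeScript sketch: the field names are copied from the JSON above, while the default values (which simply mirror that sample) and the local displayWords check are assumptions, not documented API behavior.

```typescript
// Field names come from the sample JSON; value ranges are assumptions.
interface SubsOptions {
  emphasizeKeywords: boolean;
  animation: boolean;
  displayWords: number; // words shown per cue
}

interface StyleOptions {
  fontUppercase: boolean;
  fontShadow: string; // the sample uses "l"; other accepted values are not shown here
}

interface TaskBody {
  templateId: string;
  renderOptions: { subsOptions: SubsOptions; styleOptions: StyleOptions };
}

// These defaults mirror the sample above, not documented server defaults.
const DEFAULT_SUBS: SubsOptions = { emphasizeKeywords: true, animation: true, displayWords: 2 };
const DEFAULT_STYLE: StyleOptions = { fontUppercase: true, fontShadow: "l" };

// Merge per-task overrides onto the defaults and return a body ready to serialize.
function buildTaskBody(
  templateId: string,
  subs: Partial<SubsOptions> = {},
  style: Partial<StyleOptions> = {},
): TaskBody {
  const subsOptions: SubsOptions = { ...DEFAULT_SUBS, ...subs };
  const styleOptions: StyleOptions = { ...DEFAULT_STYLE, ...style };
  // Local sanity check (assumed): a cue needs at least one whole word.
  if (!Number.isInteger(subsOptions.displayWords) || subsOptions.displayWords < 1) {
    throw new RangeError("displayWords must be a positive integer");
  }
  return { templateId, renderOptions: { subsOptions, styleOptions } };
}
```

Calling buildTaskBody(templateId, { displayWords: 3 }) changes only the reveal pacing and leaves animation, emphasis, and styling as they were.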
Two transcript paths

Auto-transcribe — or bring your own

Auto-transcribe

ZapCap transcribes and times the words so the animation lands on the right beat. Edit any cue, then render the animated captions against the approved transcript.

  • Inspect the transcript on the task status, then edit it via PUT /videos/:id/task/:taskId/transcript
  • Word-level timing drives the per-word animation
  • Re-render the same transcript into multiple animated styles
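A transcript edit might look like the sketch below. Only the PUT path is taken from the list above. The TranscriptWord shape and the idea that the request body is a JSON array of words are hypothetical, chosen to show how a wording fix can leave the word timing untouched.

```typescript
// Hypothetical transcript shape; the real schema may differ.
interface TranscriptWord {
  text: string;
  startMs: number;
  endMs: number;
}

interface TranscriptUpdate {
  url: string;
  method: "PUT";
  headers: Record<string, string>;
  body: string;
}

// Replace one word's text while keeping its timing,
// so the per-word animation still lands on the same beat.
function correctWord(
  words: readonly TranscriptWord[],
  index: number,
  text: string,
): TranscriptWord[] {
  if (index < 0 || index >= words.length) {
    throw new RangeError(`No word at index ${index}`);
  }
  return words.map((w, i) => (i === index ? { ...w, text } : w));
}

// Describe the PUT /videos/:id/task/:taskId/transcript call without sending it.
function buildTranscriptUpdate(
  videoId: string,
  taskId: string,
  apiKey: string,
  words: readonly TranscriptWord[],
): TranscriptUpdate {
  return {
    url: `https://api.zapcap.ai/videos/${encodeURIComponent(videoId)}/task/${encodeURIComponent(taskId)}/transcript`,
    method: "PUT",
    headers: { "x-api-key": apiKey, "content-type": "application/json" },
    body: JSON.stringify(words), // assumed body format
  };
}
```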

Bring your own transcript

Have approved cues already? Send them and ZapCap animates them without retranscribing — handy for scripted hooks and repurposed long-form clips.

  • Supported via the SRT-to-burned-in workflow
  • Preserve approved wording and punchlines
  • Animate the same script into N variants for A/B testing
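Before sending approved cues, it can help to check them locally. The sketch below parses standard SRT text into cues with millisecond timing and rejects malformed blocks. It is a generic SRT reader written for illustration, not part of ZapCap's API, and how the SRT is submitted is left to the SRT-to-burned-in workflow.

```typescript
interface Cue {
  index: number;
  startMs: number;
  endMs: number;
  text: string;
}

// Convert an SRT timestamp such as "00:00:01,500" to milliseconds.
function parseTimestamp(ts: string): number {
  const m = /^(\d{2,}):(\d{2}):(\d{2}),(\d{3})$/.exec(ts.trim());
  if (!m) throw new SyntaxError(`Bad SRT timestamp: "${ts}"`);
  const hours = Number(m[1]);
  const minutes = Number(m[2]);
  const seconds = Number(m[3]);
  const millis = Number(m[4]);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

// Parse SRT text into cues, rejecting blocks with missing or reversed timing.
function parseSrt(srt: string): Cue[] {
  const normalized = srt.replace(/\r\n?/g, "\n").trim();
  if (normalized === "") return [];
  return normalized.split(/\n{2,}/).map((block) => {
    const lines = block.split("\n");
    const parts = (lines[1] ?? "").split("-->");
    if (parts.length !== 2) throw new SyntaxError(`Bad SRT block: "${block}"`);
    const startMs = parseTimestamp(parts[0] ?? "");
    const endMs = parseTimestamp(parts[1] ?? "");
    if (endMs <= startMs) {
      throw new RangeError(`Cue ${lines[0]} ends before it starts`);
    }
    return {
      index: Number(lines[0]),
      startMs,
      endMs,
      text: lines.slice(2).join("\n"), // cue text may span several lines
    };
  });
}
```

Running the same validated cue list through several templateIds is then one way to produce the N variants for A/B testing.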
Output modes

Animation baked in — overlay it if you prefer

Most common

Burned-in MP4

The animated captions are rendered into the source frames — the pop and fill play identically on TikTok, Reels, and Shorts, with sound off.

.mp4 · H.264

Editor-friendly

Transparent overlay

The animated caption layer with alpha preserved, so you can drop the motion over your own cut in an NLE — no chroma key.

.mov ProRes 4444 · .webm VP9 alpha

Compatibility

Green-screen layer

For tools without alpha support. Animated captions on a #04F404 canvas you key out in your editor or live tool.

.mp4 · #04F404 backdrop

Burned-in animated captions are the standard for short-form social. Choose an alpha overlay only if you composite the animated captions over your own edit downstream.

Multilingual animation

Animate captions in any language

Set a target language and ZapCap transcribes, translates, and animates captions in that language. CJK and Thai text goes through language-aware word splitting, so per-word animation lands on real word boundaries instead of relying on whitespace.

  • Repurpose one clip into animated captions for multiple markets
  • A brand-term dictionary keeps product names accurate in the animated text
  • Language-aware word splitting for Chinese, Japanese, Thai
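To see why whitespace is not enough for these scripts, here is a small sketch using the standard Intl.Segmenter API available in modern Node.js and browsers. It is a stand-in for the general technique, not ZapCap's own segmentation, and the exact boundaries it finds depend on the runtime's ICU data.

```typescript
// Whitespace splitting treats an unspaced Thai or Chinese phrase as one "word".
function splitOnWhitespace(text: string): string[] {
  return text.split(/\s+/).filter(Boolean);
}

// Intl.Segmenter uses locale-aware rules and dictionaries to find
// word boundaries in scripts that are written without spaces.
function splitWords(text: string, locale: string): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text))
    .filter((s) => s.isWordLike) // drop spaces and punctuation
    .map((s) => s.segment);
}
```

For the Thai sample caption รอดูตอนจบ, splitOnWhitespace returns the whole phrase as a single item, while splitWords with the "th" locale breaks it into several words that can each be animated on their own.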
Wait for it
一下
รอดูตอนจบ
Simple API credits

Per-minute, usage-based credits

Pay for the minutes you render. Animation is part of the render — no separate motion-graphics fee. See pricing for the full multiplier table.

  • Top up credits to keep clips flowing in production
  • Volume credits available at scale
  • No per-seat fee: pay for renders, not users

$0.10 / min

Indicative starting rate. Final pricing depends on render mode and output format. API access requires a Pro plan plus credits.
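A rough budget can be worked out from the indicative rate. The sketch below assumes cost scales linearly with rendered duration, with no per-clip rounding, and uses a single multiplier parameter as a placeholder for whatever the pricing table specifies for a given render mode. Both are assumptions, so treat the result as an estimate.

```typescript
// The indicative $0.10 / min above, kept in cents to avoid float drift.
const RATE_CENTS_PER_MINUTE = 10;

// Estimate cost in cents for a total rendered duration.
// `multiplier` is a placeholder for the render-mode multiplier (assumed).
function estimateCostCents(totalSeconds: number, multiplier = 1): number {
  if (totalSeconds < 0 || multiplier <= 0) {
    throw new RangeError("duration must be >= 0 and multiplier > 0");
  }
  return Math.round((totalSeconds / 60) * RATE_CENTS_PER_MINUTE * multiplier);
}

// Format a cent amount as dollars, e.g. 2250 -> "$22.50".
function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}
```

For example, 300 clips of 45 seconds each come to 225 minutes, so formatUsd(estimateCostCents(300 * 45)) gives "$22.50" at the starting rate.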

Customer · Anonymized

A short-form content studio produced animated captions across hundreds of clips a week by making the look a render parameter instead of a manual edit

Editors stopped hand-animating captions per video. One templateId and an animation flag now produce a consistent, on-brand animated look on every clip, rendered into the frames and ready to post.

hundreds · Clips a week, one consistent look
1 flag · Animation toggled per task
in-frame · Motion survives every re-upload
on-brand · Same style across all editors

About the Animated Captions API

Animated captions reveal and emphasize words with motion — word-by-word pops, karaoke-style fills, fades, and scale bumps — instead of showing a static block of text. Teams also call them dynamic captions or dynamic subtitles. ZapCap renders this animation into the video frames programmatically for TikTok, Reels, and Shorts.

Render animated captions through the API

Create a key on a Pro plan and buy credits in the dashboard.