Use case · Localization

Translate, restyle, and re-render subtitles with per-script layout

Multilingual rendering for CJK, Thai, and more

Translation is only half of localization; the subtitles still have to wrap, break, and render correctly in each script. ZapCap is the caption-rendering and translation API that handles the layout rules languages like Chinese, Japanese, and Thai actually need — not just space-splitting that breaks on the first character.

Per-script layout · translateTo · webhook delivery
The problem

Translation is easy; rendering it correctly is not

A localization pipeline that only translates produces subtitles that look wrong on screen. Each script has its own line-break and wrapping rules; reading speed differs; and the styling has to stay consistent while the language underneath changes completely.

  • Script-specific wrapping — CJK breaks by character, not by space — most engines get this wrong.
  • Thai line breaks — no spaces between words; naive wrapping mangles it.
  • Reading speed — characters-per-second limits differ by language.
  • Style consistency — one caption look across every language version.
  • Re-render churn — restyle once, re-render every language by hand.
  • Source of truth — one transcript, many translated renders to track.
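
To make the wrapping problem concrete, here is a minimal sketch of why space-splitting fails on Chinese or Japanese text. The function name `wrapCaption`, the language set, and the fixed character budget are all illustrative assumptions; this is not ZapCap's layout engine. Production line breaking follows richer rules (see Unicode UAX #14), and Thai needs dictionary-based word segmentation that this sketch leaves out.

```typescript
// Scripts this sketch breaks by character. Thai is deliberately absent:
// it needs word segmentation, which is out of scope here.
const BREAK_BY_CHARACTER = new Set(["zh", "ja"]);

/** Wrap caption text so each line holds at most maxChars characters. */
function wrapCaption(text: string, lang: string, maxChars: number): string[] {
  if (BREAK_BY_CHARACTER.has(lang)) {
    // Array.from splits by code point, so characters outside the BMP stay whole.
    const chars = Array.from(text);
    const charLines: string[] = [];
    for (let i = 0; i < chars.length; i += maxChars) {
      charLines.push(chars.slice(i, i + maxChars).join(""));
    }
    return charLines;
  }

  // Space-delimited scripts: greedy word wrap.
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars && current) {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) lines.push(current);
  return lines;
}
```

Calling `wrapCaption("字幕需要正确换行", "en", 4)` treats the whole string as one unbreakable word and returns a single overlong line, which is the failure mode described above. The same call with `"zh"` returns two four-character lines.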
Reference architecture

Where ZapCap sits in your localization pipeline

Drop the API behind your localization tooling. One source transcript drives a translated render per target language, each laid out with the rules its script needs.

Your source
Source video + transcript
Target languages
Source clip in your CDN
Your backend
Localization job runner
Holds ZAPCAP_API_KEY (server-only)
/webhooks/zapcap handler
ZapCap
POST /videos
Task per language · translateTo
Per-script renderUrl
Bring your own approved transcript via the transcript param when terminology must be exact; the dictionary param feeds names as transcription hints. Use base language codes like zh — not zh-CN / zh-TW.
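
If your localization tooling stores regional tags such as zh-CN, a small normalization step before calling the API may help. The helper names below (`toBaseLanguageCode`, `normalizeTargets`) are hypothetical, and which base codes the API accepts should be checked against its documentation.

```typescript
/** Reduce a tag such as "zh-CN" or "pt_BR" to its base language subtag. */
function toBaseLanguageCode(tag: string): string {
  const [first = ""] = tag.trim().split(/[-_]/);
  const base = first.toLowerCase();
  if (!/^[a-z]{2,3}$/.test(base)) {
    throw new Error(`Unrecognized language tag: ${tag}`);
  }
  return base;
}

/** Normalize and de-duplicate target languages, preserving first-seen order. */
function normalizeTargets(tags: string[]): string[] {
  return Array.from(new Set(tags.map(toBaseLanguageCode)));
}
```

Note that zh-CN and zh-TW both collapse to zh here, so they produce one target rather than two.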
API workflow

From source transcript to per-language render

Upload once, supply or approve the source transcript, then fan out a translated render per language. We handle the per-script layout; you collect the finished files via webhook.

  1. Upload your video

    POST the source clip to /videos and get back a videoId. The transcript can be auto-generated or supplied via the transcript param when terminology must be exact.

    POST /videos
  2. Create the captioning task

    Create one task per target language with translateTo. Reuse a single templateId so styling stays identical while the language changes; captions wrap per the rules of each script.

    POST /videos/:id/task
  3. Receive the webhook

    We POST status updates as each language moves through transcribing → rendering → completed. Map taskId → language so each render files itself correctly.

    POST → your URL (webhook)
  4. Download the finished render

    Each finished render is its own MP4, laid out for its script. Collect them as a localized set, with no manual re-styling per language.

    GET renderUrl
Step 1 / 4 · ~2s
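import { readFileSync } from "node:fs";

// Build a multipart body containing the source clip.
const form = new FormData();
form.append(
  "file",
  new Blob([readFileSync("clip.mp4")]),
  "clip.mp4",
);

// Upload the clip. The key is read server-side from ZAPCAP_API_KEY.
const res = await fetch("https://api.zapcap.ai/videos", {
  method: "POST",
  headers: { "x-api-key": process.env.ZAPCAP_API_KEY! },
  body: form,
});
if (!res.ok) throw new Error(`Upload failed: HTTP ${res.status}`);

// The response carries the new video's id.
const { id: videoId } = await res.json();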

POST /videos · Upload your video
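
Step 2, one task per target language with a shared templateId, could be sketched roughly as follows. Only translateTo, templateId, the x-api-key header, and the POST /videos/:id/task route come from this page. The exact request body shape, the `taskId` field in the response, and the helper names `buildTaskRequests` and `createTasks` are assumptions to verify against the API reference.

```typescript
interface TaskRequest {
  language: string;
  url: string;
  body: { templateId: string; translateTo: string };
}

const API_BASE = "https://api.zapcap.ai";

/** Build one task request per target language, all sharing a single templateId. */
function buildTaskRequests(
  videoId: string,
  templateId: string,
  languages: string[],
): TaskRequest[] {
  return languages.map((language) => ({
    language,
    url: `${API_BASE}/videos/${encodeURIComponent(videoId)}/task`,
    body: { templateId, translateTo: language },
  }));
}

/** Send the requests and return a taskId -> language map for webhook routing. */
async function createTasks(
  requests: TaskRequest[],
  apiKey: string,
): Promise<Map<string, string>> {
  const taskLanguage = new Map<string, string>();
  for (const req of requests) {
    const res = await fetch(req.url, {
      method: "POST",
      headers: { "x-api-key": apiKey, "content-type": "application/json" },
      body: JSON.stringify(req.body),
    });
    if (!res.ok) {
      throw new Error(`Task for ${req.language} failed: HTTP ${res.status}`);
    }
    // Assumption: the response JSON exposes the new task's id as `taskId`.
    const { taskId } = (await res.json()) as { taskId: string };
    taskLanguage.set(taskId, req.language);
  }
  return taskLanguage;
}
```

Keeping request construction separate from sending makes the fan-out easy to unit test without touching the network, and the returned map is what lets the webhook handler file each render under its language.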

State machine

Lifecycle of a per-language render

Each language is an independent task. Track them so your localization dashboard shows the full set filling in, language by language.

pending
transcribing
transcriptionCompleted
rendering
completed
In your dashboard
Show a per-language grid so localization owners see which renders are ready, which are still processing, and which need a transcript fix.
On webhook
Pull renderUrl and transcriptUrl, file them under the language, and mark the localized set one step closer to complete.
On failure
Re-run only the failed language; every other render in the set is unaffected.
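
A per-language tracker for this lifecycle might look like the sketch below. The five states are the ones listed above; the `failed` status, the payload field names other than taskId, renderUrl, transcriptUrl, and eventId, and the `LocalizedSet` class itself are assumptions made for illustration.

```typescript
type TaskStatus =
  | "pending"
  | "transcribing"
  | "transcriptionCompleted"
  | "rendering"
  | "completed"
  | "failed"; // assumed name for the failure state

interface WebhookEvent {
  eventId: string;
  taskId: string;
  status: TaskStatus;
  renderUrl?: string;
  transcriptUrl?: string;
}

interface LanguageRender {
  language: string;
  status: TaskStatus;
  renderUrl?: string;
  transcriptUrl?: string;
}

class LocalizedSet {
  private readonly renders = new Map<string, LanguageRender>(); // keyed by taskId
  private readonly seenEvents = new Set<string>();

  /** Register a task when it is created, before any webhook arrives. */
  track(taskId: string, language: string): void {
    this.renders.set(taskId, { language, status: "pending" });
  }

  /** Apply a webhook event; returns false for duplicates or unknown tasks. */
  apply(event: WebhookEvent): boolean {
    if (this.seenEvents.has(event.eventId)) return false;
    const render = this.renders.get(event.taskId);
    if (!render) return false;
    this.seenEvents.add(event.eventId);
    render.status = event.status;
    if (event.renderUrl) render.renderUrl = event.renderUrl;
    if (event.transcriptUrl) render.transcriptUrl = event.transcriptUrl;
    return true;
  }

  /** Languages whose task failed and can be re-run on their own. */
  failedLanguages(): string[] {
    return Array.from(this.renders.values())
      .filter((r) => r.status === "failed")
      .map((r) => r.language);
  }

  /** True once every tracked language has a completed render. */
  isComplete(): boolean {
    const all = Array.from(this.renders.values());
    return all.length > 0 && all.every((r) => r.status === "completed");
  }
}
```

The state here lives in memory; a real dashboard would presumably persist it so the grid survives restarts.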
Launch checklist

Before you localize your library

A short list to keep multilingual rendering correct per script and consistent across the set. Transcript, layout, and delivery in one place.

  • Approved source transcript: Supply your own via the transcript param where terminology must be exact.
  • Terminology in your dictionary: Names and product terms as transcription hints before translation.
  • One styling preset across languages: A single templateId so every language version shares the same look.
  • Base language codes: Use translateTo with codes like zh, not regional variants like zh-CN.
  • Per-language tracking: Map taskId → language so each render files correctly.
  • Webhook signature verified: Check x-signature on every payload; dedupe on eventId.
  • CJK / Thai spot-check: Review wrapping on a sample render before localizing the whole library.
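
The signature and dedupe items can be sketched as below. This page only names the x-signature header and the eventId field. The scheme used here (hex-encoded HMAC-SHA256 of the raw body with a shared secret) is an assumption, as are the helper names, so confirm the real scheme in the webhook documentation before relying on it.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/** Hex HMAC-SHA256 of the raw request body. The scheme is assumed, not confirmed. */
function sign(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

/** Compare the x-signature header against the expected value in constant time. */
function isValidSignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(sign(rawBody, secret), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// In-memory dedupe; a shared store would be needed across multiple instances.
const seenEventIds = new Set<string>();

/** Returns true the first time an eventId is seen, false on redelivery. */
function isFirstDelivery(eventId: string): boolean {
  if (seenEventIds.has(eventId)) return false;
  seenEventIds.add(eventId);
  return true;
}
```

Verify against the raw request bytes, not a re-serialized JSON object, since re-serialization can change whitespace and key order.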
Build vs buy

The multilingual rendering stack, honestly

Build it yourself

In-house localization renderer

  1. Translation wiring: vendors, terminology control, review loops.
  2. Per-script layout engine: CJK character breaking, Thai word segmentation.
  3. Reading-speed logic: characters-per-second limits per language.
  4. Render workers: ffmpeg / libass with the right font coverage per script.
  5. Style consistency: identical look across every language render.
  6. Output storage: one source, many renders, organised per language.
  7. Billing meter: per-minute counters across the library.
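
As one illustration of the reading-speed item in the list above, an in-house check might start like this sketch. The limits in `MAX_CPS` are placeholder numbers chosen for the example, not ZapCap values or an official standard, and the `Cue` shape is hypothetical.

```typescript
interface Cue {
  text: string;
  startMs: number;
  endMs: number;
}

// Placeholder limits for illustration only; tune them to your own style guide.
const MAX_CPS: Record<string, number> = { en: 17, zh: 9, ja: 7, th: 15 };
const DEFAULT_MAX_CPS = 15;

/** Characters per second for a cue, counting code points and ignoring whitespace. */
function charactersPerSecond(cue: Cue): number {
  const seconds = (cue.endMs - cue.startMs) / 1000;
  if (seconds <= 0) throw new Error("Cue must have a positive duration");
  const chars = Array.from(cue.text.replace(/\s+/g, "")).length;
  return chars / seconds;
}

/** True when the cue reads faster than the limit configured for the language. */
function exceedsReadingSpeed(cue: Cue, lang: string): boolean {
  return charactersPerSecond(cue) > (MAX_CPS[lang] ?? DEFAULT_MAX_CPS);
}
```

Counting code points is itself a simplification: Thai vowel and tone marks are separate code points, so a grapheme-aware count would give lower numbers for Thai.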
Use ZapCap

Multilingual rendering as a primitive

  1. translateTo per task: a render per language, laid out for its script.
  2. One templateId: consistent styling across the set.
  3. Webhook handler: verify, file by language, assemble the set.
When ZapCap isn't the right answer: if you need a full subtitling workstation with translator seats and review tooling, a dedicated subtitle editor or video automation API may fit better. See our honest alternatives comparisons.

What changes when rendering becomes an API call.

1 source
Transcript drives every language
per language
One correctly-laid-out render each
per-script
CJK & Thai wrap correctly
~0 lines
Of ffmpeg / layout code
Customer · Anonymized

A localization team replaced a hand-tuned subtitle renderer with the ZapCap API and now produces per-language renders that wrap correctly in CJK and Thai from a single source transcript

Translation was never the blocker — getting subtitles to break and read correctly per script was. With layout handled by the API, the team renders the whole language set from one transcript and one styling preset.

1 transcript
Source of truth per video
per language
One render each
CJK / Thai
Wrapped per script
per-minute
Billing passes through cleanly

For localization teams

Captions are laid out with per-script rules, so Chinese, Japanese, and Korean break by character and Thai breaks at word boundaries rather than on spaces. That is the difference between subtitles that read naturally and ones that break mid-word.

Render subtitles that read right in every script

Backend-only API, webhook-native, from $0.10/min base usage pricing. One source transcript in, a correctly-laid-out render per language out.