Spaceless-language API · Thai · Lao · Khmer · Burmese

Southeast Asian Captioning API

Captions for scripts with no word spaces

Render styled captions for Southeast Asian scripts into video through one API. Thai, Lao, Khmer, and Burmese share a hard layout problem — they’re written scriptio continua with no spaces between words, and they stack vowels, tone marks, and subscript consonants into complex clusters. ZapCap breaks at word boundaries and keeps clusters whole; generic subtitle tools do neither.

Dictionary-based segmentation · cluster-safe shaping · brand-term dictionary

The shared problem

What Thai, Lao, Khmer, and Burmese have in common

These are Brahmic-family scripts written scriptio continua: no spaces between words. They also shape complex clusters: vowels and tone marks above and below a base consonant, plus stacked subscript consonants. A subtitle tool built for Latin text breaks all four in the same wrong ways.

No spaces between words

Word boundaries aren’t marked. Finding them requires dictionary-based segmentation per language; a whitespace splitter sees one long run and breaks it anywhere.
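As a minimal sketch of the difference, the snippet below compares a whitespace split with dictionary-backed word segmentation. It uses the standard `Intl.Segmenter` API (backed by ICU data in Node.js and modern browsers) purely for illustration; that choice is an assumption and not necessarily what ZapCap runs server-side.

```javascript
// Illustration only: Intl.Segmenter is the standard ECMAScript API, used here
// as a stand-in for any dictionary-based segmenter.
const thai = 'เดี๋ยวก็เสร็จแล้วครับ';

// A whitespace splitter finds no break opportunities at all.
const bySpace = thai.split(/\s+/);

// Word-granularity segmentation relies on the runtime's Thai dictionary data.
function segmentWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const words = segmentWords(thai, 'th');
console.log(bySpace.length); // 1: the whole sentence is one unbreakable run
console.log(words);          // word-sized segments a wrapper can break between
```

The exact segments depend on the dictionary bundled with the runtime, so treat the output as indicative and check it with a native speaker.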

Complex cluster shaping

A base consonant carries stacked vowels, tone marks, and — in Khmer and Burmese — subscript consonants. A break inside a cluster detaches a mark or subscript and the glyph falls apart.
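To make "cluster" concrete, here is a small sketch that lists the grapheme clusters of a Thai word and the offsets where a break would not detach a mark. It again assumes the standard `Intl.Segmenter` API; the helper names `graphemes` and `safeBreakOffsets` are hypothetical.

```javascript
// Illustration only: grapheme segmentation via the standard Intl.Segmenter.
function graphemes(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Offsets (in UTF-16 code units) where a break would not split a cluster.
function safeBreakOffsets(text, locale) {
  const offsets = [];
  let pos = 0;
  for (const g of graphemes(text, locale)) {
    pos += g.length;
    offsets.push(pos);
  }
  return offsets;
}

const word = 'เดี๋ยว'; // the base consonant ด carries a vowel and a tone mark
const clusters = graphemes(word, 'th');
console.log([...word].length); // number of code points
console.log(clusters.length);  // fewer: stacked marks stay with their base
console.log(safeBreakOffsets(word, 'th'));
```

Default Unicode grapheme clusters may still split some Khmer and Burmese subscript stacks, so a production pipeline would likely need script-specific tailoring on top of this.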

Reordering and ligatures

Some vowels are stored after the consonant but display before it, and clusters form ligatures. Layout has to happen on shaped glyphs, not raw code points — count characters and you count the wrong thing.
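The stored order can be inspected directly. The sketch below prints the code points of a consonant plus an "e" vowel in three scripts: in Khmer and Burmese the vowel sign is stored after the consonant even though it is drawn to its left, while Thai stores the vowel first, in visual order. The `codePoints` helper is hypothetical.

```javascript
// Stored (logical) order of a consonant + "e" vowel in three scripts.
// Escapes are used so the code point order is explicit.
const samples = {
  khmer:   '\u1780\u17C1', // KA, then VOWEL SIGN E (drawn to the left of KA)
  burmese: '\u1000\u1031', // KA, then VOWEL SIGN E (drawn to the left of KA)
  thai:    '\u0E40\u0E01', // SARA E, then KO KAI (Thai stores visual order)
};

// Formats each code point as U+XXXX, in storage order.
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

for (const [script, text] of Object.entries(samples)) {
  console.log(script, text, codePoints(text).join(' '));
}
```

Because on-screen order and storage order differ, any width or position estimate taken from raw code points can disagree with what the shaper actually draws.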

Bad vs good

The same transcript, two different renderers

Translation: identical. Layout: not. The example below is Thai, but the same failures show up across SE-Asian scripts, and the same segmentation and shaping logic carries over to Lao, Khmer, and Burmese.

source transcript · เดี๋ยวก็เสร็จแล้วครับ

Generic subtitle API
เดี๋ยวก็เสร็
จแล้วครับ
1. Mid-word break. เสร็จ is split between เสร็ and จ; there is no word boundary there.
2. Cluster detached. The break lands inside a cluster, stranding combining marks from their base.
3. No dictionary, no boundaries. A whitespace splitter has nothing to break on, so it breaks anywhere.
Result: unreadable at the join; a native reader stumbles on the broken word.

ZapCap rendering
เดี๋ยวก็เสร็จ
แล้วครับ
Word-boundary break. Dictionary segmentation breaks between เสร็จ and แล้ว — both whole words.
Clusters stay intact. Shaped glyphs keep their stacked marks and subscripts.
Two lines, phrase-aligned. The break lands at a natural phrase joint.
Result: reads like a hand-set subtitle. Keyword emphasis preserved through the API.

Under the hood

How ZapCap renders Southeast Asian captions

Dictionary-based segmentation

Word boundaries are found with a per-language dictionary before lines are wrapped. Breaks land between words, never mid-word — for Thai, Lao, Khmer, and Burmese.

Cluster-safe, shaped layout

Layout runs on shaped grapheme clusters, not raw code points. A base consonant keeps its vowels, tone marks, and subscript consonants — no break ever lands inside a cluster.

Reordering handled

Vowels that are stored after a consonant but display before it are positioned correctly, and ligatures form before measuring, so line width and breaks are computed on what actually renders.

Max two lines, phrase-aligned

When wrapping is needed, the break lands at a word or phrase boundary. A cue that would need a third line is split into two cues instead.
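The wrapping rule can be sketched as a greedy wrap over whole words, followed by grouping lines into cues of at most two lines. This is an illustration under stated assumptions, not ZapCap's implementation: it uses the standard `Intl.Segmenter`, approximates width by counting grapheme clusters (a real renderer would measure shaped glyph widths), and the function names and the way an overflowing cue is split are hypothetical.

```javascript
// Illustration only. Width is approximated by counting grapheme clusters;
// a real renderer would measure shaped glyph widths instead.
const MAX_LINES_PER_CUE = 2;

function segments(text, locale, granularity) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

function clusterCount(text, locale) {
  return segments(text, locale, 'grapheme').length;
}

// Greedy wrap: add whole words to a line until the next one would overflow.
// A single word longer than maxClusters is kept whole on its own line.
function wrapLines(text, locale, maxClusters) {
  const lines = [];
  let current = '';
  for (const word of segments(text, locale, 'word')) {
    if (current && clusterCount(current + word, locale) > maxClusters) {
      lines.push(current);
      current = word;
    } else {
      current += word;
    }
  }
  if (current) lines.push(current);
  return lines;
}

// Group lines into cues of at most two lines each.
function wrapToCues(text, locale, maxClusters) {
  const lines = wrapLines(text, locale, maxClusters);
  const cues = [];
  for (let i = 0; i < lines.length; i += MAX_LINES_PER_CUE) {
    cues.push(lines.slice(i, i + MAX_LINES_PER_CUE));
  }
  return cues;
}

console.log(wrapToCues('เดี๋ยวก็เสร็จแล้วครับ', 'th', 10));
```

Because lines are only ever built from whole word segments, no break can land mid-word or inside a cluster. Rebalancing line lengths and retiming the split cues are left out of this sketch.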

Brand-term dictionary

Add product names and approved spellings to the dictionary — transcription hints that bias recognition toward your brand terms. Carry an approved transcript through and the text renders verbatim.

Per-script font fallback chains

Renders server-side with Noto Sans Thai / Lao / Khmer / Myanmar stacks, picked per language. Nothing depends on fonts installed on the viewer’s device: what you see in the MP4 is what every viewer sees.

Standards we work to

Built on published references — not magic

SE-Asian line breaking and shaping aren’t bespoke. Unicode specifies how these scripts are classed for line breaking and what counts as one cluster, and the W3C documents their layout requirements. ZapCap implements them; this page cites them.

If you’re evaluating us against another captioning vendor, ask how they find word boundaries and whether they break on shaped clusters. The answers tell you a lot.

REFERENCES
  • Unicode Line Breaking Algorithm (UAX #14) — classes Thai, Lao, Khmer, and Burmese letters as SA (complex context), where break opportunities come from language-specific analysis such as a dictionary. unicode.org/reports/tr14
  • Unicode · Grapheme Cluster Boundaries (UAX #29) — what counts as one cluster; the unit a break must not split. unicode.org/reports/tr29
  • W3C i18n — Southeast Asian layout requirements — word segmentation and complex-script shaping. w3c.github.io/iip
  • MDN · CSS word-break / line-break — language-aware line-breaking properties. developer.mozilla.org

Use it

Render SE-Asian captions in one task call

Set language on the render task. Optional: send an approved translation (skip retranscription) and a brand-term dictionary to bias recognition toward your product names and local-script spellings.

  • Set language "th" — Lao, Khmer, and Burmese render through the same pipeline
  • Bring your own translation via the SRT-to-burned-in workflow
  • Mix with caption style templates: a Beast preset works with SE-Asian scripts
  • One source video → multiple language renders, consistent style
// Southeast Asian caption render: set the language per source video.
// No SDK needed; call the REST API with fetch.
// Thai -> 'th' (Lao, Khmer, and Burmese render through the same pipeline).
// Template IDs come from GET /templates; videoId is the ID of an uploaded video.
const task = await fetch(`https://api.zapcap.ai/videos/${videoId}/task`, {
  method: 'POST',
  headers: { 'x-api-key': process.env.ZAPCAP_KEY, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    templateId: '<TEMPLATE_UUID>',
    language:   'th',
    renderOptions: {
      subsOptions: { emphasizeKeywords: true },
    },
    // Transcription hints — biases recognition toward your brand terms.
    dictionary: ['ACME', 'เทอร์โบแมกซ์'],
    notification: {
      type: 'webhook',
      notificationsFor: ['render'],
      recipient: 'https://acme.com/hooks/zapcap',
    },
  }),
}).then(r => r.json());

// Bring-your-own translation — render an approved transcript verbatim
const task2 = await fetch(`https://api.zapcap.ai/videos/${videoId}/task`, {
  method: 'POST',
  headers: { 'x-api-key': process.env.ZAPCAP_KEY, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    templateId: '<TEMPLATE_UUID>',
    language:   'th',
    transcript: seaApprovedTranscript,
    notification: {
      type: 'webhook',
      notificationsFor: ['render'],
      recipient: 'https://acme.com/hooks/zapcap',
    },
  }),
}).then(r => r.json());
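
For several renders of one source video, the two calls above can be folded into a small helper. The sketch below is hypothetical and not part of any ZapCap SDK: `buildTaskRequest`, `createTask`, and `renderAllLanguages` are illustrative names, the request fields are only those shown in the calls above, and any language code other than `'th'` is an assumption to check against the API's supported-language list.

```javascript
// Hypothetical helper (not part of any ZapCap SDK): builds the same request
// shape as the calls above so one video can be rendered in several languages.
const API_BASE = 'https://api.zapcap.ai';

function buildTaskRequest({ videoId, templateId, language, apiKey, dictionary, transcript, webhookUrl }) {
  const body = { templateId, language };
  if (dictionary && dictionary.length > 0) body.dictionary = dictionary;
  if (transcript) body.transcript = transcript;
  if (webhookUrl) {
    body.notification = { type: 'webhook', notificationsFor: ['render'], recipient: webhookUrl };
  }
  return {
    url: `${API_BASE}/videos/${encodeURIComponent(videoId)}/task`,
    init: {
      method: 'POST',
      headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Sends one request and fails loudly on a non-2xx response.
async function createTask(options, fetchImpl = fetch) {
  const { url, init } = buildTaskRequest(options);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`Task creation failed: HTTP ${res.status}`);
  return res.json();
}

// One task per language, each with its own approved transcript.
// transcriptsByLanguage looks like { th: ..., lo: ... }; codes other than 'th'
// are assumed here and should be verified against the API.
async function renderAllLanguages(base, transcriptsByLanguage, fetchImpl = fetch) {
  const jobs = Object.entries(transcriptsByLanguage).map(([language, transcript]) =>
    createTask({ ...base, language, transcript }, fetchImpl));
  return Promise.all(jobs);
}
```

Checking `res.ok` before parsing means an authentication or validation failure surfaces as an error instead of being mistaken for a task object.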
QA checklist

Before you ship SE-Asian captions

Six checks the rendering pipeline can’t do for you.

  • Check every line break is at a word boundary. No word split across two lines in the rendered MP4 — these scripts have no spaces to fall back on.
  • Confirm clusters render whole. No detached tone marks, vowels, or subscript consonants at a line edge.
  • Verify reordering and ligatures. Pre-base vowels sit in the right place and ligatures form — check the shaped glyphs, not the code points.
  • Define a brand-term dictionary. Product names and approved local-script spellings bias transcription; bring an approved transcript for copy that must not be re-translated.
  • Safe-zone check on 9:16. Stacked clusters add visual height, so overrun shows up faster than with Latin scripts.
  • Native-speaker pass on the rendered MP4, not the source transcript. Segmentation and shaping problems only show up at render time.
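
Part of the first two checks can be scripted as a pre-flight pass over caption text, for example lines parsed from an SRT. The sketch below is an assumption-laden aid, not a replacement for reviewing the rendered MP4: it uses the standard `Intl.Segmenter`, the `auditCue` helper is hypothetical, and it only catches lines that begin with a combining mark or break away from a word boundary.

```javascript
// Illustration only: a pre-flight check on caption text. It does not replace
// a native-speaker pass on the rendered video.
function wordBoundaryOffsets(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const offsets = new Set();
  for (const s of segmenter.segment(text)) offsets.add(s.index);
  offsets.add(text.length);
  return offsets;
}

// lines: the text lines of one cue, in display order.
function auditCue(lines, locale) {
  const issues = [];
  const boundaries = wordBoundaryOffsets(lines.join(''), locale);
  let offset = 0; // position of the current line's start in the joined text
  lines.forEach((line, i) => {
    // A line that starts with a combining mark has lost its base character.
    if (/^\p{M}/u.test(line)) {
      issues.push({ line: i + 1, issue: 'starts with a detached combining mark' });
    }
    // Every break between lines should coincide with a word boundary.
    if (i > 0 && !boundaries.has(offset)) {
      issues.push({ line: i + 1, issue: 'break is not at a word boundary' });
    }
    offset += line.length;
  });
  return issues;
}

console.log(auditCue(['เดี๋ยวก็เสร็', 'จแล้วครับ'], 'th')); // the broken layout from the comparison
console.log(auditCue(['เดี๋ยวก็เสร็จ', 'แล้วครับ'], 'th')); // the word-boundary layout
```

Results depend on the dictionary data in the runtime, so an empty issue list means "nothing obvious", not "approved".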

Southeast Asian captioning questions

ZapCap finds word boundaries with a per-language dictionary before wrapping, so line breaks land between whole words — never mid-word — for Thai, Lao, Khmer, and Burmese.

Render Southeast Asian subtitles through the API

Dictionary-based word segmentation, cluster-safe complex-script shaping, a brand-term dictionary, and bring-your-own translation — all in one task call. Pricing is $0.10/min of rendered video.