Spaceless-language API · scriptio continua

Thai Subtitles API

Captions broken at words, not at glyphs

Render styled Thai captions into video through one API. ZapCap handles the parts generic subtitle tools get wrong on Thai — text with no spaces between words, breaks that land in the middle of a word or inside a tone-mark cluster, and the fact that the spaces you do see are sentence punctuation, not word gaps.

Dictionary-based segmentation · cluster-safe breaks · brand-term dictionary
The layout problem

Why generic subtitle APIs break on Thai

Thai is written scriptio continua: no spaces between words. The only spaces are phrase or sentence boundaries. A subtitle tool that breaks lines on whitespace, or by counting glyphs, has no idea where a word actually ends — and Thai stacks tone marks and vowels on top of consonants, so the wrong break can split a single character cluster.

No spaces between words

Word boundaries aren’t marked. Finding them requires dictionary-based segmentation; a whitespace splitter sees one long run and breaks it anywhere.

Tone-mark and vowel stacking

A base consonant carries vowels and tone marks above and below it (เดี๋ยว). A break inside that cluster detaches a mark from its base — it renders as a floating diacritic.

Spaces are punctuation

The spaces that do appear mark phrase or sentence ends. Treating them as word gaps — or collapsing them — changes how the line reads.

Bad vs good

The same translation, two different renderers

Translation: identical. Layout: not. Annotations call out the segmentation failures that break Thai captions in practice.

source transcript · เดี๋ยวก็เสร็จแล้วครับ

Generic subtitle API
เดี๋ยวก็เสร็
จแล้วครับ
1. Mid-word break. เสร็จ is split between เสร็ and จ, where there is no word boundary.
2. Syllable split. The final consonant จ is stranded on the next line, away from the rest of its syllable. The same blind break can just as easily land inside a tone-mark cluster and strand a combining mark.
3. No dictionary, no boundaries. A whitespace splitter has nothing to break on, so it breaks anywhere.
Result: unreadable at the join — a native reader stumbles on the broken word.
ZapCap rendering
เดี๋ยวก็เสร็จ
แล้วครับ
Word-boundary break. Dictionary segmentation breaks between เสร็จ and แล้ว — both whole words.
Clusters stay intact. Base consonants keep their tone marks and vowels.
Two lines, phrase-aligned. The break lands at a natural phrase joint.
Result: reads like a hand-set subtitle. Keyword emphasis preserved through the API.
Under the hood

How ZapCap renders Thai captions

Dictionary-based segmentation

Word boundaries are found with a Thai dictionary before lines are wrapped. Breaks land between words, never mid-word.
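
To give a rough feel for what dictionary-based segmentation produces, the sketch below uses the standard `Intl.Segmenter` that ships with modern JavaScript runtimes. This is an illustration only: the page does not say which segmenter ZapCap runs, `segmentThaiWords` is a hypothetical helper name, and the exact boundaries depend on the ICU data bundled with your runtime.

```javascript
// Illustrative only: relies on the runtime's built-in Intl.Segmenter,
// not on ZapCap's own segmenter (which this page does not document).
function segmentThaiWords(text) {
  const segmenter = new Intl.Segmenter('th', { granularity: 'word' });
  // Each iterated item has the shape { segment, index, isWordLike }.
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const words = segmentThaiWords('เดี๋ยวก็เสร็จแล้วครับ');
// Boundaries come from the runtime's dictionary, so inspect them
// rather than assuming a fixed result.
console.log(words);
```

The segments always concatenate back to the input, so a wrapper can treat the gaps between them as its only candidate break points.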

Cluster-safe breaks

A base consonant and its stacked vowels and tone marks are treated as one unit. A line never breaks inside a grapheme cluster.
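
A cluster-safety check can be sketched with the same built-in `Intl.Segmenter`, this time at grapheme granularity. The helper names `clusterBoundaries` and `isClusterSafeBreak` are hypothetical, and this is one plausible way to express the rule, not a description of ZapCap's internals.

```javascript
// Collects the UTF-16 offsets at which a break does not split a grapheme
// cluster (a base consonant plus its stacked vowels and tone marks).
function clusterBoundaries(text) {
  const segmenter = new Intl.Segmenter('th', { granularity: 'grapheme' });
  const offsets = new Set([text.length]); // the end of the string is always safe
  for (const { index } of segmenter.segment(text)) offsets.add(index);
  return offsets;
}

function isClusterSafeBreak(text, offset) {
  return clusterBoundaries(text).has(offset);
}

// Code points: U+0E40 U+0E14 U+0E35 U+0E4B U+0E22 U+0E27
const word = 'เดี๋ยว';
console.log(isClusterSafeBreak(word, 3)); // false: would strand the tone mark
console.log(isClusterSafeBreak(word, 4)); // true: falls after the whole cluster
```

Note that a cluster-safe offset is not necessarily a word boundary; this check only rules out floating diacritics.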

Spaces read as phrase marks

The phrase- and sentence-marking spaces in Thai are respected as preferred break points — not collapsed and not mistaken for word gaps.

Max two lines, phrase-aligned

When wrapping is needed, the break lands at a word or phrase boundary. Three-line cues are split into two cues instead.
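
The two-line rule can be sketched as a greedy wrap over already-segmented words. Everything here is an assumption for illustration: `wrapIntoCues` is a hypothetical helper, the 13-character limit is arbitrary, and length is counted in UTF-16 code units, whereas a real renderer would measure glyph width and also re-time the split cues.

```javascript
// Greedy wrap: pack whole words into lines of at most maxChars, then
// group the lines into cues of at most maxLines lines each.
function wrapIntoCues(words, maxChars, maxLines = 2) {
  const lines = [];
  let current = '';
  for (const word of words) {
    if (current && current.length + word.length > maxChars) {
      lines.push(current);
      current = '';
    }
    // A word is never split; one longer than maxChars gets its own line.
    current += word;
  }
  if (current) lines.push(current);

  const cues = [];
  for (let i = 0; i < lines.length; i += maxLines) {
    cues.push(lines.slice(i, i + maxLines));
  }
  return cues;
}

const cues = wrapIntoCues(['เดี๋ยว', 'ก็', 'เสร็จ', 'แล้ว', 'ครับ'], 13);
console.log(cues); // [['เดี๋ยวก็เสร็จ', 'แล้วครับ']]
```

Because lines are built only from whole words, a break can land between เสร็จ and แล้ว but never inside either one.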

Brand-term dictionary

Add product names and approved Thai spellings to the dictionary — transcription hints that bias recognition toward your brand terms. Carry an approved transcript through and the text renders verbatim.

Font fallback chain

Renders server-side with a Noto Sans Thai / Sarabun / Tahoma fallback stack. Nothing depends on fonts installed on the viewer’s device: what you see in the MP4 is what every viewer sees.

Standards we work to

Built on published references — not magic

Thai line breaking isn’t bespoke. Unicode’s line-breaking algorithm (UAX #14) puts Thai and other Southeast Asian scripts in a class that calls for dictionary-based break analysis, the W3C documents Thai layout, and Netflix publishes professional Thai subtitling guidelines. ZapCap implements them; this page cites them.

If you’re evaluating us against another captioning vendor, ask how they find Thai word boundaries. The answers tell you a lot.

REFERENCES
  • Unicode Line Breaking Algorithm (UAX #14) — defines dictionary-based breaking for Thai, Lao, Khmer, and Burmese. unicode.org/reports/tr14
  • W3C i18n — Thai layout requirements — word segmentation and cluster handling. w3c.github.io/iip
  • Netflix · Thai timed text style guide — character-per-line limits, two-line max, line-break placement. partnerhelp.netflixstudios.com
  • MDN · CSS word-break / line-break — language-aware line-breaking properties. developer.mozilla.org
Use it

Render Thai captions in one task call

Set language to "th" on the render task. Optionally, send an approved translation to skip retranscription, and a brand-term dictionary to bias recognition toward your product names and Thai spellings.

  • Set language "th" on the render task
  • Bring your own translation via the SRT-to-burned-in workflow
  • Mix with caption style templates — Thai with a Beast preset works
  • One source video → multiple language renders, consistent style
// Thai caption render: auto-translate path
// No SDK needed; call the REST API with fetch.
const task = await fetch(`https://api.zapcap.ai/videos/${videoId}/task`, {
  method: 'POST',
  headers: { 'x-api-key': process.env.ZAPCAP_KEY, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    // Template IDs are listed by GET /templates.
    templateId: '<TEMPLATE_UUID>',
    language:   'th',
    renderOptions: {
      subsOptions: { emphasizeKeywords: true },
    },
    // Transcription hints — biases recognition toward your brand terms.
    dictionary: ['ACME', 'เทอร์โบแมกซ์'],
    notification: {
      type: 'webhook',
      notificationsFor: ['render'],
      recipient: 'https://acme.com/hooks/zapcap',
    },
  }),
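}).then(r => {
  // Surface HTTP errors instead of parsing an error body as a task.
  if (!r.ok) throw new Error(`ZapCap task request failed: HTTP ${r.status}`);
  return r.json();
});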

// Bring-your-own translation
const task2 = await fetch(`https://api.zapcap.ai/videos/${videoId}/task`, {
  method: 'POST',
  headers: { 'x-api-key': process.env.ZAPCAP_KEY, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    templateId: '<TEMPLATE_UUID>',
    language:   'th',
    transcript: thApprovedTranscript,
    notification: {
      type: 'webhook',
      notificationsFor: ['render'],
      recipient: 'https://acme.com/hooks/zapcap',
    },
  }),
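}).then(r => {
  // Surface HTTP errors instead of parsing an error body as a task.
  if (!r.ok) throw new Error(`ZapCap task request failed: HTTP ${r.status}`);
  return r.json();
});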
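
For the "one source video, multiple language renders" case, a small helper can keep the request bodies consistent. The sketch below uses only the endpoint and fields shown in the two requests above. The helper names `buildTaskBody` and `renderLanguages` are hypothetical, and it assumes the API accepts one task per language for the same video and that language codes other than "th" follow the same pattern.

```javascript
// Builds a task body from the fields used in the sample requests.
function buildTaskBody({ templateId, language, dictionary, transcript, webhookUrl }) {
  const body = { templateId, language };
  if (dictionary) body.dictionary = dictionary;
  if (transcript) body.transcript = transcript;
  if (webhookUrl) {
    body.notification = {
      type: 'webhook',
      notificationsFor: ['render'],
      recipient: webhookUrl,
    };
  }
  return body;
}

// One source video, one task per language, same template for a consistent style.
// Assumption: parallel task creation for a single video is allowed.
async function renderLanguages(videoId, templateId, languages, apiKey) {
  const requests = languages.map((language) =>
    fetch(`https://api.zapcap.ai/videos/${videoId}/task`, {
      method: 'POST',
      headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(buildTaskBody({ templateId, language })),
    }).then((r) => {
      if (!r.ok) throw new Error(`Task for "${language}" failed: HTTP ${r.status}`);
      return r.json();
    })
  );
  return Promise.all(requests);
}
```

Reusing one `templateId` across all languages is what keeps the caption style identical from render to render.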
QA checklist

Before you ship Thai captions

Six checks the rendering pipeline can’t do for you.

  • Check every line break is at a word boundary. No word split across two lines in the rendered MP4 — Thai has no spaces to fall back on.
  • Confirm no detached marks. Tone marks and vowels stay attached to their base consonant; no floating diacritics at a line edge.
  • Define a brand-term dictionary. Product names and approved Thai spellings bias transcription; bring an approved transcript for copy that must not be re-translated.
  • Cap at two lines, phrase-aligned. If a cue would need a third line at your per-line character limit, split it into multiple cues.
  • Safe-zone check on 9:16. Stacked tone marks add visual height; overrun shows up faster than with Latin scripts.
  • Native-speaker pass on the rendered MP4, not the source transcript. Segmentation problems only show up at render time.
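
Two of these checks can be pre-screened in code before the native-speaker pass: the two-line cap and detached marks at the start of a line. The sketch below assumes you have the rendered cue text available as an array of lines (for example from an exported SRT); `lintCue` is a hypothetical helper, and it does not replace reviewing the rendered MP4.

```javascript
// Flags cues with too many lines and lines that begin with a combining
// mark, which means a tone mark or vowel was detached from its base.
function lintCue(lines, maxLines = 2) {
  const issues = [];
  if (lines.length > maxLines) {
    issues.push(`cue has ${lines.length} lines (max ${maxLines})`);
  }
  lines.forEach((line, i) => {
    // \p{M} matches Unicode combining marks, which include Thai tone
    // marks and the vowels written above or below a consonant.
    if (/^\p{M}/u.test(line)) {
      issues.push(`line ${i + 1} starts with a detached mark`);
    }
  });
  return issues;
}

console.log(lintCue(['เดี๋ยวก็เสร็จ', 'แล้วครับ'])); // []
// Second line begins with U+0E4B (a tone mark) split from its consonant.
console.log(lintCue(['\u0E40\u0E14\u0E35', '\u0E4B\u0E22\u0E27'])); // ['line 2 starts with a detached mark']
```

A clean result only means no mark was stranded at a line start; mid-word breaks still need a segmenter or a human reader to catch.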

Thai subtitle API questions

ZapCap runs dictionary-based word segmentation on Thai text before wrapping lines, so breaks land between whole words — never mid-word and never inside a tone-mark cluster. A plain whitespace splitter cannot do this because Thai word boundaries are not marked by spaces.

Render Thai subtitles through the API

Dictionary-based word segmentation, cluster-safe line breaks, a brand-term dictionary, and bring-your-own translation — all in one task call. Pricing is $0.10/min of rendered video.