AI Video SaaS Ships Styled Captions in a Sprint

From "captions on the roadmap" to "captions in production."

A small AI video product replaced a half-built internal captioning stack with the ZapCap API. Engineering scope shrank from a multi-quarter rendering project to a webhook handler plus a style picker. Names are anonymized; identifying figures are ranged.

1 sprint · Spec → shipped feature · Two engineers, no infra hires

0 · Render workers maintained · No ffmpeg / GPU queue

per-min · Pass-through billing · Marked up inside their plans

webhook · Async by default · One signed endpoint, retries built in

01 The bottleneck

The team, a seed-stage AI video product for marketers, had captions on their public roadmap, and the feature ranked high in every customer interview. A prototype existed: a Whisper call wired to ffmpeg with a single hardcoded style, queued through a homegrown Redis worker, behind a polling endpoint.

It was the kind of "almost done" that doesn't ship. The product team wanted style variants. The infra team didn't want another queue. The CTO had no appetite for hiring against a captioning feature.

What the prototype could do

· One caption style, hardcoded font + color
· English-only, no language switching
· Polling-only, no webhook
· Captions sometimes overran the safe zone on 9:16 exports

What was missing to ship

· Style preset gallery in the editor UI
· Multilingual rendering with sane line breaks
· Async delivery without users staring at a spinner
· Per-render usage tracking for billing
· A render farm that wouldn't fall over on launch day

02 The ZapCap workflow

Once they switched, the captioning path collapsed into one route on their backend that wraps two ZapCap calls, plus one handler for the ZapCap webhook.

Their existing CDN handled user uploads. Their backend forwarded the source URL to ZapCap, attached the user's selected style as a templateId, and stored the returned taskId against the user record. The webhook handler verified the signature, persisted the renderUrl, and pushed a notification.

Switching styles in the product UI became a one-field change in the request body. Multilingual? Set language. Transparent overlay for power users? Set outputMode.
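The export path described above can be sketched roughly as follows. This is a minimal illustration, not ZapCap's documented client: the base URL, the `url` request field, and the `id` / `taskId` response fields are assumptions, and `create_export`, `ExportStore`, and the injected `post` callable are hypothetical names. Only `templateId`, `language`, `outputMode`, and the two-call shape (`POST /videos`, then `POST /videos/:id/task`) come from the case study.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical base URL; replace with the real one from the ZapCap API docs.
ZAPCAP_BASE = "https://api.zapcap.example"

# post(url, json_body) -> parsed JSON response. Injected so the sketch runs offline.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class ExportStore:
    """Stand-in for the export table: maps a user ID to that user's task IDs."""
    tasks_by_user: Dict[str, List[str]] = field(default_factory=dict)

    def save(self, user_id: str, task_id: str) -> None:
        self.tasks_by_user.setdefault(user_id, []).append(task_id)


def create_export(
    post: PostFn,
    store: ExportStore,
    user_id: str,
    source_url: str,
    template_id: str,
    language: Optional[str] = None,
    output_mode: Optional[str] = None,
) -> str:
    """Forward the source URL, start a styled caption task, persist the task ID."""
    # Call 1: register the source video by URL (field names are assumed).
    video = post(f"{ZAPCAP_BASE}/videos", {"url": source_url})

    # Call 2: start the task. Style, language, and overlay mode are each
    # a single field in the request body.
    body: Dict[str, Any] = {"templateId": template_id}
    if language is not None:
        body["language"] = language
    if output_mode is not None:
        body["outputMode"] = output_mode
    task = post(f"{ZAPCAP_BASE}/videos/{video['id']}/task", body)

    store.save(user_id, task["taskId"])
    return task["taskId"]
```

In production the `post` argument would be a thin wrapper around an authenticated HTTP client. Injecting it keeps the route testable without touching the network.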

Render path

user → POST /exports → POST /videos → POST /videos/:id/task → webhook → notify user

03 Technical implementation

Two engineers, one sprint. The captioning feature shipped behind a feature flag, opened to a small fraction of users on a Wednesday, and rolled to everyone the following week.

Failure handling. The team treated ZapCap as a normal upstream dependency: signed payloads, eventId-based dedupe on their handler, and an alarm on consecutive 5xx. No render-pipeline runbook to maintain.
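The three safeguards named above (signed payloads, eventId dedupe, an alarm on consecutive 5xx) might look something like the sketch below. It assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body and that the payload carries an `eventId` field. The actual signing scheme and header name should be taken from ZapCap's webhook documentation. `handle_webhook` and `ConsecutiveFailureAlarm` are hypothetical names, and the in-memory set stands in for a durable store.

```python
import hashlib
import hmac
import json
from typing import Set


def verify_signature(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC over the raw body and compare in constant time."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_webhook(
    secret: bytes, raw_body: bytes, signature_hex: str, seen_event_ids: Set[str]
) -> str:
    """Return 'rejected', 'duplicate', or 'processed' for one delivery."""
    # Verify before parsing, so unsigned payloads never reach the JSON parser.
    if not verify_signature(secret, raw_body, signature_hex):
        return "rejected"
    event = json.loads(raw_body)
    event_id = event["eventId"]
    # Retried deliveries carry the same eventId; process each only once.
    if event_id in seen_event_ids:
        return "duplicate"
    seen_event_ids.add(event_id)
    # Here the real handler would persist renderUrl and notify the user.
    return "processed"


class ConsecutiveFailureAlarm:
    """Tracks upstream responses and fires after N 5xx results in a row."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.streak = 0

    def record(self, status_code: int) -> bool:
        """Record one response; return True once the 5xx streak hits the threshold."""
        if 500 <= status_code <= 599:
            self.streak += 1
        else:
            self.streak = 0  # any non-5xx response resets the streak
        return self.streak >= self.threshold
```

Acknowledging duplicates with a success status, rather than an error, is usually the safer choice: it stops the sender from retrying an event that has already been handled.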

What was actually built

One backend route · POST /exports — wraps two ZapCap calls and persists the task ID against the user record.
One webhook handler · /hooks/zapcap — HMAC-verifies, updates the export row, notifies the user.
Style preset gallery — UI mirroring the ZapCap template list; one preset per editor card.
Failure UI — typed error codes from ZapCap mapped to user-readable messages and a retry button.
eventId dedupe on the webhook · repeat deliveries during retry storms are processed only once.
Credit-balance check before queueing — surfaces an upgrade prompt instead of a mid-render failure.
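The last two items in that list are small enough to sketch. The error codes below are invented placeholders, not ZapCap's actual typed codes, and `user_message` and `pre_queue_decision` are hypothetical helpers. The sketch only illustrates the pattern of mapping upstream codes to readable copy and checking the balance before any render is queued.

```python
from typing import Dict

# Placeholder codes for illustration; the real typed codes come from the API docs.
USER_MESSAGES: Dict[str, str] = {
    "SOURCE_UNREACHABLE": "We couldn't fetch your video. Check the file and try again.",
    "UNSUPPORTED_FORMAT": "This video format isn't supported for captions yet.",
    "INSUFFICIENT_CREDITS": "You're out of caption minutes. Upgrade to keep exporting.",
}
FALLBACK_MESSAGE = "Captioning failed. Please retry."


def user_message(error_code: str) -> str:
    """Map an upstream error code to copy for the failure UI."""
    return USER_MESSAGES.get(error_code, FALLBACK_MESSAGE)


def pre_queue_decision(credit_minutes_left: float, video_minutes: float) -> str:
    """Decide before queueing, so the user sees a prompt instead of a mid-render failure."""
    if video_minutes > credit_minutes_left:
        return "upgrade_prompt"
    return "queue"
```

Unknown codes fall back to a generic message with a retry button, so a new upstream error type never surfaces as a raw code in the editor.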

04 What changed

Operationally the bigger win wasn't the launch — it was that captioning stopped being a project. Caption rendering became a primitive the product team could use without filing a ticket. Adding new styles is a UI change, not an infra change.

Billing slotted in cleanly. API credits at $0.10/min passed through their plan tiers — exports beyond a quota became a one-line upgrade prompt instead of a feature failure.
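The pass-through arithmetic is simple enough to show directly. Only the $0.10/min rate comes from the case study. The plan quota in the usage lines is an invented example, and `render_cost` and `over_quota_minutes` are hypothetical helpers.

```python
from decimal import Decimal

COST_PER_MINUTE = Decimal("0.10")  # per-minute API rate quoted in the case study


def render_cost(minutes: Decimal) -> Decimal:
    """Upstream cost in dollars for a given number of rendered minutes."""
    return (minutes * COST_PER_MINUTE).quantize(Decimal("0.01"))


def over_quota_minutes(used: Decimal, included: Decimal) -> Decimal:
    """Minutes beyond the plan quota; zero when the user is within quota."""
    return max(used - included, Decimal("0"))


# Hypothetical plan that includes 100 caption minutes per month.
included = Decimal("100")
print(render_cost(included))                         # upstream cost of the full quota
print(over_quota_minutes(Decimal("120"), included))  # minutes that trigger the upgrade prompt
```

Using `Decimal` rather than floats keeps per-minute charges exact when they are summed across many renders.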

Before

· Captioning marked "in progress" in roadmap for two quarters
· One hardcoded style, English only
· Polling spinner inside the editor
· Captions occasionally clipped the bottom safe zone
· Render queue paged the on-call engineer regularly

After

→ Shipped in one sprint, two engineers
→ Style preset gallery in the editor, multilingual rendering
→ Async delivery: push notification on completion
→ Captions respect the 9:16 safe zone in every template
→ Zero render-related pages since launch

05 In their words

“We were six weeks into building a captioning stack and not visibly closer to launching one. Switching to the ZapCap API turned the roadmap item into a webhook handler. The product team picked styles; we shipped.”

Engineering lead

AI video SaaS · seed-stage · anonymized

Anonymization note: name, logo, and product references withheld pending written customer permission. We'll attach the real attribution here once consent is confirmed. — ZapCap content team

Integration questions

How long does it take to add captioning to a SaaS with the ZapCap API?

In this anonymized example, two engineers shipped styled, multilingual caption rendering in a single sprint: one backend route, one signed webhook handler, and a style-preset gallery. No render workers, ffmpeg pipeline, or GPU queue were built or maintained.

Do I have to build or maintain a render pipeline?

No. The ZapCap API is webhook-native and needs no render pipeline of your own: forward a source URL plus a templateId, store the returned taskId, and handle one signed completion webhook. The render farm, fonts, and safe-zone layout are handled by ZapCap. (ZapCap also ships a web editor at zapcap.ai for teams who want a UI, but this integration uses the API.) The endpoints behind that integration are documented in ZapCap’s video captioning API overview.

How does billing work for a SaaS that resells captioning?

Billing is usage-based at $0.10/min of rendered video. Teams can pass that per-minute cost through their own plan tiers and turn over-quota exports into an upgrade prompt rather than a feature failure.

How is reliability handled?

Treat ZapCap as a normal upstream dependency: HMAC-verify the webhook payload, dedupe on eventId for retry storms, and alarm on consecutive 5xx. There is no render-pipeline runbook to maintain.

Where this story connects

Agency caption workflow case study

E-commerce video localization case study

Performance creative localization case study

Add caption rendering to your SaaS