Subtitles
Overview
The Subtitles task creates caption tracks (.vtt or .srt) for video or audio files.
It converts detected speech into synchronized subtitle text with timestamps that align with playback.
When a Subtitles task runs, it creates a Track file with kind: "subtitles" and outputs a .vtt or .srt text file containing caption data.
Example Output
Creating a Subtitles Task
Subtitles tasks can be created directly for a file or URL.
They typically follow a Speech task, which transcribes the spoken audio.
When the task completes, ittybit creates a Track file in your project and sends a webhook to your endpoint if webhook_url is defined.
Webhook Example
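The following is a minimal sketch of how a receiving endpoint might pick the subtitle URL out of a webhook body. It assumes the body is a JSON object that carries the Track file fields (object, kind, url) at the top level; the real payload may nest them differently. The function name subtitle_url_from_webhook is hypothetical and is not part of any ittybit SDK.

```python
import json
from typing import Optional


def subtitle_url_from_webhook(body: bytes) -> Optional[str]:
    """Return the subtitle file URL if the payload describes a subtitles track.

    Assumption: the Track file fields sit at the top level of the JSON body.
    Adjust the lookups if the actual webhook payload wraps them.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return None
    # Only react to Track files of kind "subtitles"; ignore everything else.
    if payload.get("object") == "track" and payload.get("kind") == "subtitles":
        return payload.get("url")
    return None
```

Returning None for anything that is not a subtitles track lets the same endpoint receive webhooks from other tasks without special-casing them.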
File Structure
Property | Type | Description |
---|---|---|
id | string | Unique file ID for the subtitle track. |
object | string | Always "track". |
kind | string | Always "subtitles". |
language | string | Language code (ISO 639-1). |
format | string | Output format — "vtt" or "srt". |
filename | string | Name of the subtitle file. |
duration | number | Duration of the associated media file in seconds. |
filesize | number | Size of the subtitle file in bytes. |
url | string | Publicly accessible subtitle file URL. |
metadata | object | Reserved for additional details. |
created / updated | string (ISO 8601) | Timestamps for creation and last update. |
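As an illustration, the properties in the table above could be modelled on the client side like this. This is a sketch, not an official type definition: the class name SubtitleTrack and the helper is_subtitle_track are hypothetical, and it assumes "created / updated" are two separate string fields named created and updated.

```python
from typing import Any, Dict, Literal, TypedDict


class SubtitleTrack(TypedDict):
    """Client-side model of a subtitles Track file, based on the table above."""
    id: str
    object: Literal["track"]
    kind: Literal["subtitles"]
    language: str                      # ISO 639-1 code, e.g. "en"
    format: Literal["vtt", "srt"]
    filename: str
    duration: float                    # seconds, of the associated media file
    filesize: int                      # bytes, of the subtitle file
    url: str
    metadata: Dict[str, Any]
    created: str                       # ISO 8601 (assumed field name)
    updated: str                       # ISO 8601 (assumed field name)


def is_subtitle_track(data: Dict[str, Any]) -> bool:
    """Check the three fields the table describes as fixed or enumerated."""
    return (
        data.get("object") == "track"
        and data.get("kind") == "subtitles"
        and data.get("format") in ("vtt", "srt")
    )
```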
Supported Inputs
Subtitles can be generated from:
- Video files: .mp4, .mov, .webm
- Audio files: .mp3, .wav, .m4a
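A client could check a filename against the extensions listed above before creating a task. The sketch below assumes the list is exhaustive, which the page does not state; the function name media_type is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions taken from the Supported Inputs list.
SUPPORTED_VIDEO = {".mp4", ".mov", ".webm"}
SUPPORTED_AUDIO = {".mp3", ".wav", ".m4a"}


def media_type(filename: str) -> Optional[str]:
    """Return "video" or "audio" for a listed extension, otherwise None."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext in SUPPORTED_VIDEO:
        return "video"
    if ext in SUPPORTED_AUDIO:
        return "audio"
    return None
```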
Example Workflow Integration
Subtitles tasks can be part of a broader Automation
that processes uploaded media automatically.
When a new media file is created, this automation encodes the video and generates subtitles.
Example Output Format
Typical .vtt output:
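To show the general shape of a WebVTT file, here is a small sketch that renders timed cues into .vtt text. It illustrates the format only and is not ittybit's implementation; the cue values in the usage note are invented sample data, and the names Cue, vtt_timestamp and to_vtt are hypothetical.

```python
from typing import Iterable, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, caption text)


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp: HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_vtt(cues: Iterable[Cue]) -> str:
    """Render cues as WebVTT: a header line, then one blank-line-separated block per cue."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)
```

For example, to_vtt([(0.0, 2.5, "Hello")]) yields the WEBVTT header, a blank line, the timing line 00:00:00.000 --> 00:00:02.500, and the caption text. An .srt file differs mainly in having no header, numbering each cue, and using a comma instead of a period before the milliseconds.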
Common Use Cases
- Generating closed captions for accessibility
- Localizing content into multiple languages
- Improving SEO and video searchability
- Creating transcribed learning materials
Summary
The Subtitles task converts detected speech into time-coded captions for video or audio.
It produces .vtt or .srt files compatible with standard web and media players and can be chained with other tasks in workflows or automations.