Publishing a video without subtitles in 2026 means leaving out the viewers who watch on mute on the subway, in the office or in bed, a group often put at 85% of the audience. And creating subtitles by hand is still one of the most tedious tasks in the editing flow: timing each cue, breaking lines, translating into other languages. AI changes the equation: a well-made SRT or VTT file for a 20-minute video can now be produced in under five minutes.
This guide explains how to create SRT and VTT files with AI from any audio or video: technical differences between the two formats, code examples, tools, how to control sync and line breaks, how to translate subtitles into multiple languages while keeping timestamps, and how to load them into YouTube, Vimeo, Premiere and HTML5 players.
Article contents
- SRT vs VTT: technical differences
- When to use each format
- Internal structure of an SRT and a VTT
- Create SRT and VTT with VOCAP
- Loading VTT in HTML5 with <track>
- Uploading SRT/VTT to YouTube and Vimeo
- Importing SRT into Premiere and Final Cut
- Translating subtitles to other languages
- Best practices: length, timing, reading speed
- Frequently asked questions
SRT vs VTT: technical differences
Both are plain text files that match phrases to timestamps, but they belong to different generations. SRT (SubRip Text) was born in 2000 as the output format of the SubRip program for extracting subtitles from DVDs. VTT (WebVTT) is the modern W3C standard, designed for HTML5 players and the semantic web.
| Feature | SRT | VTT |
|---|---|---|
| Standard year | 2000 (de facto) | 2010 (W3C) |
| Extension | .srt | .vtt |
| Required header | No | Yes (WEBVTT) |
| Decimal separator | Comma (,) | Dot (.) |
| Native HTML5 (<track>) | Only with conversion | Yes, official |
| CSS styling | No | Yes, via ::cue |
| Cue positioning | No | Yes (line, position, align) |
| NOTE comments | No | Yes |
| Chapters / regions | No | Yes |
| YouTube support | Yes | Yes |
| Premiere / Final Cut support | Yes, native | Conversion recommended |
| Netflix / Disney+ support | Via IMSC/TTML conversion | Via IMSC/TTML conversion |
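Because the two formats differ mainly in the header and the decimal separator, a basic SRT-to-VTT conversion can be sketched in a few lines. The function name `srt_to_vtt` is illustrative and not part of any tool mentioned in this guide. The sketch assumes a well-formed SRT and keeps the cue numbers, which WebVTT allows as cue identifiers.

```python
import re

# Matches an SRT timing line such as "00:00:03,500 --> 00:00:07,800".
_SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to a minimal WebVTT document (illustrative helper)."""
    # Drop a UTF-8 BOM if present and normalise Windows line endings.
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # Swap the comma for a dot, but only inside timing lines,
    # so commas in the subtitle text are left untouched.
    body = _SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    # WebVTT requires the WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + body + "\n"
```

Going the other way (VTT to SRT) would also require dropping cue settings, NOTE blocks and tags such as `<v>`, which SRT does not natively support.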
When to use each format
Practical rule: if the destination is an HTML5 player on your own site or a modern platform, export VTT. If the destination is a video editor (Premiere, Final Cut, DaVinci, CapCut), a social platform (YouTube, Vimeo, Facebook) or a desktop or mobile player (VLC, MX Player), export SRT. When in doubt, export SRT: it has wider historical compatibility and almost every tool knows how to convert it.
When to choose SRT
- Video editing: Premiere Pro and DaVinci Resolve import it into the timeline as an editable subtitle track
- Desktop and mobile players: VLC, MPC-HC, MX Player auto-detect it if it shares the filename with the .mp4
- Uploading to YouTube and Vimeo: both accept it without conversion
- Client delivery: it's the format almost everyone knows how to open
When to choose VTT
- Your own HTML5 player: the <track> element of <video> only officially accepts VTT
- Courses and LMS platforms: Moodle, Canvas, Coursera or your own video player prefer VTT
- Styled subtitles: if you need colors, positioning or italics without burning text into the video
- Chapter tracks: VTT supports <track kind="chapters"> for marker-based navigation
- Modern web apps: React, Vue or any framework using the browser's native player
Internal structure of an SRT and a VTT
Looking at the file from the inside helps you understand how AI builds the result and how to fix it if something gets out of order.
.srt file example
1
00:00:00,000 --> 00:00:03,200
Welcome to today's podcast.

2
00:00:03,500 --> 00:00:07,800
We're going to talk about how to create subtitles with AI.

3
00:00:08,000 --> 00:00:11,400
In five minutes you'll have an SRT file ready to use.
Each cue has three parts: an order number, a time range with the --> arrow and comma as the decimal separator, and the subtitle text (one or two lines maximum). A blank line separates cues.
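To make the three-part structure concrete, here is a minimal parsing sketch. The `Cue` class and `parse_srt` function are hypothetical names introduced for illustration. The code assumes a well-formed file and silently skips blocks that have fewer than three lines.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    index: int      # order number
    start_ms: int   # start time in milliseconds
    end_ms: int     # end time in milliseconds
    text: str       # one or two lines of subtitle text


def _to_ms(stamp: str) -> int:
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    clock, millis = stamp.strip().split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def parse_srt(srt_text: str) -> list[Cue]:
    """Split SRT text on blank lines and read each cue's three parts."""
    cues = []
    normalised = srt_text.replace("\r\n", "\n").strip()
    for block in normalised.split("\n\n"):
        lines = block.strip().split("\n")
        if len(lines) < 3:
            continue  # skip empty or malformed blocks
        start, end = lines[1].split("-->")
        cues.append(
            Cue(int(lines[0]), _to_ms(start), _to_ms(end), "\n".join(lines[2:]))
        )
    return cues
```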
.vtt file example
WEBVTT

NOTE Subtitles generated by VOCAP

1
00:00:00.000 --> 00:00:03.200
Welcome to today's podcast.

2
00:00:03.500 --> 00:00:07.800 line:90% align:center
We're going to talk about how to create subtitles with AI.

3
00:00:08.000 --> 00:00:11.400
<v Speaker1>In five minutes you'll have a VTT file ready to use.</v>
VTT requires the WEBVTT header as the first line, uses a dot as the decimal separator and allows extras: NOTE comments, cue positioning (line, align, position) and inline tags like <v Speaker> for speaker diarization.
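As a rough illustration of how these pieces fit together, the sketch below assembles a VTT document from a list of cue dictionaries. The `build_vtt` function and the dictionary keys (`start_ms`, `end_ms`, `text`, `settings`, `speaker`) are assumptions made for this example, not an existing API.

```python
def _stamp(ms: int) -> str:
    """Format milliseconds as 'HH:MM:SS.mmm' (dot separator, as VTT requires)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def build_vtt(cues, note=None) -> str:
    """Build a WebVTT document from dicts with start_ms, end_ms and text.

    Optional keys per cue: 'settings' (e.g. 'line:90% align:center')
    and 'speaker' (wrapped in a <v> voice tag).
    """
    parts = ["WEBVTT"]  # the header must be the first line
    if note:
        parts.append(f"NOTE {note}")
    for number, cue in enumerate(cues, start=1):
        timing = f"{_stamp(cue['start_ms'])} --> {_stamp(cue['end_ms'])}"
        if cue.get("settings"):
            # Cue settings go on the timing line, after the end time.
            timing += " " + cue["settings"]
        text = cue["text"]
        if cue.get("speaker"):
            text = f"<v {cue['speaker']}>{text}</v>"
        parts.append(f"{number}\n{timing}\n{text}")
    # Blocks are separated by a blank line.
    return "\n\n".join(parts) + "\n"
```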
Tip: never edit a .srt or .vtt in Word or Pages: they inject rich-text encoding that breaks players. Always use a plain text editor (VS Code, Sublime Text, Notepad++, BBEdit) and save as UTF-8 without BOM.
Create SRT and VTT with VOCAP
VOCAP generates both formats in the same transcription process, with phrase-level timestamps and respecting recommended lengths.
Upload the audio or video
Go to vocap.io/en/transcribe and drag the file. VOCAP accepts MP3, WAV, M4A, MP4, MOV, WebM, OGG, FLAC, AAC and OPUS up to 150 MB. If your video is larger, extract the audio with ffmpeg (ffmpeg -i video.mp4 -vn -acodec libmp3lame audio.mp3) and upload only the audio.
Wait for the transcription with timestamps
VOCAP uses OpenAI Whisper to transcribe and return phrase-level timestamps. For a 20-minute video, transcription takes between 3 and 5 minutes.
Export as SRT or VTT
In the results panel, click Export and choose the format. Segmentation is automatically adjusted: up to 42 characters per line, up to 6 seconds per cue, breaks at natural punctuation.
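The limits above can be approximated with a simple wrapping routine. This is not VOCAP's actual segmentation logic, only a minimal sketch of the 42-character and two-line limits. It wraps at word boundaries and ignores punctuation and timing, which a real segmenter would also take into account.

```python
import textwrap

MAX_CHARS = 42  # maximum characters per line
MAX_LINES = 2   # maximum lines per cue


def split_into_cue_texts(sentence: str) -> list[str]:
    """Wrap text into lines of at most 42 characters and group
    the lines two at a time, one group per cue."""
    lines = textwrap.wrap(sentence, width=MAX_CHARS)
    return [
        "\n".join(lines[i:i + MAX_LINES])
        for i in range(0, len(lines), MAX_LINES)
    ]
```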
Review in a text editor
Open the .srt or .vtt in VS Code or Sublime Text. Confirm that timestamps are synced with the audio (you can paste the file into a player that loads subtitles to verify) and fix any proper nouns the AI may have transcribed incorrectly.
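Before checking sync by ear, a quick automated pass can catch structural timing problems. The sketch below assumes you already have each cue's start and end time in milliseconds. `find_timing_problems` is a hypothetical helper name.

```python
def find_timing_problems(cues: list[tuple[int, int]]) -> list[str]:
    """Check (start_ms, end_ms) pairs, given in file order, for cues
    that end before they start or that overlap an earlier cue."""
    problems = []
    previous_end = 0
    for number, (start, end) in enumerate(cues, start=1):
        if end <= start:
            problems.append(f"cue {number}: end is not after start")
        if start < previous_end:
            problems.append(f"cue {number}: overlaps the previous cue")
        previous_end = max(previous_end, end)
    return problems
```

An empty list means the timestamps are at least internally consistent. Whether they match the audio still needs a check in a player.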
Load the file into your platform
Move on to the corresponding section: YouTube, Vimeo, Premiere or HTML5. Each one has a different upload flow described in the next sections.
Create Your First SRT/VTT Free
30 minutes of transcription with SRT and VTT export included. No credit card required.
Try VOCAP Free
Loading VTT in HTML5 with <track>
The native HTML5 player supports subtitles in a standard way thanks to the <track> element. It only accepts VTT.
<video controls width="720">
  <source src="podcast.mp4" type="video/mp4">
  <track
    label="English"
    kind="subtitles"
    srclang="en"
    src="podcast-en.vtt"
    default>
  <track
    label="Español"
    kind="subtitles"
    srclang="es"
    src="podcast-es.vtt">
  <track
    label="Chapters"
    kind="chapters"
    srclang="en"
    src="podcast-chapters.vtt">
</video>
The default attribute marks the track activated when the video loads. If you serve the HTML from one domain and the VTT from another (e.g. a CDN), remember to configure crossorigin="anonymous" on the <video> and the Access-Control-Allow-Origin headers on the VTT server.
Common mistake: serving the .vtt with the wrong MIME type. Configure your server to return text/vtt; if it returns text/plain or application/octet-stream, some browsers and players may silently ignore the file. On Nginx: types { text/vtt vtt; }. On Apache: AddType text/vtt .vtt. On Vercel or Netlify, set it through the platform's custom headers configuration.
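For local testing, the same idea can be reproduced with Python's built-in HTTP server. This is a development-only sketch. The class name `VttAwareHandler` is made up for the example, and the permissive `Access-Control-Allow-Origin: *` header is an assumption that suits a local test but not necessarily production.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class VttAwareHandler(SimpleHTTPRequestHandler):
    # Serve .vtt files as text/vtt regardless of system MIME tables.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".vtt": "text/vtt",
    }

    def end_headers(self):
        # Allow pages on other origins to load the tracks (local testing only).
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Serve the current directory on 127.0.0.1 until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), VttAwareHandler).serve_forever()
```

Calling `serve()` from the folder that holds your video and .vtt files starts the server. Your browser's network tab then shows the Content-Type returned for each track.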
Styling VTT subtitles with CSS
video::cue {
  background-color: rgba(0, 0, 0, 0.7);
  color: #ffeb3b;
  font-family: "Inter", sans-serif;
  font-size: 1.1em;
  text-shadow: 0 1px 2px #000;
}
video::cue(b) {
  color: #ff5252;
}
Only VTT supports this level of control. If you export SRT and need styling, you'll have to burn it into the video with ffmpeg or tools like HandBrake.
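Burning subtitles in with ffmpeg usually relies on its `subtitles` video filter. The sketch below only builds the command. It assumes an ffmpeg build with libass support and simple file paths, since paths with colons or quotes need extra escaping inside the filter argument. The function name is illustrative.

```python
def build_burn_in_command(video: str, subtitles: str, output: str) -> list[str]:
    """Return ffmpeg arguments that render the subtitle file onto the
    picture and copy the audio stream unchanged."""
    return [
        "ffmpeg",
        "-i", video,
        "-vf", f"subtitles={subtitles}",  # draws the cues onto each frame
        "-c:a", "copy",                   # audio is not re-encoded
        output,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Note that the video is re-encoded, so the export takes longer than a plain copy.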
Uploading SRT/VTT to YouTube and Vimeo
YouTube
- Go to YouTube Studio > Content > select your video
- Subtitles tab in the left bar
- Add language > pick the language of the file
- Click Add under "Subtitles" > Upload file
- Select "With timing" and upload the .srt or .vtt
- YouTube activates them instantly; the CC button on the player shows them
YouTube also generates automatic captions with its own system, but their accuracy is around 88-92% in English and 75-85% in Spanish. Uploading your own SRT generated by VOCAP gives accuracy above 95% and improves how the video is indexed in search.
Vimeo
- Open the video in Vimeo and click Settings
- Distribution tab > Subtitles section
- Click + Add CC/Subtitles file
- Upload the .srt or .vtt and select the language
- Check "Available" so viewers can choose them
Importing SRT into Premiere and Final Cut
Premiere Pro
Since 2022, Premiere imports .srt files directly:
- Window > Text > Captions > Import from SRT
- Select the .srt file generated by VOCAP
- A new subtitle track appears on the timeline
- Each cue can be edited individually; drag the edges to adjust timing
- To export the video with burned subtitles, in the Export panel enable "Burn captions into video"
- To export as a separate sidecar subtitle file, choose "Create captions file"
Final Cut Pro
Final Cut prefers the iTT (iTunes Timed Text) format but accepts SRT with a workaround:
- File > Import > Captions
- Select the .srt; FCP converts it to iTT internally
- The track appears on the timeline with editable cues
- To export a CEA-608 or iTT track, use Share > Master File > Roles
DaVinci Resolve and CapCut
DaVinci Resolve imports SRT since version 18 (Edit > Import > Subtitles). CapCut Desktop and Web also support SRT since 2024 (timeline > Captions > Import file). On CapCut mobile, importing is more limited and it's better to generate subtitles from within the app from the audio.
Translating subtitles to other languages
The classic flow for translating subtitles was to run the SRT through a human translator or paste it cue by cue into DeepL. With AI, the process is reduced to a single step because VOCAP translates while keeping timestamps.
Transcribe the audio in its original language
For example, a podcast in English. VOCAP generates the SRT/VTT in English with timestamps.
Enable translation to the languages you need
Spanish, French, German, Italian, Portuguese or any of the 90+ supported languages. Each language generates an independent SRT/VTT file with the same timestamps.
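Timestamps survive translation because only the text of each cue changes. The sketch below shows that idea with a placeholder `translate` callable. It is hypothetical and stands in for whatever translation service you use. It is not VOCAP's API.

```python
from typing import Callable

# A cue here is simply (timing line, subtitle text).
Cue = tuple[str, str]


def translate_cues(cues: list[Cue], translate: Callable[[str], str]) -> list[Cue]:
    """Return new cues with translated text and untouched timing lines."""
    return [(timing, translate(text)) for timing, text in cues]
```

Running it once per target language yields one independent cue list per language, all sharing the same timing lines.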
Upload alternate tracks to YouTube or your player
YouTube lets you add as many languages as you want from Subtitles > Add language. In HTML5, simply add one <track> per language with the corresponding srclang attribute.
Why subtitle translation matters: a video with subtitles in 3 languages multiplies the potential reach by 3-5x. YouTube indexes by subtitle language, so a podcast in English with subs in Spanish and Portuguese will appear in searches across the three markets. The marginal cost with AI is cents per language; with a human translator it would be USD 50-100 per language.
Best practices: length, timing, reading speed
The CSA (France), BBC (UK), Netflix Style Guide and DCMP (Described and Captioned Media Program in the US) guidelines agree on nearly everything.
| Rule | Recommended value | Why |
|---|---|---|
| Characters per line | Max 42 | Fits 16:9 screens without crowding |
| Lines per cue | Max 2 | More blocks the image |
| Duration per cue | 1-6 seconds | Comfortable reading time |
| Reading speed | < 17 characters/second | BBC and Netflix standard |
| Gap between cues | ≥ 80 ms | Avoids flickering between subtitles |
| Line break | At natural punctuation | Do not split phrases |
| Speaker identification | Only when confusing | Use "- " or <v> in VTT |
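The numeric rules in the table are easy to check automatically. The following is a minimal sketch with thresholds taken from the table. The `check_cue` function and its messages are illustrative, and reading speed is computed as characters divided by duration in seconds.

```python
MAX_CHARS, MAX_LINES = 42, 2
MIN_MS, MAX_MS = 1000, 6000  # duration per cue: 1-6 seconds
MAX_CPS = 17                 # reading speed must stay below this
MIN_GAP_MS = 80              # minimum gap between cues


def check_cue(start_ms, end_ms, text, previous_end_ms=None):
    """Return a list of rule violations for a single cue."""
    issues = []
    lines = text.split("\n")
    duration = end_ms - start_ms
    if len(lines) > MAX_LINES:
        issues.append("too many lines")
    if any(len(line) > MAX_CHARS for line in lines):
        issues.append("line too long")
    if not MIN_MS <= duration <= MAX_MS:
        issues.append("duration out of range")
    chars = sum(len(line) for line in lines)
    if duration > 0 and chars / (duration / 1000) >= MAX_CPS:
        issues.append("reads too fast")
    if previous_end_ms is not None and start_ms - previous_end_ms < MIN_GAP_MS:
        issues.append("gap too short")
    return issues
```

For example, a 27-character line shown for 3.2 seconds reads at about 8.4 characters per second and passes every check.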
Subtitles made by hand
- 3-5 hours per hour of video
- Frequent sync errors
- Inconsistency between cues
- Translation multiplies cost per language
- Guaranteed boredom
Subtitles with VOCAP + AI
- About 10-15 minutes per hour of video (3-5 minutes for a 20-minute file)
- Consistent phrase-level sync, with a quick manual review
- CSA/BBC rules applied by default
- Translation to 90+ languages in the same step
- Free time for creative editing
Real use cases
Video podcasters
They turn YouTube and Spotify Video episodes into accessible, better-indexed content.
- SRT to upload to YouTube
- VTT for the podcast's own site
- Translations to Spanish and Portuguese
- Improves video SEO
Online courses and trainers
Generate accessible subtitles for their academies on Moodle, Teachable or their own site.
- VTT for HTML5 player
- Chapters in a separate VTT
- WCAG 2.2 compliance
- Students across languages
Reels and Shorts creators
Burned-in or sidecar subtitles for Instagram, TikTok and YouTube Shorts.
- SRT as source
- Burn-in via ffmpeg or CapCut
- Per-platform styling
- Boosts retention by 80%
Companies and corporate video
Onboarding, internal training, multilingual product videos.
- SRT for Premiere
- Translation to Spanish/French
- Intranet accessibility
- International leverage
Journalists and documentaries
Recorded interviews with exact subtitles for broadcast.
- SRT compatible with broadcast editors
- Speaker markers in VTT
- Quotes with exact timestamps
- Versioning to multiple languages
Streamers and gaming editors
Twitch and YouTube Gaming VODs with automatic subtitles.
- SRT from the long VOD
- Translation for global audience
- Better YouTube SEO
- Community accessibility
Generate Your SRT and VTT Subtitles in Minutes
Try VOCAP free: 30 minutes of transcription with SRT and VTT export included. No credit card. Works on Mac, Windows, Linux, iPhone and Android from Safari or Chrome.
Start Free
Frequently asked questions
What is the difference between SRT and VTT?
SRT (SubRip Text) is the older and more widely compatible format: it is supported by YouTube, Vimeo, Premiere, Final Cut, VLC and pretty much every player (streaming services such as Netflix require conversion to IMSC/TTML). It uses a comma as the decimal separator. VTT (WebVTT) is the modern web standard: HTML5 players use it via the <track> element, and it supports CSS styles, on-screen text positioning and comments. It uses a dot as the decimal separator. For the modern web, use VTT; for edited video or platform uploads, use SRT.
Can I create an SRT directly from audio without video?
Yes. SRT and VTT are just text with timestamps; they don't contain video. VOCAP generates the file from any MP3, WAV, M4A or OGG. The audio is transcribed with Whisper, automatically segmented into cues of up to 6 seconds and exported as .srt or .vtt, ready to sync with the video you'll create afterwards or to use as a base for podcast subtitles.
How does automatic translation of an SRT to another language work?
VOCAP transcribes the audio in its original language and, in the same process, can translate the result to English, French, German, Italian, Portuguese or any of 90+ languages while keeping timestamps. Translation is done by Claude after transcription, sentence by sentence, so each cue keeps its time position. The result is two SRT/VTT files: original and translated.
How long should each subtitle line be?
CSA, BBC and Netflix style guides agree: maximum 42 characters per line, maximum 2 lines per cue, duration between 1 and 6 seconds, and reading speed below 17 characters per second. VOCAP segments automatically respecting these limits.
Why does YouTube accept SRT and VTT but display them differently?
YouTube ingests both formats but internally converts them to its own JSON3 format. The visual result is identical for the viewer. The practical difference is that VTT allows you to include metadata (NOTE), cue settings (position, alignment) and formatting (italic, bold) that SRT does not natively support.