What is the difference between SRT and VTT?

SRT (SubRip Text) is the older and most compatible format: it is supported by YouTube, Vimeo, Premiere, Final Cut, VLC and pretty much every player. It uses timestamps with a comma as the decimal separator (00:00:01,500). VTT (WebVTT) is the modern web standard: HTML5 players use it via the <track> element, and it supports CSS styles, on-screen positioning of text, regions, metadata and cue settings. It uses a dot as the decimal separator (00:00:01.500). For modern web use, choose VTT; for edited video or uploads to platforms, choose SRT.

Can I create an SRT directly from audio without video?

Yes. SRT and VTT are just text with timestamps; they do not contain video. VOCAP generates the file from any MP3, WAV, M4A or OGG. The audio is transcribed with Whisper, automatically segmented into 3-6 second cues and exported as .srt or .vtt ready to sync with the video you will create afterwards or as a base for podcast subtitles.

How does automatic translation of an SRT to another language work?

VOCAP transcribes the audio in its original language and, in the same process, can translate the result to English, French, German, Italian, Portuguese or any of 90+ languages while keeping timestamps. The translation is done by Claude after the transcription, sentence by sentence, so each cue keeps its time position. The result is two SRT/VTT files: original and translated, which YouTube and HTML5 players can offer as alternative tracks.

How long should each subtitle line be?

CSA, BBC and Netflix style guides agree: maximum 42 characters per line, maximum 2 lines per cue, duration between 1 and 6 seconds, and a reading speed below 17 characters per second. VOCAP segments automatically within these limits, but if you rewrite a cue manually, stay within them too. Cues that are too long tire the viewer; cues that are too short flicker.

Why does YouTube accept SRT and VTT but display them differently?

YouTube ingests both formats but internally converts them to its own JSON3 format. The visual result is identical for the viewer. The practical difference is that VTT allows you to include metadata (NOTE), cue settings (position, alignment) and formatting (italic, bold) that SRT does not natively support. If you upload to YouTube and don't need styling, both work; if you want positioning or markup, use VTT.

How to Create SRT and VTT Subtitles with AI [2026 Guide]

Publishing a video without subtitles in 2026 means leaving out the 85% of viewers who watch content on mute on the subway, in the office or in bed. And creating subtitles by hand is still one of the most tedious tasks in the editing flow: timing each cue, breaking lines, translating to other languages. AI changes the equation: a well-made SRT or VTT file for a 20-minute video is now produced in under five minutes.

This guide explains how to create SRT and VTT files with AI from any audio or video: technical differences between the two formats, code examples, tools, how to control sync and line breaks, how to translate subtitles into multiple languages while keeping timestamps, and how to load them into YouTube, Vimeo, Premiere and HTML5 players.

85% of social videos are watched muted

12%+ extra retention with subtitles

3-5 min to generate SRT/VTT for 20-min video

Article contents

SRT vs VTT: technical differences
When to use each format
Internal structure of an SRT and a VTT
Create SRT and VTT with VOCAP
Loading VTT in HTML5 with <track>
Uploading SRT/VTT to YouTube and Vimeo
Importing SRT into Premiere and Final Cut
Translating subtitles to other languages
Best practices: length, timing, reading speed
Frequently asked questions

SRT vs VTT: technical differences

Both are plain text files that match phrases to timestamps, but they belong to different generations. SRT (SubRip Text) was born in 2000 as the output format of the SubRip program for extracting subtitles from DVDs. VTT (WebVTT) is the modern W3C standard, designed for HTML5 players and the semantic web.

Feature	SRT	VTT
Standard year	2000 (de facto)	2010 (W3C)
Extension	.srt	.vtt
Required header	No	Yes (WEBVTT)
Decimal separator	Comma (,)	Dot (.)
Native HTML5 (<track>)	Only with conversion	Yes, official
CSS styling	No	Yes, via ::cue
Cue positioning	No	Yes (line, position, align)
NOTE comments	No	Yes
Chapters / regions	No	Yes
YouTube support	Yes	Yes
Premiere / Final Cut support	Yes, native	Conversion recommended
Netflix / Disney+ support	Via IMSC/TTML conversion	Via IMSC/TTML conversion

When to use each format

Practical rule: if the destination is an HTML5 player on your own site or a modern platform, export VTT. If the destination is a video editor (Premiere, Final Cut, DaVinci, CapCut), a social platform (YouTube, Vimeo, Facebook) or a desktop player (VLC, MX Player), export SRT. When in doubt, export SRT: it has wider historical compatibility and almost every tool knows how to convert it.

When to choose SRT

Video editing: Premiere Pro and DaVinci Resolve import it into the timeline as an editable subtitle track
Desktop players: VLC, MPC-HC, MX Player auto-detect it if it shares the filename with the .mp4
Uploading to YouTube and Vimeo: both accept it without conversion
Client delivery: it's the format almost everyone knows how to open

When to choose VTT

Your own HTML5 player: the <track> element of <video> only officially accepts VTT
Courses and LMS platforms: Moodle, Canvas, Coursera or your own video player prefer VTT
Styled subtitles: if you need colors, positioning or italics without burning text into the video
Chapter tracks: VTT supports <track kind="chapters"> for marker-based navigation
Modern web apps: React, Vue or any framework using the browser's native player

Internal structure of an SRT and a VTT

Looking at the file from the inside helps you understand how AI builds the result and how to fix it if something gets out of order.

.srt file example

1
00:00:00,000 --> 00:00:03,200
Welcome to today's podcast.

2
00:00:03,500 --> 00:00:07,800
We're going to talk about how
to create subtitles with AI.

3
00:00:08,000 --> 00:00:11,400
In five minutes you'll have
an SRT file ready to use.

Each cue has three parts: an order number, a time range with the --> arrow and comma as the decimal separator, and the subtitle text (one or two lines maximum). A blank line separates cues.
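To make the three-part structure concrete, here is a minimal Python sketch that reads SRT text into cue objects. The names Cue, parse_srt and to_ms are illustrative, not part of any VOCAP API, and the parser assumes well-formed input like the example above (it skips blocks it cannot read instead of repairing them).

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:03,500 --> 00:00:07,800".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int        # order number
    start_ms: int     # start time in milliseconds
    end_ms: int       # end time in milliseconds
    lines: list[str]  # one or two lines of subtitle text


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> list[Cue]:
    # Drop a leading BOM and normalize Windows line endings.
    text = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        rows = block.splitlines()
        if len(rows) < 3:
            continue  # incomplete block: number, timing and text are all required
        match = TIMING.match(rows[1].strip())
        if not match or not rows[0].strip().isdigit():
            continue  # malformed block
        g = match.groups()
        cues.append(Cue(int(rows[0]), to_ms(*g[:4]), to_ms(*g[4:]), rows[2:]))
    return cues


SAMPLE = """1
00:00:00,000 --> 00:00:03,200
Welcome to today's podcast.

2
00:00:03,500 --> 00:00:07,800
We're going to talk about how
to create subtitles with AI.
"""

for cue in parse_srt(SAMPLE):
    print(cue.index, cue.start_ms, cue.end_ms, cue.lines)
```

Storing times as integer milliseconds keeps later arithmetic (durations, gaps, shifts) free of rounding surprises.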

.vtt file example

WEBVTT

NOTE Subtitles generated by VOCAP

1
00:00:00.000 --> 00:00:03.200
Welcome to today's podcast.

2
00:00:03.500 --> 00:00:07.800 line:90% align:center
We're going to talk about how
to create subtitles with AI.

3
00:00:08.000 --> 00:00:11.400
<v Speaker1>In five minutes you'll have a VTT file ready to use.</v>

VTT requires the WEBVTT header as the first line, uses a dot as the decimal separator and allows extras: NOTE comments, cue positioning (line, align, position) and inline tags like <v Speaker> for speaker diarization.
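Because the two formats differ mainly in the header and the decimal separator, a basic SRT to VTT conversion is short. The following Python sketch is one way to do it; srt_to_vtt is a hypothetical helper name, and it handles plain cues only (it adds no positioning or voice tags).

```python
import re

# An SRT timestamp: HH:MM:SS,mmm
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Add the WEBVTT header and switch timing lines from comma to dot."""
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = []
    for line in body.split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten, so commas in the dialogue stay untouched.
            line = SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    return "WEBVTT\n\n" + "\n".join(out) + "\n"


print(srt_to_vtt("1\n00:00:00,000 --> 00:00:03,200\nHello, world.\n"))
```

The SRT order numbers are left in place: WebVTT treats a line before the timing line as an optional cue identifier.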

Tip: never edit a .srt or .vtt in Word or Pages: they inject rich-text encoding that breaks players. Always use a plain text editor (VS Code, Sublime Text, Notepad++, BBEdit) and save as UTF-8 without BOM.

Create SRT and VTT with VOCAP

VOCAP generates both formats in the same transcription process, with phrase-level timestamps and respecting recommended lengths.

Upload the audio or video

Go to vocap.io/en/transcribe and drag the file. VOCAP accepts MP3, WAV, M4A, MP4, MOV, WebM, OGG, FLAC, AAC and OPUS up to 150 MB. If your video is larger, extract the audio with ffmpeg (ffmpeg -i video.mp4 -vn -acodec libmp3lame audio.mp3) and upload only the audio.

Wait for the transcription with timestamps

VOCAP uses OpenAI Whisper to transcribe and return phrase-level timestamps. For a 20-minute video, transcription takes between 3 and 5 minutes.
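VOCAP's internal pipeline is not documented here, but the general idea of turning timed segments into an SRT can be sketched in Python. The sketch assumes segments shaped like the output of the open-source Whisper package (dictionaries with start and end in seconds plus a text field); segments_to_srt and format_srt_time are illustrative names.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Number each segment and emit it as an SRT cue."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        timing = f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}"
        blocks.append(f"{i}\n{timing}\n{seg['text'].strip()}")
    # A blank line separates cues.
    return "\n\n".join(blocks) + "\n"


# Hand-written sample segments standing in for real transcription output.
segments = [
    {"start": 0.0, "end": 3.2, "text": " Welcome to today's podcast."},
    {"start": 3.5, "end": 7.8, "text": " We're going to talk about subtitles."},
]
print(segments_to_srt(segments))
```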

Export as SRT or VTT

In the results panel, click Export and choose the format. Segmentation is automatically adjusted: up to 42 characters per line, up to 6 seconds per cue, breaks at natural punctuation.
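If you rewrite a cue by hand, a small helper can reapply the same line-length rule. This Python sketch is not VOCAP's segmentation algorithm, only an illustration of one reasonable approach: it looks for a two-line split where both lines fit in 42 characters, prefers a break right after punctuation, and otherwise picks the most balanced split. split_cue_text is a hypothetical name.

```python
import textwrap

MAX_CHARS = 42  # per-line limit quoted in the text


def split_cue_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_chars:
        return [text]
    middle = len(text) / 2
    best = None
    for i, ch in enumerate(text):
        if ch != " ":
            continue
        left, right = text[:i], text[i + 1:]
        if len(left) > max_chars or len(right) > max_chars:
            continue  # this split would overflow one of the lines
        # Breaks after punctuation win; ties go to the most balanced split.
        after_punct = left[-1] in ",.;:!?"
        score = (0 if after_punct else 1, abs(i - middle))
        if best is None or score < best[0]:
            best = (score, [left, right])
    if best:
        return best[1]
    # No two-line split fits: fall back to plain wrapping (may yield more than two lines,
    # which is a sign the cue should be divided into two cues).
    return textwrap.wrap(text, max_chars)


print(split_cue_text("We're going to talk about how to create subtitles with AI."))
```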

Review in a text editor

Open the .srt or .vtt in VS Code or Sublime Text. Confirm that timestamps are synced with the audio (you can load the file in a player that supports external subtitles, such as VLC, to verify) and fix any proper nouns the AI may have transcribed incorrectly.

Load the file into your platform

Move on to the corresponding section: YouTube, Vimeo, Premiere or HTML5. Each one has a different upload flow described in the next sections.

Create Your First SRT/VTT Free

30 minutes of transcription with SRT and VTT export included. No credit card required.


Loading VTT in HTML5 with <track>

The native HTML5 player supports subtitles in a standard way thanks to the <track> element. It only accepts VTT.

<video controls width="720">
  <source src="podcast.mp4" type="video/mp4">

  <track
    label="English"
    kind="subtitles"
    srclang="en"
    src="podcast-en.vtt"
    default>

  <track
    label="Español"
    kind="subtitles"
    srclang="es"
    src="podcast-es.vtt">

  <track
    label="Chapters"
    kind="chapters"
    srclang="en"
    src="podcast-chapters.vtt">
</video>

The default attribute marks the track activated when the video loads. If you serve the HTML from one domain and the VTT from another (e.g. a CDN), remember to configure crossorigin="anonymous" on the <video> and the Access-Control-Allow-Origin headers on the VTT server.
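The podcast-chapters.vtt file referenced in the markup above is an ordinary WebVTT file whose cues carry chapter titles instead of dialogue. As a rough illustration, this Python sketch builds one from a list of start times and titles; chapters_vtt and format_vtt_time are hypothetical helper names, and the chapter data is made up for the example.

```python
def format_vtt_time(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def chapters_vtt(chapters: list[tuple[int, str]], total_ms: int) -> str:
    """Build a chapters file from (start_ms, title) pairs sorted by start time.

    Each chapter ends where the next one starts; the last one ends at total_ms.
    """
    blocks = ["WEBVTT"]
    for i, (start, title) in enumerate(chapters):
        end = chapters[i + 1][0] if i + 1 < len(chapters) else total_ms
        blocks.append(f"{format_vtt_time(start)} --> {format_vtt_time(end)}\n{title}")
    return "\n\n".join(blocks) + "\n"


# Example data for a 20-minute episode.
print(chapters_vtt([(0, "Intro"), (90_000, "Interview")], total_ms=1_200_000))
```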

Common mistake: serving the .vtt with the wrong MIME type. Configure your server to return text/vtt; if it returns text/plain or application/octet-stream, some browsers and players may silently ignore the file. On Nginx: types { text/vtt vtt; }. On Apache: AddType text/vtt .vtt. On Vercel or Netlify, set the Content-Type header in the project's header configuration.
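For local testing, the same mapping can be reproduced with Python's built-in HTTP server. This is a development sketch only, not a production setup; VTTHandler and serve are illustrative names, and the port is arbitrary.

```python
import mimetypes
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Register the mapping for any code that relies on the mimetypes module.
mimetypes.add_type("text/vtt", ".vtt")


class VTTHandler(SimpleHTTPRequestHandler):
    # extensions_map is consulted first when the handler picks a Content-Type.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".vtt": "text/vtt"}


def serve(port: int = 8000) -> None:
    """Serve the current directory on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), VTTHandler).serve_forever()


print(mimetypes.guess_type("podcast-en.vtt")[0])
```

Call serve() from the folder that contains the video page and the .vtt files, then check the Content-Type of the subtitle request in the browser's network tab.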

Styling VTT subtitles with CSS

video::cue {
  background-color: rgba(0, 0, 0, 0.7);
  color: #ffeb3b;
  font-family: "Inter", sans-serif;
  font-size: 1.1em;
  text-shadow: 0 1px 2px #000;
}

video::cue(b) {
  color: #ff5252;
}

Only VTT supports this level of control. If you export SRT and need styling, you'll have to burn it into the video with ffmpeg or tools like HandBrake.
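As a rough sketch of the ffmpeg route, the Python snippet below assembles a burn-in command using ffmpeg's subtitles filter. It assumes an ffmpeg build with libass support and simple filenames without spaces or special characters (filter arguments need extra escaping otherwise); burn_in_command and burn_in are hypothetical helper names.

```python
import subprocess


def burn_in_command(video: str, srt: str, output: str) -> list[str]:
    """Build the ffmpeg argument list that renders the subtitles onto the picture."""
    return [
        "ffmpeg",
        "-i", video,
        "-vf", f"subtitles={srt}",  # draw the cues onto each frame
        "-c:a", "copy",             # keep the audio stream as is
        output,
    ]


def burn_in(video: str, srt: str, output: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the command fails."""
    subprocess.run(burn_in_command(video, srt, output), check=True)


print(" ".join(burn_in_command("podcast.mp4", "podcast-en.srt", "podcast-subbed.mp4")))
```

Burning in re-encodes the video, so keep the original file and the .srt in case you need a different language or style later.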

Uploading SRT/VTT to YouTube and Vimeo

YouTube

Go to YouTube Studio > Content > select your video
Subtitles tab in the left bar
Add language > pick the language of the file
Click Add under "Subtitles" > Upload file
Select "With timing" and upload the .srt or .vtt
YouTube activates them instantly; the CC button on the player shows them

YouTube also generates automatic subtitles with its own system, but their accuracy is roughly 88-92% in English and 75-85% in Spanish. Uploading your own SRT generated by VOCAP gives accuracy above 95% and improves indexing of the video in search.

Vimeo

Open the video in Vimeo and click Settings
Distribution tab > Subtitles section
Click + Add CC/Subtitles file
Upload the .srt or .vtt and select the language
Check "Available" so viewers can choose them

Importing SRT into Premiere and Final Cut

Premiere Pro

Since 2022, Premiere imports .srt files directly:

Window > Text > Captions > Import from SRT
Select the .srt file generated by VOCAP
A new subtitle track appears on the timeline
Each cue can be edited individually; drag the edges to adjust timing
To export the video with burned subtitles, in the Export panel enable "Burn captions into video"
To export as a separate sidecar subtitle file, choose "Create captions file"

Final Cut Pro

Final Cut prefers the iTT (iTunes Timed Text) format but accepts SRT with a workaround:

File > Import > Captions
Select the .srt; FCP converts it to iTT internally
The track appears on the timeline with editable cues
To export a CEA-608 or iTT track, use Share > Master File > Roles

DaVinci Resolve and CapCut

DaVinci Resolve imports SRT since version 18 (Edit > Import > Subtitles). CapCut Desktop and Web also support SRT since 2024 (timeline > Captions > Import file). On CapCut mobile, importing is more limited and it's better to generate subtitles from within the app from the audio.

Translating subtitles to other languages

The classic flow for translating subtitles was to run the SRT through a human translator or paste it cue by cue into DeepL. With AI, the process is reduced to a single step because VOCAP translates while keeping timestamps.

Transcribe the audio in its original language

For example, a podcast in English. VOCAP generates the SRT/VTT in English with timestamps.

Enable translation to the languages you need

Spanish, French, German, Italian, Portuguese or any of the 90+ supported languages. Each language generates an independent SRT/VTT file with the same timestamps.

Upload alternate tracks to YouTube or your player

YouTube lets you add as many languages as you want from Subtitles > Add language. In HTML5, simply add one <track> per language with the corresponding srclang attribute.
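The key property of this flow is that only the text of each cue changes while the timing is copied through. The Python sketch below illustrates that idea with a stand-in translator (a lookup table); it does not call VOCAP or any real translation service, and translate_cues, to_srt and DEMO_ES are names invented for the example.

```python
from typing import Callable

# A cue as (start, end, text); timestamps stay as strings because they are never modified.
Cue = tuple[str, str, str]


def translate_cues(cues: list[Cue], translate: Callable[[str], str]) -> list[Cue]:
    """Translate the text of each cue and keep its start and end untouched."""
    return [(start, end, translate(text)) for start, end, text in cues]


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as SRT, numbering them from 1."""
    blocks = [
        f"{i}\n{start} --> {end}\n{text}"
        for i, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


# Stand-in translator for the demo: a lookup table, not a real translation engine.
DEMO_ES = {"Welcome to today's podcast.": "Bienvenidos al podcast de hoy."}

english = [("00:00:00,000", "00:00:03,200", "Welcome to today's podcast.")]
spanish = translate_cues(english, lambda text: DEMO_ES.get(text, text))
print(to_srt(spanish))
```

In practice a translated sentence can be longer than the original, so it is worth rechecking line length and reading speed on the translated file.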

Why subtitle translation matters: a video with subtitles in 3 languages multiplies the potential reach by 3-5x. YouTube indexes by subtitle language, so a podcast in English with subs in Spanish and Portuguese will appear in searches across the three markets. The marginal cost with AI is cents per language; with a human translator it would be USD 50-100.

Best practices: length, timing, reading speed

The CSA (France), BBC (UK), Netflix Style Guide and DCMP (Described and Captioned Media Program in the US) guidelines agree on nearly everything.

Rule	Recommended value	Why
Characters per line	Max 42	Fits 16:9 screens without crowding
Lines per cue	Max 2	More blocks the image
Duration per cue	1-6 seconds	Comfortable reading time
Reading speed	< 17 characters/second	BBC and Netflix standard
Gap between cues	≥ 80 ms	Avoids flickering between subtitles
Line break	At natural punctuation	Do not split phrases
Speaker identification	Only when confusing	Use "- " or <v> in VTT
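The numeric rules in the table are easy to check automatically after manual edits. The following Python sketch flags cues that break them; the thresholds come from the table, while the Cue class, check_cue and check_gaps are illustrative names. It counts spaces as characters, which is an assumption (style guides differ on that detail).

```python
from dataclasses import dataclass

MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MIN_DURATION_MS = 1000
MAX_DURATION_MS = 6000
MAX_CPS = 17       # reading speed must stay below this value
MIN_GAP_MS = 80


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    lines: list[str]


def check_cue(cue: Cue) -> list[str]:
    """Return the list of rules this cue breaks (empty if it passes)."""
    problems = []
    if len(cue.lines) > MAX_LINES:
        problems.append("too many lines")
    if any(len(line) > MAX_CHARS_PER_LINE for line in cue.lines):
        problems.append("line too long")
    duration = cue.end_ms - cue.start_ms
    if not MIN_DURATION_MS <= duration <= MAX_DURATION_MS:
        problems.append("duration out of range")
    chars = sum(len(line) for line in cue.lines)  # spaces included (assumption)
    if duration > 0 and chars / (duration / 1000) >= MAX_CPS:
        problems.append("reading speed too high")
    return problems


def check_gaps(cues: list[Cue]) -> list[int]:
    """Return positions of cues that start less than MIN_GAP_MS after the previous cue ends."""
    return [
        i for i in range(1, len(cues))
        if cues[i].start_ms - cues[i - 1].end_ms < MIN_GAP_MS
    ]


ok = Cue(3500, 7800, ["We're going to talk about how", "to create subtitles with AI."])
print(check_cue(ok))
```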

Subtitles made by hand

3-5 hours per hour of video
Frequent sync errors
Inconsistency between cues
Translation multiplies cost per language
Guaranteed boredom

Subtitles with VOCAP + AI

3-5 minutes per 20 minutes of video
Perfect phrase-level sync
CSA/BBC rules applied by default
Translation to 90+ languages in the same step
Free time for creative editing

Real use cases

Video podcasters

They turn YouTube and Spotify Video episodes into accessible, better-indexed content.

SRT to upload to YouTube
VTT for the podcast's own site
Translations to Spanish and Portuguese
Improves video SEO

Online courses and trainers

Generate accessible subtitles for their academies on Moodle, Teachable or their own site.

VTT for HTML5 player
Chapters in a separate VTT
WCAG 2.2 compliance
Students across languages

Reels and Shorts creators

Burned-in or sidecar subtitles for Instagram, TikTok and YouTube Shorts.

SRT as source
Burn-in via ffmpeg or CapCut
Per-platform styling
Boosts retention by 80%

Companies and corporate video

Onboarding, internal training, multilingual product videos.

SRT for Premiere
Translation to Spanish/French
Intranet accessibility
International leverage

Journalists and documentaries

Recorded interviews with exact subtitles for broadcast.

SRT compatible with broadcast editors
Speaker markers in VTT
Quotes with exact timestamps
Versioning to multiple languages

Streamers and gaming editors

Twitch and YouTube Gaming VODs with automatic subtitles.

SRT from the long VOD
Translation for global audience
Better YouTube SEO
Community accessibility

Generate Your SRT and VTT Subtitles in Minutes

Try VOCAP free: 30 minutes of transcription with SRT and VTT export included. No credit card. Works on Mac, Windows, Linux, iPhone and Android from Safari or Chrome.


Frequently asked questions
