How to Create SRT and VTT Subtitles with AI in 2026

May 22, 2026 By VOCAP 12 min read

Publishing a video without subtitles in 2026 means losing a large share of your audience: around 85% of social videos are watched on mute, on the subway, in the office or in bed. Creating subtitles by hand is still one of the most tedious tasks in the editing flow: timing each cue, breaking lines, translating into other languages. AI changes the equation: a well-made SRT or VTT file for a 20-minute video is now ready in under five minutes.

This guide explains how to create SRT and VTT files with AI from any audio or video: technical differences between the two formats, code examples, tools, how to control sync and line breaks, how to translate subtitles into multiple languages while keeping timestamps, and how to load them into YouTube, Vimeo, Premiere and HTML5 players.

85% of social videos are watched muted
12%+ extra retention with subtitles
3-5 min to generate SRT/VTT for 20-min video

SRT vs VTT: technical differences

Both are plain text files that match phrases to timestamps, but they belong to different generations. SRT (SubRip Text) was born in 2000 as the output format of the SubRip program for extracting subtitles from DVDs. VTT (WebVTT) is the modern W3C standard, designed for HTML5 players and the semantic web.

Feature | SRT | VTT
Standard year | 2000 (de facto) | 2010 (W3C)
Extension | .srt | .vtt
Required header | No | Yes (WEBVTT)
Decimal separator | Comma (,) | Dot (.)
Native HTML5 (<track>) | Only with conversion | Yes, official
CSS styling | No | Yes, via ::cue
Cue positioning | No | Yes (line, position, align)
NOTE comments | No | Yes
Chapters / regions | No | Yes
YouTube support | Yes | Yes
Premiere / Final Cut support | Yes, native | Conversion recommended
Netflix / Disney+ support | Via IMSC/TTML conversion | Via IMSC/TTML conversion

When to use each format

Practical rule: if the destination is an HTML5 player on your own site or a modern platform, export VTT. If the destination is a video editor (Premiere, Final Cut, DaVinci, CapCut), a social platform (YouTube, Vimeo, Facebook) or a desktop player (VLC, MX Player), export SRT. When in doubt, export SRT: it has wider historical compatibility and almost every tool knows how to convert it.

Internal structure of an SRT and a VTT

Looking at the file from the inside helps you understand how AI builds the result and how to fix it if something gets out of order.

.srt file example

1
00:00:00,000 --> 00:00:03,200
Welcome to today's podcast.

2
00:00:03,500 --> 00:00:07,800
We're going to talk about how
to create subtitles with AI.

3
00:00:08,000 --> 00:00:11,400
In five minutes you'll have
an SRT file ready to use.

Each cue has three parts: an order number, a time range with the --> arrow and comma as the decimal separator, and the subtitle text (one or two lines maximum). A blank line separates cues.
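To make the three-part structure concrete, here is a minimal Python sketch that reads SRT text into cue objects. The names Cue, to_ms and parse_srt are illustrative, not part of VOCAP or any library, and the parser assumes well-formed cues with HH:MM:SS,mmm timestamps.

```python
import re
from dataclasses import dataclass

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (SRT puts a comma before the milliseconds).
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    lines: list


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> list:
    cues = []
    # A blank line separates cues.
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.strip().splitlines()
        if len(rows) < 3 or not rows[0].strip().isdigit():
            continue  # incomplete or malformed cue: skip it
        match = TIMING.match(rows[1].strip())
        if not match:
            continue
        fields = match.groups()
        cues.append(
            Cue(int(rows[0]), to_ms(*fields[:4]), to_ms(*fields[4:]), rows[2:])
        )
    return cues


sample = """1
00:00:00,000 --> 00:00:03,200
Welcome to today's podcast.

2
00:00:03,500 --> 00:00:07,800
We're going to talk about how
to create subtitles with AI.
"""

cues = parse_srt(sample)
for cue in cues:
    print(cue.index, cue.start_ms, cue.end_ms, " / ".join(cue.lines))
```

Working in milliseconds keeps later operations (shifting, validating durations) down to simple integer arithmetic.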

.vtt file example

WEBVTT

NOTE Subtitles generated by VOCAP

1
00:00:00.000 --> 00:00:03.200
Welcome to today's podcast.

2
00:00:03.500 --> 00:00:07.800 line:90% align:center
We're going to talk about how
to create subtitles with AI.

3
00:00:08.000 --> 00:00:11.400
<v Speaker1>In five minutes you'll have a VTT file ready to use.</v>

VTT requires the WEBVTT header as the first line, uses a dot as the decimal separator and allows extras: NOTE comments, cue positioning (line, align, position) and inline tags like <v Speaker> for speaker diarization.
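Because the two formats differ mainly in the header and the decimal separator, a basic SRT to VTT conversion fits in a few lines. The sketch below is a simplified illustration (srt_to_vtt is a hypothetical helper name): it adds the WEBVTT header and swaps the comma for a dot on timing lines only, and it does not add positioning or <v> tags.

```python
import re

# An SRT timestamp: HH:MM:SS,mmm
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to a minimal WebVTT document."""
    # Drop a leading BOM if present and normalise Windows line endings.
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = []
    for line in body.split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten, so commas in the dialogue survive.
            line = SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    # The SRT sequence numbers stay in place; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + "\n".join(out) + "\n"


srt = "1\n00:00:00,000 --> 00:00:03,200\nWelcome, everyone.\n"
vtt = srt_to_vtt(srt)
print(vtt)
```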

Tip: never edit a .srt or .vtt in Word or Pages: they inject rich-text encoding that breaks players. Always use a plain text editor (VS Code, Sublime Text, Notepad++, BBEdit) and save as UTF-8 without BOM.

Create SRT and VTT with VOCAP

VOCAP generates both formats in the same transcription process, with phrase-level timestamps and respecting recommended lengths.

1. Upload the audio or video

Go to vocap.io/en/transcribe and drag the file. VOCAP accepts MP3, WAV, M4A, MP4, MOV, WebM, OGG, FLAC, AAC and OPUS up to 150 MB. If your video is larger, extract the audio with ffmpeg (ffmpeg -i video.mp4 -vn -acodec libmp3lame audio.mp3) and upload only the audio.
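If you prepare files in a script, the size check and the ffmpeg call can be wrapped together. This is only a sketch: the helper names are made up for illustration, the limit is assumed to mean 150 × 1024 × 1024 bytes, and ffmpeg must be installed and on the PATH.

```python
import subprocess
from pathlib import Path

# Assumption: the 150 MB upload limit is interpreted as 150 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 150 * 1024 * 1024


def extract_audio_command(video: Path, audio: Path) -> list:
    """Build the same ffmpeg call as the one-liner above: drop video, encode MP3."""
    return ["ffmpeg", "-i", str(video), "-vn", "-acodec", "libmp3lame", str(audio)]


def prepare_upload(video: Path) -> Path:
    """Return the file to upload: the video itself, or an extracted MP3 if too large."""
    if video.stat().st_size <= MAX_UPLOAD_BYTES:
        return video
    audio = video.with_suffix(".mp3")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(extract_audio_command(video, audio), check=True)
    return audio


cmd = extract_audio_command(Path("video.mp4"), Path("audio.mp3"))
print(" ".join(cmd))
```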

2. Wait for the transcription with timestamps

VOCAP uses OpenAI Whisper to transcribe and return phrase-level timestamps. For a 20-minute video, transcription takes between 3 and 5 minutes.

3. Export as SRT or VTT

In the results panel, click Export and choose the format. Segmentation is automatically adjusted: up to 42 characters per line, up to 6 seconds per cue, breaks at natural punctuation.
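To get a feel for what the line-length limits mean in practice, here is a rough Python sketch that wraps a sentence into lines of at most 42 characters and groups them two per cue. It is not VOCAP's segmentation algorithm: it ignores timing and does not look for natural punctuation, and the function name is hypothetical.

```python
import textwrap

MAX_CHARS = 42  # characters per line
MAX_LINES = 2   # lines per cue


def split_into_cue_texts(sentence: str) -> list:
    """Wrap a sentence to MAX_CHARS and group the lines MAX_LINES at a time."""
    lines = textwrap.wrap(sentence, width=MAX_CHARS)
    return [
        "\n".join(lines[i:i + MAX_LINES])
        for i in range(0, len(lines), MAX_LINES)
    ]


text = (
    "We're going to talk about how to create subtitles with AI, "
    "and in five minutes you'll have an SRT file ready to use."
)
chunks = split_into_cue_texts(text)
for chunk in chunks:
    print(chunk)
    print()
```

A production segmenter would also weigh cue duration and prefer breaks after commas or full stops, which plain word wrapping cannot do.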

4. Review in a text editor

Open the .srt or .vtt in VS Code or Sublime Text. Confirm that timestamps are synced with the audio (the quickest check is to load the file into a player that supports external subtitles, such as VLC) and fix any proper nouns the AI may have transcribed incorrectly.
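If the review shows that every cue is early or late by the same amount (for example because an intro was trimmed from the video), you can shift all timestamps at once instead of editing them by hand. The following is a minimal sketch with a hypothetical helper name; it assumes timestamps written as HH:MM:SS with a comma or dot before the milliseconds, so VTT files that omit the hours field are left unchanged.

```python
import re

# HH:MM:SS followed by "," (SRT) or "." (VTT) and milliseconds.
TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2})([,.])(\d{3})")


def shift_timestamps(text: str, offset_ms: int) -> str:
    """Shift every timestamp on timing lines by offset_ms (negative = earlier)."""

    def shift(match):
        h, m, s, sep, ms = match.groups()
        total = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms) + offset_ms
        total = max(total, 0)  # never produce a negative timestamp
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

    out = []
    for line in text.splitlines():
        if "-->" in line:  # leave cue numbers and dialogue untouched
            line = TIME.sub(shift, line)
        out.append(line)
    return "\n".join(out) + "\n"


original = "1\n00:00:03,500 --> 00:00:07,800\nHello\n"
later = shift_timestamps(original, 750)
earlier = shift_timestamps(original, -4000)
print(later)
```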

5. Load the file into your platform

Move on to the corresponding section: YouTube, Vimeo, Premiere or HTML5. Each one has a different upload flow described in the next sections.

Create Your First SRT/VTT Free

30 minutes of transcription with SRT and VTT export included. No credit card required.

Try VOCAP Free

Loading VTT in HTML5 with <track>

The native HTML5 player supports subtitles in a standard way thanks to the <track> element. It only accepts VTT.

<video controls width="720">
  <source src="podcast.mp4" type="video/mp4">

  <track
    label="English"
    kind="subtitles"
    srclang="en"
    src="podcast-en.vtt"
    default>

  <track
    label="Español"
    kind="subtitles"
    srclang="es"
    src="podcast-es.vtt">

  <track
    label="Chapters"
    kind="chapters"
    srclang="en"
    src="podcast-chapters.vtt">
</video>

The default attribute marks the track activated when the video loads. If you serve the HTML from one domain and the VTT from another (e.g. a CDN), remember to configure crossorigin="anonymous" on the <video> and the Access-Control-Allow-Origin headers on the VTT server.

Common mistake: serving the .vtt with the wrong MIME type. Configure your server to return text/vtt; if it returns text/plain or application/octet-stream, some browsers may refuse to load the track without showing any visible error. On Nginx: types { text/vtt vtt; }. On Apache: AddType text/vtt .vtt. On Vercel or Netlify, custom headers are set in the project configuration (vercel.json, or netlify.toml / _headers).
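For local testing, the same two requirements (text/vtt as content type and a permissive CORS header) can be reproduced with Python's built-in http.server. This is a development-only sketch, not a production setup; VTTHandler and serve are names chosen for this example, and the wildcard origin is an assumption you would normally tighten.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class VTTHandler(SimpleHTTPRequestHandler):
    # Map .vtt explicitly instead of relying on the system MIME database.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".vtt": "text/vtt",
    }

    def end_headers(self):
        # Allow pages on other origins to load the track (pairs with
        # crossorigin="anonymous" on the <video> element).
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def serve(directory: str = ".", port: int = 8000) -> None:
    """Serve `directory` on localhost until interrupted."""
    handler = partial(VTTHandler, directory=directory)
    with ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()


# Usage (blocks until Ctrl+C):
# serve("public", 8000)
```

With the server running, the response headers for any .vtt file in the served folder should show Content-Type: text/vtt in the browser's network tab.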

Styling VTT subtitles with CSS

video::cue {
  background-color: rgba(0, 0, 0, 0.7);
  color: #ffeb3b;
  font-family: "Inter", sans-serif;
  font-size: 1.1em;
  text-shadow: 0 1px 2px #000;
}

video::cue(b) {
  color: #ff5252;
}

Only VTT supports this level of control. If you export SRT and need styling, you'll have to burn it into the video with ffmpeg or tools like HandBrake.
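Burning in with ffmpeg relies on its subtitles video filter, which needs an ffmpeg build with libass. The sketch below only assembles the command; the function and file names are placeholders, and it assumes simple paths without colons, quotes or backslashes, since those need extra escaping inside a filter expression.

```python
import subprocess


def burn_in_command(video: str, srt: str, output: str) -> list:
    """Build an ffmpeg call that renders the subtitles onto the picture."""
    return [
        "ffmpeg",
        "-i", video,
        "-vf", f"subtitles={srt}",  # re-encodes the video with the text drawn in
        "-c:a", "copy",             # audio is copied without re-encoding
        output,
    ]


cmd = burn_in_command("reel.mp4", "reel.srt", "reel-subbed.mp4")
print(" ".join(cmd))

# To actually run it (requires ffmpeg on the PATH):
# subprocess.run(cmd, check=True)
```

Burned-in subtitles cannot be switched off or translated afterwards, so keep the original .srt alongside the rendered video.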

Uploading SRT/VTT to YouTube and Vimeo

YouTube

  1. Go to YouTube Studio > Content > select your video
  2. Subtitles tab in the left bar
  3. Add language > pick the language of the file
  4. Click Add under "Subtitles" > Upload file
  5. Select "With timing" and upload the .srt or .vtt
  6. YouTube activates them instantly; the CC button on the player shows them

YouTube also generates automatic subtitles with its own system, but their accuracy is around 88-92% in English and 75-85% in Spanish. Uploading your own SRT generated by VOCAP gives accuracy above 95% and improves indexing of the video in search.

Vimeo

  1. Open the video in Vimeo and click Settings
  2. Distribution tab > Subtitles section
  3. Click + Add CC/Subtitles file
  4. Upload the .srt or .vtt and select the language
  5. Check "Available" so viewers can choose them

Importing SRT into Premiere and Final Cut

Premiere Pro

Since 2022, Premiere imports .srt files directly:

  1. Window > Text > Captions > Import from SRT
  2. Select the .srt file generated by VOCAP
  3. A new subtitle track appears on the timeline
  4. Each cue can be edited individually; drag the edges to adjust timing
  5. To export the video with burned subtitles, in the Export panel enable "Burn captions into video"
  6. To export as a separate sidecar subtitle file, choose "Create captions file"

Final Cut Pro

Final Cut prefers the iTT (iTunes Timed Text) format but accepts SRT with a workaround:

  1. File > Import > Captions
  2. Select the .srt; FCP converts it to iTT internally
  3. The track appears on the timeline with editable cues
  4. To export a CEA-608 or iTT track, use Share > Master File > Roles

DaVinci Resolve and CapCut

DaVinci Resolve imports SRT since version 18 (Edit > Import > Subtitles). CapCut Desktop and Web also support SRT since 2024 (timeline > Captions > Import file). On CapCut mobile, importing is more limited and it's better to generate subtitles from within the app from the audio.

Translating subtitles to other languages

The classic flow for translating subtitles was to run the SRT through a human translator or paste it cue by cue into DeepL. With AI, the process is reduced to a single step because VOCAP translates while keeping timestamps.

1. Transcribe the audio in its original language

For example, a podcast in English. VOCAP generates the SRT/VTT in English with timestamps.

2. Enable translation to the languages you need

Spanish, French, German, Italian, Portuguese or any of the 90+ supported languages. Each language generates an independent SRT/VTT file with the same timestamps.

3. Upload alternate tracks to YouTube or your player

YouTube lets you add as many languages as you want from Subtitles > Add language. In HTML5, simply add one <track> per language with the corresponding srclang attribute.
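The idea of translating while keeping timestamps can be illustrated with a short Python sketch: the cue number and timing row are copied verbatim and only the text is replaced. This is not VOCAP's implementation. The translate argument stands in for whatever translation service you use, and the glossary-based fake_translate below exists only so the example runs offline.

```python
import re
from typing import Callable


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Return a new SRT where only the cue text has been translated."""
    out = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = block.splitlines()
        if len(rows) < 3 or "-->" not in rows[1]:
            out.append(block)  # unexpected block: leave it untouched
            continue
        # Translate the cue text as one unit so the sentence keeps its context.
        translated = translate(" ".join(rows[2:]))
        out.append("\n".join([rows[0], rows[1], translated]))
    return "\n\n".join(out) + "\n"


# Hypothetical stand-in for a real translation call.
GLOSSARY = {"Welcome to today's podcast.": "Bienvenidos al podcast de hoy."}


def fake_translate(text: str) -> str:
    return GLOSSARY.get(text, text)


english = "1\n00:00:00,000 --> 00:00:03,200\nWelcome to today's podcast.\n"
spanish = translate_srt(english, fake_translate)
print(spanish)
```

Translated text is often longer than the original, so in a real pipeline each cue would be re-wrapped to the line-length limits after translation.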

Why subtitle translation matters: a video with subtitles in 3 languages can multiply the potential reach by 3-5x. YouTube indexes by subtitle language, so a podcast in English with subs in Spanish and Portuguese will appear in searches across the three markets. The marginal cost with AI is cents per language; with a human translator it would be USD 50-100 per language.

Best practices: length, timing, reading speed

The guidelines from the CSA (France), the BBC (UK), the Netflix Style Guide and the DCMP (Described and Captioned Media Program, US) agree on nearly everything.

Rule | Recommended value | Why
Characters per line | Max 42 | Fits 16:9 screens without crowding
Lines per cue | Max 2 | More blocks the image
Duration per cue | 1-6 seconds | Comfortable reading time
Reading speed | < 17 characters/second | BBC and Netflix standard
Gap between cues | ≥ 80 ms | Avoids flickering between subtitles
Line break | At natural punctuation | Do not split phrases
Speaker identification | Only when confusing | Use "- " or <v> in VTT
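The numeric rules in the table are easy to check automatically before publishing. Below is a small Python sketch of such a checker. The function name, the problem codes and the decision to count spaces toward reading speed are assumptions made for this example, and cues are passed as (start_ms, end_ms, text) tuples sorted by start time.

```python
MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MIN_DURATION_MS = 1000
MAX_DURATION_MS = 6000
MAX_CHARS_PER_SECOND = 17
MIN_GAP_MS = 80


def check_cues(cues):
    """Return a list of (cue_number, problem_code) for every rule violation."""
    problems = []
    previous_end = None
    for number, (start, end, text) in enumerate(cues, start=1):
        lines = text.split("\n")
        duration = end - start
        if len(lines) > MAX_LINES:
            problems.append((number, "too_many_lines"))
        if any(len(line) > MAX_CHARS_PER_LINE for line in lines):
            problems.append((number, "line_too_long"))
        if not MIN_DURATION_MS <= duration <= MAX_DURATION_MS:
            problems.append((number, "duration"))
        # Assumption: spaces count as characters, the line break does not.
        chars = sum(len(line) for line in lines)
        if duration > 0 and chars / (duration / 1000) >= MAX_CHARS_PER_SECOND:
            problems.append((number, "reading_speed"))
        if previous_end is not None and start - previous_end < MIN_GAP_MS:
            problems.append((number, "gap"))
        previous_end = end
    return problems


good = [
    (0, 3200, "Welcome to today's podcast."),
    (3500, 7800, "We're going to talk about how\nto create subtitles with AI."),
]
bad = (7850, 8350, "x" * 50)  # too long, too short on screen, too close to cue 2

for number, code in check_cues(good + [bad]):
    print(f"cue {number}: {code}")
```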

Subtitles made by hand

  • 3-5 hours per hour of video
  • Frequent sync errors
  • Inconsistency between cues
  • Translation multiplies cost per language
  • Guaranteed boredom

Subtitles with VOCAP + AI

  • 3-5 minutes per 20-minute video
  • Phrase-level sync from the transcription timestamps
  • CSA/BBC rules applied by default
  • Translation to 90+ languages in the same step
  • Free time for creative editing

Real use cases

Video podcasters

They turn YouTube and Spotify Video episodes into accessible, better-indexed content.

  • SRT to upload to YouTube
  • VTT for the podcast's own site
  • Translations to Spanish and Portuguese
  • Improves video SEO

Online courses and trainers

Generate accessible subtitles for their academies on Moodle, Teachable or their own site.

  • VTT for HTML5 player
  • Chapters in a separate VTT
  • WCAG 2.2 compliance
  • Students across languages

Reels and Shorts creators

Burned-in or sidecar subtitles for Instagram, TikTok and YouTube Shorts.

  • SRT as source
  • Burn-in via ffmpeg or CapCut
  • Per-platform styling
  • Boosts retention (12%+ with subtitles)

Companies and corporate video

Onboarding, internal training, multilingual product videos.

  • SRT for Premiere
  • Translation to Spanish/French
  • Intranet accessibility
  • International leverage

Journalists and documentaries

Recorded interviews with exact subtitles for broadcast.

  • SRT compatible with broadcast editors
  • Speaker markers in VTT
  • Quotes with exact timestamps
  • Versioning to multiple languages

Streamers and gaming editors

Twitch and YouTube Gaming VODs with automatic subtitles.

  • SRT from the long VOD
  • Translation for global audience
  • Better YouTube SEO
  • Community accessibility

Generate Your SRT and VTT Subtitles in Minutes

Try VOCAP free: 30 minutes of transcription with SRT and VTT export included. No credit card. Works on Mac, Windows, Linux, iPhone and Android from Safari or Chrome.

Start Free

Frequently asked questions

What is the difference between SRT and VTT?

SRT (SubRip Text) is the older and most widely compatible format: it is supported by YouTube, Vimeo, Premiere, Final Cut, VLC and pretty much every player (streaming services such as Netflix take it only after conversion to IMSC/TTML). It uses a comma as the decimal separator. VTT (WebVTT) is the modern web standard: HTML5 players use it via the <track> element, and it supports CSS styles, on-screen text positioning and comments. It uses a dot as the decimal separator. For the modern web, use VTT; for edited video or platform uploads, use SRT.

Can I create an SRT directly from audio without video?

Yes. SRT and VTT are just text with timestamps; they don't contain video. VOCAP generates the file from any MP3, WAV, M4A or OGG. The audio is transcribed with Whisper, automatically segmented into cues of up to 6 seconds and exported as .srt or .vtt, ready to sync with the video you'll create afterwards or to use as a base for podcast subtitles.

How does automatic translation of an SRT to another language work?

VOCAP transcribes the audio in its original language and, in the same process, can translate the result to English, French, German, Italian, Portuguese or any of 90+ languages while keeping timestamps. Translation is done by Claude after transcription, sentence by sentence, so each cue keeps its time position. The result is one SRT/VTT file per language: the original plus each translation, all with the same timestamps.

How long should each subtitle line be?

CSA, BBC and Netflix style guides agree: maximum 42 characters per line, maximum 2 lines per cue, duration between 1 and 6 seconds, and reading speed below 17 characters per second. VOCAP segments automatically respecting these limits.

Does YouTube display SRT and VTT files differently?

YouTube ingests both formats but internally converts them to its own JSON3 format. The visual result is identical for the viewer. The practical difference is that VTT allows you to include metadata (NOTE), cue settings (position, alignment) and formatting (italic, bold) that SRT does not natively support.
