Since 28 June 2025, the European Accessibility Act (EAA) has been mandatory across the EU. If your company sells digital services — e-commerce, banking, e-learning, streaming, transport, ebooks — your videos must carry accessible subtitles. It is not a recommendation: fines reach 600,000 EUR in Spain (Law 11/2023), and similar penalties exist across the rest of the EU.
But "accessible subtitles" is not the same as "subtitles". WCAG 2.2 requires captions with speaker identification, sound descriptions and minimum contrast. YouTube auto-captioning does not comply. This guide walks you through the full workflow with AI + human review.
The Legal Framework: EAA + WCAG + EN 301 549
Three regulations overlap. Know each one — auditors check all three:
- European Accessibility Act (EU Directive 2019/882): mandatory since 28 June 2025 for B2C digital services in the EU. Transposed in Spain via Law 11/2023 and Royal Decree 193/2023.
- WCAG 2.2 (W3C): the technical standard. The relevant criteria for video are 1.2.2 (Captions prerecorded), 1.2.4 (Captions live) and 1.2.5 (Audio description). Level AA is the legal minimum.
- EN 301 549 v3.2.1: the harmonised European standard that references WCAG (currently 2.1 level AA) and applies to ICT products and services. This is the standard public auditors cite.
Who is obliged: companies with more than 10 employees or more than 2M EUR annual turnover selling e-commerce, banking, streaming, e-learning, ebooks, transport tickets or telecoms in the EU. Microenterprises are exempt in services but NOT in products.
Subtitles vs Closed Captions (SDH)
This is the most common confusion and the one that fails audits:
BASIC SUBTITLES:
- Dialogue or translation only
- No speaker identification
- No sound descriptions
- DO NOT comply with WCAG 1.2.2
- Designed for foreign-language audiences
CLOSED CAPTIONS / SDH:
- Dialogue + relevant sounds
- Speaker identified (name or color)
- [music], [laughter], [phone]
- Comply with WCAG 1.2.2 level A
- For deaf and hard-of-hearing viewers
- Closed captions (toggleable)
To comply with WCAG level AA you always need SDH closed captions, never plain subtitles. The cost difference is minimal if you use AI auto-subtitles and enrich them afterwards.
AI Workflow Step by Step
1. Transcribe with timestamps (5 min): Upload the video to VOCAP and get word-level timestamps. This is the base — no precise timing, no decent subtitle. See how to transcribe with timestamps.
2. Segment into blocks (10 min): Each block lasts 1-3 seconds and has max 2 lines of 37-42 characters. Blocks must respect natural pauses and syntactic units.
3. Identify speakers (5 min): If there are several speakers, add a name prefix or a color change per speaker. Automatic diarization gets you 80% there.
4. Add non-verbal descriptions (5 min): Review the audio and mark sounds relevant to the narrative: [suspense music], [door slams], [nervous laugh], [silence]. Skip irrelevant ambient sounds.
5. Export SRT or WebVTT (1 min): SRT for YouTube/Vimeo; WebVTT for custom HTML5 players. Both are plain editable text.
6. Validate and publish (5 min): Validate with WAVE, axe DevTools or your CMS native accessibility evaluator. Upload as closed track, never burned.
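The segmentation step above (step 2) can be sketched in Python. This is a minimal sketch, assuming word-level timestamps arrive as `(word, start, end)` tuples — the exact VOCAP export format may differ:

```python
def segment_words(words, max_chars=42, max_dur=3.0):
    """Group (word, start, end) tuples into subtitle blocks.

    Each block holds at most `max_chars` characters (one line
    per block for simplicity) and lasts at most `max_dur` seconds,
    matching the 1-3 s / 37-42 chars-per-line rules.
    """
    blocks, current = [], []
    for word, start, end in words:
        if current:
            text = " ".join(w for w, _, _ in current)
            too_long = len(text) + 1 + len(word) > max_chars
            too_slow = end - current[0][1] > max_dur
            if too_long or too_slow:
                blocks.append((current[0][1], current[-1][2], text))
                current = []
        current.append((word, start, end))
    if current:
        blocks.append((current[0][1], current[-1][2],
                       " ".join(w for w, _, _ in current)))
    return blocks

words = [("Hello", 0.0, 0.4), ("everyone,", 0.5, 1.0),
         ("welcome", 1.1, 1.5), ("to", 1.6, 1.7),
         ("the", 1.8, 1.9), ("accessibility", 2.0, 2.8),
         ("workshop", 2.9, 3.4)]
for start, end, text in segment_words(words, max_chars=30):
    print(f"{start:.1f} --> {end:.1f}  {text}")
```

A real segmenter would also honor natural pauses and syntactic units (step 2's second rule), for example by flushing a block early whenever the gap to the next word exceeds a pause threshold.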
Technical Format: SRT, VTT, Timing
The technical rules your subtitle files must meet to pass an audit:
Technical specification
TIMING:
- Block duration: 1-3 s (max 6 s)
- Reading speed: 160-180 wpm
- Gap between blocks: min 80 ms
- Sync: max 100 ms offset

TEXT FORMAT:
- Max 2 lines per block
- Max 37-42 chars per line
- Do not split words across lines
- Full punctuation
- Uppercase only for essential emphasis

VISUAL:
- Min contrast 4.5:1 vs background
- Translucent dark background recommended
- Sans-serif font, min 4% of video height
- Do not overlap critical on-screen text
- Position: bottom third, centered
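The timing and text rules above are mechanical enough to check in code. A minimal sketch of a per-block linter (thresholds taken from the spec; adjust to taste):

```python
def check_block(start, end, text, prev_end=None):
    """Return a list of rule violations for one subtitle block.

    Checks duration (1-6 s), line count (max 2), line length
    (max 42 chars), reading speed (max 180 wpm) and the 80 ms
    minimum gap to the previous block.
    """
    issues = []
    dur = end - start
    if not 1.0 <= dur <= 6.0:
        issues.append(f"duration {dur:.2f}s outside 1-6s")
    lines = text.split("\n")
    if len(lines) > 2:
        issues.append(f"{len(lines)} lines (max 2)")
    for ln in lines:
        if len(ln) > 42:
            issues.append(f"line too long ({len(ln)} chars, max 42)")
    wpm = len(text.split()) / dur * 60
    if wpm > 180:
        issues.append(f"reading speed {wpm:.0f} wpm (max 180)")
    if prev_end is not None and start - prev_end < 0.08:
        issues.append("gap under 80 ms")
    return issues

print(check_block(0.0, 1.5,
                  "This block has far too many words crammed in a hurry here"))
```

Running this over every cue before export catches most timing failures before an auditor does.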
The WebVTT format is more powerful than SRT — it supports positioning, CSS styles and metadata. If your player supports it, prefer it. For step-by-step subtitle creation see the AI subtitles for video guide.
Non-Verbal Descriptions and Speakers
This is the block that fails most audits. The rule is: everything a hearing person perceives without seeing must be readable by a deaf person.
Music
[suspense music], [upbeat music], [sad music]. Never just "[music]" — the emotion matters for the narrative.
Narrative sounds
[door slams], [phone rings], [footsteps approaching]. Only those affecting plot or context.
Tone of voice
(whispering), (shouting), (sarcastically). Mark when tone changes meaning of dialogue.
Speakers
"MARIA: Hello" or per-speaker color. Essential when 2+ voices are off-screen.
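Merging diarization output into the captions can be sketched as follows. This assumes diarization yields `(speaker, start, end)` segments — real diarization APIs vary, and names usually need a human pass to replace generic "Speaker 1" labels:

```python
def label_speakers(blocks, segments):
    """Prefix each subtitle block with its speaker's name.

    blocks: (start, end, text) tuples; segments: (speaker, start, end)
    tuples from diarization. Matches each block to the segment that
    contains its start time.
    """
    labeled = []
    for start, end, text in blocks:
        speaker = next((name.upper() for name, s, e in segments
                        if s <= start < e), None)
        labeled.append((start, end,
                        f"{speaker}: {text}" if speaker else text))
    return labeled

blocks = [(0.0, 2.0, "Hello, how are you?"), (2.2, 4.0, "Fine, thanks.")]
segments = [("Maria", 0.0, 2.1), ("Juan", 2.1, 4.5)]
print(label_speakers(blocks, segments))
```

In practice you would only emit the prefix when the speaker changes or is off-screen, to avoid cluttering every cue.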
Start with accurate transcription: VOCAP gives you word-level timestamps and diarization.
WCAG 2.2 AA Compliance Checklist
Per-video checklist before publishing
WCAG 1.2.2 - Captions (prerecorded):
[ ] Captions available (no raw auto-captions)
[ ] Sync < 100 ms offset
[ ] Speaker identification
[ ] Relevant sounds described
[ ] Closed (toggleable)

WCAG 1.4.3 - Contrast:
[ ] Min 4.5:1 ratio vs background
[ ] Opaque or semi-transparent dark background

WCAG 1.4.4 - Resize text:
[ ] Subtitles scale to 200% without breaking

WCAG 1.2.5 - Audio description (AA):
[ ] Audio description if visual info is critical
[ ] Separate AD track or AD version

FORMAT:
[ ] SRT or WebVTT, never burned in
[ ] Language declared (lang="en")
[ ] Speed < 180 wpm
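The 1.4.3 contrast item can be checked programmatically. A sketch using the WCAG relative-luminance and contrast-ratio formulas (colors as 0-255 sRGB tuples):

```python
def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color (0-255 per channel)."""
    def channel(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colors, from 1:1 up to 21:1."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)),
                    reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# White text on a near-black background comfortably clears 4.5:1
print(round(contrast_ratio((255, 255, 255), (20, 20, 20)), 2))
```

Tools like WAVE and axe DevTools run the same computation; this is only useful when you control the caption styling yourself (e.g. a custom WebVTT `::cue` stylesheet).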
For full video validation (not just subtitles) use the general content accessibility guide.
Mistakes That Put You Outside the Law
AVOID:
- YouTube auto-captions without review
- Burned-in subtitles (open captions)
- Dialogue only, no non-verbal descriptions
- No speaker ID with 2+ voices
- Single line of 80 characters
- No contrast with light backgrounds
- Timing off by more than 100 ms
- Language not declared in the file
- "Accessible" used for marketing without SDH
DO:
- AI + human review every time
- Closed-caption SRT or WebVTT
- SDH: speakers + relevant sounds
- Max 2 lines of 37-42 characters
- 4.5:1 contrast with the background
- Word-level sync accuracy
- Language declared in the VTT file
- Max 160-180 wpm
- Quarterly internal audit + change log
Frequently Asked Questions
What is the difference between subtitles and closed captions (SDH)?
Basic subtitles only transcribe or translate dialogue. Closed captions or SDH also include descriptions of relevant sounds [suspense music], speaker identification and emotions [nervous laugh]. WCAG 1.2.2 requires captions, not basic subtitles.
Does the European Accessibility Act force me to add subtitles?
If your company has more than 10 employees or more than 2M EUR turnover and sells B2C digital services in the EU (e-commerce, banking, streaming, e-learning, transport, ebooks), since 28 June 2025 you must comply with EN 301 549, which references WCAG 2.1 level AA. In Spain fines reach 600,000 EUR (Law 11/2023). Microenterprises in services are exempt, but NOT in digital products.
Is YouTube auto-captioning enough?
No. WCAG 1.2.2 requires captions accurate enough to understand the content. YouTube auto-captions have errors in proper nouns, technical terms, mixed languages and they do not mark sounds or speakers. If audited, they fail. AI transcription + human review + SDH enrichment is the right path.
Which format: SRT, VTT or burned-in subtitles?
SRT for broad compatibility (YouTube, Vimeo, nearly everything). WebVTT for HTML5 web with CSS styles and positioning. Never burn subtitles into the video (open caption) if you want accessibility compliance: users must be able to toggle them. Always closed caption.
How much does it cost to make a video accessible with AI?
With VOCAP, transcribing 1h of video costs from 1 EUR. SDH editing + validation adds 30-45 minutes per 10-minute video. For 20 videos/month with a freelance editor (25-35 EUR/h) it works out at 250-350 EUR/month vs 1,500-2,500 EUR/month for traditional professional captioning. Clear ROI vs the risk of a 600,000 EUR fine.
Accessibility is no longer optional. And transcription is step one.
Generate transcriptions with precise timestamps and diarization to produce SDH subtitles that comply with WCAG 2.2 level AA.
15 free minutes · No credit card · From 1 EUR/hour
Start for free