
Accessible Subtitles, WCAG 2.2 and the EAA: How to Comply Without Going Crazy

Since 28 June 2025, the European Accessibility Act (EAA) has been mandatory across the EU. If your company sells digital services — e-commerce, banking, e-learning, streaming, transport, ebooks — your videos must carry accessible subtitles. This is not a recommendation: fines reach 600,000 EUR in Spain (Law 11/2023), and similar penalties exist across the rest of the EU.

But "accessible subtitles" is not the same as "subtitles". WCAG 2.2 requires captions with speaker identification, sound descriptions and minimum contrast. YouTube auto-captioning does not comply. This guide walks you through the full workflow with AI + human review.

Key figures: €600,000 maximum EAA fine in Spain · 87 million people with disabilities in the EU · 4.5:1 minimum WCAG AA contrast ratio.

Three regulations overlap: the European Accessibility Act (Directive (EU) 2019/882) and its national transpositions such as Spain's Law 11/2023, the harmonised standard EN 301 549, and WCAG level AA, which EN 301 549 references. Know each one, because auditors check all three:

Who is obliged: companies with more than 10 employees or more than 2 million EUR annual turnover that sell e-commerce, banking, streaming, e-learning, ebooks, transport tickets or telecoms in the EU. Microenterprises are exempt for services but NOT for products.

Subtitles vs Closed Captions (SDH)

This is the most common confusion and the one that fails audits:

BASIC SUBTITLES:
- Dialogue or translation only
- No speaker identification
- No sound descriptions
- Do NOT comply with WCAG 1.2.2
- Designed for foreign-language viewers

CLOSED CAPTIONS / SDH:
- Dialogue + relevant sounds
- Speaker identified (name or color)
- [music], [laughter], [phone]
- Comply with WCAG 1.2.2 level A
- For deaf and hard-of-hearing viewers
- Closed (can be toggled on and off)

To comply with WCAG level AA you always need SDH closed captions, never plain subtitles. The cost difference is minimal if you use AI auto-subtitles and enrich them afterwards.
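For illustration, here is what an SDH caption block looks like in SRT format, with speaker identification, a sound description and a tone cue (names, timings and sounds are invented for the example):

```
1
00:00:01,000 --> 00:00:03,200
MARIA: Hello, is anyone there?

2
00:00:03,400 --> 00:00:05,800
[door slams]
JUAN: (whispering) Stay quiet.
```

A plain subtitle file would carry only the dialogue lines; everything in brackets and the speaker prefixes are what turns it into SDH.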

AI Workflow Step by Step

1. Transcribe with timestamps (5 min): Upload the video to VOCAP and get word-level timestamps. This is the base — no precise timing, no decent subtitle. See how to transcribe with timestamps.

2. Segment into blocks (10 min): Each block lasts 1-3 seconds and has max 2 lines of 37-42 characters. Blocks must respect natural pauses and syntactic units.

3. Identify speakers (5 min): If there are several speakers, identify each one with a name prefix or a color change. Automatic diarization gets you 80% of the way there.

4. Add non-verbal descriptions (5 min): Review the audio and mark sounds relevant to the narrative: [suspense music], [door slams], [nervous laugh], [silence]. Skip irrelevant ambient sounds.

5. Export SRT or WebVTT (1 min): SRT for YouTube/Vimeo; WebVTT for custom HTML5 players. Both are plain editable text.

6. Validate and publish (5 min): Validate with WAVE, axe DevTools or your CMS native accessibility evaluator. Upload as closed track, never burned.

Pro tip: A 10-minute video = 30 minutes of full workflow (transcription + editing + SDH + validation). Without AI it would be 2 hours. With AI + human review, a single editor can ship 20-40 accessible videos per month.
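Step 2 above can be sketched in Python. This is a simplified illustration, not production segmentation logic (it ignores timestamps and natural pauses); the limits come from the 37-42 character and 2-line rules described in this guide:

```python
import textwrap

MAX_CHARS_PER_LINE = 42   # upper end of the 37-42 char guideline
MAX_LINES_PER_BLOCK = 2   # never more than 2 lines per caption block

def segment_text(text: str) -> list[str]:
    """Split transcript text into caption blocks of at most
    2 lines x 42 characters, breaking only at word boundaries."""
    lines = textwrap.wrap(text, width=MAX_CHARS_PER_LINE)
    blocks = []
    for i in range(0, len(lines), MAX_LINES_PER_BLOCK):
        blocks.append("\n".join(lines[i:i + MAX_LINES_PER_BLOCK]))
    return blocks

blocks = segment_text(
    "Since 28 June 2025 the European Accessibility Act "
    "applies to every B2C digital service sold in the EU."
)
for block in blocks:
    print(block)
    print("--")
```

A real pipeline would also align each block to word-level timestamps and prefer splitting at pauses and clause boundaries.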

Technical Format: SRT, VTT, Timing

The technical rules that must pass any audit:

Technical specification

TIMING:
- Block duration: 1-3 sec (max 6s)
- Reading speed: 160-180 wpm
- Gap between blocks: min 80ms
- Sync: max 100ms offset

TEXT FORMAT:
- Max 2 lines per block
- Max 37-42 chars per line
- Do not split words across lines
- Full punctuation
- Uppercase only for essential emphasis

VISUAL:
- Min contrast 4.5:1 vs background
- Translucent dark background recommended
- Sans-serif, min 4% of video height
- Do not overlap critical on-screen text
- Position: bottom third centered
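Several of these rules can be sanity-checked automatically. The sketch below assumes caption blocks have already been parsed into (start, end, text) tuples with times in seconds, and covers only duration, gap, line count, line length and reading speed:

```python
def validate_blocks(blocks):
    """blocks: list of (start_s, end_s, text) tuples, sorted by start.
    Returns a list of human-readable rule violations."""
    errors = []
    for i, (start, end, text) in enumerate(blocks):
        duration = end - start
        if not (1.0 <= duration <= 6.0):  # 1-3s target, 6s hard max
            errors.append(f"block {i}: duration {duration:.2f}s outside 1-6s")
        lines = text.split("\n")
        if len(lines) > 2:
            errors.append(f"block {i}: more than 2 lines")
        for line in lines:
            if len(line) > 42:
                errors.append(f"block {i}: line longer than 42 chars")
        words = len(text.split())
        if duration > 0 and words / duration * 60 > 180:
            errors.append(f"block {i}: faster than 180 wpm")
        if i > 0:
            gap = start - blocks[i - 1][1]
            if gap < 0.08:  # minimum 80 ms between blocks
                errors.append(f"block {i}: gap {gap * 1000:.0f}ms < 80ms")
    return errors

demo = [
    (0.0, 2.5, "MARIA: Hello, is anyone there?"),
    (2.6, 4.0, "[door slams]"),
    (4.02, 12.0, "JUAN: (whispering) Stay quiet."),  # too long, gap too small
]
print(validate_blocks(demo))
```

Sync offset against the audio cannot be checked this way; that still needs the original word-level timestamps or a human pass.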

The WebVTT format is more powerful than SRT — it supports positioning, CSS styles and metadata. If your player supports it, prefer it. For step-by-step subtitle creation see the AI subtitles for video guide.
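As an illustration of those WebVTT capabilities, a file using a STYLE block, cue positioning and voice tags might look like this (cue text and timings are invented):

```
WEBVTT

STYLE
::cue {
  background-color: rgba(0, 0, 0, 0.7);
  color: #ffffff;
}

1
00:00:01.000 --> 00:00:03.200 line:85% align:center
<v Maria>Hello, is anyone there?</v>

2
00:00:03.400 --> 00:00:05.800
[door slams]
```

The ::cue rule provides the translucent dark background and light text that help meet the 4.5:1 contrast requirement, and `<v>` voice spans let the player style each speaker differently.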

Non-Verbal Descriptions and Speakers

This is the block that fails most audits. The rule: everything a hearing viewer perceives through the audio must also be readable by a deaf viewer.

Music

[suspense music], [upbeat music], [sad music]. Never just "[music]" — the emotion matters for the narrative.

Narrative sounds

[door slams], [phone rings], [footsteps approaching]. Only those affecting plot or context.

Tone of voice

(whispering), (shouting), (sarcastically). Mark when tone changes meaning of dialogue.

Speakers

"MARIA: Hello" or per-speaker color. Essential when 2+ voices are off-screen.

Start with accurate transcription: VOCAP gives you word-level timestamps and diarization.

Try VOCAP for free

WCAG 2.2 AA Compliance Checklist

Per-video checklist before publishing

WCAG 1.2.2 - Captions (prerecorded):
[ ] Captions available (no raw auto-captions)
[ ] Sync < 100ms offset
[ ] Speaker identification
[ ] Relevant sounds described
[ ] Closed (toggleable)

WCAG 1.4.3 - Contrast:
[ ] Min 4.5:1 ratio vs background
[ ] Opaque or semi-transparent dark background

WCAG 1.4.4 - Resize text:
[ ] Subtitles scale to 200% without breaking

WCAG 1.2.5 - Audio description (AAA):
[ ] Audio description if visual info is critical
[ ] Separate AD track or AD version

FORMAT:
[ ] SRT or WebVTT, never burned in
[ ] Language declared (lang="en")
[ ] Speed < 180 wpm
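The "closed, toggleable, language declared" items from the checklist map directly onto the HTML5 `<track>` element. A minimal sketch (file names are hypothetical):

```html
<video controls>
  <source src="intro.mp4" type="video/mp4">
  <!-- kind="captions" marks an SDH track; srclang declares the language;
       the viewer can toggle it from the player UI (closed captions) -->
  <track kind="captions" src="intro.en.vtt" srclang="en"
         label="English (SDH)" default>
</video>
```

Because the track is a separate file rather than burned into the video, it can be toggled, restyled and scaled, which is what WCAG 1.4.4 expects.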

For full video validation (not just subtitles) use the general content accessibility guide.

Mistakes That Put You Outside the Law

AVOID:
- YouTube auto-captions without review
- Burned-in subtitles (open caption)
- Dialogue only, no non-verbal descriptions
- No speaker ID with 2+ voices
- Single line of 80 characters
- No contrast with light background
- Timing off by more than 100ms
- Language not declared in file
- "Accessible" for marketing without SDH

DO:
- AI + human review every time
- Closed caption SRT or WebVTT
- SDH: speakers + relevant sounds
- 2 lines max of 37-42 characters
- 4.5:1 contrast with background
- Word-level sync accuracy
- Language declared in VTT
- Max 160-180 wpm
- Quarterly internal audit + change log

Frequently Asked Questions

What is the difference between subtitles and closed captions (SDH)?

Basic subtitles only transcribe or translate dialogue. Closed captions or SDH also include descriptions of relevant sounds [suspense music], speaker identification and emotions [nervous laugh]. WCAG 1.2.2 requires captions, not basic subtitles.

Does the European Accessibility Act force me to add subtitles?

If your company has more than 10 employees or more than 2 million EUR in turnover and sells B2C digital services in the EU (e-commerce, banking, streaming, e-learning, transport, ebooks), you have been required since 28 June 2025 to comply with EN 301 549, which references WCAG 2.1 level AA. In Spain fines reach 600,000 EUR (Law 11/2023). Microenterprises are exempt for services, but NOT for digital products.

Is YouTube auto-captioning enough?

No. WCAG 1.2.2 requires captions accurate enough to understand the content. YouTube auto-captions have errors in proper nouns, technical terms, mixed languages and they do not mark sounds or speakers. If audited, they fail. AI transcription + human review + SDH enrichment is the right path.

Which format: SRT, VTT or burned-in subtitles?

SRT for broad compatibility (YouTube, Vimeo, nearly everything). WebVTT for HTML5 web with CSS styles and positioning. Never burn subtitles into the video (open caption) if you want accessibility compliance: users must be able to toggle them. Always closed caption.

How much does it cost to make a video accessible with AI?

With VOCAP, transcribing 1 hour of video costs from 1 EUR. SDH editing + validation adds 30-45 minutes per 10-minute video. For 20 videos/month that is 10-15 hours of editing; with a freelance editor at 25-35 EUR/h it works out at roughly 250-500 EUR/month, versus 1,500-2,500 EUR/month for traditional professional captioning. Clear ROI against the risk of a 600,000 EUR fine.

Accessibility is no longer optional. And transcription is step one.

Generate transcriptions with precise timestamps and diarization to produce SDH subtitles that comply with WCAG 2.2 level AA.

15 free minutes · No credit card · From 1 EUR/hour

Start for free