Video accessibility

From WHATWG Wiki

These use cases are broken into two categories: Accessibility, and the broader Universality. Because the universality use cases closely resemble real accessibility use cases, it is sometimes useful to consider possible solutions to accessibility in the context of the more general universality issues as well.

Authoring Use Cases

An author with limited resources publishes a video log

An author records a regular video log with his web cam on his home computer. He has limited technical experience with video editing software and does not have sufficient time, skill or financial resources available to create and publish closed captions on the video.

However, as part of his routine, the author types up a script of what he is going to say before recording, and he wants to make it available for people who cannot or do not want to watch the video. In addition to the video, he generally publishes a short introductory blog entry within which to include the video and a caption below.

The author is not particularly skilled with HTML either, and generally uses the WYSIWYG editor within his CMS to create and publish his blog entries, and copies and pastes the markup given to him by the video hosting service.

The author wants to put the transcript on a separate page rather than within the main blog entry, but still wants to make it easy for anyone to access.

Author embedding third-party media, but attempting to keep the web page accessible

An author is embedding a third-party video into his website. The video does not contain captions or audio descriptions, and the author has no means of modifying the resource itself. But the author wants to provide some form of alternative or auxiliary content for people who cannot see or hear the video.

A commercial TV network publishes news reports on their website

A commercial TV network produces and broadcasts regular news reports, which are broadcast with closed captions. Selected reports are made available via their website following their TV broadcast. As they already have the captions available, it's relatively easy for them to publish them on the web too.

The network wants to provide an easy way for users to turn on captioning if they need or want it, and would prefer to have the UI for that integrated within their own customised video controls.

A video hosting service allows authors to add captions and subtitles to their videos

A video hosting service which hosts videos created by their users allows captions and subtitles to be created and added to the videos in their account. Most videos are publicly available and the service wants to allow viewers to enable captions or subtitles if they need them and, for subtitles, select their language.

The service allows videos to be embedded within 3rd party websites and cannot rely on the publishers of those sites providing customised controls, although they can if they wish. The videos need to have native controls provided by the browser, which allow the user to select the appropriate captions or subtitles for their needs.

End User Use Cases (Accessibility)

A deaf or hearing impaired user viewing a video

A user who is unable to hear due to physical disability chooses to watch a video. The video is an interview between 2 people, discussing a topic that the user is interested in. The video has been provided with associated closed captions and the user would like to have those turned on so that he may understand speech and other significant sounds within the video.

A blind user listening to a video's soundtrack

A blind user who cannot see the video, but is still able to hear the audio, chooses to listen to the soundtrack of a video anyway. The video is a "webisode" (a web episode: like a TV episode, but on the web) of a drama series the user enjoys. The video conveys some important information visually, which is not made apparent in the main soundtrack, such as what the characters are doing and where they are. But the video also contains an alternative or complementary track for audio descriptions, which describes significant parts of the video. The user wishes to enable the audio descriptions to more easily comprehend the content of the video.

There are two variations of this use case:

  1. The computer that the user is using is primarily meant for his own use, and generally not shared with other non-disabled users. It would be convenient if audio descriptions did not have to be selected manually each time.
  2. The computer is shared between the disabled user and other non-disabled people. The accessibility features used by the disabled user are turned off when the computer is used by others, and it would be inconvenient if the audio descriptions were automatically selected for them too.

A deaf and blind user is unable to see or hear the video

A deaf and blind user cannot see or hear the video, and primarily accesses content using a braille reader. The video is a tutorial and demonstration about how to perform a particular science experiment at home. The user wishes to understand the content of the video so that he can teach the experiment to his nephew. The user needs to read the full text description, which describes and transcribes significant aural and visual content in the video. He does not need the video to be loaded, but instead needs to be provided with easy access to the full text description.

Variation: If a full-text description or transcript isn't provided but captioning is provided, it should be possible to extract the captions and render them as braille.
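
The variation above can be sketched as a small conversion step. This is a minimal illustration, not a description of any existing format or user agent: the `Cue` type and the `captions_to_text` function are hypothetical names, and the sketch assumes the captions are already available as timed cues. It only produces plain text in playback order, which assistive technology such as a braille display could then render.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """Hypothetical timed caption cue (times in seconds)."""
    start: float
    end: float
    text: str


def captions_to_text(cues: List[Cue]) -> str:
    """Flatten timed captions into plain text, one cue per line, in time order."""
    ordered = sorted(cues, key=lambda c: c.start)
    # Collapse internal whitespace so each cue becomes a single clean line.
    lines = [" ".join(c.text.split()) for c in ordered]
    # Drop cues that contain no text at all.
    return "\n".join(line for line in lines if line)
```

Because captions identify speakers and mention significant sounds, the resulting text carries more than a bare subtitle track would, though it still lacks the visual descriptions a full text description provides.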

Related End User Use Cases (Universality)

A user's sound equipment is unavailable, muted, or has low volume

A user is unable to hear the audio in the video well, either because his computer lacks audio equipment such as a sound card, headphones or speakers, or because the volume needs to be kept low in the user's environment. The video has been provided with either closed captions or same-language subtitles. Like a hearing impaired user, the user would like to be able to turn on captions or subtitles so he may more easily understand what is being said.

A user wants to access a transcript

A user chooses not to watch a video, regardless of whether or not they are physically able. The video is of an election campaign speech and has been published with a transcript of the speech alongside. The user wishes to thoroughly review the speech and compare it with those from the other candidates. He accesses the transcript and prints it out, to enable him to read it and make notes or highlight important sections.

A non-disabled user wants to read a full text description

A video of a slide presentation has been published, but the user wishes to read the full text description instead. The description includes both a transcript of what the presenter said and images of the slides. The user chooses to read all or part of the full text description, instead of watching the whole video.

Summary of Problems

  1. A user who is deaf or hearing impaired needs a way to express his preference for captioning so that he is not required to manually enable captions each time.
  2. A user without sound equipment on a particular device may also wish to express the preference for captioning to avoid manual selection each time.
  3. A user who is only temporarily unable to hear the audio due to low or muted volume needs a way to manually enable closed captions or same-language subtitle tracks on a per-video basis, if raising the volume is not practical.
  4. A blind user needs a way to express his preference for audio descriptions so that he is not required to manually select the descriptions each time.
  5. A user needs to have a way to manually enable or disable an alternative or complementary audio track containing audio descriptions.
  6. A user, regardless of disability, needs a way to access a video transcript or full text description.
  7. A user needs a way to prevent their browser from automatically downloading video that they don't want or need.


The following requirements are divided into two categories: authoring and end user. Authoring requirements are derived from the authoring use cases and relate to the needs of content producers, publishers, HTML authors, CMS vendors and video hosting sites. The end user requirements are derived from the end user use cases and relate to the ability of various users to access the published accessibility features.

Authoring Requirements

  • Transcripts:
    • A way to link from within the blog entry to the transcript.
    • Easy for all people to access.
    • It needs to be reasonably clear that the link is to a transcript for the video.
    • Can't require too much, if any, manual editing of HTML markup
  • Captions and Subtitles:
    • Ability to provide custom caption/subtitle controls
    • Ability to use native browser controls for selecting captions/subtitles
  • Audio Descriptions:
    • Ability to provide custom controls to toggle audio descriptions on/off during playback.

End User Requirements

  • Transcripts and Full Text Descriptions:
    • The ability to easily access an associated transcript
    • An association between the video and the link to a transcript or full text description
  • Captions and Subtitles:
    • The ability to manually select a caption or subtitle track
    • The ability to express a preference for turning on captions automatically
  • Audio Descriptions:
    • The ability to express preference for turning on audio descriptions automatically
    • The ability to manually toggle alternative or complementary audio tracks containing audio descriptions.
    • The ability to select an alternative video source which contains audio descriptions

Proposed Solutions

(to be completed)

Existing Technical Solutions for Video

There appears to be consensus among the WCAG Samurai and the current WCAG 2.0 draft that the primary ways of making video with a soundtrack accessible are to provide captioning for the deaf and audio description for the blind. (The WCAG 2.0 draft also mentions a full-text alternative as an alternative to audio description.)

Presumably, the captioning and audio description need to be “closed” (off-by-default, available on request) as content providers might hesitate to present captions to those who do hear the soundtrack or audio description to those who already see the video track.

Closed Captioning

Technically this is timed text presented in sync with the video track.

It is assumed to be in the same language as the main soundtrack. Content-wise, it is expected to mention semantically important non-verbal sounds and identify speakers when the video doesn't make it clear who is talking.

In terms of player app decisions, this track shouldn't be presented by default but it should play if the user has opted in (perhaps via a permanent setting) to showing captioning. Also, if the player app knows that audio output has been turned off either in the app or in the OS, it might make sense to turn on captioning in that case as well.
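
The decision described above can be sketched as a small function. This is only an illustration under stated assumptions: `PlaybackContext` and `should_show_captions` are hypothetical names, and whether a player can actually detect OS-level muting depends on the platform.

```python
from dataclasses import dataclass


@dataclass
class PlaybackContext:
    """Hypothetical snapshot of what the player knows when playback starts."""
    user_wants_captions: bool = False  # persistent opt-in setting
    app_muted: bool = False            # audio muted inside the player app
    os_muted: bool = False             # audio muted in the OS, if detectable


def should_show_captions(ctx: PlaybackContext) -> bool:
    """Captions stay off unless the user opted in or audio is known to be off."""
    if ctx.user_wants_captions:
        return True
    # Optional heuristic from the text: the soundtrack cannot be heard anyway.
    return ctx.app_muted or ctx.os_muted
```

With no opt-in and audible output, `should_show_captions(PlaybackContext())` returns `False`, which matches the off-by-default behaviour expected of closed captions.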

Video Format Support


CMML has been put forward as the timed text format for Ogg. (How to mark as closed captions?)


3GPP Timed Text aka. MPEG-4 part 17 is the timed text format for MP4. (How to mark as closed captions?)


Matroska (MKV) can support almost any subtitle format, and allows for a language tag and a human readable description to be associated with it.

Closed Audio Description

Technically this is a second sound track presented in sync with the main sound track.

It is assumed to be in the same language as the main soundtrack.

In terms of player app decisions, this track shouldn't be presented by default but it should play if the user has opted in (perhaps via a permanent setting) to playing audio descriptions. Also, if the player app knows that a screen reader is in use, it might make sense to use that as a cue to turn on audio descriptions.
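
As a rough sketch of that decision, the following keeps the explicit opt-in separate from the screen reader cue, so that a player could treat the cue as a weaker signal (for example on a shared computer). The names `Reason`, `audio_description_reason` and `should_play_audio_description` are hypothetical, and how a player would learn that a screen reader is running is left open.

```python
from enum import Enum


class Reason(Enum):
    """Why audio descriptions would be enabled (hypothetical categories)."""
    OPT_IN = "user opted in"
    SCREEN_READER = "screen reader detected"
    NONE = "no cue"


def audio_description_reason(opted_in: bool, screen_reader_active: bool) -> Reason:
    """Return the strongest available cue; an explicit opt-in wins over a guess."""
    if opted_in:
        return Reason.OPT_IN
    if screen_reader_active:
        return Reason.SCREEN_READER
    return Reason.NONE


def should_play_audio_description(opted_in: bool, screen_reader_active: bool) -> bool:
    """The description track is off by default and on when any cue is present."""
    return audio_description_reason(opted_in, screen_reader_active) is not Reason.NONE
```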

The state of the art for authoring audio descriptions is MAGpie, which exports a SMIL file that can be imported into QuickTime or RealPlayer for integration into the original video file.

Video Format Support


How to flag a second sound track (Speex?) as closed audio description?


How to flag a second sound track as closed audio description?

Data Placement in the Web Context

Should the above-mentioned tracks be muxed into the main video file (Pro: all tracks travel together; Con: off-by-default tracks take bandwidth)? Or should they be separate HTTP resources (Pro: bandwidth optimization; Con: Web-specific content assembly from many files may not survive downloading to disk, etc.)? Also note that separate files make it easier for a third party to add tracks, but these tracks may be commentary rather than captions, so this is both a pro and a con.

An alternative is to keep the separate resources on the server (or even on different servers) and have the server offer the user agent the selection of tracks available for the video. When the user agent replies with what it requires (based on the user's preference settings), the server dynamically creates the video/audio with all required tracks and delivers it as one muxed file with a header that explains what is contained. If required, the separate HTTP resources are still accessible separately.
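
The offer and selection steps of this alternative could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Track` fields, the kind names, and the `offer`, `select_tracks` and `describe` functions are hypothetical, and the actual muxing is not implemented. The sketch only shows which tracks would be included and what a descriptive header might list.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Track:
    """Hypothetical description of one separately stored resource."""
    kind: str  # "video", "audio", "captions" or "audio-description"
    lang: str
    url: str   # where the separate HTTP resource lives


def offer(tracks: List[Track]) -> List[Dict[str, str]]:
    """What the server advertises to the user agent: kind and language only."""
    return [{"kind": t.kind, "lang": t.lang} for t in tracks]


def select_tracks(tracks: List[Track], wanted_kinds: Set[str], lang: str) -> List[Track]:
    """Main video and audio always travel; optional tracks only on request."""
    chosen = []
    for t in tracks:
        if t.kind in ("video", "audio"):
            chosen.append(t)
        elif t.kind in wanted_kinds and t.lang == lang:
            chosen.append(t)
    return chosen


def describe(chosen: List[Track]) -> str:
    """Content summary that a muxed response could carry in its header."""
    return "; ".join(f"{t.kind}/{t.lang}" for t in chosen)
```

A user agent whose user prefers English captions would call `select_tracks(tracks, {"captions"}, "en")`. Tracks that were not requested never enter the muxed file, so they cost no bandwidth.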

Related Non-Accessibility Features

There are technically similar non-accessibility (i.e. not related to addressing needs arising from a disability) features related to translation.

Translation Subtitles

A site in language A might want to embed a video with the soundtrack in language B but subtitles in language A. For example, a Finnish-language site embedding an English-language video would want to have Finnish subtitles. Unlike captions, these subtitles should be on by default and being able to suppress the subtitles is considered an additional nice-to-have feature.

There are also same-language subtitles (e.g. French subtitles with French-language soundtrack) for language learners. Unlike captions, same-language subtitles don't inform the reader about non-verbal sounds or identify speakers.

Subtitles need different track metadata so that they can be displayed by default. (Due to concerns about the reliability of subtitling technology, many content providers probably opt to burn the subtitles into the video track as part of the image data, even though this disturbs video compression.)
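
One way to picture the difference in track metadata is the sketch below. It assumes a hypothetical `TextTrack` record with a role and a default-on flag. These fields are not taken from any of the container formats mentioned on this page. Captions are shown only on opt-in, while translation subtitles flagged as default are shown unless the user suppresses them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextTrack:
    """Hypothetical timed text track metadata."""
    role: str         # "captions" or "subtitles"
    lang: str
    default_on: bool  # whether the author wants it shown without user action


def initially_visible(
    tracks: List[TextTrack],
    captions_opt_in: bool = False,
    subtitles_suppressed: bool = False,
) -> List[TextTrack]:
    """Pick the text tracks to display when playback starts."""
    visible = []
    for t in tracks:
        if t.role == "captions" and captions_opt_in:
            visible.append(t)
        elif t.role == "subtitles" and t.default_on and not subtitles_suppressed:
            visible.append(t)
    return visible
```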

Alternative Dubbed Sound Tracks

Due to bandwidth concerns, Web content providers will probably opt to provide separate video files for dubbed languages.