Video captioning

From WHATWG Wiki
Revision as of 20:45, 24 February 2009 by Millam

The purpose of this page is to lay out and hammer down a specification for implementing captioning, subtitling, and timed text support for HTML media elements, both video and audio. This is a work in progress, and is being maintained by User:Millam.


  • Timed Text: Text that occurs at specified times during playback of the media element.
  • Captions: A transcription of all spoken words and relevant sounds, targeted at deaf and hearing-impaired audiences. Also known as "Subtitles for the Deaf and Hard of Hearing" (SDH or SDHH).
  • Subtitles: A transcription of spoken words. Frequently translated. May include translations of on-screen text (e.g., a "Hello, my name is Greg" tag -> "Hola, me llamo Greg").
  • Closed Captioning: When the captions are separate from the media stream, and can be toggled on and off.
  • Open Captioning: When the captions are "burned" into the video, and thus can't be toggled on and off.
  • External captions: When the captions are kept in a file separate from the media file (e.g., .srt, .ass).
  • Included captions: When the captions are included in the video file, whether as a separate text track or as 'prerendered' captions. (Described in Caption Formats, below)
  • Un-styled captions: Captions that have no styling information. Just plain text.
  • Styled captions: Captions that have one or more bits of styling: Color, Positioning (relative to the video), 'Karaoke' color changing, animation effects, etc.
  • Rollup captions: Captions that 'roll up'. Typically, there's a window of 3 lines, and as captions are added to it, they are added to the bottom line, pushing older text up, or out of frame. Most typically used with news stations, soap operas, and broadcast events that are captioned live.
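
The rollup behaviour described in the last item can be sketched as a fixed-size window. This is only an illustration of the definition above, not part of any proposed API: the class name RollupWindow and its add method are hypothetical, and the default of 3 rows comes from the "typically 3 lines" description.

```python
from collections import deque
from typing import Deque, List


class RollupWindow:
    """Hypothetical model of a rollup caption window (not a proposed API)."""

    def __init__(self, rows: int = 3) -> None:
        # A deque with maxlen silently discards the oldest entry when full,
        # which mirrors older text being pushed out of frame.
        self._rows: Deque[str] = deque(maxlen=rows)

    def add(self, line: str) -> List[str]:
        """Add a caption to the bottom line and return the visible lines, top to bottom."""
        self._rows.append(line)
        return list(self._rows)


window = RollupWindow()
for caption in ["Good evening.", "Our top story tonight:", "Heavy rain is expected.", "More after the break."]:
    visible = window.add(caption)

# After the fourth caption, "Good evening." has rolled out of the window.
print(visible)
```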

From the user agent's perspective, Timed Text, Captions and Subtitles are functionally identical; the only difference is the content they describe. For the purpose of this document, I will be using "captions" and "captioning" to refer to all of the above.

Caption Formats

One of the largest barriers to adding captioning to standards in the digital age is the sheer number of formats available. The list below is a small sample.

  • 608/708, or "Line 21" captions: Designed for Television, caption data is encoded in scanline 21. Text here is often broken up, and drawn by command.
  • "Prerendered" captions. Designed for DVDs, these are actually transparent video frames that are drawn on top of the video.
  • SubRip (.sub or .srt). Plain text files, where each separate caption consists of three or more lines: the number of the caption (optional), the start time (and optional end time), and the caption text, unstyled. Very easy to read and write by hand.
  • SSAV4 (.ssav4 or .ass). A very flexible, styled format. It is very verbose, so authors usually prefer to use caption editors to create and edit these files.
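
To show how little machinery the SubRip layout needs, here is a minimal parsing sketch. It follows the description in the list above (optional caption number, a start time with an optional end time, then the text) and assumes the usual HH:MM:SS,mmm timestamp with "-->" between start and end. The names Cue and parse_subrip are hypothetical, and the sketch simply skips malformed captions rather than reporting errors.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One timestamp: hours, minutes, seconds, milliseconds.
TIME = r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
# A timing line: a start time, optionally followed by "--> end time".
TIMING_LINE = re.compile(rf"^\s*{TIME}(?:\s*-->\s*{TIME})?\s*$")


@dataclass
class Cue:
    start: float           # seconds
    end: Optional[float]   # seconds, or None when no end time is given
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_subrip(data: str) -> List[Cue]:
    """Parse SubRip-style text into cues (hypothetical helper, not a full parser)."""
    cues: List[Cue] = []
    # Captions are separated from each other by blank lines.
    for block in re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # Drop the optional caption number.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = TIMING_LINE.match(lines[0])
        if match is None:
            continue  # no recognisable timing line; skip this caption
        groups = match.groups()
        start = _seconds(*groups[0:4])
        end = _seconds(*groups[4:8]) if groups[4] is not None else None
        cues.append(Cue(start, end, "\n".join(lines[1:])))
    return cues


sample = "1\n00:00:01,500 --> 00:00:04,000\nHello, my name is Greg\n\n00:00:05,000\nSecond caption\n"
for cue in parse_subrip(sample):
    print(cue.start, cue.end, cue.text)
```

A styled format such as the last item in the list would need considerably more than this, which is part of why the number of formats is a barrier.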


This document aims to describe a method of adding captioning support to all HTML5 user agents for all media elements (currently the audio and video elements). This will require adding subtitling support to the user agent itself instead of to a media plugin, and integrating it with the JavaScript engine identically in each UA.


This section describes how the page author and the page viewer utilize captions.


The author writes an HTML5 page that includes a media element (<video>...</video> or <audio>...</audio>) and wishes to add captioning using external captions in one or more languages. For a single language in a standalone media tag, the author will include a 'subs="..."' attribute to define the location of a single captioning file. For multiple languages, the author will use a new element, <track>, to define each caption track.


The user then visits the author's page. The user agent loads the