Timed tracks
Revision as of 18:25, 21 April 2010
This page contains notes for the development of the first version of timed track features in HTML.
See also: use cases for timed tracks rendered over video by the UA, and use cases for API-level access to timed tracks.
Requirements
Subtitle/Caption/Karaoke File Format
Structure
- multiple rendered titles
- per-segment time in/out cues
- inline time cues for karaoke
- bidi, newlines, ruby
- full typographic features
- positioning
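The per-segment time in/out cues above can be sketched with a small parser. The `HH:MM:SS.mmm --> HH:MM:SS.mmm` timing syntax here is an assumption for illustration, not something these notes define.

```javascript
// Sketch: parse one hypothetical cue consisting of a timing line
// followed by payload text.
function parseTimestamp(ts) {
  const [h, m, s] = ts.split(":");
  return (+h) * 3600 + (+m) * 60 + parseFloat(s); // seconds
}

function parseCue(block) {
  const [timing, ...text] = block.trim().split("\n");
  const [start, end] = timing.split("-->").map((t) => parseTimestamp(t.trim()));
  return { start, end, text: text.join("\n") };
}

const cue = parseCue("00:00:01.000 --> 00:00:04.500\nNever drink liquid nitrogen.");
console.log(cue.start, cue.end); // 1 4.5
```

Inline karaoke cues would extend the same idea: additional timestamps embedded in the payload text rather than on the timing line.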
Positioning
- vertical: top/middle/bottom/absolute/relative (default bottom)
- horizontal: left/center/right/absolute/relative (default center for subtitles only)
- text alignment: horizontal text: left, right, center, at tab stop; vertical text: top, middle, bottom, at tab stop
- display modes: replace previous text, add to previous text, scroll previous text up and add to bottom, paint text on to create single block, clear screen
Multiple voices positioned adjacently would need to stack automatically so they don't overlap, and multiple segments with overlapping times would likewise need to be stacked so they don't overlap.
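One way to satisfy the stacking requirement, as a sketch: assign each segment to the first line slot that is free at its start time. The line-slot model is an assumption for illustration, not something these notes specify.

```javascript
// Sketch: stack time-overlapping segments onto separate lines.
// segments must be sorted by start time; returns a line index per segment.
function stackSegments(segments) {
  const active = []; // end time of the segment currently occupying each line
  return segments.map(({ start, end }) => {
    let line = active.findIndex((e) => e <= start); // reuse a freed line
    if (line === -1) line = active.push(end) - 1;   // or open a new one
    else active[line] = end;
    return line;
  });
}

console.log(stackSegments([
  { start: 0, end: 5 },
  { start: 2, end: 6 }, // overlaps the first, so it gets the next line
  { start: 5, end: 9 }, // the first line is free again by now
])); // [ 0, 1, 0 ]
```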
(Relative positions could work like background-position in CSS.)
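CSS background-position resolves a percentage p by aligning the p% point of the box with the p% point of its container; the same arithmetic could apply here. A sketch:

```javascript
// Sketch: resolve a relative (percentage) position the way CSS
// background-position does along one axis.
function resolveRelative(containerSize, boxSize, percent) {
  return (containerSize - boxSize) * (percent / 100);
}

console.log(resolveRelative(640, 100, 0));   // 0   (flush with the start edge)
console.log(resolveRelative(640, 100, 50));  // 270 (centered)
console.log(resolveRelative(640, 100, 100)); // 540 (flush with the end edge)
```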
Typography
Inline
- horizontal and vertical text directions, with changes of same, must be supported
- some cases use ruby
- italics and typeface changes (and, rarely, underlining and small caps) necessary
Global
- color (foreground, background mask, drop shadow, outline, other) is needed for readability on different types of video
- cannot ship without a native ability to create and alter the background mask, which is not simply the largest rectangle that can be drawn around all the displayed text
- Webfonts desirable
- providing a classname to style each voice could likely be sufficient for authors who want overall formatting control (this would also allow user overrides conveniently)
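The classname-per-voice idea might look like the following stylesheet sketch; the class names, and the idea of exposing each voice as a class on its rendered caption box, are assumptions for illustration.

```css
/* Hypothetical: each voice in a track is exposed as a classname on its
   rendered caption box, so page authors control overall formatting per
   voice, and user stylesheets can conveniently override these rules. */
.voice-narrator { color: #ff6; font-style: italic; }
.voice-singer   { color: #fff; background: rgba(0, 0, 0, 0.8); }
```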
Audio Description
We cannot work from the assumption that we are providing only a script for a computer to read aloud, a use case never attempted in the real world. Actual audio descriptions are prerecorded voices mixed into the main audio. Redefining the spec so that a script read aloud by a computer counts as “audio description” stands to be rejected by actual blind users.
Dubbing
If our spec defines audio description (added audio track), it must also define dubbing (another added audio track).
Combinations
All combinations of features must be possible; no combination may be banned. Common use cases include:
- dubbing with captioning
- dubbing with subtitling
- captioning with subtitling
- audio description with dubbing
- audio description with captioning
- audio description with subtitling
HTML
- an API and UI for exposing what timed tracks exist and selectively enabling/disabling them
- format for external subtitles/captions
- format for external audio description
- some mechanism for text in the page to be used instead of external files, for subtitles/captions or audio description
- an API to allow a segment to be dynamically inserted into the rendering on the fly
- an API for exposing what the currently relevant segments of each timed track are
- a way to hook into this mechanism to advance slides
- native rendering of subtitles
- native rendering of audio description
- native rendering of multiple audio or video tracks, to allow pre-recorded audio description to be mixed in and sign language video to be overlaid
- a way to hook into this to manually render timed tracks
Architecture
<img src="http://docs.google.com/drawings/pub?id=1GR6Pzq0GY2n1sx_ZjDfuICM2LnXxLVxzvyl4kuQy-48&w=640&h=480">
Somebody needs to write alt text for this image.
Declaring timed tracks
Each timed track is either:
- enabled, in which case it is downloaded, triggers events, and if appropriate is rendered by the user agent; or
- disabled, in which case it does nothing
The enabled/disabled state is by default based on user preferences and the kind of timed track as described below, but can be overridden on a per-track basis.
Each timed track has a kind which is one of:
- for visual display (subtitles, captions, translations), enabled based on user preferences, shows in video playback area
- for audio playback (audio description, dubbing), enabled based on user preferences, renders as audio
- for navigation (chapter titles), enabled by default, shows in UA UI
- for off-video display (lyrics), disabled by default in this version, not shown by UA
- for metadata (slide timings, annotation data for app-rendered annotations), enabled by default, not shown by UA
Tracks that are for visual display or audio playback additionally have a user-facing label and a language.
Tracks that are for visual display also carry a boolean indicating whether they include sound effects and speaker identification (intended for viewers who are deaf, hard of hearing, or watching with sound muted) or not (e.g. translations intended for people who have audio enabled but cannot understand the language, or karaoke lyrics).
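The kind-based defaults listed above can be sketched as a small function. The kind names and the shape of the preferences object are assumptions for illustration; a per-track override, per the notes, would simply bypass this default.

```javascript
// Sketch: default enabled/disabled state of a timed track by kind.
function defaultEnabled(kind, userPrefs) {
  switch (kind) {
    case "visual":     // subtitles, captions, translations
    case "audio":      // audio description, dubbing
      return !!userPrefs[kind];  // based on user preferences
    case "navigation": // chapter titles
    case "metadata":   // slide timings, annotation data
      return true;               // enabled by default
    case "off-video":  // lyrics
      return false;              // disabled by default in this version
    default:
      throw new Error("unknown kind: " + kind);
  }
}

const prefs = { visual: true, audio: false };
console.log(defaultEnabled("visual", prefs));     // true
console.log(defaultEnabled("audio", prefs));      // false
console.log(defaultEnabled("off-video", prefs));  // false
```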
Each timed track associated with a media resource, like the media resource itself, can have multiple sources.
Each source for a timed track has:
- URL
- type (if there are multiple sources)
- media
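By analogy with the multiple-source mechanism media elements already use, declaring a track with several sources might look like the sketch below. The element name, attribute names, file names, and MIME types are all invented for illustration; these notes predate any agreed syntax.

```html
<video src="talk.webm">
  <!-- Hypothetical element: one per timed track, carrying kind,
       user-facing label, and language, with nested sources selected
       by type and media query. -->
  <timedtrack kind="captions" label="English captions" language="en">
    <source src="captions.en.cues" type="text/x-cues">
    <source src="captions.small.en.cues" type="text/x-cues"
            media="(max-width: 480px)">
  </timedtrack>
</video>
```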
The media resource can also imply certain timed tracks based on data in the media resource.
The script can also add “virtual” timed tracks dynamically.
CSS extensions for styling captions
[TK]
DOM API
[TK]
Other minor things
We need to make sure that media playback is paused until all enabled timed tracks are locally available.
Open issues
- How do we handle sign-language tracks?
- Do we handle multiple alternate audio tracks from this or is that restricted to in-band data? (in the media resource)