Video Overlay

From WHATWG Wiki
Latest revision as of 12:13, 26 January 2011

There are several types of content which should be displayed overlaid on <video> elements, including subtitles/captions from various sources, scripted controls and more advanced scripted content such as karaoke or timed annotations.

Beyond what proposals such as <itext> already support (with another syntax), this proposal is mostly concerned with styling: consistent CSS styling of captions/subtitles, and a hook into fullscreen rendering, which is necessary for providing the same experience (in terms of controls, subtitles, etc.) regardless of display mode.

Use Case Description

There are several distinct use cases addressed by this proposal:

  • Linking <video> with external captions/subtitles for native fetching/decoding/syncing by the UA.
  • Styling captions/subtitles with CSS, regardless of their source.
  • Allowing scripts to operate on captions/subtitles in a uniform manner, regardless of their source.

Possible sources of captions/subtitles include in-band (e.g. embedded in an MPEG-4 or Ogg stream), external (e.g. SRT or DFXP) or scripted (e.g. extracted from an on-page transcript) captions/subtitles.

Current Limitations

HTML5 currently lacks convenient markup and/or interfaces to handle at least these things:

  • Syncing and styling external subtitles/captions with <video>
  • Styling in-band subtitles/captions from media resources
  • Rendering scripted controls on top of <video> and positioning them at the bottom.
  • Callbacks at specific times for scripted subtitles/captions (previously possible with "cue ranges")
  • Allowing any overlay (controls/captions/etc) to be retained in fullscreen mode.

Current Usage and Workarounds

Currently no browser supports rendering in-band subtitles, and a workaround would involve extracting the captions on the server side and sending them in another format. Fullscreen support is still immature, and there is no possible workaround for having scripted captions or controls appear in fullscreen display.

Scripted Captions

In Silvia's <itext> demo (http://www.annodex.net/~silvia/itext/elephant_no_skin_v2.html), external SRT subtitles are fetched with XHR, parsed with JavaScript and finally synced in a timeupdate event handler. Using the timeupdate event is sub-optimal because it isn't guaranteed to fire any more often than every 250 ms, which isn't enough for fast-paced dialog.
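
The granularity problem can be illustrated with a small sketch. It assumes the cues have already been parsed into objects with hypothetical start, end (both in seconds) and text fields, and it models a script that only looks at the playback position once every 250 ms, which is the worst case for timeupdate. The function names are made up for this example and are not taken from the <itext> demo.

```javascript
// Hypothetical cue shape: { start: seconds, end: seconds, text: string }.

// Return the cues that are active at playback position `time`.
function activeCuesAt(cues, time) {
  return cues.filter(function (cue) {
    return cue.start <= time && time < cue.end;
  });
}

// Simulate a script that samples the playback position once per
// `intervalMs` milliseconds and report which cue texts it ever displays.
function cuesSeenByPolling(cues, durationSeconds, intervalMs) {
  var seen = [];
  var steps = Math.floor((durationSeconds * 1000) / intervalMs);
  for (var i = 0; i <= steps; i++) {
    var time = (i * intervalMs) / 1000;
    activeCuesAt(cues, time).forEach(function (cue) {
      if (seen.indexOf(cue.text) === -1) {
        seen.push(cue.text);
      }
    });
  }
  return seen;
}

var exampleCues = [
  { start: 0.0, end: 1.0, text: "Hello." },
  { start: 1.05, end: 1.2, text: "No!" }, // a 150 ms interjection
  { start: 1.3, end: 2.5, text: "Why not?" }
];

// Sampling at 0, 0.25, 0.5, ... never lands inside [1.05, 1.2),
// so the short cue is skipped entirely.
var shown = cuesSeenByPolling(exampleCues, 3, 250);
```

In this model the 150 ms cue is never displayed, even though it would be clearly visible if the UA synced the captions natively.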

Scripted Controls

In order to overlay scripted controls on top of <video>, a wrapping <div> and some CSS are needed:

<div style="position:relative;width:400px;height:300px">
  <video src="video.ogv" style="width:100%;height:100%"></video>
  <div class="controls" style="position:absolute;bottom:0;left:0;right:0">
    <!-- actual controls here -->
  </div>
</div>

This isn't terrible, but it requires the size of the video to be known in advance or fixed to a certain size as above.

Benefits

<overlay> provides a single, stylable container for all kinds of overlay content. The alternative would be to have one kind of markup for in-band and external captions/subtitles (e.g. <itext>) and another solution for scripted captions/subtitles/controls/annotations, even though the problem being solved is essentially the same.

Requests for this Feature

  • <overlay> suggested by Philip Jägenstedt (Opera)
TODO: Find the many mails related to some of the features addressed by <overlay>

Proposed Solutions

<overlay>

The <overlay> element is used as a child of <video>. It can optionally refer to an external source, which should be in a format supported by the UA. Example:

<video src="video.ogv">
  <overlay src="captions.srt"></overlay>
</video>

Possibly, one could allow <overlay> to have <source> element children, similar to <video>. The purpose would be to group resources which are mutually exclusive, e.g. subtitles in different languages:

<video src="video.ogv">
  <overlay>
    <source src="captions-english.srt" lang="en"></source>
    <source src="captions-simplified-chinese.srt" lang="zh-Hans"></source>
  </overlay>
</video>

If necessary, one could also use <source> to provide the same resource in multiple formats for fallback purposes:

<video src="video.ogv">
  <overlay>
    <source src="captions.srt" type="text/x-srt"></source>
    <source src="captions.xml" type="application/ttaf+xml"></source>
  </overlay>
</video>

When <overlay> does not point to an external resource, its content should instead be displayed. By updating the content with scripts, the possibilities are many:

<video src="video.ogv">
  <overlay><!-- content goes here --></overlay>
</video>
<script>
  var v = document.querySelector("video");
  var ol = v.querySelector("overlay");
  // someInterestingText() is a placeholder for author-supplied code.
  v.ontimeupdate = function() {
    ol.textContent = someInterestingText();
  };
</script>

Processing Model

Resource Selection

If the <overlay> element is allowed to reference external resources using <source>, a resource selection algorithm must be defined. Other proposals have included variations on the theme of letting the UA automatically select the language and type of timed text most suitable for the user (e.g. French subtitles for French-reading users and captions for hard-of-hearing users). Unlike resource selection for media elements (http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-algorithm), this would require the <source> candidates to be evaluated in an order other than strict document order. There are certain complications to this, which may or may not be justified:

  • Requiring UAs to keep (or act as if they keep) a priority-sorted list of the candidates and keep that in sync with DOM modifications, so that they can fall back to the next best if a resource is unavailable or undecodable.
  • Relying on the Accept-Language setting, which is often wrong, or adding new language preferences to the UA, which are likely to fail in the same ways Accept-Language has.

This proposal does not include a solution; implementor experience is probably the best way of finding out what makes sense and what does not.
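
To make the first complication more concrete, here is a minimal sketch of a priority-sorted candidate list. The candidate shape (src, lang, type), the function name rankCandidates, the file names and the ranking rule are all assumptions made for illustration; the proposal itself does not define a selection algorithm.

```javascript
// Hypothetical candidate shape, mirroring <source> attributes:
// { src: string, lang: string, type: string }

// Return the candidates the UA can decode, best language match first.
// Candidates with the same language rank keep their document order.
function rankCandidates(candidates, preferredLangs, supportedTypes) {
  return candidates
    .map(function (candidate, index) {
      var langRank = preferredLangs.indexOf(candidate.lang);
      return {
        candidate: candidate,
        index: index,
        // Languages the user did not ask for sort after all preferred ones.
        langRank: langRank === -1 ? preferredLangs.length : langRank
      };
    })
    .filter(function (entry) {
      return supportedTypes.indexOf(entry.candidate.type) !== -1;
    })
    .sort(function (a, b) {
      // Fall back to document order when the language rank is equal.
      return a.langRank - b.langRank || a.index - b.index;
    })
    .map(function (entry) {
      return entry.candidate;
    });
}

var candidates = [
  { src: "captions-english.srt", lang: "en", type: "text/x-srt" },
  { src: "captions-french.xml", lang: "fr", type: "application/ttaf+xml" },
  { src: "captions-french.srt", lang: "fr", type: "text/x-srt" }
];

// A French-reading user whose UA only decodes SRT.
var ranked = rankCandidates(candidates, ["fr", "en"], ["text/x-srt"]);
```

If fetching or decoding the first entry fails, the UA would move on to the next one. The cost hinted at above is that this list would have to be recomputed, or kept in sync, whenever the <source> children change.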

Styling

It needs to be defined exactly what <overlay> is to <video>. The simplest option may be to act as if <video> were the containing block for <overlay> elements, so that e.g. applying position:absolute;bottom:0;left:0;right:0 to <overlay> causes it to be positioned at the bottom of the parent <video>.

Should <overlay> be a block or an inline element?

Limitations

In fullscreen mode, the size of the (virtual) containing box for <overlay> will change, which may require authors to write more complex CSS to handle gracefully.

Certain subtitle/caption formats provide their own styling which would interfere with CSS. For complex formats, ignoring CSS may be the only option.

Implementation

Browser vendors get the following benefits:

  • Since overlay content is explicitly marked up, it's easier to ensure that scripted overlays do not interfere with the native controls simply by placing native controls on top of the overlays.

Adoption

Authors get the following benefits:

  • With HD video becoming more and more common, most UAs will likely provide a fullscreen mode for <video>. This proposal gives authors a way to provide scripted enhancements even in fullscreen mode.
  • Simple, stylable subtitles/captions without using scripts.
  • Less work to position scripted controls on top of <video>

DOM API

HTML5 used to have a cue ranges feature with this API:

void addCueRange(in DOMString className, in DOMString id,
                 in float start, in float end, in boolean pauseOnExit,
                 in CueRangeCallback enterCallback, in CueRangeCallback exitCallback);
void removeCueRanges(in DOMString className);

This was removed from the spec partly because it's script-only, i.e. it isn't used for notifications of the progress of external captions/subtitles/etc. (see Remove addCueRange/removeCueRanges, http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-August/021963.html). A DOM interface similar to this is now needed, but with some modifications. Note that pauseOnExit is not needed, as playing a particular time range can be done using Media Fragments (http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/).

Many variations on the cue range theme are possible; here is one:

interface CueRangeEvent : Event {
  readonly attribute DOMString className;
  readonly attribute double startTime;
  readonly attribute double endTime;
  readonly attribute DOMString text;
};
void addCueRange(in DOMString className, in float start, in float end, in DOMString text);
void removeCueRanges(in DOMString className);

The events 'cuerangeenter' and 'cuerangeleave' would be fired as appropriate. The idea is that the same event would be fired for in-band, external and scripted captions/subtitles. The event target and/or className could be used to distinguish between them.
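
As a rough illustration of how scripted cue ranges might map onto these events, the sketch below keeps a list of ranges and reports which 'cuerangeenter' and 'cuerangeleave' events would fire as the playback position moves. It is a plain JavaScript model, not a proposed implementation: the CueRangeTracker name and its update method are hypothetical, and plain objects stand in for CueRangeEvent instances.

```javascript
// Hypothetical stand-in for the media element's cue range bookkeeping.
function CueRangeTracker() {
  this.ranges = [];
}

// Mirrors the addCueRange signature sketched above.
CueRangeTracker.prototype.addCueRange = function (className, start, end, text) {
  this.ranges.push({
    className: className,
    startTime: start,
    endTime: end,
    text: text,
    active: false
  });
};

// Remove every range registered under className.
CueRangeTracker.prototype.removeCueRanges = function (className) {
  this.ranges = this.ranges.filter(function (range) {
    return range.className !== className;
  });
};

// Called whenever the playback position changes. Returns the events
// that would be fired, in registration order.
CueRangeTracker.prototype.update = function (currentTime) {
  var events = [];
  this.ranges.forEach(function (range) {
    var inside = range.startTime <= currentTime && currentTime < range.endTime;
    if (inside !== range.active) {
      range.active = inside;
      events.push({
        type: inside ? "cuerangeenter" : "cuerangeleave",
        className: range.className,
        startTime: range.startTime,
        endTime: range.endTime,
        text: range.text
      });
    }
  });
  return events;
};

var tracker = new CueRangeTracker();
tracker.addCueRange("subtitles", 1, 3, "Hello, world");
tracker.addCueRange("annotations", 2, 4, "Note the elephant");

// Step through a few playback positions and record what would fire.
var fired = [0.5, 1.5, 2.5, 3.5, 4.5].map(function (time) {
  return tracker.update(time).map(function (event) {
    return event.type + ":" + event.className;
  });
});
```

Here className is what lets a listener tell subtitle events from annotation events, as suggested above. In a UA the same bookkeeping would presumably be driven by the playback clock rather than by explicit update calls.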

TODO: Better interface/method/event names and code examples

Baseline Formats

For <overlay src="captions"> or any other solution referencing external captions/subtitles to be interoperable, a baseline format is required.

SRT

SRT is probably the most widely supported and also one of the simplest subtitle formats. However, applications seem to parse it differently. In order to use it on the web we need an SRT parsing algorithm spec. Writing such a spec would entail gathering a large number of SRT files from the wild and trying to define the algorithm such that it is compatible with as many of them as possible. Many SRT files have some limited HTML markup like <b> and <i>. A decision would have to be made whether or not to include that in the parsing algorithm.
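
To give a sense of the decisions such an algorithm has to make, here is a deliberately lenient sketch. It assumes cue blocks separated by blank lines, an optional counter line, and timestamps of the form HH:MM:SS,mmm; it also accepts a dot as the decimal separator and CR or CRLF line endings. Stripping <b> and <i> is just one possible answer to the markup question. The function names and the stripMarkup option are hypothetical, and this is not a proposed specification.

```javascript
// Convert "HH:MM:SS,mmm" (or "HH:MM:SS.mmm") to seconds.
// Returns null when the timestamp is not recognized.
function parseSrtTimestamp(stamp) {
  var match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) {
    return null;
  }
  return Number(match[1]) * 3600 + Number(match[2]) * 60 +
    Number(match[3]) + Number(match[4]) / 1000;
}

// Parse SRT text into an array of { start, end, text } cues.
// Blocks without a valid timing line are skipped rather than
// treated as fatal errors.
function parseSrt(input, options) {
  var stripMarkup = Boolean(options && options.stripMarkup);
  var blocks = input.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  var cues = [];
  blocks.forEach(function (block) {
    var lines = block.split("\n").filter(function (line) {
      return line.trim() !== "";
    });
    // Find the timing line; a counter line may or may not precede it.
    var timingIndex = -1;
    for (var i = 0; i < lines.length && i < 2; i++) {
      if (lines[i].indexOf("-->") !== -1) {
        timingIndex = i;
        break;
      }
    }
    if (timingIndex === -1) {
      return;
    }
    var parts = lines[timingIndex].split("-->");
    var start = parseSrtTimestamp(parts[0]);
    var end = parseSrtTimestamp(parts[1]);
    if (start === null || end === null) {
      return;
    }
    var text = lines.slice(timingIndex + 1).join("\n");
    if (stripMarkup) {
      // Drop only <b>, </b>, <i> and </i>; other text is left alone.
      text = text.replace(/<\/?[bi]>/gi, "");
    }
    cues.push({ start: start, end: end, text: text });
  });
  return cues;
}

var sample =
  "1\r\n00:00:01,000 --> 00:00:02,500\r\n<i>Hello</i> there\r\n\r\n" +
  "2\n00:01:00.250 --> 00:01:02.000\nSecond cue\nsecond line\n";

var parsedCues = parseSrt(sample, { stripMarkup: true });
```

Every lenient choice here (skipping bad blocks, accepting a dot, ignoring the counter) is the kind of behaviour a real spec would have to pin down after surveying existing files.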

W3C Timed Text

The W3C Timed Text (TT) Working Group (http://www.w3.org/AudioVideo/TT/) is producing Timed Text (TT) Authoring Format 1.0 – Distribution Format Exchange Profile (DFXP) (http://www.w3.org/TR/ttaf1-dfxp/). It is "a content type that represents timed text media for the purpose of interchange among authoring systems". The features are essentially the union of all features of all other timed text formats, which unsurprisingly makes it quite complex. While it can be used as a distribution format, using it as a baseline format requires that browser vendors show interest in implementing it.

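References

Silvia Pfeiffer's blog posts:

  • The model of a time-linear media resource for HTML5 (http://blog.gingertech.net/2009/11/23/model-of-a-time-linear-media-resource/)
  • Manifests for exposing the structure of a Composite Media Resource (http://blog.gingertech.net/2009/11/25/manifests-exposing-structure-of-a-composite-media-resource/)

Silvia Pfeiffer's <itext> proposals:

  • HTML5 captions (https://wiki.mozilla.org/Accessibility/HTML5_captions); feedback (https://wiki.mozilla.org/Accessibility/Experiment1_feedback)
  • HTML5 captions v2 (https://wiki.mozilla.org/Accessibility/HTML5_captions_v2); feedback (https://wiki.mozilla.org/Accessibility/Experiment2_feedback)

Mailing lists:

  • re-thinking "cue ranges" from David Singer (Apple) (http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-May/014887.html)
  • Remove addCueRange/removeCueRanges from Philip Jägenstedt (Opera) (http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-August/021963.html)
  • timing model of the media resource in HTML5 from Silvia Pfeiffer (http://lists.w3.org/Archives/Public/public-html-a11y/2009Nov/0089.html)
  • initial <overlay> suggestion from Philip Jägenstedt (Opera) with feedback from Eric Carlson (Apple) (http://lists.w3.org/Archives/Public/public-html-a11y/2009Nov/0098.html)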