
Video Metrics

From WHATWG Wiki

Related HTML WG bug: http://www.w3.org/Bugs/Public/show_bug.cgi?id=12399


Requirements

For several reasons, we need to expose the performance of media elements to JavaScript.

One concrete use case is that content publishers want to understand the quality of their content as it is played back by their users. For example, if a video always goes into buffering mode after one minute for all users, there may be a problem with the encoding, or the video may be too big for the typical bandwidth/CPU combination. Publishers also want to track how much of their video and audio files is actually watched.
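To make "how much is actually watched" concrete, here is a minimal TypeScript sketch that works on played intervals, such as those a page could copy out of a media element's played TimeRanges via start(i)/end(i). The helper names watchedFraction and retentionCurve, and the one-second bucketing, are illustrative assumptions, not part of any proposal. The first computes what fraction of a video a single session covered; the second aggregates many sessions into an audience-retention curve, where a sharp drop at the same position for all users can point to a problem at that point in the content.

```typescript
// A played interval [start, end) in seconds of media time (hypothetical shape).
type Interval = [number, number];

// Fraction of [0, duration) covered by the union of the given intervals.
function watchedFraction(intervals: Interval[], duration: number): number {
  if (duration <= 0) return 0;
  const sorted = intervals
    .map(([s, e]): Interval => [Math.max(0, s), Math.min(duration, e)])
    .filter(([s, e]) => e > s)
    .sort((a, b) => a[0] - b[0]);
  let covered = 0;
  let curStart = -1;
  let curEnd = -1;
  for (const [s, e] of sorted) {
    if (s > curEnd) {
      // Disjoint from the current run: close it and start a new one.
      if (curEnd > curStart) covered += curEnd - curStart;
      curStart = s;
      curEnd = e;
    } else if (e > curEnd) {
      curEnd = e; // Overlapping or adjacent: extend the current run.
    }
  }
  if (curEnd > curStart) covered += curEnd - curStart;
  return covered / duration;
}

// For each bucket of `bucket` seconds, the fraction of sessions that played any of it.
function retentionCurve(sessions: Interval[][], duration: number, bucket = 1): number[] {
  const n = Math.ceil(duration / bucket);
  const counts = new Array<number>(n).fill(0);
  for (const session of sessions) {
    const seen = new Array<boolean>(n).fill(false);
    for (const [s, e] of session) {
      if (e <= s) continue;
      const first = Math.max(0, Math.floor(s / bucket));
      const last = Math.min(n - 1, Math.ceil(e / bucket) - 1);
      for (let i = first; i <= last; i++) seen[i] = true;
    }
    seen.forEach((v, i) => {
      if (v) counts[i]++;
    });
  }
  return counts.map((c) => (sessions.length > 0 ? c / sessions.length : 0));
}
```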

A further use case is HTTP adaptive streaming, where an author wants to implement their own algorithm for switching between resources of different bandwidth or screen size. For example, if the user goes full screen and the user's machine and bandwidth allow it, the author might want to switch to a higher-resolution video.

Whenever bitrates are reported, it must be clear how they were calculated. If a bitrate is an average, the interval it was averaged over must be stated; if it is some kind of peak bitrate, the window size over which the peak was calculated (or some other definition) must be stated. A raw bitrate alone is not very meaningful.
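The following sketch shows one way to make the calculation explicit: every reported value carries its kind (average or peak) and its window. It assumes the page can observe byte arrivals as (timestamp, bytes) samples; the ByteSample and BitrateReport types and both functions are hypothetical. The peak is taken over sliding windows that end at each sample, using a two-pointer scan over the time-sorted samples.

```typescript
// One observed arrival of data (hypothetical shape).
interface ByteSample {
  timeMs: number;
  bytes: number;
}

// A bitrate that states how it was calculated.
interface BitrateReport {
  kind: "average" | "peak";
  windowMs: number; // averaging interval, or sliding-window size for a peak
  bitsPerSecond: number;
}

// Average over the half-open interval [startMs, endMs).
function averageBitrate(samples: ByteSample[], startMs: number, endMs: number): BitrateReport {
  const windowMs = endMs - startMs;
  let bytes = 0;
  for (const s of samples) {
    if (s.timeMs >= startMs && s.timeMs < endMs) bytes += s.bytes;
  }
  const bitsPerSecond = windowMs > 0 ? (bytes * 8 * 1000) / windowMs : 0;
  return { kind: "average", windowMs, bitsPerSecond };
}

// Highest rate seen in any window of `windowMs` ending at a sample.
function peakBitrate(samples: ByteSample[], windowMs: number): BitrateReport {
  if (windowMs <= 0) return { kind: "peak", windowMs, bitsPerSecond: 0 };
  const sorted = [...samples].sort((a, b) => a.timeMs - b.timeMs);
  let best = 0;
  let inWindow = 0;
  let lo = 0;
  for (let hi = 0; hi < sorted.length; hi++) {
    inWindow += sorted[hi].bytes;
    // Drop samples that fall outside (t_hi - windowMs, t_hi].
    while (sorted[hi].timeMs - sorted[lo].timeMs >= windowMs) {
      inWindow -= sorted[lo].bytes;
      lo++;
    }
    best = Math.max(best, inWindow);
  }
  return { kind: "peak", windowMs, bitsPerSecond: (best * 8 * 1000) / windowMs };
}
```

For example, samples of 1000 bytes at 0 ms, 500 ms and 1500 ms give an average of 12000 bit/s over a 2000 ms window, but a peak of 16000 bit/s over a 1000 ms window; without the window attached, the two numbers cannot be compared.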

Collection of Proposals/Implementations

Mozilla have implemented the following statistics in Firefox:

  • mozParsedFrames - number of frames that have been demuxed and extracted out of the media.
  • mozDecodedFrames - number of frames that have been decoded - converted into YCbCr.
  • mozPresentedFrames - number of frames that have been presented to the rendering pipeline for rendering - were "set as the current image".
  • mozPaintedFrames - number of frames which were presented to the rendering pipeline and ended up being painted on the screen. Note that if the video is not on screen (e.g. in another tab or scrolled off screen), this counter will not increase.
  • mozPaintDelay - the time delay between presenting the last frame and it being painted on screen (approximately).
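Because the Firefox counters describe successive pipeline stages, differences between them show where frames are lost. A minimal sketch, assuming the page takes periodic snapshots of the four frame counters; the MozFrameStats and StageLoss types and the lossBetween function are hypothetical.

```typescript
// Snapshot of the Firefox frame counters listed above.
interface MozFrameStats {
  mozParsedFrames: number;
  mozDecodedFrames: number;
  mozPresentedFrames: number;
  mozPaintedFrames: number;
}

// Frames that reached one stage but not the next, during one interval.
interface StageLoss {
  notDecoded: number; // parsed but not decoded
  notPresented: number; // decoded but not presented
  notPainted: number; // presented but not painted (e.g. video off screen)
}

function lossBetween(prev: MozFrameStats, cur: MozFrameStats): StageLoss {
  // Per-interval increase of one counter.
  const d = (k: keyof MozFrameStats): number => cur[k] - prev[k];
  return {
    notDecoded: d("mozParsedFrames") - d("mozDecodedFrames"),
    notPresented: d("mozDecodedFrames") - d("mozPresentedFrames"),
    notPainted: d("mozPresentedFrames") - d("mozPaintedFrames"),
  };
}
```

Frames still in flight between stages make per-interval figures approximate, and notPainted is expected to grow whenever the video is not on screen, so on its own it is not a sign of poor performance.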


WebKit have implemented the following:

  • webkitAudioBytesDecoded - number of audio bytes that have been decoded.
  • webkitVideoBytesDecoded - number of video bytes that have been decoded.
  • webkitDecodedFrames - number of frames that have been demuxed and extracted out of the media.
  • webkitDroppedFrames - number of frames that were decoded but not displayed due to performance issues.
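With the WebKit counters, two snapshots taken a known time apart are enough to derive a decoded video bitrate and a dropped-frame fraction. This sketch assumes the page records a timestamp alongside each snapshot (for example from performance.now()); the WebkitStats type and the summarize function are hypothetical. In line with the note on bitrates above, the result also carries the window it was averaged over.

```typescript
// Snapshot of the WebKit counters plus the time it was taken (hypothetical shape).
interface WebkitStats {
  timeMs: number;
  webkitVideoBytesDecoded: number;
  webkitDecodedFrames: number;
  webkitDroppedFrames: number;
}

interface WebkitSummary {
  windowMs: number; // the interval the figures below are averaged over
  decodedVideoBitsPerSecond: number;
  droppedFraction: number; // share of decoded frames that were not displayed
}

function summarize(prev: WebkitStats, cur: WebkitStats): WebkitSummary {
  const windowMs = cur.timeMs - prev.timeMs;
  const seconds = windowMs / 1000;
  const bytes = cur.webkitVideoBytesDecoded - prev.webkitVideoBytesDecoded;
  const decoded = cur.webkitDecodedFrames - prev.webkitDecodedFrames;
  const dropped = cur.webkitDroppedFrames - prev.webkitDroppedFrames;
  return {
    windowMs,
    decodedVideoBitsPerSecond: seconds > 0 ? (bytes * 8) / seconds : 0,
    droppedFraction: decoded > 0 ? dropped / decoded : 0,
  };
}
```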


JW Player (using ActionScript) broadcasts the following QoS metrics for both RTMP dynamic streaming and HTTP adaptive streaming:

  • bandwidth: server-to-client data rate, in kilobytes per second.
  • latency: client-server-client round-trip time, in milliseconds.
  • frameDropRate: number of frames not presented to the viewer, in frames per second.
  • screenWidth / screenHeight: dimensions of the video viewport, in pixels. These change, e.g., when the viewer switches to fullscreen.
  • qualityLevel: index of the currently playing quality level (see below).

Bandwidth and frameDropRate are running metrics (averaged over time). Latency and dimensions are sampled (taken once). For RTMP dynamic streaming, the metrics are broadcast at a configurable interval (default 2 s). For HTTP adaptive streaming, the metrics are calculated and broadcast when a fragment finishes loading.
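The split between running and sampled metrics can be sketched as a small tracker. This is not JW Player's code: the QosTracker class, its method names, and the exponential moving average with an assumed smoothing factor alpha are illustrative assumptions, since the description above only says that bandwidth and frameDropRate are averaged. snapshot() would be called on a timer for RTMP dynamic streaming or after each fragment load for HTTP adaptive streaming.

```typescript
interface QosMetrics {
  bandwidth: number; // kilobytes per second, running average
  latency: number; // milliseconds, last sample
  frameDropRate: number; // frames per second, running average
  screenWidth: number; // pixels, last sample
  screenHeight: number;
  qualityLevel: number; // index of the level currently being loaded
}

class QosTracker {
  private bandwidth: number | null = null;
  private frameDropRate: number | null = null;
  private latency = 0;
  private screenWidth = 0;
  private screenHeight = 0;
  private qualityLevel = 0;

  // ASSUMPTION: exponential moving average; the real averaging method is not specified.
  constructor(private readonly alpha = 0.3) {}

  private smooth(prev: number | null, x: number): number {
    return prev === null ? x : this.alpha * x + (1 - this.alpha) * prev;
  }

  // Running metrics: fold each new measurement into the average.
  addThroughput(kilobytes: number, seconds: number): void {
    if (seconds > 0) this.bandwidth = this.smooth(this.bandwidth, kilobytes / seconds);
  }

  addDroppedFrames(frames: number, seconds: number): void {
    if (seconds > 0) this.frameDropRate = this.smooth(this.frameDropRate, frames / seconds);
  }

  // Sampled metrics: the latest value simply replaces the previous one.
  sampleLatency(roundTripMs: number): void {
    this.latency = roundTripMs;
  }

  sampleViewport(width: number, height: number): void {
    this.screenWidth = width;
    this.screenHeight = height;
  }

  setQualityLevel(index: number): void {
    this.qualityLevel = index;
  }

  snapshot(): QosMetrics {
    return {
      bandwidth: this.bandwidth ?? 0,
      latency: this.latency,
      frameDropRate: this.frameDropRate ?? 0,
      screenWidth: this.screenWidth,
      screenHeight: this.screenHeight,
      qualityLevel: this.qualityLevel,
    };
  }
}
```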

Separately, JW Player broadcasts a SWITCH event when it paints a frame whose qualityLevel differs from that of the preceding frame(s). While metrics.qualityLevel tells developers the qualityLevel of the buffer/fragment currently being downloaded, the SWITCH event tells developers the exact point in time at which the viewer experiences a jump in video quality. This event also helps developers correlate frameDropRate with the qualityLevel currently playing (as opposed to the one currently loading). Depending on buffer, fragment and GOP size, the delay between a change in metrics.qualityLevel and the corresponding SWITCH.qualityLevel may vary from a few seconds to a few minutes.
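The difference between the loading and the playing quality level can be modelled by remembering the level each buffered fragment was loaded at and checking it whenever a frame is painted. A minimal sketch, assuming fragments are identified by their media-time range and that the player can report the media time of each painted frame; SwitchDetector, Fragment and SwitchEvent are hypothetical names.

```typescript
interface Fragment {
  start: number; // media time in seconds, inclusive
  end: number; // media time in seconds, exclusive
  qualityLevel: number;
}

interface SwitchEvent {
  time: number;
  from: number;
  to: number;
}

class SwitchDetector {
  private fragments: Fragment[] = [];
  private playingLevel: number | null = null;
  // What a metrics.qualityLevel-style value would report: the level being loaded.
  loadingLevel: number | null = null;

  fragmentLoaded(f: Fragment): void {
    this.fragments.push(f);
    this.loadingLevel = f.qualityLevel;
  }

  // Returns a switch event when a painted frame's level differs from the previous frame's.
  framePainted(mediaTime: number): SwitchEvent | null {
    const f = this.fragments.find((x) => mediaTime >= x.start && mediaTime < x.end);
    if (f === undefined) return null;
    const prev = this.playingLevel;
    this.playingLevel = f.qualityLevel;
    if (prev !== null && prev !== f.qualityLevel) {
      return { time: mediaTime, from: prev, to: f.qualityLevel };
    }
    return null;
  }
}
```

In this model, loadingLevel changes as soon as a new fragment arrives, while framePainted reports a switch only once playback reaches that fragment, which is exactly the delay described above.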

Finally, JW Player accepts and exposes, per video, an array of quality levels (the distinct streams of a video between which the player can switch). For each quality level, properties such as bitrate, framerate, height and width are available. This plain mapping via the qualityLevel index works because, to date, JW Player supports only single muxed A/V dynamic/adaptive videos, not multi-track ones.
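Together with the earlier use case of switching to a higher resolution when bandwidth allows, the quality-level array supports a simple selection rule. The sketch below assumes bitrate is in kilobits per second and bandwidth in kilobytes per second (as in the metrics above), and applies an arbitrary safety factor of 0.8. The rule itself (highest bitrate that fits the bandwidth budget and is no wider than the viewport) and the chooseLevel name are illustrative choices, not JW Player's actual algorithm.

```typescript
// One entry of the quality-level array; units for bitrate are an assumption (kbit/s).
interface QualityLevel {
  bitrate: number;
  framerate: number;
  width: number;
  height: number;
}

// Returns the index of the level to play, or -1 if there are no levels.
function chooseLevel(
  levels: QualityLevel[],
  bandwidthKBps: number, // kilobytes per second, e.g. the bandwidth metric
  screenWidth: number, // viewport width in pixels
  safety = 0.8, // ASSUMPTION: keep 20% headroom
): number {
  if (levels.length === 0) return -1;
  const budgetKbps = bandwidthKBps * 8 * safety;
  let best = -1;
  levels.forEach((l, i) => {
    const fits = l.bitrate <= budgetKbps && l.width <= screenWidth;
    if (fits && (best < 0 || l.bitrate > levels[best].bitrate)) best = i;
  });
  if (best >= 0) return best;
  // Nothing fits: fall back to the lowest-bitrate level.
  return levels.reduce((lo, l, i) => (l.bitrate < levels[lo].bitrate ? i : lo), 0);
}
```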


For HTTP adaptive streaming the following statistics have been proposed:

  • downloadRate: The current server-client bandwidth (read-only).
  • videoBitrate: The current video bitrate (read-only).
  • droppedFrames: The total number of frames dropped for this playback session (read-only).
  • decodedFrames: The total number of frames decoded for this playback session (read-only).
  • height: The current height of the video element (already exists).
  • videoHeight: The current height of the video file (already exists).
  • width: The current width of the video element (already exists).
  • videoWidth: The current width of the video file (already exists).
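A page could combine these proposed attributes into a coarse switching hint. The sketch assumes downloadRate and videoBitrate share a unit (bits per second here; the proposal does not fix one), and the thresholds maxDropFraction and upHeadroom are arbitrary illustrative values; the AdaptiveStats type and the advise function are hypothetical. It suggests switching up only when the element is displayed larger than the video file, i.e. when the video is being upscaled. Because droppedFrames and decodedFrames are session totals, a real implementation would more likely compare deltas between snapshots.

```typescript
// The proposed read-only attributes, gathered into one object (hypothetical shape).
interface AdaptiveStats {
  readonly downloadRate: number; // bits per second (assumed unit)
  readonly videoBitrate: number; // bits per second (assumed unit)
  readonly droppedFrames: number;
  readonly decodedFrames: number;
  readonly width: number; // size of the video element
  readonly height: number;
  readonly videoWidth: number; // intrinsic size of the current video file
  readonly videoHeight: number;
}

type Advice = "up" | "down" | "stay";

function advise(s: AdaptiveStats, maxDropFraction = 0.1, upHeadroom = 1.5): Advice {
  const dropFraction = s.decodedFrames > 0 ? s.droppedFrames / s.decodedFrames : 0;
  // Not keeping up with the network or the decoder: go down.
  if (s.downloadRate < s.videoBitrate || dropFraction > maxDropFraction) return "down";
  // Displayed larger than the file and plenty of bandwidth to spare: go up.
  const upscaled = s.width > s.videoWidth || s.height > s.videoHeight;
  if (upscaled && s.downloadRate >= upHeadroom * s.videoBitrate) return "up";
  return "stay";
}
```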


Further, a requirement to expose playback rate statistics has come out of issue-147:

  • currentPlaybackRate: the rate at which the video/audio is currently playing back
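Until such an attribute exists, the effective rate can be estimated by comparing how far media time advanced against wall-clock time. This sketch assumes the page samples the element's currentTime together with a wall-clock timestamp; the TimeSample type, the helper names and the 10% tolerance are hypothetical. Seeks make media time jump, so intervals containing a seek would need to be discarded.

```typescript
// One observation: wall-clock time in ms and media time in s (e.g. currentTime).
interface TimeSample {
  wallMs: number;
  mediaTime: number;
}

// Media seconds advanced per wall-clock second between two observations.
function measuredPlaybackRate(a: TimeSample, b: TimeSample): number {
  const wallSeconds = (b.wallMs - a.wallMs) / 1000;
  return wallSeconds > 0 ? (b.mediaTime - a.mediaTime) / wallSeconds : 0;
}

// True when playback ran noticeably slower than requested (e.g. because of stalls).
function isUnderrunning(measured: number, requested: number, tolerance = 0.1): boolean {
  return requested > 0 && measured < requested * (1 - tolerance);
}
```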


Here are a few metrics that measure the QoS that a user receives:

  • playerLoadTime
  • streamBitrate

(User interaction and playthrough can be measured using existing events.)


MPEG DASH defines quality metrics for adaptive streaming at several levels:

  • What is presented to the user, i.e. which portions of which versions of the streams were presented, and when. This implicitly includes information such as startup delay, the timing and duration of pauses due to buffer exhaustion, and overall quality (since it records when rate adaptation changes happen).
    • This could be very simply represented as a sequence of ( Time, StreamId, Playback Rate ) tuples, one for every point in time at which the playback rate or stream changed
  • Buffer levels over time within the player
  • Performance of the network stack. This includes, for each HTTP request:
    • The URL and byte range requested
    • The time when the request was sent, when the response started to arrive and when the response was completed
    • The amount of data received
    • HTTP response code and, if applicable, redirect URL
    • A more detailed trace of data arrival rate, for example bytes received in each 1s or 100ms interval
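The tuple representation of what is presented lends itself to simple post-processing. The sketch below assumes times are seconds since the session started and that a playback rate of 0 is recorded only for pauses caused by buffer exhaustion (user-initiated pauses would have to be distinguished separately); PresentationEvent and summarizePresentation are hypothetical names. It derives the startup delay, the stall time, the play time per stream, and the number of stream switches.

```typescript
// (Time, StreamId, Playback Rate), one tuple per change.
type PresentationEvent = [number, string, number];

interface PresentationSummary {
  startupDelay: number | null; // null if playback never started
  stallTime: number;
  playTimeByStream: Map<string, number>;
  streamSwitches: number;
}

function summarizePresentation(events: PresentationEvent[], sessionEnd: number): PresentationSummary {
  let startupDelay: number | null = null;
  let stallTime = 0;
  let streamSwitches = 0;
  const playTimeByStream = new Map<string, number>();
  for (let i = 0; i < events.length; i++) {
    const [time, streamId, rate] = events[i];
    // Each tuple holds until the next one, or until the session ends.
    const next = i + 1 < events.length ? events[i + 1][0] : sessionEnd;
    const duration = Math.max(0, next - time);
    if (rate > 0) {
      if (startupDelay === null) startupDelay = time;
      playTimeByStream.set(streamId, (playTimeByStream.get(streamId) ?? 0) + duration);
    } else if (startupDelay !== null) {
      stallTime += duration; // rate 0 after playback started
    }
    if (i > 0 && events[i - 1][1] !== streamId) streamSwitches++;
  }
  return { startupDelay, stallTime, playTimeByStream, streamSwitches };
}
```

For instance, the sequence (0, low, 0), (1.5, low, 1), (10, high, 1), (20, high, 0), (22, high, 1) with the session ending at 30 s gives a startup delay of 1.5 s, 2 s of stalling, 8.5 s played on "low", 18 s on "high", and one switch.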

All of this information is intended for performance monitoring rather than to inform real-time action by the player, and it is useful to keep these two kinds of information separate. Performance monitoring information can be reported to the application in batches, for the application to report back to the server.
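One way to keep monitoring data out of the player's real-time path is to buffer per-request records and hand them over in batches. The record fields below mirror the DASH list of per-request network information; the HttpRequestRecord shape, the MetricsBatcher class, the batch size, and the sink callback (which might, for example, send each batch to the publisher's server) are all hypothetical.

```typescript
// One entry per HTTP request (hypothetical field names).
interface HttpRequestRecord {
  url: string;
  byteRange?: [number, number]; // first and last byte requested, for range requests
  requestSentMs: number;
  responseStartMs: number; // first byte of the response arrived
  responseEndMs: number; // response completed
  bytesReceived: number;
  status: number; // HTTP response code
  redirectUrl?: string;
  bytesPerInterval?: number[]; // optional arrival trace, e.g. bytes per 100 ms
}

// Collects records and passes them to `sink` in batches of up to `maxBatch`.
class MetricsBatcher {
  private pending: HttpRequestRecord[] = [];

  constructor(
    private readonly sink: (batch: HttpRequestRecord[]) => void,
    private readonly maxBatch = 20,
  ) {}

  record(r: HttpRequestRecord): void {
    this.pending.push(r);
    if (this.pending.length >= this.maxBatch) this.flush();
  }

  // Also call this when the session ends, so the last partial batch is not lost.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.sink(batch);
  }
}
```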

HTTP performance information is sometimes collected on the server side. However, with adaptive streaming, where the client assembles streams from many HTTP requests, this becomes more difficult, since the HTTP requests for a single viewing session may be spread across multiple servers (or even multiple CDNs). As streaming services evolve, collecting this information at the client becomes more important.

The DASH specification also includes information about the TCP level: what TCP connections were established and which HTTP requests were sent on which connection.

The first kind of information above (what is presented) is already largely available from video element events (changes in the current playback rate). The exception is rate adaptation changes.


Network error codes

It's difficult to define an exhaustive, implementation-independent list of errors. A common solution is to report an "error chain" which is a sequence of increasingly-specific error codes, each of which is the "cause" of the more general error preceding it in the chain. At the end of the chain, implementation-specific error codes can be used. Simple applications can interpret the earlier, standardized, high-level errors. Commercial applications may have an incentive to interpret some of the implementation-specific ones - at least the ones they see often.
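An error chain can be represented as an ordered list, most general first. In the sketch below, MEDIA_ERR_NETWORK is the name of an existing MediaError code, while the other codes, the ChainedError type and the helper functions are hypothetical illustrations. A simple application looks only at the head of the chain or its standardized part; a commercial one can walk from the most specific end and stop at the first code it knows how to interpret.

```typescript
// One link in an error chain; each entry is the cause of the entry before it.
interface ChainedError {
  code: string;
  standardized: boolean; // false for implementation-specific codes at the tail
}

type ErrorChain = ChainedError[]; // most general error first

// What a simple application reads: the most general error.
function generalError(chain: ErrorChain): string | undefined {
  return chain.length > 0 ? chain[0].code : undefined;
}

// Only the standardized, implementation-independent part of the chain.
function standardCodes(chain: ErrorChain): string[] {
  return chain.filter((e) => e.standardized).map((e) => e.code);
}

// Walks from the most specific end and returns the first interpretation found.
function interpret(chain: ErrorChain, known: ReadonlyMap<string, string>): string | undefined {
  for (let i = chain.length - 1; i >= 0; i--) {
    const meaning = known.get(chain[i].code);
    if (meaning !== undefined) return meaning;
  }
  return undefined;
}

// Human-readable form, e.g. for logging.
function describe(chain: ErrorChain): string {
  return chain.map((e) => e.code).join(" caused by ");
}

// Hypothetical example chain.
const example: ErrorChain = [
  { code: "MEDIA_ERR_NETWORK", standardized: true },
  { code: "HTTP_STATUS_503", standardized: true },
  { code: "acme-cdn:origin-timeout", standardized: false },
];
```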