SRT research

Latest revision as of 12:19, 26 January 2011

Implementations

Authoring tools

  • SubRip (Windows only)
  • Subtitle Editor (For UNIX/GTK+2/GStreamer)
    Does not appear to support writing styling tags or anything other than the basic times and text.

Test cases

  • ...

Interpreters

http://ale5000.altervista.org/subtitles.htm tests many quirky SRT features in many popular players.

For each interpreter, please link to a page with videos showing how the test files are rendered, or describe the results.

VLC

  • 1-001 - PASS (basic one-cue)
  • 1-002 - PASS (basic one-cue two-line)
  • 1-003 - PASS (basic two-cue)
  • 1-004 - no apparent X1: support; positioning information ignored
  • 1-005 - PASS non-numeric IDs supported
  • 1-006 - PASS IDs optional
  • 1-007 - PASS (control)
  • 1-008 - PASS out-of-order IDs ignored
  • 1-009 - FAIL non-chronological titles strangely skipped
  • 1-010 - FAIL non-chronological titles strangely skipped
  • 1-011 - PASS simultaneous titles supported
  • 1-012 - <i>, <b>, <u> supported; <font> parsed but has no effect
  • 1-013 - inline formatting supported in a somewhat buggy fashion; end tags implied; unknown tags ignored only if matched; known tags auto-close
  • 1-014 - PASS decimal separator supported
  • 1-015 - PASS unknown settings ignored
  • 1-016 - Leading WEBSRT header ignored
  • 1-018 - blank line necessary between cues
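Several of the results above (IDs optional, non-numeric IDs accepted, unknown settings ignored, blank line required between cues) are consistent with a block-based reader. The following is only a rough Python sketch of such a reader, not VLC's actual code; the names `TIMING` and `split_cues` are made up for illustration, and accepting both "," and "." as the decimal separator is an assumption.

```python
import re

# A timing line such as "00:00:01,000 --> 00:00:02,500".
# Anything after the end time (unknown settings) is tolerated and ignored.
TIMING = re.compile(
    r"^\s*(\d+:\d+:\d+[,.]\d+)\s*-->\s*(\d+:\d+:\d+[,.]\d+).*$"
)

def split_cues(text):
    """Split SRT text into (identifier, start, end, text_lines) tuples."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n(?:[ \t]*\n)+", normalized):
        lines = block.split("\n")
        # The timing line is either the first line (no ID) or the second
        # line (any ID, numeric or not). Other blocks are skipped.
        for i in range(min(2, len(lines))):
            match = TIMING.match(lines[i])
            if match:
                ident = lines[0].strip() if i == 1 else None
                cues.append((ident, match.group(1), match.group(2), lines[i + 1:]))
                break
    return cues
```

With this approach, two cues that are not separated by a blank line end up as a single cue whose text contains the second cue's ID and timing line.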

Totem Movie Player

Totem is the media player Ubuntu ships by default. It is based on GStreamer, so other software using GStreamer should have similar results.

Totem does not seem to support SRT files with fewer than 3 cues, and sometimes asks whether to install the application/x-subrip plugin, which it then cannot find.

  • 1-004 - no support for X1.
  • 1-005 - no support for non-numeric IDs
  • 1-006 - no support for missing IDs
  • 1-007 - PASS
  • 1-008 - PASS - out of order IDs ignored
  • 1-009 - FAIL - only ---- and ---4 are shown
  • 1-010 - FAIL - only ---- and ---4 are shown
  • 1-011 - FAIL - only the "P" is displayed
  • 1-012 - only italics and underline work (with non-default font settings bold also works)
  • 1-013 - end tags implied at end of cue; a number following < causes the < to be emitted; <b <i> gives <b followed by text in italics; unknown tags ignored
  • 1-014 - PASS - as far as I can tell
  • 1-015 - PASS
  • 1-016 - not recognized as SRT

MPlayer

  • 1-001 - PASS
  • 1-002 - PASS
  • 1-003 - PASS
  • 1-004 - FAIL: X/Y is ignored
  • 1-005 - PASS
  • 1-006 - PASS
  • 1-007 - PASS
  • 1-008 - PASS
  • 1-009 - PASS
  • 1-010 - PASS
  • 1-011 - FAIL: only "P" appears
  • 1-012 - FAIL: all text is unstyled. displayed text, in order, is: "Formatting test", "italics", "bold", "underline", "font", "font size=3", "font color=#00FF00", 'font color="#00FF00"', "font color=green", 'font color="green"'
  • 1-013 - FAIL: all text is unstyled, all tags are stripped. needs more analysis?
  • 1-014 - PASS: probably; exact timing only verified "by eye"
  • 1-015 - PASS: Lines 1..5 appear with no garbage and timing seems correct
  • 1-016 - PASS: leading WEBSRT is ignored
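The 1-012 and 1-013 results above (tags stripped, text shown unstyled) could be reproduced by something as simple as the following Python sketch. This is an illustration under assumptions, not MPlayer's actual implementation; `HTML_TAG` and `strip_tags` are hypothetical names, and the exact definition of what counts as a tag is a guess.

```python
import re

# Assumed definition of a tag: "<", optional "/", a letter, then anything
# up to the next ">" that does not contain another "<" or ">".
HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")

def strip_tags(text):
    """Remove HTML-like tags and keep the text between them."""
    return HTML_TAG.sub("", text)
```

For example, `strip_tags('<font color="#00FF00">font color="#00FF00"</font>')` leaves only the inner text, which matches the unstyled strings listed for 1-012.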

Test cases

  1. http://www.hixie.ch/tests/adhoc/srt/

Existing content

To avoid sample biases, where possible, this should be based on automated searches.

OpenSubtitles

http://blog.foolip.org/2010/08/20/srt-research/

Analysis of 10000 files provided by OpenSubtitles. Notable results:

  • Only 6.66% were valid UTF-8
  • 17.07% had overlapping cues
  • Only 0.38% had something other than whitespace trailing the timings
  • 55.25% used some kind of markup, the most common being <i>, <b>, <font ...> and <u>
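The linked post describes the real analysis. Purely as an illustration of what checks of this kind might look like, here is a hedged Python sketch. The function names, the markup pattern, and the representation of cues as (start_ms, end_ms) pairs are all assumptions, not taken from the original scripts.

```python
import re

# Assumed pattern for the four most common tags named above.
MARKUP = re.compile(r"</?(?:i|b|u|font)\b[^<>]*>", re.IGNORECASE)

def is_valid_utf8(data: bytes) -> bool:
    """True if the raw file bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def has_markup(text: str) -> bool:
    """True if the text contains <i>, <b>, <u> or <font ...> tags."""
    return MARKUP.search(text) is not None

def has_overlap(intervals) -> bool:
    """True if any two (start_ms, end_ms) cue intervals overlap.

    After sorting by start time it is enough to compare neighbours:
    if any pair overlaps, some adjacent pair overlaps too.
    """
    ordered = sorted(intervals)
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))
```

Cues that merely touch (one ends exactly when the next starts) are not counted as overlapping here; whether the original analysis made the same choice is not stated.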

Random notes in #whatwg

http://krijnhoetmer.nl/irc-logs/whatwg/20100630#l-104

# [01:18] <zcorpan_> Hixie: the first random srt i download has <i> wrapping multiple lines  
# [01:18] <zcorpan_> (well, the second random. the first didn't have any markup)  
# [01:19] <zcorpan_> http://www.opensubtitles.org/en/download/sub/3695049  
# [01:19] <zcorpan_> i have no idea which encoding that one is using  
# [01:21] <Hixie> zcorpan_: ah, excellent, good to know, thanks  
# [01:26] <zcorpan_> it seems existing srts use different legacy encodings and don't declare it :(  

# [01:43] * zcorpan_ finds an srt with Traducerea ∫i adaptarea: <font color=#ff99cc>Kprice</font>  
# [01:43] * zcorpan_ <font color=#ffffff>Subtitr„ri-Noi Team</font>  

# [19:27] * zcorpan_ finds an srt with <b><font color="#00afad">Join us on Facebook !  
# [19:27] * zcorpan_ Squadra Dell'Ombra</font></b>  
# [19:27] <zcorpan_> Hixie: ^  
# [19:33] <zcorpan_> Hixie: http://www.tvsubtitles.net/subtitle-132566.html has <i>Et maintenant "Les transcroyables  
# [19:33] <zcorpan_> exploits de Zapp Brannigan"  
# [19:33] <zcorpan_> (i.e. unclosed <i>  
# [19:33] <zcorpan_> )  
# [19:34] <Hixie> zcorpan_: i'm intentionally not going to be supporting attributes at this point  
# [19:34] <Hixie> zcorpan_: and will support unclosed <i>s  
# [19:36] <zcorpan_> Hixie: it was just the first srt i found with <b>  
# [19:36] <Hixie> ah  
# [19:36] <zcorpan_> {\pos(192,240)}On a des photos.  
# [19:36] <zcorpan_> http://www.tvsubtitles.net/subtitle-132551.html  
# [19:37] <zcorpan_> <i>{\a6}TY,  
# [19:37] <zcorpan_> L'ASSISTANT DU CORONER</i>  
# [19:39] <Hixie> wow, i wonder what UA supports that  
# [19:46] <zcorpan_> VLC doesn't support it  

# [20:00] <zcorpan_> MPlayer seems to support {\pos(192,240)}  
# [20:01] <zcorpan_> and ignores {\a6}, or replaces it with a space or something  
# [20:10] <zcorpan_> Hixie: all subtitles from tvsubtitles.net seem to have 9999  
# [20:10] <zcorpan_> 00:00:0,500 --> 00:00:2,00  
# [20:10] <zcorpan_> <font color="#ffff00" size=14>www.tvsubtitles.net</font>  
# [20:10] <zcorpan_> at the *end* of the file  
# [20:16] <Hixie> yeah, i think the parser will likely support out-of-order cues  
# [20:17] <Hixie> (in fact it already does)  
# [20:18] <Hixie> it doesn't support times with only one digit for the seconds or two digits for the thousandths, though  
# [20:18] <Hixie> do UAs support that?  
# [20:18] <Hixie> it's trivial for me to add support if necessary  
# [20:20] <zcorpan_> VLC supports single-digit seconds  
# [20:21] <zcorpan_> MPlayer too  
# [20:25] <zcorpan_> heh, vlc interprets 00:00:00,5000 as if it were 00:00:05,000  
# [20:29] <zcorpan_> mplayer also interprets 00:00:00,5000 as if it were 00:00:05,000  
# [20:33] <zcorpan_> ok vlc interprets 00:00:01,99 as 00:00:01,099  
# [20:36] <zcorpan_> vlc seems to just overlap without changing position
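
The timestamp observations at the end of the log (00:00:00,5000 treated as 00:00:05,000, and 00:00:01,99 treated as 00:00:01,099) would both follow if the digits after the comma are read as an integer number of milliseconds rather than as a decimal fraction. That is only a guess at the cause, not something confirmed from the VLC or MPlayer source. A minimal Python sketch of such a lenient parser, with the hypothetical names `TIMESTAMP` and `parse_timestamp_ms`:

```python
import re

# Any number of digits is accepted in each field, so single-digit seconds
# ("00:00:0,500") and short fractions ("00:00:2,00") are both parsed.
# Accepting "." as well as "," is an assumption.
TIMESTAMP = re.compile(r"^(\d+):(\d+):(\d+)[,.](\d+)$")

def parse_timestamp_ms(stamp):
    """Return the time in milliseconds, reading the last field as an integer."""
    match = TIMESTAMP.match(stamp.strip())
    if match is None:
        raise ValueError("not a timestamp: %r" % stamp)
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    # The last field is added as a millisecond count, not scaled as a fraction,
    # so "5000" means 5000 ms and "99" means 99 ms.
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
```

Under this reading, `parse_timestamp_ms("00:00:00,5000")` and `parse_timestamp_ms("00:00:05,000")` give the same value, as do `"00:00:01,99"` and `"00:00:01,099"`.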