CDATA Escapes: Difference between revisions

From WHATWG Wiki
(Added Philip's image)
(→‎Proposal: Note that it is a FAIL)
Revision as of 12:39, 12 August 2009

Requirements

Hard Requirements

  • It must be possible to have the string "</script>" in a string literal in inline JavaScript without having to use JS-level escapes. (This possibility may be limited to scripts that use the <!-- ... --> "Hide from old browsers" pattern.)
  • It must be possible to have "<!--" and "-->" in string literals in inline JavaScript without having to use JS-level escapes.
  • The parser must not rewind and reparse with different rules.
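
For context, the "JS-level escapes" that the first two requirements aim to make unnecessary can be sketched as follows. This is only an illustration, not part of the proposal: the helper name `js_string_literal` is hypothetical, and the sketch assumes that the backslash sequences `\/`, `\!` and `\>` inside a JavaScript string literal evaluate to the plain characters.

```python
import json


def js_string_literal(value):
    """Return a JavaScript string literal for value that contains neither
    "</", "<!--" nor "-->", so it can sit in an inline script.
    (Hypothetical helper, for illustration only.)"""
    # json.dumps yields a double-quoted literal that is also valid JavaScript.
    literal = json.dumps(value)
    # "<\/" is the same JavaScript string as "</".
    literal = literal.replace("</", "<\\/")
    # "<\!--" and "--\>" are the same JavaScript strings as "<!--" and "-->".
    literal = literal.replace("<!--", "<\\!--")
    literal = literal.replace("-->", "--\\>")
    return literal


print(js_string_literal("</script>"))
```

An author who cannot or does not apply escapes of this kind is the case the hard requirements are written for.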

Medium Requirements

  • It should be possible to have the string <!-- in xmp without having the rest of the page eaten up into the xmp element.
  • It should be possible to have <!-- near the start of a script or style element without having a matching --> and still the trailing part of the page shouldn't get eaten up into the script or style element.
  • Pages authored naively for HTML5-parsing-enabled UAs shouldn't be XSS risks in legacy UAs.
  • When the author uses comment-like syntax in the fallback markup in iframe, noembed or noframes, the comment-like syntax should span the same character run that it would if it were parsed as markup.

Nice to Have Requirements

  • It would be nice for the rest of the page not to get eaten up when the author omits </title> accidentally or mistypes it as <title>.

Proposal

  • Remove <!-- ... --> escapes from title, textarea and xmp.
  • Make the closing condition for <!-- ... --> in iframe, noembed and noframes match the comment closing conditions exactly.
  • Remove <!-- ... --> escapes from script and style and introduce a novel string literal detector heuristic.

The proposal below is a failure: according to Philip, lone quotes appear in the wild inside regexp literals on the same line as the closing </script>.

String Literal Detector Heuristic

CDATA

  • < → TAG_OPEN_NON_PCDATA with CDATA as return state
  • / → CDATA_SLASH
  • " → CDATA_DOUBLE_QUOTED
  • ' → CDATA_SINGLE_QUOTED
  • Anything else → Stay

CDATA_SLASH

  • < → TAG_OPEN_NON_PCDATA with CDATA as return state
  • / → CDATA_LINE_COMMENT
  • * → CDATA_COMMENT
  • Anything else → CDATA

CDATA_DOUBLE_QUOTED

  • " or Line feed → CDATA
  • \ → CDATA_DOUBLE_QUOTED_BACKSLASH
  • Anything else → Stay

CDATA_SINGLE_QUOTED

  • ' or Line feed → CDATA
  • \ → CDATA_SINGLE_QUOTED_BACKSLASH
  • Anything else → Stay

CDATA_DOUBLE_QUOTED_BACKSLASH

  • Line feed → CDATA
  • Anything else → CDATA_DOUBLE_QUOTED

CDATA_SINGLE_QUOTED_BACKSLASH

  • Line feed → CDATA
  • Anything else → CDATA_SINGLE_QUOTED

CDATA_LINE_COMMENT

  • Line feed → CDATA
  • < → TAG_OPEN_NON_PCDATA with CDATA_LINE_COMMENT as return state
  • Anything else → Stay

CDATA_COMMENT

  • * → CDATA_COMMENT_ASTERISK
  • < → TAG_OPEN_NON_PCDATA with CDATA_COMMENT as return state
  • Anything else → Stay

CDATA_COMMENT_ASTERISK

  • / → CDATA
  • < → TAG_OPEN_NON_PCDATA with CDATA_COMMENT as return state
  • Anything else → CDATA_COMMENT
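
The transitions above can be tried out with a small table-driven sketch. Two things here are assumptions rather than part of this page: TAG_OPEN_NON_PCDATA is not defined here, so the sketch treats it as "end the element if the `<` starts `</script` followed by whitespace, `/` or `>`, otherwise go to the return state", and every transition is assumed to consume its character. The function name `find_script_end` is made up for the illustration.

```python
import re

# Assumed stand-in for TAG_OPEN_NON_PCDATA: "<" ends the element only if it
# starts "</script" followed by whitespace, "/" or ">".
END_TAG = re.compile(r"</script[\s/>]", re.IGNORECASE)
TAG_OPEN = "TAG_OPEN_NON_PCDATA"
LF = "\n"
ELSE = None  # key for the "Anything else" row

# state -> {input character: next state}, copied from the lists above
TRANSITIONS = {
    "CDATA": {"<": TAG_OPEN, "/": "CDATA_SLASH", '"': "CDATA_DOUBLE_QUOTED",
              "'": "CDATA_SINGLE_QUOTED", ELSE: "CDATA"},
    "CDATA_SLASH": {"<": TAG_OPEN, "/": "CDATA_LINE_COMMENT",
                    "*": "CDATA_COMMENT", ELSE: "CDATA"},
    "CDATA_DOUBLE_QUOTED": {'"': "CDATA", LF: "CDATA",
                            "\\": "CDATA_DOUBLE_QUOTED_BACKSLASH",
                            ELSE: "CDATA_DOUBLE_QUOTED"},
    "CDATA_SINGLE_QUOTED": {"'": "CDATA", LF: "CDATA",
                            "\\": "CDATA_SINGLE_QUOTED_BACKSLASH",
                            ELSE: "CDATA_SINGLE_QUOTED"},
    "CDATA_DOUBLE_QUOTED_BACKSLASH": {LF: "CDATA",
                                      ELSE: "CDATA_DOUBLE_QUOTED"},
    "CDATA_SINGLE_QUOTED_BACKSLASH": {LF: "CDATA",
                                      ELSE: "CDATA_SINGLE_QUOTED"},
    "CDATA_LINE_COMMENT": {LF: "CDATA", "<": TAG_OPEN,
                           ELSE: "CDATA_LINE_COMMENT"},
    "CDATA_COMMENT": {"*": "CDATA_COMMENT_ASTERISK", "<": TAG_OPEN,
                      ELSE: "CDATA_COMMENT"},
    "CDATA_COMMENT_ASTERISK": {"/": "CDATA", "<": TAG_OPEN,
                               ELSE: "CDATA_COMMENT"},
}

# Return state used when "<" turns out not to start the end tag.
RETURN_STATE = {
    "CDATA": "CDATA",
    "CDATA_SLASH": "CDATA",
    "CDATA_LINE_COMMENT": "CDATA_LINE_COMMENT",
    "CDATA_COMMENT": "CDATA_COMMENT",
    "CDATA_COMMENT_ASTERISK": "CDATA_COMMENT",
}


def find_script_end(text):
    """Return the index of the "<" that ends the script content, or -1."""
    state = "CDATA"
    for i, ch in enumerate(text):
        row = TRANSITIONS[state]
        nxt = row.get(ch, row[ELSE])
        if nxt == TAG_OPEN:
            if END_TAG.match(text, i):
                return i
            nxt = RETURN_STATE[state]
        state = nxt
    return -1
```

Under these assumptions, `find_script_end('var s = "</script>";\n</script>')` skips the `</script>` inside the string literal and reports the one on the second line, while a `</script>` inside a `/* ... */` or `//` comment still ends the element, because the comment states do route `<` to TAG_OPEN_NON_PCDATA.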

As Image

The transitions as an image.

Heuristic Design Notes

The heuristic doesn't attempt to detect single or double quotes appearing inside a regular expression literal, because detecting regular expression literals properly is complicated and it's unlikely that </script> would appear in a string literal on the same line as such a regular expression. (I'm assuming that inline minified JS produced by a minifier that can't escape </script> is a non-issue. This assumption was very bad and makes the whole thing fail.)
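
The failure mode can be reproduced with a deliberately reduced sketch that only tracks quotes and line feeds, which is the part of the heuristic involved here. Comment and backslash handling are left out, the end-tag check is an assumed stand-in for TAG_OPEN_NON_PCDATA, and the function name and sample script are invented for the illustration.

```python
import re

# Assumed end-tag check: "</script" followed by whitespace, "/" or ">".
END_TAG = re.compile(r"</script[\s/>]", re.IGNORECASE)


def find_script_end_quotes_only(text):
    """Quote tracking only; comments and backslash escapes are ignored."""
    quote = None  # the open quote character, or None outside a presumed string
    for i, ch in enumerate(text):
        if quote is not None:
            # A matching quote or a line feed ends the presumed string literal.
            if ch == quote or ch == "\n":
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "<" and END_TAG.match(text, i):
            return i
    return -1


# A lone quote inside a regexp literal, on the same line as the end tag.
same_line = "s=s.replace(/[^']+/g,\"\");</script>"
# The same script with a line feed before the end tag.
next_line = "s=s.replace(/[^']+/g,\"\");\n</script>"

print(find_script_end_quotes_only(same_line))
print(find_script_end_quotes_only(next_line))
```

In the first input the `'` inside `/[^']+/` is taken for the start of a string literal, no matching quote or line feed follows, and the `</script>` on that line is never seen. In the second input the line feed resets the state, so the end tag is found. Minified inline scripts tend to look like the first case.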