CDATA Escapes
Revision as of 12:27, 12 August 2009
Requirements
Hard Requirements
- It must be possible to have the string "</script>" in a string literal in inline JavaScript without having to use JS-level escapes. (This possibility may be limited to scripts that use the <!-- ... --> "Hide from old browsers" pattern.)
- It must be possible to have "<!--" and "-->" in string literals in inline JavaScript without having to use JS-level escapes.
- The parser must not rewind and reparse with different rules.
Medium Requirements
- It should be possible to have the string <!-- in xmp without having the rest of the page eaten up into the xmp element.
- It should be possible to have <!-- near the start of a script or style element without a matching -->, and the trailing part of the page still shouldn't get eaten up into the script or style element.
- Pages authored naively for HTML5-parsing-enabled UAs shouldn't be XSS risks in legacy UAs.
- When the author uses comment-like syntax in the fallback markup in iframe, noembed or noframes, the comment-like syntax should span the same character run that it would if it were parsed as markup.
Nice to Have Requirements
- It would be nice for the rest of the page not to get eaten up when the author omits </title> accidentally or mistypes it as <title>.
Proposal
- Remove <!-- ... --> escapes from title, textarea and xmp.
- Make the closing condition for <!-- ... --> in iframe, noembed and noframes match the comment closing conditions exactly.
- Remove <!-- ... --> escapes from script and style and introduce a novel string literal detector heuristic.
String Literal Detector Heuristic
CDATA
- < → TAG_OPEN_NON_PCDATA with CDATA as return state
- / → CDATA_SLASH
- " → CDATA_DOUBLE_QUOTED
- ' → CDATA_SINGLE_QUOTED
- Anything else → Stay
CDATA_SLASH
- < → TAG_OPEN_NON_PCDATA with CDATA as return state
- / → CDATA_LINE_COMMENT
- * → CDATA_COMMENT
- Anything else → CDATA
CDATA_DOUBLE_QUOTED
- " or Line feed → CDATA
- \ → CDATA_DOUBLE_QUOTED_BACKSLASH
- Anything else → Stay
CDATA_SINGLE_QUOTED
- ' or Line feed → CDATA
- \ → CDATA_SINGLE_QUOTED_BACKSLASH
- Anything else → Stay
CDATA_DOUBLE_QUOTED_BACKSLASH
- Line feed → CDATA
- Anything else → CDATA_DOUBLE_QUOTED
CDATA_SINGLE_QUOTED_BACKSLASH
- Line feed → CDATA
- Anything else → CDATA_SINGLE_QUOTED
CDATA_LINE_COMMENT
- Line feed → CDATA
- < → TAG_OPEN_NON_PCDATA with CDATA_LINE_COMMENT as return state
- Anything else → Stay
CDATA_COMMENT
- * → CDATA_COMMENT_ASTERISK
- < → TAG_OPEN_NON_PCDATA with CDATA_COMMENT as return state
- Anything else → Stay
CDATA_COMMENT_ASTERISK
- / → CDATA
- < → TAG_OPEN_NON_PCDATA with CDATA_COMMENT as return state
- Anything else → CDATA_COMMENT
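The transitions above can be sketched as a table-driven state machine. This is only an illustration, not part of the proposal: the names TABLE, ELSE and tag_open_positions are hypothetical, and because TAG_OPEN_NON_PCDATA is not defined on this page, the sketch assumes it consumes nothing further and hands control straight back to the return state. The function reports the index of every < that the heuristic would pass to TAG_OPEN_NON_PCDATA, that is, every < that could start an end tag.

```python
TAG_OPEN = "TAG_OPEN_NON_PCDATA"
ELSE = None  # stands for the "Anything else" row of each state

# state -> {input character: next state}
# A tuple (TAG_OPEN, return_state) marks a hand-off to TAG_OPEN_NON_PCDATA.
TABLE = {
    "CDATA": {
        "<": (TAG_OPEN, "CDATA"),
        "/": "CDATA_SLASH",
        '"': "CDATA_DOUBLE_QUOTED",
        "'": "CDATA_SINGLE_QUOTED",
        ELSE: "CDATA",
    },
    "CDATA_SLASH": {
        "<": (TAG_OPEN, "CDATA"),
        "/": "CDATA_LINE_COMMENT",
        "*": "CDATA_COMMENT",
        ELSE: "CDATA",
    },
    "CDATA_DOUBLE_QUOTED": {
        '"': "CDATA",
        "\n": "CDATA",
        "\\": "CDATA_DOUBLE_QUOTED_BACKSLASH",
        ELSE: "CDATA_DOUBLE_QUOTED",
    },
    "CDATA_SINGLE_QUOTED": {
        "'": "CDATA",
        "\n": "CDATA",
        "\\": "CDATA_SINGLE_QUOTED_BACKSLASH",
        ELSE: "CDATA_SINGLE_QUOTED",
    },
    "CDATA_DOUBLE_QUOTED_BACKSLASH": {"\n": "CDATA", ELSE: "CDATA_DOUBLE_QUOTED"},
    "CDATA_SINGLE_QUOTED_BACKSLASH": {"\n": "CDATA", ELSE: "CDATA_SINGLE_QUOTED"},
    "CDATA_LINE_COMMENT": {
        "\n": "CDATA",
        "<": (TAG_OPEN, "CDATA_LINE_COMMENT"),
        ELSE: "CDATA_LINE_COMMENT",
    },
    "CDATA_COMMENT": {
        "*": "CDATA_COMMENT_ASTERISK",
        "<": (TAG_OPEN, "CDATA_COMMENT"),
        ELSE: "CDATA_COMMENT",
    },
    "CDATA_COMMENT_ASTERISK": {
        "/": "CDATA",
        "<": (TAG_OPEN, "CDATA_COMMENT"),
        ELSE: "CDATA_COMMENT",
    },
}


def tag_open_positions(text):
    """Return the indexes of every '<' handed to TAG_OPEN_NON_PCDATA."""
    state = "CDATA"
    positions = []
    for index, ch in enumerate(text):
        row = TABLE[state]
        target = row.get(ch, row[ELSE])
        if isinstance(target, tuple):
            # Assumption: TAG_OPEN_NON_PCDATA returns immediately.
            positions.append(index)
            state = target[1]
        else:
            state = target
    return positions


print(tag_open_positions('var s = "</script>";'))  # []
print(tag_open_positions("x = 1;</script>"))       # [6]
```

In the first call the < sits inside a double-quoted string, so it is never considered as the start of an end tag; in the second it is read in the CDATA state and is handed to TAG_OPEN_NON_PCDATA. A single forward pass with no backtracking is what the "must not rewind and reparse" requirement asks for.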
As Image
The transitions as an image: http://philip.html5.org/misc/cdata.png
Heuristic Design Notes
The heuristic doesn't attempt to detect single or double quotes appearing inside a regular expression literal, because detecting regular expression literals properly is complicated and it's unlikely that </script> would appear in a string literal on the same line as such a regular expression. (I'm assuming that inline minified JS produced by a minifier that can't escape </script> is a non-issue.)
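The trade-off can be illustrated with a reduced sketch of the quote tracking only. The function name quoted_flags is hypothetical, and the sketch assumes a single line with no line feed and no comments, so only the CDATA, CDATA_SLASH and quoted states are modelled. For each character it records whether the heuristic is in a quoted state when it reads that character.

```python
def quoted_flags(line):
    """For each character, True if the heuristic reads it in a quoted state.

    Assumption: `line` contains no line feed and no // or /* comments.
    """
    flags = []
    quote = None         # the open quote character, or None outside strings
    after_slash = False  # True while in CDATA_SLASH
    escaped = False      # True while in a ..._BACKSLASH state
    for ch in line:
        flags.append(quote is not None)
        if quote is None:
            if after_slash:
                # CDATA_SLASH: with no comments in the input, every
                # character is "Anything else" and returns to CDATA.
                after_slash = False
            elif ch == "/":
                after_slash = True
            elif ch in "\"'":
                quote = ch
        elif escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == quote:
            quote = None
    return flags


plain = 't = "</script>";'
with_regex = 'if (/["\']/.test(s)) t = "</script>";'

print(quoted_flags(plain)[plain.index("<")])            # True
print(quoted_flags(with_regex)[with_regex.index("<")])  # False
```

In the second line the double quote inside the regular expression literal opens a string as far as the heuristic is concerned, the real opening quote of the string literal then closes it, and the < is read outside any quoted state. This is the unlikely same-line combination that the design accepts.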