
Talk:Sanitization rules

From WHATWG Wiki
== Is the data URI scheme safe? ==


* Rob Sayre says no and refers to a wikipedia article; however, I cannot see anything in the [http://en.wikipedia.org/wiki/Data:_URI_scheme article] that indicates the scheme is not safe.
** Looking at that wikipedia page, <code>data</code> could only be added if it were followed by an asterisk, kinda like the 756* that I see popping up all over the place these days.  In particular, I don't see the use case which would justify the investment in sanitizing <code>text/html</code> encoded as a data URI.  Not that it would be difficult, just hard to justify.  Perhaps a section could be added which lists safe content types when included in data URIs. -- [[User:Rubys|Rubys]] 03:48, 9 August 2007 (UTC)
* Data URIs should be sanitizable on a per-MIME-type basis.  Until a vulnerability is found for the text/plain MIME type, data URIs of that type should be allowed, but other MIME types should not be allowed by default.  Other, safer types could then be allowed via a whitelist. -- [[User:Enricopulatzo|Enricopulatzo]] 16:49, 9 August 2007 (UTC)
** The word "default" puzzles me here.  The common use case here is small GIFs, JPEGs, and PNGs to be directly embedded in places like CSS and <img> tags.  If the associated MIME-types were to be white listed, under what condition would they '''not''' be allowed through? -- [[User:Rubys|Rubys]] 10:30, 10 August 2007 (UTC)
*** One would assume you'd get yourself an actual checker that understands the PNG, JPEG or GIF format, decode the base64 encoding, run it through, and if it were in a valid format, allow it through. This is not strictly necessary: the browser will do this too and fail to display an image if it's not in a valid format. &mdash; <span style="font-variant:small-caps;font-family:sans-serif;">[[User:Edward Z. Yang|Edward Z. Yang]]</span><sup style="font-family:serif;">([[User talk:Edward Z. Yang|Talk]])</sup> 18:57, 19 August 2007 (UTC)
** Whitelisting data url content-types seems like a good idea.  Whether to apply sanitization to the encoded content is up to the sanitizer.  White-listed content-types that may require additional sanitization could be flagged somehow.  [[User:JamesMSnell|JamesMSnell]]
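The per-type whitelist idea discussed in this thread could look roughly like the following. This is a minimal Python sketch, not a proposed rule: the function name, the exact type list, and the decode check are illustrative assumptions.

```python
import base64

# Hypothetical whitelist of types considered safe to embed directly,
# per the suggestions above: small images plus text/plain.
SAFE_DATA_MIME_TYPES = {"image/gif", "image/jpeg", "image/png", "text/plain"}

def is_safe_data_uri(uri):
    """Return True if a data: URI carries a whitelisted MIME type
    and its base64 payload (if any) actually decodes."""
    if not uri.startswith("data:"):
        return False
    header, sep, payload = uri[5:].partition(",")
    if not sep:
        return False          # malformed: no payload separator
    params = header.split(";")
    mime = params[0] or "text/plain"  # RFC 2397 default when type is omitted
    if mime.lower() not in SAFE_DATA_MIME_TYPES:
        return False
    if "base64" in params[1:]:
        try:
            base64.b64decode(payload, validate=True)
        except Exception:
            return False      # payload is not valid base64
    return True
```

Note this only checks the declared type and encoding; an "actual checker" in the sense above would additionally parse the decoded bytes as PNG/JPEG/GIF.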
 
== Regarding the CSS <code>url()</code> ==
 
As I understand the proposal, all <code>url()</code> properties are stripped or ignored.  Why is this important?  If it's to keep people from linking to malicious scripts only, then you've made it difficult for designers to link in background images.
 
Could we not dereference the URI to determine if it's safe (i.e., a valid image, not a script)?  "Safe" files are then stored on the server doing the sanitization, preventing users from swapping the innocent resource for a malicious one.
 
--[[User:Roberthahn|Roberthahn]] 12:55, 10 August 2007 (UTC)
 
: As far as I know, not even that's necessary. Most exploits involving <code>url()</code> involve some variant of <code>url("expression:alert('foo!');")</code>; simply pointing to a JavaScript file like <code>url("http://example.com/evil.js")</code> should not cause problems: the browser will download the file normally, figure out it's not a valid image, and not do anything else. There is, of course, the risk of external resource retrieval, but that's applicable to <code>img</code> tags too. &mdash; <span style="font-variant:small-caps;font-family:sans-serif;">[[User:Edward Z. Yang|Edward Z. Yang]]</span><sup style="font-family:serif;">([[User talk:Edward Z. Yang|Talk]])</sup> 18:54, 19 August 2007 (UTC)
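For illustration, a conservative filter along the lines described above might keep only plain http(s) targets inside <code>url()</code> and drop everything else. This is a hedged sketch under stated assumptions: the regex and the keep/drop policy are illustrative, and a production sanitizer would want a real CSS tokenizer rather than a regular expression.

```python
import re

# Matches url(...) with an optional quote; deliberately conservative.
URL_RE = re.compile(r'url\(\s*["\']?([^"\')]*)["\']?\s*\)', re.IGNORECASE)

def sanitize_css_urls(css):
    """Keep url() references only when they point at plain http(s) URLs;
    drop javascript:, expression:, data:, and (in this sketch) relative URLs."""
    def check(match):
        target = match.group(1).strip()
        if re.match(r'^https?://', target, re.IGNORECASE):
            return match.group(0)   # keep the original url(...) text
        return ""                   # strip anything else
    return URL_RE.sub(check, css)
```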
 
== Issues ==
 
Here are a number of issues I see in these rules, based on my experiences with HTML Purifier:
 
* Form elements are listed as acceptable. In certain contexts they are, but they can also be used for the dark side: phishing and such. For example, the [http://it.slashdot.org/article.pl?sid=06/11/21/2319243&from=rss Mozilla Firefox Password Manager Bug] will automatically populate a form with login credentials even if they are hidden and the form points to another website. To be on the safe side, I would argue they are dangerous.
* Attributes really need to be paired up with their elements. <code>&lt;b summary="" readonly ismap&gt;</code> is harmless enough, but semantically it makes no sense and it will cause the page to stop validating.
* Attribute values need to be validated. Example: <code>id="m*##$ASD83@"</code>
* CSS property values need to be paired up with the appropriate properties. <code>overflow:aqua;</code>
 
If we take sanitization in the strictest sense of the word (to remove objectionable features), only my first point is valid. However, if you want standards-compliant output, all of these bases need to be covered. &mdash; <span style="font-variant:small-caps;font-family:sans-serif;">[[User:Edward Z. Yang|Edward Z. Yang]]</span><sup style="font-family:serif;">([[User talk:Edward Z. Yang|Talk]])</sup> 15:20, 19 August 2007 (UTC)
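The attribute-value point above can be made concrete with a small validator. HTML 4 restricts <code>id</code> and <code>name</code> tokens to a letter followed by letters, digits, hyphens, underscores, colons, and periods, so a value like <code>m*##$ASD83@</code> fails. A minimal sketch (the function name is illustrative):

```python
import re

# HTML 4 ID/NAME tokens: must begin with a letter ([A-Za-z]), then any
# number of letters, digits, hyphens, underscores, colons, or periods.
HTML4_ID_RE = re.compile(r'^[A-Za-z][A-Za-z0-9\-_:.]*$')

def valid_id(value):
    """Return True if value is a well-formed HTML 4 id/name token."""
    return bool(HTML4_ID_RE.match(value))
```

A sanitizer aiming for valid output would drop or rewrite any <code>id</code> attribute that fails this check.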
 
* CSS rules: should negative lengths be allowed? For example: <code>margin-top: -100px;</code>
--[[User:Mrball|Mrball]] 08:29, 23 August 2007 (UTC)
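A per-property value check that still admits negative lengths, as suggested above, might look like the following. The property table here is deliberately tiny and illustrative; a real sanitizer would enumerate the full allowed property set.

```python
import re

# Illustrative per-property value patterns. Pairing values with their
# properties is what catches nonsense like "overflow: aqua".
CSS_VALUE_PATTERNS = {
    # Negative lengths are legitimate for margins.
    "margin-top": re.compile(r'^-?\d+(\.\d+)?(px|em|ex|pt|%)$'),
    # overflow only accepts a small keyword set.
    "overflow": re.compile(r'^(visible|hidden|scroll|auto)$'),
}

def css_declaration_ok(prop, value):
    """Return True if the value matches the pattern for its property."""
    pattern = CSS_VALUE_PATTERNS.get(prop.strip().lower())
    return bool(pattern and pattern.match(value.strip()))
```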

Latest revision as of 08:29, 23 August 2007
