Talk:Sanitization rules

2007-08-23T08:29:57Z

Mrball:

== Is the data URI scheme safe? ==

* Rob Sayre says no and refers to a Wikipedia article; however, I cannot see anything in the [http://en.wikipedia.org/wiki/Data:_URI_scheme article] that indicates the scheme is not safe.
** Looking at that wikipedia page, <code>data</code> could only be added if it were followed by an asterisk, kinda like the 756* that I see popping up all over the place these days. In particular, I don't see the use case which would justify the investment in sanitizing <code>text/html</code> encoded as a data URI. Not that it would be difficult, just hard to justify. Perhaps a section could be added which lists safe content types when included in data URIs. -- [[User:Rubys|Rubys]] 03:48, 9 August 2007 (UTC)
* Data URIs should be sanitizable on a per-MIME-type basis. Until a vulnerability is found for the text/plain MIME type, text/plain data URIs should be allowed, but other MIME types should not be allowed by default. Other, safer types could then be allowed via whitelist. -- [[User:Enricopulatzo|Enricopulatzo]] 16:49, 9 August 2007 (UTC)
** The word "default" puzzles me here. The common use case here is small GIFs, JPEGs, and PNGs to be directly embedded in places like CSS and <img> tags. If the associated MIME-types were to be white listed, under what condition would they '''not''' be allowed through? -- [[User:Rubys|Rubys]] 10:30, 10 August 2007 (UTC)
*** One would assume you'd get yourself an actual checker that understands the PNG, JPEG or GIF format, decode the base64 data, run it through the checker, and if it were in a valid format, allow it through. This is not strictly necessary: the browser will do this too and fail to display an image if it's not in a valid format. — [[User:Edward Z. Yang|Edward Z. Yang]]([[User talk:Edward Z. Yang|Talk]]) 18:57, 19 August 2007 (UTC)
** Whitelisting data url content-types seems like a good idea. Whether to apply sanitization to the encoded content is up to the sanitizer. White-listed content-types that may require additional sanitization could be flagged somehow. [[User:JamesMSnell|JamesMSnell]]
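
: A minimal sketch of the per-MIME-type whitelisting discussed above, in Python. The names <code>ALLOWED_TYPES</code>, <code>MAGIC</code> and <code>is_allowed_data_uri</code> are made up for illustration, and the allow-list contains only the types mentioned in this thread. The signature check is an assumption standing in for the "actual checker" suggested above: it only looks at the leading bytes and does not validate the whole image.

```python
import base64
import binascii
from urllib.parse import unquote_to_bytes

# Hypothetical allow-list: only the types named in this discussion.
ALLOWED_TYPES = {"image/gif", "image/jpeg", "image/png", "text/plain"}

# Leading signature bytes for the image types above (a weak format check).
MAGIC = {
    "image/gif": (b"GIF87a", b"GIF89a"),
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
}


def is_allowed_data_uri(uri: str) -> bool:
    """Return True if uri is a data: URI whose MIME type is allow-listed."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "data":
        return False
    header, comma, payload = rest.partition(",")
    if not comma:
        return False
    params = header.split(";")
    # RFC 2397: an omitted media type defaults to text/plain.
    mime = params[0].strip().lower() or "text/plain"
    if mime not in ALLOWED_TYPES:
        return False
    is_base64 = any(p.strip().lower() == "base64" for p in params[1:])
    try:
        data = unquote_to_bytes(payload)
        if is_base64:
            data = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return False
    signatures = MAGIC.get(mime)
    if signatures is None:
        return True  # no signature known for this type (text/plain)
    return data.startswith(signatures)
```

: Anything not on the list, including <code>text/html</code>, is rejected outright, so whether to sanitize the decoded content only comes up for types someone deliberately adds.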

== Regarding the CSS <code>url()</code> ==

As I understand the proposal, all <code>url()</code> properties are stripped or ignored. Why is this important? If it's to keep people from linking to malicious scripts only, then you've made it difficult for designers to link in background images.

Could we not dereference the URI to determine whether it's safe (i.e., a valid image, not a script)? "Safe" files would then be stored on the server doing the sanitization, preventing users from swapping the innocent resource for a malicious one.

--[[User:Roberthahn|Roberthahn]] 12:55, 10 August 2007 (UTC)

: As far as I know, not even that's necessary. Most exploits involving <code>url()</code> involve some variant of <code>url("expression:alert('foo!');")</code>; simply pointing to a JavaScript file like <code>url("http://example.com/evil.js")</code> should not cause problems: the browser will download the file normally, figure out it's not a valid image, and not do anything else. There is, of course, the risk of external resource retrieval, but that applies to <code>img</code> tags too. — [[User:Edward Z. Yang|Edward Z. Yang]]([[User talk:Edward Z. Yang|Talk]]) 18:54, 19 August 2007 (UTC)
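
: If that reasoning holds, a sanitizer could keep <code>url()</code> and only check the scheme of its target. The following Python sketch illustrates the idea; <code>ALLOWED_SCHEMES</code> and <code>css_value_is_safe</code> are hypothetical names, and the choice to reject relative URLs, CSS escapes and comments outright (instead of parsing them) is an assumption made to keep the example short.

```python
import re
from urllib.parse import urlsplit

# Assumption: only absolute http(s) targets are accepted.
ALLOWED_SCHEMES = {"http", "https"}

# Matches url(...), with an optional pair of matching quotes.
URL_RE = re.compile(r"url\(\s*(['\"]?)(.*?)\1\s*\)", re.IGNORECASE)


def css_value_is_safe(value: str) -> bool:
    """Return True if every url() in a CSS value points at an allowed scheme."""
    lowered = value.lower()
    # Escapes and comments can hide keywords from a regex, so reject them.
    if "\\" in value or "/*" in value:
        return False
    if "expression(" in lowered:
        return False
    matches = list(URL_RE.finditer(value))
    # An unterminated url( would slip past the regex; treat it as unsafe.
    if lowered.count("url(") != len(matches):
        return False
    for match in matches:
        try:
            scheme = urlsplit(match.group(2).strip()).scheme.lower()
        except ValueError:
            return False
        if scheme not in ALLOWED_SCHEMES:
            return False
    return True
```

: With this check <code>url("http://example.com/evil.js")</code> passes, as argued above, while the <code>expression:</code> variant is rejected. It does nothing about external resource retrieval.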

== Issues ==

Here are a number of issues I see in these rules, based on my experiences with HTML Purifier:

* Form elements are listed as acceptable. Under certain contexts, they are, but they can also be used for the dark side: phishing and such. For example, the [http://it.slashdot.org/article.pl?sid=06/11/21/2319243&from=rss Mozilla Firefox Password Manager Bug] will automatically populate a form with login credentials even if the fields are hidden and the form points to another website. To be on the safe side, I would argue they are dangerous.
* Attributes really need to be paired up with their elements. <code></code> is harmless enough, but semantically it makes no sense and it will cause the page to stop validating.
* Attribute values need to be validated. Example: <code>id="m*##$ASD83@"</code>
* CSS property values need to be paired up with the appropriate properties. <code>overflow:aqua;</code>

If we take sanitization in the strictest sense of the word (removing objectionable features), only my first point is valid. However, if you want standards-compliant output, all of these bases need to be covered. — [[User:Edward Z. Yang|Edward Z. Yang]]([[User talk:Edward Z. Yang|Talk]]) 15:20, 19 August 2007 (UTC)
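
: The last three points can be sketched as lookup tables, here in Python. The tables are deliberately tiny and hypothetical (they are not taken from the rules page or from HTML Purifier), and only <code>id</code> values are validated, using the HTML 4 rule that an ID starts with a letter and continues with letters, digits, hyphens, underscores, colons or periods.

```python
import re

# Hypothetical per-element attribute tables, for illustration only.
ALLOWED_ATTRS = {
    "a": {"href", "id", "title"},
    "img": {"src", "alt", "id"},
    "p": {"id"},
}

# HTML 4 ID token: a letter, then letters, digits, "-", "_", ":" or ".".
ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*\Z")

# CSS 2.1 keyword values for two sample properties.
CSS_VALUES = {
    "overflow": {"visible", "hidden", "scroll", "auto", "inherit"},
    "float": {"left", "right", "none", "inherit"},
}


def attr_allowed(element: str, attr: str, value: str) -> bool:
    """Allow an attribute only on elements it is paired with."""
    if attr not in ALLOWED_ATTRS.get(element, set()):
        return False
    if attr == "id":
        return ID_RE.match(value) is not None
    return True  # other attribute values are not validated in this sketch


def css_declaration_allowed(prop: str, value: str) -> bool:
    """Allow a keyword value only for the property it belongs to."""
    return value.strip().lower() in CSS_VALUES.get(prop.strip().lower(), set())
```

: Under these tables <code>id="m*##$ASD83@"</code> and <code>overflow:aqua;</code> are both dropped, while <code>overflow:hidden;</code> passes.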

* CSS rules: should negative lengths be allowed? For example: <code>margin-top: -100px;</code>
--[[User:Mrball|Mrball]] 08:29, 23 August 2007 (UTC)
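
: For what it's worth, CSS 2.1 permits negative values for the margin properties but not for padding, width or height, so one option is to decide per property. A rough Python sketch under that assumption; <code>NEGATIVE_OK</code> and <code>length_allowed</code> are hypothetical names, and percentages and a leading <code>+</code> sign are left out for brevity.

```python
import re

# Optional minus sign, a number, and an optional CSS 2.1 length unit.
LENGTH_RE = re.compile(r"(-?)(\d+(?:\.\d+)?|\.\d+)(px|em|ex|pt|pc|in|cm|mm)?\Z")

# Properties for which CSS 2.1 permits negative lengths (margins only here).
NEGATIVE_OK = {"margin-top", "margin-right", "margin-bottom", "margin-left"}


def length_allowed(prop: str, value: str) -> bool:
    """Return True if value is a length that prop may legally take."""
    match = LENGTH_RE.match(value.strip().lower())
    if match is None:
        return False
    sign, number, unit = match.groups()
    # Only zero may be written without a unit.
    if unit is None and float(number) != 0:
        return False
    # A negative length is accepted only where the property permits it.
    if sign and prop.strip().lower() not in NEGATIVE_OK:
        return False
    return True
```

: So <code>margin-top: -100px;</code> would be kept and <code>padding-top: -100px;</code> dropped. Whether large negative margins should be capped, since they can move content over other parts of the page, is a separate policy question.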
