Sanitization rules
Latest revision as of 16:07, 27 January 2015
This document is obsolete.
This page was initially seeded with the sanitization lists and rules implemented by the html5lib sanitizer, which in turn was based on Jacques Distler's branch of Instiki, which in turn was based on the sanitization logic in the Universal Feed Parser.
It is hoped that others will add, update, and extend this list based on their experiences in their own products, and furthermore that some will update their products based on these lists. One such product is HTMLPurifier (diffs). Another is Bloglines.
As a suggestion but not as a requirement: people who do update their products to reflect information from this list are encouraged to add a link to this page as a comment in the hopes that it will encourage subsequent maintainers to keep this page up to date.
As a convenience, this script (source) converts these lists into a syntax shared by a number of common programming languages.
Acceptable Elements
- a
- abbr
- acronym
- address
- area
- b
- bdo
- big
- blockquote
- br
- button
- caption
- center
- cite
- code
- col
- colgroup
- dd
- del
- dfn
- dir
- div
- dl
- dt
- em
- fieldset
- font
- form
- h1
- h2
- h3
- h4
- h5
- h6
- hr
- i
- img
- input
- ins
- kbd
- label
- legend
- li
- map
- menu
- ol
- optgroup
- option
- p
- pre
- q
- s
- samp
- select
- small
- span
- strike
- strong
- sub
- sup
- table
- tbody
- td
- textarea
- tfoot
- th
- thead
- tr
- tt
- u
- ul
- var
- wbr
mathml Elements
- maction
- math
- merror
- mfrac
- mi
- mmultiscripts
- mn
- mo
- mover
- mpadded
- mphantom
- mprescripts
- mroot
- mrow
- mspace
- msqrt
- mstyle
- msub
- msubsup
- msup
- mtable
- mtd
- mtext
- mtr
- munder
- munderover
- none
svg Elements
- a
- animate
- animateColor
- animateMotion
- animateTransform
- circle
- defs
- desc
- ellipse
- font-face
- font-face-name
- font-face-src
- g
- glyph
- hkern
- image
- linearGradient
- line
- marker
- metadata
- missing-glyph
- mpath
- path
- polygon
- polyline
- radialGradient
- rect
- set
- stop
- svg
- switch
- text
- title
- tspan
- use
Acceptable Attributes
- abbr
- accept
- accept-charset
- accesskey
- action
- align
- alt
- axis
- border
- cellpadding
- cellspacing
- char
- charoff
- charset
- checked
- cite
- class
- clear
- cols
- colspan
- color
- compact
- coords
- datetime
- dir
- disabled
- enctype
- for
- frame
- headers
- height
- href
- hreflang
- hspace
- id
- ismap
- label
- lang
- longdesc
- maxlength
- media
- method
- multiple
- name
- nohref
- noshade
- nowrap
- prompt
- readonly
- rel
- rev
- rows
- rowspan
- rules
- scope
- selected
- shape
- size
- span
- src
- start
- style
- summary
- tabindex
- target
- title
- type
- usemap
- valign
- value
- vspace
- width
- xml:lang
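As a rough illustration of how the element and attribute lists above might be applied, here is a minimal Python sketch. The sets are abbreviated subsets of the lists on this page, and `filter_tag` is a hypothetical helper, not the API of html5lib or any other library. It assumes the caller has already parsed the markup into a tag name and an attribute mapping.

```python
# Abbreviated subsets of the lists above, for illustration only.
ACCEPTABLE_ELEMENTS = {"a", "b", "blockquote", "em", "img", "p", "strong"}
ACCEPTABLE_ATTRIBUTES = {"alt", "height", "href", "src", "title", "width"}


def filter_tag(name, attrs):
    """Return (name, kept_attrs) for an allowed element, or None otherwise.

    Hypothetical helper: names are lowercased because HTML element and
    attribute names are case-insensitive. SVG names such as viewBox are
    case-sensitive and would need a separate, case-preserving lookup.
    """
    name = name.lower()
    if name not in ACCEPTABLE_ELEMENTS:
        return None
    kept = {k.lower(): v for k, v in attrs.items()
            if k.lower() in ACCEPTABLE_ATTRIBUTES}
    return name, kept
```

Passing the allowlist is only the first step: attributes whose value is a URI, and the `style` attribute, still need the value checks described later on this page.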
mathml Attributes
- actiontype
- align
- columnalign
- columnlines
- columnspacing
- columnspan
- depth
- display
- displaystyle
- equalcolumns
- equalrows
- fence
- fontstyle
- fontweight
- frame
- height
- linethickness
- lspace
- mathbackground
- mathcolor
- mathvariant
- maxsize
- minsize
- other
- rowalign
- rowlines
- rowspacing
- rowspan
- rspace
- scriptlevel
- selection
- separator
- stretchy
- width
- xlink:href
- xlink:show
- xlink:type
- xmlns
- xmlns:xlink
svg Attributes
- accent-height
- accumulate
- additive
- alphabetic
- arabic-form
- ascent
- attributeName
- attributeType
- baseProfile
- bbox
- begin
- by
- calcMode
- cap-height
- class
- color
- color-rendering
- content
- cx
- cy
- d
- dx
- dy
- descent
- display
- dur
- end
- fill
- fill-rule
- font-family
- font-size
- font-stretch
- font-style
- font-variant
- font-weight
- from
- fx
- fy
- g1
- g2
- glyph-name
- gradientUnits
- hanging
- height
- horiz-adv-x
- horiz-origin-x
- id
- ideographic
- k
- keyPoints
- keySplines
- keyTimes
- lang
- marker-end
- marker-mid
- marker-start
- markerHeight
- markerUnits
- markerWidth
- mathematical
- max
- min
- name
- offset
- opacity
- orient
- origin
- overline-position
- overline-thickness
- panose-1
- path
- pathLength
- points
- preserveAspectRatio
- r
- refX
- refY
- repeatCount
- repeatDur
- requiredExtensions
- requiredFeatures
- restart
- rotate
- rx
- ry
- slope
- stemh
- stemv
- stop-color
- stop-opacity
- strikethrough-position
- strikethrough-thickness
- stroke
- stroke-dasharray
- stroke-dashoffset
- stroke-linecap
- stroke-linejoin
- stroke-miterlimit
- stroke-opacity
- stroke-width
- systemLanguage
- target
- text-anchor
- to
- transform
- type
- u1
- u2
- underline-position
- underline-thickness
- unicode
- unicode-range
- units-per-em
- values
- version
- viewBox
- visibility
- width
- widths
- x
- x-height
- x1
- x2
- xlink:actuate
- xlink:arcrole
- xlink:href
- xlink:role
- xlink:show
- xlink:title
- xlink:type
- xml:base
- xml:lang
- xml:space
- xmlns
- xmlns:xlink
- y
- y1
- y2
- zoomAndPan
CSS Rules
First, URLs matching the following regular expression are removed:
url\s*\(\s*[^\s)]+?\s*\)\s*
Style strings that do not match the following regular expressions are then deemed obfuscated and ignored entirely:
^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$
^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$
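The two steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: it assumes a style string must pass both expressions to be kept, and that each removed URL is replaced by a single space. `precheck_style` is a hypothetical name.

```python
import re

# The three expressions quoted above, compiled unchanged.
URL_RE = re.compile(r"url\s*\(\s*[^\s)]+?\s*\)\s*")
GAUNTLET_1 = re.compile(
    r"""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$""")
GAUNTLET_2 = re.compile(r"^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$")


def precheck_style(style):
    """Strip url(...) values, then reject strings that look obfuscated.

    Returns the URL-free style string, or "" if either check fails
    (assumption: both expressions must match).
    """
    style = URL_RE.sub(" ", style)
    if not GAUNTLET_1.match(style) or not GAUNTLET_2.match(style):
        return ""
    return style
```

For example, `precheck_style("width: expression(alert(1))")` returns an empty string, because the parenthesised part is not a list of digits, commas, and whitespace. A string that survives this pre-check still has to be filtered property by property against the lists that follow.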
style Properties
- azimuth
- background, background-*
- border, border-*
- clear
- color
- cursor
- direction
- display
- elevation
- float
- font
- font-family
- font-size
- font-style
- font-variant
- font-weight
- height
- letter-spacing
- line-height
- margin, margin-*
- overflow
- padding, padding-*
- pause
- pause-after
- pause-before
- pitch
- pitch-range
- richness
- speak
- speak-header
- speak-numeral
- speak-punctuation
- speech-rate
- stress
- text-align
- text-decoration
- text-indent
- unicode-bidi
- vertical-align
- voice-family
- volume
- white-space
- width
style Property Values
- auto
- aqua
- black
- block
- blue
- bold
- both
- bottom
- brown
- center
- collapse
- dashed
- dotted
- fuchsia
- gray
- green
- !important
- italic
- left
- lime
- maroon
- medium
- none
- navy
- normal
- nowrap
- olive
- pointer
- purple
- red
- right
- solid
- silver
- teal
- top
- transparent
- underline
- white
- yellow
In addition, values that match the following regular expression are valid:
^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$
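One way to combine the keyword list with this expression is sketched below in Python. It assumes a property value is split on whitespace and that every token must either be a listed keyword or match the expression. The page itself does not spell out that procedure, so treat it as an assumption. `KEYWORDS` is an abbreviated subset and `value_is_acceptable` is a hypothetical helper.

```python
import re

# Abbreviated subset of the "style Property Values" list above.
KEYWORDS = {"auto", "bold", "dashed", "dotted", "none", "red", "solid",
            "transparent", "!important"}

# The expression quoted above, compiled unchanged (note: lowercase hex only).
VALUE_RE = re.compile(
    r"^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?"
    r"|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$")


def value_is_acceptable(value):
    """True if every whitespace-separated token is a keyword or matches."""
    return all(tok in KEYWORDS or VALUE_RE.match(tok) is not None
               for tok in value.split())
```

Because the expression is matched case-sensitively here, an uppercase colour such as `#FF0000` is rejected. A caller that wants to accept it would lowercase the value first.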
svg style Properties
- fill
- fill-opacity
- fill-rule
- stroke
- stroke-width
- stroke-linecap
- stroke-linejoin
- stroke-opacity
URIs
Attributes whose value is a URI
- href
- src
- cite
- action
- longdesc
- xlink:href
- xml:base
URI schemes
- afs
- aim
- callto
- data (see "Safe data URL content types" below)
- ed2k
- feed
- ftp
- gopher
- http
- https
- irc
- mailto
- news
- nntp
- rsync
- rtsp
- sftp
- ssh
- tag
- tel
- telnet
- urn
- webcal
- wtai
- xmpp
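A scheme check against this list could look like the following Python sketch. The assumptions are that whitespace and control characters are stripped before the scheme is read (so that a value such as `java\tscript:` cannot slip through) and that a value without any scheme is treated as a relative reference and allowed. `ACCEPTABLE_SCHEMES` is an abbreviated subset and `uri_is_acceptable` is a hypothetical helper.

```python
import re

# Abbreviated subset of the scheme list above.
ACCEPTABLE_SCHEMES = {"ftp", "http", "https", "mailto", "tel", "xmpp"}

# Control characters, spaces, and other whitespace that could be used to
# disguise a scheme.
_STRIP_RE = re.compile(r"[\x00-\x20\x7f-\xa0\s]+")
# A URI scheme: a letter followed by letters, digits, "+", "." or "-".
_SCHEME_RE = re.compile(r"^([a-z][a-z0-9+.\-]*):")


def uri_is_acceptable(value):
    """True for relative references and for URIs with an allowed scheme."""
    cleaned = _STRIP_RE.sub("", value).lower()
    match = _SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme: assumed to be a relative reference
    return match.group(1) in ACCEPTABLE_SCHEMES
```

A check of this kind would apply to each of the attributes listed under "Attributes whose value is a URI". A `data` URI would additionally need its content type checked.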
Safe data URL content types
Note: This section is being discussed.
- text/plain
- image/gif
- image/jpeg
- image/png
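Since this list is still under discussion, the following Python sketch is only one possible reading of it: a `data` URL is accepted when the media type before the first `;` or `,` is on the list. Per RFC 2397, an omitted media type defaults to `text/plain`. `data_url_is_acceptable` is a hypothetical helper.

```python
SAFE_DATA_TYPES = {"text/plain", "image/gif", "image/jpeg", "image/png"}


def data_url_is_acceptable(url):
    """True if `url` is a data URL whose media type is on the safe list."""
    if not url.lower().startswith("data:"):
        return False
    header, comma, _payload = url[len("data:"):].partition(",")
    if not comma:
        return False  # malformed: a data URL needs a comma before the payload
    # Drop parameters such as ";charset=..." or ";base64".
    media_type = header.split(";", 1)[0].strip().lower()
    if not media_type:
        media_type = "text/plain"  # RFC 2397 default
    return media_type in SAFE_DATA_TYPES
```

The sketch looks only at the declared media type and does not verify that the payload actually is an image of that type.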