Sanitization rules

From WHATWG Wiki

Revision as of 09:23, 11 August 2007

This page was initially seeded with the sanitization lists and rules implemented by the html5lib sanitizer, which in turn was based on Jacques Distler's branch of Instiki, which in turn was based on the sanitization logic in the Universal Feed Parser.

It is hoped that others will add to, update, and extend these lists based on their experience with their own products, and furthermore that some will update their products based on these lists. One such product is HTMLPurifier (diffs).

As a suggestion, not a requirement: people who update their products to reflect information from this list are encouraged to add a comment linking to this page, in the hope that subsequent maintainers will then keep this page up to date.

Acceptable Elements

  • a
  • abbr
  • acronym
  • address
  • area
  • b
  • bdo
  • big
  • blockquote
  • br
  • button
  • caption
  • center
  • cite
  • code
  • col
  • colgroup
  • dd
  • del
  • dfn
  • dir
  • div
  • dl
  • dt
  • em
  • fieldset
  • font
  • form
  • h1
  • h2
  • h3
  • h4
  • h5
  • h6
  • hr
  • i
  • img
  • input
  • ins
  • kbd
  • label
  • legend
  • li
  • map
  • menu
  • ol
  • optgroup
  • option
  • p
  • pre
  • q
  • s
  • samp
  • select
  • small
  • span
  • strike
  • strong
  • sub
  • sup
  • table
  • tbody
  • td
  • textarea
  • tfoot
  • th
  • thead
  • tr
  • tt
  • u
  • ul
  • var
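A minimal sketch of applying the element list, using Python's standard-library html.parser. It assumes a policy of dropping disallowed tags while keeping their text (re-escaped); escaping the disallowed tag or discarding its contents are equally valid policies. Only a subset of the list is included, all attributes are discarded for brevity, and the names ElementFilter and filter_elements are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical subset of the "Acceptable Elements" list above.
ACCEPTABLE_ELEMENTS = {
    "a", "abbr", "b", "blockquote", "br", "code", "div", "em", "i",
    "li", "ol", "p", "pre", "span", "strong", "ul",
}

class ElementFilter(HTMLParser):
    """Re-serializes HTML, keeping only whitelisted elements (attributes dropped)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag in ACCEPTABLE_ELEMENTS:
            self.out.append(f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit a single tag, no end tag.
        if tag in ACCEPTABLE_ELEMENTS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ACCEPTABLE_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept, but re-escaped so it cannot form new markup.
        self.out.append(escape(data, quote=False))

    # Comments, declarations and processing instructions fall through to
    # HTMLParser's default no-op handlers and are therefore dropped.

def filter_elements(fragment: str) -> str:
    parser = ElementFilter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

With this policy the text inside a disallowed element such as script survives as harmless escaped text; a sanitizer that prefers to discard it would need to track which element it is currently inside.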

MathML Elements

  • maction
  • math
  • merror
  • mfrac
  • mi
  • mmultiscripts
  • mn
  • mo
  • mover
  • mpadded
  • mphantom
  • mprescripts
  • mroot
  • mrow
  • mspace
  • msqrt
  • mstyle
  • msub
  • msubsup
  • msup
  • mtable
  • mtd
  • mtext
  • mtr
  • munder
  • munderover
  • none

SVG Elements

  • a
  • animate
  • animateColor
  • animateMotion
  • animateTransform
  • circle
  • defs
  • desc
  • ellipse
  • font-face
  • font-face-name
  • font-face-src
  • g
  • glyph
  • hkern
  • image
  • linearGradient
  • line
  • marker
  • metadata
  • missing-glyph
  • mpath
  • path
  • polygon
  • polyline
  • radialGradient
  • rect
  • set
  • stop
  • svg
  • switch
  • text
  • title
  • tspan
  • use
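SVG element names are case-sensitive in XML, and several entries above are mixed-case (animateColor, linearGradient, radialGradient), while HTML tokenizers such as Python's html.parser report tag names in lower case. One way to reconcile the two, sketched below under the assumption that names arrive lower-cased from such a tokenizer, is to look them up through a lower-case index and emit the canonical spelling. The helper canonical_svg_element is hypothetical.

```python
from typing import Optional

# The "svg Elements" list above, in its original (case-sensitive) spelling.
SVG_ELEMENTS = [
    "a", "animate", "animateColor", "animateMotion", "animateTransform",
    "circle", "defs", "desc", "ellipse", "font-face", "font-face-name",
    "font-face-src", "g", "glyph", "hkern", "image", "linearGradient", "line",
    "marker", "metadata", "missing-glyph", "mpath", "path", "polygon",
    "polyline", "radialGradient", "rect", "set", "stop", "svg", "switch",
    "text", "title", "tspan", "use",
]

# Index keyed by the lower-cased form an HTML tokenizer would report.
SVG_BY_LOWER = {name.lower(): name for name in SVG_ELEMENTS}

def canonical_svg_element(name: str) -> Optional[str]:
    """Return the correctly cased SVG element name, or None if it is not allowed."""
    return SVG_BY_LOWER.get(name.lower())
```

For example, canonical_svg_element("lineargradient") returns "linearGradient", while "foreignobject", which is not on the list, returns None.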

Acceptable Attributes

  • abbr
  • accept
  • accept-charset
  • accesskey
  • action
  • align
  • alt
  • axis
  • border
  • cellpadding
  • cellspacing
  • char
  • charoff
  • charset
  • checked
  • cite
  • class
  • clear
  • cols
  • colspan
  • color
  • compact
  • coords
  • datetime
  • dir
  • disabled
  • enctype
  • for
  • frame
  • headers
  • height
  • href
  • hreflang
  • hspace
  • id
  • ismap
  • label
  • lang
  • longdesc
  • maxlength
  • media
  • method
  • multiple
  • name
  • nohref
  • noshade
  • nowrap
  • prompt
  • readonly
  • rel
  • rev
  • rows
  • rowspan
  • rules
  • scope
  • selected
  • shape
  • size
  • span
  • src
  • start
  • style
  • summary
  • tabindex
  • target
  • title
  • type
  • usemap
  • valign
  • value
  • vspace
  • width
  • xml:lang
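A sketch of attribute filtering on the (name, value) pairs that html.parser passes to handle_starttag, where a bare attribute such as checked arrives with a value of None and is kept as-is. Only a subset of the list is shown, and filter_attributes is a hypothetical name. Passing this filter is necessary but not sufficient: the values of style and of URI-valued attributes still need the checks described under CSS Rules and URIs.

```python
from typing import List, Optional, Tuple

# Hypothetical subset of the "Acceptable Attributes" list above.
ACCEPTABLE_ATTRIBUTES = {
    "abbr", "alt", "checked", "class", "colspan", "dir", "height", "href",
    "id", "lang", "rowspan", "src", "style", "title", "width", "xml:lang",
}

Attr = Tuple[str, Optional[str]]

def filter_attributes(attrs: List[Attr]) -> List[Attr]:
    """Keep whitelisted (name, value) pairs in their original order."""
    kept = []
    for name, value in attrs:
        # html.parser already lower-cases names; lowering again is defensive.
        lname = name.lower()
        if lname in ACCEPTABLE_ATTRIBUTES:
            kept.append((lname, value))  # value is None for bare attributes
    return kept
```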

MathML Attributes

  • actiontype
  • align
  • columnalign
  • columnlines
  • columnspacing
  • columnspan
  • depth
  • display
  • displaystyle
  • equalcolumns
  • equalrows
  • fence
  • fontstyle
  • fontweight
  • frame
  • height
  • linethickness
  • lspace
  • mathbackground
  • mathcolor
  • mathvariant
  • maxsize
  • minsize
  • other
  • rowalign
  • rowlines
  • rowspacing
  • rowspan
  • rspace
  • scriptlevel
  • selection
  • separator
  • stretchy
  • width
  • xlink:href
  • xlink:show
  • xlink:type
  • xmlns
  • xmlns:xlink

SVG Attributes

  • accent-height
  • accumulate
  • additive
  • alphabetic
  • arabic-form
  • ascent
  • attributeName
  • attributeType
  • baseProfile
  • bbox
  • begin
  • by
  • calcMode
  • cap-height
  • class
  • color
  • color-rendering
  • content
  • cx
  • cy
  • d
  • dx
  • dy
  • descent
  • display
  • dur
  • end
  • fill
  • fill-rule
  • font-family
  • font-size
  • font-stretch
  • font-style
  • font-variant
  • font-weight
  • from
  • fx
  • fy
  • g1
  • g2
  • glyph-name
  • gradientUnits
  • hanging
  • height
  • horiz-adv-x
  • horiz-origin-x
  • id
  • ideographic
  • k
  • keyPoints
  • keySplines
  • keyTimes
  • lang
  • marker-end
  • marker-mid
  • marker-start
  • markerHeight
  • markerUnits
  • markerWidth
  • mathematical
  • max
  • min
  • name
  • offset
  • opacity
  • orient
  • origin
  • overline-position
  • overline-thickness
  • panose-1
  • path
  • pathLength
  • points
  • preserveAspectRatio
  • r
  • refX
  • refY
  • repeatCount
  • repeatDur
  • requiredExtensions
  • requiredFeatures
  • restart
  • rotate
  • rx
  • ry
  • slope
  • stemh
  • stemv
  • stop-color
  • stop-opacity
  • strikethrough-position
  • strikethrough-thickness
  • stroke
  • stroke-dasharray
  • stroke-dashoffset
  • stroke-linecap
  • stroke-linejoin
  • stroke-miterlimit
  • stroke-opacity
  • stroke-width
  • systemLanguage
  • target
  • text-anchor
  • to
  • transform
  • type
  • u1
  • u2
  • underline-position
  • underline-thickness
  • unicode
  • unicode-range
  • units-per-em
  • values
  • version
  • viewBox
  • visibility
  • width
  • widths
  • x
  • x-height
  • x1
  • x2
  • xlink:actuate
  • xlink:arcrole
  • xlink:href
  • xlink:role
  • xlink:show
  • xlink:title
  • xlink:type
  • xml:base
  • xml:lang
  • xml:space
  • xmlns
  • xmlns:xlink
  • y
  • y1
  • y2
  • zoomAndPan

CSS Rules

First, URLs matching the following regular expression are removed:

url\s*\(\s*[^\s)]+?\s*\)\s*
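In Python the removal step might look like the following sketch, which compiles the regular expression above unchanged; strip_urls is a hypothetical name. The sketch replaces each match, including the whitespace after it, with a single space so that neighboring tokens are not glued together.

```python
import re

# The regular expression from the text: a url(...) token plus trailing whitespace.
URL_RE = re.compile(r"url\s*\(\s*[^\s)]+?\s*\)\s*")

def strip_urls(style: str) -> str:
    # Replace each url(...) with one space so surrounding tokens stay separated.
    return URL_RE.sub(" ", style)
```

A url() whose argument contains spaces, such as url('a b'), is not matched by this expression, but it is then rejected by the obfuscation check, since a bare opening parenthesis followed by anything other than digits, commas and whitespace falls outside that check's first expression.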

Next, a style string is deemed obfuscated, and ignored entirely, unless it matches both of the following regular expressions:

^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$
^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$
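A sketch of the obfuscation check, compiling both expressions above verbatim; is_obfuscated is a hypothetical name. The first expression limits which characters and tokens may appear (letters, digits, a few punctuation marks, hyphens only between word characters, simple quoted strings, and parenthesized lists of numbers); the second requires the string to be a sequence of property: value declarations.

```python
import re

# Both expressions from the text; a style string must match each one in full.
CHARSET_RE = re.compile(r"""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$""")
DECLS_RE = re.compile(r"^(\s*[-\w]+\s*:\s*[^:;]*(;|$))*$")

def is_obfuscated(style: str) -> bool:
    """True if the style string fails either expression and should be ignored."""
    return not (CHARSET_RE.match(style) and DECLS_RE.match(style))
```

For example, "color: red; x: y: z" passes the character check but fails the declaration check because of the extra colon, and "width: expression(alert(1))" fails the character check at its first parenthesis.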

style Properties

  • azimuth
  • background, background-*
  • border, border-*
  • clear
  • color
  • cursor
  • direction
  • display
  • elevation
  • float
  • font
  • font-family
  • font-size
  • font-style
  • font-variant
  • font-weight
  • height
  • letter-spacing
  • line-height
  • margin, margin-*
  • overflow
  • padding, padding-*
  • pause
  • pause-after
  • pause-before
  • pitch
  • pitch-range
  • richness
  • speak
  • speak-header
  • speak-numeral
  • speak-punctuation
  • speech-rate
  • stress
  • text-align
  • text-decoration
  • text-indent
  • unicode-bidi
  • vertical-align
  • voice-family
  • volume
  • white-space
  • width
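A sketch of the property-name check. Entries written as "x, x-*" above are read here as allowing the shorthand x plus any property that starts with "x-"; the set is only a subset of the list, and is_allowed_property is a hypothetical name.

```python
# Hypothetical subset of the "style Properties" list above.
ALLOWED_PROPERTIES = {
    "azimuth", "clear", "color", "cursor", "direction", "display", "float",
    "font", "font-family", "font-size", "font-style", "font-variant",
    "font-weight", "height", "letter-spacing", "line-height", "text-align",
    "text-decoration", "text-indent", "vertical-align", "white-space", "width",
}

# Entries of the form "x, x-*": the shorthand x plus every "x-..." property.
WILDCARD_FAMILIES = ("background", "border", "margin", "padding")

def is_allowed_property(name: str) -> bool:
    name = name.strip().lower()
    if name in ALLOWED_PROPERTIES:
        return True
    for family in WILDCARD_FAMILIES:
        # Accept "border" and "border-top-color", but not "borderx" or "border-".
        if name == family or (name.startswith(family + "-") and len(name) > len(family) + 1):
            return True
    return False
```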

style Property Values

  • auto
  • aqua
  • black
  • block
  • blue
  • bold
  • both
  • bottom
  • brown
  • center
  • collapse
  • dashed
  • dotted
  • fuchsia
  • gray
  • green
  • !important
  • italic
  • left
  • lime
  • maroon
  • medium
  • none
  • navy
  • normal
  • nowrap
  • olive
  • pointer
  • purple
  • red
  • right
  • solid
  • silver
  • teal
  • top
  • transparent
  • underline
  • white
  • yellow

In addition, values that match the following regular expression are valid:

^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$
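A sketch of the value check, assuming each whitespace-separated token of a value must either be one of the keywords above or match the regular expression; the keyword set is a subset and is_allowed_value is a hypothetical name.

```python
import re

# Hypothetical subset of the "style Property Values" keyword list above.
KEYWORDS = {
    "auto", "black", "bold", "both", "center", "dashed", "dotted", "left",
    "medium", "none", "normal", "red", "right", "solid", "transparent",
    "white", "!important",
}

# The regular expression from the text, compiled unchanged.
VALUE_RE = re.compile(
    r"^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$"
)

def is_allowed_value(value: str) -> bool:
    """Every whitespace-separated token must be a keyword or match VALUE_RE."""
    return all(tok in KEYWORDS or VALUE_RE.match(tok) for tok in value.split())
```

As written, the expression's hex branch accepts only lower-case digits, so #ff0000 passes while #FF0000 is rejected; lower-casing each token before the check is one way to accept both.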

SVG style Properties

  • fill
  • fill-opacity
  • fill-rule
  • stroke
  • stroke-width
  • stroke-linecap
  • stroke-linejoin
  • stroke-opacity

URIs

Attributes whose value is a URI

  • href
  • src
  • cite
  • action
  • longdesc
  • xlink:href
  • xml:base

URI protocols

  • afs
  • aim
  • callto
  • data (see Safe data URL content types, below)
  • ed2k
  • feed
  • ftp
  • gopher
  • http
  • https
  • irc
  • mailto
  • news
  • nntp
  • rsync
  • rtsp
  • sftp
  • ssh
  • tag
  • tel
  • telnet
  • urn
  • webcal
  • wtai
  • xmpp
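A sketch of the URI check: for an attribute in the first list, extract the scheme and require it to be in the second list, treating scheme-less (relative) references as allowed. It assumes the value has already been entity-decoded by the HTML parser. Before looking at the scheme it strips leading and trailing C0 controls and spaces and removes tabs and newlines, which is what the WHATWG URL Standard has browsers do, so a value like java&#9;script: cannot slip past. The helper names and the scheme pattern (RFC 3986 scheme syntax) are assumptions of this sketch.

```python
import re
from typing import Optional

URI_ATTRIBUTES = {"href", "src", "cite", "action", "longdesc", "xlink:href", "xml:base"}

ALLOWED_SCHEMES = {
    "afs", "aim", "callto", "data", "ed2k", "feed", "ftp", "gopher", "http",
    "https", "irc", "mailto", "news", "nntp", "rsync", "rtsp", "sftp", "ssh",
    "tag", "tel", "telnet", "urn", "webcal", "wtai", "xmpp",
}

# RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".".
SCHEME_RE = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")
_C0_AND_SPACE = "".join(chr(c) for c in range(0x21))

def uri_scheme(value: str) -> Optional[str]:
    """Lower-cased scheme of a URI, or None for a relative reference."""
    v = value.strip(_C0_AND_SPACE)       # leading/trailing controls and spaces
    v = re.sub(r"[\t\n\r]", "", v)       # tabs and newlines anywhere
    m = SCHEME_RE.match(v)
    return m.group(1).lower() if m else None

def is_safe_uri_attribute(name: str, value: str) -> bool:
    if name.lower() not in URI_ATTRIBUTES:
        return True  # not URI-valued; nothing to check here
    scheme = uri_scheme(value)
    return scheme is None or scheme in ALLOWED_SCHEMES
```

A data: URI passes this check on its scheme alone; its content type still needs the separate check described under Safe data URL content types.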

Safe data URL content types

Note: This section is being discussed.

  • text/plain
  • image/gif
  • image/jpeg
  • image/png
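A sketch of the content-type check for data: URLs, assuming the syntax of RFC 2397 (data:[<mediatype>][;base64],<data>, where an omitted media type defaults to text/plain). Since this list is still under discussion, the set below simply mirrors it, with JPEG spelled image/jpeg, the registered MIME type. The function names are hypothetical, and the input is assumed to be already normalized as in the URI check.

```python
from typing import Optional

SAFE_DATA_TYPES = {"text/plain", "image/gif", "image/jpeg", "image/png"}

def data_url_type(url: str) -> Optional[str]:
    """Lower-cased media type of a data: URL, or None if it is not one."""
    if url[:5].lower() != "data:":
        return None
    header, sep, _ = url[5:].partition(",")
    if not sep:
        return None  # malformed: no comma between header and data
    mime = header.split(";", 1)[0].strip().lower()
    # RFC 2397: an omitted media type defaults to text/plain.
    return mime or "text/plain"

def is_safe_data_url(url: str) -> bool:
    return data_url_type(url) in SAFE_DATA_TYPES
```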