Difference between revisions of "New Vocabularies Solution"

From WHATWG Wiki
(+spec)
 
(30 intermediate revisions by 2 users not shown)
Line 1: Line 1:
'''this doesn't handle these cases well:'''
+
{{obsolete|spec=[http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#tree-construction HTML Standard: Tree construction] and [http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenization HTML Standard: Tokenization]}}
<pre>
 
<table><math></html><!-- x --> (</html> should be ignored)
 
<table><td><math></html><!-- x --> (ditto)
 
<table><caption><math></html><!-- x --> (ditto)
 
<table><caption><math><mtext><caption>x (<caption> should start new caption)
 
<em><button><math></em>x (</em> should be ignored)
 
</pre>
 
'''to fix these we'll need a secondary insertion mode which is set whenever the insertion mode is set to "in math/svg", including in the reset algorithm (just jump down to the secondary steps and continue the walk up until you hit either the root, or a caption, td/th, or table)'''
 
 
 
'''what about:'''
 
<pre>
 
<table><caption><math><mrow><mtext><svg><circle></mrow>
 
<table><caption><math><mtext><svg><circle><caption>
 
</pre>
 
'''check that those are handled right. maybe we need a stack for the insertion mode which is "{svg|math}*, {cell|caption|table|body}"'''
 
 
 
resetting appropriately:
 
* if current node is in the mathml namespace and is one of <mi>, <mo>, <mn>, <ms>, <mtext>, "in math content"
 
* if current node is in the mathml namespace, "in math"
 
* if current node is in the svg namespace and is one of <foreignObject>, <title>, <desc> "in svg content"
 
* if current node is in the svg namespace, "in svg"
 
  
 
tokeniser:
 
tokeniser:
* tokeniser changes so that when insertion mode is "in math" or "in svg", support CDATA blocks.
+
* move the insertion mode flag to before the tokeniser ✓‬
 +
* tokeniser changes so that when insertion mode is "in namespace", support CDATA blocks. ✓‬
 +
** add a paragraph to the "Markup declaration open state" saying that if the next seven characters match "[CDATA[", and the insertion mode is "in namespace", and the current node is not html nor either <mi>, <mo>, <mn>, <ms>, <mtext> in mathml, nor <foreignObject>, <desc>, <title> in svg, then you switch to a state that emits character tokens until it hits "]]>". ✓‬
 
* tokeniser keeps track of /> endings.
 
* tokeniser keeps track of /> endings.
 +
** add a new tokeniser state which you go to when hitting a / instead of going to the "before attribute name state". This new state has just two exits -- one for ">", which sets a flag saying that the tag is self-closing, and one for anything else, which has a parse error and reconsumes in the "before attribute name state". ✓‬
 +
** make it a parse error for end tags to have this slash ✓
 +
** change the definition of "permitted slash" paragraph to instead say "if a start tag is emitted with the self-closing flag set, and the token is processed by the tree construction stage without that flag being acknowledged, then there is a parse error". ✓
 +
** change the handling of the void elements "in body" and "in head" (anywhere else?) to acknowledge the self-closing flag. ✓‬
 
* we add all the MathML entities to the entity list.
 
* we add all the MathML entities to the entity list.
* &phi; works differently when in "in math" or "in math content".
+
 
 +
resetting appropriately:
 +
* if node is in a namespace other than html, "in namespace" ✓‬
  
 
'''"in body":'''
 
'''"in body":'''
* "math" element - switch to "in math"
+
* "math" element:
* "svg" element - switch to "in svg"
+
*# insert math element in mathml namespace✓‬
 
+
*# switch to "in namespace", with secondary mode set to whatever insertion mode used to be✓‬
'''"in math":'''
+
*# if the tag had a closing slash, imply a closing tag with the same tag name✓‬
 
+
* "svg" element
* comment
+
*# insert svg element in svg namespace✓‬
*# insert comment
+
*# switch to "in namespace", with secondary mode set to whatever insertion mode used to be✓‬
 
+
*# if the tag had a closing slash, imply a closing tag with the same tag name✓‬
* text:
 
*# insert text
 
 
 
* doctype
 
*# parse error
 
 
 
* start tag for: <mi>, <mo>, <mn>, <ms>, <mtext>
 
*# insert element for token
 
*# switch to "in math content"
 
*# if the tag had a closing slash, imply a closing tag with the same tag name
 
 
 
* start tag for one of: maction maligngroup malignmark menclose merror mfenced mfrac mglyph mlabeledtr mmultiscripts mover mpadded mphantom mprescripts mroot mrow mspace msqrt mstyle msub msubsup msup mtable mtd mtr munder munderover none
 
*# insert element for token
 
*# if the tag had a closing slash, imply a closing tag with the same tag name
 
 
 
* other start tag:
 
* end tag: </p> or </br>:
 
*# parse error
 
*# pop until <math> element is popped
 
*# reset insertion mode
 
*# reprocess
 
 
 
* other end tag
 
*# if current element has the tag name of the token: pop it
 
*# otherwise: if there is a matching element in scope and it has the mathml namespace: parse error, ignore token
 
*# otherwise:
 
*## parse error
 
*## pop until <math> element is popped
 
*## reset insertion mode
 
*## reprocess
 
 
 
 
 
'''"in math content":'''
 
 
 
* start tag for one of: mglyph malignmark
 
*# insert element for token, treat as void
 
 
 
* end tag:
 
*# if the bottommost node on the stack, ignoring those whose end tags can be implied, is in the mathml namespace and, ignoring case, has the same tag name as the token, imply end tags, pop the current node, and switch to "in math"
 
*# otherwise, treat as "in body"
 
 
 
* otherwise
 
*# treat as "in body"
 
 
 
 
 
 
 
'''"in svg":'''
 
  
First, perform tagname and attribute name fixup as follows: ...
+
'''"in namespace":'''
  
 
* comment
 
* comment
Line 96: Line 36:
 
*# parse error
 
*# parse error
  
* start tag for: <foreignObject>, <desc>, <title>
+
* start tag if current node is <mi>, <mo>, <mn>, <ms>, <mtext> in mathml
*# insert element for token
+
* start tag if current node is <foreignObject>, <desc>, <title> in svg
*# switch to "in svg content"
+
* start tag if current node is in the html namespace
*# if the tag had a closing slash, imply a closing tag with the same tag name
+
* start tag with tag name "svg" if current node is <annotation-xml> in mathml
 
+
* end tag
* start tag for one of: a altGlyph altGlyphDef altGlyphItem animate animateColor animateMotion animateTransform circle clipPath color-profile cursor definition-src defs desc ellipse feBlend feColorMatrix feComponentTransfer feComposite feConvolveMatrix feDiffuseLighting feDisplacementMap feDistantLight feFlood feFuncA feFuncB feFuncG feFuncR feGaussianBlur feImage feMerge feMergeNode feMorphology feOffset fePointLight feSpecularLighting feSpotLight feTile feTurbulence filter font font-face font-face-format font-face-name font-face-src font-face-uri foreignObject g glyph glyphRef hkern image line linearGradient marker mask metadata missing-glyph mpath path pattern polygon polyline radialGradient rect script set stop style svg switch symbol text textPath title tref tspan use view vkern
+
*# treat as in the secondary mode
*# insert element for token
+
*# if the insertion mode is still "in namespace", but there is no namespaced element in scope, switch to the secondary mode
*# if the tag had a closing slash, imply a closing tag with the same tag name
 
  
* other start tag:
+
* start tag for one of: (html elements)
* end tag: </p> or </br>:
 
 
*# parse error
 
*# parse error
*# pop until <svg> element is popped
+
*# pop nodes until the current node is in the html namespace
*# reset insertion mode
+
*# switch to the secondary mode
*# reprocess
+
*# reprocess token
 
 
* other end tag
 
*# if current element has the tag name of the token: pop it
 
*# otherwise: if there is a matching element in scope and it has the svg namespace: parse error, ignore token
 
*# otherwise:
 
*## parse error
 
*## pop until <svg> element is popped
 
*## reset insertion mode
 
*## reprocess
 
 
 
 
 
  
'''"in svg content":'''
+
* other start tag
 +
*# if namespace is svg, apply case fixups
 +
*# insert element for token, in same namespace as current node
 +
*# if the tag had a closing slash, imply a closing tag with the same tag name and acknowledge the self-closing flag
  
* end tag:
+
syntax:
*# if the bottommost node on the stack, ignoring those whose end tags can be implied, is in the svg namespace and, ignoring case, has the same tag name as the token, imply end tags, pop the current node, and switch to "in svg"
+
* add a kind of element for those elements not in the HTML namespace ✓‬
*# otherwise, treat as "in body"
+
* add a definition for how tags work for those elements to the Tags paragraph, including mentioning that start tags can be marked as self-closing, in which case there must not be a matching end tag ✓‬
 +
* define what these elements can contain in the subsequent paragraphs, including cdata; nothing if the start tag is marked self-closing ✓‬
 +
* add the definition of the /> syntax to the start tag bit, "self-closing" start tag ✓‬
 +
* define cdata blocks ✓‬
  
* otherwise
+
content models:
*# treat as "in body"
+
* define MathML's <math> element as phrasing ✓‬
 +
* define SVG's <svg> as embedded ✓‬
  
  

Latest revision as of 16:11, 10 November 2012

This document is obsolete.

For the current specification, see: HTML Standard: Tree construction and HTML Standard: Tokenization


tokeniser:

  • move the insertion mode flag to before the tokeniser ✓
  • change the tokeniser so that CDATA blocks are supported when the insertion mode is "in namespace". ✓
    • add a paragraph to the "Markup declaration open state" saying that if the next seven characters match "[CDATA[", the insertion mode is "in namespace", and the current node is not in the html namespace, is not one of <mi>, <mo>, <mn>, <ms>, <mtext> in mathml, and is not one of <foreignObject>, <desc>, <title> in svg, then the tokeniser switches to a state that emits character tokens until it hits "]]>". ✓
  • tokeniser keeps track of /> endings.
    • add a new tokeniser state which is entered on hitting a /, instead of going to the "before attribute name state". This new state has just two exits: one for ">", which sets a flag saying that the tag is self-closing, and one for anything else, which is a parse error and reconsumes in the "before attribute name state". ✓
    • make it a parse error for end tags to have this slash ✓
    • change the "permitted slash" paragraph to instead say "if a start tag is emitted with the self-closing flag set, and the token is processed by the tree construction stage without that flag being acknowledged, then there is a parse error". ✓
    • change the handling of the void elements "in body" and "in head" (anywhere else?) to acknowledge the self-closing flag. ✓
  • add all the MathML entities to the entity list.
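
The CDATA rule in the tokeniser notes can be sketched roughly as below. This is only an illustration, not the spec text: the function names, the string form of the insertion mode, and the choice to emit everything up to end of input when "]]>" is missing are all assumptions made for the sketch.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

# Elements listed in the notes as exceptions to CDATA support.
MATHML_TEXT = {"mi", "mo", "mn", "ms", "mtext"}
SVG_HTML_HOSTS = {"foreignObject", "desc", "title"}


def cdata_allowed(insertion_mode, node_ns, node_name):
    """Return True if "<![CDATA[" should open a CDATA block here."""
    if insertion_mode != "in namespace":
        return False
    if node_ns == HTML_NS:
        return False
    if node_ns == MATHML_NS and node_name in MATHML_TEXT:
        return False
    if node_ns == SVG_NS and node_name in SVG_HTML_HOSTS:
        return False
    return True


def read_cdata(text, pos):
    """pos points just after "<!". Return (characters, new_pos), or None
    if the next seven characters are not "[CDATA["."""
    if not text.startswith("[CDATA[", pos):
        return None
    start = pos + len("[CDATA[")
    end = text.find("]]>", start)
    if end == -1:
        # Assumption: an unterminated block runs to the end of the input.
        return text[start:], len(text)
    return text[start:end], end + len("]]>")
```

With these hypothetical helpers, cdata_allowed("in namespace", SVG_NS, "circle") is true, while the same call with an <mtext> current node in mathml is false.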

resetting appropriately:

  • if node is in a namespace other than html, "in namespace" ✓

"in body":

  • "math" element:
    1. insert math element in mathml namespace ✓
    2. set the secondary mode to the current insertion mode, then switch to "in namespace" ✓
    3. if the tag had a closing slash, imply a closing tag with the same tag name ✓
  • "svg" element
    1. insert svg element in svg namespace ✓
    2. set the secondary mode to the current insertion mode, then switch to "in namespace" ✓
    3. if the tag had a closing slash, imply a closing tag with the same tag name ✓
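
A minimal sketch of the three "in body" steps for "math" and "svg" follows. The Parser and Element classes and their method names are hypothetical, and the implied end tag is simplified: it pops the element and returns to the secondary mode when no namespaced element is left on the stack, rather than doing a full "in scope" check.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"
FOREIGN_ROOTS = {"math": MATHML_NS, "svg": SVG_NS}


@dataclass
class Element:
    name: str
    namespace: str


@dataclass
class Parser:
    insertion_mode: str = "in body"
    secondary_mode: Optional[str] = None
    open_elements: List[Element] = field(default_factory=list)

    def start_foreign_root(self, name: str, self_closing: bool = False) -> Element:
        element = Element(name, FOREIGN_ROOTS[name])
        self.open_elements.append(element)         # step 1: insert in its namespace
        self.secondary_mode = self.insertion_mode  # step 2: remember the old mode...
        self.insertion_mode = "in namespace"       # ...and switch
        if self_closing:                           # step 3: imply the closing tag
            self.implied_end_tag(name)
        return element

    def implied_end_tag(self, name: str) -> None:
        # Simplification (assumption): pop the current node if it matches,
        # then fall back to the secondary mode once no namespaced element
        # remains on the stack of open elements.
        if self.open_elements and self.open_elements[-1].name == name:
            self.open_elements.pop()
        if not any(e.namespace != HTML_NS for e in self.open_elements):
            self.insertion_mode = self.secondary_mode
```

For example, a parser that is "in cell" and sees a self-closing <svg/> ends up back "in cell" with nothing left on the stack.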

"in namespace":

  • comment
    1. insert comment
  • text:
    1. insert text
  • doctype
    1. parse error
  • start tag if current node is <mi>, <mo>, <mn>, <ms>, <mtext> in mathml
  • start tag if current node is <foreignObject>, <desc>, <title> in svg
  • start tag if current node is in the html namespace
  • start tag with tag name "svg" if current node is <annotation-xml> in mathml
  • end tag
    1. treat as in the secondary mode
    2. if the insertion mode is still "in namespace", but there is no namespaced element in scope, switch to the secondary mode
  • start tag for one of: (html elements)
    1. parse error
    2. pop nodes until the current node is in the html namespace
    3. switch to the secondary mode
    4. reprocess token
  • other start tag
    1. if namespace is svg, apply case fixups
    2. insert element for token, in same namespace as current node
    3. if the tag had a closing slash, imply a closing tag with the same tag name and acknowledge the self-closing flag
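
The start-tag cases of "in namespace" can be read as a dispatch on the current node and the tag name, sketched below. The function name and the returned labels are made up for illustration. The notes leave the "(html elements)" list unspecified, so it is passed in as a parameter rather than guessed.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

MATHML_TEXT = {"mi", "mo", "mn", "ms", "mtext"}
SVG_HTML_HOSTS = {"foreignObject", "desc", "title"}


def start_tag_rule(current_ns, current_name, tag_name, html_breakout_names):
    """Pick which "in namespace" rule applies to a start tag.

    html_breakout_names stands in for the "(html elements)" list, which
    the notes do not spell out; callers must supply it.
    """
    if current_ns == MATHML_NS and current_name in MATHML_TEXT:
        return "secondary mode"
    if current_ns == SVG_NS and current_name in SVG_HTML_HOSTS:
        return "secondary mode"
    if current_ns == HTML_NS:
        return "secondary mode"
    if (current_ns == MATHML_NS and current_name == "annotation-xml"
            and tag_name == "svg"):
        return "secondary mode"
    if tag_name in html_breakout_names:
        # parse error, pop to an html node, switch to secondary mode, reprocess
        return "break out"
    # insert in the same namespace as the current node
    return "insert foreign"
```

The order of the checks follows the order of the bullets, so the four "treat as in the secondary mode" cases win over the break-out list.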

syntax:

  • add a kind of element for those elements not in the HTML namespace ✓
  • add a definition for how tags work for those elements to the Tags paragraph, including mentioning that start tags can be marked as self-closing, in which case there must not be a matching end tag ✓
  • define what these elements can contain in the subsequent paragraphs, including cdata; nothing if the start tag is marked self-closing ✓
  • add the definition of the /> syntax to the start tag bit, "self-closing" start tag ✓
  • define cdata blocks ✓

content models:

  • define MathML's <math> element as phrasing ✓
  • define SVG's <svg> as embedded ✓


MathML error handling in text/html

Require the following behaviour, since MathML requires the host language to define error handling:

  • an element with the wrong number or type of children must render as an error indicator. (list cases)
  • a sequence of one or more text nodes that contains anything other than inter-element whitespace must be treated as <mtext>
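
One possible reading of the text-node rule is sketched below. Everything here is an assumption: the child list representation (strings for text nodes, tuples for elements), the function name, the decision to judge a run of adjacent text nodes as a whole, and the decision to model "treated as <mtext>" by wrapping the run.

```python
# HTML's inter-element whitespace characters: space, tab, LF, FF, CR.
ASCII_WHITESPACE = " \t\n\f\r"


def normalise_children(children):
    """children: list of str (text nodes) or tuples (element placeholders).

    Adjacent text nodes are joined; a run with any non-whitespace
    character becomes ("mtext", text), a whitespace-only run stays text.
    """
    result, run = [], []

    def flush():
        if run:
            text = "".join(run)
            if text.strip(ASCII_WHITESPACE):
                result.append(("mtext", text))
            else:
                result.append(text)
            run.clear()

    for child in children:
        if isinstance(child, str):
            run.append(child)
        else:
            flush()
            result.append(child)
    flush()
    return result
```

Whitespace-only runs between elements are left alone, so ordinary indentation inside <math> would not turn into <mtext> under this reading.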

Interaction of MathML and CSS ('display')

...