Extensions
Ways to arbitrarily extend text/html for new vocabularies
Please put ideas for what it should look like here, each in their own section.
Each example should explain in detail (ideally with examples) how to handle:
- Syntax errors at the tokeniser level, the tree construction level, and the schema level.
- Existing content that happens to use elements or syntax for which you are proposing special processing rules.
- Pages that contain any special syntax after that syntax was copied and pasted by an ignorant Web author from a valid page written by a competent Web author aware of the new syntax.
See also SVG-specific proposals in Diagrams in HTML.
Proposal 1: xmlns strawman
When you hit an element with an xmlns="" attribute, switch to an XML parser until that parser has parsed the matching end tag.
bla bla text/html bla bla <foo xmlns="http://example.com/foo"><this><must/> be<valid>XML! </valid></this> must be.</foo> bla bla text/html
Errors cause the entire page to stop parsing.
Existing pages are not handled.
Pages that copy-and-paste this syntax then use it incorrectly are not handled.
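The strawman above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed algorithm: the function name parse_island and the regular expressions are hypothetical, the "switch to an XML parser" step is approximated by cutting the region out of the page text and handing it to the standard library's strict XML parser, and only the first xmlns-bearing element is handled.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper, not part of any specification: matches the first
# start tag that carries an xmlns="..." attribute.
XMLNS_START = re.compile(
    r'<([A-Za-z][\w.-]*)(?:\s[^<>]*?)?\sxmlns="[^"]*"[^<>]*>'
)


def parse_island(page: str):
    """Return (html_before, xml_element, html_after) for the first island.

    Raises xml.etree.ElementTree.ParseError if the island is not
    well-formed, mirroring "errors cause the entire page to stop parsing".
    Returns (page, None, "") when no xmlns attribute is present.
    """
    start = XMLNS_START.search(page)
    if start is None:
        return page, None, ""
    name = start.group(1)
    # Walk same-named tags to find the end tag that balances the start tag.
    tag = re.compile(r'<(/?)%s\b[^<>]*?(/?)>' % re.escape(name))
    depth = 0
    for m in tag.finditer(page, start.start()):
        if m.group(1):          # end tag
            depth -= 1
        elif not m.group(2):    # start tag that is not self-closing
            depth += 1
        if depth == 0:
            island = page[start.start():m.end()]
            return page[:start.start()], ET.fromstring(island), page[m.end():]
    raise ET.ParseError("no matching end tag for <%s>" % name)
```

Feeding it the example line above yields the surrounding text/html untouched and a foo element in the http://example.com/foo namespace; a mismatched tag inside the region raises an error instead of being repaired, which is exactly the brittleness the proposal is criticised for.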
Reasons why we can't do this
- There are pages that already specify xmlns="" attributes that would break if the content were processed as XML. For example, http://www.live.com/.
- The xmlns="" attribute, when used for HTML5 extensibility purposes, should probably be clearly marked as such, to disambiguate it from legacy uses. For example, it could be explicitly declared at the root of the document:
<html xmlns:xmlns="urn:html5:xmlns:for-example">
  ...
  <foo xmlns="http://example.com/foo">
    <!-- the region of the "foo" extension -->
  </foo>
  ...
</html>
Proposal 2: Extensibility Element (<ext>)
This is a possible generic extensibility point, for SVG and possibly MathML or other XML content. Naturally, any content placed in an <ext> element would have to be understood by the UA in order to render correctly, and more complex rules may need to be developed for specific kinds of interaction between the root document and the inline content, such as with script, CSS, etc. For some discussion, see the IRC logs.
(The <ext> element would have a name that doesn't clash with existing content. Inside you can use XML or another format.)
<p>Hello world.
  <ext>
    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">
      <circle cx="5" cy="5" r="5" stroke="green"/>
    </svg>
  </ext>
</p>
We should define a content model for where the <ext> element can occur, and if there are implications for different locations (such as inside a table, a paragraph, the head, etc). The simplest thing, at least for SVG (and probably MathML), would be that it would have the same restrictions as an <img> element. Also, there should be a default block model for <ext> in CSS.
What's the exact processing model?
Notes:
- This is similar to IE's "XML islands" with the <xml> element. It's believed that there are some conflicts with the <xml> element itself, since it creates a separate document that is tied to the <xml> element in the DOM, but more research is needed.
- The <ext> element could potentially be an implicit element, generated by the HTML5 parser on encountering a start tag of e.g. "<svg " or "<math ". That would save authors from having to type this extra element, but has the drawback that it doesn't provide fallback content for legacy UAs. -Ed
- We could specify exactly what flavors of markup must be supported by a UA, and which may be supported by a UA. This would be rather restrictive, but could improve interoperability of UA features, and would ensure that the proper DOM interfaces are available. For example, SVG and MathML must be supported, and FooML may be supported (or something).
Error Handling
Pick one! Or separate the proposal into several proposals, for each different proposal, so that they can be evaluated. The proposals below are just brief notes, not detailed enough for me to know what you mean. -Hixie
The main options seem to be:
- strict XML parsing (not favored by many)
- very permissive error handling (as in HTML5); this idea is controversial and has many open issues, which should be detailed below
- moderate error handling, as detailed in SVG Tiny 1.2
- other ideas?
The chief risk with permissive error handling is that it would create content that is not compatible across different UAs, including mobile devices and authoring tools.
Tentative proposals:
- Tokenizer recovers from errors by ignoring them and moving on; for SVG, any element with errors is not rendered.
- Tree construction recovers from errors by closing the <svg> element, and not rendering any content after the error.
- Case folding is not supported within the main body of the <ext> element, though it would be within the <fallback> element.
- The tree builder would assign the appropriate namespace URI to the element nodes it creates.
- If the "/>" is not found at the end of an element, all subsequent elements will be placed as child elements of that element (and thus not rendered) until a matching closing tag is found, or until a matching root tag is found, or until the "</ext>" end tag is found.
- Unknown content is ignored
- Unquoted attribute values will be ignored (should the element also not be rendered?)
See an email by Henri Sivonen for comparison and contrast.
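As one concrete reading of the tentative rule that unquoted attribute values are ignored, here is a minimal Python sketch. The function name quoted_attributes and the regular expression are assumptions made for illustration, and the sketch deliberately leaves open the question raised above of whether the element itself should still be rendered.

```python
import re

# One attribute: name = "double-quoted" | 'single-quoted' | unquoted
ATTRIBUTE = re.compile(
    r"""([^\s=/<>"']+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'<>/]+))"""
)


def quoted_attributes(start_tag: str) -> dict:
    """Collect attributes from a start tag, dropping unquoted values."""
    kept = {}
    for match in ATTRIBUTE.finditer(start_tag):
        name, double_quoted, single_quoted, unquoted = match.groups()
        if unquoted is not None:
            continue  # tentative rule: unquoted attribute values are ignored
        kept[name] = double_quoted if double_quoted is not None else single_quoted
    return kept
```

With a start tag such as <circle cx=75 cy="25" r='20' fill="lime"/>, the cx attribute is dropped and the other three are kept, so a renderer following this rule would see a circle without a horizontal centre.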
Embedded HTML
The case of content inside a <foreignObject> element could be subject to the parsing model of the root document. (Note that this is only a partial solution, and more thought and details are needed.)
For content outside <foreignObject>, it should follow the XML processing rules.
Fallback Behavior
This is an opportunity to get nice fallback behavior, as well.
Here's a possible suggestion, where the raster image would show in UAs that didn't support the <ext> syntax, and the SVG would show in those that did (and which support SVG). In UAs which support <ext> and not SVG, the fallback would also be the raster. The fallback content should be inside a wrapper element (<fallback>), so that you can have rich fallback options, such as an image map, a table, <canvas> and an accompanying <script> element, or whatever; in this case, I also include fallback CSS to hide textual content in title, desc, and text elements, but it may be desirable to leave this content as alternate text to the image, even including styling.
For MathML content, a conditional CSS override could allow for CSS styling of MathML elements for those that don't render MathML natively.
Note: as stated before, the names of the <ext> and <fallback> elements are subject to change based on existing element names in the wild.
<html lang="en">
  <head>
    <title>HTML Extensibility Test</title>
  </head>
  <body>
    <h1 id="test_of_extensibility">Test of Extensibility</h1>
    <p>This is a test of an extensibility point in text/html, with a fallback mechanism.</p>
    <ext>
      <fallback>
        <img src="anIsland.png" alt="..."/>
        <style type='text/css'>
          svg > * { display: none; }
        </style>
      </fallback>
      <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
           width="100%" height="100%" version="1.1">
        <title>My Title</title>
        <desc>schepers, 01-04-2008</desc>
        <circle id="circle_1" cx="75" cy="25" r="20" fill="lime" />
        <text id='text_1' x='10' y='25' font-size='18' fill='crimson'>This is some text.</text>
      </svg>
    </ext>
  </body>
</html>
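The intended outcome for the three kinds of UA described above can be written down as a small decision function. This is only a restatement of the suggestion in Python; the function name and the two boolean parameters are hypothetical, and it describes what the author of the example hopes to see, not what legacy UAs are known to do.

```python
def rendered_content(supports_ext: bool, supports_svg: bool) -> str:
    """Return which child of <ext> the author intends the UA to show."""
    if supports_ext and supports_svg:
        return "svg"
    # Legacy UAs (no <ext>) and UAs with <ext> but without SVG are both
    # meant to fall back to the raster image wrapped in <fallback>.
    return "fallback"
```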
Reasons why we can't do this
It's not clear what the processing model being proposed actually is. However, there is already one problem:
- The idea relies on not conflicting with legacy content. Unfortunately, whatever syntax we end up using, people will copy and paste it from documents that were written by competent authors that tested it against the new UAs, into documents written by authors who don't know about this, and who don't have the new UA, thus creating new "legacy documents" that use whatever syntax we come up with. Saying the risk is minimal doesn't mitigate this problem. It's a real problem, and we have to deal with it. I think this risk is minimal, since it clearly wouldn't work in the legacy UAs, and so the mistake will have less reason to propagate. -Shepazu
Note also that the fallback idea doesn't work. Elements like <script>, <style>, <title>, <input>, <textarea> etc, get treated as HTML elements in legacy UAs. Relying on CSS for hiding the text content doesn't work either, because CSS is optional and might not be enabled (or supported). (It doesn't much matter, though, because fallback isn't one of the things we're trying to address with this.)
Proposal 3: XML5
Microsoft has published a whitepaper on the subject of Improved Namespace Support. Salient features:
- Windows Internet Explorer 8 Beta 1 for Developers offers Web developers the opportunity to write standards-compliant HTML-based Web pages that support features (such as SVG, XUL, and MathML) in namespaces, provided that the client has installed appropriate handlers for those namespaces via binary behaviors. (A binary behavior is a type of ActiveX control.) Tbroyer: note that those behaviors don't change the way the markup is parsed into a DOM; at least for elements whose name contains a colon (haven't tested this in IE8, but this is the way it is since IE5.5)
- Internet Explorer 8 does not support the XHTML namespace definition. Thus, default namespace declarations of XHTML are ignored (xmlns="http://www.w3.org/1999/xhtml"). Tbroyer: this means that you cannot switch from a default namespace back to HTML (actually, this is true in IE8 in a more general fashion: once you've set a default namespace (i.e. once you've left "HTML"), you cannot switch to another; the whitepaper describes this as "Nesting of multiple default namespaces is not allowed; in other words, a default namespace declaration inside of another default namespace declaration will be ignored.")
- Internet Explorer 8 does not support default namespace declarations on any known elements such as HTML, SCRIPT, DIV, or STYLE. If default namespace declarations are encountered on these elements, the declaration is ignored (for purposes of existing Web page compatibility).
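The three rules about default namespace declarations listed above (XHTML declarations ignored, declarations on known elements ignored, nested declarations ignored) can be sketched as follows. This is a reading of the whitepaper excerpts quoted on this page, not a description of IE8's actual code: the function name is hypothetical, and the set of known elements contains only the four named above, as a stand-in for whatever full list IE8 uses.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Illustrative subset of "known" HTML elements; the whitepaper's full list
# is not reproduced here.
KNOWN_HTML_ELEMENTS = {"html", "script", "div", "style"}


def default_namespace(ancestors):
    """Return the default namespace in effect for the innermost element.

    `ancestors` is a list of (element_name, xmlns_or_None) pairs ordered
    from the root down to the element in question. A result of None means
    the element is still treated as plain HTML.
    """
    active = None
    for name, xmlns in ancestors:
        if xmlns is None:
            continue
        if xmlns == XHTML_NS:
            continue  # XHTML default namespace declarations are ignored
        if name.lower() in KNOWN_HTML_ELEMENTS:
            continue  # declarations on known elements are ignored
        if active is not None:
            continue  # nested default namespace declarations are ignored
        active = xmlns
    return active
```

For example, an svg element declaring the SVG namespace inside an html element that declares the XHTML namespace ends up in the SVG namespace, while a div declaring the XHTML namespace inside that svg element does not switch back to HTML.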
A few notes:
- Microsoft's IE8 implementation as described by this whitepaper does not satisfy all of the requirements; the above list focuses on the parts that do.
- While Microsoft's implementation is based on ActiveX (Tbroyer: see above, ActiveX gives you the behaviors associated with a given namespace URI, but doesn't change the parsing algorithm), the situation could very well end up being similar to XMLHttpRequest whereby the functionality was first exposed via ActiveX, other browser vendors adopted an alternate object model interface to this same functionality, and that interface was later adopted and standardized.
- While the white paper does not explicitly state this requirement, the approach works best if the simple name for the unknown (to HTML5) element which contains the default namespace declaration for which a binary behavior has been installed is not contained within the subtree. Both SVG and MathML have unique elements (svg and math, respectively) that satisfy this purpose. This gives proposal 3 some of the desirable characteristics of proposal 2 spelled out above.
- In order to meet the Resistance to errors (e.g. not brittle in the face of syntax errors) requirement, something akin to Anne van Kesteren's XML5 would be required, an implementation of which can be seen on Google Code. Tbroyer: see also the namespaces-in-text-html branch of html5lib