A user account is required in order to edit this wiki, but we've had to disable public user registrations due to spam.

To request an account, ask an autoconfirmed user on IRC (such as one of these permanent autoconfirmed members).


From WHATWG Wiki
Revision as of 06:08, 2 April 2008 by Hixie (talk | contribs)
Jump to: navigation, search

Ways to arbitrarily extend text/html for new vocabularies

Please put ideas for what it should look like here, each in their own section.

Each example should explain in details (ideally with examples) how to handle:

  • Syntax errors at the tokeniser level, the tree construction level, and the schema level.
  • Existing content that happens to use elements or syntax that you are proposing have special processing rules.
  • Pages that contain any special syntax after that syntax was copied and pasted by an ignorant Web author from a valid page written by a competent Web author aware of the new syntax.

See also SVG-specific proposals in Diagrams in HTML.

Proposal 1: xmlns strawman

When you hit an element with an xmlns="" attribute, switch to an XML parser until that parser has parsed the matching end tag.

 bla bla text/html bla bla <foo xmlns="http://example.com/foo"><this><must/>
 be<valid>XML! </valid></this> must be.</foo> bla bla text/html

Errors cause the entire page to stop parsing.

Existing pages are not handled.

Pages that copy-and-paste this syntax then use it incorrectly are not handled.

Reasons why we can't do this

  • There are pages that already specify xmlns="" attributes that would break if the content were processed as XML. For example, http://www.live.com/.

Proposal 2: Extensibility Element (<ext>)

This is a possible generic extensibility point, for SVG and possibly MathML or other XML content. Naturally, any content placed in an <ext> element would have to be understood by the UA in order to render correctly, and more complex rules may need to be developed for specific kinds of interaction between the root document and the inline content, such as with script, CSS, etc. For some discussion, see the IRC logs.

(The <ext> element would have a name that doesn't clash with existing content. Inside you can use XML or another format.)

 <p>Hello world. 
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">
          <circle x="5" y="5 r="5" stroke="green"/>

We should define a content model for where the <ext> element can occur, and if there are implications for different locations (such as inside a table, a paragraph, the head, etc). The simplest thing, at least for SVG (and probably MathML), would be that it would have the same restrictions as an element. Also, there should be a default block model for <ext> in CSS.

Note that this is similar to IE's "XML islands" with the <xml> element. It's believed that there are some conflicts with the <xml> element itself, since it creates a separate document that is tied to the <xml> element in the DOM, but more research is needed.

The <ext> element could potentially be an implicit element, generated by the HTML5 parser on encountering a start tag of e.g. "<svg " or "<math ". That would save authors of having to type this extra element, but has a drawback in that it doesn't provide fallback content for legacy UA:s. -Ed

Note that we could specify exactly what flavors of markup must be supported by a UA, and which may be supported by a UA. This would be rather restrictive, but could improve interoperability of UA features, and would ensure that the proper DOM interfaces are available. For example, SVG and MathML must be supported, and FooML may be supported (or something).

Error Handling

Pick one! Or separate the proposal into several proposals, for each different proposal, so that they can be evaluated. The proposals below are just brief notes, not detailed enough for me to know what you mean. -Hixie

The main options seem to be:

  1. strict XML parsing (not favored by many)
  2. very permissive error handling (as in HTML5); this idea is controversial and has many open issues, which should be detailed below
  3. moderate error handling, as detailed in SVG Tiny 1.2
  4. other ideas?

The chief risk with permissive error handling is that it would create content that is not compatible across different UAs, including mobile devices and authoring tools.

Tentative proposals:

  • Tokenizer recovers from errors by ignoring them and moving on; for SVG, any element with errors is not rendered.
  • Tree construction recovers from errors by closing the <svg> element, and not rendering any content after the error.
  • Case folding is not supported within the main body of the <ext> element, though it would be within the <fallback> element.
  • The tree builder would assign the appropriate namespace URI to the element nodes it creates.
  • If the "/>" is not found at the end of an element, all subsequent element will be placed as child elements of the element (and thus not rendered) until a matching closing tag is found, or until the a matching root tag is found, or until the "</ext>" element is found.
  • Unknown content is ignored
  • Unquoted attribute values will be ignored (should the element also not be rendered?)

See an email by Henri Sivonen for comparison and contrast.

Embedded HTML

The case of content inside a <foreignObject> element could be subject to the parsing model of the root document. (Note that this is only a partial solution, and more thought and details are needed.)

For content outside <foreignObject>, it should follow the XML processing rules.

Fallback Behavior

This is an opportunity to get nice fallback behavior, as well.

Here's a possible suggestion, where the raster image would show in UAs that didn't support the <ext> syntax, and the SVG would show in those that did (and which support SVG). In UAs which support <ext> and not SVG, the fallback would also be the raster. The fallback content should be inside a wrapper element (<fallback>), so that you can have rich fallback options, such as an image map, a table, <canvas> and an accompanying <script> element, or whatever; in this case, I also include fallback CSS to hide textual content in title, desc, and text elements, but it may be desirable to leave this content as alternate text to the image, even including styling.

For MathML content, a conditional CSS override could allow for CSS styling of MathML elements for those that don't render MathML natively.

Note: as stated before, the names of the <ext> and <fallback> elements are subject to change based on existing element names in the wild.

<html lang="en">
	<title>HTML Extensibility Test</title>
	<h1 id="test_of_extensibility">Test of Extensibility</h1>
	<p>This is a test of an extensibility point in text/html, with a fallback mechanism.</p>
			<img src="anIsland.png" alt="..."/>
			<style type='text/css'>
			   svg > * { display: none; }
		<svg xmlns="http://www.w3.org/2000/svg"
		     width="100%" height="100%"
			<title>My Title</title>
			<desc>schepers, 01-04-2008</desc>
			<circle id="circle_1" cx="75" cy="25" r="20" fill="lime" />
      			<text id='text_1' x='10' y='25' font-size='18' fill='crimson'>This is some text.</text>

Reasons why we can't do this

  • Unfortunately, whatever syntax we end up using, people will copy and paste it from documents that were written by competent authors that tested it against the new UAs, into documents written by authors who don't know about this, and who don't have the new UA, thus creating new "legacy documents" that use whatever syntax we come up with. Saying the risk is minimal doesn't mitigate this problem. It's a real problem, and we have to deal with it. I think this risk is minimal, since it clearly wouldn't work in the legacy UAs, and so the mistake will have less reason to propagate. -Shepazu
  • The fallback idea doesn't work. Elements like <script>, <style>, <title>, <input>, <textarea> etc, get treated as HTML elements in legacy UAs. Relying on CSS for hiding the text content doesn't work either, because CSS is optional and might not be enabled (or supported). (It doesn't much matter, though, because fallback isn't one of the things we're trying to address with this.)