https://wiki.whatwg.org/api.php?action=feedcontributions&user=RBorja97&feedformat=atomWHATWG Wiki - User contributions [en]2024-03-29T10:40:43ZUser contributionsMediaWiki 1.39.3https://wiki.whatwg.org/index.php?title=HTML5Lib&diff=7260HTML5Lib2011-10-10T09:32:20Z<p>RBorja97: None</p>
<hr />
<div>[http://code.google.com/p/html5lib/ HTML5Lib] is a project to create both Python-based and Ruby-based implementations of various parts of the WHATWG spec, in particular a tokenizer, a parser, and a serializer. It is '''not''' an official WHATWG project; however, we plan to use this wiki to document and discuss the library design. The code is available under an open-source MIT license.<br />
<br />
== SVN ==<br />
Please commit often, with reasonably detailed descriptions of what you did. If you want to make sure you are not redoing someone else's work, ask on the [http://groups.google.com/group/html5lib-discuss mailing list]. For questions that could benefit from quick turnaround, talk to people on #whatwg.<br />
<br />
== General ==<br />
<br />
In comments "XXX" indicates something that has yet to be done: something that might be wrong, has not yet been written, or similar.<br />
<br />
In comments "AT" indicates that the comment documents an alternate implementation technique or strategy.<br />
<br />
== HTMLTokenizer ==<br />
<br />
The tokenizer is controlled by a single HTMLTokenizer class stored in tokenizer.py at the moment. You initialize the HTMLTokenizer with a stream argument that holds an HTMLInputStream. You can iterate over the object created to get tokens back.<br />
<br />
Currently tokens are objects; they will become dicts.<br />
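<br />
The iteration interface described above can be sketched roughly as follows. This is a toy stand-in, not the code in tokenizer.py: the class name <code>ToyTokenizer</code>, the token type names, and the dict keys are all assumptions made for illustration, and <code>io.StringIO</code> merely plays the role of an HTMLInputStream.<br />

```python
import io


class ToyTokenizer:
    """Illustrative stand-in for HTMLTokenizer; not the html5lib code."""

    def __init__(self, stream):
        # `stream` plays the role of HTMLInputStream: any object with read().
        self.stream = stream

    def __iter__(self):
        data = self.stream.read()
        pos = 0
        while pos < len(data):
            if data[pos] == "<":
                end = data.find(">", pos)
                if end == -1:
                    # Unterminated tag: emit the rest as character data.
                    yield {"type": "Characters", "data": data[pos:]}
                    return
                name = data[pos + 1:end]
                if name.startswith("/"):
                    yield {"type": "EndTag", "name": name[1:]}
                else:
                    yield {"type": "StartTag", "name": name}
                pos = end + 1
            else:
                end = data.find("<", pos)
                if end == -1:
                    end = len(data)
                yield {"type": "Characters", "data": data[pos:end]}
                pos = end


# Iterating over the tokenizer object yields tokens as plain dicts.
tokens = list(ToyTokenizer(io.StringIO("<p>Hello</p>")))
```

Using plain dicts here mirrors the planned move away from token objects; the real token fields may well differ.<br />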
<br />
=== Interface ===<br />
<br />
The parser needs to change the self.contentModelFlag attribute which affects how certain states are handled.<br />
<br />
=== Issues ===<br />
* Use of if statements in the states may be suboptimal (but we should time this)<br />
<br />
== HTMLParser ==<br />
<br />
=== Profiling on web-apps.htm ===<br />
<br />
I did some profiling on web-apps.htm which is a rather large document. Based on that I already changed a number of things which speed us up a bit. Below are some things to consider for future revisions:<br />
<br />
* utils.MethodDispatcher is invoked way too often. By predeclaring some of it in InBody I managed to decrease the number of invocations by over 24,000, but InBody.__init__ is invoked about 7000 times for web-apps.htm, so that number could be higher. Not sure how to put them somewhere else though. First thing I tried was HTMLParser, but references get all messed up then...<br />
: We should be able to store a single instance of each InsertionMode rather than creating a new one every time the mode switches. Hopefully we have been disciplined enough not to keep any state in those classes, so the change should be painless.<br />
:: That's an interesting idea. How would that work? [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
::: I got an idea on how it might work and it worked! Still about 3863 invocations to utils.MethodDispatcher but it takes 0.000 CPU seconds. I suppose we can decrease that amount even more, but I wonder if it's worth it. [[User:Annevk|Annevk]] 11:37, 26 December 2006 (UTC)<br />
<br />
* 713194 calls to __contains__ in sets.py make us slow. They take about 1.0x CPU seconds. <br />
: I've just switched to the built-in set type. Hopefully this will help a bit. [[User:Jgraham|Jgraham]] 00:30, 25 December 2006 (UTC)<br />
:: It did. (Not surprisingly when 700,000 method calls are gone...) [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
<br />
* 440382 calls to char in tokenizer.py are the runner-up with 0.8x CPU seconds.<br />
: This is now the largest time consumer. [[User:Annevk|Annevk]] 12:49, 25 December 2006 (UTC)<br />
<br />
* dataState in tokenizer.py with 0.7 CPU seconds is next.<br />
: This is now at 0.429 CPU seconds. Probably because the tokenizer switched to dicts instead of custom Token objects. [[User:Annevk|Annevk]]<br />
<br />
* __iter__ in tokenizer.py with 0.59x CPU seconds...<br />
<br />
* Creation of all node objects in web-apps takes .57x CPU seconds.<br />
<br />
* etc.<br />
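<br />
The idea of keeping a single instance of each insertion mode, discussed under the first item above, might look something like the sketch below. It is not the actual html5lib code: <code>ToyParser</code>, <code>switch</code>, the handler names, and the <code>built</code> counter are hypothetical, and this <code>MethodDispatcher</code> is only a guess at the general shape of utils.MethodDispatcher. The approach assumes, as noted above, that the mode classes keep no per-document state.<br />

```python
class MethodDispatcher(dict):
    """Maps tag names to handlers with a fallback (illustrative only)."""

    built = 0  # counts how many dispatchers have been constructed

    def __init__(self, items, default):
        super().__init__(items)
        self.default = default
        MethodDispatcher.built += 1

    def __getitem__(self, key):
        # dict.get does not call __getitem__, so there is no recursion here.
        return self.get(key, self.default)


class InBody:
    def __init__(self, parser):
        self.parser = parser
        # Built once per instance, so reusing the instance avoids rebuilding.
        self.start_tag_handler = MethodDispatcher(
            [("p", self.start_p)], self.start_other)

    def start_p(self, name):
        return "p handler"

    def start_other(self, name):
        return "generic handler"


class InTable(InBody):
    pass


class ToyParser:
    def __init__(self):
        # One instance per insertion mode, created up front and reused.
        self.phases = {"inBody": InBody(self), "inTable": InTable(self)}
        self.phase = self.phases["inBody"]

    def switch(self, name):
        # Switching modes is a dictionary lookup, not a constructor call.
        self.phase = self.phases[name]


parser = ToyParser()
for _ in range(7000):
    parser.switch("inTable")
    parser.switch("inBody")
```

However many times the mode switches, only two dispatchers are ever built in this sketch, which is the effect the discussion above is after.<br />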
<br />
== Testcases ==<br />
Testcases are under the /tests directory. New code should not be checked in if it regresses previously functional unit tests. Similarly, new tests that don't pass should not be checked in without both informing others on the [http://groups.google.com/group/html5lib-discuss mailing list] and having a concrete plan. Ideally new features should be accompanied by new unit tests for those features. Documentation of the test format is available at [[Parser_tests]].<br />
<br />
<br />
<br />
[[Category:Implementations]]</div>RBorja97https://wiki.whatwg.org/index.php?title=XHTML2_versus_HTML5&diff=7259XHTML2 versus HTML52011-10-10T09:32:13Z<p>RBorja97: None</p>
<hr />
<div>http://www.w3.org/MarkUp/2009/ED-xhtml-modularization2-20090123/<br />
<br />
This is not about HTML versus XHTML. Instead, this page is an attempt to find out where XHTML2 and HTML5 features overlap, why certain design decisions in HTML5 were made differently, and why HTML5 lacks certain features XHTML2 has.<br />
<br />
It is not an attempt to demonstrate that 5 > 2. We know that.<br />
<br />
It is also very simple at this point. I wish my time was infinite.<br />
<br />
<br />
== XHTML Document Module ==<br />
<br />
=== The html element ===<br />
<br />
XHTML2 has version="" and xsi:schemaLocation="". HTML5 has neither.<br />
<br />
'''Rationale:''' HTML5 does away with versioning in HTML, making it equivalent to CSS in that regard. XXX add something ''nice'' about why we do not have xsi:schemaLocation=""<br />
<br />
=== The head element ===<br />
<br />
XHTML2 has profile="". HTML5 does not.<br />
<br />
'''Rationale:''' it does not appear to be used in the wild.<br />
<br />
'''Note:''' this is still being debated by the HTML WG.<br />
<br />
== XHTML Structural Module ==<br />
<br />
=== The blockcode element ===<br />
<br />
HTML5 does not have this element. In HTML5 you can use <pre><code> instead.<br />
<br />
'''Rationale:''' the blockcode element is not backwards compatible.<br />
<br />
=== The heading elements ===<br />
<br />
HTML5 does not have the h element. In HTML5 the h1-h6 elements work together with the section element. In XHTML2 only the h element works with the section element.<br />
<br />
'''Rationale:''' the h element is not backwards compatible. Also, it seems important to define interaction between the h1-h6 elements and the section element so authors can more easily reuse existing content and assistive technology can still make sense of invalid pages.<br />
<br />
=== The separator element ===<br />
<br />
HTML5 does not have the separator element. It does have the hr element which means and does the same thing.<br />
<br />
'''Rationale:''' the separator element is not backwards compatible and we cannot do away with the hr element so adding an equivalent element would just make matters more complex.<br />
<br />
== XHTML Text Module ==<br />
<br />
=== The abbr element ===<br />
<br />
XHTML2 has a full attribute that can reference another element which provides the expansion (within the same page). HTML5 does not. HTML5 does this implicitly by comparing element contents.<br />
<br />
'''Rationale:''' Less work for authors.<br />
<br />
=== The l element ===<br />
<br />
XXX The XHTML2 open issues list says that the br element will be added back. Does this element stay though?<br />
<br />
== XHTML Hypertext Module ==<br />
<br />
XXX This module talks about adding the access element to the head element, but the link is broken.<br />
<br />
== XHTML List Module ==<br />
<br />
HTML5 does not have a caption element to annotate list items.<br />
<br />
'''Rationale:''' XXX<br />
<br />
=== The dl, di, dt and dd elements ===<br />
<br />
HTML5 does not have the di element. (HTML5 is much clearer in defining this, by the way.)<br />
<br />
'''Rationale:''' the di element solves a styling problem.<br />
<br />
=== The nl element ===<br />
<br />
HTML5 does not have the nl element.<br />
<br />
'''Rationale:''' XXX<br />
<br />
== XHTML Core Attributes Module ==<br />
<br />
=== The xml:id attribute ===<br />
<br />
HTML5 does not have this attribute, however, <code>xml:id</code> can be used in XHTML5 web pages.<br />
<br />
'''Rationale:''' there already is an id attribute that works fine and can be used both in HTML and XHTML web pages whereas <code>xml:id</code> can only be used in XML documents.<br />
<br />
=== The layout attribute ===<br />
<br />
HTML5 does not have this attribute.<br />
<br />
'''Rationale:''' the pre element can be used instead.<br />
<br />
== XHTML Hypertext Attributes Module ==<br />
<br />
...</div>RBorja97https://wiki.whatwg.org/index.php?title=Image_Caption&diff=7258Image Caption2011-10-10T09:32:08Z<p>RBorja97: None</p>
<hr />
<div>Image captions are often found on the web, but there is no standard markup for them.<br />
<br />
== Problem Description ==<br />
Currently, most people use either a table, custom class names, or simply put the image inside a paragraph, each option either conveying a wrong meaning or being ambiguous with the rest of the content.<br />
<br />
An interesting analysis of the subject has been done by Dan Cederholm in one of his SimpleQuiz entries. [http://www.simplebits.com/notebook/2004/01/20/sqxi_conclusion.html His conclusion]:<br />
<br />
<blockquote>So in this case, I might choose option A -- because visually it shows the relationship between the items better than the others. But I suppose this is bad semantics. Or maybe just another case where we don't have the 'perfect' set of defined elements for this (very) specific job.</blockquote><br />
<br />
And his option A was:<br />
<pre><br />
<p><img src="..."><br /><br />
Caption Text</p><br />
</pre><br />
<br />
In other words, he could not figure out anything good using the elements currently available in HTML and, as most people do, had to create his own solution.<br />
<br />
Setting a standard for such things -- an explicit association between the caption and the illustration -- would help image search engines and could enable the automatic creation of a figure index for a page. It would also be beneficial for sight-impaired users. The fact that image captions should be treated differently from body text (they are not in the main flow of the document) suggests this element could be important for figure handling by assistive tools, allowing e.g. aural browsers to skip captions except on explicit user request.<br />
<br />
=== Current Methods and Workarounds ===<br />
See [[Image Caption Examples]] for a couple of sample cases.<br />
<br />
== Proposed Solutions ==<br />
<br />
=== <figure> with <caption> ===<br />
A <figure> element contains illustrative content for the current section. It can contain a <caption> element, either as the first or the last child, that will be used to describe or give a caption to the content of the figure.<br />
<br />
<pre><br />
<figure><br />
<caption>Caption Text</caption><br />
<img src="..."><br />
</figure><br />
</pre><br />
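<br />
As a rough illustration of the figure index mentioned in the problem description, the sketch below collects the caption of every <figure> in a page. It uses Python's <code>html.parser</code>, which is a plain tag tokenizer and does not apply the table-related handling of <caption> discussed under Processing Model; the names <code>FigureIndex</code> and <code>figure_index</code> are made up for this example.<br />

```python
from html.parser import HTMLParser


class FigureIndex(HTMLParser):
    """Collects the caption text of each <figure> (sketch only)."""

    def __init__(self):
        super().__init__()
        self.captions = []
        self._in_figure = False
        self._in_caption = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self._in_figure = True
        elif tag == "caption" and self._in_figure:
            # Only captions inside a figure count; table captions are skipped.
            self._in_caption = True
            self._text = []

    def handle_endtag(self, tag):
        if tag == "caption" and self._in_caption:
            self._in_caption = False
            self.captions.append("".join(self._text).strip())
        elif tag == "figure":
            self._in_figure = False

    def handle_data(self, data):
        if self._in_caption:
            self._text.append(data)


def figure_index(markup):
    collector = FigureIndex()
    collector.feed(markup)
    collector.close()
    return collector.captions


index = figure_index(
    "<figure><caption>Fig. 1</caption><img src='a.png'></figure>"
    "<p>Body text</p>"
    "<figure><img src='b.png'><caption>Fig. 2</caption></figure>"
)
```

The caption is picked up whether it is the first or the last child of the figure, matching the two placements allowed above.<br />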
<br />
==== Processing Model ====<br />
The processing model for HTML files must be changed so that the <caption> is no longer ignored when outside the context of a table. It could also be a good idea to add a new figure insertion mode that would prevent figure captions from being moved to the enclosing table when inside a table cell, otherwise <figure> will break in table-based layouts.<br />
<br />
<pre><br />
<table><br />
<tr><td><br />
<figure><br />
<caption>Caption Text</caption><br />
<img src="..."><br />
</figure><br />
</td></tr><br />
</table><br />
</pre><br />
<br />
==== Limitations ====<br />
<caption> being ignored by current browsers when outside a table makes it impossible to style, and it'll also be terribly broken with table layouts when figure captions end up at the top (or the bottom) of the enclosing layout table.<br />
<br />
==== Implementation ====<br />
Parsing changes in this solution could be hard to implement given <caption> element's legacy within <table>.<br />
<br />
Putting aside the parsing problem, there's not much else to implement for visual browsers. A good display model that could be used to display figures is already available in CSS 2.1:<br />
<br />
<pre><br />
figure { display: table; }<br />
caption { display: table-caption; }<br />
</pre><br />
<br />
This would display the figure as a one-cell table, and the caption either at the top or at the bottom (depending on the [http://www.w3.org/TR/CSS21/tables.html#propdef-caption-side caption-side] property). The interesting part of this model is that the caption's width is constrained by the width of the figure, making it the ideal choice for floated figures.<br />
<br />
==== Adoption ==== <br />
The syntax is pretty straightforward to use. "figure" and "caption" are commonly used terms to designate exactly this feature in the print world. It should be a natural choice for authors who wonder how to mark up their images.<br />
<br />
This markup won't work however if an author wants the caption to be elsewhere in the document. (In this proposal, <caption> is pinned to the figure's content.) It does not seem a common use case however.<br />
<br />
=== <figure> with <legend> ===<br />
A <figure> element contains illustrative content for the current section. It can contain a <legend> element, either as the first or the last child, that will be used to describe or give a caption to the content of the figure.<br />
<br />
<pre><br />
<figure><br />
<legend>Caption Text</legend><br />
<img src="..."><br />
</figure><br />
</pre><br />
<br />
==== Processing Model ====<br />
:''To be completed''<br />
<br />
==== Limitations ====<br />
:''To be completed''<br />
<br />
==== Implementation ====<br />
A good display model that could be used to display figures is already available in CSS 2.1, the table model. A default stylesheet could look like this:<br />
<br />
<pre><br />
figure { display: table; }<br />
figure legend { display: table-caption; }<br />
</pre><br />
<br />
This would display the figure as a one-cell table, and the figure legend either at the top or at the bottom (depending on the [http://www.w3.org/TR/CSS21/tables.html#propdef-caption-side caption-side] property). The interesting part of this model is that the legend's width is constrained by the width of the figure, making it the ideal choice for floated figures.<br />
<br />
==== Adoption ==== <br />
"figure" and "legend" are commonly used terms in the print world, so their use could prove natural to authors. It is most likely that authors who need markup for their figures will choose this one if it is sanctioned in a standard.<br />
<br />
This markup won't work if an author wants the caption to be elsewhere in the document. (In this proposal, <legend> is pinned to the figure's content.) It does not seem a common use case however.<br />
<br />
=== Adjacent <caption> or <legend> ===<br />
<caption> or <legend> elements directly following an <img> element give the caption text for that image.<br />
<br />
<pre><br />
<img src="..."><br />
<caption>Caption Text</caption><br />
</pre><br />
<br />
<pre><br />
<img src="..."><br />
<legend>Caption Text</legend><br />
</pre><br />
<br />
==== Processing Model ====<br />
The processing model for HTML files must be changed so that the <caption> is no longer ignored when outside the context of a table. It could also be a good idea to add a new figure insertion mode that would prevent figure captions from being moved to the enclosing table when inside a table cell, otherwise <figure> will break in table-based layouts.<br />
<br />
:''Are the elements in this construct inline or block-level content? Currently <img> is inline.''<br />
<br />
If any other element or non-whitespace text node is found between <img> and its corresponding caption element, the two are not considered adjacent: the semantic link is broken and a parse error is generated.<br />
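<br />
The adjacency rule can be sketched as a small check over tokenizer events. This is only one reading of the rule, written against Python's <code>html.parser</code>; the class name <code>AdjacentCaptions</code> and its attributes are hypothetical, captions are flattened to text, and the sketch only drops the link rather than reporting a parse error.<br />

```python
from html.parser import HTMLParser

CAPTION_TAGS = {"caption", "legend"}


class AdjacentCaptions(HTMLParser):
    """Pairs each <img> with a directly following caption (sketch only)."""

    def __init__(self):
        super().__init__()
        self.pairs = []            # (img src, caption text)
        self._pending_src = None   # src of the <img> awaiting a caption
        self._waiting = False      # True while the link can still be made
        self._capturing = None     # tag name while inside a caption
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._capturing:
            return  # markup nested in the caption: keep its text only
        if tag == "img":
            self._pending_src = dict(attrs).get("src")
            self._waiting = True
        elif tag in CAPTION_TAGS and self._waiting:
            self._capturing = tag
            self._text = []
            self._waiting = False
        else:
            self._waiting = False  # another element intervenes

    def handle_endtag(self, tag):
        if tag == "img":
            return  # produced by self-closing <img />; not an intervening tag
        if self._capturing == tag:
            self.pairs.append(
                (self._pending_src, "".join(self._text).strip()))
            self._capturing = None
        elif not self._capturing:
            self._waiting = False

    def handle_data(self, data):
        if self._capturing:
            self._text.append(data)
        elif data.strip():
            self._waiting = False  # non-whitespace text breaks the link


def adjacent_captions(markup):
    finder = AdjacentCaptions()
    finder.feed(markup)
    finder.close()
    return finder.pairs


linked = adjacent_captions('<img src="a.png">\n<caption>Caption A</caption>')
broken = adjacent_captions('<img src="b.png"> text <legend>Caption B</legend>')
```

Whitespace between the image and its caption is tolerated, while the stray text in the second call leaves the image without a caption.<br />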
<br />
==== Limitations ====<br />
<caption> being ignored by current browsers when outside a table makes it impossible to style, and it'll also be terribly broken with table layouts when captions end up at the top (or the bottom) of the enclosing layout table.<br />
<br />
:''To be completed: <legend> parsing''<br />
<br />
==== Implementation ====<br />
Parsing changes in this solution could be hard to implement given <caption> element's legacy within <table>.<br />
<br />
:''To be completed: <legend> parsing implementation''<br />
<br />
Giving a distinctive visual style to figure captions may prove difficult with this design. If a browser wants to treat figures in a distinctive manner, it'll have to treat them as a special case; the adjacent element selector in CSS can't distinguish between adjacent elements which are separated by text and those that are not.<br />
<br />
==== Adoption ==== <br />
:''To be completed''<br />
<br />
"legend" and "caption" are commonly used terms in the print world, so their use could prove natural to authors. Difficulties in styling, however, are likely to cause authors to always wrap figures in a <div> element, as most already do anyway (see [[Image Caption Examples]]).<br />
<br />
This markup won't work if an author wants the caption to be elsewhere in the document. It does not seem a common use case however.<br />
<br />
=== <label> defining attributes with nested markup ===<br />
A <label> element holds a value which should be treated the same way as the title attribute on <img>, except that it can contain nested markup. The for attribute of the label contains the id of the target element. A new type attribute on the label indicates which attribute the label intends to replace on the target.<br />
<br />
<pre><br />
<img id="img1" src="..."><br />
<label for="img1" type="title">...</label><br />
</pre><br />
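<br />
Since the processing model for this proposal is not yet written, the following is only a guess at how the for/type association could be resolved. It is written against Python's <code>html.parser</code>; <code>LabelResolver</code> and <code>resolve_labels</code> are hypothetical names, and for simplicity the sketch flattens nested markup in the label to plain text, which a real implementation would presumably preserve.<br />

```python
from html.parser import HTMLParser


class LabelResolver(HTMLParser):
    """Collects elements by id and <label for type> overrides (sketch)."""

    def __init__(self):
        super().__init__()
        self.elements = {}    # id -> dict of attributes
        self.overrides = []   # (target id, attribute name, label text)
        self._label = None    # (target id, attribute name) inside a label
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label" and "for" in attrs and "type" in attrs:
            self._label = (attrs["for"], attrs["type"])
            self._text = []
        elif "id" in attrs:
            self.elements[attrs["id"]] = attrs

    def handle_endtag(self, tag):
        if tag == "label" and self._label is not None:
            target, name = self._label
            self.overrides.append(
                (target, name, "".join(self._text).strip()))
            self._label = None

    def handle_data(self, data):
        if self._label is not None:
            self._text.append(data)


def resolve_labels(markup):
    resolver = LabelResolver()
    resolver.feed(markup)
    resolver.close()
    # Applied after parsing, so a label may precede or follow its target.
    for target, name, text in resolver.overrides:
        if target in resolver.elements:
            resolver.elements[target][name] = text
    return resolver.elements


result = resolve_labels(
    '<img id="img1" src="a.png">'
    '<label for="img1" type="title">A <em>red</em> kite</label>'
)
```

Applying the overrides after the whole document has been read is what lets the caption live elsewhere in the document, at the cost of not being resolvable during progressive rendering.<br />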
<br />
==== Processing Model ====<br />
:''To be completed: Attribute override, progressive rendering, etc.''<br />
<br />
==== Limitations ====<br />
:''To be completed''<br />
<br />
==== Implementation ====<br />
:''To be completed''<br />
<br />
==== Adoption ==== <br />
This markup has the benefit that it'll work if an author wants the caption to be elsewhere in the document.<br />
<br />
:''To be completed''<br />
<br />
<br />
[[Category:Proposals]]</div>RBorja97