URL: Difference between revisions

Revision as of 15:20, 10 November 2012

This page documents research and notes on URLs for the URL standard.

Implementations

Tests

https://github.com/cweb/iri-tests

Variants of the following code (which runs in Live DOM Viewer) are useful for testing which code points browsers percent-escape in URLs:

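<!DOCTYPE html>
<script>
// w() is assumed to be the logging helper provided by Live DOM Viewer.
var a = document.createElement("a")

var i = 0
var cp = 0x100 // test code points U+0000 through U+00FF

// Length of the serialized URL when the tested code point is left as-is.
var unescapedLength = "http://x)@x/".length

while ( i < cp ) {
  a.href = "http://x" + String.fromCharCode(i) + "@x/"
  // A different length means the code point was escaped, dropped, or otherwise altered.
  if(a.href.length != unescapedLength) {
    w(a.href)
  }
  i++
}
</script>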
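
One possible variant, for environments without Live DOM Viewer, is sketched below. It assumes a global URL constructor is available (as in current browsers and Node.js) instead of an a element, and the function name escapedInUserinfo is made up for illustration. It collects every code point whose serialization differs from the input rather than comparing lengths, so the results may not match what the a.href test reports in a given browser.

```javascript
// Sketch only: assumes a global URL constructor; escapedInUserinfo is a hypothetical name.
// Returns [codePoint, serializedHref] pairs for every code point below `limit`
// whose URL does not serialize back to the exact input string.
function escapedInUserinfo(limit) {
  const changed = [];
  for (let i = 0; i < limit; i++) {
    const input = "http://x" + String.fromCharCode(i) + "@x/";
    let href;
    try {
      href = new URL(input).href;
    } catch (e) {
      // Record parse failures instead of aborting the whole run.
      href = "(failure)";
    }
    if (href !== input) {
      changed.push([i, href]);
    }
  }
  return changed;
}

// Usage: print each altered code point in hex next to its serialization.
for (const [codePoint, href] of escapedInUserinfo(0x100)) {
  console.log(codePoint.toString(16), href);
}
```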

Parsing

https://github.com/annevk/url/blob/master/url.js
http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Sep/0305.html has notes on file URLs in Gecko.

JavaScript libraries

For improving the API we might want to take inspiration from:

Schemes

Currently the parser does not separate out the query; this could be problematic for about and maybe mailto.

data
javascript
mailto
about (uselessly defined in RFC 6694)

IDNA

"IDNA2003" below means IDNA2003 applied to an updated version of Unicode (in theory IDNA2003 restricts Unicode to 3.2?)

Opera: http://www.alvestrand.no/pipermail/idna-update/2012-November/007455.html (IDNA2008 + deviations); the email does not mention domain label separators or fullwidth mapping, so presumably http://tools.ietf.org/html/rfc5895#section-2 is implemented too (though not entirely, see label separators)
Firefox: IDNA2003
- https://bugzilla.mozilla.org/show_bug.cgi?id=479520 (implement IDNA2008)
Safari/Chrome: IDNA2003
Internet Explorer: ?
http://mathias.html5.org/tests/url/idna2003-separators/ IDNA2003 domain label separators are supported everywhere
Browsers have no restrictions in the ASCII range either. E.g. labels with underscores work.

Which algorithms do we need? ToLabels(domain string) -> list of labels (trailing dot) or failure. ToASCII(label) -> ASCII label. ToUnicode(label) -> Unicode label. ToLabels should also do validation and the like. Ideally ToASCII and ToUnicode never fail, because ToLabels has already ensured validity.
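
A minimal sketch of what ToLabels might look like follows. Everything in it is an assumption for illustration: the function name toLabels, the return shape, null as the failure value, and the reading of "(trailing dot)" as "a single trailing separator is tolerated and recorded". It splits on the four IDNA2003 label separators (U+002E, U+3002, U+FF0E, U+FF61) and treats any other empty label as a failure. Per-label validation, ToASCII, and ToUnicode are not covered.

```javascript
// Hypothetical sketch of ToLabels; not taken from any specification text.
// IDNA2003 label separators: U+002E, U+3002, U+FF0E, U+FF61.
const LABEL_SEPARATORS = /[\u002E\u3002\uFF0E\uFF61]/;

// Returns { labels, trailingDot } on success, or null on failure.
function toLabels(domain) {
  if (domain === "") {
    return null; // assumption: the empty string is a failure
  }
  const labels = domain.split(LABEL_SEPARATORS);
  let trailingDot = false;
  // A single trailing separator leaves one empty string at the end; record and drop it.
  if (labels.length > 1 && labels[labels.length - 1] === "") {
    labels.pop();
    trailingDot = true;
  }
  // Any remaining empty label (for example from "a..b" or ".") is a failure.
  if (labels.some((label) => label === "")) {
    return null;
  }
  return { labels, trailingDot };
}

// Usage: an ideographic full stop as separator, plus a trailing dot.
console.log(toLabels("www\u3002example.com."));
```

Keeping validation inside this one step is what would let later per-label ToASCII and ToUnicode steps assume well-formed input, as the notes above suggest.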
