URL: Difference between revisions

Revision as of 15:54, 11 February 2013

This documents research and notes around URLs for the URL standard.

Implementations

Tests

https://github.com/cweb/iri-tests

Variants of the following code (runs in Live DOM Viewer) are useful to test which code points are URL escaped in browsers:

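<!DOCTYPE html>
<script>
// w() is the logging helper that Live DOM Viewer provides.
var a = document.createElement("a");
var limit = 0x100;

for (var i = 0; i < limit; i++) {
  // Place each code point in the userinfo part of the URL.
  a.href = "http://x" + String.fromCharCode(i) + "@x/";
  // Any length other than that of the unescaped form means the
  // code point was escaped (or otherwise altered) on serialization.
  if (a.href.length != "http://x)@x/".length) {
    w(a.href);
  }
}
</script>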

Parsing

https://github.com/annevk/url/blob/master/url.js
http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Sep/0305.html has notes on file URLs in Gecko.

JavaScript libraries

For improving the API we might want to take inspiration from:

Schemes

Apart from the scheme-types listed below, the URL Standard identifies "relative schemes", used for parsing a URL into a parsed URL.

Purpose-specific schemes

URL schemes are purpose-specific schemes if they only work in one context. These only work for WebSocket:

ws
wss

Fetch schemes

URL schemes are fetch schemes if fetching the URL results in either a network error or a resource with an associated MIME type (potentially sniffed).

ftp, http, https: These can all be used directly by the corresponding protocol.
file: Needs platform-specific interpretation and mapping to a resource on the local file system.
data: Needs its resource and MIME type information retrieved from its scheme data/query.
blob
about: The resource is effectively the result of passing scheme data to a hash table (not sure if case-sensitive or not; definitely no percent decoding). Query and fragment can be used by the resource.
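As an illustration of the data entry above, the following is a minimal sketch of how the MIME type and payload might be pulled out of a data URL. The function name splitDataUrl and the shape of its return value are hypothetical, the default media type is the one RFC 2397 gives for an omitted type, and the payload is deliberately left undecoded.

```javascript
// Hypothetical helper, not part of any standard API.
// Splits "data:<media type>[;base64],<payload>" into its parts.
function splitDataUrl(url) {
  const match = /^data:([^,]*),(.*)$/is.exec(url);
  if (match === null) {
    return null; // not a data URL, or no comma separating type and payload
  }
  let mediaType = match[1];
  const isBase64 = /;base64$/i.test(mediaType);
  if (isBase64) {
    mediaType = mediaType.slice(0, -";base64".length);
  }
  if (mediaType === "") {
    // RFC 2397 default when the media type is omitted.
    mediaType = "text/plain;charset=US-ASCII";
  }
  // The payload is returned as-is; decoding is left to the caller.
  return { mediaType, isBase64, payload: match[2] };
}
```

A caller would still need to base64-decode or percent-decode the payload, and possibly sniff the type, before treating the result as a resource.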

(The same-origin definition should maybe account for about/blob/data.)

Navigate schemes

The "fetch schemes" -> use "fetch"
javascript
Not the "purpose-specific schemes" -> error
All other schemes (including "external schemes")
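The list above reads like a decision procedure, which could be sketched roughly as follows. This is one reading of these notes rather than a specified algorithm: it assumes the third entry means that purpose-specific schemes cannot be navigated to, and the function name and returned labels are hypothetical.

```javascript
// Scheme sets copied from the lists in these notes.
const FETCH_SCHEMES = new Set(["ftp", "http", "https", "file", "data", "blob", "about"]);
const PURPOSE_SPECIFIC_SCHEMES = new Set(["ws", "wss"]);

// Hypothetical helper: returns a label for how navigation would proceed.
function navigateAction(scheme) {
  const s = scheme.toLowerCase();
  if (FETCH_SCHEMES.has(s)) {
    return "fetch";
  }
  if (s === "javascript") {
    return "javascript";
  }
  if (PURPOSE_SPECIFIC_SCHEMES.has(s)) {
    return "error"; // assumption: ws/wss cannot be navigated to
  }
  // Everything else, including external schemes such as mailto.
  return "other";
}
```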

External schemes

Depending on the context, schemes not listed above will either launch an external application or result in a network error. Examples:

mailto
skype

IDNA

Definitions

IDNA2003+: IDNA2003 with Unicode updated to the latest version. (So not NFKC from Unicode 3.2, although Python might do that.) Restrictions on display might be in place.
IDNA2008+: IDNA2008 with RFC 5895 section 2 mapping and IDNA2003 domain label separators. Display is restricted to IDNA2008, lookup is unrestricted (everything gets Punycoded).
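To make the IDNA2008+ mapping more concrete, here is a rough sketch of the pre-processing it implies: the RFC 5895 section 2 steps (lowercasing, width mapping, NFC) followed by the IDNA2003 label separators. The function name is hypothetical, and the width step is an approximation that applies NFKC to each code point in the Halfwidth and Fullwidth Forms block (U+FF00 to U+FFEF) rather than consulting decomposition types directly.

```javascript
// Hypothetical helper approximating RFC 5895 section 2 plus
// IDNA2003 label separators. Not a complete IDNA implementation.
function mapForIdna2008Plus(domain) {
  // 1. Map uppercase characters to lowercase.
  let out = domain.toLowerCase();
  // 2. Map fullwidth and halfwidth characters to their decompositions
  //    (approximated by per-character NFKC within U+FF00..U+FFEF).
  out = out.replace(/[\uFF00-\uFFEF]/g, (ch) => ch.normalize("NFKC"));
  // 3. Normalize the whole string to NFC.
  out = out.normalize("NFC");
  // 4. Treat the IDNA2003 separators U+3002, U+FF0E and U+FF61 as U+002E.
  return out.replace(/[\u3002\uFF0E\uFF61]/g, ".");
}
```

The result would still need to be split into labels and each non-ASCII label Punycoded before lookup.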

Implementations

IDNA2003+: Safari, Chrome, Firefox, Internet Explorer
IDNA2008+: Opera

Tests

http://mathias.html5.org/tests/url/idna2003-separators/ IDNA2003 domain label separators are supported everywhere

Algorithms

ToLabels(domain string) -> ASCII-label list (empty label at the end signifies trailing dot) or failure.
ToASCII(Unicode-label) -> ASCII-label.
ToUnicode(ASCII-label) -> Unicode label.

(For convenience maybe ToASCII and ToUnicode should accept lists too.)
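A minimal sketch of what ToLabels might look like, under several assumptions that these notes do not settle: labels are split on the four IDNA2003 separators, only the final label may be empty, failure is represented as null, and ToASCII is passed in by the caller. The default asciiOnly placeholder stands in for a real ToASCII and simply rejects non-ASCII labels instead of Punycoding them.

```javascript
// IDNA2003 label separators: U+002E, U+3002, U+FF0E, U+FF61.
const LABEL_SEPARATORS = /[\u002E\u3002\uFF0E\uFF61]/;

// Placeholder for ToASCII (hypothetical): accepts only labels that are
// already ASCII. A real implementation would map and Punycode here.
function asciiOnly(label) {
  return /^[\x00-\x7F]*$/.test(label) ? label.toLowerCase() : null;
}

// Returns an array of ASCII labels, or null for failure.
// An empty string as the last element signifies a trailing dot.
function toLabels(domain, toASCII = asciiOnly) {
  const parts = domain.split(LABEL_SEPARATORS);
  const labels = [];
  for (let i = 0; i < parts.length; i++) {
    const isLast = i === parts.length - 1;
    if (parts[i] === "") {
      // Assumption: an empty label is only valid as a trailing dot.
      if (!isLast || parts.length === 1) {
        return null;
      }
      labels.push("");
      continue;
    }
    const ascii = toASCII(parts[i]);
    if (ascii === null) {
      return null;
    }
    labels.push(ascii);
  }
  return labels;
}
```

Taking toASCII as a parameter keeps the splitting logic separate from the choice between IDNA2003+ and IDNA2008+ processing.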

UI

Note that this has potential security implications too, but does not matter for interoperability.

http://www.chromium.org/developers/design-documents/idn-in-google-chrome (also includes summary for other browsers)
https://wiki.mozilla.org/IDN_Display_Algorithm
http://www.alvestrand.no/pipermail/idna-update/2011-December/date.html (has lots of background discussion)

Notes

Input to DNS is a byte array. (This means that "_" and byte 0x03 can be valid input. Not sure whether "." works within a label. Higher than 0x7F cannot happen if IDNA is used.)
DNS is of course not the only system in place, but browsers do not seem to care as far as mapping is concerned.
http://www.unicode.org/mail-arch/unicode-ml/y2011-m07/0036.html
http://www.unicode.org/mail-arch/unicode-ml/y2011-m07/0057.html
http://tools.ietf.org/html/rfc6055 has historical deliberations

@@ Line 68: / Line 68: @@
 ; about : The resource is effectively the result of passing scheme data to a hash table (not sure if case-sensitive or not; definitely no percent decoding). Query and fragment can be used by the resource.
-(The same-origin definition should maybe account for about/blob/data. javascript should always be special-cased, the rest is then handled automatically.)
+(The same-origin definition should maybe account for about/blob/data.)
 === Navigate schemes ===
