URL
This page collects research and notes on URLs for the URL standard.
Implementations
- http://trac.webkit.org/browser/trunk/Source/WebCore/platform/KURL.cpp
- http://trac.webkit.org/browser/trunk/Source/WebCore/platform/KURLWTFURL.cpp
- http://trac.webkit.org/browser/trunk/Source/WebCore/platform/KURLGoogle.cpp
- http://trac.webkit.org/browser/trunk/Source/WebCore/platform/network/DataURL.cpp (data URLs)
- http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsStandardURL.cpp
- http://mxr.mozilla.org/mozilla-central/source/dom/src/jsurl/nsJSProtocolHandler.cpp (javascript URLs)
- http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/misc/prnetdb.c#1544 (IPv6)
Tests
Variants of the following code (runs in Live DOM Viewer) are useful to test which code points are URL escaped in browsers:
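<!DOCTYPE html>
<script>
  // Put each code point below 0x100 into the userinfo position and log the
  // serialized URL whenever its length differs from the unescaped form.
  var a = document.createElement("a"),
      i = 0,
      cp = 0x100
  while (i < cp) {
    a.href = "http://x" + String.fromCharCode(i) + "@x/"
    if (a.href.length != "http://x)@x/".length) {
      w(a.href) // w() is the logging helper provided by Live DOM Viewer
    }
    i++
  }
</script>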
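The same probe can also be run outside a browser. What follows is a minimal sketch, assuming a runtime with a global `URL` constructor (for example Node.js). The name `probeUserinfoEscaping` is hypothetical, and the results reflect that runtime's parser rather than any particular browser.

```javascript
// Hypothetical helper: reports code points whose presence in the userinfo
// position changes the length of the serialized URL (or makes parsing fail).
function probeUserinfoEscaping(limit) {
  const baseline = "http://x)@x/".length; // same reference length as the browser test
  const changed = new Map();
  for (let i = 0; i < limit; i++) {
    let href;
    try {
      href = new URL("http://x" + String.fromCharCode(i) + "@x/").href;
    } catch (e) {
      href = null; // the parser rejected the input outright
    }
    if (href === null || href.length !== baseline) {
      changed.set(i, href);
    }
  }
  return changed;
}

const results = probeUserinfoEscaping(0x100);
for (const [cp, href] of results) {
  console.log("U+" + cp.toString(16).toUpperCase().padStart(4, "0"), href);
}
```

As in the browser version, a length difference does not always mean escaping: a code point that is stripped or that changes how the URL is split into components shows up in the output as well.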
Parsing
- https://github.com/annevk/url/blob/master/url.js
- http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Sep/0305.html has notes on file URLs in Gecko.
JavaScript libraries
For improving the API we might want to take inspiration from:
- http://medialize.github.com/URI.js/
- https://github.com/joyent/node/blob/master/doc/api/url.markdown
- https://github.com/bestiejs/punycode.js (just Punycode)
Schemes
Post-processing of parsed URLs:
- ftp
- gopher
- http
- https
- ws
- wss
- These can all be used directly by the corresponding protocol.
- file
- Needs platform-specific interpretation and mapping to a resource on the local file system.
- data
- Needs its resource and MIME type information retrieved from its scheme data/query.
- javascript
- Needs its resource retrieved from its scheme data/query/fragment.
- mailto
- *No resource.* Needs information extracted from scheme data/query. The fragment is not used, right?
- about
- The resource is effectively the result of passing scheme data to a hash table (not sure if case-sensitive or not; definitely no percent decoding). Query and fragment can be used by the resource.
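The per-scheme post-processing above can be sketched as a dispatch on the parsed scheme. This is only an illustration under stated assumptions: it relies on a runtime with a global `URL` constructor that exposes scheme data as `pathname`, and `postProcess`, its return shapes, and the `ABOUT_RESOURCES` table are hypothetical names, not part of any specification.

```javascript
// Hypothetical sketch; names and return shapes are illustrative assumptions.
const NETWORK_SCHEMES = new Set(["ftp", "gopher", "http", "https", "ws", "wss"]);
const ABOUT_RESOURCES = new Map([["blank", ""]]); // stand-in for the hash table

function postProcess(input) {
  const url = new URL(input);
  const scheme = url.protocol.slice(0, -1); // strip the trailing ":"

  if (NETWORK_SCHEMES.has(scheme)) {
    // Handed to the corresponding protocol as-is.
    return { kind: "network", scheme, href: url.href };
  }

  // Assumption: for the remaining schemes, `pathname` holds the scheme data.
  const schemeData = url.pathname;

  switch (scheme) {
    case "file":
      // Mapping to a local resource is platform-specific and omitted here.
      return { kind: "file", path: schemeData };
    case "data": {
      // Resource and MIME type come from scheme data plus query.
      const raw = schemeData + url.search;
      const comma = raw.indexOf(",");
      if (comma === -1) return { kind: "invalid", scheme };
      return { kind: "data", mimeType: raw.slice(0, comma), body: raw.slice(comma + 1) };
    }
    case "javascript":
      // Resource comes from scheme data, query, and fragment together.
      return { kind: "script", source: schemeData + url.search + url.hash };
    case "mailto":
      // No resource; only information extracted from scheme data and query.
      return { kind: "mailto", to: schemeData, fields: Object.fromEntries(url.searchParams) };
    case "about":
      // Table lookup on the raw scheme data, with no percent decoding.
      return {
        kind: "about",
        known: ABOUT_RESOURCES.has(schemeData),
        query: url.search,
        fragment: url.hash,
      };
    default:
      return { kind: "unknown", scheme };
  }
}

console.log(postProcess("data:text/plain,hello"));
console.log(postProcess("mailto:a@example.org?subject=Hi"));
```

The `about` lookup here is case-sensitive only because `Map` is; the notes leave that question open.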
IDNA
Definitions:
- IDNA2003+: IDNA2003 with Unicode updated to the latest version. (So not NFKC from Unicode 3.2, although Python might do that.) Restrictions on display might be in place.
- IDNA2008+: IDNA2008 with RFC 5895 section 2 mapping and IDNA2003 domain label separators. Display is restricted to IDNA2008, lookup is unrestricted (everything gets Punycoded).
Implementations:
- IDNA2003+: Safari, Chrome, Firefox, Internet Explorer
- IDNA2008+: Opera
Tests:
- http://mathias.html5.org/tests/url/idna2003-separators/ (IDNA2003 domain label separators are supported everywhere.)
Required algorithms:
- ToLabels(domain string) -> ASCII-label list (empty label at the end signifies trailing dot) or failure.
- ToASCII(Unicode-label) -> ASCII-label.
- ToUnicode(ASCII-label) -> Unicode label.
(For convenience maybe ToASCII and ToUnicode should accept lists too.)
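The shape of these three algorithms can be sketched as follows. This is a rough illustration, not a definition: it assumes Node.js and delegates the actual mapping to `domainToASCII` and `domainToUnicode` from its `url` module, so it follows whatever IDNA processing that runtime implements rather than either definition above. The function names mirror the list but are hypothetical, and treating an empty non-final label as failure is an assumption.

```javascript
const { domainToASCII, domainToUnicode } = require("url");

// IDNA2003 domain label separators: U+002E, U+3002, U+FF0E, U+FF61.
const SEPARATORS = /[\u002E\u3002\uFF0E\uFF61]/;

// Unicode label -> ASCII label, or null on failure.
function toASCII(label) {
  if (label === "") return "";
  const ascii = domainToASCII(label); // returns "" for invalid input
  return ascii === "" ? null : ascii;
}

// ASCII label -> Unicode label.
function toUnicode(asciiLabel) {
  return domainToUnicode(asciiLabel);
}

// Domain string -> list of ASCII labels, or null on failure.
// An empty label at the end signifies a trailing dot.
function toLabels(domain) {
  const parts = domain.split(SEPARATORS);
  const labels = [];
  for (let i = 0; i < parts.length; i++) {
    const isLast = i === parts.length - 1;
    if (parts[i] === "") {
      // Assumption: an empty label is only allowed as the trailing label
      // of a non-empty domain.
      if (!isLast || parts.length === 1) return null;
      labels.push("");
      continue;
    }
    const ascii = toASCII(parts[i]);
    if (ascii === null) return null;
    labels.push(ascii);
  }
  return labels;
}

console.log(toLabels("example.com.")); // [ 'example', 'com', '' ]
```

Returning `null` for failure keeps the list type unambiguous; accepting lists in `toASCII` and `toUnicode`, as suggested above, would be a small wrapper around these.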
Notes:
- Input to DNS is a byte array. (This means that "_" and byte 0x03 can be valid input. Not sure whether "." works within a label. Bytes higher than 0x7F cannot occur if IDNA is used.)
- DNS is of course not the only system in place, but browsers do not seem to care as far as mapping is concerned.
- http://www.unicode.org/mail-arch/unicode-ml/y2011-m07/0036.html http://www.unicode.org/mail-arch/unicode-ml/y2011-m07/0057.html
- http://tools.ietf.org/html/rfc6055 has historical deliberations