A user account is required in order to edit this wiki, but we've had to disable public user registrations due to spam.
To request an account, ask an autoconfirmed user on Chat (such as one of these permanent autoconfirmed members).
URL: Difference between revisions
Jump to navigation
Jump to search
(add some proposed terminology) |
(→Parsing: update some states with new findings based on offline experiments in javascript) |
||
| Line 39: | Line 39: | ||
tokenize(urlstr) | tokenize(urlstr) | ||
SCHEME | SCHEME START | ||
if char is in ALPHA | if char is in ALPHA | ||
buffer += char | buffer += char | ||
-> SCHEME | -> SCHEME | ||
else | else | ||
unconsume char | unconsume char | ||
-> NO SCHEME | -> NO SCHEME | ||
SCHEME | SCHEME | ||
if char is in ALPHA / DIGIT / "+" / "-" / "." | if char is in ALPHA / DIGIT / "+" / "-" / "." | ||
buffer += char | buffer += char | ||
| Line 54: | Line 54: | ||
url.scheme = buffer.toASCIILowercase() | url.scheme = buffer.toASCIILowercase() | ||
buffer = "" | buffer = "" | ||
-> | if url.scheme is not hierarchical (data:) | ||
-> PATH | |||
elif baseURL and url.scheme is baseURL.scheme (http:?test) | |||
-> RELATIVE | |||
else (https://test.com/) | |||
-> AUTHORITY START | |||
else: | else: | ||
input.reset() | input.reset() | ||
-> NO SCHEME | -> NO SCHEME | ||
NO SCHEME | NO SCHEME | ||
| Line 73: | Line 70: | ||
else | else | ||
-> RELATIVE | -> RELATIVE | ||
RELATIVE | RELATIVE | ||
| Line 93: | Line 84: | ||
url.scheme = baseURL.scheme | url.scheme = baseURL.scheme | ||
url.authority = baseURL.authority | url.authority = baseURL.authority | ||
-> PATH | -> HIERARCHICAL PATH | ||
elif char is "?" | elif char is "?" | ||
| Line 112: | Line 103: | ||
url.authority = baseURL.authority | url.authority = baseURL.authority | ||
prepend input by baseURL.path up to the last / | prepend input by baseURL.path up to the last / | ||
-> PATH | -> HIERARCHICAL PATH | ||
AUTHORITY START | AUTHORITY START | ||
| Line 123: | Line 114: | ||
... | ... | ||
PATH | PATH (maybe merge with HIERARCHICAL PATH) | ||
if curChar is "#" | |||
FRAGMENT | |||
else | |||
... | |||
HIERARCHICAL PATH | |||
if char is "?" | if char is "?" | ||
-> QUERY | -> QUERY | ||
Revision as of 14:27, 17 July 2012
This documents research and notes around the URL specification.
Implementations
- http://trac.webkit.org/browser/trunk/Source/WebCore/platform/KURL.cpp
- http://trac.webkit.org/browser/trunk/Source/WebCore/platform/KURLWTFURL.cpp
- http://trac.webkit.org/browser/trunk/Source/WebCore/platform/KURLGoogle.cpp
- http://trac.webkit.org/browser/trunk/Source/WebCore/platform/network/DataURL.cpp (data URLs)
Tests
Terminology
- URL string
- What you find in attribute values, property values, method parameters, etc.
- parse a URL string url using base URL base
- Turning a URL string into a URL by using a base URL.
- URL
- An in-memory representation of a URL with various properties as elaborated on by model below.
- URL interface/object
- JavaScript representation of a URL.
Model
URL (.href) - invalid? - scheme (.protocol) - authority - username (proposed .username) - password (proposed .password) - ip/host (.hostname) - port (.port) - path (.pathname) - query (.search) - fragment (.hash)
Parsing
parse (urlstr, optional baseURL)
url = new URL
tokenize(urlstr)
SCHEME START
if char is in ALPHA
buffer += char
-> SCHEME
else
unconsume char
-> NO SCHEME
SCHEME
if char is in ALPHA / DIGIT / "+" / "-" / "."
buffer += char
-> continue
elif char is ":"
url.scheme = buffer.toASCIILowercase()
buffer = ""
if url.scheme is not hierarchical (data:)
-> PATH
elif baseURL and url.scheme is baseURL.scheme (http:?test)
-> RELATIVE
else (https://test.com/)
-> AUTHORITY START
else:
input.reset()
-> NO SCHEME
NO SCHEME
if not baseURL or baseURL.scheme is not hierarchical
url.invalid = true
return url
else
-> RELATIVE
RELATIVE
if char is EOI (end-of-input)
url = baseURL
url.fragment = null
exit
elif char is "/" or char is "\"
if next char "/" or next char is "\"
url.scheme = baseURL.scheme
-> AUTHORITY START
else
url.scheme = baseURL.scheme
url.authority = baseURL.authority
-> HIERARCHICAL PATH
elif char is "?"
url.scheme = baseURL.scheme
url.authority = baseURL.authority
url.path = baseURL.path
-> QUERY
elif char is "#"
url.scheme = baseURL.scheme
url.authority = baseURL.authority
url.path = baseURL.path
url.query = baseURL.query
-> FRAGMENT
else
url.scheme = baseURL.scheme
url.authority = baseURL.authority
prepend input by baseURL.path up to the last /
-> HIERARCHICAL PATH
AUTHORITY START
if char is "/" or char is "\"
-> continue
else
-> AUTHORITY
AUTHORITY
...
PATH (maybe merge with HIERARCHICAL PATH)
if curChar is "#"
FRAGMENT
else
...
HIERARCHICAL PATH
if char is "?"
-> QUERY
if char is "#"
-> FRAGMENT
else
buffer += char
QUERY
if char is "#"
-> FRAGMENT
FRAGMENT
...