A user account is required in order to edit this wiki, but we've had to disable public user registrations due to spam.

To request an account, ask an autoconfirmed user on IRC (such as one of these permanent autoconfirmed members).

Difference between revisions of "URL"

From WHATWG Wiki
Jump to: navigation, search
(add some proposed terminology)
(Parsing: update some states with new findings based on offline experiments in javascript)
Line 39: Line 39:
 
   tokenize(urlstr)
 
   tokenize(urlstr)
 
   
 
   
   SCHEME CHECK START
+
   SCHEME START
 
     if char is in ALPHA
 
     if char is in ALPHA
 
       buffer += char
 
       buffer += char
       -> SCHEME CHECK NEXT
+
       -> SCHEME
 
     else
 
     else
 
       unconsume char
 
       unconsume char
 
       -> NO SCHEME
 
       -> NO SCHEME
 
   
 
   
   SCHEME CHECK NEXT
+
   SCHEME
 
     if char is in ALPHA / DIGIT / "+" / "-" / "."
 
     if char is in ALPHA / DIGIT / "+" / "-" / "."
 
       buffer += char
 
       buffer += char
Line 54: Line 54:
 
       url.scheme = buffer.toASCIILowercase()
 
       url.scheme = buffer.toASCIILowercase()
 
       buffer = ""
 
       buffer = ""
       -> SCHEME
+
       if url.scheme is not hierarchical (data:)
 +
        -> PATH
 +
      elif baseURL and url.scheme is baseURL.scheme (http:?test)
 +
        -> RELATIVE
 +
      else  (https://test.com/)
 +
        -> AUTHORITY START
 
     else:
 
     else:
 
       input.reset()
 
       input.reset()
 
       -> NO SCHEME
 
       -> NO SCHEME
 
  SCHEME
 
    if url.scheme is not hierarchical (data:)
 
      -> NON-HIERARCHICAL
 
    elif baseURL and url.scheme is baseURL.scheme (http:?test)
 
      -> RELATIVE
 
    else  (https://test.com/)
 
      -> AUTHORITY START
 
 
   
 
   
 
   NO SCHEME
 
   NO SCHEME
Line 73: Line 70:
 
     else
 
     else
 
       -> RELATIVE
 
       -> RELATIVE
 
  NON-HIERARCHICAL (could merge with PATH)
 
    if curChar is "#"
 
      FRAGMENT
 
    else
 
      ...
 
 
   
 
   
 
   RELATIVE
 
   RELATIVE
Line 93: Line 84:
 
         url.scheme = baseURL.scheme
 
         url.scheme = baseURL.scheme
 
         url.authority = baseURL.authority
 
         url.authority = baseURL.authority
         -> PATH
+
         -> HIERARCHICAL PATH
 
   
 
   
 
     elif char is "?"
 
     elif char is "?"
Line 112: Line 103:
 
       url.authority = baseURL.authority
 
       url.authority = baseURL.authority
 
       prepend input by baseURL.path up to the last /
 
       prepend input by baseURL.path up to the last /
       -> PATH
+
       -> HIERARCHICAL PATH
 
   
 
   
 
   AUTHORITY START
 
   AUTHORITY START
Line 123: Line 114:
 
     ...
 
     ...
 
   
 
   
   PATH
+
   PATH (maybe merge with HIERARCHICAL PATH)
 +
    if curChar is "#"
 +
      FRAGMENT
 +
    else
 +
      ...
 +
 +
  HIERARCHICAL PATH
 
     if char is "?"
 
     if char is "?"
 
       -> QUERY
 
       -> QUERY

Revision as of 14:27, 17 July 2012

This documents research and notes around the URL specification.

Implementations

Tests

Terminology

URL string
What you find in attribute values, property values, method parameters, etc.
parse a URL string url using base URL base
Turning a URL string into a URL by using a base URL.
URL
An in-memory representation of a URL with various properties as elaborated on by model below.
URL interface/object
JavaScript representation of a URL.

Model

URL (.href)
- invalid?
- scheme (.protocol)
- authority
  - username (proposed .username)
  - password (proposed .password)
  - ip/host (.hostname)
  - port (.port)
- path (.pathname)
- query (.search)
- fragment (.hash)

Parsing

parse (urlstr, optional baseURL)
 url = new URL
 tokenize(urlstr)

 SCHEME START
   if char is in ALPHA
     buffer += char
     -> SCHEME
   else
     unconsume char
     -> NO SCHEME

 SCHEME
   if char is in ALPHA / DIGIT / "+" / "-" / "."
     buffer += char
     -> continue
   elif char is ":"
     url.scheme = buffer.toASCIILowercase()
     buffer = ""
     if url.scheme is not hierarchical (data:)
       -> PATH
     elif baseURL and url.scheme is baseURL.scheme (http:?test)
       -> RELATIVE
     else  (https://test.com/)
       -> AUTHORITY START
   else:
     input.reset()
     -> NO SCHEME

 NO SCHEME
   if not baseURL or baseURL.scheme is not hierarchical
     url.invalid = true
     return url
   else
     -> RELATIVE

 RELATIVE
   if char is EOI (end-of-input)
     url = baseURL
     url.fragment = null
     exit

   elif char is "/" or char is "\"
     if next char "/" or next char is "\"
       url.scheme = baseURL.scheme
       -> AUTHORITY START
     else
       url.scheme = baseURL.scheme
       url.authority = baseURL.authority
       -> HIERARCHICAL PATH

   elif char is "?"
       url.scheme = baseURL.scheme
       url.authority = baseURL.authority
       url.path = baseURL.path
       -> QUERY

   elif char is "#"
       url.scheme = baseURL.scheme
       url.authority = baseURL.authority
       url.path = baseURL.path
       url.query = baseURL.query
       -> FRAGMENT

   else
     url.scheme = baseURL.scheme
     url.authority = baseURL.authority
     prepend input by baseURL.path up to the last /
     -> HIERARCHICAL PATH

 AUTHORITY START
   if char is "/" or char is "\"
     -> continue
   else
     -> AUTHORITY

 AUTHORITY
   ...

 PATH (maybe merge with HIERARCHICAL PATH)
   if curChar is "#"
     FRAGMENT
   else
     ...

 HIERARCHICAL PATH
   if char is "?"
     -> QUERY
   if char is "#"
     -> FRAGMENT
   else
     buffer += char

 QUERY
   if char is "#"
     -> FRAGMENT

 FRAGMENT
   ...