Difference between revisions of "URL"

From WHATWG Wiki

Revision as of 13:58, 21 June 2012

This page documents research and notes on the URL specification.

Implementations

Model

URL (.href)
- invalid?
- scheme (.protocol)
- authority
  - username (proposed .username)
  - password (proposed .password)
  - ip/host (.hostname)
  - port (.port)
- path (.pathname)
- query (.search)
- fragment (.hash)
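
As a rough illustration of the model above, the components could be held in a pair of small records. This is only a sketch: the class names `URL` and `Authority`, the use of `None` for an absent component, and storing the port as a string are all assumptions, not something these notes settle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Authority:
    # Each field mirrors one authority component from the model.
    username: Optional[str] = None  # proposed .username
    password: Optional[str] = None  # proposed .password
    host: Optional[str] = None      # ip/host, exposed as .hostname
    port: Optional[str] = None      # exposed as .port (string type is an assumption)


@dataclass
class URL:
    invalid: bool = False                  # the "invalid?" flag
    scheme: Optional[str] = None           # exposed as .protocol
    authority: Optional[Authority] = None
    path: Optional[str] = None             # exposed as .pathname
    query: Optional[str] = None            # exposed as .search
    fragment: Optional[str] = None         # exposed as .hash


# Hypothetical usage: a record for something like http://example.org/
example = URL(scheme="http", authority=Authority(host="example.org"), path="/")
```

How .href would be serialized from these parts is not covered here.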

Parsing

parse (urlstr, optional baseURL)
 url = new URL
 tokenize(urlstr)

 SCHEME CHECK START
   if char is in ALPHA
     buffer += char
     -> SCHEME CHECK NEXT
   else
     unconsume char
     -> NO SCHEME

 SCHEME CHECK NEXT
   if char is in ALPHA / DIGIT / "+" / "-" / "."
     buffer += char
     -> continue
   elif char is ":"
     url.scheme = buffer.toASCIILowercase()
     buffer = ""
     -> SCHEME
   else
     input.reset()
     buffer = ""
     -> NO SCHEME

 SCHEME
   if url.scheme is not hierarchical (data:)
     -> NON-HIERARCHICAL
   elif baseURL and url.scheme is baseURL.scheme (http:?test)
     -> RELATIVE
   else (https://test.com/)
     -> AUTHORITY START

 NO SCHEME
   if not baseURL or baseURL.scheme is not hierarchical
     url.invalid = true
     return url
   else
     -> RELATIVE

 NON-HIERARCHICAL (could merge with PATH)
   if char is "#"
     -> FRAGMENT
   else
     ...

 RELATIVE
   if char is EOI (end-of-input)
     url = baseURL
     url.fragment = null
     return url

   elif char is "/" or char is "\"
     if next char is "/" or next char is "\"
       url.scheme = baseURL.scheme
       -> AUTHORITY START
     else
       url.scheme = baseURL.scheme
       url.authority = baseURL.authority
       -> PATH

   elif char is "?"
     url.scheme = baseURL.scheme
     url.authority = baseURL.authority
     url.path = baseURL.path
     -> QUERY

   elif char is "#"
     url.scheme = baseURL.scheme
     url.authority = baseURL.authority
     url.path = baseURL.path
     url.query = baseURL.query
     -> FRAGMENT

   else
     url.scheme = baseURL.scheme
     url.authority = baseURL.authority
     prepend baseURL.path, up to and including its last "/", to input
     -> PATH

 AUTHORITY START
   if char is "/" or char is "\"
     -> continue
   else
     -> AUTHORITY

 AUTHORITY
   ...

 PATH
   if char is "?"
     -> QUERY
   elif char is "#"
     -> FRAGMENT
   else
     buffer += char

 QUERY
   if char is "#"
     -> FRAGMENT

 FRAGMENT
   ...
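
The two scheme-check states are specified fully enough to try out in isolation. The following Python sketch mirrors SCHEME CHECK START and SCHEME CHECK NEXT only; the function name `split_scheme`, its return shape, and the handling of input that ends before a ":" is seen are assumptions, since the notes do not cover those.

```python
import string
from typing import Optional, Tuple

ALPHA = set(string.ascii_letters)
SCHEME_CHARS = ALPHA | set(string.digits) | set("+-.")


def split_scheme(urlstr: str) -> Tuple[Optional[str], str]:
    """Return (scheme, rest); scheme is None where NO SCHEME would be entered."""
    # SCHEME CHECK START: the first character must be in ALPHA.
    if not urlstr or urlstr[0] not in ALPHA:
        return None, urlstr
    buffer = urlstr[0]
    # SCHEME CHECK NEXT: keep buffering scheme characters until ":".
    for i in range(1, len(urlstr)):
        char = urlstr[i]
        if char in SCHEME_CHARS:
            buffer += char
        elif char == ":":
            # url.scheme = buffer.toASCIILowercase(); buffer holds ASCII only.
            return buffer.lower(), urlstr[i + 1:]
        else:
            # input.reset(): hand the whole input back to NO SCHEME.
            return None, urlstr
    # Input ended without ":"; treated as no scheme (assumption, not in the notes).
    return None, urlstr


print(split_scheme("HTTP://example.org/"))  # ('http', '//example.org/')
print(split_scheme("foo/bar:baz"))          # (None, 'foo/bar:baz')
```

Returning the untouched input in the failure branches plays the role of input.reset(): whatever state runs next sees the string from its first character.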