Difference between revisions of "URL"

From WHATWG Wiki
Revision as of 12:02, 15 June 2012

This documents research and notes around the URL specification.

Implementations

* http://trac.webkit.org/browser/trunk/Source/WebCore/platform/KURLGoogle.cpp
* http://trac.webkit.org/browser/trunk/Source/WebCore/platform/network/DataURL.cpp (data URLs)

Model

URL (.href)
- invalid?
- scheme (.protocol)
- authority
  - username (proposed .username)
  - password (proposed .password)
  - ip/host (.hostname)
  - port (.port)
- path (.pathname)
- query (.search)
- fragment (.hash)
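
The model above could be written down as a pair of record types. This is only a sketch: the class names `URLRecord` and `Authority` are made up for illustration, every component is assumed to be an optional string, and the comments give the attribute each field is listed against in the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Authority:
    username: Optional[str] = None  # proposed .username
    password: Optional[str] = None  # proposed .password
    host: Optional[str] = None      # .hostname (IP address or host name)
    port: Optional[str] = None      # .port


@dataclass
class URLRecord:
    invalid: bool = False                  # the "invalid?" flag
    scheme: Optional[str] = None           # .protocol
    authority: Optional[Authority] = None  # username, password, host, port
    path: Optional[str] = None             # .pathname
    query: Optional[str] = None            # .search
    fragment: Optional[str] = None         # .hash


# Usage: a record for http://example.org:8080/a
url = URLRecord(
    scheme="http",
    authority=Authority(host="example.org", port="8080"),
    path="/a",
)
```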

Parsing

parse (urlstr, optional baseURL)
 url = new URL

 SCHEME-OR-RELATIVE
   FIRST SCHEME CHARACTER
     if ...
       -> REMAINING SCHEME CHARACTERS
       -> NO SCHEME
   REMAINING SCHEME CHARACTERS
     if curChar is ":"
       -> SCHEME
     ...
       -> NO SCHEME
       -> REMAINING SCHEME CHARACTERS
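
The conditions in SCHEME-OR-RELATIVE are elided above. The following sketch fills them in with the RFC 3986 scheme grammar (a letter, then letters, digits, "+", "-" or "."), which is an assumption and not something these notes settle. The function name `scan_scheme` and the lowercasing of the result are also illustrative choices.

```python
import string

# Assumed character classes, taken from the RFC 3986 "scheme" production.
SCHEME_START = set(string.ascii_letters)
SCHEME_REST = SCHEME_START | set(string.digits) | set("+-.")


def scan_scheme(urlstr):
    """Return (scheme, rest) if urlstr starts with "scheme:", else (None, urlstr)."""
    # FIRST SCHEME CHARACTER
    if not urlstr or urlstr[0] not in SCHEME_START:
        return None, urlstr  # -> NO SCHEME
    # REMAINING SCHEME CHARACTERS
    for i in range(1, len(urlstr)):
        ch = urlstr[i]
        if ch == ":":
            return urlstr[:i].lower(), urlstr[i + 1:]  # -> SCHEME
        if ch not in SCHEME_REST:
            return None, urlstr  # -> NO SCHEME
    # Input ended before a ":" was seen.
    return None, urlstr  # -> NO SCHEME
```

When no scheme is found the input is returned untouched, so the RELATIVE state can start again from the first character.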

 SCHEME
   if url.scheme is not hierarchical  (data:)
     -> NON-HIERARCHICAL
   if url.scheme is hierarchical and url.scheme is baseURL.scheme (http:?test)
     -> RELATIVE
   if url.scheme is hierarchical (https://test.com/)
     -> AUTHORITY

 NO SCHEME
   if baseURL.scheme is not hierarchical
     url.invalid = true
     return url
   else
     -> RELATIVE
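
The SCHEME and NO SCHEME decisions above can be sketched as a single dispatch function. The set of hierarchical schemes is not defined in these notes, so `HIERARCHICAL` below is a placeholder assumption, as are the function name `next_state` and the `"INVALID"` marker (standing in for `url.invalid = true; return url`). A missing base URL is treated like a non-hierarchical one, which is also an assumption.

```python
# Placeholder set; the notes do not say which schemes count as hierarchical.
HIERARCHICAL = {"http", "https", "ftp", "file"}


def next_state(scheme, base_scheme):
    """Pick the state to enter after SCHEME-OR-RELATIVE.

    scheme is the scheme parsed from the input, or None if there was none.
    base_scheme is the scheme of the base URL, or None if there is no base.
    """
    if scheme is None:
        # NO SCHEME
        if base_scheme not in HIERARCHICAL:
            return "INVALID"
        return "RELATIVE"
    # SCHEME
    if scheme not in HIERARCHICAL:
        return "NON-HIERARCHICAL"  # e.g. data:
    if scheme == base_scheme:
        return "RELATIVE"          # e.g. http:?test against an http base
    return "AUTHORITY"             # e.g. https://test.com/
```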

 NON-HIERARCHICAL
   if curChar is "#"
     FRAGMENT
   else
     ...

 RELATIVE
   if urlstr is empty
     url = baseURL
     url.fragment = null
     return url

   if curChar is either "/" or "\"
     if urlstr second character is either "/" or "\"
       url.scheme = baseURL.scheme
       AUTHORITY
     else
       url.scheme = baseURL.scheme
       url.authority = baseURL.authority
       PATH

   else if curChar is "?"
     url.scheme = baseURL.scheme
     url.authority = baseURL.authority
     url.path = baseURL.path
     QUERY

   else if curChar is "#"
     url.scheme = baseURL.scheme
     url.authority = baseURL.authority
     url.path = baseURL.path
     url.query = baseURL.query
     FRAGMENT

   else
     url.scheme = baseURL.scheme
     url.authority = baseURL.authority
     prepend baseURL.path, up to and including its last "/", to the input
     PATH
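
A minimal sketch of the RELATIVE state, reading its branches as one if / else-if chain. The `URLRecord` type, the function name `relative_state`, and the (url, next state, remaining input) return shape are illustrative assumptions. The authority is kept as a single string here for brevity, and the empty-input case copies the base rather than aliasing it.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class URLRecord:
    scheme: Optional[str] = None
    authority: Optional[str] = None
    path: Optional[str] = None
    query: Optional[str] = None
    fragment: Optional[str] = None


def relative_state(urlstr, base):
    """Return (url, next_state, remaining_input) for a relative reference."""
    if urlstr == "":
        # Same as the base URL, minus its fragment; parsing is finished.
        return replace(base, fragment=None), None, ""
    first = urlstr[0]
    if first in "/\\":
        if len(urlstr) > 1 and urlstr[1] in "/\\":
            # Scheme-relative: only the scheme comes from the base.
            return URLRecord(base.scheme), "AUTHORITY", urlstr
        # Absolute path: scheme and authority come from the base.
        return URLRecord(base.scheme, base.authority), "PATH", urlstr
    if first == "?":
        return URLRecord(base.scheme, base.authority, base.path), "QUERY", urlstr
    if first == "#":
        url = URLRecord(base.scheme, base.authority, base.path, base.query)
        return url, "FRAGMENT", urlstr
    # Relative path: prepend the base path up to and including its last "/".
    base_path = base.path or ""
    directory = base_path[: base_path.rfind("/") + 1]
    return URLRecord(base.scheme, base.authority), "PATH", directory + urlstr
```

For example, against a base with path "/a/b/c", the input "d" continues in PATH with "/a/b/d" as its input.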

 AUTHORITY
   if curChar is either "/" or "\"
     AUTHORITY
   else
     AUTHORITY-AFTER-SLASHES
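
The AUTHORITY state above loops on itself for as long as it sees a slash or backslash, which amounts to skipping any run of them. A small sketch, with the helper name `skip_authority_slashes` and the index-based interface chosen for illustration:

```python
def skip_authority_slashes(urlstr, pos=0):
    """Return the index of the first character at or after pos that is
    neither "/" nor "\\", i.e. where AUTHORITY-AFTER-SLASHES would begin."""
    while pos < len(urlstr) and urlstr[pos] in "/\\":
        pos += 1  # stay in AUTHORITY
    return pos    # -> AUTHORITY-AFTER-SLASHES
```

As written, any number of leading slashes is accepted, so "///test.com" and "//test.com" reach AUTHORITY-AFTER-SLASHES at the same host.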

 AUTHORITY-AFTER-SLASHES
   ...

 PATH
   if curChar is "?"
     QUERY
   if curChar is "#"
     FRAGMENT

 QUERY
   if curChar is "#"
     FRAGMENT

 FRAGMENT
   ...
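
The PATH, QUERY and FRAGMENT transitions above can be sketched as a small character loop. The function name and the convention that a missing query or fragment is `None` are assumptions, and so is dropping the "?" and "#" delimiters from the stored values. What happens to the other characters is not spelled out in the notes, so this sketch simply appends them to the current component.

```python
def split_path_query_fragment(rest):
    """Split the input left after the authority into (path, query, fragment)."""
    path, query, fragment = [], None, None
    state = "PATH"
    for ch in rest:
        if state == "PATH" and ch == "?":
            state, query = "QUERY", []        # PATH -> QUERY
        elif state in ("PATH", "QUERY") and ch == "#":
            state, fragment = "FRAGMENT", []  # PATH or QUERY -> FRAGMENT
        elif state == "PATH":
            path.append(ch)
        elif state == "QUERY":
            query.append(ch)                  # a later "?" stays in the query
        else:
            fragment.append(ch)               # "?" and "#" stay in the fragment
    return (
        "".join(path),
        None if query is None else "".join(query),
        None if fragment is None else "".join(fragment),
    )
```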