URL schemes: Difference between revisions
SimonSapin (talk | contribs) (→data: URLs: Allow (and suggest) some optimization.) |
SimonSapin (talk | contribs) (→data: URLs: Add a TODO for missing parsing steps) |
Revision as of 21:20, 1 December 2012
Licensing: this page is under CC0, not the MIT License.
about: URLs
about: URLs serve as identifiers, potentially with an associated resource. The identifier is given in the URL's scheme data.
Identifier | Resource | Notes |
---|---|---|
"blank" | A resource whose Content-Type is text/html;charset=utf-8 and entity body is the empty string. | |
"invalid" | - | Used to represent a network error. See also CSS Values and Units. |
"legacy-compat" | - | Used in HTML for XSLT serializers. |
"srcdoc" | - | Used in HTML for its <iframe srcdoc> feature. |
"unicorn" | A resource whose Content-Type is image/svg+xml and entity body is the contents of unicorn.svg. | Unicorn! |
To obtain a resource from an about: URL, run these steps:
- If URL's scheme data is not the literal string "blank" or "unicorn", return a network error. (URL's query and URL's fragment are simply not taken into account and can be anything.)
- Return the resource corresponding to the identifier as listed in the table above, with HTTP status code 200 and HTTP status text "OK".
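The steps above can be sketched in Python as follows. This is only an illustration: the names `Response`, `scheme_data_of` and `obtain_about_resource` are made up for this example, `None` stands in for a network error, the body used for "unicorn" is a placeholder rather than the real unicorn.svg, and the naive splitting helper is an assumption that stands in for the URL parser.

```python
from typing import NamedTuple, Optional


class Response(NamedTuple):
    status: int
    status_text: str
    content_type: str
    body: bytes


# Placeholder only: the real contents of unicorn.svg are not reproduced here.
UNICORN_SVG = b"<svg xmlns='http://www.w3.org/2000/svg'/>"

# Identifiers from the table that have an associated resource.
RESOURCES = {
    "blank": ("text/html;charset=utf-8", b""),
    "unicorn": ("image/svg+xml", UNICORN_SVG),
}


def scheme_data_of(url: str) -> str:
    """Hypothetical helper: drop the scheme, the fragment and the query.

    No case folding and no percent decoding is performed.
    """
    rest = url.split(":", 1)[1]
    rest = rest.split("#", 1)[0]
    return rest.split("?", 1)[0]


def obtain_about_resource(scheme_data: str) -> Optional[Response]:
    """Return the resource for an about: identifier, or None for a network error."""
    entry = RESOURCES.get(scheme_data)  # exact, case-sensitive match
    if entry is None:
        return None
    content_type, body = entry
    return Response(200, "OK", content_type, body)
```

Because the lookup is an exact string match on the scheme data, identifiers such as "invalid", "legacy-compat" and "srcdoc" yield a network error here, as does any identifier that differs in case or is percent-encoded.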
See also The "about" URI Scheme and "about" URI Tokens which this wiki page obsoletes.
Examples
Input | Result |
---|---|
about:blanK | network error (uppercase K) |
about:bl%61nk | network error (no percent decoding by either the URL parser or the obtain a resource algorithm) |
about:blank?teehee | works (query does not matter) |
about:blank?teehee#hihi | works (fragment does not matter either) |
javascript: URLs
javascript: URLs represent a JavaScript script.
To obtain a script from a javascript: URL, run these steps:
- Let input be the concatenation of URL's scheme data, followed by "?" and URL's query if URL's query is non-null, followed by "#" and URL's fragment if URL's fragment is non-null.
- Set input to the result of percent decoding input.
- If input starts with a U+FEFF, remove a single occurrence from the start of input.
- Return input.
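A rough Python sketch of these steps is given below. The function name `obtain_script` is hypothetical, the URL components are assumed to be already parsed (with `None` meaning null), and `urllib.parse.unquote` with UTF-8 and replacement of invalid sequences is used as an assumed stand-in for percent decoding.

```python
from typing import Optional
from urllib.parse import unquote


def obtain_script(scheme_data: str,
                  query: Optional[str] = None,
                  fragment: Optional[str] = None) -> str:
    """Rebuild the script source from the parts of a javascript: URL."""
    text = scheme_data
    if query is not None:
        text += "?" + query
    if fragment is not None:
        text += "#" + fragment
    # Assumption: percent decoding yields UTF-8 text, with invalid byte
    # sequences replaced by U+FFFD.
    text = unquote(text, encoding="utf-8", errors="replace")
    # Remove at most one leading U+FEFF.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text
```

Note that an empty but non-null query still contributes a "?" to the script, since only null components are skipped.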
See also The 'javascript' resource identifier scheme which this wiki page obsoletes.
data: URLs
See Bug 19494 and other stuff linked from there. This is a more precise definition for RFC2397.
To obtain a resource from a parsed URL whose scheme is an ASCII case-insensitive match for "data:", run these steps:
- Set input to the URL’s scheme data.
- If the URL’s query is not null, append "?" and the query to input.
- If input does not contain a U+002C COMMA code point, return a network error and abort these steps. (Note: the comma can come either from the scheme data or the query.)
- Split input at the first comma. Set header and body to the parts before and after the comma, respectively. (Issue: what if the comma is in a MIME quoted string for a parameter value? Example: data:text/plain;foo="bar,baz";charset=utf8,body)
- Set body-bytes to the result of running percent decode to bytes on body.
- If header ends with ";base64" (Issue: Match how strictly? Case sensitive or not? Allow whitespace? Percent-encoding?) then:
  - Remove the matched substring from header.
  - Decode body-bytes with the Base 64 Encoding. (Issues: Return an error on "invalid" base64? What is invalid? Also accept the URL and Filename Safe Alphabet? Mixed alphabets in the same body? Ignore which non-alphabet bytes? Missing/too little/too much padding?)
- Return a response with header as a Content-Type header and body-bytes as the body. The parsing and interpretation of header must be the same as for an HTTP Content-Type header. (Issue: what can we reference to define that?)
TODO: The algorithm is missing this part of RFC2397:
If <mediatype> is omitted, it defaults to text/plain;charset=US-ASCII. As a shorthand, "text/plain" can be omitted but the charset parameter supplied.
Note: this definition does not impose any length limit on data: URLs. When doing URL parsing followed by this algorithm, implementations are allowed to skip some intermediate steps in order to process large URLs efficiently, as long as the "black box" behavior is the same.
To percent decode to bytes, run the same algorithm as percent decode, but replace the last step with "Return bytes."
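The data: algorithm can be sketched in Python along the following lines. Several choices here are assumptions made only to get something runnable, since the open issues are not settled: `DataResponse` and `obtain_data_resource` are hypothetical names, `None` stands in for a network error, ";base64" is matched case-sensitively with no whitespace allowed, invalid base64 is treated as an error, and `urllib.parse.unquote_to_bytes` stands in for percent decode to bytes. The RFC2397 default media type from the TODO is not applied.

```python
import base64
import binascii
from typing import NamedTuple, Optional
from urllib.parse import unquote_to_bytes


class DataResponse(NamedTuple):
    content_type: str  # the raw header, to be parsed like a Content-Type
    body: bytes


def obtain_data_resource(scheme_data: str,
                         query: Optional[str] = None) -> Optional[DataResponse]:
    """Return the response for a parsed data: URL, or None for a network error."""
    text = scheme_data
    if query is not None:
        text += "?" + query
    if "," not in text:
        return None  # no comma in scheme data or query: network error
    # Split at the first comma only; commas inside quoted parameter
    # values are not treated specially (see the open issue).
    header, body = text.split(",", 1)
    body_bytes = unquote_to_bytes(body)
    if header.endswith(";base64"):  # assumption: strict, case-sensitive match
        header = header[: -len(";base64")]
        try:
            # Assumption: reject non-alphabet bytes and bad padding.
            body_bytes = base64.b64decode(body_bytes, validate=True)
        except binascii.Error:
            return None
    return DataResponse(header, body_bytes)
```

For example, `obtain_data_resource("text/plain", "a,b")` succeeds because the comma comes from the query; the header is then "text/plain?a" and the body is "b".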
Others
- ws: & wss:
- See Wikipedia
- See IANA