WHATWG Wiki - User contributions [en]

Why not conneg

2015-06-11T06:31:49Z

Mnot:

The purpose of this page is to explain what's wrong with HTTP content negotiation and why you should not suggest HTTP content negotiation as a solution to a problem.

HTTP content negotiation has four axes: negotiation by format (<code>Accept</code>), negotiation by character encoding (<code>Accept-Charset</code>), negotiation by natural language (<code>Accept-Language</code>) and negotiation by compression (<code>Accept-Encoding</code>). These axes need to be discussed separately. The only one of these that shouldn't be classified as a failure is negotiation by compression, but even it shouldn't be taken as a role model considering its verbosity for basically negotiating one bit of information.

This page explains what’s wrong with each of the existing negotiation axes. It then uses the HTML <code>video</code> element to illustrate why HTTP-based negotiation in general is worse than letting the browser choose from alternative URLs: a hypothetical HTTP-based codec negotiation is compared with the browser-side codec selection that the <code>video</code> element actually uses.

See also the [http://httpwg.github.io/specs/rfc7231.html#content.negotiation HTTP specification], which itself lists a number of disadvantages to content negotiation (the type discussed here is called "proactive content negotiation" there).

==Negotiation by compression==

In practice, in the years HTTP has existed, a wide variety of compression methods has not cropped up. For practical purposes, there are two compression modes: uncompressed and gzipped. Google experimented with Shared Dictionary Compression over HTTP, but in practice it went nowhere on the wider Web. Even the Google-designed HTTP replacement SPDY uses gzip.

These days, all major browsers support gzipped responses. In that sense, the feature is a success. However, it is terribly wasteful that each request ends up containing 23 bytes of boilerplate. It would have been much more efficient if HTTP 1.1 had made the ability to accept gzipped responses a mandatory feature so that servers would have known that they are eligible to send a gzipped response if the request says HTTP/1.1 (or higher) rather than HTTP/1.0.

One might argue that it was important for getting from the HTTP/1.0 world to the world we have now by allowing new features to be opted into one by one and that format and protocol versions are an anti-pattern on the Web. In any case, it’s sad for the 1 bit of information to take 23 bytes, especially if HTTP is to have a version number anyway and some features tied to the version number (like support for chunked responses).
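One plausible way to arrive at the 23-byte figure is to count the shortest header line that signals gzip support. The exact header value is an assumption here, since the text does not spell it out, and real browsers typically send a longer list of encodings:

```python
# Assumed minimal request header line advertising gzip support,
# including the CRLF that terminates every HTTP/1.x header line.
header_line = b"Accept-Encoding: gzip\r\n"

# 15 bytes of field name + 2 for ": " + 4 for "gzip" + 2 for CRLF
print(len(header_line))
```

Under that assumption the line costs 23 bytes per request to convey what amounts to a yes/no answer.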

==Negotiation by character encoding==

UTF-8 was introduced in 1993. By the time <code>Accept-Charset</code> was introduced, it was already obsolete in the sense that any server capable of converting to multiple encodings could have used its conversion capabilities to just convert to UTF-8. (Granted, UTF-8 support in browsers didn’t appear right away, but supporting UTF-8 sooner would have been more worthwhile than sending <code>Accept-Charset</code>.) Fortunately, we have been able to get rid of this one. Of the major browsers, only Chrome sends <code>Accept-Charset</code> anymore.

==Negotiation by natural language==

Negotiation by natural language is a failure for multiple reasons.

First, it is actually very rare for Web sites to provide content (as opposed to application user interface) in multiple languages such that the language versions are equally good and one of them could be selected without human judgment. When sites do have multiple language versions, the versions often are not equal. Typically, one language is the primary language for the site and the other language versions are incomplete, out of date or otherwise of low quality. Therefore, if the user can read more than one of the languages offered by the site, the user is most often best off choosing the language version manually, judging which of the versions the user is able to read is most likely to be of the highest quality. The negotiation algorithm tries to make this automatic, but in practice the algorithm is no substitute for the user's human judgment.

It is slightly more common for Web applications to provide their UI strings in multiple languages. In that case, automatic choice might have a chance of working, but the user has a larger time commitment to a particular application anyway, so it is more acceptable for an application to provide an explicit language-selection UI than for a content site to require the user to choose a language (which, per the previous paragraph, multilanguage content sites need to do anyway). This makes automatic negotiation less necessary for an application. Even though the initial value for the language preference for a Web application UI could be taken from what the browser suggests, the user starts using a new Web application so rarely relative to all HTTP requests that the user's browser makes that it is terribly wasteful for each HTTP request to broadcast the user's preferred language in case the request happens to be one whose headers end up seeding the preferences for a Web application. (And Web applications these days use JavaScript anyway and, therefore, could request the data from the browser instead of the browser broadcasting it with every HTTP request.)

Also, it's a problem that negotiation by natural language is about negotiating by a characteristic of the human user as opposed to a characteristic of software. That is, the browser doesn't really know the characteristics of the human user without the human user configuring the browser. Since negotiation by natural language is so rarely useful, it doesn't really make sense for the browser to advertise the configuration option a lot or insist that the user makes the configuration before browsing the Web. As a result, the browser doesn't really know about the user's language skills beyond guessing that the user might be able to read the UI language of the browser. And that's a pretty bad guess. It doesn't give any information about the other languages the user is able to read and the user might not even be able to actually read meaningful prose in the language of the browser UI. (You can get pretty far with the browser UI simply by knowing that you can type addresses into the location bar and that the arrow to the left takes you back to the previous page.)

Finally, there is an actual disincentive for configuring the browser: Even if everyone configured their language preferences in their browser, the combination of languages that a person can read would partition the population of the world into rather small buckets, in some cases making the language combination a way to identify a particular user or a smallish group of users. Since people so rarely configure their language preference, when someone does configure it, chances are that the configuration becomes uniquely or almost uniquely identifying. This can be seen as a privacy problem.

==Negotiating by format==

Negotiating by format is a failure because:

* It is actually rather rare that format alternatives can be chosen without knowing the user's intent. You might be able to choose automatically between different ways of compressing bitmaps, but choosing among e.g. a PDF and a Microsoft Word file might depend on the user's intent to print without reflowing lines due to different fonts on the system (PDF) or editing (Word) even if the user has software for reading both PDF and Microsoft Word files. Therefore, it makes sense to provide links to both the PDF and the Microsoft Word file and let the user choose which link to follow.

* You can’t rely on format negotiation as a Web author, because there are always clients that accept something they don’t declare.

* Due to the previous point, if you are a browser vendor and another vendor has shipped a browser that doesn’t declare that it supports something it supports, it doesn't make sense for you to waste bytes declaring it, either, because Web authors can’t rely (solely) on the declaration anyway.

* Negotiated responses are so rare relative to all HTTP responses that the request bloat resulting from advertising the supported formats is almost always waste.

* Past experience from the GIF to PNG transition suggests that authors are more likely to deploy one format—either the old one (with worse capabilities) or the new one (forgoing support for old browsers)—than actually bother using negotiation (as a generalization on the Web scale; there may be individual counterexamples that round to practical nothingness).

* It never pays to advertise a format once all current browsers support it but there are still old browsers that don't. (E.g. SVG as of 2013.) Since there are browsers that support the format that don't advertise it, the earlier point about not being able to rely on the declaration applies. However, since non-support is a past phenomenon, it is safe to UA sniff for the legacy browsers. If all current browsers support the format, there probably cannot be successful future browsers that don't support the format, so the usual problem with UA sniffing, where you assume that a current defect stays present in future versions of a given browser, does not apply.

==Server-side choice is worse for intermediate caches than browser-side choice==

Aside from script-driven HTTP requests (XHR), the HTTP requests made by browsers fall into two main categories: requests that arise from navigation
(HTML most of the time) and requests that arise from resource
inclusions (images, videos, audio, style sheets, fonts and scripts).

For resource inclusions, the server has always already sent
(unless the site uses SPDY push) an HTML document to the browser and
then the browser makes the requests in response to what it sees in
that document. That is, the server has had an opportunity to say something
to the browser already.

It is friendlier to intermediate caches if the resource that the server has already sent to the browser (typically an HTML document) lists alternative URLs for a resource inclusion request, with information that allows the browser to choose which one of the URLs to fetch, than if the browser is given one URL and the server varies what gets served from that URL.

Consider the actual design of the video element versus a hypothetical HTTP-based negotiation design
like this: There are no <code>source</code> elements. There is just the <code>src</code>
attribute on the <code>video</code> element. When making the request for the URL
given in the <code>src</code> attribute, the browser sends an <code>Accept-Codecs</code> HTTP
header that declares the supported container and codec combinations.
The server decides which video file to respond with depending on the
request header. The server sets the <code>Vary: Accept-Codecs</code> header on the response.

Now, intermediate caches have to use the value of the <code>Accept-Codecs</code>
header as part of the cache key.

Suppose the origin server has
two alternative video files MP4/H.264/AAC and WebM/VP8/Vorbis.
Ideally, this should result in at most two distinct cache entries and two cache keys in
an intermediate cache. However, in this scenario, the worst case number for
the cache keys in the intermediary is the number of distinct
<code>Accept-Codecs</code> values sent by the browsers out there.

Suppose first a browser that declares support for WebM/VP8/Vorbis requests the video URL. The intermediate cache has no data for that URL yet, so it forwards the request to the origin server and caches the result with the URL and the value of the <code>Accept-Codecs</code> header as the cache key. Then another browser that declares support for MP4/H.264/AAC fetches the same URL from the same intermediate cache. The value of the <code>Accept-Codecs</code> header does not match the value in the existing cache key, so the intermediary cannot just go and serve the existing cache entry. Instead, it forwards the request to the origin server, gets a different response and makes a new cache entry for that response with the URL and the different <code>Accept-Codecs</code> header value as the cache key. So now the intermediary has in its cache all the distinct video files that the origin server has for this URL. If the solution were properly cache-friendly, all subsequent requests could be served straight from the cache without consulting the origin server until the entries' time to live expires.

But server-side negotiation isn't cache-friendly in that way. Suppose that now a browser that declares support for WebM/VP8/Vorbis ''and'' WebM/VP8/Opus makes a request for the same URL. The value of the <code>Accept-Codecs</code> header matches neither of the cache keys, so the intermediary needs to consult with the origin server even though it has cached all the possible variants already and one of them would be acceptable to this browser.

With the actual design of the video element, this problem does not occur. Each alternative video
file has a distinct URL, there is no need for the origin server to set
the Vary header and the result is as cache-friendly as possible. That
is, the worst case number of distinct cache keys is the number of
distinct videos on the origin server, so after the intermediary has cached them all, there is no need to consult with the origin server until the cache entries expire due to their age.

This example illustrates that solving the problem of choosing
between alternative resources on the HTTP level ends up being
less cache-friendly than letting the browser choose from
distinct URLs offered in the referring document.
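The comparison above can be played through with a toy model of an intermediary. This is only a sketch under simplifying assumptions: the <code>IntermediaryCache</code> class, the two origin functions and the URLs are hypothetical, the cache matches <code>Vary</code> header values by exact string comparison, and entries never expire. <code>Accept-Codecs</code> is the hypothetical header from the text, not a real HTTP header.

```python
class IntermediaryCache:
    """Toy shared cache keyed on the URL plus the request headers named by Vary."""

    def __init__(self, origin):
        self.origin = origin    # callable: (url, headers) -> (body, vary header names)
        self.entries = {}       # cache key -> cached body
        self.vary_by_url = {}   # url -> Vary header names learned from earlier responses
        self.origin_hits = 0

    def _key(self, url, headers, vary):
        return (url, tuple(headers.get(name, "") for name in vary))

    def fetch(self, url, headers):
        if url in self.vary_by_url:
            key = self._key(url, headers, self.vary_by_url[url])
            if key in self.entries:
                return self.entries[key]  # served without consulting the origin
        self.origin_hits += 1
        body, vary = self.origin(url, headers)
        self.vary_by_url[url] = vary
        self.entries[self._key(url, headers, vary)] = body
        return body


def negotiating_origin(url, headers):
    # One URL; the server picks the file and responds with Vary: Accept-Codecs.
    accepted = headers.get("Accept-Codecs", "")
    body = "webm-bytes" if "WebM/VP8/Vorbis" in accepted else "mp4-bytes"
    return body, ("Accept-Codecs",)


def distinct_url_origin(url, headers):
    # One URL per file; no Vary header is needed.
    return url + "-bytes", ()


browsers = ["WebM/VP8/Vorbis", "MP4/H.264/AAC", "WebM/VP8/Vorbis, WebM/VP8/Opus"]

negotiated = IntermediaryCache(negotiating_origin)
for codecs in browsers:
    negotiated.fetch("/video", {"Accept-Codecs": codecs})

by_url = IntermediaryCache(distinct_url_origin)
for codecs in browsers:
    # The browser, not the server, chooses among the URLs listed in the document.
    chosen = "/video.webm" if "WebM" in codecs else "/video.mp4"
    by_url.fetch(chosen, {})

print(negotiated.origin_hits, len(negotiated.entries))  # 3 3
print(by_url.origin_hits, len(by_url.entries))          # 2 2
```

In this model the negotiated design reaches the origin once per distinct header value and stores three entries for two files, while the distinct-URL design stops consulting the origin once both files are cached.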

[[Category:Justifications]]

Band names

2014-07-17T00:52:55Z

Mnot:

These are band names collected from #whatwg.

* Bogus DOM
* The Unpaired Surrogates
* Lone Surrogates
* The Designated Experts
* Spidermonkey and the GC Jitters
* Polyglot Heartbeat
* Extant Web Corpus
* Ambushed by Ambiguity
* Ambiguous Ampersands
* bad value robot
* Abort These Steps

== Album titles ==

* User Bang Important Rule

HTTP

2014-06-26T06:12:57Z

Mnot:

This page is an attempt to document some discrepancies between browsers and RFC 2068 (and its successor, RFC 2616) because the HTTP WG seems unwilling to resolve those issues. Hopefully one day someone writes HTTP5 and takes this into account.

== Content-Encoding ==

Under certain conditions this header needs to be stripped: http://hg.mozilla.org/mozilla-central/file/366b5c0c02d3/netwerk/protocol/http/nsHttpChannel.cpp#l4042

Not raised. Monkey patched in Fetch.

== Content-Type parsing ==

Pretty sure I (Anne) raised this at some point. A trailing ";" after a MIME type is considered invalid, but works fine in all implementations.

mnot: relevant spec - http://httpwg.github.io/specs/rfc7231.html#media.type I don't remember this being raised; we can either record it as errata or work it into the next revision.

== Redirects ==

For 301 and 302 redirects browsers uniformly ignore HTTP and use GET for the subsequent request if the initial request uses an unsafe method. (And the user is not prompted.)

'''Raised:''' http://lists.w3.org/Archives/Public/ietf-http-wg/2007JanMar/thread.html#msg225

mnot: See http://httpwg.github.io/specs/rfc7231.html#status.3xx

== Location header ==

Browsers handle relative URIs and URIs with invalid characters in interoperable fashion.

'''Raised:''' http://lists.w3.org/Archives/Public/ietf-http-wg/2009JanMar/thread.html#msg276

mnot: see note in: http://httpwg.github.io/specs/rfc7231.html#header.location

If there's an updated URL spec that's able to be referenced when 7231 is revised, we can point at that.

== Content-Location header ==

Browsers cannot support this header.

'''Raised:''' http://lists.w3.org/Archives/Public/ietf-http-wg/2006OctDec/thread.html#msg190

This has apparently been fixed by making Content-Location have no UA conformance criteria. (It's not clear what it's good for at this point.)

== Accept header ==

The Accept header value should preferably be sent without spaces.

(not raised, odinho: I came across a site that didn't like the spaces, the developer said he'd gotten it off php.net or stackoverflow. He fixed the site. This could be disputed.)

== Requiring two interoperable browser implementations ==

To prove that RFC 2616 can be implemented there should be two compatible implementations in browsers.

'''Raised:''' http://lists.w3.org/Archives/Public/ietf-http-wg/2007JanMar/0222.html

mnot: That'll happen when RFC723x go to full Standard.

== Assume Vary: User-Agent ==

UAs and intermediary caches should act as if all responses had Vary: User-Agent specified since many pages on the Web serve different content depending on the User-Agent header but do not bother specifying Vary: User-Agent.

'''Raised:''' http://lists.w3.org/Archives/Public/ietf-http-wg/2012OctDec/0114.html

:You may as well not have a cache if you do this. It's hard to find two users with the same User-Agent string if you try. It varies based on minor browser version, major OS version, and in old IE doesn't it vary based on installed plugins? Yes, some pages will break if you run a transparent caching proxy and don't vary based on UA, but it will be a small minority and somewhat random, and generally they'll fix themselves if you force-refresh. (Browsers send Cache-Control: no-cache when you force-refresh, which will skip a normally-configured cache.) Even if you vary based on UA, caching proxies will break some pages, because some sites serve incorrect caching headers and a caching proxy will make you hit these more often even in the single-user case. (E.g., hitting refresh will skip browser cache for the current page but not proxy cache, right?) So basically, this is a performance vs. correctness tradeoff, and the correct answer for the vast majority of users is not to have a caching proxy at all. Some will want a caching proxy that serves them some incorrect pages. No one wants a caching proxy that varies based on UA, because then the cache will be useless. The only case I could think of where this might make sense is in an office with a homogeneous browser environment, which wants caching for its standard browsers (which all have the same UA string), but still wants to be relatively correct for people using Wi-Fi on their laptops with different browsers. But it's not something that makes any sense to require across the board. [[User:Aryeh Gregor|Aryeh Gregor]] 08:45, 17 October 2012 (UTC)

mnot: Yeah, that's a really bad idea.

[[Category:Spec_coordination]]

Link Hashes

2013-11-14T04:24:00Z

Mnot:

Many download sites, especially for software downloads, give hashes or digests for the files they distribute so that users can check the validity of the files once they've downloaded them. The process for verifying the hash, however, isn't straightforward. Furthermore, there are other use cases where link hashes might be useful to improve caching or modify the user experience of security.

== Problem Description ==
A lot of software download pages already give you MD5 or SHA-1 digest values to check the validity of the downloaded file. Checking the file ensures that the downloaded file is the same as the one the author of the page wanted to give you. Corrupted or tampered files can be detected that way.

The problem is that there is no way to automate that verification process. To automate this process, a browser would need to extract the hash associated with the link on the original page.

=== Current Usage ===

Some links to software download pages featuring hashes:
* Apple: [http://www.apple.com/support/downloads/securityupdate20060061039client.html Security Update 2006-006]
* [http://www.php.net/downloads.php PHP Downloads]
* Apache: [http://httpd.apache.org/download.cgi HTTP Server]

Hashes on links are used in [http://tools.ietf.org/html/rfc6249#section-6 Metalink], which is implemented by a number of software download products.

Other examples can be found on the [http://microformats.org/wiki/hash-examples#Who_offers_MD5.2FSHA-1_checksums_with_software hash examples] page on the Microformat wiki.

=== Benefits ===

There are a few use cases for link hashes.

==== Integrity ====

The most obvious is easier discoverability of tampered files which could come from a mirror server being hacked. However, the security improvement is limited to the security properties of how the links themselves are conveyed.

Additionally, the failure case needs to be considered; if we use hashes for security and the hashes don't match, how should this affect the page load?

For downloads, a browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

Displaying a new type of error to users for linked content (e.g., CSS, JS, SVG) probably won't improve security; it'll just be another warning to click through. However, hashes COULD be used to improve the current security experience.

For example, a Web page served over a https:// URL could include hashes for links to assets with http:// URLs; if the hashes match, mixed content warnings might not need to be given.

==== Caching ====

Another use case is for caching. If a browser has a cached copy of jquery, for example, and a link has a hash that matches the cached copy, it could avoid a request, even if the URL is completely different. The collision resistance and other security properties would obviously need to be carefully specified here, but if the hash doesn't match, it isn't necessary to present the error to the user; you just fetch the URL as per normal.

Hashes would also enable new forms of caching; e.g., a browser could implement a peer-to-peer protocol to ask its peers for URLs that it wants, verifying what it gets from them using the hashes. Again, there are serious privacy and security issues to work through here.

See also:
http://alexchamberlain.co.uk/opinion/2012/09/13/cache-across-domains.html
http://tools.ietf.org/html/draft-rpeon-httpbis-exproxy-00#section-6
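
The cache lookup described above could look roughly like the following sketch. Everything in it is hypothetical: the <code>HashIndexedCache</code> class, the <code>load</code> helper and the choice of SHA-256 are assumptions made for illustration, and the security and privacy questions raised above are not addressed.

```python
import hashlib


class HashIndexedCache:
    """Toy cache that finds stored bodies by content digest instead of by URL."""

    def __init__(self):
        self._by_digest = {}

    def store(self, body):
        digest = hashlib.sha256(body).hexdigest()
        self._by_digest[digest] = body
        return digest

    def lookup(self, expected_digest):
        return self._by_digest.get(expected_digest.lower())


def load(url, expected_digest, cache, fetch):
    """Return the body for a link, skipping the request when the hash is already cached."""
    cached = cache.lookup(expected_digest)
    if cached is not None:
        return cached      # same bytes already held, whatever URL they came from
    body = fetch(url)      # no match: fetch the URL as per normal, no error shown
    cache.store(body)
    return body


# Usage: the same library linked from two different (made-up) URLs.
requests_made = []

def fake_fetch(url):
    requests_made.append(url)
    return b"/* library source */"

cache = HashIndexedCache()
digest = hashlib.sha256(b"/* library source */").hexdigest()
load("http://cdn-one.example/lib.js", digest, cache, fake_fetch)
load("http://cdn-two.example/lib.js", digest, cache, fake_fetch)
print(len(requests_made))
```

Only the first link triggers a request here; the second is satisfied from the cache because its declared hash matches bytes that are already stored.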

== Proposed Solutions ==

=== hash attribute ===

A hash attribute could contain an MD5 checksum of the target file. If the hash of the downloaded file does not match the one from the link, the file is deleted or quarantined and the user is alerted to a potential security risk.

<pre>
<a href="..." hash="b3187253c1667fac7d20bb762ad53967">
</pre>

==== Processing Model ====

When the link is clicked, the browser keeps the hash in memory so that it can be compared with the hash of the downloaded file. Once the file is downloaded, the computed hash is compared against the expected hash.

For links to JS, CSS, etc., where HTML has been fetched over SSL/TLS, a matching hash means that the mixed content warning can be omitted. This might require substantial modification of the fetch to allow content to be downloaded before the policy decision is made.

(The processing model for all of the proposals is largely the same)
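
The comparison step of this processing model can be sketched as follows. The function name <code>file_matches_hash</code> is hypothetical, and the sketch assumes the expected value is a hexadecimal MD5 digest as in the example attribute above; deleting or quarantining the file on mismatch is left out.

```python
import hashlib


def file_matches_hash(path, expected_hex, algorithm="md5"):
    """Hash the downloaded file in chunks and compare with the value from the link."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as downloaded:
        # Read in chunks so that large downloads need not fit in memory.
        for chunk in iter(lambda: downloaded.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

A caller would keep the attribute value from the clicked link and call <code>file_matches_hash(path, value)</code> once the download completes, alerting the user when it returns <code>False</code>.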

=== Hash Microformat ===
The hash microformat provides a way to associate hash values with links:

<pre>

<a rel="bookmark" href="...">Download OpenOffice.org
e0d123e5f316bef78bfdf5a008837577
</a>

</pre>

The microformat is better described on the [http://microformats.org/wiki/hash-examples hash-examples] page.

==== Processing Model ====
When a link is clicked, the browser checks whether it corresponds to the microformat (''details to be added''). If it does, the hash value is extracted and, once the file is downloaded, the computed hash for the file is compared against the expected hash. Browsers should keep the initial hash value across redirections, if any. This only applies to files downloaded to the disk.

:"Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"

=== Link Fingerprint ===

Append a digest for the file in the fragment identifier of the URL. The browser can then check the validity of the file when it downloads it.

<pre>
http://example.com/file#!md5!b3187253c1667fac7d20bb762ad53967
</pre>

The [http://www.gerv.net/security/link-fingerprints/ Link Fingerprints article] by Gervase Markham gives more details.
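
As a rough illustration, a fingerprint of this shape could be separated from the URL as shown below. The <code>split_fingerprint</code> helper is hypothetical, and the <code>#!algorithm!digest</code> syntax is inferred only from the single example above; the Link Fingerprints article may define details that this sketch ignores.

```python
from urllib.parse import urldefrag


def split_fingerprint(url):
    """Return (url without fragment, algorithm, hex digest), or (url, None, None)."""
    base, fragment = urldefrag(url)
    parts = fragment.split("!")
    # Expect exactly "!<algorithm>!<digest>", i.e. ["", algorithm, digest].
    if len(parts) == 3 and parts[0] == "" and parts[1] and parts[2]:
        return base, parts[1].lower(), parts[2].lower()
    return url, None, None


print(split_fingerprint("http://example.com/file#!md5!b3187253c1667fac7d20bb762ad53967"))
```

An ordinary fragment such as <code>#section</code> is returned untouched with no algorithm or digest, so regular in-page links are not affected.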

=== HTTP Headers ===

HTTP defines the [http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15 Content-MD5] HTTP header. However, MD5 has known security flaws, and is not recommended for use.

The [http://tools.ietf.org/html/rfc3230#section-4.3.2 Digest] HTTP header avoids this by allowing the hash algorithm to be specified.

Both of these approaches are on the response itself, rather than in '''links''' -- so they're only able to indicate the integrity of the response they occur within, and are naturally vulnerable to modification in transit and on the server.

Another approach would be using [http://tools.ietf.org/html/rfc5988 Link headers] to indicate hashes for the links in content; this could be useful if you want to add hashes with a server plug-in or a reverse proxy, for example.

== Mailing List References ==
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007833.html Re: hash attribute] -- Tom Pike, Wed Nov 8 05:21:22 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Ian Hickson, Wed Nov 8 08:28:19 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007857.html Re: hash attribute] -- Gervase Markham, Thu Nov 9 09:23:32 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Michel Fortin, Tue Nov 14 08:53:43 PST 2006

[[Category:Proposals]]

Link Hashes

2013-11-14T04:10:13Z

Mnot: /* Content-MD5 HTTP Header */

Many download sites, especially for software downloads, give hashes or digests for the files they distribute so that users can check the validity of the files once they've downloaded them. The process for verifying the hash, however, isn't straightforward.

== Problem Description ==
A lot of software download pages already give you MD5 or SHA-1 digest values to check the validity of the downloaded file. Checking the file ensures that the downloaded file is the same as the one the author of the page wanted to give you. Corrupted or tampered files can be detected that way.

The problem is that there is no way to automate that verification process. To automate this process, a browser would need to extract the hash associated with the link on the original page.

=== Current Usage ===
Some links to software download pages featuring hashes:
* Apple: [http://www.apple.com/support/downloads/securityupdate20060061039client.html Security Update 2006-006]
* [http://www.php.net/downloads.php PHP Downloads]
* Apache: [http://httpd.apache.org/download.cgi HTTP Server]

Hashes on links are used in [http://tools.ietf.org/html/rfc6249#section-6 Metalink], which is implemented by a number of software download products.

Other examples can be found on the [http://microformats.org/wiki/hash-examples#Who_offers_MD5.2FSHA-1_checksums_with_software hash examples] page on the Microformat wiki.

=== Benefits ===

There are a few use cases for link hashes.

==== Integrity ====

The most obvious is easier discoverability of tampered files which could come from a mirror server being hacked. However, the security improvement is limited to the security properties of how the links themselves are conveyed.

Additionally, the failure case needs to be considered; if we use hashes for security and the hashes don't match, how should this affect the page load?

Displaying a new type of error to users probably won't improve security; it'll just be another warning to click through. However, hashes COULD be used to improve the current security experience.

For example, a Web page served over a https:// URL could include hashes for links to assets with http:// URLs; if the hashes match, mixed content warnings might not need to be given.

==== Caching ====

Another use case is for caching. If a browser has a cached copy of jquery, for example, and a link has a hash that matches the cached copy, it could avoid a request, even if the URL is completely different. The collision resistance and other security properties would obviously need to be carefully specified here, but if the hash doesn't match, it isn't necessary to present the error to the user; you just fetch the URL as per normal.

Hashes would also enable new forms of caching; e.g., a browser could implement a peer-to-peer protocol to ask its peers for URLs that it wants, verifying what it gets from them using the hashes. Again, there are serious privacy and security issues to work through here.

See also:
http://alexchamberlain.co.uk/opinion/2012/09/13/cache-across-domains.html
http://tools.ietf.org/html/draft-rpeon-httpbis-exproxy-00#section-6

== Proposed Solutions ==

=== hash attribute ===
A hash attribute could contain an MD5 checksum of the target file. If the hash of the downloaded file does not match the one from the link, the file is deleted or quarantined and the user is alerted to a potential security risk.

<pre>
<a href="..." hash="b3187253c1667fac7d20bb762ad53967">
</pre>

==== Processing Model ====
When the link is clicked, the browser keeps the hash in memory so that it can be compared with the hash of the downloaded file. Once the file is downloaded, the computed hash is compared against the expected hash.

:"To be completed: what to do about non-download links, like links to other pages, when they have a hash?"

==== Limitations ====
:''Cases not covered by this solution in relation to the problem description; other problems with this solution, if any.''

==== Implementation ====
The software industry as a whole is more and more concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

=== Hash Microformat ===
The hash microformat provides a way to associate hash values with links:

<pre>

<a rel="bookmark" href="...">Download OpenOffice.org
e0d123e5f316bef78bfdf5a008837577
</a>

</pre>

The microformat is better described on the [http://microformats.org/wiki/hash-examples hash-examples] page.

==== Processing Model ====
When a link is clicked, the browser checks whether it corresponds to the microformat (''details to be added''). If it does, the hash value is extracted and, once the file is downloaded, the computed hash for the file is compared against the expected hash. Browsers should keep the initial hash value across redirections, if any. This only applies to files downloaded to the disk.

:"Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"

==== Limitations ====
:''Cases not covered by this solution in relation to the problem description; other problems with this solution, if any.''

==== Implementation ====
The software industry as a whole is more and more concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about security of the transfered file.

The microformat markup is heavier that it needs to be. It also force page authors to put the hash visible inside the link, or to apply specific stylesheets to hide it on visual browsers.

=== Link Fingerprint ===
Append a digest for the file in the fragment identifier of the URL. The browser can then check the validity of the file when it downloads it.

<pre>
http://example.com/file#!md5!b3187253c1667fac7d20bb762ad53967
</pre>

The [http://www.gerv.net/security/link-fingerprints/ Link Fingerprints article] by Gervase Markham gives more details.

==== Processing Model ====
When the link is clicked, the browser checks whether the URL contains a hash. If it does, then once the file is downloaded the computed hash is compared against the expected hash. Browsers should keep the initial hash value across redirections, if any. This only applies to files downloaded to disk.
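
As a rough illustration of this processing model, the following Python sketch splits a fingerprint of the form shown in the example above out of a URL and checks downloaded bytes against it. The function names <code>split_fingerprint</code> and <code>matches_fingerprint</code> are hypothetical, and the accepted fragment syntax is an assumption based only on the single example on this page, not on a complete reading of the Link Fingerprints proposal.

```python
import hashlib
import hmac
import re
from urllib.parse import urldefrag

# Assumed syntax, taken from the example on this page: "!<algorithm>!<hex digest>".
FINGERPRINT_RE = re.compile(r"^!(?P<algo>[a-z0-9]+)!(?P<digest>[0-9a-fA-F]+)$")


def split_fingerprint(url):
    """Return (url_without_fragment, algorithm, hex_digest).

    If the fragment is not a fingerprint, the URL is returned unchanged
    together with (None, None), so ordinary fragments keep working.
    """
    base, fragment = urldefrag(url)
    match = FINGERPRINT_RE.match(fragment)
    if match is None:
        return url, None, None
    return base, match.group("algo"), match.group("digest").lower()


def matches_fingerprint(data, algorithm, expected_hex):
    """Hash the downloaded bytes and compare them with the expected digest."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError("unsupported hash algorithm: " + algorithm)
    actual_hex = hashlib.new(algorithm, data).hexdigest()
    return hmac.compare_digest(actual_hex, expected_hex.lower())
```

A caller would keep the algorithm and digest returned by <code>split_fingerprint</code> for the original link, follow any redirections, and only then pass the downloaded bytes to <code>matches_fingerprint</code>.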

:"Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"

==== Limitations ====
Works only for downloaded files; fragment identifiers are used in other ways for regular pages and for PDF files opened in the browser with a plugin.

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

=== HTTP Headers ===

HTTP defines the [http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15 Content-MD5] HTTP header. However, MD5 has known security flaws, and is not recommended for use.

The [http://tools.ietf.org/html/rfc3230#section-4.3.2 Digest] HTTP header avoids this by allowing the hash algorithm to be specified.
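
To make the header format concrete, here is a minimal Python sketch that builds and checks a single <code>Digest</code> value of the form <code>algorithm=base64-digest</code>. It assumes the <code>SHA-256</code> algorithm token (registered after RFC 3230 was published) and handles only one algorithm per header value; the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def digest_header_value(body):
    """Build a Digest header value for the given response body (bytes)."""
    raw = hashlib.sha256(body).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


def body_matches_digest(body, header_value):
    """Check a body against a single-algorithm Digest header value."""
    # Split on the first "=" only, because base64 padding also uses "=".
    algorithm, _, encoded = header_value.strip().partition("=")
    if algorithm.upper() != "SHA-256":
        return False  # this sketch understands no other algorithm
    expected = "SHA-256=" + encoded
    return hmac.compare_digest(digest_header_value(body), expected)
```

Note that a check like this only says the body matches the header sent alongside it, which is exactly the limitation described in the next paragraph.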

Both of these approaches are on the response itself, rather than in '''links''' -- so they're only able to indicate the integrity of the response they occur within, and are naturally vulnerable to modification in transit and on the server.

Another approach would be using [http://tools.ietf.org/html/rfc5988 Link headers] to indicate hashes for the links in content; this could be useful if you want to add hashes with a server plug-in or a reverse proxy, for example.

== Mailing List References ==
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007833.html Re: hash attribute] -- Tom Pike, Wed Nov 8 05:21:22 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Ian Hickson, Wed Nov 8 08:28:19 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007857.html Re: hash attribute] -- Gervase Markham, Thu Nov 9 09:23:32 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Michel Fortin, Tue Nov 14 08:53:43 PST 2006

[[Category:Proposals]]

Link Hashes

2013-11-14T03:59:54Z

Mnot: /* Current Usage */

Many download sites, especially for software, give hashes or digests for the files they distribute so that users can check the validity of the files once they've downloaded them. The process for verifying the hash, however, isn't straightforward.

== Problem Description ==
A lot of software download pages already give you MD5 or SHA-1 digest values to check the validity of the downloaded file. Checking the file ensures that the downloaded file is the same as the one the author of the page wanted to give you. Corrupted or tampered files can be detected that way.

The problem is that there is no way to automate that verification process. To automate this process, a browser would need to extract the hash associated with the link on the original page.

=== Current Usage ===
Some links to software download pages featuring hashes:
* Apple: [http://www.apple.com/support/downloads/securityupdate20060061039client.html Security Update 2006-006]
* [http://www.php.net/downloads.php PHP Downloads]
* Apache: [http://httpd.apache.org/download.cgi HTTP Server]

Hashes on links are used in [http://tools.ietf.org/html/rfc6249#section-6 Metalink], which is implemented by a number of software download products.

Other examples can be found on the [http://microformats.org/wiki/hash-examples#Who_offers_MD5.2FSHA-1_checksums_with_software hash examples] page on the Microformat wiki.

=== Benefits ===

There are a few use cases for link hashes.

==== Integrity ====

The most obvious is easier discoverability of tampered files which could come from a mirror server being hacked. However, the security improvement is limited to the security properties of how the links themselves are conveyed.

Additionally, the failure case needs to be considered; if we use hashes for security and the hashes don't match, how should this affect the page load?

Displaying a new type of error to users probably won't improve security; it'll just be another warning to click through. However, hashes COULD be used to improve the current security experience.

For example, a Web page served over a https:// URL could include hashes for links to assets with http:// URLs; if the hashes match, mixed content warnings might not need to be given.

==== Caching ====

Another use case is for caching. If a browser has a cached copy of jquery, for example, and a link has a hash that matches the cached copy, it could avoid a request, even if the URL is completely different. The collision resistance and other security properties would obviously need to be carefully specified here, but if the hash doesn't match, it isn't necessary to present the error to the user; you just fetch the URL as per normal.

Hashes would also enable new forms of caching; e.g., a browser could implement a peer-to-peer protocol to ask its peers for URLs that it wants, verifying what it gets from them using the hashes. Again, there are serious privacy and security issues to work through here.
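
The cross-URL cache lookup described above could look roughly like the following Python sketch. <code>HashCache</code> and the <code>download</code> callback are hypothetical names, SHA-256 is an assumed choice of algorithm, and none of the privacy or security questions raised here are addressed.

```python
import hashlib


class HashCache:
    """Toy content-addressed cache: bodies are stored under their SHA-256 digest."""

    def __init__(self):
        self._by_digest = {}

    def store(self, body):
        """Store a body under the digest computed from its actual bytes."""
        digest = hashlib.sha256(body).hexdigest()
        self._by_digest[digest] = body
        return digest

    def fetch(self, url, expected_digest, download):
        """Return the body for a link, skipping the request on a hash hit.

        A miss is not an error: the URL is simply fetched as normal.
        """
        cached = self._by_digest.get(expected_digest.lower())
        if cached is not None:
            return cached
        body = download(url)
        self.store(body)
        return body
```

Because entries are keyed by the digest of the bytes actually received, a link whose hash does not match anything in the cache never causes a wrong body to be served; it just triggers an ordinary request.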

See also:
http://alexchamberlain.co.uk/opinion/2012/09/13/cache-across-domains.html
http://tools.ietf.org/html/draft-rpeon-httpbis-exproxy-00#section-6

== Proposed Solutions ==

=== hash attribute ===
A hash attribute could contain an MD5 checksum of the target file. If the hash of the downloaded file does not match the one from the link, the file is deleted or quarantined and the user is alerted to a potential security risk.

<pre>
<a href="..." hash="b3187253c1667fac7d20bb762ad53967">
</pre>

==== Processing Model ====
When the link is clicked, the browser keeps the hash in memory to compare it with the hash computed from the downloaded file. Once the file is downloaded, the computed hash is compared against the expected hash.
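
The comparison step can be sketched in Python as follows. The names <code>file_digest_hex</code> and <code>check_download</code> are hypothetical; MD5 is the default only because it is the algorithm named in this proposal, and the file is hashed in chunks so that large downloads need not fit in memory.

```python
import hashlib
import hmac


def file_digest_hex(path, algorithm="md5", chunk_size=65536):
    """Hash a file on disk in fixed-size chunks and return the hex digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def check_download(path, expected_hex, algorithm="md5"):
    """Return "ok" if the file matches the hash from the link, else "mismatch"."""
    actual = file_digest_hex(path, algorithm)
    if hmac.compare_digest(actual, expected_hex.strip().lower()):
        return "ok"
    return "mismatch"
```

What the browser does on a <code>"mismatch"</code> result (delete, quarantine, or warn) is left to the caller.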

:"To be completed: what to do about non-download links, like links to other pages, when they have a hash?"

==== Limitations ====
:''Cases not covered by this solution in relation to the problem description; other problems with this solution, if any.''

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

=== Hash Microformat ===
The hash microformat provides a way to associate hash values with links:

<pre>

<a rel="bookmark" href="...">Download OpenOffice.org
e0d123e5f316bef78bfdf5a008837577
</a>

</pre>

The microformat is better described on the [http://microformats.org/wiki/hash-examples hash-examples] page.

==== Processing Model ====
When a link is clicked, the browser checks whether it corresponds to the microformat (''details to be added''). If it does, the hash value is extracted and, once the file is downloaded, the computed hash for the file is compared against the expected hash. Browsers should keep the initial hash value across redirections, if any. This only applies to files downloaded to disk.

:"Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"

==== Limitations ====
:''Cases not covered by this solution in relation to the problem description; other problems with this solution, if any.''

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

The microformat markup is heavier than it needs to be. It also forces page authors to put the hash visibly inside the link, or to apply specific stylesheets to hide it in visual browsers.

=== Link Fingerprint ===
Append a digest for the file in the fragment identifier of the URL. The browser can then check the validity of the file when it downloads it.

<pre>
http://example.com/file#!md5!b3187253c1667fac7d20bb762ad53967
</pre>

The [http://www.gerv.net/security/link-fingerprints/ Link Fingerprints article] by Gervase Markham gives more details.

==== Processing Model ====
When the link is clicked, the browser checks whether the URL contains a hash. If it does, then once the file is downloaded the computed hash is compared against the expected hash. Browsers should keep the initial hash value across redirections, if any. This only applies to files downloaded to disk.

:"Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"

==== Limitations ====
Works only for downloaded files; fragment identifiers are used in other ways for regular pages and for PDF files opened in the browser with a plugin.

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

=== Content-MD5 HTTP Header ===
It has been suggested to use the [http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15 Content-MD5] HTTP header. However, a tampered file on a hacked server is very likely to have its digest updated accordingly.

== Mailing List References ==
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007833.html Re: hash attribute] -- Tom Pike, Wed Nov 8 05:21:22 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Ian Hickson, Wed Nov 8 08:28:19 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007857.html Re: hash attribute] -- Gervase Markham, Thu Nov 9 09:23:32 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Michel Fortin, Tue Nov 14 08:53:43 PST 2006

[[Category:Proposals]]

Link Hashes

2013-11-14T03:56:10Z

Mnot: /* Benefits */

Many download sites, especially for software, give hashes or digests for the files they distribute so that users can check the validity of the files once they've downloaded them. The process for verifying the hash, however, isn't straightforward.

== Problem Description ==
A lot of software download pages already give you MD5 or SHA-1 digest values to check the validity of the downloaded file. Checking the file ensures that the downloaded file is the same as the one the author of the page wanted to give you. Corrupted or tampered files can be detected that way.

The problem is that there is no way to automate that verification process. To automate this process, a browser would need to extract the hash associated with the link on the original page.

=== Current Usage ===
Some links to software download pages featuring hashes:
* Apple: [http://www.apple.com/support/downloads/securityupdate20060061039client.html Security Update 2006-006]
* [http://www.php.net/downloads.php PHP Downloads]
* Apache: [http://httpd.apache.org/download.cgi HTTP Server]

Other examples can be found on the [http://microformats.org/wiki/hash-examples#Who_offers_MD5.2FSHA-1_checksums_with_software hash examples] page on the Microformat wiki.

=== Benefits ===

There are a few use cases for link hashes.

==== Integrity ====

The most obvious is easier discoverability of tampered files which could come from a mirror server being hacked. However, the security improvement is limited to the security properties of how the links themselves are conveyed.

Additionally, the failure case needs to be considered; if we use hashes for security and the hashes don't match, how should this affect the page load?

Displaying a new type of error to users probably won't improve security; it'll just be another warning to click through. However, hashes COULD be used to improve the current security experience.

For example, a Web page served over a https:// URL could include hashes for links to assets with http:// URLs; if the hashes match, mixed content warnings might not need to be given.

==== Caching ====

Another use case is for caching. If a browser has a cached copy of jquery, for example, and a link has a hash that matches the cached copy, it could avoid a request, even if the URL is completely different. The collision resistance and other security properties would obviously need to be carefully specified here, but if the hash doesn't match, it isn't necessary to present the error to the user; you just fetch the URL as per normal.

Hashes would also enable new forms of caching; e.g., a browser could implement a peer-to-peer protocol to ask its peers for URLs that it wants, verifying what it gets from them using the hashes. Again, there are serious privacy and security issues to work through here.

See also:
http://alexchamberlain.co.uk/opinion/2012/09/13/cache-across-domains.html
http://tools.ietf.org/html/draft-rpeon-httpbis-exproxy-00#section-6

== Proposed Solutions ==

=== hash attribute ===
A hash attribute could contain an MD5 checksum of the target file. If the hash of the downloaded file does not match the one from the link, the file is deleted or quarantined and the user is alerted to a potential security risk.

<pre>
<a href="..." hash="b3187253c1667fac7d20bb762ad53967">
</pre>

==== Processing Model ====
When the link is clicked, the browser keeps the hash in memory to compare it with the hash computed from the downloaded file. Once the file is downloaded, the computed hash is compared against the expected hash.

:"To be completed: what to do about non-download links, like links to other pages, when they have a hash?"

==== Limitations ====
:''Cases not covered by this solution in relation to the problem description; other problems with this solution, if any.''

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

=== Hash Microformat ===
The hash microformat provides a way to associate hash values with links:

<pre>

<a rel="bookmark" href="...">Download OpenOffice.org
e0d123e5f316bef78bfdf5a008837577
</a>

</pre>

The microformat is better described on the [http://microformats.org/wiki/hash-examples hash-examples] page.

==== Processing Model ====
When a link is clicked, the browser checks whether it corresponds to the microformat (''details to be added''). If it does, the hash value is extracted and, once the file is downloaded, the computed hash for the file is compared against the expected hash. Browsers should keep the initial hash value across redirections, if any. This only applies to files downloaded to disk.

:"Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"

==== Limitations ====
:''Cases not covered by this solution in relation to the problem description; other problems with this solution, if any.''

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

The microformat markup is heavier than it needs to be. It also forces page authors to put the hash visibly inside the link, or to apply specific stylesheets to hide it in visual browsers.

=== Link Fingerprint ===
Append a digest for the file in the fragment identifier of the URL. The browser can then check the validity of the file when it downloads it.

<pre>
http://example.com/file#!md5!b3187253c1667fac7d20bb762ad53967
</pre>

The [http://www.gerv.net/security/link-fingerprints/ Link Fingerprints article] by Gervase Markham gives more details.

==== Processing Model ====
When the link is clicked, the browser checks whether the URL contains a hash. If it does, then once the file is downloaded the computed hash is compared against the expected hash. Browsers should keep the initial hash value across redirections, if any. This only applies to files downloaded to disk.

:"Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"

==== Limitations ====
Works only for downloaded files; fragment identifiers are used in other ways for regular pages and for PDF files opened in the browser with a plugin.

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

=== Content-MD5 HTTP Header ===
It has been suggested to use the [http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15 Content-MD5] HTTP header. However, a tampered file on a hacked server is very likely to have its digest updated accordingly.

== Mailing List References ==
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007833.html Re: hash attribute] -- Tom Pike, Wed Nov 8 05:21:22 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Ian Hickson, Wed Nov 8 08:28:19 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007857.html Re: hash attribute] -- Gervase Markham, Thu Nov 9 09:23:32 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Michel Fortin, Tue Nov 14 08:53:43 PST 2006

[[Category:Proposals]]

Link Hashes

2013-11-14T03:48:47Z

Mnot: /* Benefits */

Many download sites, especially for software, give hashes or digests for the files they distribute so that users can check the validity of the files once they've downloaded them. The process for verifying the hash, however, isn't straightforward.

== Problem Description ==
A lot of software download pages already give you MD5 or SHA-1 digest values to check the validity of the downloaded file. Checking the file ensures that the downloaded file is the same as the one the author of the page wanted to give you. Corrupted or tampered files can be detected that way.

The problem is that there is no way to automate that verification process. To automate this process, a browser would need to extract the hash associated with the link on the original page.

=== Current Usage ===
Some links to software download pages featuring hashes:
* Apple: [http://www.apple.com/support/downloads/securityupdate20060061039client.html Security Update 2006-006]
* [http://www.php.net/downloads.php PHP Downloads]
* Apache: [http://httpd.apache.org/download.cgi HTTP Server]

Other examples can be found on the [http://microformats.org/wiki/hash-examples#Who_offers_MD5.2FSHA-1_checksums_with_software hash examples] page on the Microformat wiki.

=== Benefits ===

There are a few use cases for link hashes.

The most obvious is easier discoverability of tampered files which could come from a mirror server being hacked. However, the security improvement is limited to the security properties of how the links themselves are conveyed.

Additionally, the failure case needs to be considered; if we use hashes for security and the hashes don't match, how should this affect the page load?

Displaying a new type of error to users probably won't improve security; it'll just be another warning to click through. However, hashes COULD be used to improve the current security experience.

For example, a Web page served over a https:// URL could include hashes for links to assets with http:// URLs; if the hashes match, mixed content warnings might not need to be given.

Another use case is for caching. If a browser has a cached copy of jquery, for example, and a link has a hash that matches the cached copy, it could avoid a request, even if the URL is completely different. The collision resistance and other security properties would obviously need to be carefully specified here, but if the hash doesn't match, it isn't necessary to present the error to the user; you just fetch the URL as per normal.

Hashes would also enable new forms of caching; e.g., a browser could implement a peer-to-peer protocol to ask its peers for URLs that it wants, verifying what it gets from them using the hashes. Again, there are serious privacy and security issues to work through here.

See also:
http://alexchamberlain.co.uk/opinion/2012/09/13/cache-across-domains.html
http://tools.ietf.org/html/draft-rpeon-httpbis-exproxy-00#section-6

== Proposed Solutions ==

=== hash attribute ===
A hash attribute could contain an MD5 checksum of the target file. If the hash of the downloaded file does not match the one from the link, the file is deleted or quarantined and the user is alerted to a potential security risk.

<pre>
<a href="..." hash="b3187253c1667fac7d20bb762ad53967">
</pre>

==== Processing Model ====
When the link is clicked, the browser keeps the hash in memory to compare it with the hash computed from the downloaded file. Once the file is downloaded, the computed hash is compared against the expected hash.

:"To be completed: what to do about non-download links, like links to other pages, when they have a hash?"

==== Limitations ====
:''Cases not covered by this solution in relation to the problem description; other problems with this solution, if any.''

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

=== Hash Microformat ===
The hash microformat provides a way to associate hash values with links:

<pre>

<a rel="bookmark" href="...">Download OpenOffice.org
e0d123e5f316bef78bfdf5a008837577
</a>

</pre>

The microformat is better described on the [http://microformats.org/wiki/hash-examples hash-examples] page.

==== Processing Model ====
When a link is clicked, the browser checks whether it corresponds to the microformat (''details to be added''). If it does, the hash value is extracted and, once the file is downloaded, the computed hash for the file is compared against the expected hash. Browsers should keep the initial hash value across redirections, if any. This only applies to files downloaded to disk.

:"Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"

==== Limitations ====
:''Cases not covered by this solution in relation to the problem description; other problems with this solution, if any.''

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

The microformat markup is heavier than it needs to be. It also forces page authors to put the hash visibly inside the link, or to apply specific stylesheets to hide it in visual browsers.

=== Link Fingerprint ===
Append a digest for the file in the fragment identifier of the URL. The browser can then check the validity of the file when it downloads it.

<pre>
http://example.com/file#!md5!b3187253c1667fac7d20bb762ad53967
</pre>

The [http://www.gerv.net/security/link-fingerprints/ Link Fingerprints article] by Gervase Markham gives more details.

==== Processing Model ====
When the link is clicked, the browser checks whether the URL contains a hash. If it does, then once the file is downloaded the computed hash is compared against the expected hash. Browsers should keep the initial hash value across redirections, if any. This only applies to files downloaded to disk.

:"Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"

==== Limitations ====
Works only for downloaded files; fragment identifiers are used in other ways for regular pages and for PDF files opened in the browser with a plugin.

==== Implementation ====
The software industry as a whole is increasingly concerned about the security implications of the Internet. Security has become another feature of the browser. Something that increases security with minor impact on the user experience will probably be welcome.

A browser could display the following message in case of a hash mismatch:

:''File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?'' [Delete File] [Keep in Quarantine]

==== Adoption ====
Distributors that already give hashes for their users to verify the files are very likely to add this extra attribute if it simplifies the security checks for their users. The fact that the digests are already available on these pages means that the author of the page is already concerned about the security of the transferred file.

=== Content-MD5 HTTP Header ===
It has been suggested to use the [http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15 Content-MD5] HTTP header. However, a tampered file on a hacked server is very likely to have its digest updated accordingly.

== Mailing List References ==
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007833.html Re: hash attribute] -- Tom Pike, Wed Nov 8 05:21:22 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Ian Hickson, Wed Nov 8 08:28:19 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007857.html Re: hash attribute] -- Gervase Markham, Thu Nov 9 09:23:32 PST 2006
* [http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2006-November/007903.html Re: hash attribute] -- Michel Fortin, Tue Nov 14 08:53:43 PST 2006

[[Category:Proposals]]