Link Hashes
Latest revision as of 08:36, 20 November 2013
Many download sites, especially software download sites, publish hashes or digests of the files they distribute so that users can check the validity of the files once they've downloaded them. However, the process for verifying the hash isn't straightforward. Furthermore, there are other use cases where link hashes might be useful, such as improving caching or modifying the user experience of security.
Problem Description
A lot of software download pages already give MD5 or SHA-1 digest values for checking the validity of the downloaded file. Checking the file ensures that it is the same file the page's author intended to give you, so corrupted or tampered files can be detected.
The problem is that there is no way to automate this verification. To automate it, a browser would need to extract the hash associated with the link on the original page.
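Today the check is done by hand: the user computes the digest of the downloaded file and compares it with the value printed on the page. A minimal sketch of that manual step using Python's standard `hashlib` module; the function name `file_digest_matches` and the 64 KiB chunk size are illustrative choices, not part of any proposal.

```python
import hashlib


def file_digest_matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Hash the file at `path` and compare it with a hex digest published on a page.

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5", "sha1", "sha256".
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large downloads (e.g. ISO images) need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    # Published digests are hex strings; normalise case and stray whitespace before comparing.
    return h.hexdigest() == expected_hex.strip().lower()
```

For example, `file_digest_matches("image.iso", published_value, "md5")` would reproduce the check a user performs against an MD5 value copied from a download page.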
Current Usage
Some links to software download pages featuring hashes:
- Apple: Security Update 2006-006
- PHP Downloads
- Apache: HTTP Server
Hashes on links are used in Metalink, which is implemented by a number of software download products.
Other examples can be found on the hash examples page on the Microformat wiki.
Benefits
There are a few use cases for link hashes.
Integrity
The most obvious use case is making tampered files easier to detect, for example files served by a compromised mirror server. However, the security improvement is limited by the security of the channel through which the links themselves are conveyed.
Additionally, the failure case needs to be considered; if we use hashes for security and the hashes don't match, how should this affect the page load?
For downloads, a browser could display the following message in case of a hash mismatch:
- File "image.iso" is different from the file linked on page "My Software CD Images". It is possible that this file has been tampered with and it'd be advisable to not open it. Do you wish to delete the file?
[Delete File] [Keep in Quarantine]
Displaying a new type of error to users for linked content (e.g., CSS, JS, SVG) probably won't improve security; it'll just be another warning to click through. However, hashes COULD be used to improve the current security experience.
For example, a Web page served over a https:// URL could include hashes for links to assets with http:// URLs; if the hashes match, mixed content warnings might not need to be given.
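The mixed content idea can be expressed as a small policy function. This is a hypothetical sketch, assuming SHA-256 hex digests on links and assuming the asset body has already been downloaded before the decision is made; `needs_mixed_content_warning` is an invented name, and real browser mixed content rules are considerably more involved.

```python
import hashlib
from typing import Optional
from urllib.parse import urlsplit


def needs_mixed_content_warning(page_url: str, asset_url: str, body: bytes,
                                expected_sha256: Optional[str]) -> bool:
    """Hypothetical policy: an http:// asset on an https:// page is exempt from the
    mixed content warning only if its link carried a hash that the body matches."""
    page_secure = urlsplit(page_url).scheme == "https"
    asset_secure = urlsplit(asset_url).scheme == "https"
    if not page_secure or asset_secure:
        return False  # not a mixed content situation at all
    if expected_sha256 is None:
        return True  # no hash on the link: warn as browsers do today
    # The hash vouches for the bytes, so a match removes the reason to warn.
    return hashlib.sha256(body).hexdigest() != expected_sha256.strip().lower()
```

Note that a mismatch here simply falls back to the existing warning rather than introducing a new kind of error, in line with the concern about adding yet another warning to click through.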
Caching
Another use case is caching. If a browser has a cached copy of jQuery, for example, and a link has a hash that matches the cached copy, it could avoid a request, even if the URL is completely different. The collision resistance and other security properties would obviously need to be carefully specified here, but if the hash doesn't match, there is no need to present an error to the user; the browser just fetches the URL as normal.
Hashes would also enable new forms of caching; e.g., a browser could implement a peer-to-peer protocol to ask its peers for URLs that it wants, verifying what it gets from them using the hashes. Again, there are serious privacy and security issues to work through here.
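A toy, in-memory sketch of a cache keyed by content digest rather than URL, assuming SHA-256 hex digests on links. `DigestCache` and the injected `fetch` callable are hypothetical names; the sketch deliberately ignores the collision, privacy and peer-to-peer questions raised above.

```python
import hashlib
from typing import Callable, Dict, Optional


class DigestCache:
    """Toy cache keyed by the SHA-256 digest of the content, not by its URL."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, bytes] = {}

    def get(self, url: str, expected_sha256: Optional[str],
            fetch: Callable[[str], bytes]) -> bytes:
        # If the link carries a hash we already hold, skip the request entirely,
        # even though the cached copy may have come from a completely different URL.
        if expected_sha256 is not None:
            cached = self._by_digest.get(expected_sha256.strip().lower())
            if cached is not None:
                return cached
        # Otherwise fetch normally; a missing or non-matching hash is not an error here.
        body = fetch(url)
        # Store under the digest we computed ourselves, so every entry is self-verifying.
        self._by_digest[hashlib.sha256(body).hexdigest()] = body
        return body
```

Keying entries by the computed digest (rather than trusting the link's value) means a lookup can only ever return bytes that actually hash to the requested value.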
See also:
- http://alexchamberlain.co.uk/opinion/2012/09/13/cache-across-domains.html
- http://tools.ietf.org/html/draft-rpeon-httpbis-exproxy-00#section-6
Proposed Solutions
hash attribute
A hash attribute could contain an MD5 checksum of the target file. If the hash of the downloaded file does not match the one from the link, the file is deleted or quarantined and the user is alerted to a potential security risk.
<a href="..." hash="b3187253c1667fac7d20bb762ad53967">
Note: MD5 is a particularly bad choice here if the resource being accessed can be modified by an attacker (e.g. because it is open source). A collision attack, in which the comments in a JavaScript file are modified so that it produces the same hash as a valid JavaScript file that does something malicious, would be unfortunate.
Algorithm agility would also be desirable. Brad Hill suggested a syntax like:
<link href="https://www.example.com/foo.css" digest="sha256:ab8e92231...">
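Algorithm agility means the browser reads the algorithm name from the attribute and dispatches on it. A minimal sketch, assuming the value has the form `algorithm:hexdigest` (the example above is truncated, so hex encoding is an assumption) and assuming an allow-list that excludes weak algorithms such as MD5; `verify_digest_attribute` and `ALLOWED_ALGORITHMS` are hypothetical names.

```python
import hashlib

# Assumed allow-list: weak algorithms such as MD5 are deliberately not accepted.
ALLOWED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def verify_digest_attribute(attr_value: str, body: bytes) -> bool:
    """Check `body` against an attribute value like "sha256:<hex digest>"."""
    algorithm, sep, expected_hex = attr_value.partition(":")
    algorithm = algorithm.strip().lower()
    if not sep or algorithm not in ALLOWED_ALGORITHMS:
        # Unknown or malformed values are rejected rather than silently ignored.
        raise ValueError(f"unsupported or malformed digest: {attr_value!r}")
    # hashlib.new() looks the algorithm up by name, which is what makes agility cheap.
    actual = hashlib.new(algorithm, body).hexdigest()
    return actual == expected_hex.strip().lower()
```

Whether an unsupported algorithm should be an error or should merely disable checking is a policy question the proposal leaves open; raising here just makes the choice explicit.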
Processing Model
When the link is clicked, the browser keeps the hash in memory so that it can compare it with the hash of the downloaded file. Once the file is downloaded, the computed hash is compared against the expected hash.
For links to JS, CSS, etc., where the HTML page has been fetched over SSL/TLS, a matching hash means that the mixed content warning can be omitted. This might require substantial changes to fetching, so that content can be downloaded before the policy decision is made.
(The processing model is largely the same for all of the proposals.)
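The "keep the hash in memory, compare once downloaded" step can also be done incrementally, hashing each chunk as it arrives so the file never has to be re-read. A sketch under that assumption; `DownloadCheck`, `check_download` and the `"ok"`/`"mismatch"` outcomes are hypothetical, with `"mismatch"` standing for the point where the UI would offer [Delete File] or [Keep in Quarantine] as in the message above.

```python
import hashlib
from typing import Iterable


class DownloadCheck:
    """Holds the expected digest taken from the link while the download runs."""

    def __init__(self, algorithm: str, expected_hex: str) -> None:
        self.expected_hex = expected_hex.strip().lower()
        self._hasher = hashlib.new(algorithm)

    def feed(self, chunk: bytes) -> None:
        # Update the running hash as each network chunk arrives.
        self._hasher.update(chunk)

    def finish(self) -> str:
        # Hypothetical outcomes: "ok", or "mismatch" to trigger the delete/quarantine prompt.
        return "ok" if self._hasher.hexdigest() == self.expected_hex else "mismatch"


def check_download(chunks: Iterable[bytes], algorithm: str, expected_hex: str) -> str:
    """Run a whole download (given as an iterable of chunks) through a DownloadCheck."""
    check = DownloadCheck(algorithm, expected_hex)
    for chunk in chunks:
        check.feed(chunk)
    return check.finish()
```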
Hash Microformat
The hash microformat provides a way to associate hash values with links:
<span class="download">
  <a rel="bookmark" href="...">Download OpenOffice.org
    <span class="checksum md5">e0d123e5f316bef78bfdf5a008837577</span>
  </a>
</span>
The microformat is better described on the hash-examples page.
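Extracting the hash from this markup is the step a browser would have to add. A rough sketch using Python's standard `html.parser`, assuming the structure of the example above: a `<span>` whose classes include `checksum` inside an `<a>`, with the other class naming the algorithm. `ChecksumExtractor` is a hypothetical name and this is not the microformat's normative parsing model.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ChecksumExtractor(HTMLParser):
    """Collects (href, algorithm, hex) for <a> elements containing a
    <span class="checksum ALGO"> child, following the example markup."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[Tuple[Optional[str], str, str]] = []
        self._href: Optional[str] = None
        self._algo: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a":
            self._href = a.get("href")
        elif tag == "span" and self._href is not None:
            classes = (a.get("class") or "").split()
            if "checksum" in classes:
                # Assumption: the remaining class (e.g. "md5") names the algorithm.
                others = [c for c in classes if c != "checksum"]
                self._algo = others[0] if others else None
                self._text = []

    def handle_data(self, data):
        if self._algo is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._algo is not None:
            self.results.append((self._href, self._algo, "".join(self._text).strip()))
            self._algo = None
        elif tag == "a":
            self._href = None
```

After `parser.feed(html)`, `parser.results` lists one tuple per checksum found, which could then be handed to the download-verification step.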
Processing Model
When a link is clicked, the browser checks whether it matches the microformat (details to be added). If it does, the hash value is extracted and, once the file is downloaded, the computed hash for the file is compared against the expected hash. Browsers should keep the initial hash value across redirects, if any. This only applies to files downloaded to disk.
- "Could the syntax be extended so that fragment identifiers could cohabit with fingerprints?"
Link Fingerprint
Append a digest for the file in the fragment identifier of the URL. The browser can then check the validity of the file when it downloads it.
http://example.com/file#!md5!b3187253c1667fac7d20bb762ad53967
The Link Fingerprints article by Gervase Markham gives more details.
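Splitting such a URL into the address to fetch and the expected digest is straightforward with the standard `urllib.parse.urldefrag`. A sketch assuming exactly the `#!algorithm!hex` shape shown in the example; `parse_link_fingerprint` is a hypothetical name, and the sketch does not address the question above about fragment identifiers coexisting with fingerprints.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag


def parse_link_fingerprint(url: str) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Split 'http://example.com/file#!md5!<hex>' into the URL to fetch and
    (algorithm, hex). The fingerprint is None if the fragment has another shape.

    The fragment is never sent to the server, so the returned base URL is what gets fetched.
    """
    base, fragment = urldefrag(url)
    parts = fragment.split("!")
    # Expected fragment shape: "" (before the first "!"), algorithm, hex digest.
    if len(parts) == 3 and parts[0] == "" and parts[1] and parts[2]:
        return base, (parts[1].lower(), parts[2].lower())
    return base, None
```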
HTTP Headers
HTTP defines the Content-MD5 HTTP header. However, MD5 has known security flaws, and is not recommended for use.
The Digest HTTP header avoids this by allowing the hash algorithm to be specified.
Both of these approaches attach the hash to the response itself rather than to links, so they can only indicate the integrity of the response they occur in, and they are naturally vulnerable to modification in transit and on the server.
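For comparison, a sketch of producing and checking a `Digest` header value, assuming the base64-encoded instance-digest form and a `SHA-256` algorithm token (check the IANA registry for the exact tokens before relying on them). `make_digest_header`, `check_digest_header` and `ALGORITHMS` are hypothetical names. As noted above, anyone able to alter the body in transit can also rewrite this header, so it mainly catches accidental corruption.

```python
import base64
import hashlib

# Assumed mapping from Digest-header algorithm tokens (case-insensitive) to hashlib names.
ALGORITHMS = {"sha-256": "sha256", "sha-512": "sha512"}


def make_digest_header(body: bytes) -> str:
    """Build a Digest header value: base64 of the raw (binary) SHA-256 digest."""
    raw = hashlib.sha256(body).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


def check_digest_header(header_value: str, body: bytes) -> bool:
    """True if at least one recognised digest is present and all recognised ones match."""
    checked = False
    for item in header_value.split(","):
        # partition() splits on the first "=", so base64 padding stays in `value`.
        token, sep, value = item.strip().partition("=")
        name = ALGORITHMS.get(token.strip().lower())
        if not sep or name is None:
            continue  # unknown or malformed entries are skipped in this sketch
        checked = True
        if hashlib.new(name, body).digest() != base64.b64decode(value.strip()):
            return False
    return checked
```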
Another approach would be using Link headers to indicate hashes for the links in content; this could be useful if you want to add hashes with a server plug-in or a reverse proxy, for example.
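A reverse proxy could compute those hashes itself and announce them in a `Link` header. A sketch only: the `digest` link parameter below is invented for illustration and is not a registered parameter, the `sha256:<hex>` value format borrows the attribute syntax suggested earlier, and `link_header_with_digests` is a hypothetical name.

```python
import hashlib
from typing import Dict


def link_header_with_digests(resources: Dict[str, bytes], rel: str) -> str:
    """Build one Link header value listing each URL with a hypothetical `digest` parameter.

    `resources` maps each linked URL to the body the proxy fetched for it.
    """
    entries = []
    for url, body in resources.items():
        digest = "sha256:" + hashlib.sha256(body).hexdigest()
        # NOTE: "digest" is not a registered Link parameter; it is invented for this sketch.
        entries.append(f'<{url}>; rel="{rel}"; digest="{digest}"')
    return ", ".join(entries)
```

Computing hashes at the proxy keeps page authors out of the loop, but the hashes are then only as trustworthy as the proxy's own copy of each resource.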
Mailing List References
- Re: hash attribute -- Tom Pike, Wed Nov 8 05:21:22 PST 2006
- Re: hash attribute -- Ian Hickson, Wed Nov 8 08:28:19 PST 2006
- Re: hash attribute -- Gervase Markham, Thu Nov 9 09:23:32 PST 2006
- Re: hash attribute -- Michel Fortin, Tue Nov 14 08:53:43 PST 2006