has the exact same content that it previously cached as:
This is a proposal for a new attribute of LINK, SCRIPT, IMG, and OBJECT elements (and possibly IFRAME elements?) that would allow for unique, unchanging content to be identified and cached, thus helping to augment the normal caching mechanisms of web browsers, (and speed up Web 2.0!)
In brief, my solution is the creation of a resource identifier:
<img src="my-site-images/valid-html5.png" res="org.html5.simon.valid-html5.png" />
Use Case Description
Many websites today show icon badges, small common images that serve a variety of purposes, such as the "W3C valid" pictures, the "Get Adobe Reader" and "Get Adobe Flash Player" links, the Linux penguin, etc. Although these images are small, repeatedly downloading them adds to the overall amount of traffic on the Internet.
In the case of the W3C, they permit a direct link to their servers:
<p> <a href="http://validator.w3.org/check/referer"><img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" height="31" width="88" /></a> </p>
This is a good thing. It means that every single web page that displays a W3C validation badge is caching the image by the same URL, http://www.w3.org/Icons/valid-xhtml10. Thus, each new website that you navigate to which uses these images essentially have a shared resource, which the browser knows it already has cached by the same resource id, the URL.
The problem is that when a particular icon badge does happen to go out of cache, or when it is being downloaded for the first time, just one website has to handle the load of all calls to get that image. For a large and important site like the World Wide Web Consortium, this may not be a problem, but for many other sites, this can dramatically increase bandwidth costs.
The website thiefware.com, which promotes ethical uses of software, provides free "This site is ThiefWare Free!" icon badges. As a smaller site, they ask their users to not hotlink to these images, as they would not be able to handle the bandwidth costs. Thus, users of these images download them to their own website and serve them locally. This is a good thing for the web servers, as now no one site has to handle all requests for these images, but now web browsers have no particularly good way to know that one of these images is already somewhere in its cache when it encounters them at 2 or more websites, thus it downloads an image that it already has in local memory.
Many webmasters today take advantage of open source software around which an entire site can be built, such as WordPress installations for blogging and MediaWiki installations for wikis. Although each installation can have it's components modified, many do not, thus:
may carry the exact same style information, but again, there is no way for a web browser to know, (and be assured) of this. Thus, one who navigates to many different blogging sites and wiki sites may be (needlessly) downloading the same style sheets over and over again, without realizing that the same exact file is already locally available to the browser.
A reusable Flash "button" whose behavior is set via HTML would be another example. It would be nice for a browser to be able to cache a commonly used, HTML OBJECT-tag configurable Flash file.
I would like to propose a resource identifier attribute, named res for the sake of brevity:
<link rel="stylesheet" type="text/css" href="local/pretty-divs.css" res="com.pretty-divs.pretty-divs-1.5" />
<img src="images/madeonamac20050720.gif" res="com.apple.images.web-badges.madeonamac20050720.gif" />
<object ... res="com.newgrounds.numa-numa-forever.swf" />
<iframe src="local/copy/of/something.html" res="org.gov.mil.something.html" />
So the idea is very simple. If a link, script, img, object, or iframe tag (any other tags?) has a res attribute, the browser would first check to see if it has that resource "filed" in its cache with the res value. If it does, great; if it doesn't, then it might look for that resource in its cache filed under the href/src attribute. Finally, having not found either attribute in cache, it would download the resource from the href/src specified.
It would not somehow try to download the resource from the original author based on the res attribute, for one of the design goals here is to prevent any one web server from bearing the majority of the bandwidth costs, (though it might use the res attribute to obtain a checksum or signature of some kind, see #Security Concerns).
To emphasize that res is a static ID rather than a URI, I recommend the use of Java package-style namespaces:
Because it would be very important for two resource IDs to truely indicate the exact same resource, I imagine they should always include exact versioning information:
<script ... res="uk.gov.oxford.stiffUpperLip-1.5.js"></script>
A malicious hacker might try to create "Trojan horses" along the following lines:
This Trojan could then perhaps emulate the real script (so that the user doesn't suspect anything is wrong) but might do other things for its own nefarious purposes. In order for this to be effective, the victim would have to visit the hacker's website before any other website using com.everyone.trusts.this.script-1.0.js, so that his script would be the one that gets cached under that resource ID.
I am not certain what a good solution to this would be. Perhaps a central registry of digital signatures and/or checksums for trusted resources would be the (very complicated) solution.