Why not conneg

The purpose of this page is to explain what's wrong with HTTP content negotiation and why you should not suggest HTTP content negotiation as a solution to a problem.

HTTP content negotiation has four axes: negotiation by format (Accept), negotiation by character encoding (Accept-Charset), negotiation by natural language (Accept-Language) and negotiation by compression (Accept-Encoding). These axes need to be discussed separately. The only one of these that shouldn't be classified as a failure is negotiation by compression, but even it shouldn't be taken as a role model, considering how verbose it is for negotiating what is basically one bit of information.

This page explains what’s wrong with each of the existing negotiation axes. It then uses the HTML video element as an example of why HTTP-based negotiation in general is worse than letting the browser choose from alternative URLs: a hypothetical HTTP-based codec negotiation solution is compared with the browser-side codec negotiation that the video element actually uses.

Negotiation by compression

In practice, in the years HTTP has existed, a wide variety of compression methods has not cropped up. For practical purposes, there are two compression modes: uncompressed and gzipped. Google experimented with Shared Dictionary Compression over HTTP, but in practice it went nowhere on the wider Web. Even the Google-designed HTTP replacement SPDY uses gzip.

These days, all major browsers support gzipped responses. In that sense, the feature is a success. However, it is terribly wasteful that each request ends up containing 23 bytes of boilerplate. It would have been much more efficient if HTTP 1.1 had made the ability to accept gzipped responses a mandatory feature so that servers would have known that they are eligible to send a gzipped response if the request says HTTP/1.1 (or higher) rather than HTTP/1.0.
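The 23-byte figure can be checked with a few lines of Python. This is a minimal sketch that assumes the shortest useful form of the header, "Accept-Encoding: gzip", terminated by CRLF; real browsers typically list more encodings and so send more bytes.

```python
# Byte cost of the minimal request header needed to say "I accept gzip".
# Assumption: shortest useful form of the header, terminated by CRLF.
header_line = b"Accept-Encoding: gzip\r\n"

header_bytes = len(header_line)
wire_bits = header_bytes * 8
information_bits = 1  # the server only learns: gzip supported, yes or no

print(header_bytes)  # 23
print(wire_bits)     # 184 bits on the wire to carry 1 bit of information
```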

One might argue that negotiation was important for getting from the HTTP/1.0 world to the world we have now, because it allowed new features to be opted into one by one, and that format and protocol versions are an anti-pattern on the Web. In any case, it’s sad for 1 bit of information to take 23 bytes, especially if HTTP is to have a version number anyway and some features are tied to the version number (like support for chunked responses).

Negotiation by character encoding

UTF-8 was introduced in 1993. By the time Accept-Charset was introduced, it was already obsolete in the sense that any server capable of converting to multiple encodings could have used its conversion capabilities to just convert to UTF-8. (Granted, UTF-8 support in browsers didn’t appear right away, but supporting UTF-8 sooner would have been more worthwhile than sending Accept-Charset.) Fortunately, we have been able to get rid of this one. Of the major browsers, only Chrome sends Accept-Charset anymore.

Negotiation by natural language

Negotiation by natural language is a failure for multiple reasons.

First, it is actually very, very rare for Web sites to provide content (as opposed to application user interface) in multiple languages such that the language versions are so equally good that one could be selected without human judgment. When sites do have multiple language versions, the versions often are not equal. Typically, one language is the primary language for the site and the other language versions are incomplete, out of date or otherwise of low quality. Therefore, a user who can read more than one of the languages offered by the site is most often best off choosing the language version manually, by judging which of the versions the user is able to read is most likely to be of the highest quality. The negotiation algorithm tries to make this automatic, but in practice the algorithm is no substitute for the user's human judgment.
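To make the limitation concrete, here is a minimal sketch of what automatic selection from Accept-Language amounts to. It is a simplification and not a full implementation of the matching rules: it handles only q-values, exact tag matches and fallback to the primary subtag, and it ignores wildcards. The function names and the example site (Finnish original, stale English translation) are hypothetical.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language value into (tag, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # the weight defaults to 1 when omitted
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat malformed weights as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_language(header: str, available: list[str], default: str) -> str:
    """Pick the available language with the highest user-declared weight."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        primary = tag.split("-")[0]
        for lang in available:
            if lang.lower() in (tag, primary):
                return lang
    return default


# Hypothetical site: "fi" is the complete original, "en" a stale partial translation.
available = ["fi", "en"]
print(choose_language("en-US,en;q=0.9,fi;q=0.8", available, default="fi"))  # en
```

The header only carries the user's ranking of languages. Nothing in the input says which version of this particular site is complete or current, so the sketch picks the stale English version for a user who could have read the better Finnish one.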

It is slightly more common for Web applications to provide their UI strings in multiple languages. In that case, automatic choice might have a chance of working. However, the user has a larger time commitment to a particular application anyway, so it is more acceptable for an application to provide an explicit language selection UI than for a content site to require the user to choose a language (which, per the previous paragraph, multilanguage content sites need to do anyway). This makes automatic negotiation less necessary for an application. Even though the initial value of the language preference for a Web application UI could be taken from what the browser suggests, the user starts using a new Web application so rarely relative to all the HTTP requests the user's browser makes that it is terribly wasteful for each HTTP request to broadcast the user's preferred language in case the request happens to be one whose headers end up seeding the preferences of a Web application. (And Web applications these days use JavaScript anyway and, therefore, could request the data from the browser instead of the browser broadcasting it with every HTTP request.)

Also, it's a problem that negotiation by natural language is about negotiating by a characteristic of the human user as opposed to a characteristic of software. That is, the browser doesn't really know the characteristics of the human user without the human user configuring the browser. Since negotiation by natural language is so rarely useful, it doesn't really make sense for the browser to advertise the configuration option a lot or insist that the user makes the configuration before browsing the Web. As a result, the browser doesn't really know about the user's language skills beyond guessing that the user might be able to read the UI language of the browser. And that's a pretty bad guess. It doesn't give any information about the other languages the user is able to read and the user might not even be able to actually read meaningful prose in the language of the browser UI. (You can get pretty far with the browser UI simply by knowing that you can type addresses into the location bar and that the arrow to the left takes you back to the previous page.)

Finally, there is an actual disincentive for configuring the browser: Even if everyone configured their language preferences in their browser, the combination of languages that a person can read would partition the population of the world into rather small buckets, in some cases making the language combination a way to identify a particular user or a smallish group of users. Since people so rarely configure their language preference, when someone does configure it, chances are that the configuration becomes uniquely or almost uniquely identifying. This can be seen as a privacy problem.

Negotiating by format

Negotiating by format is a failure because:

It is actually rather rare that format alternatives can be chosen without knowing the user's intent. You might be able to choose automatically between different ways of compressing bitmaps, but choosing among e.g. a PDF and a Microsoft Word file might depend on the user's intent to print without reflowing lines due to different fonts on the system (PDF) or editing (Word) even if the user has software for reading both PDF and Microsoft Word files. Therefore, it makes sense to provide links to both the PDF and the Microsoft Word file and let the user choose which link to follow.

You can’t rely on format negotiation as a Web author, because there are always clients that accept something they don’t declare.

Due to the previous point, if you are a browser vendor and another vendor has shipped a browser that doesn’t declare that it supports something it supports, it doesn't make sense for you to waste bytes declaring it, either, because Web authors can't rely (solely) on the declaration anyway.

Negotiated responses are so rare relative to all HTTP responses that the request bloat resulting from advertising the supported formats is almost always waste.

Past experience from the GIF to PNG transition suggests that authors are more likely to deploy a single format, either the old one (with worse capabilities) or the new one (forgoing support for old browsers), than to actually bother using negotiation (as a generalization on the Web scale; there may be individual counterexamples that round to practical nothingness).

It never pays to advertise a format once all current browsers support it but there are still old browsers that don't. (E.g. SVG as of 2013.) Since there are browsers that support the format but don't advertise it, the earlier point about not being able to rely on the declaration applies. However, since non-support is a past phenomenon, it is safe to UA sniff for the legacy browsers. If all current browsers support the format, there probably cannot be successful future browsers that don't support the format, so the usual problem with UA sniffing, where you assume that a current defect stays present in future versions of a given browser, does not apply.

Server-side choice is worse for intermediate caches than browser-side choice

Aside from script-driven HTTP requests (XHR), the HTTP requests made by browsers fall into two main categories: requests that arise from navigation (HTML most of the time) and requests that arise from resource inclusions (images, videos, audio, style sheets, fonts and scripts).

For resource inclusions, the server has always already sent (unless the site uses SPDY push) an HTML document to the browser and then the browser makes the requests in response to what it sees in that document. That is, the server has had an opportunity to say something to the browser already.

It is friendlier to intermediate caches for the resource that the server has already sent to the browser (typically an HTML document) to list alternative URLs for a resource inclusion request, together with information that allows the browser to choose which one of the URLs to fetch, than to give the browser one URL and have the server vary what gets served from that URL.

Consider the actual design of the video element versus a hypothetical HTTP-based negotiation design like this: There are no source elements. There is just the src attribute on the video element. When making the request for the URL given in the src attribute, the browser sends an Accept-Codecs HTTP header that declares the supported container and codec combinations. The server decides which video file to respond with depending on the request header. The server sets the Vary: Accept-Codecs header on the response.

Now, intermediate caches have to use the value of the Accept-Codecs header as part of the cache key.

Suppose the origin server has two alternative video files: MP4/H.264/AAC and WebM/VP8/Vorbis. Ideally, this should result in at most two distinct cache entries and two cache keys in an intermediate cache. However, in this scenario, the worst-case number of cache keys in the intermediary is the number of distinct Accept-Codecs values sent by the browsers out there.

Suppose first that a browser that declares support for WebM/VP8/Vorbis requests the video URL. The intermediate cache has no data for that URL yet, so it forwards the request to the origin server and caches the result with the URL and the value of the Accept-Codecs header as the cache key. Then another browser that declares support for MP4/H.264/AAC fetches the same URL from the same intermediate cache. The value of the Accept-Codecs header does not match the value in the existing cache key, so the intermediary cannot just go and serve the existing cache entry. Instead, it forwards the request to the origin server, gets a different response and makes a new cache entry for that response with the URL and the different Accept-Codecs header value as the cache key. So now the intermediary has in its cache all the distinct video files that the origin server has for this URL. If the solution were properly cache-friendly, all subsequent requests could be served straight from the cache without consulting the origin server until the entries' time to live expires.

But server-side negotiation isn't cache-friendly in that way. Suppose that now a browser that declares support for WebM/VP8/Vorbis *and* WebM/VP8/Opus makes a request for the same URL. The value of the Accept-Codecs header matches neither of the cache keys, so the intermediary needs to consult with the origin server even though it has cached all the possible variants already.

With the actual design of the video element, this problem does not occur. Each alternative video file has a distinct URL, there is no need for the origin server to set the Vary header and the result is as cache-friendly as possible. That is, the worst case number of distinct cache keys is the number of distinct videos on the origin server, so after the intermediary has cached them all, there is no need to consult with the origin server until the cache entries expire due to their age.
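The difference between the two designs can be sketched with a toy cache simulation. This is a minimal sketch under stated assumptions: Accept-Codecs is the hypothetical header from the scenario above, and the header value syntax, the URLs (/video, /video.webm, /video.mp4) and every class and function name are invented for illustration. The cache ignores expiry, and pick_source stands in for the browser choosing among source elements.

```python
class IntermediaryCache:
    """Toy shared cache. With vary=True the key includes the Accept-Codecs value."""

    def __init__(self, origin, vary):
        self.origin = origin
        self.vary = vary
        self.entries = {}
        self.origin_hits = 0

    def get(self, url, accept_codecs):
        key = (url, accept_codecs) if self.vary else (url,)
        if key not in self.entries:
            self.origin_hits += 1  # cache miss: consult the origin server
            self.entries[key] = self.origin(url, accept_codecs)
        return self.entries[key]


def negotiating_origin(url, accept_codecs):
    # Server-side choice: one URL, body depends on the (hypothetical) request header.
    return "webm-bytes" if "webm" in accept_codecs else "mp4-bytes"


def plain_origin(url, accept_codecs):
    # Browser-side choice: each file has its own URL, the header is ignored.
    return {"/video.webm": "webm-bytes", "/video.mp4": "mp4-bytes"}[url]


def pick_source(accept_codecs):
    # Stand-in for the browser picking the first source it can play.
    return "/video.webm" if "webm" in accept_codecs else "/video.mp4"


# Three browsers with three distinct (made-up) Accept-Codecs values.
BROWSERS = [
    "video/webm; codecs=vp8+vorbis",
    "video/mp4; codecs=h264+aac",
    "video/webm; codecs=vp8+vorbis, video/webm; codecs=vp8+opus",
]

negotiated = IntermediaryCache(negotiating_origin, vary=True)
for codecs in BROWSERS:
    negotiated.get("/video", codecs)

by_url = IntermediaryCache(plain_origin, vary=False)
for codecs in BROWSERS:
    by_url.get(pick_source(codecs), codecs)

print(negotiated.origin_hits, len(negotiated.entries))  # 3 3
print(len(set(negotiated.entries.values())))            # 2 distinct bodies
print(by_url.origin_hits, len(by_url.entries))          # 2 2
```

In the negotiated run, the third browser causes a third origin request and a third cache entry even though both possible bodies are already cached. With distinct URLs, the third browser is served from the cache.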

This example illustrates that solving the problem of choosing between alternative resources on the HTTP level ends up being less cache-friendly than letting the browser choose from distinct URLs offered in the referring document.
