Content-Language
Open Poll Closes 2010-06-30
Please submit your objections to meta Content-Language in this poll (which closes very soon)
http://www.w3.org/2002/09/wbs/40318/issue-88-objection-poll/
as well as adding them to this wiki page.
Summary
The meta tag, specifically http-equiv Content-Language pragma, is confusing and broken and should be removed from HTML5 altogether.
See related issue: http://www.w3.org/html/wg/tracker/issues/88
remove Content-Language
http://lists.w3.org/Archives/Public/public-html/2010Apr/0308
- This seems like the best option. It is preferable to remove broken features rather than keep them (even if "non-conforming") to minimize risk of continued misuse/misunderstanding and otherwise time-wasting on behalf of web designers and developers. Tantek 01:29, 27 June 2010 (UTC)
keep Content-Language as non-conforming
http://lists.w3.org/Archives/Public/public-html/2010Apr/0307.html
- I'd still prefer complete removal of a broken feature rather than issuing warnings. Tantek 01:29, 27 June 2010 (UTC)
allow multiple language tags in Content-Language
http://www.w3.org/html/wg/wiki/ChangeProposals/ContentLanguages
- Even the "Summary" for this proposal is long and confusing. The workarounds provided in the change proposal increase web authoring complexity. Broken features should be removed, from the language and the specification, rather than asking web developers to waste time learning about broken features and how to work around them. Let's keep the spec as clean as possible. Tantek 01:29, 27 June 2010 (UTC)
Difference Between Content-Language HTTP Header and Pragma Directive
The premise of this change proposal is that the Content-Language HTTP header field is functionally equivalent to the Content-Language pragma directive using the meta element. This premise is used to support the idea that both should share the same syntax and client side processing requirements. However, this premise is demonstrably wrong, and thus the change proposal is unsupported by evidence and must be rejected.
In order to demonstrate the differences between the HTTP header and the pragma directive, it is necessary to analyse the purpose and functionality of each and see how they compare.
Declaring the Language of the Intended Audience
The HTTP Content-Language header field is used by HTTP servers to announce the language of the intended audience for a given resource representation. This and other related information exchanged between the client and server can be used for content negotiation based on language. When the server does this, it is important for this information to be included in the HTTP header where it can be seen by both the client and other intermediary servers.
The information declared within the document using the pragma directive is unsuitable for this purpose, as it will not be parsed by intermediary servers that would otherwise utilise the information for caching purposes.
Server Configuration
It has been claimed that the information declared using a pragma directive within the document may be parsed by some server implementations, which subsequently process and echo the value in the Content-Language HTTP header field. Since this header field is allowed to contain multiple language values, it is claimed that this ability is limited by permitting only one language in the pragma directive. However, no evidence has been presented to demonstrate how widely used this feature is, nor why such a feature should even be defined within HTML.
This is a layering violation because information intended for server side processing, and specific implementation details thereof, should not unnecessarily affect the conformance definition of client side HTML. That is, it is out of scope for HTML, as a client side markup language, to define specific processing requirements or features to be used by servers for implementing HTTP features. There is also no inherent need for interoperability between different back end implementation details.
Defining the pragma directive in a way that is optimised for specific server implementation details would be analogous to, for example, defining an ASP specific feature within HTML for use on Microsoft IIS platforms. While server implementations are otherwise free to make any design choices, those design choices need not affect HTML conformance requirements.
Default Document Language
In practice, Content-Language used within the meta element in the HTML serves as client side metadata. The functionality of Content-Language in this case is restricted entirely to the purpose of specifying a fallback language, to be used in the absence of the lang attribute. This purpose differs significantly from the purpose of declaring the languages of the intended audience.
Declaring multiple languages for the document's intended audience makes sense in some cases. However, there can only be one default language. Thus, for this purpose, the functionality as defined requires that only a single language value be specified. While the HTTP Content-Language header field is also used for determining the fallback language in cases where it only has a single language value, that is not its primary purpose and is thus not a significant similarity between these two independent features.
Permitting multiple language values to be specified in the pragma directive is at odds with its implementation requirements. Thus, for the client side metadata functionality of the pragma directive, it is not at all useful to have multiple languages specified, and so it does not make sense for multiple languages to be considered conforming.
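The fallback order described in this section can be sketched in Python as follows. This is a minimal illustration, not spec text: the function and parameter names are hypothetical, and treating a multi-valued HTTP header as giving no default language is an assumption drawn from the description above rather than a quoted requirement.

```python
from typing import Optional


def document_language(root_lang: Optional[str],
                      pragma_default: Optional[str],
                      http_content_language: Optional[str]) -> Optional[str]:
    """Return the primary document language, or None if none applies.

    Hypothetical sketch of the fallback order; names are illustrative only.
    """
    # 1. A lang (or xml:lang) attribute on the root element always wins.
    if root_lang is not None:
        return root_lang
    # 2. Otherwise use the pragma-set default language, which can only
    #    ever hold a single value.
    if pragma_default is not None:
        return pragma_default
    # 3. Finally consult the HTTP header, but only when it names exactly
    #    one language (assumption: several languages give no default).
    if http_content_language:
        tags = [t.strip() for t in http_content_language.split(",") if t.strip()]
        if len(tags) == 1:
            return tags[0]
    return None
```

In this sketch the pragma only ever contributes one value, which is the same role the lang attribute plays, while the multi-valued header contributes nothing to the default.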
These three aspects of the functionality (declaring the language of the intended audience, server side configuration and default document language) clearly illustrate that the premise of this change proposal, the shared functionality between the two features, is fundamentally flawed. The reality is that the in-document Content-Language pragma directive only shares its name with the HTTP header field, while its functionality is closer to that of the lang attribute. And since server side implementation details are out of scope for HTML, there is no need for the document conformance definition to permit multiple language values. The solution chosen for addressing this issue must take this into account, and thus reject this change proposal.
Arguments Against the Rationale
The rationale for this change proposal states:
"[The current specification] offers no carrot for doing the right thing. while the fallback language effect stops as soon as the author adds lang on the root element, the spec requires conformance checker to continue whining until the http-equiv="Content-Language" meta element has been removed."
The rationale fails to explain the benefit gained by leaving the pragma directive in the document when a lang attribute has been specified on the root element. While leaving it in the document under those circumstances is mostly harmless, it is redundant metadata that the author does not need to include in their document. Failing to offer a warning would continue to mislead the author into thinking that the pragma directive is both acceptable and useful, which it is not.
"That it prevents authors from legally using multiple values to replicate the language fallback effect of doing the same thing in a HTTP header — whether they want to replicate the effect of multiple tags or a single tag."
The language fallback effect of using multiple language tags within the value is that there is no default language. This is exactly the same effect as omitting the pragma directive, so the claim that authors have no legal way to replicate the effect of specifying no language is wrong: they can simply leave the pragma out. The rationale also fails to give any reason for wanting to replicate this effect by copying the same syntax.
That is, the effect of including:
<meta http-equiv="Content-Language" content="en, fr">
is identical to that of omitting this pragma directive entirely, and there is no reason for wanting to include the above either.
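The existing parsing requirements abort when the value contains a comma. A minimal Python sketch of those steps, assuming the step order of the HTML5 draft at the time (abort on a missing or empty value, abort on a comma, skip whitespace, collect non-space characters), is shown below. It omits the rule that only the first such meta element is processed, and the function name is hypothetical.

```python
from typing import Optional

# HTML5 "space characters": space, tab, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\f\r"


def pragma_default_language(content: Optional[str]) -> Optional[str]:
    """Return the pragma-set default language, or None if processing aborts."""
    # Abort if the content attribute is missing or empty.
    if not content:
        return None
    # Abort if the value contains a comma: multiple languages are ignored.
    if "," in content:
        return None
    # Skip leading whitespace, then collect characters up to the next space.
    position = 0
    while position < len(content) and content[position] in SPACE_CHARACTERS:
        position += 1
    start = position
    while position < len(content) and content[position] not in SPACE_CHARACTERS:
        position += 1
    language = content[start:position]
    # Assumption: a whitespace-only value leaves no language to set.
    return language or None
```

Under this sketch, content="en, fr" and an absent pragma both yield None, which is the equivalence argued above.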
That it underlines the confusion that may exist today, about the nature of lang versus Content-Language, by requiring:
- different syntax rules for features that are expected to be identical (HTTP and http-equiv)
- similar syntax rules for features that are different (http-equiv and lang)
- a warning message which asks authors to “use lang instead” – as if they were juxtaposable alternatives.
The HTTP Content-Language header and the http-equiv pragma directive are different. The HTTP header is used for declaring the languages of the intended audience; the pragma directive is used for specifying a default language.
The lang attribute, on the other hand, is an alternative to the pragma directive because the pragma directive only serves to specify the language of the document when a single value is used. When multiple languages are specified in the pragma directive, there is absolutely no defined effect, and so it serves no valid purpose at all.
Therefore, the pragma directive is much closer in functionality to the lang attribute, than it is to the HTTP header, with which it shares its name.
Instead of the above, this change proposal propose:
- the Zero-edit proposal’s warning about using lang instead of Content-Language should be changed into a warning which informs that a fallback language measure has kicked in, and recommend that authors create a language declaration (via lang) rather than relying on the fallback feature. This warning should be shown regardless of whether the fallback comes from http-equiv or from the higher level (HTTP). Justification: Since it is a fallback feature, and with other semantics, there is no guarantee that the author has used it for the language effect.
From the author's perspective, the inconsistency of issuing the warning about the use of the pragma directive only when the lang attribute is absent would be confusing. The better alternative is to issue a consistent warning (or error) that simply says to remove the pragma directive and use lang instead.
...
- to hold the syntax rules of HTTP (which permits multiple language tags) as the conforming ones (rather than those of lang, which forbids multiple languages), will have the effect of underlining that lang and Content-Language have different purposes. For instance, since the fallback algorithm doesn’t kick in whenever multiple languages are used in the pragma or on the server, there would not be any warning in these cases.
The syntax requirements for the HTTP Content-Language header are not affected by the HTML implementation requirements. The lang attribute on the root element and the Content-Language pragma directive with a single language value have the same effect, which differs significantly from the purpose of the HTTP Content-Language header. Since it is misleading to pretend otherwise, the syntax of the pragma directive does not need to match the syntax of the HTTP header.
...
- a carrot: what we want from authors is that they rely on lang (and xml:lang) for specifying the language — when the author does that, he/she should get immediate reward in the form of removal of conformance warning.
This rationale fails to explain why the same effect of encouraging authors to use the lang attribute would not be achieved by a more consistent warning that tells authors to use the lang attribute and remove the pragma directive. There is no benefit gained by leaving the directive in, and merely silencing the validator by inserting a lang attribute does little to discourage the use of the redundant and totally unnecessary pragma directive.
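To make the difference between the two warning policies concrete, here is a hypothetical Python sketch of a conformance checker rule. Neither function comes from an actual validator, the message strings are invented, and the second function reflects this page's reading of the change proposal (warn only when a single-valued fallback language takes effect).

```python
from typing import Optional


def consistent_warning(pragma_present: bool) -> Optional[str]:
    """Policy argued for here: warn whenever the pragma is used."""
    if pragma_present:
        return "Remove the Content-Language pragma and use lang instead."
    return None


def fallback_only_warning(root_lang: Optional[str],
                          fallback_value: Optional[str]) -> Optional[str]:
    """Hypothetical reading of the change proposal: warn only when a
    single-valued fallback (pragma or HTTP) actually takes effect."""
    # No fallback kicks in if lang is on the root or no value exists.
    if root_lang is not None or not fallback_value:
        return None
    # Multiple languages: no fallback kicks in, so no warning either.
    if "," in fallback_value:
        return None
    return "A fallback language is in effect; declare it with lang."
```

With lang on the root element, the second policy stays silent even though the pragma is still present and still redundant, which is the gap described above.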
Arguments Against the Proposal Details
The change proposal suggests replacing the terminology for "pragma-set default language" with "pragma-set locale language". None of the given rationale explains the need for this change in terminology.
The proposed specification text states:
This pragma contains a Content-Language list, whose semantics and syntax is defined in the HTTP spec.
The semantics of the Content-Language header field as defined in RFC 2616 states:
The Content-Language entity-header field describes the natural language(s) of the intended audience for the enclosed entity. Note that this might not be equivalent to all the languages used within the entity-body.
This semantic definition does not match the actual purpose of the Content-Language pragma directive, for specifying a "pragma-set locale language". Therefore, referring to RFC 2616 for this semantic definition is inappropriate. The syntax requirements from RFC 2616 are also inappropriate, as it defines the following ABNF, which is not directly compatible with the syntax of the meta element with http-equiv and content attributes.
Content-Language = "Content-Language" ":" 1#language-tag
language-tag = primary-tag *( "-" subtag )
primary-tag = 1*8ALPHA
subtag = 1*8ALPHA
For these syntax requirements to be applicable at all, the specification would have to state that the value of the content attribute must match the ABNF production for language-tag. However, see below regarding the syntax defined in BCP 47.
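As a rough illustration of what matching the RFC 2616 productions would mean, the Python sketch below checks a single value against language-tag and splits a header field value per 1#language-tag. It is simplified (optional whitespace in the #rule is approximated with strip()), the function names are hypothetical, and it does not implement BCP 47.

```python
import re
from typing import List

# RFC 2616: language-tag = primary-tag *( "-" subtag ),
# where primary-tag and subtag are each 1*8ALPHA.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_rfc2616_language_tag(value: str) -> bool:
    """True if value matches the language-tag production exactly."""
    return LANGUAGE_TAG.fullmatch(value) is not None


def parse_content_language_header(field_value: str) -> List[str]:
    """Split a Content-Language field value (1#language-tag) into tags."""
    # The #rule allows empty list elements, so skip them.
    tags = [item.strip() for item in field_value.split(",") if item.strip()]
    if not tags:
        raise ValueError("1#language-tag requires at least one tag")
    for tag in tags:
        if not is_rfc2616_language_tag(tag):
            raise ValueError(f"invalid language-tag: {tag!r}")
    return tags
```

This production accepts "en-US" but, because subtags are 1*8ALPHA, rejects tags containing digits such as "es-419", which BCP 47 permits.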
The proposed text then states:
An HTML5 parser processes this list into a known or unknown pragma-set locale language... The Content-Language list may also be defined in a HTTP header, and will then result in a known or unknown HTTP header-set locale language.
The proposed text fails to define what "known or unknown" means in that context. It is not clear how the implementation determines whether a value is known or unknown.
The parsing requirements for the value of this pragma directive are not specified by the change proposal. However, the change proposal also does not state that the existing parsing requirements in the specification are to be removed, replaced or modified in any way. Thus, by adopting the details of this change proposal, the specification would be left in an inconsistent state which says that multiple language values are supported, but where the parsing requirements abort when more than one value is used.
The aforementioned parsing requirements only focus on parsing the value of the pragma directive, and as such, there is no implementation requirement that sets the "HTTP header-set locale language".
When a document is lacking a language declaration in the form of the lang or xml:lang attribute on the root element, the document’s locale language (pragma-set or HTTP-set) is consulted by the user agent and used as fallback value for the primary document language.
Assuming the value of the "HTTP header-set locale language" comes from the HTTP Content-Language header, this proposed text fails to specify the order of precedence of the values specified in the pragma directive or the HTTP header.
The use of the term "locale language" in this context clashes with the existing use of the term in the specification to refer to the language set by the user in the user agent's preferences. This term is used in the table within step 7 of the algorithm to determine the character encoding.
The proposed text then goes on to state:
The following info about the HTTP semantics and Content-Language usage, is informative:
However, in the non-normative list given following that statement, RFC 2119 terminology is incorrectly used to describe what appear to be authoring requirements. In particular:
... authors should not define the Content-Language list according to its parser effect, but according to it semantics.
This non-normative example text also incorrectly states that "en-US" would not be parsed into a useful value. However, this value complies with the syntax requirements specified in RFC 2616, BCP 47 and also with the existing parsing requirements in the HTML5 specification.
The proposal states that the following requirement is to be removed:
Conformance checkers will include a warning if this pragma is used. Authors are encouraged to use the lang attribute instead.
The rationale provided does not adequately justify the removal of this warning, nor does it adequately justify replacing it with a more limited warning to be issued only when the pragma directive is used in the absence of the lang attribute.
The proposal then states to amend this requirement as follows:
the content attribute must have a value consisting of a valid BCP 47 language tag, or a comma separated list of two or more BCP 47 language tags.
However, the proposal stated earlier that the syntax for the value was defined by RFC 2616. This requirement now conflicts with that by stating that the syntax of the content attribute's value is defined by BCP 47. This inconsistency negatively affects the quality of the specification.
The proposal states that this note is to be removed:
This pragma is not exactly equivalent to the HTTP Content-Language header, for instance it only supports one language.
The removal of this note would be misleading, because the note itself is factually correct as-is with the current specification, and with the details of this proposal, which, as stated above, leave the parsing requirements unchanged. The proposal fails to include any implementation requirements that actually permit multiple language tags to be used.
It has now been clearly demonstrated that the proposed specification text provided by this change proposal is thoroughly inadequate for its intended purpose. If the specification were to be amended as required by this change proposal, the inconsistency and lack of clarity would negatively affect the ability to read, understand and implement this specification. As such, this proposal should also be rejected on the basis that its proposal details are inadequate. However, if this working group does make the wrong decision to permit multiple language tags, then I ask that the editor be given full editorial discretion to phrase the requirements in a way that more clearly expresses the requirements, rather than being asked to accept the details of this proposal as written.