Autocomplete Types

From WHATWG Wiki
Revision as of 23:12, 6 February 2012 by Isherman (Extend the autocomplete attribute rather than defining a new autocompletetype attribute)

Many user agents provide autofill functionality, but there is not currently a good way for site authors to directly specify field types for autofill. Herein is a lightweight proposal to enable authors to provide these field type hints for autofill agents.

Use Case Description

Many user agents provide functionality to quickly fill frequently used form data – address and contact information, for example. For the purposes of this document, we will refer to this functionality as "autofill" functionality, and will refer to user agents that provide such functionality as "autofill agents". Autofill agents save users' time, and help site authors convert users in purchase and registration flows. Autofill works best when site authors are able to directly provide hints to autofill agents as to what data belongs in each field.

Current Limitations

Current autofill products primarily rely on contextual clues to determine the type of data that should be filled into form elements. Examples of such contextual clues include the name of an input element, the text surrounding the element, and any placeholder text.

We have discussed the shortcomings of these ad hoc approaches with developers of several autofill products, and all have been interested in a solution that would let website authors classify their form fields themselves. While current methods of field classification work in general, in many cases they are unreliable or ambiguous due to the many variations and conventions used by web developers when creating their forms.

  • Ambiguity - Fields named "name" can mean a variety of things, including given name, surname, full name, username, or others. Similar confusion can occur among other fields, such as email address and street address.
  • Internationalization - Recognizing field names and context clues for all the world’s languages is impractical, time-intensive, and error-prone (as good context clues in one language may mean something else in another language).
  • Unrelated Naming - Due to backend requirements (such as a framework that a developer is working within), developers may be constrained in what they can name their fields. As such, the name of a field may be unrelated to the data it contains.

We believe that website authors have a strong incentive to facilitate autofill on their forms, as it helps convert users in purchase and registration flows. Additionally, autofill assists users by streamlining their experience.

Current Usage and Workarounds

As mentioned above, current autofill products primarily rely on contextual clues. Thus, site authors who wish to "play nicely" with these autofill agents must reverse-engineer each agent's heuristics, and design their web site to match.

There has been a previous standard suggested in this space: RFC 3106, a.k.a. ECML. This standard has been largely unused, we believe for (at least) two reasons:

  • RFC 3106 requires websites to conform to a set of input naming standards, effectively co-opting the name attribute — which has other, sometimes conflicting, uses. For existing websites, changing the name attribute is an onerous task: Since this attribute serves as a key for parsing submitted form data, updating a field's name requires coordinated front-end and back-end changes. Even for new websites, the name attribute might have back-end restrictions conflicting with RFC 3106. For example, the SourceForge registration page appears to use the attribute as a way to provide an extra security token. Based on research done by in the summer of 2011, we believe that this is the primary reason that RFC 3106 has been largely unused.
  • RFC 3106 lacks current user agent support. To the best of our knowledge, it is currently supported only by Google Toolbar. There is a bit of a chicken-and-egg problem here: on the one hand, site authors are hesitant to use RFC 3106 due to minimal user agent support. On the other hand, user agents are hesitant to support the RFC due both to minimal usage in the wild and to the aforementioned inconvenience to site authors caused by co-opting the name attribute. Any new standard will have to overcome a similar hurdle to gain initial adoption; but based on discussions with developers of several autofill products, we believe that many autofill agents would be happy to support a cleaner standard.


Proposed Solutions

Extending the autocomplete Attribute for Form Fields

We propose extending the current autocomplete attribute to optionally specify field types, in addition to the existing values of "on", "off", and "default", in order to eliminate ambiguity from the process of determining input data types.

Mechanics/Model

4.10.7.3.1 The autocomplete attribute

User agents sometimes have features for helping users fill forms in, for example prefilling the user's address based on earlier user input.

The autocomplete attribute is an ordered set of space-separated tokens. The attribute implies one of two autocompletion states for the input element: on or off. The "on" keyword maps to the on state, and the "off" keyword maps to the off state. The attribute may also be omitted, or may provide a field datatype hint, as described in section 4.10.7.3.1.1.


(Begin unchanged snippet)

The off state indicates either that the control's input data is particularly sensitive (for example the activation code for a nuclear weapon); or that it is a value that will never be reused (for example a one-time-key for a bank login) and the user will therefore have to explicitly enter the data each time, instead of being able to rely on the UA to prefill the value for him; or that the document provides its own autocomplete mechanism and does not want the user agent to provide autocompletion values.

Conversely, the on state indicates that the value is not particularly sensitive and the user can expect to be able to rely on his user agent to remember values he has entered for that control.

(End unchanged snippet)


If the attribute is omitted, the user agent is to use the autocomplete attribute on the element's form owner instead. (By default, the autocomplete attribute of form elements is in the on state.)

When an input element is in one of the following conditions, the input element's resulting autocompletion state is on; otherwise, the input element's resulting autocompletion state is off:

  • Its autocomplete attribute is specified, has a non-empty value, and the value is not "off".
  • Its autocomplete attribute is omitted, and the element has no form owner.
  • Its autocomplete attribute is omitted, and the element's form owner's autocomplete attribute is in the on state.
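The three conditions above amount to a small decision procedure. The following Python is a minimal sketch of it, not normative text: it assumes an omitted attribute is passed as None, that the form owner's state has already been resolved to "on" or "off" (or None when there is no form owner), and that "off" is matched ASCII case-insensitively after trimming whitespace, as HTML does for keyword attributes. The function name is hypothetical.

```python
from typing import Optional


def resulting_autocomplete_state(
    attr: Optional[str],
    form_owner_state: Optional[str],
) -> str:
    """Return "on" or "off" for an input element (hypothetical helper).

    attr             -- the input's autocomplete attribute value, or None if omitted
    form_owner_state -- the form owner's resolved state ("on" or "off"),
                        or None if the element has no form owner
    """
    if attr is not None:
        value = attr.strip()
        # Specified, non-empty, and not "off": this covers "on" as well as
        # any field data type hint such as "email" or "section-billing postal-code".
        if value and value.lower() != "off":
            return "on"
        # Specified but empty, or explicitly "off".
        return "off"
    # Attribute omitted and no form owner: the state is on.
    if form_owner_state is None:
        return "on"
    # Attribute omitted: inherit the form owner's state.
    return "on" if form_owner_state == "on" else "off"
```

One consequence worth noting: under these rules a field data type hint such as "email" puts the element in the on state even when its form owner is off.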

(Begin unchanged snippet)

When an input element's resulting autocompletion state is on, the user agent may store the value entered by the user so that if the user returns to the page, the UA can prefill the form. Otherwise, the user agent should not remember the control's value, and should not offer past values to the user.

In addition, if the resulting autocompletion state is off, values are reset when traversing the history.

The autocompletion mechanism must be implemented by the user agent acting as if the user had modified the element's value, and must be done at a time where the element is mutable (e.g. just after the element has been inserted into the document, or when the user agent stops parsing).

Banks frequently do not want UAs to prefill login information:

<label>Account: <input type="text" name="ac" autocomplete="off"></label>

<label>PIN: <input type="password" name="pin" autocomplete="off"></label>

A user agent may allow the user to override the resulting autocompletion state and set it to always on, always allowing values to be remembered and prefilled, or always off, never remembering values. However, user agents should not allow users to trivially override the resulting autocompletion state to on, as there are significant security implications for the user if all values are always remembered, regardless of the site's preferences.

(End unchanged snippet)


4.10.7.3.1.1 Specifying field data type hints

The autocomplete attribute can also provide a field data type hint to the user agent. If a field data type hint is specified, the input element's autocompletion state is on. User agents do not have to respect the field type specified in the autocomplete attribute, but it may be used as a hint.

The attribute’s value, if specifying a field data type hint, must be an ordered set of unique space-separated tokens, each of which indicates the data type of the input element. If the user agent supports type-specific autocomplete (a.k.a. "autofill") and is designed to follow the autocomplete field data type hints, it should iterate over the tokens from left to right and use as the data type the first token that it recognizes (with the exception of section tokens, as defined below).
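A user agent following the left-to-right rule might pick the data type as sketched below. This is an illustrative Python sketch under our own assumptions: tokens are compared case-sensitively, `recognized` stands for whatever set of tokens a given user agent supports, and "x-new-type" in the usage line is a hypothetical token unknown to that agent.

```python
from typing import Optional, Set


def pick_data_type(attr: str, recognized: Set[str]) -> Optional[str]:
    """Return the first recognized non-section token, or None (hypothetical helper)."""
    # Scan the space-separated tokens from left to right.
    for token in attr.split():
        # Section tokens group fields; they never name a data type.
        if token.startswith("section-"):
            continue
        if token in recognized:
            return token
    # No token was recognized.
    return None


# An agent that does not know "x-new-type" falls through to "postal-code".
print(pick_data_type("section-billing x-new-type postal-code", {"postal-code", "email"}))
```

Listing a newer or more specific token first, followed by a widely supported one, is what lets authors give agents a fallback.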

In either of the following cases, the user agent should not autocomplete the field based on the field's data type, though non-datatype specific autocomplete may still be invoked; otherwise, the user agent may fall back on alternative means for detecting the input data type:

  • The user agent does not recognize any of the non-section tokens specified in the autocomplete attribute.
  • The field's autocomplete attribute is not specified, empty, or set to "on"; and there is at least one other field in the form that specifies a field data type hint using the autocomplete attribute.

In practice, this allows website authors to disable datatype-specific autocomplete for an entire form by setting the autocomplete attribute on one form element to a value that no browser recognizes. Other forms of autocomplete (for example, suggestions drawn from the user’s form field history) remain available, which distinguishes this from autocomplete="off".
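The two cases above, together with the form-wide effect they produce, can be sketched as follows. This is a hypothetical Python helper rather than part of the proposal: it models a form as a list of autocomplete values (None for an omitted attribute), treats any value other than empty, "on", or "off" as a field data type hint, and returns a recognized token, the marker HEURISTICS when the user agent may fall back on contextual clues, or None when type-specific autofill should not be used. "x-no-autofill" is an invented token chosen only because no user agent would recognize it.

```python
from typing import List, Optional, Set

# Hypothetical marker meaning "fall back on contextual clues".
HEURISTICS = "heuristics"


def is_type_hint(attr: Optional[str]) -> bool:
    """True if the value is a field data type hint rather than omitted/empty/on/off."""
    if attr is None:
        return False
    return attr.strip().lower() not in ("", "on", "off")


def type_autofill_decision(
    index: int, form_attrs: List[Optional[str]], recognized: Set[str]
) -> Optional[str]:
    """Decide how the field at `index` may be filled by data type (hypothetical)."""
    attr = form_attrs[index]
    if attr is not None and is_type_hint(attr):
        # Use the first recognized non-section token, scanning left to right.
        for token in attr.split():
            if not token.startswith("section-") and token in recognized:
                return token
        # Case 1: none of the non-section tokens is recognized.
        return None
    if attr is not None and attr.strip().lower() == "off":
        return None  # autocompletion is off altogether
    # Case 2: this field has no hint, but another field in the form does.
    if any(is_type_hint(a) for i, a in enumerate(form_attrs) if i != index):
        return None
    # No hints anywhere in the form: heuristics remain allowed.
    return HEURISTICS
```

With a single "x-no-autofill" field, every field in a form such as ["x-no-autofill", None, "on"] yields None, while history-based autocomplete remains possible because none of the fields is set to "off".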

There is no comprehensive list of tokens, as the number of possible input data types is large and ever-increasing. However, user agents should recognize at least the following set of tokens, if their autofill feature is capable of filling the corresponding data type.

Token                  Description
Names
  given-name           given or first name
  middle-name          middle name
  middle-initial       middle initial
  surname              surname or last name
  name-full            full name
  name-prefix          prefix or title (Mr., Mrs., Dr., etc.)
  name-suffix          suffix (Jr., II, etc.)
Addresses
  street-address       full street address condensed into one line
  address-line1        first line of street address
  address-line2        second line of street address
  address-line3        third line of street address
  locality             locality or city
  city                 same as locality
  administrative-area  administrative area, state, province, or region
  state                same as administrative-area
  province             same as administrative-area
  region               same as administrative-area
  postal-code          postal or ZIP code
  country              country name
Contact Information
  email                email address
  phone-full           full phone number, including country code
  phone-country-code   international country code
  phone-national       national phone number: full number minus country code
  phone-area-code      area code
  phone-local          local phone number: full number minus country and area codes
  phone-local-prefix   first part of local phone number (not recommended, see note 1 below)
  phone-local-suffix   second part of local phone number (not recommended, see note 1 below)
  phone-extension      phone extension number
  fax-full             full fax number, including country code
  fax-country-code     international country code
  fax-national         national fax number: full number minus country code
  fax-area-code        area code
  fax-local            local fax number: full number minus country and area codes
  fax-local-prefix     first part of local fax number (not recommended, see note 1 below)
  fax-local-suffix     second part of local fax number (not recommended, see note 1 below)
  fax-extension        fax extension number
Credit Cards
  cc-full-name         full name, as it appears on credit card
  cc-given-name        given name, as it appears on credit card (not recommended, see note 2 below)
  cc-middle-name       middle name or initial, as it appears on credit card (not recommended, see note 2 below)
  cc-surname           surname, as it appears on credit card (not recommended, see note 2 below)
  cc-number            credit card number
  cc-exp-month         month of expiration of credit card
  cc-exp-year          year of expiration of credit card (see note 3 below about formatting)
  cc-exp               date of expiration of credit card (see note 4 below about formatting)
  cc-csc               credit card security code
Other
  language             preferred language
  birthday             birthday (see note 4 below about formatting)
  birthday-month       month of birthday
  birthday-year        year of birthday (see note 3 below about formatting)
  birthday-day         day of birthday
  organization         company or organization
  organization-title   user's position or title within company or organization
  gender               gender
  section-*****        used to group forms together (see note 5 below)

Notes on tokens:
  1. The tokens phone-local-prefix, phone-local-suffix, fax-local-prefix, and fax-local-suffix are added to support phone and fax formats where the local number is split into two parts (as in the US, for example). However, it is recommended that forms be constructed with no separation to maximize international support.
  2. The tokens cc-given-name, cc-middle-name, and cc-surname are added to support forms where the name on the credit card has been split into several form fields. However, it is recommended that forms be constructed with no separation.
  3. For the tokens cc-exp-year and birthday-year, the element’s maxlength attribute should be used as a hint to the formatting of the year. For example, maxlength="2" indicates a 2-digit year format. Beyond this hint, the user agent may fall back on other heuristics to determine the data format.
  4. For the tokens cc-exp and birthday, it is recommended to use the HTML5 attribute value type="month" or type="date" to distinguish fields requesting year and month from those requesting year, month, and day. In these cases the data should be formatted according to the proper formats for those fields. In other cases, the element's maxlength attribute can be used as a hint to proper formatting. For example, maxlength="7" indicates that a 2-digit month, 4-digit year, and 1-digit separator should be used. Beyond this hint, the user agent may fall back on other heuristics to determine the data format.
  5. To facilitate classification of logical groups of form fields, developers can use tokens that begin with "section-" to denote such sections. This is described in more detail below.
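Notes 3 and 4 can be sketched as two small formatting helpers. This is illustrative Python, assuming the user agent already knows the expiration month and year; the "/" separator and the MM/YYYY ordering for maxlength="7" are our assumptions, since note 4 fixes only the field widths, and defaulting to a four-digit year when no hint is present is likewise an assumption. A return value of None means no hint applies and the user agent should fall back on other heuristics. The function names are hypothetical.

```python
from typing import Optional


def format_year(year: int, maxlength: Optional[int]) -> str:
    """Note 3: maxlength="2" hints at a 2-digit year (hypothetical helper)."""
    if maxlength == 2:
        return f"{year % 100:02d}"
    # Assumption: without a 2-digit hint, emit a 4-digit year.
    return f"{year:04d}"


def format_cc_exp(
    month: int, year: int, input_type: str, maxlength: Optional[int]
) -> Optional[str]:
    """Note 4: format a cc-exp value (hypothetical helper)."""
    if input_type == "month":
        # HTML valid month string: YYYY-MM.
        return f"{year:04d}-{month:02d}"
    if maxlength == 7:
        # 2-digit month + 1-character separator + 4-digit year.
        # The "/" separator and MM/YYYY order are assumptions.
        return f"{month:02d}/{year:04d}"
    # No hint: defer to the user agent's other heuristics.
    return None
```

For example, format_year(2014, 2) gives "14", and format_cc_exp(3, 2014, "month", None) gives "2014-03".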

Section tokens

Form fields are often grouped into logical sections, such as shipping and billing addresses. This semantic information is useful to user agents with autofill capabilities. Web developers may specify this sectioning by a token beginning with "section-".

An autocomplete value may contain at most one section token; if present, it must be the first token in the list. Any characters may follow "section-" so long as the token remains a valid token. All fields in a logical grouping (such as shipping or billing addresses) should have the same section token (such as section-shipping or section-billing).
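The section-token rules can be sketched as a parser that separates the optional leading section token from the remaining tokens, plus a helper that groups fields by section. This is a hypothetical Python sketch: field names such as ship_zip are invented, and rejecting a misplaced or repeated section token with ValueError is our choice, since the proposal does not say how user agents should handle one.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def split_section(attr: str) -> Tuple[Optional[str], List[str]]:
    """Split a value into (section token or None, remaining tokens)."""
    tokens = attr.split()
    section: Optional[str] = None
    # Only the first token may be a section token.
    if tokens and tokens[0].startswith("section-"):
        section, tokens = tokens[0], tokens[1:]
    # Any other section token is misplaced or a second one.
    if any(t.startswith("section-") for t in tokens):
        raise ValueError(f"section token must be first and unique: {attr!r}")
    return section, tokens


def group_by_section(
    fields: Dict[str, str],
) -> Dict[Optional[str], Dict[str, List[str]]]:
    """Group fields (name -> autocomplete value) by section token."""
    groups: Dict[Optional[str], Dict[str, List[str]]] = defaultdict(dict)
    for name, attr in fields.items():
        section, tokens = split_section(attr)
        groups[section][name] = tokens
    return dict(groups)
```

Grouping this way lets an agent fill a shipping postal-code and a billing postal-code from different stored addresses even though both fields carry the same data type token.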


Alternatives Considered

We considered endorsing input naming standards such as RFC 3106, a.k.a. ECML, under which web developers would name their form fields according to a fixed set of conventions. While new websites might adopt such a standard, it would force developers of existing websites to change their naming conventions on both the front end and the back end. Adoption would therefore be slow at best, and such a standard would likely never catch on (as has been the case for ECML).

A better solution would therefore leave the name attribute of input elements untouched. One such option is to standardize labels or placeholder text instead; however, these texts are visible to users. Web developers should have full control over what is displayed in order to provide the best user experience (especially in cases where the website is in a foreign language), so labels and placeholder texts are inappropriate for this purpose.

To avoid co-opting input element names and user-facing text, we considered using custom data attributes, which are new in the HTML5 specification. However, these are intended for data private to the page or application: the specification explicitly states that custom data attributes "are not a generic extension mechanism for publicly-usable metadata," which is exactly what we are attempting to provide.

Adoption

We believe this addition to the specification to be the best solution because:

  • It is simple to add both to new and to existing web forms.
  • It does not require web developers to alter backend code.
  • It does not alter the display of forms or user-facing text.
  • It follows existing precedent: the autocomplete attribute already lets authors give user agents hints about autofill behavior.
  • It is extensible to future or experimental input data types.
  • It allows web developers to provide multiple input data types to fall back on.
  • It allows user agents to fall back to alternate heuristics if the attribute is not provided.
  • User agents that do not recognize the attribute will simply ignore it.

Limitations

The main drawback to this solution is that, unless it is approved as part of the HTML specification, a website that specifies field types using the autocomplete attribute would fail validation in most HTML validators. However, it is not uncommon to use experimental elements or attributes for new features.

We hope that this extension will be accepted into the HTML5 specification, eliminating this drawback.

Internationalization

The token names were chosen to support internationalization. While it is extremely difficult to develop a schema that works in every case, we believe these tokens cover the majority of users. In addition, the extensibility of the attribute allows other, locale-specific tokens to be used.

To encourage adoption, we included aliases for common terms in the US. For example, in addition to locality and administrative-area, we have included the aliases city and state. This introduces redundancy and increases the number of tokens, but we view it as necessary for adoption in the US. The extensibility of the attribute similarly allows for additional tokens that are specific to other locales.
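Since the aliases are pure synonyms, a user agent could normalize them before matching, so that a field marked city and one marked locality draw on the same stored value. This is a minimal Python sketch; the alias mapping comes from the token table above, while the function name and the choice to normalize (rather than treat aliases as separate types) are assumptions.

```python
from typing import Dict

# Aliases from the token table, mapped to their canonical tokens.
ALIASES: Dict[str, str] = {
    "city": "locality",
    "state": "administrative-area",
    "province": "administrative-area",
    "region": "administrative-area",
}


def canonical_token(token: str) -> str:
    """Map an alias to its canonical token; other tokens pass through unchanged."""
    return ALIASES.get(token, token)


# Normalize every token of an attribute value; section tokens are untouched.
print([canonical_token(t) for t in "section-billing state".split()])
```

The same table-driven approach would accommodate locale-specific aliases added through the attribute's extensibility.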

Extensibility

It might be useful to encourage developers to register custom tokens on a wiki page, as is done for <link rel> and <meta name> attribute values.

Security and Privacy Implications

When dealing with users’ personal information, extra care must be taken to ensure that the data is protected and transmitted only with the user’s consent. This proposal improves the accuracy with which autofill products classify form elements, which could potentially help malicious sites identify and extract private user data. These vulnerabilities need to be addressed in the autofill products themselves, as any autofill product would be equally at risk of privacy violations with or without explicit author-specified field types, whether specified via the extended autocomplete attribute or otherwise.

Experimental Implementation in Chrome

As of Chrome 15, this extension has been implemented under the experimental attribute x-autocompletetype. The experimental implementation supports only field types (not "on", "off", or "default") and uses a few deprecated token names. We anticipate that the attribute’s success in improving autofill products will encourage other autofill solutions to implement it. Additionally, we hope it will strengthen our proposal to add the attribute to the HTML5 specification.