Difference between revisions of "Autocomplete Types"
Revision as of 20:32, 6 February 2012
Many user agents provide autofill functionality, but there is not currently a good way for site authors to directly specify field types for autofill. Herein is a lightweight proposal to enable authors to provide these field type hints for autofill agents.
Use Case Description
Many user agents provide functionality to quickly fill frequently used form data – address and contact information, for example. For the purposes of this document, we will refer to this functionality as "autofill" functionality, and will refer to user agents that provide such functionality as "autofill agents". Autofill agents save users' time, and help site authors convert users in purchase and registration flows. Autofill works best when site authors are able to directly provide hints to autofill agents as to what data belongs in each field.
Current autofill products primarily rely on contextual clues to determine the type of data that should be filled into form elements. Examples of such contextual clues include the name of an input element and the text surrounding the element.
We have discussed the shortcomings of these ad hoc approaches with developers of several autofill products, and all have been interested in a solution that would let website authors classify their form fields themselves. While current methods of field classification work in general, for many cases they are unreliable or ambiguous due to the many variations and conventions used by web developers when creating their forms.
- Ambiguity - Fields named "name" can mean a variety of things, including given name, surname, full name, username, or others. Similar confusion can occur among other fields, such as email address and street address.
- Internationalization - Recognizing field names and context clues for all the world’s languages is impractical, time-intensive, and error-prone (as good context clues in one language may mean something else in another language).
- Unrelated Naming - Due to backend requirements (such as a framework that a developer is working within), developers may be constrained in what they can name their fields. As such, the name of a field may be unrelated to the data it contains.
We believe that website authors have a strong incentive to facilitate autofill on their forms, because it helps convert users in purchase and registration flows. It also assists users by streamlining their experience.
Current Usage and Workarounds
As mentioned above, current autofill products primarily rely on contextual clues. Thus, site authors who wish to "play nicely" with these autofill agents must reverse-engineer each agent's heuristics, and design their web site to match.
There has been a previous standard suggested in this space: RFC 3106, a.k.a. ECML. This standard has been largely unused, we believe for (at least) two reasons:
- RFC 3106 requires websites to conform to a set of input naming standards, effectively co-opting the name attribute – which has other, sometimes conflicting, uses. For existing websites, changing the name attribute is an onerous task: since this attribute serves as a key for parsing submitted form data, updating a field's name requires coordinated front-end and back-end changes. Even for new websites, the name attribute might have back-end restrictions conflicting with RFC 3106. For example, the SourceForge registration page appears to use the attribute as a way to provide an extra security token. Based on research done in the summer of 2011, we believe that this is the primary reason that RFC 3106 has been largely unused.
- RFC 3106 lacks current user agent support. To the best of our knowledge, it is currently supported only by Google Toolbar. There is a bit of a chicken-and-egg problem here: on the one hand, site authors are hesitant to use RFC 3106 due to minimal user agent support. On the other hand, user agents are hesitant to support the RFC due both to minimal usage in the wild and to the aforementioned inconvenience to site authors caused by co-opting the name attribute. Any new standard will have to face a similar hurdle of bolstering initial adoption, but based on discussion with developers of several autofill products, we believe that many autofill agents would be happy to support a cleaner standard.
autocompletetype Attribute for Form Fields
To complement the current autocomplete attribute, we propose the addition of the following autocompletetype attribute to the HTML5 specification in order to eliminate ambiguity from the process of determining input data types.
User agents sometimes have features for helping users fill forms, for example pre-filling the user’s address based on earlier user input. The autocompletetype attribute indicates to the user agent what type of data should be filled into the input element. Although user agents do not have to respect the value of the autocompletetype attribute, it may be made available as a hint.
The attribute’s value, if specified, must be an ordered set of unique space-separated tokens, each of which indicates the data type of the input element. If the user agent supports autofill and is designed to follow the autocompletetype attribute hint, it should iterate over the tokens from left to right and use as the data type the first token that it recognizes (with the exception of section tokens, as defined below).
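The left-to-right token resolution described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name resolve_data_type and the RECOGNIZED_TOKENS subset are hypothetical, and a real autofill agent would recognize whatever tokens its own data store supports.

```python
# Hypothetical subset of tokens that this sketch "recognizes".
# A real autofill agent would use its own list.
RECOGNIZED_TOKENS = {"given-name", "surname", "email", "postal-code", "locality"}


def resolve_data_type(attribute_value, recognized=RECOGNIZED_TOKENS):
    """Return the first recognized non-section token, or None if there is none."""
    for token in attribute_value.split():
        # Section tokens only group fields; they never name a data type.
        if token.startswith("section-"):
            continue
        if token in recognized:
            return token
    return None


# An unrecognized experimental token listed first falls through to "email".
print(resolve_data_type("section-billing x-loyalty-id email"))
```

Listing a more specific or experimental token first and a widely supported one after it lets an author provide a fallback for agents that do not know the first token.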
If the user agent does not recognize any of the non-section tokens or the autocompletetype attribute is blank, the user agent should not attempt to autocomplete the field. If there is no autocompletetype attribute on a field but there is an autocompletetype attribute on at least one other field in the form, the user agent should similarly not autocomplete the field based on the data type. Otherwise, the user agent may fall back on alternative means for detecting the input data type.
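The per-field rules above amount to a small decision procedure. The sketch below is one possible reading of them; the function name autofill_decision and the four result labels are hypothetical and do not come from the proposal.

```python
def autofill_decision(field_value, other_field_values, recognized):
    """Decide how an agent might treat one field.

    field_value: the field's autocompletetype value, or None if the
        attribute is absent.
    other_field_values: values for the other fields in the same form
        (None marks a field without the attribute).
    recognized: set of tokens the agent knows how to fill.
    """
    if field_value is not None:
        for token in field_value.split():
            if not token.startswith("section-") and token in recognized:
                return ("fill-as", token)
        # Attribute is blank or holds no recognized data type token.
        return ("do-not-fill", None)
    if any(value is not None for value in other_field_values):
        # Another field in the form is annotated, so do not guess a type here.
        return ("no-typed-fill", None)
    # No annotations anywhere in the form: heuristics are allowed.
    return ("use-heuristics", None)


known = {"given-name", "email"}
print(autofill_decision("given-name", [None], known))
print(autofill_decision(None, ["x-unknown-token"], known))
```

The second call shows the form-wide effect: a single annotated field, even one with an unrecognized token, switches off type-based filling for the unannotated fields in this sketch.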
In practice, this allows website authors to disable datatype-specific autocomplete for an entire form by setting the autocompletetype attribute on one form element to something unrecognized by all browsers. This would still allow autocomplete through other methods (for example, by using the user’s form field history).
There is no comprehensive list of tokens, as the number of possible input data types is large and ever-increasing. However, user agents should recognize at least the following set of tokens, insofar as the user agent’s autofill feature is capable of filling the corresponding data type.
Names

| Token | Description |
|---|---|
| given-name | given or first name |
| middle-name | middle name |
| middle-initial | middle initial |
| surname | surname or last name |
| name-full | full name |
| name-prefix | prefix or title (Mr., Mrs., Dr., etc.) |
| name-suffix | suffix (Jr., II, etc.) |

Addresses

| Token | Description |
|---|---|
| street-address | full street address condensed into one line |
| address-line1 | first line of street address |
| address-line2 | second line of street address |
| address-line3 | third line of street address |
| locality | locality or city |
| city | same as locality |
| administrative-area | administrative area, state, province, or region |
| state | same as administrative-area |
| province | same as administrative-area |
| region | same as administrative-area |
| postal-code | postal or ZIP code |
| country | country name |

Contact Information

| Token | Description |
|---|---|
| email | email address |
| phone-full | full phone number, including country code |
| phone-country-code | international country code |
| phone-national | national phone number: full number minus country code |
| phone-area-code | area code |
| phone-local | local phone number: full number minus country and area codes |
| phone-local-prefix | first part of local phone number (not recommended, see note 1 below) |
| phone-local-suffix | second part of local phone number (not recommended, see note 1 below) |
| phone-extension | phone extension number |
| fax-full | full fax number, including country code |
| fax-country-code | international country code |
| fax-national | national fax number: full number minus country code |
| fax-area-code | area code |
| fax-local | local fax number: full number minus country and area codes |
| fax-local-prefix | first part of local fax number (not recommended, see note 1 below) |
| fax-local-suffix | second part of local fax number (not recommended, see note 1 below) |
| fax-extension | fax extension number |

Credit Cards

| Token | Description |
|---|---|
| cc-full-name | full name, as it appears on credit card |
| cc-given-name | given name, as it appears on credit card (not recommended, see note 2 below) |
| cc-middle-name | middle name or initial, as it appears on credit card (not recommended, see note 2 below) |
| cc-surname | surname, as it appears on credit card (not recommended, see note 2 below) |
| cc-number | credit card number |
| cc-exp-month | month of expiration of credit card |
| cc-exp-year | year of expiration of credit card (see note 3 below about formatting) |
| cc-exp | date of expiration of credit card (see note 4 below about formatting) |
| cc-csc | credit card security code |

Other

| Token | Description |
|---|---|
| language | preferred language |
| birthday | birthday (see note 4 below about formatting) |
| birthday-month | month of birthday |
| birthday-year | year of birthday (see note 3 below about formatting) |
| birthday-day | day of birthday |
| organization | company or organization |
| organization-title | user's position or title within company or organization |
| gender | gender |
| section-***** | used to group forms together (see note 5 below) |
Notes on tokens:
1. The tokens phone-local-prefix, phone-local-suffix, fax-local-prefix, and fax-local-suffix are added to support phone and fax formats where the local number is split into two parts (as in the US, for example). However, it is recommended that forms be constructed with no separation to maximize international support.
2. The tokens cc-given-name, cc-middle-name, and cc-surname are added to support forms where the name on the credit card has been split into several form fields. However, it is recommended that forms be constructed with no separation.
3. For the tokens cc-exp-year and birthday-year, the element’s maxlength attribute should be used as a hint to the formatting of the year. For example, maxlength="2" indicates a 2-digit year format. Beyond this hint, the user agent may fall back on other heuristics to determine the data format.
4. For the tokens cc-exp and birthday, it is recommended to use the HTML5 attribute value type="date" to distinguish fields requesting year and month from those requesting year, month, and day. In these cases the data should be formatted according to the proper formats for those fields. In other cases, the element's maxlength attribute can be used as a hint to proper formatting. For example, maxlength="7" indicates that a 2-digit month, 4-digit year, and 1-digit separator should be used. Beyond this hint, the user agent may fall back on other heuristics to determine the data format.
5. To facilitate classification of logical groups of form fields, developers can use tokens that begin with "section-" to denote such sections. This is described in more detail below.
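The maxlength hints described in the notes on year and expiration-date tokens could be applied roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the "/" separator is an arbitrary choice (the notes only say a 1-digit separator is used), and the 4-digit default when no hint is present stands in for whatever other heuristics the agent uses.

```python
def format_year(year, maxlength=None):
    """Format a year for cc-exp-year or birthday-year using the maxlength hint."""
    if maxlength == 2:
        # maxlength="2" indicates a 2-digit year format.
        return f"{year % 100:02d}"
    # Assumption: default to a 4-digit year when there is no usable hint.
    return f"{year:04d}"


def format_expiry(month, year, maxlength=None, separator="/"):
    """Format a month-and-year value for cc-exp using the maxlength hint."""
    if maxlength == 7:
        # 2-digit month + 1-character separator + 4-digit year.
        return f"{month:02d}{separator}{year:04d}"
    # No usable hint: signal that other heuristics are needed.
    return None


print(format_year(2015, maxlength=2))
print(format_expiry(3, 2015, maxlength=7))
```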
Form fields are often grouped into logical sections, such as shipping and billing addresses. This semantic information is useful to user agents with autofill capabilities. Web developers may specify this sectioning by a token beginning with "section-".
There may be zero or one section token in the autocompletetype value. If there is one, it must be the first token in the list. Any characters may follow "section-" so long as the token remains a valid token. All fields in a logical grouping (such as shipping or billing addresses) should have the same section token (such as section-shipping or section-billing).
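To illustrate how an agent might use section tokens, the sketch below parses a small form and groups its fields by section. The sample markup, the field names f1 to f4, and the names SectionCollector and group_by_section are all made up for this example; only Python's standard html.parser module is used.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical form; the opaque field names stand in for back-end constraints.
SAMPLE_FORM = """
<form>
  <input name="f1" autocompletetype="section-shipping street-address">
  <input name="f2" autocompletetype="section-shipping postal-code">
  <input name="f3" autocompletetype="section-billing street-address">
  <input name="f4" autocompletetype="email">
</form>
"""


class SectionCollector(HTMLParser):
    """Collects (field name, data type tokens) pairs keyed by section token."""

    def __init__(self):
        super().__init__()
        self.sections = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        value = attributes.get("autocompletetype")
        if not value:
            return
        tokens = value.split()
        section = None
        # A section token, if present, must be the first token in the list.
        if tokens and tokens[0].startswith("section-"):
            section, tokens = tokens[0], tokens[1:]
        self.sections[section].append((attributes.get("name"), tokens))


def group_by_section(html):
    collector = SectionCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.sections)


for section, fields in group_by_section(SAMPLE_FORM).items():
    print(section, fields)
```

Fields without a section token are grouped under None here, which is a choice made for the sketch and not something the proposal prescribes.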
We considered endorsing input naming standards. Web developers would name their form fields according to a set of naming standards, such as RFC 3106, a.k.a. ECML. While this might be adopted by new websites, it would force developers of existing websites to change their naming conventions on both the front end and back end of their websites. Adoption of these standards would therefore be slow at best, and they would likely never catch on (as has been the case for ECML).
The better solution therefore seems to be one that does not alter the name attribute of input elements, and instead standardizes labels or placeholder text. However, these texts are visible to users. Web developers should have full control over what is displayed in order to provide the best user experience (especially in cases where the web site is in a foreign language) and as such labels and placeholder texts are inappropriate for this purpose.
To avoid co-opting input element names and user-facing text, we considered using custom data attributes, which are new to the HTML5 specification. However, these are to be used for within-site data. The specification explicitly states that custom data attributes "are not a generic extension mechanism for publicly-usable metadata," and publicly usable metadata is exactly what we are attempting to provide.
We believe this addition to the specification to be the best solution because:
- It is simple to add both to new and to existing web forms.
- It does not require web developers to alter backend code.
- It does not alter the display of forms or user-facing text.
- There is precedent for this type of attribute in the autocomplete attribute.
- It is extensible to future or experimental input data types.
- It allows web developers to provide multiple input data types to fall back on.
- It allows user agents to fall back to alternate heuristics if the attribute is not provided.
- User agents that do not recognize the attribute will simply ignore it.
The main drawback to this solution is that unless approved as a part of the HTML specification, a website that implements the autocompletetype attribute would not be detected as valid HTML by most HTML validators. However, it is not uncommon to use experimental elements or attributes for new features.
We hope that this attribute will be accepted into the HTML5 specification, eliminating this drawback.
The token names were chosen to support internationalization. While it is extremely difficult to develop a schema that will work for every case, we believe these tokens cover the needs of the majority of users. In addition, the extensibility of the attribute allows other tokens to be used that are specific to different locales.
To encourage adoption, we included aliases for common terms in the US. For example, in addition to locality and administrative-area, we have included the aliases city and state. This introduces redundancy and increases the number of tokens, but we view it as necessary for adoption in the US. The extensibility of the attribute similarly allows for additional tokens that are specific to other locales.
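An agent that supports these aliases might normalize them to a single canonical token before looking up stored data. The mapping below is taken from the "same as" entries in the token table; the names ALIASES and canonical_token are hypothetical, and normalizing at all is a design choice of this sketch.

```python
# Aliases from the token table, mapped to the token they duplicate.
ALIASES = {
    "city": "locality",
    "state": "administrative-area",
    "province": "administrative-area",
    "region": "administrative-area",
}


def canonical_token(token):
    """Map an alias to its canonical token; other tokens pass through unchanged."""
    return ALIASES.get(token, token)


print(canonical_token("state"))
print(canonical_token("postal-code"))
```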
Security and Privacy Implications
When dealing with users’ personal information, extra care must be taken to ensure that the data is protected and only transmitted with the user’s consent. This proposal improves the accuracy with which autofill products classify form elements, which could potentially assist malicious sites in identifying and extracting private user data. These vulnerabilities need to be addressed in the autofill products themselves, as any autofill product would be equally at risk of privacy violations with or without the proposed attribute.
Experimental Implementation in Chrome
As of Chrome 15, this specification has been implemented under the experimental attribute x-autocompletetype. We anticipate that the attribute’s success in improving autofill products will encourage other autofill solutions to implement the attribute. Additionally, we hope it will strengthen our proposal to add the attribute to the HTML5 specification.
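A tool that inspects forms during this experimental period might need to read the hint from either attribute name. The sketch below shows one way to do that with Python's standard html.parser module. The names HintReader and read_hints and the sample markup are hypothetical, and preferring the unprefixed name when both are present is an assumption of this sketch, not behavior documented for Chrome.

```python
from html.parser import HTMLParser

# Assumption: check the unprefixed name first, then the experimental one.
HINT_ATTRIBUTES = ("autocompletetype", "x-autocompletetype")


class HintReader(HTMLParser):
    """Maps each input's name to the token list from its type hint attribute."""

    def __init__(self):
        super().__init__()
        self.hints = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        for attribute_name in HINT_ATTRIBUTES:
            value = attributes.get(attribute_name)
            if value:
                self.hints[attributes.get("name")] = value.split()
                break


def read_hints(html):
    reader = HintReader()
    reader.feed(html)
    reader.close()
    return reader.hints


print(read_hints('<input name="q7" x-autocompletetype="email">'))
```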