Validator.nu Web Service Interface: Difference between revisions

<P>I am just writing this down so I don’t forget it. There are no
{{Obsolete|spec=https://github.com/validator/validator/wiki/Service-»-HTTP-interface}}
immediate implementation plans. There are no implementation promises,
Validator.nu can be called as a Web service. Input and output modes can be chosen completely orthogonally. Responses and requests can be optionally compressed (independently of each other).
either. There especially are no hosting promises at this time.
 
This is an inline-commentable wiki copy of [http://hsivonen.iki.fi/validator-ws-ideas/ the original article].</P>
(Please use the Web service API reasonably. See the [https://about.validator.nu/#tos Terms of Service].)
<H2 id='motivation'>Motivation</H2>
 
<P>First, I assume there is some level of interest in doing RELAX NG
==Input Modes==
/ Schematron validation and HTML5 conformance checking. Next, it
 
would be nice to enable applications that deal with documents to make
For most Web service use cases, you should probably POST the document as the HTTP entity body.
these checks automatically in addition to having the functionality
 
available for human operators as a Web app. For example, [http://golem.ph.utexas.edu/~distler/blog/archives/001054.html a content management system might check the input it is given].</P>
===Implemented===
<P>Java apps could just integrate a private copy of the Free Software
 
back end of the [http://hsivonen.iki.fi/validator/ validation]
* Document [[Validator.nu GET Input|URL as a GET parameter]]; the service retrieves the document by URL over HTTP or HTTPS.
/ [http://hsivonen.iki.fi/validator/html5/ conformance
* Document [[Validator.nu POST Body Input|POSTed as the HTTP entity body]]; parameters in query string as with GET.
checking] service. However, non-Java apps would benefit from
* Document [[Validator.nu Textarea Input|POSTed as a <code>textarea</code> value]].
having the validation / conformance checking service running out of
* Document [[Validator.nu Form Upload Input|POSTed as a form-based file upload]].
process and having an interface for talking to the out-of-process
 
Java service. The service instance could be hosted publicly or as a
===Not Implemented===
local copy. Even some Java developers would elect to use such a
 
service instead of integrating the back end as part of their own app.</P>
* Document in a <CODE>data:</CODE> URI as a GET parameter.
<H2 id='input'>Input Modes</H2>
* <code>application/x-www-form-urlencoded</code>
<P>The schemas are expected to be relatively static. Therefore, I
 
think preloading them into the service or letting the service
==Output Modes==
retrieve them is sufficient. Identification by URI works in both
 
cases.</P>
When using Validator.nu as a Web service back end, the [[Validator.nu XML Output|XML]] and [[Validator.nu JSON Output|JSON]] output formats are recommended for forward compatibility. The available JSON tooling probably makes consuming JSON easier. The XML format contains XHTML elaborations that are not available in JSON. Both formats are streaming, but streaming XML parsers are more readily available. XML cannot represent some input strings faithfully.
<P>What needs different input modes is the document that is checked.</P>
 
<P>I think the following modes would make sense:</P>
===Implemented===
<UL>
 
<LI><P>Document URI as a GET parameter; the service retrieves the
* HTML with microformat-style <CODE>class</CODE> annotations (default output; should not be assumed to be forward-compatibly stable).
document by URI (already implemented).</P>
* XHTML with microformat-style <CODE>class</CODE> annotations (append <code>&out=xhtml</code> to URL; should not be assumed to be forward-compatibly stable).
<LI><P>Document in a <CODE>data:</CODE> URI as a GET parameter.</P>
* [[Validator.nu XML Output|XML]] (append <code>&out=xml</code> to URL).
<LI><P>Document POSTed as the HTTP entity body (the preferred Web
* [[Validator.nu JSON Output|JSON]] (append <code>&out=json</code> to URL).
service mode).</P>
* [[Validator.nu GNU Output|GNU error format]] (append <code>&out=gnu</code> to URL).
<LI><P>Document POSTed as an <CODE>application/x-www-form-urlencoded</CODE>
* Human-readable plain text (append <code>&out=text</code> to URL; should not be assumed to be forward-compatibly stable for machine parsing, so use the GNU format for that).
form field value.</P>
 
<LI><P>Document POSTed as a <CODE>multipart/form-data</CODE> file
===Not Implemented===
upload.</P>
 
</UL>
* Relaxed-compatible (lacks a spec)
<P>In the first three modes, additional parameters would be
* Unicorn-compatible (hoping that Unicorn changes instead)
communicated in the URI query string. In the last two modes,
* W3C Validator-compatible SOAP (legacy)
additional parameters would be communicated like corresponding form
* EARL (not implemented; domain modeling mismatch)
fields are communicated as <CODE>application/x-www-form-urlencoded</CODE>
 
and <CODE>multipart/form-data</CODE>.</P>
==Compression==
<P>I don’t particularly like the last two modes, but they are
 
needed to address feature requests and for parity with other
Validator.nu supports compression in order to save bandwidth.
services. Also, unlike the first three modes, the last two modes need
 
companion UI changes, which is not nice. As a further complication,
===Request Compression===
the last two don’t come naturally with a <CODE>Content-Type</CODE>
 
for dispatching to an HTML5 parser or to an XML parser.</P>
Validator.nu supports HTTP request compression. To use it, compress the request entity body using gzip and specify <code>Content-Encoding: gzip</code> as a ''request'' header.
<P>All these input modes would share the same “service endpoint
 
URI” (and the same servlet class). The different cases can be
===Response Compression===
distinguished from the HTTP method and in the POST cases from the
 
<CODE>Content-Type</CODE> request header.</P>
Validator.nu supports HTTP response compression. Please use it. Response compression is orthogonal to the input methods and output formats.
<H2 id='output'>Output Modes</H2>
 
<P>A Web service probably calls for an XML output format for maximal
The standard HTTP gzip mechanism is used. To indicate that you are prepared to handle gzipped responses, include the <code>Accept-Encoding: gzip</code> request header. When the header is present, Validator.nu will gzip-compress the response. You should also be prepared to receive an uncompressed response, though, since in the future it may make sense to turn off compression under heavy CPU load.
tool chain integration even though the current HTML output format
 
makes sense for browsers and can carry all the necessary data.</P>
==Sample Code==
<P>I think the following modes would make sense:</P>
 
<UL>
There is a [https://about.validator.nu/html5check.py sample Python program] that shows how to deal with compression and redirects. (It may not be exemplary Python, though.)
<LI><P>HTML with microformat-style <CODE>class</CODE> annotations
 
(already implemented except the annotation granularity could be
==CORS Example==
better).</P>
 
<LI><P>XHTML with microformat-style <CODE>class</CODE> annotations.</P>
You can also hit the API using [https://developer.mozilla.org/en-US/docs/HTTP_access_control CORS] over AJAX. [https://gist.github.com/gists/3902535 Basic example using jQuery].
<LI><P>A custom XML format that is super-simple and uses element
 
names for easier processing with tools that are biased towards
==Sample Messages==
keying on element name rather than on attribute value.</P>
 
</UL>
There are [http://hsivonen.com/test/moz/messages-types/ documents for provoking different message types].
<P>For the HTML and XHTML output formats, there could be an option
 
for suppressing the input form. The output default should be HTML for
{|
the browser-targeted input formats. However, the custom XML format
|-
might be a reasonable default when the input document was POSTed as
! No message
the entity body.</P>
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fno-message.html HTML]
<H3 id='xml'>The XML Output Format (Draft)</H3>
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fno-message.html&out=xhtml XHTML]
<P>The elements in this XML vocabulary are in the namespace
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fno-message.html&out=xml XML]
“<CODE>http://hsivonen.iki.fi/validator/messages/</CODE>”. The
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fno-message.html&out=json JSON]
attributes in this XML vocabulary are not in a namespace. The
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fno-message.html&out=gnu GNU]
attribute values defined for this XML vocabulary must not have
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fno-message.html&out=text Text]
preceding or trailing white space.</P>
|-
<P>Note: The format has been designed to support streaming generation
! Info
and consumption.</P>
| [https://validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Finfo.svg HTML]
<H4 id='structure'>Structure and Semantics</H4>
| [https://validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Finfo.svg&out=xhtml XHTML]
<P>The format consists of an XML 1.0 document that has the element
| [https://validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Finfo.svg&out=xml XML]
<CODE>messages</CODE> as the root element.  
| [https://validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Finfo.svg&out=json JSON]
</P>
| [https://validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Finfo.svg&out=gnu GNU]
<P>The root element may have zero or more child elements named <CODE>info</CODE>,
| [https://validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Finfo.svg&out=text Text]
<CODE>warning</CODE> and <CODE>error</CODE>. The element <CODE>info</CODE>
|-
means an informational message. The element <CODE>warning</CODE>
! Warning
signifies a potential problem that does not cause the
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fwarning.html HTML]
validation/checking to fail. The element <CODE>error</CODE> signifies
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fwarning.html&out=xhtml XHTML]
a problem that causes the validation/checking to fail. The character
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fwarning.html&out=xml XML]
data content of these three elements may contain a human-readable
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fwarning.html&out=json JSON]
message. (Entity-escaped HTML is <EM>not</EM> allowed. :-)</P>
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fwarning.html&out=gnu GNU]
<P>The elements <CODE>info</CODE>, <CODE>warning</CODE> and <CODE>error</CODE>
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fwarning.html&out=text Text]
have three optional attributes for indicating the context of the
|-
message: <CODE>uri</CODE>, <CODE>line</CODE> and <CODE>column</CODE>.
! Error (precise location)
The <CODE>column</CODE> attribute must not be present unless the <CODE>line</CODE>
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fprecise-error.html HTML]
attribute is present as well.  
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fprecise-error.html&out=xhtml XHTML]
</P>
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fprecise-error.html&out=xml XML]
<P>The <CODE>uri</CODE> attribute, if present, must contain the URI
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fprecise-error.html&out=json JSON]
(not IRI) of the HTTP resource with which the message is associated
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fprecise-error.html&out=gnu GNU]
or the literal string “<CODE>data:…</CODE>” (the last character
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Fprecise-error.html&out=text Text]
is U+2026) to signify that the message is associated with a data URI
|-
resource but the exact URI has been omitted. (If a client application
! Error (range location)
wishes to show IRIs to human users, it is up to the client
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Frange-error.html HTML]
application to convert the URI into an IRI.)</P>
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Frange-error.html&out=xhtml XHTML]
<P>The <CODE>line</CODE> attribute, if present, must contain a string
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Frange-error.html&out=xml XML]
consisting of characters in the range U+0030 DIGIT ZERO to U+0039
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Frange-error.html&out=json JSON]
DIGIT NINE which when interpreted as a base-ten integer is a positive
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Frange-error.html&out=gnu GNU]
integer (not zero). This number means the approximate source text
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Frange-error.html&out=text Text]
line number associated with the message. The first line is 1.</P>
|-
<P>The <CODE>column</CODE> attribute, if present, must contain a
! Fatal
string consisting of characters in the range U+0030 DIGIT ZERO to
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Ffatal.xhtml HTML]
U+0039 DIGIT NINE which when interpreted as a base-ten integer is a
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Ffatal.xhtml&out=xhtml XHTML]
positive integer (not zero). This number means the approximate source
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Ffatal.xhtml&out=xml XML]
column number associated with the message on the line indicated by
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Ffatal.xhtml&out=json JSON]
the <CODE>line</CODE> attribute. The first character on a line is in
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Ffatal.xhtml&out=gnu GNU]
column 1.</P>
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2Ffatal.xhtml&out=text Text]
<P>The source lines and columns are approximate. For example, if a
|-
message is related to an attribute, the line and column may point to
! IO
the first character of the start tag, the character after the start
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2F404.html HTML]
tag or to the attribute inside the tag depending on implementation.
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2F404.html&out=xhtml XHTML]
If a message is related to character data, the line and column may be
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2F404.html&out=xml XML]
inaccurate within a run of text e.g. due to buffering. Furthermore,
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2F404.html&out=json JSON]
implementations may count column numbers in terms of UTF-16 code units
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2F404.html&out=gnu GNU]
instead of characters.</P>
| [https://html5.validator.nu/?doc=http%3A%2F%2Fhsivonen.com%2Ftest%2Fmoz%2Fmessages-types%2F404.html&out=text Text]
<P>The <CODE>error</CODE> element may have an attribute called <CODE>type</CODE>
|}
for indicating that an error is not a general error. Permissible
 
values for the <CODE>type</CODE> attribute are: <CODE>fatal</CODE>
 
(signifies a well-formedness violation or another error after which
 
no more checking was performed), <CODE>io</CODE> (signifies an
[[Category:Validator.nu Documentation]]
input/output error),  <CODE>schema</CODE> (indicates that
initializing a schema-based validator failed) and <CODE>internal</CODE>
(indicates that the validator/checker found a bug in itself,
ran out of memory, etc., but was still able to emit a message).</P>
<P>The validation/checking is considered to have failed if there is
at least one <CODE>error</CODE> element.</P>
<H4 id='processing'>Processing Model</H4>
<P>Clients that consume the message format are referred to as
processors. They must use a conforming XML 1.0 processor to parse the
format.</P>
<P>If the root element is not an element named <CODE>messages</CODE>,
the document is deemed to be in an unknown format and not processable
according to this processing model.</P>
<P>If a processor encounters an element that it doesn’t recognize,
it must process the content of the element as if the start tag and
the end tag of the element were not there. If the processor encounters
character data as a child of the root element (after applying the
rule stated in the previous sentence), it must act as if the
character data was not there. If a processor encounters an attribute
that it does not recognize, it must ignore the entire attribute. If a
processor encounters an attribute that it does recognize but the
value of the attribute is not permissible under the previous section,
the processor must ignore the entire attribute. If an <CODE>info</CODE>,
<CODE>warning</CODE> or <CODE>error</CODE> element does not have a
<CODE>line</CODE> attribute with a permissible value, a <CODE>column</CODE>
attribute on the element must be ignored if present.</P>
<P>Note: These rules make it possible to add markup for source code
dumps, document outlines and parse trees later without breaking
clients. Also, they make it possible to introduce e.g. XHTML markup in
the human-readable messages.</P>
<P>Processors must process elements in a way that is consistent with
the semantics of the elements.</P>
<P>To determine whether the validation/checking succeeded, processors
must determine whether the root element has no <CODE>error</CODE>
element children. If there are no <CODE>error</CODE> children, the
validation/checking succeeded. Otherwise, it failed.</P>
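<P>The processing model above can be sketched in Python roughly as follows. This is only an illustration of the draft rules, not the service's own code: the function names are made up for the example, and elements are matched by local name only (the namespace check is skipped as a simplification).</P>

```python
import xml.etree.ElementTree as ET

# Element names the draft defines as message elements.
MESSAGE_ELEMENTS = {"info", "warning", "error"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def positive_int(value):
    """Return the value as an int if it is ASCII digits denoting a positive integer, else None."""
    if value and value.isascii() and value.isdigit() and int(value) > 0:
        return int(value)
    return None


def iter_messages(element):
    """Yield message elements, treating unrecognized elements as if their tags were absent."""
    for child in element:
        if local_name(child.tag) in MESSAGE_ELEMENTS:
            yield child
        else:
            # Unknown wrapper: look through it. Stray character data is simply never read.
            yield from iter_messages(child)


def process(xml_text):
    """Return (succeeded, messages) for a document in the draft XML format."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "messages":
        raise ValueError("unknown format: root element is not 'messages'")
    messages = []
    for element in iter_messages(root):
        line = positive_int(element.get("line"))
        # A column is only meaningful when a permissible line is present.
        column = positive_int(element.get("column")) if line is not None else None
        messages.append({
            "kind": local_name(element.tag),
            "uri": element.get("uri"),
            "line": line,
            "column": column,
            # Unknown markup inside the message contributes only its text.
            "text": "".join(element.itertext()),
        })
    succeeded = all(message["kind"] != "error" for message in messages)
    return succeeded, messages
```

<P>Calling <CODE>process()</CODE> on a response body yields a pass/fail flag plus a flat list of messages, so a client does not need to know about any markup added to the format later.</P>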
<H2 id='prior'>Prior Art</H2>
<P>The W3C has defined two XML output formats for the W3C Validator:
[http://validator.w3.org/docs/api.html the SOAP format]
and [http://www.w3.org/QA/2006/obs_framework/response/ the Unicorn format]. I think there are two problems with these
formats: they are unnecessarily complex and they don’t support
streaming output. For example, they require a redundant declaration
of the number of errors before the errors themselves (which a client
could count on its own if it wants to know the number).</P>
<P>The W3C Validator also provides simple pass/fail information as
[http://validator.w3.org/docs/api.html#http_headers HTTP
headers], which is nice if you only care about a boolean
pass/fail. However, this approach also has the problem that it
precludes streaming, because the validation process has to finish
before the HTTP headers can be written.</P>
<P>For these reasons, I am not particularly keen on reusing the
output formats of the W3C Validator unless it turns out that there
are significant [http://en.wikipedia.org/wiki/Network_effect network
benefits] to be reaped from plugging into an existing network of
client software. It seems to me that there isn’t a significant
network of existing client software.</P>

Latest revision as of 04:36, 29 December 2016

This document is obsolete.

For the current specification, see: https://github.com/validator/validator/wiki/Service-»-HTTP-interface

Validator.nu can be called as a Web service. Input and output modes can be chosen completely orthogonally. Responses and requests can be optionally compressed (independently of each other).

(Please use the Web service API reasonably. See the Terms of Service.)

Input Modes

For most Web service use cases, you should probably POST the document as the HTTP entity body.

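Implemented

  • Document URL as a GET parameter; the service retrieves the document by URL over HTTP or HTTPS.
  • Document POSTed as the HTTP entity body; parameters in query string as with GET.
  • Document POSTed as a textarea value.
  • Document POSTed as a form-based file upload.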

Not Implemented

  • Document in a data: URI as a GET parameter.
  • application/x-www-form-urlencoded
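
As a rough illustration of the recommended input mode (POSTing the document as the HTTP entity body, with parameters in the query string), the sketch below builds such a request with Python's standard library. The endpoint URL is an assumption taken from the sample links in this document, and the helper name is made up for the example. The request is only built, not sent.

```python
import urllib.parse
import urllib.request

# Assumed endpoint; the sample links in this document point at html5.validator.nu.
SERVICE_URL = "https://html5.validator.nu/"


def build_post_request(document, content_type, out="json"):
    """Build (but do not send) a POST request carrying the document as the entity body."""
    # Parameters go in the query string, as with GET.
    query = urllib.parse.urlencode({"out": out})
    return urllib.request.Request(
        SERVICE_URL + "?" + query,
        data=document,
        # The Content-Type tells the service how to parse the body.
        headers={"Content-Type": content_type},
        method="POST",
    )


request = build_post_request(
    b"<!DOCTYPE html><title>Test</title>",
    "text/html; charset=utf-8",
)
```

Passing the resulting object to `urllib.request.urlopen()` would perform the actual check.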

Output Modes

When using Validator.nu as a Web service back end, the XML and JSON output formats are recommended for forward compatibility. The available JSON tooling probably makes consuming JSON easier. The XML format contains XHTML elaborations that are not available in JSON. Both formats are streaming, but streaming XML parsers are more readily available. XML cannot represent some input strings faithfully.

Implemented

  • HTML with microformat-style class annotations (default output; should not be assumed to be forward-compatibly stable).
  • XHTML with microformat-style class annotations (append &out=xhtml to URL; should not be assumed to be forward-compatibly stable).
  • XML (append &out=xml to URL).
  • JSON (append &out=json to URL).
  • GNU error format (append &out=gnu to URL).
  • Human-readable plain text (append &out=text to URL; should not be assumed to be forward-compatibly stable for machine parsing, so use the GNU format for that).
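
To show how the output parameter combines with GET input, here is a small sketch that builds a service URL. The endpoint and the `doc` parameter name are taken from the sample links in the wiki source of this page; the function name is hypothetical.

```python
import urllib.parse

# Assumed endpoint, as used by the sample links on this page.
SERVICE_URL = "https://html5.validator.nu/"

# Values of the out parameter listed above; omitting it selects the default HTML output.
OUTPUT_FORMATS = {"xhtml", "xml", "json", "gnu", "text"}


def check_url(document_url, out=None):
    """Return a URL that asks the service to fetch and check document_url."""
    params = {"doc": document_url}
    if out is not None:
        if out not in OUTPUT_FORMATS:
            raise ValueError("unknown output format: " + out)
        params["out"] = out
    # urlencode percent-encodes the document URL so it survives as a single parameter.
    return SERVICE_URL + "?" + urllib.parse.urlencode(params)


url = check_url("http://hsivonen.com/test/moz/messages-types/warning.html", out="json")
```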

Not Implemented

  • Relaxed-compatible (lacks a spec)
  • Unicorn-compatible (hoping that Unicorn changes instead)
  • W3C Validator-compatible SOAP (legacy)
  • EARL (not implemented; domain modeling mismatch)

Compression

Validator.nu supports compression in order to save bandwidth.

Request Compression

Validator.nu supports HTTP request compression. To use it, compress the request entity body using gzip and specify Content-Encoding: gzip as a request header.
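
A minimal sketch of request compression with Python's standard library might look like this. The endpoint URL is an assumption based on the sample links in this document, and the helper name is invented for the example. The request is built but not sent.

```python
import gzip
import urllib.request

# Assumed endpoint, as used by the sample links on this page.
SERVICE_URL = "https://html5.validator.nu/?out=json"


def build_gzipped_request(document, content_type):
    """Build (but do not send) a POST request whose entity body is gzip-compressed."""
    return urllib.request.Request(
        SERVICE_URL,
        data=gzip.compress(document),
        headers={
            # Content-Type still describes the uncompressed document.
            "Content-Type": content_type,
            # Content-Encoding tells the service the body is gzipped.
            "Content-Encoding": "gzip",
        },
        method="POST",
    )


document = b"<!DOCTYPE html><title>Test</title>"
request = build_gzipped_request(document, "text/html; charset=utf-8")
```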

Response Compression

Validator.nu supports HTTP response compression. Please use it. Response compression is orthogonal to the input methods and output formats.

The standard HTTP gzip mechanism is used. To indicate that you are prepared to handle gzipped responses, include the Accept-Encoding: gzip request header. When the header is present, Validator.nu will gzip-compress the response. You should also be prepared to receive an uncompressed response, though, since in the future it may make sense to turn off compression under heavy CPU load.
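
Since the response may or may not be compressed, a client has to look at the Content-Encoding response header before decoding. The following sketch shows one way to do that; the names are made up for the example and it does not contact the service.

```python
import gzip

# Request header that signals willingness to accept gzipped responses.
REQUEST_HEADERS = {"Accept-Encoding": "gzip"}


def decode_body(raw, content_encoding):
    """Return the response body, decompressing it only if the server says it is gzipped."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(raw)
    # No (or another) Content-Encoding: pass the bytes through unchanged.
    return raw
```

With `urllib.request`, which does not decompress responses on its own, the two arguments would come from `response.read()` and `response.headers.get("Content-Encoding")`.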

Sample Code

There is a sample Python program that shows how to deal with compression and redirects. (It may not be exemplary Python, though.)

CORS Example

You can also hit the API using CORS over AJAX. Basic example using jQuery.

Sample Messages

There are documents for provoking different message types.

No message HTML XHTML XML JSON GNU Text
Info HTML XHTML XML JSON GNU Text
Warning HTML XHTML XML JSON GNU Text
Error (precise location) HTML XHTML XML JSON GNU Text
Error (range location) HTML XHTML XML JSON GNU Text
Fatal HTML XHTML XML JSON GNU Text
IO HTML XHTML XML JSON GNU Text