Parser tests

From WHATWG Wiki
Revision as of 14:29, 11 December 2006

Parser Tests

This page documents the unit-test format(s) being used for implementations of the HTML5 parsing spec. The aim is to produce implementation-independent, self-describing tests that can be shared between any groups working on these technologies.

Tokenizer Tests

The test format is json. This has the advantage that the syntax allows backward-compatible extensions to the tests and the disadvantage that it is relatively verbose.

Basic Structure

{"tests":
[

{"description":"Test description",
"input":"input_string",
"output":[expected_output_tokens]}

]
}

input_string is a string literal containing the input string to pass to the tokenizer. expected_output_tokens is a list of tokens, ordered with the first token produced by the tokenizer first (leftmost) in the list. The list must match the complete list of tokens that the tokenizer should produce. Valid tokens are:

["DOCTYPE", name, error?]
["StartTag", name, {attributes}])
["EndTag", name]
["Comment", data]
["Character", data]
"ParseError"
"AtheistParseError"


Multiple tests per file are allowed simply by adding more objects to the "tests" list.
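A minimal sketch of a runner for files in this format. The format does not specify a harness; the `load_tests`/`run_test` names, and the idea of passing the tokenizer in as a callable, are assumptions for illustration only.

```python
import json

def load_tests(text):
    """Parse a test file in the format above and return the list of test objects."""
    return json.loads(text)["tests"]

def run_test(test, tokenizer):
    """Return True if the tokenizer's output matches the expected token list exactly.

    This implements the strict reading of the format (the complete token list
    must match); the open issue about subset matching is not handled here.
    """
    actual = tokenizer(test["input"])
    return actual == test["output"]

# A sample test file embedded as a string, plus a toy tokenizer that only
# emits a single Character token. A real tokenizer would follow the HTML5
# parsing spec; this stand-in exists only so the sketch is self-contained.
SAMPLE = """
{"tests": [
  {"description": "Plain character data",
   "input": "abc",
   "output": [["Character", "abc"]]}
]}
"""

def toy_tokenizer(s):
    return [["Character", s]] if s else []

for t in load_tests(SAMPLE):
    print(t["description"], "PASS" if run_test(t, toy_tokenizer) else "FAIL")
```

Because tokens are plain JSON arrays and strings, an exact comparison of the decoded lists is all the harness needs.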

Open Issues

  • Is the format too verbose?
  • Do we want to allow the starting content model flag of the tokenizer to be specified (e.g. through a "contentModel" field in the test objects)?
  • Do we want to allow the test to pass if only a subset of the actual tokens emitted matches the expected_output_tokens list?
  • Do we want the "AtheistParseError"?