Martin Spasov

Automated test runners - Part 2

06.10.2025 - 19.10.2025

Disclaimer: Please take everything I write here with a grain of salt—I’m by no means an expert. Some of these concepts I’m encountering for the first time, and my understanding might have significant gaps—and that’s okay. It’s one of the reasons I decided to take on this project. If you see anything wrong, let me know and I’ll fix it.

Last sprint I implemented the automated test runner for the parser and about 90% of the test runner for the tokenizer. This sprint I wanted to finish the tokenizer runner and add more tests.

test format

The format I started with mirrored the parser tests:

#description
Unescaped ampersand in attribute value
#data
<h a='&'>
#errors
#states
Data state
#output
StartTag "h"
Attr "a" "&"

There were a few issues with the above format

At first I started fixing these issues one by one, but at some point I realised that if I flattened the data further, they wouldn't come up at all.

The new format is basically one value per line, with a header line describing it. As long as no value in the test data is identical to a header name, we are good to go. I had a look over the data and didn't see any such collisions.

#description
Open angled bracket in unquoted attribute value state
#data
<a a=f<>
#errors
(1, 7): unexpected-character-in-unquoted-attribute-value
#states
Data state
#start-tag
a
#attr-name
a
#attr-value
f<
#end-test

There are more fields but you get the idea.
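To make the idea concrete, here is a minimal sketch in Python of how a flat format like this could be read. It is not the actual runner: the function name `parse_tests` and the `HEADERS` set are made up for illustration, and the set only covers the headers shown in this post. Each test becomes an ordered list of (header, values) pairs, so the order of the output fields is preserved.

```python
# Hypothetical header set: only the headers shown in this post.
HEADERS = {
    "#description", "#data", "#errors", "#states",
    "#start-tag", "#attr-name", "#attr-value", "#end-test",
}

def parse_tests(text):
    """Split flat test text into a list of tests.

    Each test is an ordered list of (header, [values]) pairs.
    """
    tests = []
    current = []
    for line in text.split("\n"):
        if line == "#end-test":
            # Marks the end of one test case.
            tests.append(current)
            current = []
        elif line in HEADERS:
            # Start a new field; drop the leading '#'.
            current.append((line[1:], []))
        elif current:
            # Any other line is a value of the most recent header.
            current[-1][1].append(line)
        # Lines outside a test (e.g. blank separators) are ignored.
    return tests

SAMPLE = """#description
Open angled bracket in unquoted attribute value state
#data
<a a=f<>
#errors
(1, 7): unexpected-character-in-unquoted-attribute-value
#states
Data state
#start-tag
a
#attr-name
a
#attr-value
f<
#end-test
"""

tests = parse_tests(SAMPLE)
```

Because a line is treated as a header whenever it matches one of the known names exactly, this only works under the same condition as above: no value in the test data may be identical to a header name.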

normalizing buffer

The spec dictates that the buffer should be normalized before processing. This means 2 things

The buffer is currently copied, so I can modify it freely.
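As an illustration, here is a minimal Python sketch of one normalization the HTML Standard requires before tokenization: newline normalization, where every CR LF pair becomes a single LF and every remaining lone CR becomes LF. That this is the normalization meant here is an assumption, and the function name `normalize_newlines` is hypothetical. Python strings are immutable, so the function naturally returns a new copy and leaves the original input untouched.

```python
def normalize_newlines(text: str) -> str:
    """Return a copy of text with CRLF and lone CR replaced by LF."""
    # Order matters: collapse CRLF pairs first, otherwise each pair
    # would turn into two LF characters.
    return text.replace("\r\n", "\n").replace("\r", "\n")

normalized = normalize_newlines("a\r\nb\rc\n")
```

After this step the tokenizer never sees a CR character in its input, so it only has to deal with LF as a line break.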

more tests

After all the changes I was able to add more than 4000 new tests. Currently 11 of the 14 files in html5lib-tests are passing.

Supported   Test suite name
No          contentModelFlags.test
No          domjs.test
Yes         entities.test
Yes         escapeFlag.test
Yes         namedEntities.test
Yes         numericEntities.test
Yes         pendingSpecChanges.test
Yes         test1.test
Yes         test2.test
Yes         test3.test
Yes         test4.test
Yes         unicodeChars.test
Yes         unicodeCharsProblematic.test
No          xmlViolation.test

I think this is enough work on the test runner for now. I have a good amount of tests, so now I can focus on refactoring the tokenizer. The initial implementation has a lot that can be improved.

Martin