Martin Spasov

Automated test runners - Part 2

06.10.2025 - 19.10.2025

Disclaimer: Please take everything I write here with a grain of salt—I’m by no means an expert. Some of these concepts I’m encountering for the first time, and my understanding might have significant gaps—and that’s okay. It’s one of the reasons I decided to take on this project. If you see anything wrong, let me know and I’ll fix it.

Last sprint I implemented the automated test runner for the parser and 90% of the test runner for the tokenizer. This sprint I wanted to finish the tokenizer runner and add more tests.

test format

The format I started with mirrored the parser tests:

#description
Unescaped ampersand in attribute value
#data
<h a='&'>
#errors
#states
Data state
#output
StartTag "h"
Attr "a" "&"

There were a few issues with the format above:

need to escape newlines
need to escape double quotes (")
need to escape backslashes (\)

At first I started fixing these issues, but at some point I realised that if I flattened the data further, these issues would not come up at all.

The new format is basically one value per line, with a header line describing it. As long as the test data does not contain any of the header names, we are good to go. I had a look over the data and didn't see any conflicts.

#description
Open angled bracket in unquoted attribute value state
#data
<a a=f<>
#errors
(1, 7): unexpected-character-in-unquoted-attribute-value
#states
Data state
#start-tag
a
#attr-name
a
#attr-value
f<
#end-test
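To show why this layout needs no escaping, here is a minimal reader sketch in Python (the post does not show the runner's code, so this is not the author's implementation). The names `parse_tests`, `HEADERS` and `END_MARKER` are hypothetical, and `HEADERS` only lists the fields visible in the example above; the real format has more fields, so the set would need extending. A line counts as a header only if it exactly matches a known name, which is the same "data must not contain header names" assumption described above.

```python
# Hypothetical header set: only the fields shown in the example above.
# The real format has more fields, so this would need extending.
HEADERS = {"description", "data", "errors", "states",
           "start-tag", "attr-name", "attr-value"}
END_MARKER = "#end-test"


def parse_tests(text: str) -> list[dict[str, list[str]]]:
    """Split a test file into tests, mapping each header to its value lines."""
    tests: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    header = None
    for line in text.split("\n"):
        if line == END_MARKER:
            tests.append(current)
            current, header = {}, None
        elif line.startswith("#") and line[1:] in HEADERS:
            # A known header starts a new field; a field with no values
            # (e.g. #errors when there are none) ends up as an empty list.
            header = line[1:]
            current.setdefault(header, [])
        elif header is not None:
            # Every other line is a raw value: no quoting, no escaping.
            current[header].append(line)
        # Lines outside any field (e.g. blank lines between tests) are ignored.
    return tests


SAMPLE = """#description
Open angled bracket in unquoted attribute value state
#data
<a a=f<>
#errors
(1, 7): unexpected-character-in-unquoted-attribute-value
#states
Data state
#start-tag
a
#attr-name
a
#attr-value
f<
#end-test
"""
```

Because values are taken verbatim, `<`, `"` and `\` in the data need no special handling. A repeated header (for example one `#attr-name` per attribute) simply appends to the same list, so names and values stay paired by position.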

There are more fields, but you get the idea.

normalizing buffer

The spec requires the input to be normalized before it is tokenized. This means two things:

replace \r\n with \n
replace remaining \r with \n

Currently the input buffer is copied so I can modify it freely.
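The two replacement rules can be applied while copying the buffer in a single pass. Below is a small Python sketch of that idea; `normalize_newlines` is a hypothetical name, and the tokenizer's actual code may look quite different. It produces the same result as doing the CRLF replacement first and the lone-CR replacement second.

```python
def normalize_newlines(buf: str) -> str:
    """Copy `buf`, turning every CRLF pair and every lone CR into LF."""
    out = []
    i = 0
    n = len(buf)
    while i < n:
        ch = buf[i]
        if ch == "\r":
            out.append("\n")
            # Skip the LF of a CRLF pair so the pair becomes a single LF.
            if i + 1 < n and buf[i + 1] == "\n":
                i += 1
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

The order of the two rules matters: `"\r\r\n"` must become two newlines (a lone CR, then a CRLF pair), not three.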

more tests

After all these changes I was able to add more than 4000 new tests. Currently, 11 of the 14 tokenizer test files in html5lib-tests pass:

Supported	Test suite name
No	contentModelFlags.test
No	domjs.test
Yes	entities.test
Yes	escapeFlag.test
Yes	namedEntities.test
Yes	numericEntities.test
Yes	pendingSpecChanges.test
Yes	test1.test
Yes	test2.test
Yes	test3.test
Yes	test4.test
Yes	unicodeChars.test
Yes	unicodeCharsProblematic.test
No	xmlViolation.test

I think this is enough work on the test runner for now. I have a good number of tests, so now I can focus on refactoring the tokenizer. The initial implementation has a lot of room for improvement.

Martin