HTML attributes without spaces

This is just another idea for an xss challenge that can teach us something new about HTML parsing.

Let’s say we need to sneak <img src onerror="alert(1)"> past a sanitizer that rejects any input if it contains whitespace characters. What can we do in such situation?


Attributes tokenizing

According to the WHATWG HTML Spec, after the tokenizer consumes <img characters, it is in the tag name state and our goal will be to get into the attribute name state from there. To do it in the intended way, we should append our payload with one of these characters:

U+0009 CHARACTER TABULATION (tab)
U+000A LINE FEED (LF)
U+000C FORM FEED (FF)
U+0020 SPACE
    Switch to the before attribute name state.

but the sanitizer won’t allow us to use any of these. What are the other options?

U+002F SOLIDUS (/)
    Switch to the self-closing start tag state.

U+003E GREATER-THAN SIGN (>)
    Switch to the data state. Emit the current tag token.

ASCII upper alpha
    Append the lowercase version of the current input character
    (...) to the current tag token's tag name.

U+0000 NULL
    This is an unexpected-null-character parse error. 
    Append a U+FFFD REPLACEMENT CHARACTER character
        to the current tag token's tag name.

EOF
    This is an eof-in-tag parse error. Emit an end-of-file token.

Anything else
    Append the current input character to the current tag token's tag name.

U+002F SOLIDUS (/) looks interesting. Let’s see where that leads us.


After the tokenizer consumes <img/ it is in the self-closing tag state and then we can try:

U+003E GREATER-THAN SIGN (>)
    Set the self-closing flag of the current tag token. 
    Switch to the data state. Emit the current tag token.

which won’t help us

EOF
    This is an eof-in-tag parse error. Emit an end-of-file token.

which also won’t help us, even if we could control where the file with our payload ends.

What about the last option?

Anything else
    This is an unexpected-solidus-in-tag parse error.
    Reconsume in the before attribute name state.

Reconsume in the before attribute name state, bingo!


So, after the tokenizer consumes <img and /, it ends up in the same state as it would after consuming <img and a space!

Therefore:

<img/src/onerror="alert(1)">

is treated as:

<img src onerror="alert(1)">

which is what we need.


Happy xss’ing! :)