October 28, 2020

Bogus comments and XSS

Let’s say we have a sanitizer that:

1. Is not based on an HTML parser
2. Is able to correctly strip away any ‘dangerous’ HTML (like script tags, on* attributes etc.) outside comments
3. Leaves <!-- comments --> as is
4. Does not replace <’s and >’s with HTML entities

This is of course rather unlikely to be seen in the wild, but bypassing such a sanitizer can be a fun XSS challenge. So how can we smuggle some JavaScript past this sanitizer?
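To make the target concrete, here is a minimal Python sketch of a sanitizer with these properties. It is a toy written for this scenario, not a real library: the names `toy_sanitize` and `strip_dangerous` and the regexes are assumptions, and "dangerous" is narrowed down to script elements and on* attributes.

```python
import re

# Hypothetical toy sanitizer: regex-based, no HTML parser involved.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
SCRIPT = re.compile(r"<script\b.*?</script\s*>", re.DOTALL | re.IGNORECASE)
ON_ATTR = re.compile(r"""\son\w+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def strip_dangerous(fragment: str) -> str:
    """Remove script elements and on* attributes from a fragment."""
    fragment = SCRIPT.sub("", fragment)
    return ON_ATTR.sub("", fragment)


def toy_sanitize(html: str) -> str:
    """Clean everything outside <!-- ... --> and keep comments verbatim."""
    out = []
    pos = 0
    for match in COMMENT.finditer(html):
        out.append(strip_dangerous(html[pos:match.start()]))  # outside a comment
        out.append(match.group(0))                            # comment left as is
        pos = match.end()
    out.append(strip_dangerous(html[pos:]))
    return "".join(out)


print(toy_sanitize("<p onclick=evil()>hi</p><!-- kept -->"))
# prints <p>hi</p><!-- kept -->
```

Note that the sanitizer decides what counts as a comment with its own regex, which is exactly the kind of shortcut a parser-based sanitizer would not take.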

Bogus comments

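According to the WHATWG HTML spec:

Comments must have the following format:

1. The string "<!--".
2. Optionally, text, with the additional restriction that the text must not start with the string ">", nor start with the string "->", nor contain the strings "<!--", "-->", or "--!>", nor end with the string "<!-".
3. The string "-->".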

but there are other ways to get comments in our DOM.

For example, when the parser comes across a ? preceded by a <, the unexpected-question-mark-instead-of-tag-name error occurs and everything up to the nearest > is treated as a comment.

<? this is a comment > this is not a comment >
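The rule can be sketched in a few lines of Python. This is a simplified illustration of the `<?` case only, not a real tokenizer; the function name `split_bogus_comment` is made up for this example. The ? itself ends up in the comment's data, because the tokenizer reconsumes it once it has switched to the bogus comment state.

```python
def split_bogus_comment(html: str):
    """Return (comment_data, remainder) if html starts with '<?', else None.

    Simplified: models only the '<?' case; a real HTML tokenizer has many more states.
    """
    if not html.startswith("<?"):
        return None
    # The comment's data starts at the '?' and runs up to the nearest '>'.
    end = html.find(">", 1)
    if end == -1:
        return html[1:], ""  # end of input also terminates the comment
    return html[1:end], html[end + 1:]


print(split_bogus_comment("<? this is a comment > this is not a comment >"))
# prints ('? this is a comment ', ' this is not a comment >')
```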

Cool, but how can we use this? Let’s start simple:

<script>alert(1)</script>

this will obviously be removed by the sanitizer and this:

<!-- <script>alert(1)</script> -->

will be preserved, but will not lead to javascript execution. Hmm… but what if we could somehow comment out the <!--?

Let’s try:

<? <!-- a> <script>alert(1)</script> -->

This produces the following DOM:

├─ #text:
├─ #comment: ? <!-- a
├─ #text:
└─ SCRIPT
    └─ #text: alert(1)

Bingo! The sanitizer sees what looks like an ordinary <!-- ... --> comment and leaves it alone. But since the <!-- was embedded in <? >, the parser treated it as part of a comment’s text, and thus <script> was treated as an opening tag.
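The disagreement can be made explicit by computing the comment spans both ways. The sketch below is an assumption-laden simplification: `naive_comment_spans` stands in for a regex-based sanitizer, and `parser_comment_spans` mimics only two tokenizer rules (`<!-- ... -->` and `<? ... >`) in a left-to-right scan. Both function names are hypothetical.

```python
import re

PAYLOAD = "<? <!-- a> <script>alert(1)</script> -->"


def naive_comment_spans(html: str):
    """Comment spans as a regex-based sanitizer might see them (assumption)."""
    return [m.span() for m in re.finditer(r"<!--.*?-->", html, re.DOTALL)]


def parser_comment_spans(html: str):
    """Comment spans from a left-to-right scan, heavily simplified.

    Handles only '<!--...-->' and '<?...>'; tags, attributes and
    script content are not modelled.
    """
    spans = []
    i = 0
    while i < len(html):
        if html.startswith("<!--", i):
            end = html.find("-->", i + 4)
            end = len(html) if end == -1 else end + 3
        elif html.startswith("<?", i):
            end = html.find(">", i + 2)
            end = len(html) if end == -1 else end + 1
        else:
            i += 1
            continue
        spans.append((i, end))
        i = end  # whatever was inside the comment is never looked at again
    return spans


def inside(spans, index: int) -> bool:
    """True if index falls within any (start, end) span."""
    return any(start <= index < end for start, end in spans)


script_at = PAYLOAD.index("<script>")
print(inside(naive_comment_spans(PAYLOAD), script_at))   # prints True
print(inside(parser_comment_spans(PAYLOAD), script_at))  # prints False
```

The regex view places <script> inside a comment, so the sanitizer leaves it untouched; the left-to-right view has already closed the comment at the first >, so <script> sits outside it.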

And that’s one of many reasons to use sanitizers based on HTML parsers: this bypass would not be possible if the first condition (a sanitizer that is not based on an HTML parser) wasn’t the case.

Happy xss’ing! :)
