Bogus comments and XSS
Let’s say we have a sanitizer that:
- Is not based on an HTML parser
- Correctly strips away any ‘dangerous’ HTML (like script tags, on* attributes, etc.) outside comments
- Leaves <!-- --> comments as is
- Does not replace <’s and >’s with HTML entities
Such a sanitizer is, of course, rather unlikely to be seen in the wild, but bypassing it makes for a fun XSS challenge. So how can we smuggle some JavaScript past it?
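To make the setup concrete, here is a minimal Python sketch of such a sanitizer. It is purely hypothetical: the post only says the sanitizer is not parser-based, so the regexes, the naive_sanitize name and the exact set of stripped constructs are assumptions chosen to match the list above. The code is deliberately unsafe and only models the assumptions.

```python
import re

# Hypothetical parser-less sanitizer (assumption: regex-based). It strips
# <script> elements and on* attributes outside <!-- --> comments, copies
# comments verbatim and never escapes < or >.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
ON_ATTR_RE = re.compile(r"""\s+on\w+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def clean_fragment(fragment: str) -> str:
    """Remove 'dangerous' markup from a fragment that contains no comments."""
    fragment = SCRIPT_RE.sub("", fragment)
    return ON_ATTR_RE.sub("", fragment)


def naive_sanitize(html: str) -> str:
    """Sanitize everything outside <!-- ... --> and keep comments as is."""
    out = []
    pos = 0
    for match in COMMENT_RE.finditer(html):
        out.append(clean_fragment(html[pos:match.start()]))
        out.append(match.group(0))  # comments are left untouched
        pos = match.end()
    out.append(clean_fragment(html[pos:]))
    return "".join(out)


print(naive_sanitize("<script>alert(1)</script>"))
print(naive_sanitize("<!-- <script>alert(1)</script> -->"))
print(naive_sanitize("<img src=x onerror=alert(1)>"))
```

The first call returns an empty string, the second returns the comment unchanged and the third drops the onerror attribute, which is exactly the behaviour the list above describes.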
Bogus comments
According to the WHATWG HTML spec:
Comments must have the following format:
1. The string "<!--".
2. Optionally, text, with the additional restriction that the text must not start with the string ">", nor start with the string "->", nor contain the strings "<!--", "-->", or "--!>", nor end with the string "<!-".
3. The string "-->".
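The three rules above can be turned into a small checker. This is a sketch of the syntax rules only (is_valid_comment is a hypothetical helper name), not of how a browser tokenizes comments: browsers still produce comments from many strings that fail these rules, just with parse errors.

```python
def is_valid_comment(markup: str) -> bool:
    """Check a string against the WHATWG comment syntax rules quoted above."""
    # Rules 1 and 3: "<!--" ... "-->", without the two overlapping (e.g. "<!-->").
    if len(markup) < 7 or not (markup.startswith("<!--") and markup.endswith("-->")):
        return False
    # Rule 2: restrictions on the optional text between the delimiters.
    text = markup[4:-3]
    if text.startswith(">") or text.startswith("->"):
        return False
    if any(bad in text for bad in ("<!--", "-->", "--!>")):
        return False
    return not text.endswith("<!-")


for candidate in ["<!-- hello -->", "<!-->", "<!-- a --!> b -->", "<? not a comment >"]:
    print(candidate, is_valid_comment(candidate))
```

Note that "<? not a comment >" fails the check: it is not valid comment syntax at all, which is what makes the next trick interesting.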
But there are other ways to get comments into our DOM.
For example, when the parser comes across a ? preceded by a <, it reports an unexpected-question-mark-instead-of-tag-name parse error, and everything up to the nearest > is treated as a comment (a so-called bogus comment).
<? this is a comment > this is not a comment >
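That tokenizer behaviour can be sketched in a few lines of Python. This is a simplified model of the spec's bogus comment state (read_bogus_comment is a hypothetical name), not browser code: the ? is reconsumed and therefore kept in the comment data, and the first > ends the comment no matter what came before it.

```python
def read_bogus_comment(html: str, start: int) -> tuple[str, int]:
    """Simplified model of the WHATWG bogus comment state, entered at "<?".

    Returns the comment data and the index just past the comment.
    """
    if not html.startswith("<?", start):
        raise ValueError("expected '<?' at the start position")
    data = []
    # The "?" is reconsumed in the bogus comment state, so it ends up in the data.
    i = start + 1
    while i < len(html):
        ch = html[i]
        i += 1
        if ch == ">":
            break  # the first ">" ends the comment, whatever came before it
        # U+0000 NULL is replaced with U+FFFD; everything else is kept verbatim.
        data.append("\ufffd" if ch == "\0" else ch)
    # Reaching the end of the input also ends (and emits) the comment.
    return "".join(data), i


source = "<? this is a comment > this is not a comment >"
comment, end = read_bogus_comment(source, 0)
print(repr(comment))       # '? this is a comment '
print(repr(source[end:]))  # ' this is not a comment >'
```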
Cool, but how can we use this? Let’s start simple:
<script>alert(1)</script>
This will obviously be removed by the sanitizer, while this:
<!-- <script>alert(1)</script> -->
will be preserved but will not lead to JavaScript execution. Hmm… but what if we could somehow comment out the <!-- itself?
Let’s try:
<? <!-- a> <script>alert(1)</script> -->
This produces the following DOM:
├─ #text:
├─ #comment: ? <!-- a
├─ #text:
└─ SCRIPT
└─ #text: alert(1)
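Bingo! Since the <!-- was embedded in <? >, the parser treated it as part of the bogus comment’s text, so <script> was treated as an opening tag. The sanitizer, on the other hand, only knows about <!-- --> comments, so it saw everything from <!-- to --> as a comment and left the script tag in place.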
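To see the disagreement side by side, here is a rough Python model of both views of the payload. Both parts are assumptions: tokenize is a hypothetical, heavily simplified stand-in for the HTML tokenizer (no attribute parsing, no script data state, no parse errors), and COMMENT_RE stands in for the comment matching of a parser-less sanitizer like the one assumed at the start.

```python
import re

# Hypothetical, heavily simplified tokenizer model: it only knows text,
# <!-- --> comments, "<?" bogus comments and plain tags.
TAG_RE = re.compile(r"<(/?)([A-Za-z][^\s/>]*)[^>]*>")
# What a parser-less sanitizer treats as a comment (assumed regex).
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def tokenize(html: str) -> list[tuple[str, str]]:
    tokens: list[tuple[str, str]] = []
    text: list[str] = []
    i = 0
    while i < len(html):
        if html.startswith("<!--", i):
            # Regular comment: runs until the next "-->" (or end of input).
            end = html.find("-->", i + 4)
            end = len(html) if end == -1 else end
            kind, value, i = "comment", html[i + 4:end], end + 3
        elif html.startswith("<?", i):
            # Bogus comment: the "?" is kept and the first ">" closes it.
            end = html.find(">", i)
            end = len(html) if end == -1 else end
            kind, value, i = "comment", html[i + 1:end], end + 1
        elif (match := TAG_RE.match(html, i)) is not None:
            kind = "end tag" if match.group(1) else "start tag"
            value, i = match.group(2).lower(), match.end()
        else:
            # Anything else is character data.
            text.append(html[i])
            i += 1
            continue
        if text:  # flush pending text before the new token
            tokens.append(("text", "".join(text)))
            text.clear()
        tokens.append((kind, value))
    if text:
        tokens.append(("text", "".join(text)))
    return tokens


payload = "<? <!-- a> <script>alert(1)</script> -->"
# Sanitizer view: one "comment" that covers the whole script element.
print(COMMENT_RE.search(payload).group(0))
# Tokenizer view: the "<!--" is swallowed by the bogus comment.
for token in tokenize(payload):
    print(token)
```

The regex reports a single comment spanning the script tag, while the tokenizer model emits a comment token "? <!-- a" followed by a script start tag, mirroring the DOM shown above.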
And that’s one of many reasons to use sanitizers based on HTML parsers, as this would not be possible if the first assumption (a sanitizer that is not based on an HTML parser) did not hold.
Happy XSS’ing! :)