Text escaping of $& becomes >< #158

SvanteRichter · 2022-09-09T13:38:34Z

Hello!

I think I have found a slight issue with how text is escaped. This code:

import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('')
document.appendChild(document.createElement('script'))
document.querySelector("script").textContent = 'var $=1;$&&true'
console.log(document.toString())

logs <script>var $=1;><&true</script> as output. $& has been replaced by ><.
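The output matches what String.prototype.replace produces when text is spliced in through a string replacement. In a replacement string, $& is a special pattern meaning "the matched substring". The sketch below assumes that the serializer inserts the text between the tags of an empty element by replacing "><". This assumption is not confirmed anywhere in this thread, but it reproduces the reported output exactly. A replacer function avoids the problem because its return value is inserted literally.

```javascript
// Hypothetical reconstruction (an assumption, not linkedom's actual code):
// insert text between the tags of an empty element via String.replace.
const text = 'var $=1;$&&true';
const html = '<script></script>';

// Buggy: a *string* replacement is scanned for $-patterns, and "$&"
// expands to the matched substring, which here is "><".
// ("$=" is not a special pattern, so it is kept as-is.)
const buggy = html.replace('><', '>' + text + '<');
console.log(buggy); // <script>var $=1;><&true</script>

// Safe: the return value of a replacer function is inserted literally.
const safe = html.replace('><', () => '>' + text + '<');
console.log(safe); // <script>var $=1;$&&true</script>
```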

If I run similar code in a browser, I get <script>var $=1;$&&true</script>. The code I ran in the browser is:

var doc = document.createDocumentFragment()
doc.appendChild(document.createElement('script'))
doc.querySelector("script").textContent = 'var $=1;$&&true';
console.log(doc.querySelector("script").outerHTML)

This is a problem for me because inlining my JS after SSR produces invalid code that will not run.

I have tested it in Deno (v1.24.3), Node (v18.2.0) and bun.sh (v0.1.11), so I don't think it's platform-specific.

Thanks!


WebReflection · 2022-09-09T22:10:10Z

What's document.appendChild supposed to do there? You have a body and a head, so why append scripts to the document? Just making sure this isn't, yet again, an issue caused by parsing an empty document.

SvanteRichter · 2022-09-10T13:55:07Z

Sorry, that was just an example. In my SSR I first run the app and then use document.querySelector("script").textContent = myScript to inline the script. This example shows the same bug and might be more appropriate:

import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('<script>var $=1;$&&true</script>')
console.log(document.toString())

Output: <script>var $=1;><&true</script>

WebReflection · 2022-09-10T19:05:44Z

Those are two different code paths. One is textContent = '...', which (as far as I can tell) the code doesn't change; the other is parseHTML, where the parser might get things wrong, which would make it a third-party issue. Which one is it? Both?

The toString actually escapes chars, don't drop these: https://github.com/WebReflection/linkedom/blob/main/esm/interface/text.js#L41

SvanteRichter · 2022-09-10T23:41:15Z

> one thing is textContent = '...' which is not changed in code (for what I could tell)

I'm sorry, I don't quite understand what you mean. Do you mean that it's not reproducible with the code I gave or that it isn't a bug?

> which one is true? both?

I don't know, but it seems to me like both code paths produce the same (seemingly wrong) output. If they are completely different code-paths then it would seem that they both have the same bug.

> the toString actually escapes chars, don't drop these

Sorry, I don't understand what you mean.

If it helps clarify: it seems to happen only with script elements. For example, the following code uses both textContent and HTML parsing, on both script elements and divs, and the bug appears only with the script elements (where it appears with both methods):

import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('<script>$&</script><script class="t"></script><div>$&</div><div class="t2"></div>')
document.querySelector(".t").textContent = '$&'
document.querySelector(".t2").textContent = '$&'
console.log(document.toString())

output: <script>><</script><script class="t">><</script><div>$&</div><div class="t2">$&</div>
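One plausible reason only scripts are affected is that script text follows a different serialization path from ordinary text. Per the WHATWG HTML serialization algorithm, text inside script, style and a few other raw-text elements is emitted verbatim, while other text has &, <, > and non-breaking spaces escaped. The helper below is a hypothetical sketch of that rule, not linkedom's code, whose escaping may differ. serializeText, RAW_TEXT_PARENTS and ESCAPES are made-up names. Escaping with a regex and a replacer function leaves $& in the data untouched, because the data is the subject string rather than the replacement.

```javascript
// Hypothetical sketch of the HTML serialization rule for text nodes
// (not linkedom's code). Text whose parent is one of these elements is
// emitted verbatim; noscript is omitted because it depends on scripting.
const RAW_TEXT_PARENTS = new Set([
  'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes', 'plaintext',
]);

// Replacements applied when escaping ordinary text content.
const ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '\u00a0': '&nbsp;' };

function serializeText(parentTag, data) {
  if (RAW_TEXT_PARENTS.has(parentTag.toLowerCase())) {
    return data; // raw text: no escaping, so "$&" survives unchanged
  }
  // data is the subject string and the replacement is a function,
  // so "$" sequences inside data are never treated as patterns.
  return data.replace(/[&<>\u00a0]/g, (ch) => ESCAPES[ch]);
}

console.log(serializeText('script', 'var $=1;$&&true')); // var $=1;$&&true
console.log(serializeText('div', '1 < 2'));              // 1 &lt; 2
```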
