Text escaping of $& becomes >< #158

Open
SvanteRichter opened this issue Sep 9, 2022 · 4 comments

Comments

@SvanteRichter

Hello!

I think I have found a slight issue with how text is escaped. Using this code:

import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('')
document.appendChild(document.createElement('script'))
document.querySelector("script").textContent = 'var $=1;$&&true'
console.log(document.toString())

logs <script>var $=1;><&true</script> as output. $& has been replaced by ><.
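One plausible explanation, offered as an assumption rather than a confirmed diagnosis: in JavaScript, a replacement string passed to String.prototype.replace treats $& as "the matched substring". If a serializer splices the text into its output with a string replacement, $& would expand to whatever was matched. The sketch below reproduces the symptom with two hypothetical helpers (spliceWithString and spliceWithFunction are illustrative names, not linkedom's code):

```javascript
// Hypothetical stand-ins for a serializer that splices text into a shell.
// This is NOT linkedom's actual code; it only reproduces the symptom.
function spliceWithString(shell, text) {
  // A replacement *string* interprets `$&` as "the matched substring".
  return shell.replace('><', '>' + text + '<');
}

function spliceWithFunction(shell, text) {
  // A replacer *function* returns text that is inserted verbatim.
  return shell.replace('><', () => '>' + text + '<');
}

const shell = '<script></script>';
const code = 'var $=1;$&&true';

console.log(spliceWithString(shell, code));   // <script>var $=1;><&true</script>
console.log(spliceWithFunction(shell, code)); // <script>var $=1;$&&true</script>
```

If this is indeed the cause, passing a function as the replacement (or building the string by concatenation) would avoid the substitution.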

If I run similar code in a browser I get <script>var $=1;$&&true</script>. The code I ran in the browser is

var doc = document.createDocumentFragment()
doc.appendChild(document.createElement('script'))
doc.querySelector("script").textContent = 'var $=1;$&&true';
console.log(doc.querySelector("script").outerHTML)

This is a bit problematic for me, since it means that inlining my JS after SSR produces invalid code that will not run.

I have tested it in deno (v1.24.3), node (v18.2.0) and bun.sh (v0.1.11) so I don't think it's anything platform specific.

Thanks!

@WebReflection
Owner

What is document.appendChild supposed to do there? You have a body and a head, so why append scripts to the document itself? I just want to be sure this isn't yet another issue caused by parsing empty documents.

@SvanteRichter
Author

Sorry, that was just an example. In my SSR I first run the app and then use document.querySelector("script").textContent = myScript to inline the script. This example shows the same bug and might be more appropriate:

import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('<script>var $=1;$&&true</script>')
console.log(document.toString())

Output: <script>var $=1;><&true</script>

@WebReflection
Owner

those are two different code paths/logic ... one thing is textContent = '...' which is not changed in code (for what I could tell) the other is parseHTML with stuff that the parser might get wrong so it's a 3rd party issue ... which one is true? both?

the toString actually escape chars, don't drop these https://github.com/WebReflection/linkedom/blob/main/esm/interface/text.js#L41

@SvanteRichter
Author

one thing is textContent = '...' which is not changed in code (for what I could tell)

I'm sorry, I don't quite understand what you mean. Do you mean that it's not reproducible with the code I gave or that it isn't a bug?

which one is true? both?

I don't know, but it seems to me like both code paths produce the same (seemingly wrong) output. If they are completely different code-paths then it would seem that they both have the same bug.

the toString actually escape chars, don't drop these

Sorry, I don't understand what you mean?


If it helps to clarify, it seems to happen only with script elements. For example, the following exercises both textContent and HTML parsing for both script elements and divs, and only the script elements show the bug (through both code paths):

import { parseHTML } from "https://esm.sh/linkedom"
const { document } = parseHTML('<script>$&</script><script class="t"></script><div>$&</div><div class="t2"></div>')
document.querySelector(".t").textContent = '$&'
document.querySelector(".t2").textContent = '$&'
console.log(document.toString())

output: <script>><</script><script class="t">><</script><div>$&amp;</div><div class="t2">$&amp;</div>
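For comparison, here is a minimal sketch of the output one would expect, assuming a serializer that escapes text in ordinary elements but emits script content as raw text (as browsers do). The names RAW_TEXT_ELEMENTS, escapeText, and serializeElement are hypothetical and are not part of linkedom's API:

```javascript
// Hypothetical serializer sketch; names are illustrative, not linkedom's API.
const RAW_TEXT_ELEMENTS = new Set(['script', 'style']);
const ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };

function escapeText(text) {
  // Replacer function: its return value is inserted verbatim.
  return text.replace(/[&<>]/g, (ch) => ENTITIES[ch]);
}

function serializeElement(tagName, text) {
  // Raw-text elements keep their content as-is; everything else is escaped.
  const body = RAW_TEXT_ELEMENTS.has(tagName) ? text : escapeText(text);
  // Plain concatenation, so `$&` in the text is never treated as a pattern.
  return '<' + tagName + '>' + body + '</' + tagName + '>';
}

console.log(serializeElement('script', '$&')); // <script>$&</script>
console.log(serializeElement('div', '$&'));    // <div>$&amp;</div>
```

Under these assumptions the div output matches what linkedom already produces, while the script output keeps $& intact.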
