Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML escape YAML after parsing to prevent invalidating YAML string #454

Merged
merged 1 commit into from
May 5, 2024

Conversation

dometto
Copy link
Member

@dometto dometto commented Dec 27, 2023

Resolves gollum/gollum#2024

We were previously HTML escaping the entire YAML string before parsing it. This means that some valid YAML was being scrubbed (in the case of the reported bug, the > character, but who knows if there are more?). This new approach first parses the YAML, and then recursively HTML escapes every key and value in the resulting hash.

@dometto
Copy link
Member Author

dometto commented Dec 27, 2023

A (obviously faster) alternative is to not HTML escape/sanitize the frontmatter hash at all, and make the frontend (mustache) responsible for escaping. @bartkamphorst what makes more sense do you think?

@bartkamphorst
Copy link
Member

Implementation looks clean. Can we expect a performance hit for recursively sanetizing medium-sized YAML frontmatter?

@dometto
Copy link
Member Author

dometto commented Dec 30, 2023

In theory yes, but in practice probably not? I think it is unlikely that front matter would consist of very deeply nested dicts and arrays. As a design question I'm unsure what the proper place for escaping is though (here or in the frontend). We sanitize most macro and page data in gollum-lib, but that's because in most cases we want to allow certain html tags and disallow others (for which we use loofah). For the front matter, we could (and maybe should?) just escape all html. So we could

a) let this macro return unescaped content and let mustache handle it in the frontend (a break with other macros).
Or
b) handle escaping as in this PR, but instead of using the standard Sanitization class (i.e. loofah), use something presumably faster like CGI.escape_html?

@dometto
Copy link
Member Author

dometto commented May 4, 2024

@bartkamphorst just noticed this PR again, it's been a while but do you have any thoughts on the proper way to handle this?

Copy link
Member

@bartkamphorst bartkamphorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this solution is fine!

@dometto dometto merged commit 9fc7bd9 into gollum:master May 5, 2024
4 checks passed
@dometto dometto deleted the fix_yaml_escaping branch May 5, 2024 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

YAML frontmatter cannot have a folded scalar
3 participants