This project was initially part of lxml. Because the HTML cleaner is designed around a blocklist, many reports about possible security vulnerabilities were filed against lxml, which made lxml problematic for security-sensitive environments. Therefore, we decided to extract the problematic part into a separate project.
You can install this project directly via `pip install lxml_html_clean` or, soon, as an extra of lxml via `pip install lxml[html_clean]`. Both ways install this project together with lxml itself.
For security-related issues or any other sensitive reports, please contact us privately at lbalhar(at)redhat.com or frenzy.madness(at)gmail.com so that your concerns can be handled confidentially and securely.
Documentation: https://lxml-html-clean.readthedocs.io/

License: BSD-3-Clause