Html cleaner and sanitizer for Python projects and as standalone app

Requirements:

  • Python >= 2.5
  • BeautifulSoup

html_cleaner.clear.clear_html_code(text)

Removes all tags that are not allowed from the given HTML code. The structure of allowed tags is defined in needs.cfg. clear.py itself is generated by html_cleaner/generator.py, using needs.cfg as its config file.
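
To illustrate the general idea of whitelist-based cleaning, here is a minimal sketch that uses only the Python 3 standard library (html.parser). It is not the code that generator.py produces and it does not use BeautifulSoup; the names ALLOWED_TAGS, WhitelistCleaner and clean, as well as the whitelist contents, are assumptions made for this example. The real rules live in needs.cfg.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of allowed attribute names.
ALLOWED_TAGS = {
    "a": {"href", "title"},
    "b": set(),
    "p": set(),
}


class WhitelistCleaner(HTMLParser):
    """Keeps whitelisted tags and attributes; drops every other tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; text inside it is still emitted
        kept = [
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in ALLOWED_TAGS[tag]
        ]
        self.parts.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Re-escape text so it cannot be read back as markup.
        self.parts.append(escape(data, quote=False))


def clean(text):
    parser = WhitelistCleaner()
    parser.feed(text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(clean('<a href="/" title="test" alt="test">link</a>'
            '<javascript>alert(0);</javascript>'))
# prints: <a href="/" title="test">link</a>alert(0);
```

In this sketch the text inside a dropped tag is kept as plain text; whether clear_html_code keeps or discards such text depends on the generated clear.py and is not assumed here.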

Simple usage:

from html_cleaner.clear import clear_html_code

clear_html_code("""
    <a href="/" title="test" alt="test">link</a>
    <javascript>alert(0);</javascript>
""")

./generator.py

Generates the clear.py source file according to the rules specified in needs.cfg. A simpler example configuration file can be found in example.cfg.

The configuration file contains hierarchical rules that make up the HTML cleaner's whitelist. For examples, see example.cfg and needs.cfg (the one we use).
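
One way to picture hierarchical whitelist rules is as a nested mapping in which each allowed tag lists the tags that may appear directly inside it. The sketch below is only an illustration: the actual syntax of example.cfg and needs.cfg is not reproduced here, and RULES and is_allowed are hypothetical names.

```python
# Hypothetical hierarchical whitelist: each key is an allowed tag and
# its value holds the tags allowed directly inside it.
RULES = {
    "p": {"a": {}, "b": {}},
    "ul": {"li": {"a": {}, "b": {}}},
}


def is_allowed(path, rules=RULES):
    """Return True if the chain of nested tags in `path` is whitelisted."""
    node = rules
    for tag in path:
        if tag not in node:
            return False
        node = node[tag]  # descend into the rules for this tag's children
    return True


print(is_allowed(["ul", "li", "a"]))  # True
print(is_allowed(["p", "li"]))        # False: li is not allowed inside p
print(is_allowed(["li"]))             # False: li is not allowed at top level
```

With rules shaped like this, the same tag can be accepted in one context and rejected in another, which a flat list of allowed tags cannot express.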

Development of html-cleaner takes place on GitHub: https://github.com/ProstoKsi/html-cleaner/

Copyright (C) 2009-2013 Illia Polosukhin, Vladyslav Frolov. This program is licensed under the MIT License (see LICENSE).
