
HTML cleaner and sanitizer, usable from Python projects and as a standalone app.

Requirements:

  • Python >= 2.5
  • BeautifulSoup

html_cleaner.clear.clear_html_code(text)

Removes tags that are not allowed from HTML code. The structure of allowed tags is defined in needs.cfg; clear.py is generated by html_cleaner/generator.py using needs.cfg as its configuration file.

Simple usage:

from html_cleaner.clear import clear_html_code

cleaned = clear_html_code("""
    <a href="/" title="test" alt="test">link</a>
    <javascript>alert(0);</javascript>
""")
print(cleaned)
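The cleaning step boils down to a tag and attribute whitelist. As a rough illustration only, here is a minimal sketch of that idea written for Python 3 with the standard library's html.parser. It is not html_cleaner's implementation (which relies on BeautifulSoup and the generated clear.py), and the ALLOWED and DROP_CONTENT tables and the sanitize() function are hypothetical names chosen for this example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> allowed attribute names.
# This is NOT the contents of needs.cfg; it only illustrates the idea.
ALLOWED = {
    "a": {"href", "title"},
    "b": set(),
    "i": set(),
    "p": set(),
}

# Hypothetical: tags whose text content is dropped together with the tag.
DROP_CONTENT = {"script", "style", "javascript"}


class WhitelistCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: drop the tag but keep its text
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in ALLOWED[tag]  # keep only whitelisted attributes
        )
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(text):
    """Return `text` with only whitelisted tags and attributes left."""
    cleaner = WhitelistCleaner()
    cleaner.feed(text)
    cleaner.close()
    return "".join(cleaner.out)
```

Applied to the markup from the usage example, this sketch keeps the link with only its href and title attributes and drops the <javascript> element along with its content.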

./generator.py

Generates the clear.py source file according to the rules specified in needs.cfg. A simpler example configuration can be found in example.cfg.
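To make the idea of a generated source file concrete, here is a hedged sketch of how a generator might turn a rule table into importable Python source. The RULES dict, generate_source() and the emitted is_allowed() helper are hypothetical; the real generator.py reads needs.cfg, whose format is not reproduced here.

```python
# Hypothetical rules: tag -> allowed attributes (not the needs.cfg format).
RULES = {"a": ["href", "title"], "b": [], "p": []}


def generate_source(rules):
    """Return Python source for a module defining an ALLOWED table
    and an is_allowed(tag, attr=None) helper (hypothetical names)."""
    lines = ["# Generated file: do not edit by hand.", "ALLOWED = {"]
    for tag in sorted(rules):
        # %r emits valid Python literals for the tag and its attribute list
        lines.append("    %r: frozenset(%r)," % (tag, sorted(rules[tag])))
    lines.append("}")
    lines.append("")
    lines.append("def is_allowed(tag, attr=None):")
    lines.append("    if tag not in ALLOWED:")
    lines.append("        return False")
    lines.append("    return attr is None or attr in ALLOWED[tag]")
    return "\n".join(lines) + "\n"
```

Writing the returned string to disk yields a module that can be imported like any other. Regenerating it after editing the rules keeps the whitelist and the code in sync, and the cleaner does no config parsing at run time.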

The configuration file contains hierarchical whitelist rules for the HTML cleaner. For examples, see example.cfg and needs.cfg (the one this project uses).
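Assuming "hierarchical" means that the rules state which tags may be nested inside which parents (an interpretation, not something this README spells out), a nested whitelist could be checked as below. HIERARCHY, the "#root" key and is_allowed_path() are hypothetical names for this sketch.

```python
# Hypothetical hierarchical whitelist: each tag maps to the child tags it
# may contain. "#root" stands for the top level of the HTML fragment.
HIERARCHY = {
    "#root": {"p", "ul", "a"},
    "p": {"a", "b", "i"},
    "ul": {"li"},
    "li": {"a", "b", "i"},
    "a": {"b", "i"},
    "b": set(),
    "i": set(),
}


def is_allowed_path(path):
    """Return True if every tag in `path` (outermost first) is allowed
    inside its parent according to HIERARCHY."""
    parent = "#root"
    for tag in path:
        if tag not in HIERARCHY.get(parent, ()):
            return False
        parent = tag
    return True
```

For instance, ["ul", "li", "a"] passes, while a bare ["li"] at the top level is rejected because "#root" does not list "li".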

Development of html-cleaner happens on GitHub: https://github.com/ProstoKsi/html-cleaner/

Copyright (C) 2009-2013 Illia Polosukhin, Vladyslav Frolov. This program is licensed under the MIT License (see LICENSE).