Skip to content

RichardVasquez/kses

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This is the initial checkin of code in the process of being copied from its source at sourceforge and getting ready to be updated many years later.

I was surprised to see that kses is still being used these days, and that has inspired me to pick it back up and see what might be needed to be added to it to see if it can be made to work even better.

Updates to follow, soon.


kses 0.2.2 README  [kses strips evil scripts!]
=================

* INTRODUCTION *


Welcome to kses - an HTML/XHTML filter written in PHP. It removes all unwanted
HTML elements and attributes, no matter how malformed HTML input you give it.
It also does several checks on attribute values. kses can be used to avoid
Cross-Site Scripting (XSS), Buffer Overflows and Denial of Service attacks,
among other things.

The program is released under the terms of the GNU General Public License. You
should look into what that means, before using kses in your programs. You can
find the full text of the license in the file COPYING.


* FEATURES *


Some of kses' current features are:

* It will only allow the HTML elements and attributes that it was explicitly
told to allow.

* Element and attribute names are case-insensitive (a href vs A HREF).

* It will understand and process whitespace correctly.

* Attribute values can be surrounded with quotes, apostrophes or nothing.

* It will accept valueless attributes with just names and no values (selected).

* It will accept XHTML's closing " /" marks.

* Attribute values that are surrounded with nothing will get quotes to avoid
producing non-W3C conforming HTML
(<a href=http://sourceforge.net/projects/kses> works but isn't valid HTML).

* It handles lots of types of malformed HTML, by interpreting the existing
code the best it can and then rebuilding new code from it. That's a better
approach than trying to process existing code, as you're bound to forget about
some weird special case somewhere. It handles problems like never-ending
quotes and tags gracefully.

* It will remove additional "<" and ">" characters that people may try to
sneak in somewhere.

* It supports checking attribute values for minimum/maximum length and
minimum/maximum value, to protect against Buffer Overflows and Denial of
Service attacks against WWW clients and various servers. You can stop
<iframe src= width= height=> from having too high values for width and height,
for instance.

* It has got a system for whitelisting URL protocols. You can say that
attribute values may only start with http:, https:, ftp: and gopher:, but no
other URL protocols (javascript:, java:, about:, telnet:..). The functions that
do this work handle whitespace, upper/lower case, HTML entities
("jav&#97;script:") and repeated entries ("javascript:javascript:alert(57)").
It also normalizes HTML entities as a nice side effect.

* It removes Netscape 4's JavaScript entities ("&{alert(57)};").

* It handles NULL bytes and Opera's chr(173) whitespace characters.

* There is a procedural version and two object-oriented versions (for PHP 4
  and PHP 5) of kses.


* USE IT *


It's very easy to use kses in your own PHP web application! Basic usage looks
like this:


<?php

include 'kses.php';

$allowed = array('b' => array(),
                 'i' => array(),
                 'a' => array('href' => 1, 'title' => 1),
                 'p' => array('align' => 1),
                 'br' => array());

$val = $_POST['val'];
if (get_magic_quotes_gpc())
  $val = stripslashes($val);
# You must strip slashes from magic quotes, or kses will get confused.

$val = kses($val, $allowed); # The filtering takes place here.

# Do something with $val.

?>


This definition of $allowed means that only the elements B, I, A, P and BR are
allowed (along with their closing tags /B, /I, /A, /P and /BR). B, I and BR
may not have any attributes. A may only have the attributes HREF and TITLE,
while P may only have the attribute ALIGN. You can list the elements and
attributes in the array in any mixture of upper and lower case. kses will also
recognize HTML code that uses both lower and upper case.

It's important to select the right allowed attributes, so you won't open up
an XSS hole by mistake. Some important attributes that you mustn't allow
include but are not limited to: 1) style, and 2) all intrinsic events
attributes (onMouseOver and so on, on* really). I'll write more about this in
the documentation that will be distributed with future versions of kses.

It's also important to note that kses' HTML input must be cleaned of all
slashes coming from magic quotes. If the rest of your code requires these
slashes to be present, you can always add them again after calling kses with
a simple addslashes() call.

You should take a look at the documentation in the docs/ directory and the
examples in the examples/ directory, to get more information on how to use
kses. The object-oriented versions of kses are also worth checking out, and
they're included in the oop/ directory.


* UPGRADING TO 0.2.2 *


kses 0.2.2 is backwards compatible with all previous releases, so upgrading
should just be a matter of using a new version of kses.php instead of an old
one.


* NEW VERSIONS, MAILING LISTS AND BUG REPORTS *


If you want to download new versions, subscribe to the kses-general mailing
list or even take part in the development of kses, we refer you to its
homepage at  http://sourceforge.net/projects/kses . New developers and beta
testers are more than welcome!

If you have any bug reports, suggestions for improvement or simply want to tell
us that you use kses for some project, feel free to post to the kses-general
mailing list. If you have found any security problems (particularly XSS,
naturally) in kses, please contact Ulf privately at  metaur at users dot
sourceforge dot net  so he can correct it before you or someone else tells the
public about it.

(No, it's not a security problem in kses if some program that uses it allows a
bad attribute, silly. If kses is told to accept the element body with the
attributes style and onLoad, it will accept them, even if that's a really bad
idea, securitywise.)


* OTHER HTML FILTERS *


Here are the other stand-alone, open source HTML filters that we currently know
of:

* Htmlfilter for PHP - the filter from Squirrelmail
  PHP
  Konstantin Riabitsev
  http://linux.duke.edu/projects/mini/htmlfilter/

* HTML::StripScripts and related CPAN modules
  Perl
  Nick Cleaton
  http://search.cpan.org/perldoc?HTML%3A%3AStripScripts

* SafeHtmlChecker [is this really open source?]
  PHP
  Simon Willison
  http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker

There are also a lot of HTML filters that were written specifically for some
program. Some of them are better than others.

Please write to the kses-general mailing list if you know of any other
stand-alone, open-source filters.


* DEDICATION *


kses 0.2.2 is dedicated to Audrey Tautou and Jean-Pierre Jeunet.


* MISC *


The kses code is based on an HTML filter that Ulf wrote on his own back in 2002
for the open-source project Gnuheter ( http://savannah.nongnu.org/projects/
gnuheter ). Gnuheter is a fork from PHP-Nuke. The HTML filter has been
improved a lot since then.

To stop people from having sleepless nights, we feel the urgent need to state
that kses doesn't have anything to do with the KDE project, despite having a
name that starts with a K.

In case someone was wondering, Ulf is available for kses-related consulting.

Finally, the name kses comes from the terms XSS and access. It's also a
recursive acronym (every open-source project should have one!) for "kses
strips evil scripts".


// Ulf and the kses development group, February 2005