+------
| Update 2004-01-05: You may read _much_ more in my newly published
| book: "Innocent Code: A Security Wake-up Call for Web Programmers"
| http://innocentcode.thathost.com/
+------

From: "Sverre H. Huseby"
Subject: Re: CSS and PHP question
Date: Thu, 14 Mar 2002 08:57:16 +0100
To: webappsec@securityfocus.com

I personally like to split the "filtering" in two or three parts:

1. Input validation. Can (and should?) be done as the first step of
   every script. Validation checks that input has the correct type,
   e.g. that an integer is in fact an integer, that an E-mail address
   is a legal address, that a name contains only characters from a
   certain set, and so on.

   The action for illegal input depends on whether the input is user
   generated (input fields in a form) or server generated (hidden
   fields, drop down lists, check boxes, URLs from a previous page,
   cookies, ...).

   Errors in user input may be caused by misspelling or lack of
   knowledge of our input validation rules. The action taken is
   typically to redisplay the input form with an error message.

   Errors in server generated input should not happen and may be a
   sign of an intrusion attempt. In such cases I normally just display
   a very simple error page noting that the incident has been logged.

2. Sub-system meta-character "washing" (escaping). The handling of
   meta-characters depends on what sub-system the data is passed to.
   An SQL query needs different washing than an XPath query, which in
   turn needs different washing than a system call dealing with the
   file system.

   I like to delay the meta-character washing until the data is passed
   to the sub-system in question, as washing transforms the input.
   Also, washing needs to be done on anything passed to sub-systems,
   not just user generated input. Washing is not the same as
   validation.

3. Washing of HTML output (may be seen as meta-character washing for
   an HTML sub-system, so this point is somewhat redundant).
   Whatever is passed to the client side passes through an HTML
   character encoding filter. Some people like to do HTML encoding on
   input when it comes in, but I prefer delaying it to output time for
   a couple of reasons:

   * It's not just user generated input that must be HTML encoded:
     When reading from a file, a database or whatever, HTML encoding
     should be done before passing the content to the client. That is
     easier to remember if the rule is "wash output when output is
     done".

   * I don't like storing HTML character entities in a database. It
     makes it harder to search from non-web based interfaces, and it
     conflicts with the number of characters allowed in a field (a
     name of 20 characters may take up more after HTML encoding).

As you can see, doing both input validation and meta-character washing
may be redundant if the input validation step forbids the
meta-characters. Often, redundancy is considered bad, but when it
comes to security I think it is a must.

And I would like to add: I like to build a framework for doing
validation and washing, and if possible, force all scripts to use the
framework rather than directly picking up the input variables.

Sverre.
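
The three steps described above can be sketched in Python roughly as
follows. This is only an illustration under stated assumptions, not
code from the post: the `comments` table, the function names, the two
exception classes and the 20-character name rule are all hypothetical.
For step 2 the sketch uses SQL parameter binding in place of
hand-written escaping, and for step 3 it uses the standard library's
`html.escape`.

```python
import html
import re
import sqlite3

# Hypothetical rule: a name is 1-20 letters, spaces, dots, apostrophes
# or hyphens. Note that the apostrophe is legal here, so validation
# alone does not make the value safe for SQL or HTML.
NAME_RE = re.compile(r"[A-Za-z .'-]{1,20}")


class UserInputError(ValueError):
    """User generated input was wrong: redisplay the form with a message."""


class TamperedInputError(ValueError):
    """Server generated input was wrong: log it, show a plain error page."""


def validate_name(raw: str) -> str:
    # Step 1, user generated input (a form field).
    if not NAME_RE.fullmatch(raw):
        raise UserInputError("name contains illegal characters")
    return raw


def validate_item_id(raw: str) -> int:
    # Step 1, server generated input (e.g. a hidden field).
    if not re.fullmatch(r"[0-9]{1,9}", raw):
        raise TamperedInputError("item id is not an integer")
    return int(raw)


def store_comment(db: sqlite3.Connection, item_id: int, name: str) -> None:
    # Step 2: the value is made safe at the moment it is handed to the
    # SQL sub-system. Parameter binding stands in for manual escaping.
    db.execute(
        "INSERT INTO comments (item_id, name) VALUES (?, ?)",
        (item_id, name),
    )


def render_names(db: sqlite3.Connection, item_id: int) -> str:
    # Step 3: everything sent to the client is HTML encoded at output
    # time, including data read back from the database.
    rows = db.execute(
        "SELECT name FROM comments WHERE item_id = ? ORDER BY rowid",
        (item_id,),
    ).fetchall()
    return "".join("<li>" + html.escape(name) + "</li>" for (name,) in rows)
```

A script built on such helpers would call the validators first, keep
the validated values in their original form, and leave the washing to
`store_comment` and `render_names`. The database then holds the plain
name rather than HTML character entities.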