+------
| Update 2004-01-05: You may read _much_ more in my newly published
| book: "Innocent Code: A Security Wake-up Call for Web Programmers"
| http://innocentcode.thathost.com/
+------

From: "Sverre H. Huseby"
Subject: Re: CSS implication
Date: Sat, 23 Mar 2002 10:30:33 +0100
To: vuln-dev@securityfocus.com

| Once XSS code is injected into a persistent environment (HTML
| Chat/Auction/Mail,etc.) it will stay there (for a length of
| time), even if the input filtering problem is fixed at some point.

That's another very good example of why one should not filter HTML
on input, but rather on output. HTML filtering may be seen as just
another kind of meta character washing, like the handling of single
quotes in SQL strings. Meta character filtering should, in my
opinion, be done right before passing the data to the system that
needs the filtering. In the HTML case this system is the visitor's
browser, so filtering should be done in the output process.

I like to separate sanitization of input into two parts: validation
and meta character handling.

Validation is defined by the application domain, and may be done
immediately when receiving the input. (Is this an integer? A valid
mail address? A family name?) Sometimes validation includes the
stripping of HTML meta characters.

Meta character (or character sequence) washing depends on the system
your application passes the data to. Different systems need different
handling of meta characters.

(To have security in depth, one should treat validation and meta
character handling as independent operations. Even if a validation
rule forbids HTML markup, the meta character handler should still
escape any occurrences of such characters.)

I know people hold almost religious opinions on this matter, so all
of us will probably never agree on the "correct" way to sanitize HTML
output.
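Whichever side of that debate one takes, the split between validation
and meta character handling described above can be made concrete. What
follows is only a minimal sketch in Python, not something from the
original thread: the function names validate_family_name and to_html,
the family name rule, and the guest table are all made up for
illustration. The value is validated once on input, stored unmodified,
and washed separately for each system it is passed to (a bound
parameter for the database, entity escaping for the browser).

```python
import html
import re
import sqlite3


def validate_family_name(raw: str) -> str:
    """Validation: a domain rule, applied when the input is received.

    Hypothetical rule: letters, optionally joined by space, apostrophe
    or hyphen. The value is returned unmodified.
    """
    if not re.fullmatch(r"[^\W\d_]+(?:[ '-][^\W\d_]+)*", raw):
        raise ValueError("not a valid family name")
    return raw


def to_html(text: str) -> str:
    """Meta character handling for the browser, applied at output time."""
    return html.escape(text, quote=True)


name = validate_family_name("O'Brien")  # valid, yet contains a quote

# Meta character handling for the database: a bound parameter, so the
# single quote can never terminate the SQL string.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE guest (name TEXT)")
db.execute("INSERT INTO guest (name) VALUES (?)", (name,))
stored = db.execute("SELECT name FROM guest").fetchone()[0]

# The stored value is the raw one; HTML escaping happens only here.
page = "<p>Hello, " + to_html(stored) + "</p>"
```

Note that the name passes validation and still contains a character
that is special to both SQL and HTML, which is why the two washing
steps cannot be replaced by the validation rule.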
Those who say that HTML filtering should be done at input time often
argue that it is hard to remember to always escape the HTML meta
characters for all outputs, so it's better to do it once and for all.
I agree that it may be hard to remember, but I nevertheless think that
handling meta characters on input is the wrong solution.

One should rather build a framework that encapsulates the output
object/stream and takes responsibility for the washing. The same
framework could hide the request object/stream and force validation
by, e.g., hiding the raw getPostData("FOO") and instead providing
getIntegerPostData("FOO") and similar functions for other domain
types.

The more security stuff we can hide from the average programmer, the
better.


Sverre.
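
A rough sketch of what such a framework could look like, written in
Python purely for illustration. The class names ValidatedRequest and
HtmlOutput and every method name are hypothetical (the integer getter
is only modelled on getIntegerPostData above), and the validation rules
are assumptions, not taken from any real library.

```python
import html
import io
import re


class ValidatedRequest:
    """Hypothetical request wrapper that offers only typed getters."""

    def __init__(self, post_data):
        # Name mangling discourages, but does not prevent, raw access.
        self.__post_data = dict(post_data)

    def get_integer_post_data(self, name: str) -> int:
        raw = self.__post_data.get(name, "")
        if not re.fullmatch(r"[+-]?[0-9]{1,18}", raw):
            raise ValueError(f"{name!r} is not an integer")
        return int(raw)

    def get_text_post_data(self, name: str, max_length: int = 200) -> str:
        raw = self.__post_data.get(name, "")
        # Assumed rule: bounded length, no control characters. HTML
        # meta characters are allowed here; the output side handles them.
        if len(raw) > max_length or any(ord(c) < 32 for c in raw):
            raise ValueError(f"{name!r} is not acceptable text")
        return raw


class HtmlOutput:
    """Hypothetical output wrapper that owns the meta character washing."""

    def __init__(self, stream):
        self._stream = stream

    def write_markup(self, markup: str) -> None:
        """Markup written by the programmer; never pass user data here."""
        self._stream.write(markup)

    def write_text(self, value) -> None:
        """Any data value; always escaped before it reaches the browser."""
        self._stream.write(html.escape(str(value), quote=True))


request = ValidatedRequest(
    {"FOO": "42", "comment": "<script>alert(1)</script>"}
)
buffer = io.StringIO()
out = HtmlOutput(buffer)

out.write_markup("<p>")
out.write_text(request.get_text_post_data("comment"))
out.write_markup("</p>")

foo = request.get_integer_post_data("FOO")
rendered = buffer.getvalue()
```

The point of the design is that the application programmer never calls
an escaping function directly: data can only reach the page through
write_text, and input can only be read through a getter that validates
it.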