Safe HTML checker
23rd February 2003
I’ve finally enabled a subset of HTML in my comments. In doing so, I had several requirements that needed to be fulfilled:
- Entered markup must be valid XHTML Strict, to stop comments from breaking validation and keep things nice and tidy.
- No presentational markup! I want to maintain control over how things look via my stylesheets—comments posted should only be able to use structural HTML elements.
- Attributes should be restricted to those that add semantic meaning. JavaScript event attributes and CSS-related attributes should not be allowed.
- I should retain full control over the tags and attributes allowed in the comments.
- Submitted HTML must be kept free from anything that could pose a security risk, such as javascript: URLs.
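As a rough illustration of that last requirement, here is a minimal Python sketch of a URL scheme check. It is not taken from the PHP class; the function name `is_safe_url` and the `ALLOWED_SCHEMES` list are assumptions made up for the example, since the post does not say which schemes are accepted.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: the post does not say which schemes are accepted.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_url(url: str) -> bool:
    """Return True if the URL is relative or uses an allow-listed scheme."""
    # Drop spaces and control characters everywhere before looking for a
    # scheme. Browsers may ignore some of them, so "java\tscript:" must be
    # treated the same as "javascript:". This is deliberately conservative.
    cleaned = "".join(ch for ch in url if ch > " ")
    scheme = urlsplit(cleaned).scheme.lower()
    # An empty scheme means a relative URL such as "/archive/2003/".
    return scheme == "" or scheme in ALLOWED_SCHEMES
```

Checking against a list of allowed schemes, rather than a list of forbidden ones, means an unfamiliar scheme is rejected by default.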
The system I have implemented works by running submitted posts through an XML parser, which checks that each element is in my list of allowed elements, is nested correctly (you can’t put a blockquote inside a p, for example) and doesn’t have any illegal attributes. My initial tests have shown it to work pretty well, but if anyone wants to have a go at breaking it, please be my guest.
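The general approach can be sketched in a few lines. The following is a Python illustration under stated assumptions, not the SafeHtmlChecker PHP class itself: the element and attribute lists, the synthetic `comment` wrapper element, and the function names are all hypothetical. It checks element names, nesting and attribute names only; attribute values such as link targets would need a separate check.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, invented for this sketch.
INLINE = {"em", "strong", "code", "a"}
BLOCK = {"p", "blockquote", "ul"}

# For each allowed element, the set of elements it may directly contain.
ALLOWED_CHILDREN = {
    "comment": BLOCK,  # synthetic wrapper around the submitted fragment
    "blockquote": BLOCK,
    "ul": {"li"},
    "li": INLINE,
    "p": INLINE,
    "em": INLINE,
    "strong": INLINE,
    "code": INLINE,
    "a": INLINE - {"a"},  # links may not contain other links
}

# Attributes permitted per element; anything not listed is rejected.
ALLOWED_ATTRIBUTES = {"a": {"href", "title"}, "blockquote": {"cite"}}


def check_comment(markup: str) -> list[str]:
    """Return a list of problems; an empty list means the markup passed."""
    try:
        # Wrapping gives the parser a single root element to work with.
        root = ET.fromstring("<comment>" + markup + "</comment>")
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    errors: list[str] = []
    _check_children(root, errors)
    return errors


def _check_children(parent: ET.Element, errors: list[str]) -> None:
    allowed_here = ALLOWED_CHILDREN.get(parent.tag, set())
    for child in parent:
        if child.tag not in ALLOWED_CHILDREN:
            errors.append(f"<{child.tag}> is not an allowed element")
            continue  # do not descend into unknown elements
        if child.tag not in allowed_here:
            errors.append(f"<{child.tag}> may not appear inside <{parent.tag}>")
        for name in child.attrib:
            if name not in ALLOWED_ATTRIBUTES.get(child.tag, set()):
                errors.append(f"attribute '{name}' is not allowed on <{child.tag}>")
        _check_children(child, errors)
```

With these rules, `check_comment("<p>Hello <em>world</em></p>")` returns an empty list, while a blockquote nested inside a p is reported as a nesting error. Because the parser insists on well-formed input, unclosed or mismatched tags are rejected before any of the list checks run.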
The code for the main class is available here: SafeHtmlChecker.class.php