Comments on: Using WP_HTML_Tag_Processor: What You Need to Know
https://webdevstudios.com/2024/08/13/using-wp-html-tag-processor/

By: Dennis Snell, Tue, 20 Aug 2024 22:18:26 +0000
https://webdevstudios.com/2024/08/13/using-wp-html-tag-processor/#comment-190111

Hello Ramsés,

Thanks for sharing the article. One thing I always like pointing out is that even with all that code for your PCRE-based approach, it’s still wrong.

In fact, the DOMDocument approach is also broken. A simple way to see this is to place the example HTML inside a TEXTAREA element: even DOMDocument will falsely identify tags in there (PHP 8.4 with \Dom\HTMLDocument will help with this). Worse still, DOMDocument will corrupt documents it doesn’t understand and remove legitimate content, which can open up security exploits. These are just two small examples among thousands.
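
To make the TEXTAREA case concrete, here is a minimal sketch of how a pattern-based search goes wrong. The input string and the pattern are illustrative assumptions, not the exact code from the article; any similar pattern behaves the same way.

```php
<?php
// Hypothetical input: the markup sits inside a TEXTAREA, so a browser
// treats it as plain text and creates no IMG element at all.
$html = '<textarea><img src="photo.jpg" alt="demo"></textarea>';

// An illustrative PCRE pattern for locating IMG tags.
preg_match_all( '/<img\b[^>]*>/i', $html, $matches );

// The pattern still reports one IMG tag: a false positive.
$false_positives = count( $matches[0] );
echo $false_positives, "\n"; // prints 1
```

A spec-compliant HTML parser reads the contents of a TEXTAREA as text, so it would report no IMG tag for this input.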

So why go through all the effort and pain and manual work just to find out that it’s still vulnerable?

Something in your timings is unexpected. In my own benchmarks I’ve found that the Tag Processor is roughly 50% faster than DOMDocument. I wonder what’s different. Please reach out if you’d like to dig into this further.

There appear to be a few extraneous escaping characters in this blog post. One example shows two slashes when creating the Tag Processor for the root-level namespace, but only one belongs. Similarly, some of the PCRE patterns have extra backslashes.

If you haven’t seen them already, you’ll probably love what’s coming in WordPress 6.7 and 6.8 in the HTML Processor.
