June 16, 2023

Performant HTML manipulation in WordPress 6.2

WordPress 6.2 quietly shipped with an update that will have a significant impact on how we work with markup: the WP_HTML_Tag_Processor class!

This class provides a performant way to work with tags and attributes in HTML markup – but not content within tags.
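To give a feel for the API, here is a minimal sketch that finds the first image in a fragment and updates its attributes. It assumes the code runs inside WordPress 6.2 or later, so that WP_HTML_Tag_Processor is already loaded; the sample markup, the attribute value, and the class name are made up for illustration.

```php
<?php
// Assumption: WordPress 6.2+ is loaded, so WP_HTML_Tag_Processor exists.
// The markup and the class name below are illustrative only.
$html = '<p>Hello</p><img src="cat.jpg">';

$processor = new WP_HTML_Tag_Processor( $html );

// next_tag() advances to the next matching opening tag and
// returns false when there is nothing left to match.
if ( $processor->next_tag( 'img' ) ) {
	$processor->set_attribute( 'loading', 'lazy' );
	$processor->add_class( 'example-image' );
}

// The changes are applied to the markup when it is read back.
$updated_html = $processor->get_updated_html();
```

Note that the paragraph text is never touched: the processor only moves from tag to tag and edits attributes.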

The focus is on speed and low memory use rather than on full parsing. To quote the documentation:

… On the other hand, it will be faster than full-blown HTML parsers such as DOMDocument and use considerably less memory. It requires a negligible memory overhead, enough to consider it a zero-overhead system.

This is music to my ears, because we now have a reliable and performant way to work with tags and attributes, as an alternative to DOMDocument and its many quirks, or even to regular expressions.

However, we will still have to reach for DOMDocument to fully parse and work with tag content, for now at least.

Here are a few use cases that I can think of for this:

Counting the number of list items
Getting the SRC attributes of all images
Looking for images with missing ALT text
Counting how many tags of a particular type have a given class
Adding a class to alternating tags
Adding an attribute to the second-to-last column in a table, but only where the cell is a TD, not a TH
Adding a preload attribute to LINK tags in the HEAD tag
Adding classes to navigation menu items
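A few of these can be sketched in a handful of lines each. The helpers below are hypothetical (the example_ function names and the is-even class are mine, not part of WordPress), and they assume WordPress 6.2 or later is loaded so that WP_HTML_Tag_Processor is available.

```php
<?php
// Assumption: WordPress 6.2+ is loaded. All example_* names are hypothetical.

// Count the LI tags in a fragment (closing tags are skipped by default).
function example_count_list_items( string $html ): int {
	$processor = new WP_HTML_Tag_Processor( $html );
	$count     = 0;
	while ( $processor->next_tag( 'li' ) ) {
		$count++;
	}
	return $count;
}

// Collect the SRC attribute of every image that has one.
function example_get_image_sources( string $html ): array {
	$processor = new WP_HTML_Tag_Processor( $html );
	$sources   = array();
	while ( $processor->next_tag( 'img' ) ) {
		$src = $processor->get_attribute( 'src' );
		// get_attribute() returns null when the attribute is absent
		// and true for a value-less attribute, so keep strings only.
		if ( is_string( $src ) ) {
			$sources[] = $src;
		}
	}
	return $sources;
}

// Count images that have no ALT attribute at all.
// An empty alt="" is deliberately not counted as missing here.
function example_count_images_missing_alt( string $html ): int {
	$processor = new WP_HTML_Tag_Processor( $html );
	$missing   = 0;
	while ( $processor->next_tag( 'img' ) ) {
		if ( null === $processor->get_attribute( 'alt' ) ) {
			$missing++;
		}
	}
	return $missing;
}

// Add a class to every second TR tag.
function example_stripe_table_rows( string $html ): string {
	$processor = new WP_HTML_Tag_Processor( $html );
	$index     = 0;
	while ( $processor->next_tag( 'tr' ) ) {
		$index++;
		if ( 0 === $index % 2 ) {
			$processor->add_class( 'is-even' );
		}
	}
	return $processor->get_updated_html();
}
```

Each helper creates its own processor because the class walks forward through the markup, so a fresh instance is the simplest way to start again from the top.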

This is exciting stuff, and allows us to do things that would traditionally have required JavaScript to do reliably.

My wishlist for the future:

Add the ability to create tags
Add the ability to get outer and inner HTML of tags
