Performance Test

Why I started down this road

WordPress has been my working environment for more than a decade, and for most of that time I’ve taken its
performance profile as a given. Render a page, hit the database a few dozen times, run the content through
wp_kses, ship some HTML, done. The hot paths are hot for good reason: they’re the only way
to guarantee that user-submitted content stays safe before it reaches a browser. But “good reason” and “optimal
implementation” aren’t the same thing, and there’s a lot of room between a correct answer and a fast one.

This post is a rough set of notes from the past few weeks spent looking at what a Rust-backed rewrite of some of
those hot paths might look like, whether it’s practical, and how much performance is actually left on the table. It’s
not a finished product, and it’s not a recommendation for anyone else — it’s more of a working diary than a
tutorial. Take it with the usual caveats: experimental software, personal server, nothing you should run in
production without reading the whole thing twice.

The functions that matter

Profiling a handful of pages with SPX makes one thing clear very quickly: a small number of functions account for a
surprisingly large slice of wall time. In rough order, the usual suspects are:

  1. wp_kses and its many wrappers (wp_kses_post, wp_kses_data, wp_filter_post_kses, and friends), running
     against post content, comment bodies, excerpts, and anything else the editor saves.
  2. esc_html and esc_attr, called hundreds of times per page by themes and plugins as the last line of
     defense before values reach the browser.
  3. wpautop, which does paragraph and line-break conversion on older content that wasn’t written in blocks.
  4. sanitize_title and make_clickable, which show up less individually but add up once the page has a long
     list of posts.

Of these, kses is the heaviest by a wide margin. On a 3 KB blob of realistic post content,
sanitization takes more than a millisecond of pure PHP work, and on a 5 KB blob with nested tags and inline markup
it can approach five. Multiply that by every post on an archive page, every widget, every filter hook that re-runs the
same content through wp_kses_data, and it starts to matter.

Where the time actually goes

Most of it is — unsurprisingly — string manipulation. PHP’s preg_replace and
preg_replace_callback dominate the profile, followed by strtolower calls for tag/attribute
name normalization, htmlspecialchars for entity escaping, and a long tail of array lookups against the
global allowed-tags table. There’s not much obviously wasted work, which is both encouraging and a little disappointing:
it means a rewrite has to beat PHP on the fundamentals, not just on silly mistakes.
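To make "beating PHP on the fundamentals" concrete, here is a minimal Rust sketch of the entity-escaping step. It is not the project's actual code: the name escape_html_sketch is hypothetical, and it only mirrors the five characters that htmlspecialchars rewrites under ENT_QUOTES. It deliberately escapes every ampersand, so unlike stock esc_html (which, as far as I understand it, avoids double-encoding existing entities) it would not pass an output-fidelity check as written.

```rust
/// Simplified single-pass escaper (hypothetical name, not the project's code).
/// Rewrites the same five characters as htmlspecialchars() with ENT_QUOTES.
/// Assumption: double-encoding of existing entities is acceptable here,
/// which is NOT true for a drop-in esc_html() replacement.
pub fn escape_html_sketch(input: &str) -> String {
    // Reserve a little headroom so typical inputs need no reallocation.
    let mut out = String::with_capacity(input.len() + input.len() / 8);
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#039;"),
            other => out.push(other),
        }
    }
    out
}
```

The point of the sketch is the shape of the work: one pass over the input and one output allocation, with no regex engine involved.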

“Premature optimization is the root of all evil” is one of the most frequently quoted and least understood
aphorisms in programming — but if the hot path you’re optimizing is demonstrably hot, under a profiler, on real
traffic, then it’s not premature.

A quick benchmark

Here’s the shape of what we’re measuring. Numbers are from 10,000 iterations on PHP 8.3, against a real
WordPress install:

Input                                     Size        Stock PHP      Optimized    Speedup
Short paragraph with one link             76 bytes    162.44 ms      55.65 ms     2.9x
Medium post fragment                      740 bytes   1,110.74 ms    183.88 ms    6.0x
Paragraph with an embedded <script> tag   ~1.3 KB     265.05 ms      74.48 ms     3.6x
Long article body                         ~3 KB       3,610.77 ms    526.77 ms    6.9x

The speedup isn’t uniform, and that’s expected: the smaller the input, the more of the total time is spent on fixed
overhead rather than on the actual sanitization work. Where this matters most is for large posts with heavy inline
markup, which is also exactly the case most likely to bottleneck a content-heavy site. A realistic 3-7x speedup
on the kses family alone doesn’t transform page load times on its own, but it materially reduces the
blocking work done on every save and on every render that runs content through a secondary kses filter.
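The speedup column is just stock time divided by optimized time. A small sketch that recomputes it from the benchmark figures above is below; the function and constant names are made up for illustration.

```rust
/// Ratio of stock time to optimized time, formatted to one decimal place.
/// (Hypothetical helper, only used to re-derive the speedup column.)
pub fn speedup(stock_ms: f64, optimized_ms: f64) -> String {
    format!("{:.1}x", stock_ms / optimized_ms)
}

/// (label, stock ms, optimized ms), copied from the benchmark figures above.
pub const RUNS: [(&str, f64, f64); 4] = [
    ("Short paragraph with one link", 162.44, 55.65),
    ("Medium post fragment", 1110.74, 183.88),
    ("Paragraph with an embedded <script> tag", 265.05, 74.48),
    ("Long article body", 3610.77, 526.77),
];

/// Recompute the speedup for every benchmark row.
pub fn speedups() -> Vec<String> {
    RUNS.iter().map(|(_, stock, opt)| speedup(*stock, *opt)).collect()
}
```

Calling speedups() gives one formatted ratio per row, in table order, which makes it easy to re-check the column whenever the raw timings are refreshed.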

What the code looks like

For anyone curious, here’s a sketch of the override activation code. The real version is a bit more involved —
it has to be careful about PHP’s opcode caching and function-table swapping, and it uses a PHP user-function shim to
work around some ABI constraints — but the shape is simple enough:

// At activation, define PHP shims that mirror each target's signature:
//
//   function __patina_esc_html_shim__($text) {
//       return patina_esc_html_internal((string) $text);
//   }
//
// Then swap the wp_kses / esc_html / esc_attr entries in the Zend
// function table to point at the shim instead of the original.
//
// The cast is important: WordPress's esc_html() accepts any scalar, but
// the Rust internal expects a proper string. Doing the cast in the shim
// keeps the Rust side strictly typed while still matching stock PHP
// behavior for int, float, bool, and null inputs.

The shim dance exists for two reasons: it keeps pre-compiled PHP callers happy (they were compiled when the target
was a user function, and PHP bakes that assumption into the call-site opcodes), and it lets PHP do the scalar coercion
at a layer where the cost is essentially free.
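In the real setup the coercion happens in PHP, via the (string) cast in the shim. The Rust sketch below only models what that cast yields for a few scalar kinds, which can be handy when writing expectations for tests. The PhpScalar type and to_php_string function are hypothetical names, and floats are left out on purpose because PHP's float-to-string conversion depends on runtime settings that I don't want to guess at here.

```rust
/// A subset of PHP scalar values (hypothetical model type, floats omitted).
pub enum PhpScalar<'a> {
    Null,
    Bool(bool),
    Int(i64),
    Str(&'a str),
}

/// Model of what PHP's (string) cast produces for the variants above:
/// null and false become the empty string, true becomes "1",
/// integers become their decimal representation.
pub fn to_php_string(value: &PhpScalar<'_>) -> String {
    match value {
        PhpScalar::Null => String::new(),
        PhpScalar::Bool(true) => "1".to_string(),
        PhpScalar::Bool(false) => String::new(),
        PhpScalar::Int(n) => n.to_string(),
        PhpScalar::Str(s) => (*s).to_owned(),
    }
}
```

An integer page number coming from an admin pagination control would map to PhpScalar::Int here and reach the string-only Rust side as plain digits.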

Caveats, pitfalls, and things I’m still unsure about

  • Filter compatibility is critical. Any replacement has to still fire pre_kses, wp_kses_allowed_html,
    kses_allowed_protocols, and wp_kses_uri_attributes, or it will silently break plugins that customize those
    lists. Plugins like Gutenberg add several.
  • Output fidelity is non-negotiable. A faster kses that produces even slightly different output from stock
    WordPress will break something somewhere: a unit test, a downstream hash check, a plugin that compares
    content against itself. Fixtures generated from real PHP output are the only way to verify this, not
    hand-written tests.
  • The safecss_filter_attr path is its own rabbit hole. CSS-in-attribute validation has dozens of allowed
    properties and several regex-driven sanitization rules, and it’s worth tackling separately.
  • Testing on a personal site is a fine way to catch the obvious bugs (like an esc_attr override that crashes
    on integer inputs from wp-admin’s pagination controls), but it won’t find the subtle ones. For that you
    need fixture coverage and integration tests that register real WordPress filters.
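The output-fidelity point can be sketched as a tiny fixture checker. This is an illustration rather than the project's test harness: Fixture and find_mismatches are hypothetical names, and it assumes each fixture pairs an input with the exact string stock PHP produced for it (captured by running real WordPress, not typed by hand).

```rust
/// One recorded case: an input plus the exact output stock PHP produced.
/// (Hypothetical type; the expected strings must come from a real PHP run.)
pub struct Fixture<'a> {
    pub input: &'a str,
    pub expected: &'a str,
}

/// Run `candidate` over every fixture and return those whose output is not
/// byte-for-byte identical to the recorded PHP output.
pub fn find_mismatches<'a, F>(fixtures: &'a [Fixture<'a>], candidate: F) -> Vec<&'a Fixture<'a>>
where
    F: Fn(&str) -> String,
{
    fixtures
        .iter()
        .filter(|fx| candidate(fx.input) != fx.expected)
        .collect()
}
```

An empty result is the only acceptable outcome. Anything else is a fidelity bug, however small the difference looks.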

What I’d do differently

If I were starting over knowing what I know now, I’d skip the “direct function-table swap” path entirely and always
use a PHP shim from day one. The direct swap feels like it should be a free optimization — why pay for an
extra PHP frame when you can overwrite the function pointer? — but it turns out there’s a whole class of subtle
bugs around ABI compatibility and type coercion that the shim avoids entirely. The shim frame costs about a microsecond
per call, which is invisible next to the underlying speedup, and uniformity is worth more than that microsecond.

Links for further reading, if you’re interested: the wp_kses reference in the developer handbook, the PHP
internals manual, and the ext-php-rs crate, which is doing most of the heavy lifting on the Rust side of the
FFI boundary.

[Figure: a screenshot of a benchmark run, showing the optimized path running roughly four times faster than
stock PHP.]
The kind of number that looks good in a README but has to be earned the hard way, by matching stock
WordPress output exactly before you can claim a speedup.

Wrapping up

This is still very much a work in progress, and I’d caution anyone against running it on a site that matters to them.
If you’re curious to follow along, the code is on GitHub, the integration tests are comprehensive enough to catch most
kinds of regression, and I’m happy to field questions. But don’t install it on a client site. Really. Not yet.
