Your code works, but is it safe? When writing your theme and plugin code, you’ll need to be extra cautious of how you handle data coming into WordPress and how it’s presented to the end user. This commonly comes up when building a settings page for your theme, creating and manipulating shortcodes, or saving and rendering extra data associated with a post. There is a distinction between how input and output are managed, and this document will walk you through that.
(If you’re interested in more thoughts on why WordPress.com VIP takes these practices so seriously, read The Importance of Escaping All The Things from June 2014.)
Guiding Principles #
- Never trust user input.
- Escape as late as possible.
- Escape everything from untrusted sources (like databases and users), third-parties (like Twitter), etc.
- Never assume anything.
- Never trust user input.
- Sanitation is okay, but validation/rejection is better.
- Never trust user input.
“Escaping isn’t only about protecting from bad guys. It’s just making our software durable. Against random bad input, against malicious input, or against bad weather.”
Validating: Checking User Input #
To validate is to ensure the data you’ve requested of the user matches what they’ve submitted. There are several core methods you can use for input validation; usage obviously depends on the type of fields you’d like to validate. Let’s take a look at an example.
Say we have an input area in our form like this:
<input id="my-zipcode" type="text" maxlength="5" name="my-zipcode" />
Just like that, we’ve limited my user to five characters of input, but there’s no limitation on what they can input. They could enter “11221” or “eval(“. If we’re saving to the database, there’s no way we want to give the user unrestricted write access.
This is where validation plays a role. When processing the form, we’ll write code to check each field for its proper data type. If it’s not of the proper data type, we’ll discard it. For instance, to check “my-zipcode” field, we might do something like this:
$safe_zipcode = intval( $_POST['my-zipcode'] ); if ( ! $safe_zipcode ) $safe_zipcode = ''; update_post_meta( $post->ID, 'my_zipcode', $safe_zipcode );
The intval() function casts user input as an integer, and defaults to zero if the input was a non-numeric value. We then check to see if the value ended up as zero. If it did, we’ll save an empty value to the database. Otherwise, we’ll save the properly validated zipcode.
Note that we could go even further and make sure the the zip code is actually a valid one based on ranges and lengths we expect (e.g. 111111111 is not a valid zip code but would be saved fine with the function above).
This style of validation most closely follows WordPress’ whitelist philosophy: only allow the user to input what you’re expecting. Luckily, there’s a number of handy helper functions you can use for most every data type.
Sanitizing: Cleaning User Input #
Sanitization is a bit more liberal of an approach to accepting user data. We can fall back to using these methods when there’s a range of acceptable input.
For instance, if we had a form field like this:
<input id="title" type="text" name="title" />
We could sanitize the data with the
$title = sanitize_text_field( $_POST['title'] ); update_post_meta( $post->ID, 'title', $title );
Behind the scenes, the function does the following:
- Checks for invalid UTF-8
- Converts single < characters to entity
- Strips all tags
- Remove line breaks, tabs and extra white space
- Strip octets
The sanitize_*() class of helper functions are super nice for us, as they ensure we’re ending up with safe data and require minimal effort on our part.
In some instances using WP_KSES and it’s related functions might be a good idea as you can easily clean HTML while keeping anything relevant to your needs present.
Escaping: Securing Output #
For security on the other end of the spectrum, we have escaping. To escape is to take the data you may already have and help secure it prior to rendering it for the end user. WordPress thankfully has a few helper functions we can use for most of what we’ll commonly need to do:
esc_html() we should use anytime our HTML element encloses a section of data we’re outputting.
<h4> <!-- <?php echo esc_html( $title ); ?> --> </h4>
esc_url() should be used on all URLs, including those in the ‘src’ and ‘href’ attributes of an HTML element.
<img alt="" src="<?php echo esc_url( $great_user_picture_url ); ?>" />
var value = '<?php echo esc_js( $value ); ?>';
esc_attr() can be used on everything else that’s printed into an HTML element’s attribute.
<ul class="<!--<?php echo esc_attr( $stored_class ); ?>-->">
wp_kses() can be used on everything that is expected to contain HTML. There are several variants of the main function, each featuring a different list of built-in defaults. A popular example is wp_kses_post(), which allows all markup normally permitted in posts. You can of course roll your own filter by using wp_kses() directly.
<?php echo wp_kses_post( $partial_html ); echo wp_kses( $another_partial_html , array( 'a' => array( 'href' => array(), 'title' => array() ), 'br' => array(), 'em' => array(), 'strong' => array(), );) ?>;
As an example, passing an array to wp_kses() containing the member
'a' => array( 'href' , 'title', )
means that only those 2 html attributes will be allowed for a tags — all the other ones will be stripped. Referencing a blank array from any given key means that no attributes are allowed for that element and they should all be stripped.
There has historically been a perception that wp_kses() is slow. While it is a bit slower than the other escaping functions, the difference is minimal and does not have as much of an impact as most slow queries or uncached functions would.
It’s important to note that most WordPress functions properly prepare the data for output, and you don’t need to escape again.
<h4><?php the_title(); ?></h4>
rawurlencode() should be used over urlencode() for ensure URLs are correctly encoded. Only legacy systems should use `urlencode()`.
<?php echo esc_url( 'http://example.com/a/safe/url?parameter=' . rawurlencode( $stored_class ) ); ?>
Always Escape Late #
It’s best to do the output escaping as late as possible, ideally as data is being outputted.
// Okay, but not that great $url = esc_url( $url ); $text = esc_html( $text ); echo '<a href="'. $url . '">' . $text . '</a>'; // Much better! echo '<a href="'. esc_url( $url ) . '">' . esc_html( $text ) . '</a>';
This is for a few reasons:
- It makes our code reviews and deploys happen faster, because rather than hunting through many lines of code, we can glance at it and know it’s safe for output.
- Something could inadvertently change the variable between when it was first cast and when it’s output, introducing a potential vulnerability.
- Future changes could refactor the code significantly. We review code under the assumption that everything is being output escaped/cast – if it’s not and some changes go through that make it no longer safe to output, we may incorrectly allow the code through, since we’re assuming it’s being properly handled on output.
- Late escaping makes it easier for us to do automatic code scanning (saving us time and cutting down on review/deploy times) – something we’ll be doing more of in the future.
- Escaping/casting on output simply removes any ambiguity and adds clarity (always develop for the maintainer).
Escape on String Creation #
It is sometimes not practical to escape late. In a few rare circumstances you cannot pass the output to wp_kses since by definition it would strip the scripts that are being generated.
In situations like this always escape while creating the string and store the value in a variable that is a postfixed with _escaped, _safe or _clean. So instead of $variable do $variable_escaped or $variable_safe.
Functions must always return “safe” html that do not rely on them being late escaped.This allows you to do echo my_custom_script_code(); without needing the script tag to be passed thru a version of WP_KSES that would allow such tags.
Case Studies and FAQs #
We know that validating, sanitizing and escaping can be a complex topic; we’ll add some specific case studies and frequently asked questions here as we think they might be helpful.
Q: Doesn’t a function like WP_Query handle sanitizing user input before running a query for me? Why do I need to also sanitize what I send to it?
A: For maximum security, we don’t want to rely on WP_Query to sanitize our data and hope that there are no bugs or unexpected interactions there now or in the future. It’s a good practice to sanitize anything coming from user-land as soon as you begin to interact with it, treating it as potentially malicious right away.
Q: Isn’t WP_KSES_* slow?
A: Even on large strings WP_KSES_* will not add a significant overhead to your pageload. Most of your pageloads should be cached pageloads and the first thing to focus on should be to make sure as many of your end users as possible are getting cached pages. Slow SQL Queries as well as Remote requests are often next on the list. Escaping is often negligible compared to those items.
Q: Why do I need to escape these values? It is impossible for them to be unsafe.
A: It is currently impossible for them to be unsafe. But a later code change could easily make it that the variable is modified and therefore can no longer be trusted. Always late escaping whenever possible makes the code much more robust and future proof.
To recap: Follow the whitelist philosophy with data validation, and only allow the user to input data of your expected type. If it’s not the proper type, discard it. When you have a range of data that can be entered, make sure you sanitize it. Escape data as much and as late as possible on output to avoid XSS and malformed HTML.
Take a look through the Data Validation Codex page to see all of the sanitization and escaping functions WordPress has to offer.