The Importance of Escaping All The Things

Nick Daugherty is WordPress.com VIP Lead Engineer. Here he shares some important information about escaping in code and how that can increase security in WordPress sites anywhere in the world. 

If there’s one issue we flag more often than all others in code reviews…it’s escaping.

For starters, we should all agree that escaping (fundamentally, sanitizing input and escaping output) is a critical aspect of web application security. What may be less universally agreed upon is where to escape. On that point, we require “late escaping“- escaping as close as possible to the point of output – and further, we now require it everywherealways.

You may now be thinking:

“Do I really need to “late escape” everything? Always? Even core WordPress functions?”

We hear you. And, here’s why this is important to us:

In addition to some automated scanning, we manually review every line of code our VIP customers commit to the VIP platform. And, while the original author of a particular piece of code may know exactly where they’ve already escaped their output and/or it’s convenient to trust a WordPress core function’s escaping, it’s much, much faster and more reliable for our reviewers to check for “late escaping”. This way a reviewer can be 100% positive that output has been escaped properly by simply looking at the point of output.

We acknowledge this standard requires a bit more effort from developers writing code for the VIP platform. But, we see the benefit as three fold:

1. “late escaping” makes VIP reviewers more efficient, which means customer code is reviewed and deployed faster,

2. a consistent practice of “late escaping” makes missed escaping obvious, thereby reducing the chances that unescaped output makes it into production,

3. a consistently applied escaping standard- and we’ve chosen “late escaping” as ours- allows automated tools to better augment our human reviewers…further improving on #1 and #2 above.

To illustrate the importance of escaping everything, let’s look at a pattern where escaping is commonly omitted: Widget form elements.

A Widget form may look like this:

<label for="<?php echo $this->get_field_id( 'title' ); ?>"><?php _e( 'Title:' ); ?></label>
<input type="text" id="<?php echo $this->get_field_id( 'title' ); ?>" title="<?php echo $this->get_field_id( 'title' ); ?>" name="<?php echo $this->get_field_name( 'title' ); ?>" value="<?php echo esc_attr( $title ); ?>"/>

Those get_field_id( 'title' ); ?> and get_field_name( 'title' ); ?> calls should be safe right, since they are core WordPress functions?

Let’s see what happens when we drop this bit of code anywhere in our codebase:

add_action( 'widget_form_callback', function( $instance, $widget ){
    $widget->id_base = '"><script>alert("Greetings! You have been hacked.");</script>"<';

    return $instance;
}, -999, 2);

Oh no! Javascript has been injected where it shouldn’t be.

Here is a more real world case illustrating how easy it is to get to a point where we’re outputting values of indeterminate origin:

add_action( 'widget_form_callback', function( $instance, $widget ){
    My_Widget_Controller::setup_widget_form( $instance, $widget );

    return $instance;
}, 10, 2);

// ...

class My_Widget_Controller {
    static function setup_widget_form( $instance, $widget ) {
        $widget->id_base    .= self::get_widget_id_base( $instance, $widget );
        $widget->name       .= self::get_widget_name( $instance, $widget );
    }

    static function get_widget_id_base( $instance, $widget ) {
        global $my_config_object;

        return get_option( 'my_widget_id_base_prefix' ) . '_' . $my_config_object['current_site']['widgets']['id_base'];
    }

    static function get_widget_name( $instance, $widget ) {
        $name = '';

        // ... arbitrary processing to arrive at a $name

        return $name;
    }
}

Now we’re down a rabbit hole, and it’s not so clear that get_field_id( 'title' ) will give us safe values.

Even values that are ‘100% safe and there is no way this could ever be abused’ need to be escaped, because future refactorings can introduce hard-to-detect vulnerabilities if there is unescaped code hanging around:

$class = ( 'featured' == $category ) ? 'home-featured' : 'standard';

?>

<div class="<?php echo $class; ?>">...

Seems harmless enough – $class can ever only have two values. Great, we’re safe!

Until 6 months from now, when a new business need refactors this to:

function my_get_post_class( $post ) {
   // ... arbitrary processing to determine a post class. Maybe we pull it from meta now?
   return get_post_meta( $post->ID, 'custom_post_class' );
}

// ...

$class = my_get_post_class( $post );

?>

<div class="<?php echo $class; ?>">...

Hmmm, now we’re outputting meta values directly, and there is no way to know that without following a potentially complex program flow – a recipe for an exploitable site.

What about constants? Those are the foolproof, never changing pillars of security, right? Consider the following:

// let's say this is for setting a class name, depending on the site we're on
$my_setting = get_option( 'safe_data' );

// ... elsewhere

define( 'MY_SAFE_CONSTANT', $my_setting );

// ...

<div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

later down the line, our option gets updated (somehow):

update_option( 'safe_data', '"><script>alert("hax0rd");</script>' );

Another example of how constants can be exploited is conditional constants:

if ( ! defined( 'MY_SAFE_CONSTANT' ) ) {
    define( 'MY_SAFE_CONSTANT', 'safe-value' );
}

// ... elsewhere

<div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

As a hacker, all I need to do to inject anything I like into the page is to add this somewhere before the previous code:

define( 'MY_SAFE_CONSTANT', 'unsafe value' );

What About Core Functions?

This concept applies to nearly all code in a theme, including many core functions that return a value. Some core functions that output, such as bloginfo(), have output escaping applied automatically – we recommend using the equivalent ‘return’ function and manually escaping

Example: bloginfo( 'name' ); could be rewritten as esc_html( get_bloginfo( 'name' ) );. This approach ensures everything is properly escaped and removes ambiguity.

A post on the merits of escaping would be incomplete without addressing the fact that most esc_*() functions in WordPress apply a filter before returning. While true, the simple answer is: Filters on the escaping functions simply are not allowed on WP.com, and would be quickly caught during code review. Your site is always much safer when escaping all output.

The Bottom Line

If it’s not escaped on output, it’s potentially exploitable. Never underestimate the abilities of an attacker – they’re experts at finding the way to make the ‘this should never, ever, be possible‘ things happen :). For maximum security, we must escape all the things.

11 thoughts on “The Importance of Escaping All The Things

  1. That’s a great write up Nick. Specially the examples put things in perspective. Next time when I am reviewing my team’s code instead of looking for missed escaping, I will look for potential vulnerabilitlies for unwanted code injection. I am going to make it a compulsory read for me and my team. Thanks!

  2. I understand the need for sanitizing of user input – but if that input is already sanitized – what is the purpose of escaping? I did some reading of the WP codex to try and understand why but can’t seem to grasp the concept.

    • For example, all your example include a hacker having access to the code that is run. If a hacker has access to the code, it’s already too late. If a hacker is able to write “define( ‘MY_SAFE_CONSTANT’, ‘unsafe value’ );” he is also equally able to just write “echo $mess_everything_up;” – no?

      • If an attacker has access to the code, then they could definitely do something obviously bad, very easily – though you may be more likely to detect that change quickly.

        We also need to protect ourselves against obfuscated, hard to follow chains of logic where evil things can hide. A legitimate looking change to an obscure plugin file could open a door and go undetected for years, while an ‘obvious’ change to a main theme file would be noticed quickly.

        Here is an analogy: Driving down the freeway, there are 1,000 things that can go wrong – another driver may swerve into your lane, a sinkhole could open up, or your tire can blow out. All of these have negative consequences for the occupants, and it may already be ‘too late’ – but we still wear our seatbelts to maximize our chances of surviving.

    • Assuming a site perfectly sanitizes 100% of input, which is not a reality given the nature of many plugins / themes, and potential Zero Day exploits, there are still ways for malicious content to find its way in.

      One example would be direct db manipulation – perhaps an attacker has gained access to your database, but not your webserver. They could change certain values directly in the db, hoping they are not escaped on output.

      Or, maybe they have somehow gained access to a memcached server your site is using for caching – they could inject a post object for an expected key with malicious values and bypass any input sanitizations.

      Output escaping also protects against inadvertent vulnerabilities in the code itself – that one innocuous filter that is supposed to do X, but ends up doing X and Y due to a bug.

      In short, trust nothing.

  3. @Nick

    I’ve never worked on a VIP site but have worked on a corporate site that uses similar hosting with their own security review team (who happen to also be one of the VIP partners.) I found this “escape all output” rule makes code maintenance difficult and puts the burden on the developer (who spends months developing) to keep the workload of the reviewer light (who spends hours reviewing)

    I do agree with all your use-cases above but I explicitly build PHP classes whose job is to abstract post types and taxonomy terms, to minimize duplicated code, and to ensure method output is safe. This latter ensures the logic in the theme templates is easy to follow. So I have difficultly with escaping that which is already known to be safe because escaping obscures the logic in the theme templates and makes them significantly harder to read.

    Now I do understand that VIP is your platform and as such you get to make the rules to make it easier on your code reviewers. However I do wish your article emphasized more that this is your choice optimized for your benefit so that all the cargo cultists won’t simply presume that late escaping is “the one true way” to achieve security for other scenarios.

    • “I found this “escape all output” rule makes code maintenance difficult and puts the burden on the developer […] to keep the workload of the reviewer light.” I have to disagree on this point. I’ve been in the seats of both the developer and the reviewer and can say first-hand it has nothing to do with keeping the workload of the reviewer light. The point of escaping is to protect output from potential vulnerabilities lurking in the database. Rule of thumb: Don’t trust any data, whether it comes from user input or your own data store.

      The point of escaping late isn’t to lighten the workload of the reviewer, it’s to make escaping more discoverable for ANY developers who come along later. That could be an internal reviewer, an external reviewer, a new dev on the team, or even yourself months later. Looking at templates that send output to the screen it should be immediately obvious whether or not output is escaped, otherwise all of the aforementioned groups will need to dig to see whether or not escaping is happening.

      Every time you have to check “is this output escaped this elsewhere” you’re wasting valuable dev time and adding to those months spent building the system.

      • Hi @Eric Mann,

        I see your points, but I respectfully disagree.

        I’ll use this approach when coding on a project where I’m forced to, but I can’t (currently) see myself advocating for this approach. Yes, I can see it’s benefit for the sloppy theme coding styles of most WordPress themers but if the site is using an OO views in your architecture then I think the object methods and properties should be able to be trusted.

        And just to show I’m not the only one who thinks like this I was just reading about AuraPHP thanks to a PHP OOP class I attended by Brandon Savage and they say “Aura View automatically escapes data assigned to the template when you access that data. So, in general, you do not need to manually apply escaping in your template scripts.”

        http://auraphp.com/manuals/v1/en/view/

        -Mike

  4. Do you have recommendations when output may contain html? Is wp_kses() appropriate?

    • Exactly. Ideally, we try to output any HTML directly so that it doesn’t need to be escaped. If that’s unavoidable, wp_kses() works great.

Comments are closed.