Coding Best Practices: Preventing XSS in JavaScript

Nick Daugherty, from the VIP Platform team, shares some best practices for VIP developers and anyone wanting to write secure WordPress code. For more, see our VIP Documentation

The primary vulnerability we need to be careful of in Javascript is Cross Site Scripting (XSS). We’re probably all familiar with the escaping functions we use with PHP in WordPress to avoid that — esc_html(), esc_attr(), esc_url(), etc. Given that, it only seems natural that we would also need to escape HTML in Javascript.

As it turns out out, however, that’s the wrong way to approach Javascript security. To avoid XSS, we want to avoid inserting HTML directly into the document and instead, programmatically create DOM nodes and append them to the DOM. This means avoiding .html(), .innerHTML, and other related functions, and instead using .append(), .prepend(), .before(), .after(), and so on.

Here is an example:

jQuery.ajax({
    url: 'http://any-site.com/endpoint.json'
}).done( function( data ) {
    var link = '<a href="' + data.url + '">' + data.title + '</a>';

    jQuery( '#my-div' ).html( link );
});

This approach is dangerous, because we’re trusting that the response from any-site.com includes only safe data – something we can not guarantee, even if the site is our own. Who is to say that data.title doesn’t contain <script>alert( "haxxored");</script>;?

Instead, the correct approach is to create a new DOM node programmatically, then attach it to the DOM:

jQuery.ajax({
    url: 'http://any-site.com/endpoint.json'
}).done( function( data ) {
    var a = jQuery( '<a />' );
    a.attr( 'href', data.url );
    a.text( data.title );

    jQuery( '#my-div' ).append( a );
});

Note: It’s technically faster to insert HTML, because the browser is optimized to parse HTML. The best solution is to minimize insertions of DOM nodes by building larger objects in memory then insert it into the DOM all at once, when possible.

By passing the data through either jQuery or the browser’s DOM API’s, we ensure the values are properly sanitized and remove the need to inject insecure HTML snippets into the page.

To ensure the security of your application, use the DOM APIs provided by the browser (or jQuery) for all DOM manipulation.

Escaping Dynamic JavaScript Values

When it comes to sending dynamic data from PHP to JavaScript, care must be taken to ensure values are properly escaped. The core function esc_js() helps escape JavaScript for us in DOM attributes, while all other values should be encoded with json_encode().

From the WP Codex on esc_js():

It is intended to be used for inline JS (in a tag attribute, for example onclick=”…”).

If you’re not working with inline JS in HTML event handler attributes, a more suitable function to use is json_encode, which is built-in to PHP.

This approach is incorrect:

var title = '<?php echo esc_js( $title ); ?>';
var content = '<?php echo esc_js( $content ); ?>';
var comment_count = '<?php echo esc_js( $comment_count ); ?>';

Instead, it’s better to use json_encode() (note that json_encode() adds the quotes automatically):

var title = <?php echo wp_json_encode( $title ); ?>;
var content = <?php echo wp_json_encode( $content ); ?>;
var comment_count = <?php echo wp_json_encode( $comment_count ); ?>;

Stripping Tags

It may be tempting to use .html() followed by .text() to strip tags – but this approach is still vulnerable to attack:

// Incorrect
var text = jQuery('<div />').html( some_html_string ).text();
jQuery( '.some-div' ).html( text );

Setting the HTML of an element will always trigger things like src attributes to be executed, which can lead to attacks like this:

// XSS attack waiting to happen
var some_html_string = '<img src="a" onerror="alert(\'haxxored\');" />';

As soon as that string is set as a DOM element’s HTML (even if it’s not currently attached to the DOM!), src will be loaded, will error out, and the code in the onerror handler will be executed…all before .text() is ever called.

The need to strip tags is indicative of bad practices – remember, always use the appropriate API for DOM manipulation.

// Correct
jQuery( '.some-div' ).text( some_html_string );

Other Common XSS Vectors

  • Using eval(). Never do this.
  • Un-whitelisted / un-sanitized data from urls, url fragments, query strings, cookies
  • Including un-trusted / un-reviewed 3rd party JS libraries
  • Using out-dated / un-patched 3rd party JS libraries

Helpful Resources

The Importance of Escaping All The Things

Nick Daugherty is WordPress.com VIP Lead Engineer. Here he shares some important information about escaping in code and how that can increase security in WordPress sites anywhere in the world. 

If there’s one issue we flag more often than all others in code reviews…it’s escaping.

For starters, we should all agree that escaping (fundamentally, sanitizing input and escaping output) is a critical aspect of web application security. What may be less universally agreed upon is where to escape. On that point, we require “late escaping“- escaping as close as possible to the point of output – and further, we now require it everywherealways.

You may now be thinking:

“Do I really need to “late escape” everything? Always? Even core WordPress functions?”

We hear you. And, here’s why this is important to us:

In addition to some automated scanning, we manually review every line of code our VIP customers commit to the VIP platform. And, while the original author of a particular piece of code may know exactly where they’ve already escaped their output and/or it’s convenient to trust a WordPress core function’s escaping, it’s much, much faster and more reliable for our reviewers to check for “late escaping”. This way a reviewer can be 100% positive that output has been escaped properly by simply looking at the point of output.

We acknowledge this standard requires a bit more effort from developers writing code for the VIP platform. But, we see the benefit as three fold:

1. “late escaping” makes VIP reviewers more efficient, which means customer code is reviewed and deployed faster,

2. a consistent practice of “late escaping” makes missed escaping obvious, thereby reducing the chances that unescaped output makes it into production,

3. a consistently applied escaping standard- and we’ve chosen “late escaping” as ours- allows automated tools to better augment our human reviewers…further improving on #1 and #2 above.

To illustrate the importance of escaping everything, let’s look at a pattern where escaping is commonly omitted: Widget form elements.

A Widget form may look like this:

<label for="<?php echo $this->get_field_id( 'title' ); ?>"><?php _e( 'Title:' ); ?></label>
<input type="text" id="<?php echo $this->get_field_id( 'title' ); ?>" title="<?php echo $this->get_field_id( 'title' ); ?>" name="<?php echo $this->get_field_name( 'title' ); ?>" value="<?php echo esc_attr( $title ); ?>"/>

Those get_field_id( 'title' ); ?> and get_field_name( 'title' ); ?> calls should be safe right, since they are core WordPress functions?

Let’s see what happens when we drop this bit of code anywhere in our codebase:

add_action( 'widget_form_callback', function( $instance, $widget ){
    $widget->id_base = '"><script>alert("Greetings! You have been hacked.");</script>"<';

    return $instance;
}, -999, 2);

Oh no! Javascript has been injected where it shouldn’t be.

Here is a more real world case illustrating how easy it is to get to a point where we’re outputting values of indeterminate origin:

add_action( 'widget_form_callback', function( $instance, $widget ){
    My_Widget_Controller::setup_widget_form( $instance, $widget );

    return $instance;
}, 10, 2);

// ...

class My_Widget_Controller {
    static function setup_widget_form( $instance, $widget ) {
        $widget->id_base    .= self::get_widget_id_base( $instance, $widget );
        $widget->name       .= self::get_widget_name( $instance, $widget );
    }

    static function get_widget_id_base( $instance, $widget ) {
        global $my_config_object;

        return get_option( 'my_widget_id_base_prefix' ) . '_' . $my_config_object['current_site']['widgets']['id_base'];
    }

    static function get_widget_name( $instance, $widget ) {
        $name = '';

        // ... arbitrary processing to arrive at a $name

        return $name;
    }
}

Now we’re down a rabbit hole, and it’s not so clear that get_field_id( 'title' ) will give us safe values.

Even values that are ‘100% safe and there is no way this could ever be abused’ need to be escaped, because future refactorings can introduce hard-to-detect vulnerabilities if there is unescaped code hanging around:

$class = ( 'featured' == $category ) ? 'home-featured' : 'standard';

?>

<div class="<?php echo $class; ?>">...

Seems harmless enough – $class can ever only have two values. Great, we’re safe!

Until 6 months from now, when a new business need refactors this to:

function my_get_post_class( $post ) {
   // ... arbitrary processing to determine a post class. Maybe we pull it from meta now?
   return get_post_meta( $post->ID, 'custom_post_class' );
}

// ...

$class = my_get_post_class( $post );

?>

<div class="<?php echo $class; ?>">...

Hmmm, now we’re outputting meta values directly, and there is no way to know that without following a potentially complex program flow – a recipe for an exploitable site.

What about constants? Those are the foolproof, never changing pillars of security, right? Consider the following:

// let's say this is for setting a class name, depending on the site we're on
$my_setting = get_option( 'safe_data' );

// ... elsewhere

define( 'MY_SAFE_CONSTANT', $my_setting );

// ...

<div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

later down the line, our option gets updated (somehow):

update_option( 'safe_data', '"><script>alert("hax0rd");</script>' );

Another example of how constants can be exploited is conditional constants:

if ( ! defined( 'MY_SAFE_CONSTANT' ) ) {
    define( 'MY_SAFE_CONSTANT', 'safe-value' );
}

// ... elsewhere

<div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

As a hacker, all I need to do to inject anything I like into the page is to add this somewhere before the previous code:

define( 'MY_SAFE_CONSTANT', 'unsafe value' );

What About Core Functions?

This concept applies to nearly all code in a theme, including many core functions that return a value. Some core functions that output, such as bloginfo(), have output escaping applied automatically – we recommend using the equivalent ‘return’ function and manually escaping

Example: bloginfo( 'name' ); could be rewritten as esc_html( get_bloginfo( 'name' ) );. This approach ensures everything is properly escaped and removes ambiguity.

A post on the merits of escaping would be incomplete without addressing the fact that most esc_*() functions in WordPress apply a filter before returning. While true, the simple answer is: Filters on the escaping functions simply are not allowed on WP.com, and would be quickly caught during code review. Your site is always much safer when escaping all output.

The Bottom Line

If it’s not escaped on output, it’s potentially exploitable. Never underestimate the abilities of an attacker – they’re experts at finding the way to make the ‘this should never, ever, be possible‘ things happen :). For maximum security, we must escape all the things.