Coding Best Practices: Preventing XSS in JavaScript

Nick Daugherty, from the VIP Platform team, shares some best practices for VIP developers and anyone wanting to write secure WordPress code. For more, see our VIP Documentation

The primary vulnerability we need to be careful of in Javascript is Cross Site Scripting (XSS). We’re probably all familiar with the escaping functions we use with PHP in WordPress to avoid that — esc_html(), esc_attr(), esc_url(), etc. Given that, it only seems natural that we would also need to escape HTML in Javascript.

As it turns out out, however, that’s the wrong way to approach Javascript security. To avoid XSS, we want to avoid inserting HTML directly into the document and instead, programmatically create DOM nodes and append them to the DOM. This means avoiding .html(), .innerHTML, and other related functions, and instead using .append(), .prepend(), .before(), .after(), and so on.

Here is an example:

    url: ''
}).done( function( data ) {
    var link = '<a href="' + data.url + '">' + data.title + '</a>';

    jQuery( '#my-div' ).html( link );

This approach is dangerous, because we’re trusting that the response from includes only safe data – something we can not guarantee, even if the site is our own. Who is to say that data.title doesn’t contain <script>alert( "haxxored");</script>;?

Instead, the correct approach is to create a new DOM node programmatically, then attach it to the DOM:

    url: ''
}).done( function( data ) {
    var a = jQuery( '<a />' );
    a.attr( 'href', data.url );
    a.text( data.title );

    jQuery( '#my-div' ).append( a );

Note: It’s technically faster to insert HTML, because the browser is optimized to parse HTML. The best solution is to minimize insertions of DOM nodes by building larger objects in memory then insert it into the DOM all at once, when possible.

By passing the data through either jQuery or the browser’s DOM API’s, we ensure the values are properly sanitized and remove the need to inject insecure HTML snippets into the page.

To ensure the security of your application, use the DOM APIs provided by the browser (or jQuery) for all DOM manipulation.

Escaping Dynamic JavaScript Values

When it comes to sending dynamic data from PHP to JavaScript, care must be taken to ensure values are properly escaped. The core function esc_js() helps escape JavaScript for us in DOM attributes, while all other values should be encoded with json_encode().

From the WP Codex on esc_js():

It is intended to be used for inline JS (in a tag attribute, for example onclick=”…”).

If you’re not working with inline JS in HTML event handler attributes, a more suitable function to use is json_encode, which is built-in to PHP.

This approach is incorrect:

var title = '<?php echo esc_js( $title ); ?>';
var content = '<?php echo esc_js( $content ); ?>';
var comment_count = '<?php echo esc_js( $comment_count ); ?>';

Instead, it’s better to use json_encode() (note that json_encode() adds the quotes automatically):

var title = <?php echo wp_json_encode( $title ); ?>;
var content = <?php echo wp_json_encode( $content ); ?>;
var comment_count = <?php echo wp_json_encode( $comment_count ); ?>;

Stripping Tags

It may be tempting to use .html() followed by .text() to strip tags – but this approach is still vulnerable to attack:

// Incorrect
var text = jQuery('<div />').html( some_html_string ).text();
jQuery( '.some-div' ).html( text );

Setting the HTML of an element will always trigger things like src attributes to be executed, which can lead to attacks like this:

// XSS attack waiting to happen
var some_html_string = '<img src="a" onerror="alert(\'haxxored\');" />';

As soon as that string is set as a DOM element’s HTML (even if it’s not currently attached to the DOM!), src will be loaded, will error out, and the code in the onerror handler will be executed…all before .text() is ever called.

The need to strip tags is indicative of bad practices – remember, always use the appropriate API for DOM manipulation.

// Correct
jQuery( '.some-div' ).text( some_html_string );

Other Common XSS Vectors

  • Using eval(). Never do this.
  • Un-whitelisted / un-sanitized data from urls, url fragments, query strings, cookies
  • Including un-trusted / un-reviewed 3rd party JS libraries
  • Using out-dated / un-patched 3rd party JS libraries

Helpful Resources

Introducing the WordPress Security White Paper

We’re very proud to share the WordPress Security White Paper with the WordPress community!

The white paper is an analysis and explanation of the WordPress core software development and its related security processes, as well as an examination of the inherent security built directly into the software. Decision makers evaluating WordPress as a content management system or web application framework should use the white paper in their analysis and decision-making, and for developers to refer to it to familiarize themselves with the security components and best practices of the software.

The WordPress Security White Paper is available directly on the site. In addition, the HTML and PDF versions are available at Automattic’s Documattic Updated! Now on the WordPress GitHub repository for any updates and/or additions.

We’d really love to encourage and help share translations of the white paper to the global WordPress community. If you have a translation to contribute, please add it to the WordPress GitHub repo so others can benefit, too. Pull requests welcome!

The text in the white paper (not including the WordPress logo or trademark) is licensed under CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Thank you to all who contributed to the initial release and compilation of this document: Barry Abrahamson, Michael Adams, Jon Cave, Helen Hou-Sandí, Dion Hulse, Mo Jangda, and Paul Maiorana.

Below is the table of contents for the white paper, which you can find here.

Executive Summary
An Overview of WordPress
The WordPress Core Leadership Team
The WordPress Release Cycle
Version Numbering and Security Releases
Version Backwards Compatibility
WordPress and Security
The WordPress Security Team
WordPress Security Risks, Process, and History
Automatic Background Updates for Security Releases
2013 OWASP Top 10
A1 – Injection
A2 – Broken Authentication and Session Management
A3 – Cross Site Scripting (XSS)
A4 – Insecure Direct Object Reference
A5 – Security Misconfiguration
A6 – Sensitive Data Exposure
A7 – Missing Function Level Access Control
A8 – Cross Site Request Forgery (CSRF)
A9 – Using Components with Known Vulnerabilities
A10 – Unvalidated Redirects and Forwards
Further Security Risks and Concerns
XXE (XML eXternal Entity) processing attacks
SSRF (Server Side Request Forgery) Attacks
WordPress Plugin and Theme Security
The Default Theme
The Theme Review Team
The Role of the Hosting Provider in WordPress Security
A Note about and WordPress security
Core WordPress APIs
White paper content License
Additional Reading

A special note: As you can see in the table of contents, the white paper is specific to the open source core WordPress software. The core WordPress software is the foundation of and there are additional Security FAQ related to VIP here.

Inside GigaOm’s Search and Alerts

Casey Bisson from GigaOm presented “Search and Alerts” at the recent Big Media & Enterprise Meetup in San Francisco.

See the presentations from previous Big Media & Enterprise WordPress Meetups. For Big Media & Enterprise WordPress Meetup groups in other cities, see the full list on VIP Events and join your local group. 

Want more information about WordPress services for media or enterprise sites? Get in touch.

Using APIs to display sports data through WordPress with USA Today

Matt Harvey, Director of Product, USA Today Sports Media Group, presented how USA Today uses APIs and WordPress to display sports data in “Keeping the Score: Using APIs to display sports data through WordPress,” at the recent Big Media and Enterprise Meetup in San Francisco.

See the presentations from previous Big Media & Enterprise WordPress Meetups. For Big Media & Enterprise WordPress Meetup groups in other cities, see the full list on VIP Events and join your local group. 

Want more information about WordPress services for media or enterprise sites? Get in touch.

One Theme, One Multisite, 30+ Unique Websites – Now With Full Transcript

Simon Dickson and Simon WheatleyCode for the People, presented “One Theme, One Multisite, 30+ Unique Websites” at the recent Big Media & Enterprise Meetup in New York City. We’ve shared this post previously, but we’re publishing it again now with full transcript below.


Okay, so I’m Simon Wheatley, my partner Simon Dickson is just over there and we’re from a company called Code For The People.

We’re one of the VIP partners and I want to talk to you today about a client who came to us, similar I guess to the Oomph guys,the Interactive One guys, just been talking about one thing, but dealing with many websites initially 30.

This is for a magazine publisher in the UK, so they wanted to move 30 of their titles initially on to this platform but they wanted one standardized theme, one standardized set of functionalities that they could use.

So our solution for them is based in a couple things that I’m going to talk about tonight one is the WordPress theme customizer and one is the way we’re handling layouts using widgets and widget areas so these solutions are things you can apply in other organization. You can have your WordPress themed cake made by my partner’s wife and you can eat it at the same time.

So you get in this way, using this customization but based on the standardized theme, you get to reduce maintenance and at the same time keep the editorial teams happy.

So I’m going to talk through three of the areas where we allow editorial control so obviously there’s colours, I’m going to talk about typography and I’m going to talk about layout.

So the first element, colour, we started off with the idea that we would have the user pick half a dozen colours and we would then do colour calculations based around on let’s find some complimentary colours let’s find some lights and dark equivalents and then we’ll be able to work out how of those six colours, we can deal with the header and the footer and the body post but that actually gradually became unwieldy.

So you get in this way, using this customization but based on the standardized theme, you get to reduce maintenance and at the same time keep the editorial teams happy.

We found ourselves adding more and more colour options to avoid a clash of dark colours appearing on dark colours or red appearing on green, that kind of thing and the solution that we arrived at eventually was that we split it into two colour palettes, so there are two colour palettes which we call palette A and palette B and then we split the page into three areas. We’ve got the header area which can have one palette assigned the body of the page which can have another palette separately applied to it and then obviously the footer which can have a completely different palette.

So there are three palettes there, with about 12 different areas and we’re just using the standard WordPress theme customizer to allow you to pick the colour for that we’re still doing a little bit of colour calculation, lightening and darkening and so on but essentially it’s the two palettes applied to different areas of the page. We don’t take the standard approach that some themes take of just injecting a whole bunch of CSS into the head. Instead, we’re using LESS with CSS Preprocessor.

Probably now looking at the fact that core have adopted SASS, we could be using SASS but at the end of the day it’s all CSS Preprocessing. It all really does the same thing, it’s taking variables from the customizer and injecting them into CSS and using that to build the final styles for the website.

It’s simpler and cleaner than shoving a load of overrides in your head. So that’s colours, let’s talk about typography. Obviously there’s a number of font services out there and we’re going to want to give 30 editorial teams a good choice of fonts for their websites.

So we’re using the Google Fonts API, there’s a wide wide variety of fonts there and we’ve built a custom control for the customizer so can pick say the open sans fonts and because we’re dealing with the API. We know that there are these variants and weights associated with that and then we can be applying a text transform so that you’ve got fully uppercased for the navigation, but you’re just capitalizing for the headers or whatever.

That’s the one customizer control, which has got three sub-controls within it we looked around and found a couple of those on the internet in the .org repository but they all seemed to be making a bit of a meal of it and we ended up making something that turned out simpler but works quite nicely.

It’s simpler and cleaner than shoving a load of overrides in your head.

What you’ll see we haven’t got there is a font size for each individual element. We’re not setting a font size for the heading and then font-size for the subheading. Instead, we’re setting a base font size and then we’re using multipliers up from that. So maybe 16 pixels or something and then the heading is 1.5x that and the meta is 1x that or whatever.

So let’s talk about layout. We started out with layout with some very grandiose ideas that you might recognize from other themes and options that you’ve got out there. We were going to allow the user to draw areas on the screen and we we’re going to then use those as widget areas and drop stuff into those and then we we’re going to magically work out how we calculated the break points so that you could you know have tablet portrait, tablet landscape.

Eventually we took a step back from that and realized we could accomplish pretty much the same thing but in a much much simpler way.

Eventually we took a step back from that and realized we could accomplish pretty much the same thing but in a much much simpler way. So if we look at the primary content area on the left there, we’ve got a grid of widget areas so we’ve got a widget area at the top spanning then we’ve got the two-column side by side and then we repeat the same again. But of course with the widget area in WordPress you don’t need to put widgets in it.

So if you wanted to have just a single column of news in the primary content area then you just put widgets in the double span that comes second in there. Or, if you want a two-column layout, then you can just use the top two. Every so often in the year, when you’ve got a promotional item, you can be putting that in your double span above those to columns, so it gives it a lot of flexibility.

Because it’s a known quantity, it means that we can scale down to the various breakpoints and we know exactly what we’re doing and we’ve got a really nice responsive website and that comes out really really well when you start actually putting content in it this website, the fields, they started building that yesterday at 11 o’clock in the morning and by 3 o’clock in the afternoon, they had a site, fully migrated, fully customized with all the old content in it from the old custom content management system and up and running, so it comes up through the breakpoints.

Nice shotgun advert there for the shooting season coming up.

And then the desktop, full desktop width…so let’s, just taking a look at this page, we’ve got one widget that’s controlling a lot of this stuff. So if you look at the news sequence of posts and the food and drink sequence of posts, they’re using the same widget, and that’s something that we call the post query widget which is essentially a wp query builder for those you who know what I mean by that.

It’s putting together a series of parameters by which you’re going to reach into the database, grab the post that you want and get to display them on the page so you can choose the post type that you want to display in the particular widget that you’re editing at the moment. You can filter it down by the taxonomies and then you can go to actually start displaying that.

We do that by breaking the sequence of posts up into sections, so section one here has just got one post in it, it’s a list with a large image. Section two, you’ve got two posts, smaller images, and we’ll show the author and we’ll show the date there. Then section three is just a text bulleted list without any additional detail in there.

What that comes up as is something like that so it gives you really quite a flexible display of how you’re going to pull the posts in and then how you’re going to actually show them on the screen and you could have all large images or all bullet points, pretty much anything you want

We don’t limit the number of sections there so another thing I wanted to mention was category archives so again, we’ve got a customizer control in there so select your category and then choose similar again to the way that we’re dealing with the query widget so similar, we look at the style that you want that in, maybe this category you’ve got some really nice images, maybe the review images you’ve got are great and you want to highlight that

So you’ve got the ability to customize the display on the category there, so I’ve whistle stopped through this we talked a little bit about colours, so we’re using the colour API a little bit of calculation, we’re using LESS in CSS Preprocessing there talked about typography, so we’ve used the Google Fonts API to allow you to choose a font we know from the Google Fonts API, what the variants are, so we can pick that and we can give you a transform, we’ve got the base font size we talked about layout, we talked about the post query widget and about the custom layouts for categories so has anybody got any questions?

Q: Are you guys supporting live previewing in addition to the standard customizer stuff?

A: Yeah, absolutely, so all of this stuff, I mean if you’re not familiar with the customizer, one of the great things about it is nothing is live until you click the save and publish so all of this customization is happening just for you personally so even with the LESS Preprocessing, that’s being piped off into a separate stylesheet which is only being served to the editor that’s actually doing the customization at the moment

Q: ( […] )

A: Yeah, we’re working with posts, obviously the built in post type which they’re using for articles, we’ve got a custom post type for events and for reviews as well so the post query widget that I showed you, you can say I want to see just reviews here or just events here and it will allow you to display those

Q: ([…])

A: Some of the titles that we’re dealing with are relatively low staffed So I don’t think that kind of title would be necessarily looking at clicks we have got an evolution of the post query widget which looks at Google Analytics and uses the Google Analytics API to evaluate what’s popular in a particular category so you can use that as the sort mechanism, but that’s not something that’s live on the site at the moment

Q: ([…])

A: Yeah, so the widget areas that are there for the, where are we, let’s skip back through yeah the widget areas that are here are exactly the same widget areas, they’re just, they cascade through with the different break points and we move them around so this is the full desktop width but if you can quickly scan you can see that the same widget areas are just linearizing basically as you move down through the sizes so it’s exactly the same stuff ([…]) absolutely yeah responsive break points any more for any more

Q: ([…])

A: At the moment, pre-3.9 the disadvantage is anything you do to a widget is live on the site immediately, post 3.9 widgets move into the customizer so we’re able then to choose the widget layout and mess around in the same way exactly the same was as I said for the rest of the customizer, you can change your colours, change your fonts it’s not live until you click save and publish so 3.9 is going to herald a grand new dawn in terms of that being able to get right before it’s live

Q: ([…])

A: The brief for the widgets was that it wasn’t so much of a manual curation process, so if we needed to manually curate this particular post into position in this particular area of the homepage

I guess you could get around that by hacking with tags, but it wasn’t a core part of the brief that we were able to do that, so using something like zoninator where you can precisely choose which post to go and in which order they appear in wasn’t a requirement we could develop a different widget that did something like that I think we would probably still stick with widgets we’re also looking at doing some work to customize

so you can take the homepage layout and then for a particular purpose maybe for a sponsorship section have all of the sidebars completely custom for that but hidden from normal view so It’s only when you’re editing that page that you go in and those side bars are only live when you’re editing that page, that set of sidebars so you don’t end up with this situation wherein the WordPress admin area, the widget section you’re looking at all the sidebars and there’s like 300 sidebars which one am I adding the widgets in and which one am I not we’re able to actually filter which sidebars are being shown for a particular purpose

Q: ([…])

A: Yeah exactly that principle yeah.

Q: ([…])

A: Yeah, so like I say, some of these are fairly low staffed publications so the key for them is probably that they’ll set something up and then they won’t touch it for a little while we’re using a plugin which is available on the .org repository called the customizer settings revisions which allows you to save what you’ve created so you might go like “okay, this is the Christmas layout” with all the pretty snowflakes and the lovely snowy red design and then you can pop back to that when Christmas comes around again or when Easter comes around or whatever you want to do thematically so we’re using that plugin for that purpose

Q: ([…])

A: So the ads are outside of the widget areas, they’re placed at various points in the page that we know how to deal with for again, for the responsive break points are we concerned about the responsive kind of nature of it and so on, yeah so we have, we haven’t got the ability to do the thing that really you only do to show your boss that the site’s responsive which is you know, move the site edges in and out and change the width of the page, the adverts won’t change at that point because they only change on page load, it will look at the width and then ascertain what ads you need and then load them at that point does that answer the question?

Cool. Thank you.


See the presentations from previous Big Media & Enterprise WordPress Meetups. For Big Media & Enterprise WordPress Meetup groups in other cities, see the full list on VIP Events and join your local group. 

Want more information about WordPress services for media or enterprise sites? Get in touch.

Building Quartz – Now With Full Transcript

Josh Kadis is the web applications technologist at Quartz. At our August Big Media Meetup, he gave a short “flash talk” on building on WordPress, which we’ve shared here and republished now with the full transcript below. Quartz just celebrated its one-year anniversary, and you can learn more about it by reading our case study here.

See Josh’s slides here.


My name is Josh Kadis, I work for Quartz, which is a business and economics site from Atlantic Media, which is the publisher of The Atlantic. We launched in September last year, and our most recent numbers for July were about 5 million uniques (views). My role at Quartz is I do the bulk of the WordPress work and then I’ve also been heavily involved in building the Backbone application that runs the front-end of the site.

If you haven’t seen Quartz, it looks like this. It’s a responsive web app, WordPress backend, publishes on JSON API, that gets picked up by the backbone front-end and the bar across the top under the word Quartz – that expands and that’s how you navigate between different sections and within a section you can navigate by scrolling up and down in the column on the left which we call the queue, or in the “item well”, which is the main content area on the right.

So the basic architecture is that Automattic and the VIP team host our WordPress installation, the Backbone files and our CSS and some other Javascript kind of stuff are hosted on the same CDN that’s used by the rest of Atlantic Media, which is like, and other non-WordPress sites.

WordPress publishes the JSON API, and we get all the backend post authoring and media and we have some custom backend stuff, like a post type that allows us to publish the newsletter through the MailChimp API from within WordPress.

We have a self-hosted system for reader accounts, which is what you would use for the commenting feature that we recently rolled out which is more for annotating individual paragraphs within a post, and then managing your subscription to the newsletter The Daily Brief and that kind of stuff, so that’s also WordPress.

We’ve actually found that the comments that come back are less awful because people get to the specifics of what they want to say about this specific paragraph instead of a general “you suck”. That’s kind of nice.

We have this division of labour between WordPress and backbone where WordPress handles what you would expect: generating the basic HTML markup which kind of gives the page the basic structure and is useful for search engines. WordPress publishes the JSON API, and we get all the backend post authoring and media and we have some custom backend stuff, like a post type that allows us to publish the newsletter through the MailChimp API from within WordPress.

The order of the posts you see on the homepage is not initially chronological, it’s manually ordered by the editorial staff, that’s the ‘Top’ post. So there’s a plugin that allows them to do that with a drag and drop interface from among the recent posts. Backbone also handles what you’d expect on the client side: fetching data from the API, deciding where and when to render it on the page, reflowing based on the device and on the screen size. We’re doing some offline reading with local storage and all of the annotation’s functionality is contained within the Backbone app and the entire thing can be turned on and off without even touching WordPress.

The URL and getting Backbone and WordPress to interpret the URL in the same way is really where the two things come together. The whole site relies on that or else the front-end and back-end are out of sync.

So we have these two things and where do they really intersect? It’s here: The URL and getting Backbone and WordPress to interpret the URL in the same way is really where the two things come together. The whole site relies on that or else the front-end and back-end are out of sync. So just to kind of quickly walk through it, if you haven’t written a Backbone app before, the router is the foundation of it, it essentially determines how the URL is parsed and then triggers a series of events that come one after another and ultimately result in stuff showing up on the page.

Good URL design is really a key to what we’re doing.

To work, the router reads the permalinks, and Backbone has some kind of build in syntax for how you read a permalink and decide what’s a variable within that and what’s a key. The permalinks come back to WordPress to run off the rewrite rules, and the rewrite rules run Quartz.

Good URL design is really a key to what we’re doing. So something like this: is the URL of the most viewed post of the history of Atlantic Media, it’s about bees. This is something that doesn’t have to run through a URL shortener, doesn’t get redirected, both Backbone and WordPress will understand this URL and parse out that single ID in the exact same way. Here is a little bit of code from the router, grossly over simplified.

Basically the router initializes, you give it this regex, it looks for these core sections: ‘Top’ – which is, I explained, is the manually ordered, “here’s what’s important right now” segment of posts. ‘Most recent’ is the latest one, ‘Popular’ we kind of calculated near real time using Chartbeat, ‘About’ is some static pages.

When the router recognizes one of these keywords from the regex and the URL, it triggers the core function which passes the particular one to this event which then gets triggered and a whole bunch of other stuff happens that results in a bunch of posts as you scroll through.

When you look at the WordPress theme, if you see rewrite rules, you would kind of recognize the regular expression: Top, Latest, Popular, About, and for both the front-end query, which is this first set of rewrite rules and then the API they both resolve to pass these two parameters, these two query variables to WordPress. API = true or false and then request = one of these things in this array.

For handling those two variables, we add_rewrite_tag request, we hook into query_vars and add API and then WordPress knows to look for those two things so that when the parse_request action comes around, we are able to, and in my oversimplification, I left out an if statement here, then we can fire up this qz API class and kind of pre-empt the main WordPress loops and that’s how we get JSON back without really needing to run through anything else that WordPress would do.

So this kind of enables us to go from a regular URL with a parameter like JSON tacked onto the end which is how in a lot of situations if you were building a JSON API on top of WordPress, you might do something like this and get back basically the same data structure that’s in the WordPress post object.

For us, we haven’t done that for a couple of reasons. We’ve gone with a custom API for clarity’s sake, being able to put all our endpoints inside API and then on the server side, we want to do all our processing of the meta data before we return it through JSON or else all that work needs to be done and that slows things down.

So for example, we’re able to return the URL to multiple sizes of the same image which we’ll ultimately be able to serve differently using this new SRC set attribute for different screen resolutions, stuff like that that is not necessarily apparent if you’re just reading the meta data straight out of the database.

So the Backbone side touches WordPress in a couple of other places. One is we need a way to keep track of version numbers, because they really are so separate. When we load the current Backbone version, it’s a different actual number than the WordPress version, so WordPress needs to know what’s what and keep one step ahead of the VIP team, really, because we put in a deploy to them, and we’re not sure when it’s going to come back so we want to know that as soon as it does, we’re ready and we’ll load the correct version of the application.

We’re also sort of separate from WordPress but still in Automattic, we’re using an Akismet API for kind of like profanity and spam filtering when annotations come in.

So in a second I’ll show you a quick shot of a plugin that enables us to do that. We’re also sort of separate from WordPress but still in Automattic, we’re using an Akismet API for kind of like profanity and spam filtering when annotations come in. Previews get pretty complicated in fact, with the Backbone app because it doesn’t know if the person is logged in to WordPress or not, it doesn’t know what permissions they have, so we need to sort of render some special Javascript in the markup that comes back initially from WordPress in order for Backbone to pick up that preview.

And then finally, there’s something that we’re working on that David in the third row in the red shirt is going to be working on soon, which is kind of figuring out how to keep the WordPress post templates and the underscore templates that Backbone uses, keep those in sync. It’s kind of hard right now, and ultimately doing a better job of that will allow us to load more of the application initially from WordPress, instead of having to process it within the browser in the Backbone app and then put it on the page.

So, this is a quick look at the plugin that manages the version number, essentially it allows you to stay on this auto-pilot mode that kicks whichever version is higher between the constant that’s set in the theme that can get committed to VIP or a setting that’s saved in the options table that you can set here so if for example you have a new version in the Javascript application that’s not making any adjustments to WordPress, we can just update the setting here in the plugin as soon as we put it on the CDN and we’re ready to push it.

But if we have stuff that’s sort of related to some changes in the WordPress theme and some stuff in the Backbone app, we put the Backbone app up first change the constant in the WordPress theme to sort of point to that new version of the Backbone application and as long as we’re sort of incrementing the number, the plug in will kick the higher number as soon as the new code with the constant is live on VIP.

This is just a quick look at annotations. You should all check it out, it’s really really nice. The responsive aspects of it are really cool and it’s just an interesting way of diving into the content. We’ve actually found that the comments that come back are less awful because people get to the specifics of what they want to say about this specific paragraph instead of a general “you suck”. That’s kind of nice.

Andrew Nacin on How WordPress Evolves Without Breaking Everything – Now With Full Transcript

Andrew Nacin is one of the lead developers of WordPress. At the August 2013 Big Media & Enterprise Meetup, he gave a talk on how WordPress evolves while maintaining backwards compatibility — which we shared previously and we’re publishing it again now with the full transcript below. 

My name is Andrew Nacin, I’m a lead developer at WordPress. I live in Washington DC. I’m talking about, really quickly, how WordPress evolves without breaking absolutely everything. I’m going to give two case studies.

First, some general considerations and what I’m talking about for this is three in particular:

  • One, we don’t really rush to fix what isn’t broken. There are a lot of other platforms out there that might rewrite a lot of their API’s pretty much every version, every three years. Just to name a few, like Drupal. We’re not trying to do that; we are trying to evolve at maybe a slower pace. Our slower pace might still be over 3 years; it’s just over six or seven releases.
  • We’re also trying to really make it worth it. If we are going to rewrite something, we’re going to go all out. And that’s actually one of the case studies here.
  • And then the third one is backwards compatibility. As you probably know, we’re presently backwards compatible, or 99% backwards compatible from version to version. Great for users, great for the ecosystem, it’s actually not that great for us, but that’s okay, we accept that.No “breaking changes” means that users don’t really need to worry about this when they update it. At the same time, we have to absorb extra technical debt. So if you look at the new WordPress, you’ll find some things that you’re like “wow, that’s still there”? Yes because the plugin that worked five years ago that was perfect then, should still work now.  We don’t try and just deliberately break your code.

We’re also trying to really make it worth it. If we are going to rewrite something, we’re going to go all out.

First case study: WordPress 3.4 – Themes API

So the first case that I’m going to talk about is WP theme, which came in WordPress 3.4 so this is actually a bit of a comparison from 3.3 to 3.4. There were some really big problems with the existing software wp_get_themes(). It was actually terrible. It’s a function that was not an API, it was that bad. It had a huge memory footprint; we’re talking 10s of megabytes, every time you called it. Very slow, weak error handling, pretty much nothing was good about it. It couldn’t fit into a single cache bucket, that’s how big the data was. So if you tried saving it in the cache, and tried doing this, they had to chunk it into individual keys, and put it together when it was done. It really didn’t make any sense. You’re probably doing it wrong if you ever have to do this with your data. Getting page templates for one theme required loading everything., which had at the time, 170-something default themes and on top of that something like 600 VIP themes, which by the way aren’t used on almost all sites, they were loading 700 pieces of data looking up every single page template for whatever Duster’s page templates are. This really didn’t make any sense at all, was really slow, and that’s why they had to cache it into multiple cache buckets. It was also really painful, because it’s one giant multi-dimensional associative array. If you try adding a feature to this, all you’re doing is making it worse.

We also needed this for an absolute ton of things, some of these on here I didn’t even know existed. You can dig into this a little bit, like multiple theme roots, cross-root parent themes, things like that. You can actually nest themes inside directories, which is what does for some stuff. Really weird – we had to support all these things. And you can’t break anything. You have to be 100% backwards compatible.

So how do we do it? First is that we scrap the entire array, this giant mess of crap and try to do one object with WP_Theme. So you have a method like get: $theme ->get(‘Name’) or get (‘Description’), or get (‘Author’). And also getting a header for display, so we have a display method, which automatically translates it, which is a new feature in case you’re doing that kind of stuff and also dealing with HTML markup – that could each go into some of these pieces. And then a number of helpers that can mimic a lot of the regular functions that you’re already seeing so you’re very used to all the different pieces here. Dealing with page templates: “hey look now we can only fetch one theme’s page templates”.

So we were looking through this and the pages’ page, the list table, was 5-6 times slower to load than a post table just because we were loading the list of page templates for one theme, for quick edit, that you don’t even open unless you really need to. Really stupid, but that’s kind of sad right? And things like template files, which we were only really loading for the theme editor, which on is disabled, but they still had to on load this for every single theme. That’s 40% of all of its memory.

It’s not easy to build code that just works from version to version, and many of you might not even need to deal with this. At WordPress, we do. It makes your lives easier, so you can buy me a beer later.

Let’s use as an example, even though I don’t work there, because they’re obviously working on a pretty incredible scale, especially the number of themes they have.

So the next step that we did, step two, is a lot of magic. We have this problem where we have $theme passed into functions, passed into hooks. How are all of the old, all the existing plugins working with this going to be able to accept this data? So in PHP there’s a class called  ArrayObject, that implements a few interfaces, one of them is called ArrayAccess. What it enabled us to do is things like this, where $theme[‘Name’] we’re able to treat that like a function call and then we can wrap it in this case, with the get map: #theme->get(‘Name’). Sometimes, this array, for whatever reason, one of the functions, WordPress decided to convert to an object, so we had to handle that as well. Well, there are some magic methods in php including __get() and __isset().

So now, we’re able to take this giant, stupid array and convert it to actually, a really smart object. We’re doing this in WordPress as well with some other things, we’re also doing dumb objects like a standard class and converting those more specifically to proper objects like wp_post if you’ve been looking around. A lot of this is just for sanity reasons, not even for future reasons. So, function __get($property), we’re able to map exactly where we need to go. Caching is non-persistent by default, but it does exist, which is pretty cool. So, the problem is that if you had caching on persistent and you would maybe doing a deploy, if you’re not actually clearing that cache, well there’s a problem. You need to be able to do that. APC is buggy enough as it is when it comes to upcode caching, you don’t need to mess with it here as well. So it does support persistent caching if you know what you’re doing. So for example has this enabled. They wanted to be able to store in cache bucket, so they do. So if for some reason, you ever wanted to enable it, there is a filter:



‘_return_true’ );

Overall, the class itself is somewhere around 2,000 lines long. The patch that landed, that had the bulk of this was somewhere around 14,000 lines long and we wrote it in about six days and it worked.

And you can turn it on and it will work. So if you’re dealing with a lot of themes, maybe not just one on a giant multi-site installed, this might be something for you. So you have this new API that deals with array( ‘allowed’ => true ) and array( ‘errors’ => true ) and all these these different pieces. array( ‘allowed’ => true ) being for multi-site which is again, something else that we were able to speed up quite a bit.

And then we also had to make sure it worked. So, on top of a lot of functional testing, this is a few years ago this stat (29 tests, 684 assertions), there are even more tests now. Existing tests had to demonstrate of course backwards compatibility, so those existing tests did not break when we did all of this. New tests ensured the WP_Theme returned what we expected and then we practiced TDD (test-driven development) specifically when we were dealing with any bugs that came in.

Overall, the class itself is somewhere around 2,000 lines long. The patch that landed, that had the bulk of this was somewhere around 14,000 lines long and we wrote it in about six days and it worked. Also doing profiling, you’re going to find bottlenecks in some cases where you had no idea you had them. So maybe we saw some pages that were slow and we didn’t really understand why, sometimes a post request is actually slow, you might not notice this because you might think “oh yeah, Chrome is just resolving the DNS, that always takes forever”.

Profiling is really important for these things. So, for example, this is a KCachegrind right here, we were able to take 28% in theme.php to 0.76% of the page load. Total time cost was reduced by a factor of almost 6 and then we’re also able to look at weird things like this.

For sanitize_titles_with_dashes, one particular thing, we were searching for a theme on the backend and for some reason it was taking 42% of the page load. We we’re like “what the heck is going on here?”. Turns out it was being run 3529 times and here’s the best thing: the function call shouldn’t have even been there. So we removed it and the entire page sped up like you wouldn’t believe. It actually went from 44% to basically nothing. So we were able to speed things up – we would never have found this because it’s just like “oh Chrome is being stupid, it’s not loading”. No, it was actually a really slow request.


Second case study: Taxonomy Meta and Post Relationships

You might have heard of these, you might be using them, you might be using post-to-post taxonomy meta plugins. Working on this, this is a roadmap that was posted to a few weeks ago, during WordCamp San Francisco actually, explaining all the different things that we’re working on to make this happen. Now, the problem if you’ve worked in depth with terms is that shared terms was just a bad idea, we shouldn’t have done it. It came in actually, and this is a really funny history, originally in 2.2, was removed and went into 2.3 with a new schema, based in part on Drupal’s schema at the time. So the one time we did copy them, we realized we made a bad mistake.

Term_id as you might be familiar with – let’s say the term_id is ‘apple’ and then that ‘apple’ term might be in multiple taxonomies. So you might have the ‘apple’ in the tag taxonomy, and ‘apple’, in the company taxonomy. The problem is when you, let’s say, rename the ‘apple’ to ‘applecomputer’, suddenly things begin to go wrong very quickly. Unfortunately shared terms have very limited practical benefit. It would be much better if they were separate. So we have these two tables: wp_term_taxonomy and wp_terms and these fields in them, and you can kind of see how these come together with term_tax_id being the primary key for one, and term_id being the primary key in the other table, things get related and we have a third table of relationships. The joins are a mess, slow things down, and are not really fun.

So we’re going to add a new table, like this and if you see, it’s the exact same fields as the term_tax_id table, except we’re going to add all the fields that were in the term_id table. So we’re going to drop one of these tables and move all of the fields into the other table. We’re going to reduce everything to one table. Now if you’ve ever written a direct query for this stuff, if you’ve ever dealt with this, “Oh crap, things are going to break” right? “I’m sorry, it was Nacin’s fault, blame him”, or whatever you want to do. We can actually fix this. In fact, we didn’t come up with just one way to fix this we came up with two.

The first one is that we’re just going to redefine what a table means in WordPress. So if you try and reference $wpdb->terms, it will simply think, “oh, you must mean the $wpdb->term_taxonomy table”. So we’re actually self-joining. So if you’re doing interjoined terms on “terms_t” on term taxonomy and you’re do all these different fields, it’s just going to join itself. And because these fields are a superset, it will work. You also can do something like this with a view. You can create a view in MySQL as of MySQL 5, which is the current version for WordPress. You’re able to do something as simple as this: we’re able to re-create our old table in place. So after we do all these crazy upgrades and everything else, we can make this kind of work. We tested this with WordPress, we dropped the table, we merged all of them, took a 20-line patch, without changing anything, all the direct queries and everything worked. So plugins that are trying to do anything special with terms, we can do this to the point where we’re really not going to break anything. Pretty cool.

We’re also doing this over the course quite a number of releases. So, we’re able to combine these term tables, let us have on ID, finally we have one real ID that represents what a term actually is that we can pass around. Term meta is finally within reach, maybe post-relationships isn’t far behind, because that might depend on term relationships and that becomes a whole other story. So we have this long-term road map, unfortunately this actually requires we integrate a half dozen different changes, each of which is dependant on the previous one, over at least 3, 4 or maybe 5 releases.

So, we’re not rushing this, we can’t rush this. We need to do it step by step, to make sure that we don’t break absolutely everything. Maybe we slow it down, speed it up depending on how things go. Ultimately backwards compatibility prevents a lot of challenges. It’s not easy to build code that just works from version to version, and many of you might not even need to deal with this. At WordPress, we do. Iit makes your lives easier, so you can buy me a beer later. And we continue to evolve at rapid speed, WordPress 3.7, if you don’t know about the plans, is being released in October, 3.8 which will be a little bit of a different release in December. And if you’d like to join us helping out, I would go to and check those things out. (For the latest WordPress version, go to



Q: For the major version changes, now that we’re speeding up the timeframe for releases, typically in the past it’s been every 6 months for a major release and it’s been documented that you go back and support the last major release version. How is that going to change now that we’re speeding up the major versions?

A: Don’t know yet. In this case we’ve always aimed for 4 months, and normally end up at 5 and it slips to 6 or 7 on occasion. Sometimes we’ve actually been really on target with those. So what we’re trying to do now is 3.7 is acting as a bit of a reset, I can talk a little about 3.7.

3.7 is a platform-focused release, we’re doing it in 2 months. It’s focused on a few different things, Scott was talking earlier about some of our developer tools stuff. We’re trying to improve a lot of our own processes, so whether it’s trying to make it easier to contribute, trying to make it so tickets don’t rot for a long period of time, or people aren’t getting feedback or whatever it is – that’s important. And then a lot of our developer tools as well.

So this is really cool: you might have seen this new develop repository on which replaces the old core.svn repository. This is pretty interesting because it pulls in all of our tests, all of our tools, our bill process now, everything is in one place and finally we’re trying to modernize here. We’ve been around for 10 years, we can start to do it at this point. And then we’re rebuilding a lot of our developer documentation. So if you go to right now, you’ll get a “Coming Soon” message, but we’re working right now on fully automated code reference that is very smart and deals with documenting every hook, every function, all from inline documentation to what else it can need from code. (This feature is now live, check it out)

The actual focus of the release in part is security, stability, updates and fixing a lot of bugs. We’ve already closed around 300 tickets in the last 2-3 weeks and I expect that number to continue to drop successfully with each week. This won’t affect most of you, because you will be doing manual deployments anyway, but in 3.7, security minor release updates will happen automatically. They shouldn’t be nearly as painful as they are and we want to try and ship this to you – like 5 people just went “oh God, what are you doing?” – don’t worry, relax, you can turn it off and in most cases, this won’t affect you. If you have things like automatic updates turned off on your dashboard, then this obviously will not occur. Which you should, and if you don’t, and you’re trying to do deployment anyway, how one of your editors isn’t screwing it up by pushing a button, I’m really interested.

Any further questions on 3.7? We don’t know how yet we’ll work on that, but that said, because we’re going to start doing automatic updates for security releases, we’ll probably support security backup a few more versions as well. If only because we can be much more confidant shipping those and because our security vulnerabilities we’re dealing with now are really esoteric.

We’re talking about like safe HTTP requests and XML injection and things along those lines, we’re not really dealing with the run of the mill like XSS that we might have been dealing with 5-6 years ago. So, supporting further back, yes, that said, I don’t think we’ll always be doing too much with these cycles, I would like to settle between 3 and 4, but we really don’t know yet. 2,3,4, not sure. 3 releases a year would be great, 4 maybe, I don’t know.

Metro UK’s Powerful Content Algorithm – Now With Full Transcript

At our first London WordPress Big Media & Enterprise WordPress Meetup,  Dave Jensen (@elgrom) from Metro UK (hosted right here on VIP) explained how Metro continually experiments with their content algorithm to promote and feature the most interesting content for their readers and increase engagement on their site and mobile apps.

Dave recently shared some insight on how Metro UK has grown 350% through some growth hack experiments, so he provides another great inside look on what Metro has been doing internally to tweak their site content.

Below is his slide deck and the video from his presentation which we’ve shared previously, and we’re publishing it again now with full transcript below. 

Welcome, I’m Dave Jensen, I’m the head of development for Metro UK. We’ve been a WordPress VIP client since about December 2012. I think we’re one of the larger enterprise, largest publishers in the UK that use WordPress VIP.

So we’ve been playing around with all sorts of different crazy ways of interacting with our content for the last while, when we first released, we had this whole swipe based interaction that we’d been using which was fun to build, a little bit too complicated. Over the last while, we’ve been playing around with how we could automate some of the placement of stuff on the page so, our algorithms.

We’re a pretty lean operation. Maybe some of you won’t see this as lean but for big publishers there’s 6 developers, 20 people writing content all the time. We have a 24/7 mindset though, so we have champagne aspirations on a beer budget. We’re always doing constant experimentations of how a development process works.

We’ve kind of gone through a process of building something around some kind of trending content and we’ve kind of stretched that out into a kind of newsfeed and this is what we’ll go through today.

How can we do something clever and combine those into something that we might be able to move up the page and take over?

So basically we started collecting a whole bunch of data sources just to begin with, it started as one of our guys needed a dissertation project. So we grabbed a lot of data from Facebook, shares likes, comments, grabbed some information from Twitter, from Omniture, which is our analytics and from WordPress, so we can kind of build everything together.

We took all this data and stuck it in MySQL, then we started doing some pretty basic calculations on that. I wanted to keep everything really simple, so the feedback loops that we could have from everyone involved, they could all feel part of this process and having them engaged with us, is gonna give us a lot of benefit.

Because you know, with running some crazy big data thing, and nobody understands what we’re doing, they can’t tell me when I’m doing something wrong – which is really often. So basically we took the views, we took the social interactions and we times it by a multiplier, we get a score.

We ran this every half an hour and every half an hour, we took this one from the previous one and it kind of gave us a rate of change, a number that is going on. When we released the new site, we were lucky enough to convince the editorial team to stick this at the top, second thing down on the homepage, underneath their thing.

It might sound funny, but editorial people are usually pretty protective around what they put on their homepage, this was a reasonably large step for them at the time.

We also kind of snuck in our sidebar program, so we were pretty interested to see the top bit here. It’s how the trending stuff does the bottom bit is kind of how the clicks on the top stuff work.

We were pretty interested to see that even without all the images, having it text based underneath that there was a real pretty similar level of interaction between the two modules. So we thought we’d probably stumbled across something which was interesting and seemed to jive with our users.

They also changed 24/7 without anybody from our team kind of having to touch it, which was quite nice, so on a Sunday morning, before anyone was in the office, that was still reasonably fresh.

It also, from a commercialization point of view, gave us a way of promoting native content and some kind of native display units to hopefully play around making some more money. ‘Cause it’s always nice to do that.

So from that, when we removed swipe from the site, it was far too complicated, we went down kind of a hole, we started playing with a stream of news. At the bottom of the homepage we just grabbed the latest posts and we put them in a kind of you know page, infinite scroll-type approach.

This got quite a high level, we were really surprised even that the bottom of the homepage, kind of screw it away, at the number of interactions that were getting within this.

There were a lot of people kind of scrolling, playing, clicking around, from that we thought well we kind of have this trending stuff at the top, which is doing quite interesting and we have all these people clicking on these kind of lists down here at the bottom.

How can we do something clever and combine those into something that we might be able to move up the page and take over?

So the timeline is pretty straightforward, it’s just kind of sets the thing. The interesting thing was that we took the information that we calculated and put that back into WordPress to be post meta data and then we used that information to style the front end.

So the big image over there was something that was within the trending, the second one down was just normal one and the one at the bottom was something that had been promoted by an editor.

The neat thing was there was a high-level of consistency across all the platforms. We have a responsive site and it worked quite well.

We were kind of like, even just within a normal time-based feed, we were using the data that we had to change the appearance to give the stuff that was popular a larger percentage of screen time, even in something which was time-based.

Then we started playing around with some advertising and things that looked less like advertising and more fit into the style of the site. The neat thing was there was a high-level of consistency across all the platforms, we have a responsive site, it worked quite well. We were playing around with that and we spent lots of time optimizing it and all the graphs went like that which was pretty neat, so I was enjoying myself on that.

The next kind of phase of this was you know, when we were playing around with the trending stuff, it was great but recency was a real problem that we had. Cause you know, you had to get all the data to the point and then it was what was the biggest one between them.

So we had to come up with a simple calculation to you know call that up and so we just introduced a coefficient to that, to kind of give it a shape and boost things up, which were very early on, to allow to give them a bit of airtime and get them closer to the top of the streams.

So add a coefficient to the end and just by taking that, and giving it a score with the coefficient, we built something which seems to get a, seems to be performing pretty well.

We’ve been optimizing that in quite a high-level gain and you have some graphs going up, so the scrolls and clicks. First of all we track a lot of information, so you can see down at the bottom, from going to timelines when we started to newsfeed version to infinite. Each one of those had a gradual increase.

Our statistics, some of the biggest learnings that we’ve had is because we have such variable traffic numbers.

In order to figure out actually what’ going on, we had to break everything down per daily active user which was a kind of,it was a bit one of those moments, like “why didn’t I do this before”?

Because it’s just a kind of number, we can say “hey, you like news or sports”, give them another 5,000 views and the things you read are much more likely to be closer to the top.

When we moved from a time-based feed to a news-based feed, clicks increased by 9 percent across the board, which we’re quite happy with. This has allowed us to kind of take over the homepage and it’s kind of the third thing down.

We’ve been A/B testing and content density, is one of the things we’re moving towards kind of increasing the content density, even more things on the page…more opportunities to click, more people click.

Infinite scroll had a pretty big impact as well because if you stop content then people leave and they don’t have the opportunity to click. Then we would get some good clicks on our native display and the content drivers to native content.

Some of the lessons learned…Content volume is a big problem and kind of a bit of a beast that you keep on feeding and if you don’t feed it or you give it too much of the same type of content, I get complaints, because then it kind of looks clustered within that.

Because we’re running on a scale, we’ve had some pretty fun times with MySQL and Cloudfront. Making sure we cut all of the caches at the highest level, so kind of cutting it at a category level and not playing around with it too much beneath that has allowed us to keep that going, keeping everything fast. The faster things load, the less people notice that and they click.

The common understanding has definitely helped us get feedback throughout that. so some of the things we learned from a WordPress point of view.

So we’ve been using this VIP caching thing to be able to grab the first page of information and make it kind of available and part of the page rather than having to go into it grab via ajax.

That’s the third thing that, down on the homepage. If something falls over, making it always there is good otherwise I get shouted at.

We have also with the API, we built, we actually mimicked the public API’s format,so we can kind of interchangeably use our API versus the kind of latest public API stuff, which is probably one of the quite nice hacks that we did. We were playing around with storing lots of stuff in large options for a while but that didn’t scale very well and we were using post-meta to store information.

We’ve also been playing around with CHEEZETEST to kind of give us the kind of A/B testing results but it can add quite a lot of complication if you’re trying to test too much with it.

So we took a kind of microservice architecture approach to this. We kind of have a service for data mining, a service for the newsfeed, a service for the commercial feed, which keeps all the services quite nicely separated.

We’ve been using Backbone for the templates and Cloudfront for our caching. We’ve also plugged it into an Android app, that we’ve built which has been quite fun which is just a top 10 stories on the site any one time. We are been able to pass in the channels people read and give them a boost up.

Because it’s just a kind of number, we can say hey, you like news or sports, give them another 5,000 views and the things you read are much more likely to be closer to the top.

Of the thing which would be fun, a few installs to that marketing still, the biggest fun challenge we’re having with that… and right on the money, thank you for listening.

To see the presentations from previous Big Media & Enterprise WordPress Meetups, click hereFor Big Media & Enterprise WordPress Meetup groups in other cities, see the full list on VIP Events and join your local group. 

Austin Smith on Elastic Search on – Now With Full Transcript

Austin Smith is a managing partner at Alley Interactive, a VIP Featured Partner Agency. At our August Big Media Meetup, he gave a short “flash talk” on Elastic Search on in Action, which we’ve shared previously, and we’re publishing it again now with full transcript below. You can read more about the VIP Search Add-On here, and see it in action at

My name is Austin Smith, I’m a partner at a consulting firm called Alley Interactive, and my main project there is for the Kaiser Family Foundation (KFF), for whom I’m a developer. We went live on VIP in May – feels like so long ago. So what Elastic Search does for KFF is it replaces WordPress core search wholesale and it replaces the technology they were using before which was Google custom search clients.

Using a JSN on their new site would have been really tricky because of the nested nature of the data that we migrated for them and it also just wouldn’t surface as much information as they wanted to surface. They do facets, kind of, but it’s hard. Working with the team of VIP, we built it on Elastic Search, which has tremendous ability to filter, facet and limit.

So the search bar being bold and prominent, if you’re going to have a search bar that big, you should probably have the search engine that’s that good.

So this is the default site service screen. Another cool thing we were able to do was to quickly build other kinds of pages, things you would normally use a WordPress loop for, maybe, we were able to swap in Elastic Search so now you have a loop with facets, which is really a cool way to browse a website. Sites like have been doing it for years; using facets on the left panel to filter down.

We were able to swap in Elastic Search so now you have a loop with facets, which is really a cool way to browse a website.

With Elastic Search, you can run a search that has no keyword and maybe doesn’t even look like a search and we took that to an even further level by making it power the “Also of interest” spots on article pages and it took some tweaking, but we have it working pretty effectively and I’ll show you in the code that generates that, it’s actually really slick. So I’m going to break into my browser here.

So the search bar being bold and prominent, if you’re going to have a search bar that big, you should probably have the search engine that’s that good. They (KFF) write up about healthcare topics, so I’m going to search for “affordable care” and I get a ton of results and it comes back pretty quickly. So we’re doing a lot here: Date filtering – you can specify one or the other or both, Topics – that’s their word for category, they banished the word category from the entire site.

Filtering it is pretty fast. Tags – same things and there are a lot of tags, so we built an expander widget and it ranks them and then Content type, which became kind of an interesting topic for us. Whereas, generally when we had previously architectured a WordPress site, we would have decided what content types to deploy based on shared functionality and we would have used categories and tags to differentiate between them in the site hierarchy.

But in this case, we knew we could get a free facet out of this so we made different content types do the same thing, so that they could have their own facets. They think of their documents, like even if it’s a report, this kind of report is an issue brief, that kind of report is a poll finding and this kind of report is a factsheet. And then they’re all supposed to be called a report.

We could have had one content type, but instead we have four. But I think it’s easier for them to use on the backend, because they know what kind each thing is and it’s much easier on the front end, for them anyway, I don’t know how many other people know the difference between an issue brief and a factsheet, they do.

We also built one other thing for them, right into the search engine. It’s here; I didn’t even have to search for anything else. This is sort of like Google AdWords where they can sponsor their own search results and drive you down a path they think might be more useful. So, if you search for “teens”, well they don’t use the word teens, they use the word adolescents and it will suggest you search for adolescent. So that’s the site’s main search.

This is just like one giant search engine query right here, it’s all Elastic Search.

But there are a number of sections in the site and a lot of them have their own search engine. “State Health Facts” – I’ll show you what this would have looked like on the main site section.  We broke out the result into everything and then “Health Facts”, which are collections of data about healthcare in the United States and around the world which resulted in graphs and maps, giant tables of data and there are about a 1,000 of them and they match just a ton of common keywords, ’cause they’re about common health topics, so they all wanted that in there. They also don’t look as nice because they don’t have the teaser. And then slides, there are like tens of thousands of slides and they just don’t want those to be in the same thing.

Again, because of the control we have here, we’re able to separate out the interface based on each tab and I don’t know if VIP knows we’re doing this, maybe I shouldn’t tell you. Every time you load a search page, it does three Elastic Search queries, the second two by AJAX, because the tabs have counts, so the global results, the global steady data that has to go back to Elastic Search and say “well, if I were to search for this, how many would I get” and it’s pretty fast, I don’t notice them coming in, it’s almost instant. So then if I were to search “health reform”, this specific search engine, it takes me back to the main site search but with a particular facet turned on, the further example of that is in this slide search engine here, this is just like one giant search engine query right here, it’s all Elastic Search.

Working with the team of VIP, we built it on Elastic Search, which has tremendous ability to filter, facet and limit.

I think this is particularly funny. The one thing on their site that looks kind of like a blog is the “Perspectives”, it’s a column which their CEO writes and it’s also powered by Elastic Search, so I think we’re maybe using the loop in a couple places but I couldn’t tell you where. If you click into a Perspective here, you’d see the “also of interest” is again dynamically generated by Elastic Search, not in real time, because nothing changes that fast, it’s all cached. The way that we do “also of interest”, which I think is the coolest bit of code you can do with Elastic Search that you can’t really do with a conventional database is we take taxonomies in priority order and then we take tags in priority order. You’ll notice this is not the standard WordPress taxonomy widget, these are re-orderable drop downs.

The tags are here, it’s an autocomplete field, but you can’t add a new tag, they don’t want you to be able to add a new tag, they actually have a taxonomy committee that approves changes. I’m not kidding. Taxonomy committees are great, they’re very very helpful. We’re basically using the term order column, which is already in the WordPress schema, to store the order of every individual taxonomy term, which allows us to send it to Elastic Search in that order and the code to do it is actually very small very elegant. It’s this here: Takes the terms with the post, it does some sort of building an array before this that I won’t show you because you’ve all seen the add something to an array operator.

But the actual query here is this “should” thing, I’m going to give you a list of things that would be cool if they matched and match as many of them and return result in the order of as many of them match, I’m sending you category with an id and tag with an id, and another tag with an id. It’s going to return a match for all 3 first and then a match for the category and the first tag second and the category in the second, third. That’s a big reason why they control their taxonomy so tightly because if they had people adding terms left and right, this would stop being useful because you’d end up with posts with a tag, and it’s the only post with that tag.

The actual search configuration, also pretty simple, this we had to do a lot of background on. VIP wrote a wrapper for the Elastic Search API, we wrote a wrapper for VIP’s wrapper and the result of it is this: which we can use to create a search engine of a given URL by saying “set default, we’re telling our plug in, we want to use this configuration for the core site search. So if you search using a WordPress search mechanism, it’s going to use this. Not in the admin area yet but we’d like to do that too, because it would be very helpful for their administrators.

And then for taxonomies it’s this easy, so we can do some really fast facet configuration, but to add another search engine, it’s that simple, so this creates a search engine that uses search, each search engine is affiliated with a post, because they could have like a teaser, like use this search engine to find XYZ, and then a set up of the facets like news posts get daily news tags and that much code is as much as it takes to create this entire search engine and we had to make it that abstract because I only had 13 minutes.

See the presentations from previous Big Media & Enterprise WordPress Meetups. For Big Media & Enterprise WordPress Meetup groups in other cities, see the full list on VIP Events and join your local group. 

Want more information about WordPress services for media or enterprise sites? Get in touch.

VIP Developer Orientation Slides & Video

For a few months now we’ve been hosting live VIP Developer Orientations for new client & partner developers and team members. During the call, we introduce them to the whys of the VIP platform, how we can work together as a strategic partner and collaborative colleague, what the code review & deploy process looks like, and we highlight some tools which will be essential to developing and debugging scalable and secure WordPress sites.

We recorded the last Developer Orientation this past week and we’d like to make it available for any developers or project owners who are curious about how VIP works and how they can get a jumpstart on their projects by getting better acquainted with our documentation, workflow, and best practices. At the end we segued into a Town Hall where current clients can hear what’s coming up and they can ask questions as well.

We’ve embedded the Developer Orientation presentation into the VIP site so you can view it on your own — there are links to some resources at the end which you’ll want to click through & check out. Below, we’ve embedded the video of the orientation and the audio of one of our VIP engineers walking everyone through the material, including some questions at the end.

For the best experience, we recommend you open the presentation in one tab, and have the video walkthrough running in another tab, so you can hear the commentary while you browse and click around.

Open the Developer Orientation slides in another tab.

Want more information about WordPress services for media or enterprise sites? Get in touch.