Austin Smith on Elastic Search on WordPress.com – Now With Full Transcript

Austin Smith is a managing partner at Alley Interactive, a VIP Featured Partner Agency. At our August Big Media Meetup, he gave a short “flash talk” on Elastic Search on WordPress.com in Action, which we’ve shared previously, and we’re publishing it again now with full transcript below. You can read more about the VIP Search Add-On here, and see it in action at KFF.org.

My name is Austin Smith, I’m a partner at a consulting firm called Alley Interactive, and my main project there is for the Kaiser Family Foundation (KFF), for whom I’m a developer. We went live on VIP in May – feels like so long ago. So what Elastic Search does for KFF is it replaces WordPress core search wholesale and it replaces the technology they were using before which was Google custom search clients.

Using Google’s search on their new site would have been really tricky because of the nested nature of the data that we migrated for them, and it also just wouldn’t surface as much information as they wanted to surface. They do facets, kind of, but it’s hard. Working with the VIP team, we built it on Elastic Search, which has tremendous ability to filter, facet and limit.

So this is the default site search screen. Another cool thing we were able to do was to quickly build other kinds of pages, things you would normally use a WordPress loop for. We were able to swap in Elastic Search, so now you have a loop with facets, which is really a cool way to browse a website. Sites like Amazon.com have been doing it for years, using facets on the left panel to filter down.

With Elastic Search, you can run a search that has no keyword and maybe doesn’t even look like a search. We took that to an even further level by making it power the “Also of interest” spots on article pages. It took some tweaking, but we have it working pretty effectively, and I’ll show you the code that generates that; it’s actually really slick. So I’m going to break into my browser here.

So the search bar being bold and prominent: if you’re going to have a search bar that big, you should probably have a search engine that’s that good. They (KFF) write about healthcare topics, so I’m going to search for “affordable care” and I get a ton of results and it comes back pretty quickly. So we’re doing a lot here. Date filtering: you can specify one or the other or both. Topics: that’s their word for category; they banished the word category from the entire site.

Filtering it is pretty fast. Tags: same thing, and there are a lot of tags, so we built an expander widget and it ranks them. And then Content type, which became kind of an interesting topic for us. Generally, when we had previously architected a WordPress site, we would have decided what content types to deploy based on shared functionality, and we would have used categories and tags to differentiate between them in the site hierarchy.

But in this case, we knew we could get a free facet out of this so we made different content types do the same thing, so that they could have their own facets. They think of their documents, like even if it’s a report, this kind of report is an issue brief, that kind of report is a poll finding and this kind of report is a factsheet. And then they’re all supposed to be called a report.

We could have had one content type, but instead we have four. But I think it’s easier for them to use on the backend, because they know what kind each thing is and it’s much easier on the front end, for them anyway, I don’t know how many other people know the difference between an issue brief and a factsheet, they do.
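As a rough illustration of the kind of request behind a faceted results page like the one described above, here is a minimal sketch in PHP. The field names (post_date, topic, tag, post_type, title, content) and the function name are assumptions made for the example; the talk does not show the real index mapping. It uses the aggregations syntax of current Elasticsearch releases, which may differ from the facet API available when the talk was given.

```php
<?php
// Hypothetical field names; the real KFF index mapping is not shown in the talk.
function build_faceted_search( string $keywords, array $filters = [] ): array {
    $filter_clauses = [];

    // Date filtering: either bound may be given on its own, or both.
    $range = [];
    if ( ! empty( $filters['from'] ) ) {
        $range['gte'] = $filters['from'];
    }
    if ( ! empty( $filters['to'] ) ) {
        $range['lte'] = $filters['to'];
    }
    if ( $range ) {
        $filter_clauses[] = [ 'range' => [ 'post_date' => $range ] ];
    }

    // Each selected facet value narrows the result set.
    foreach ( [ 'topic', 'tag', 'post_type' ] as $field ) {
        if ( ! empty( $filters[ $field ] ) ) {
            $filter_clauses[] = [ 'terms' => [ $field => (array) $filters[ $field ] ] ];
        }
    }

    // No keyword means "match everything", which gives a browsable loop with facets.
    $must = '' === $keywords
        ? [ 'match_all' => new stdClass() ]
        : [ 'multi_match' => [ 'query' => $keywords, 'fields' => [ 'title', 'content' ] ] ];

    return [
        'query' => [ 'bool' => [ 'must' => [ $must ], 'filter' => $filter_clauses ] ],
        // One terms aggregation per facet supplies the counts shown beside each value.
        'aggs'  => [
            'topics'        => [ 'terms' => [ 'field' => 'topic' ] ],
            'tags'          => [ 'terms' => [ 'field' => 'tag', 'size' => 50 ] ],
            'content_types' => [ 'terms' => [ 'field' => 'post_type' ] ],
        ],
    ];
}

$body = build_faceted_search(
    'affordable care',
    [ 'from' => '2013-01-01', 'post_type' => [ 'issue-brief', 'fact-sheet' ] ]
);
echo json_encode( $body, JSON_PRETTY_PRINT ), "\n";
```

Because content type is just another indexed field here, giving each kind of report its own content type yields a facet without extra work, which is the trade-off described above.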

We also built one other thing for them, right into the search engine. It’s here; I didn’t even have to search for anything else. This is sort of like Google AdWords where they can sponsor their own search results and drive you down a path they think might be more useful. So, if you search for “teens”, well they don’t use the word teens, they use the word adolescents and it will suggest you search for adolescent. So that’s the site’s main search.
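The suggestion behaviour can be pictured with a very small sketch. This is a simplification that assumes a plain lookup table; the talk does not say how KFF stores these mappings, and the function and variable names are hypothetical.

```php
<?php
// Hypothetical map from words visitors type to the site's preferred vocabulary.
$preferred_terms = [
    'teens'     => 'adolescents',
    'teenagers' => 'adolescents',
];

// Returns the preferred term for a query, or null when there is nothing to suggest.
function suggest_preferred_term( string $query, array $map ): ?string {
    $key = strtolower( trim( $query ) );
    return $map[ $key ] ?? null;
}

$suggestion = suggest_preferred_term( 'Teens', $preferred_terms );
if ( null !== $suggestion ) {
    // Escape at the point of output, even though the value comes from our own table.
    echo 'Try searching for "' . htmlspecialchars( $suggestion, ENT_QUOTES, 'UTF-8' ) . '" instead.', "\n";
}
```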

But there are a number of sections in the site and a lot of them have their own search engine. “State Health Facts” – I’ll show you what this would have looked like on the main site section. We broke out the results into everything and then “Health Facts”, which are collections of data about healthcare in the United States and around the world which resulted in graphs and maps, giant tables of data. There are about 1,000 of them and they match just a ton of common keywords, ’cause they’re about common health topics, so they all wanted that in there. They also don’t look as nice because they don’t have the teaser. And then slides: there are like tens of thousands of slides and they just don’t want those to be in the same thing.

Again, because of the control we have here, we’re able to separate out the interface based on each tab, and I don’t know if VIP knows we’re doing this, maybe I shouldn’t tell you. Every time you load a search page, it does three Elastic Search queries, the second two by AJAX, because the tabs have counts. So the global results, the global steady data, that has to go back to Elastic Search and say “well, if I were to search for this, how many would I get?” And it’s pretty fast; I don’t notice them coming in, it’s almost instant. So then if I were to search “health reform” in this specific search engine, it takes me back to the main site search but with a particular facet turned on. A further example of that is in this slide search engine here. This is just like one giant search engine query right here; it’s all Elastic Search.
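A count-only request for each tab might look something like the following sketch. The tab names, post type slugs and the content field are assumptions for illustration, and size and track_total_hits are options from current Elasticsearch releases rather than anything shown in the talk.

```php
<?php
// Hypothetical tab definitions: each tab restricts results to some post types.
$tabs = [
    'everything'   => [],
    'health_facts' => [ 'health-fact' ],
    'slides'       => [ 'slide' ],
];

// Builds a request that returns only the number of matches, not the documents.
function build_count_query( string $keywords, array $post_types ): array {
    $bool = [ 'must' => [ [ 'match' => [ 'content' => $keywords ] ] ] ];
    if ( $post_types ) {
        $bool['filter'] = [ [ 'terms' => [ 'post_type' => $post_types ] ] ];
    }

    return [
        'size'             => 0,    // no hits needed, just the total
        'track_total_hits' => true, // ask for an exact count
        'query'            => [ 'bool' => $bool ],
    ];
}

foreach ( $tabs as $tab => $post_types ) {
    echo $tab, ': ', json_encode( build_count_query( 'health reform', $post_types ) ), "\n";
}
```

Requests like these are cheap compared with a full results query, which fits the observation that the tab counts arrive almost instantly.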

I think this is particularly funny. The one thing on their site that looks kind of like a blog is the “Perspectives”; it’s a column which their CEO writes, and it’s also powered by Elastic Search, so I think we’re maybe using the loop in a couple places but I couldn’t tell you where. If you click into a Perspective here, you’d see the “also of interest” is again dynamically generated by Elastic Search, not in real time, because nothing changes that fast; it’s all cached. The way that we do “also of interest”, which I think is the coolest bit of code you can do with Elastic Search that you can’t really do with a conventional database, is we take taxonomies in priority order and then we take tags in priority order. You’ll notice this is not the standard WordPress taxonomy widget; these are re-orderable drop downs.

The tags are here, it’s an autocomplete field, but you can’t add a new tag. They don’t want you to be able to add a new tag; they actually have a taxonomy committee that approves changes. I’m not kidding. Taxonomy committees are great, they’re very, very helpful. We’re basically using the term order column, which is already in the WordPress schema, to store the order of every individual taxonomy term, which allows us to send it to Elastic Search in that order, and the code to do it is actually very small, very elegant. It’s this here: it takes the terms with the post. It does some sort of building an array before this that I won’t show you, because you’ve all seen the add-something-to-an-array operator.

But the actual query here is this “should” thing: I’m going to give you a list of things that would be cool if they matched, and match as many of them as you can and return results in the order of how many of them match. I’m sending you a category with an id, a tag with an id, and another tag with an id. It’s going to return a match for all 3 first, then a match for the category and the first tag second, and the category and the second tag third. That’s a big reason why they control their taxonomy so tightly, because if they had people adding terms left and right, this would stop being useful: you’d end up with posts with a tag, and it’s the only post with that tag.
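The slide with the actual code is not reproduced in this transcript, so the following is only a sketch of what a “should” query of this shape can look like in PHP. The field names, the excluded post id and the function name are hypothetical, and the decreasing boost is an assumption added here so that earlier (higher priority) terms count for more; the talk only says the terms are sent in priority order.

```php
<?php
// $terms is assumed to be in priority order already (for example, sorted by the
// term_order column), as [ 'field' => ..., 'id' => ... ] pairs.
function build_also_of_interest_query( array $terms, int $exclude_post_id, int $size = 4 ): array {
    $should = [];
    $boost  = count( $terms );

    foreach ( $terms as $term ) {
        // Each clause is optional; every clause that matches adds to the score.
        $should[] = [
            'term' => [
                $term['field'] => [ 'value' => $term['id'], 'boost' => $boost ],
            ],
        ];
        $boost--;
    }

    return [
        'size'  => $size,
        'query' => [
            'bool' => [
                'should'               => $should,
                'minimum_should_match' => 1,
                // Do not recommend the article the reader is already on.
                'must_not'             => [ [ 'term' => [ 'post_id' => $exclude_post_id ] ] ],
            ],
        ],
    ];
}

$query = build_also_of_interest_query(
    [
        [ 'field' => 'category_id', 'id' => 12 ],
        [ 'field' => 'tag_id', 'id' => 7 ],
        [ 'field' => 'tag_id', 'id' => 31 ],
    ],
    100
);
echo json_encode( $query, JSON_PRETTY_PRINT ), "\n";
```

Posts matching all three clauses accumulate the highest score, so they tend to come back first, followed by posts matching fewer or lower-priority terms.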

The actual search configuration is also pretty simple, though we had to do a lot of background work on this. VIP wrote a wrapper for the Elastic Search API, we wrote a wrapper for VIP’s wrapper, and the result of it is this, which we can use to create a search engine of a given URL. By saying “set default”, we’re telling our plugin we want to use this configuration for the core site search. So if you search using a WordPress search mechanism, it’s going to use this. Not in the admin area yet, but we’d like to do that too, because it would be very helpful for their administrators.

And then for taxonomies it’s this easy, so we can do some really fast facet configuration. But to add another search engine, it’s that simple. So this creates a search engine that uses search; each search engine is affiliated with a post, because they could have like a teaser, like “use this search engine to find XYZ”, and then a setup of the facets, like news posts get daily news tags. And that much code is as much as it takes to create this entire search engine, and we had to make it that abstract because I only had 13 minutes.

See the presentations from previous Big Media & Enterprise WordPress Meetups. For Big Media & Enterprise WordPress Meetup groups in other cities, see the full list on VIP Events and join your local group. 

Want more information about WordPress services for media or enterprise sites? Get in touch.

VIP Developer Orientation Slides & Video

For a few months now we’ve been hosting live VIP Developer Orientations for new client & partner developers and team members. During the call, we introduce them to the whys of the VIP platform, how we can work together as a strategic partner and collaborative colleague, what the code review & deploy process looks like, and we highlight some tools which will be essential to developing and debugging scalable and secure WordPress sites.

We recorded the last Developer Orientation this past week and we’d like to make it available for any developers or project owners who are curious about how WordPress.com VIP works and how they can get a jumpstart on their projects by getting better acquainted with our documentation, workflow, and best practices. At the end we segued into a Town Hall where current clients can hear what’s coming up and they can ask questions as well.

We’ve embedded the Developer Orientation presentation into the VIP site so you can view it on your own — there are links to some resources at the end which you’ll want to click through & check out. Below, we’ve embedded the video of the orientation and the audio of one of our WordPress.com VIP engineers walking everyone through the material, including some questions at the end.

For the best experience, we recommend you open the presentation in one tab, and have the video walkthrough running in another tab, so you can hear the commentary while you browse and click around.

Open the Developer Orientation slides in another tab.

The Importance of Escaping All The Things

Nick Daugherty is WordPress.com VIP Lead Engineer. Here he shares some important information about escaping in code and how that can increase security in WordPress sites anywhere in the world. 

If there’s one issue we flag more often than all others in code reviews…it’s escaping.

For starters, we should all agree that escaping (fundamentally, sanitizing input and escaping output) is a critical aspect of web application security. What may be less universally agreed upon is where to escape. On that point, we require “late escaping”, meaning escaping as close as possible to the point of output, and further, we now require it everywhere, always.

You may now be thinking:

“Do I really need to “late escape” everything? Always? Even core WordPress functions?”

We hear you. And, here’s why this is important to us:

In addition to some automated scanning, we manually review every line of code our VIP customers commit to the VIP platform. And, while the original author of a particular piece of code may know exactly where they’ve already escaped their output and/or it’s convenient to trust a WordPress core function’s escaping, it’s much, much faster and more reliable for our reviewers to check for “late escaping”. This way a reviewer can be 100% positive that output has been escaped properly by simply looking at the point of output.

We acknowledge this standard requires a bit more effort from developers writing code for the VIP platform. But we see the benefit as threefold:

1. “late escaping” makes VIP reviewers more efficient, which means customer code is reviewed and deployed faster,

2. a consistent practice of “late escaping” makes missed escaping obvious, thereby reducing the chances that unescaped output makes it into production,

3. a consistently applied escaping standard (and we’ve chosen “late escaping” as ours) allows automated tools to better augment our human reviewers, further improving on #1 and #2 above.

To illustrate the importance of escaping everything, let’s look at a pattern where escaping is commonly omitted: Widget form elements.

A Widget form may look like this:

<label for="<?php echo $this->get_field_id( 'title' ); ?>"><?php _e( 'Title:' ); ?></label>
<input type="text" id="<?php echo $this->get_field_id( 'title' ); ?>" title="<?php echo $this->get_field_id( 'title' ); ?>" name="<?php echo $this->get_field_name( 'title' ); ?>" value="<?php echo esc_attr( $title ); ?>"/>

Those $this->get_field_id( 'title' ) and $this->get_field_name( 'title' ) calls should be safe, right, since they are core WordPress functions?

Let’s see what happens when we drop this bit of code anywhere in our codebase:

add_action( 'widget_form_callback', function( $instance, $widget ){
    $widget->id_base = '"><script>alert("Greetings! You have been hacked.");</script>"<';

    return $instance;
}, -999, 2);

Oh no! JavaScript has been injected where it shouldn’t be.
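For comparison, here is a minimal sketch of the same output with late escaping applied. So that it runs outside WordPress, it defines a stand-in for esc_attr() built on htmlspecialchars() (an approximation, not the core implementation) and a hypothetical Demo_Widget class that only mimics how a field id is derived from id_base.

```php
<?php
// Stand-in for WordPress's esc_attr() so the sketch runs outside WordPress.
// Assumption: htmlspecialchars() is close enough for this demonstration.
if ( ! function_exists( 'esc_attr' ) ) {
    function esc_attr( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
    }
}

// Hypothetical minimal widget that mimics how field ids are built from id_base.
class Demo_Widget {
    public $id_base = 'demo';
    public $number  = 2;

    public function get_field_id( $field_name ) {
        return 'widget-' . $this->id_base . '-' . $this->number . '-' . $field_name;
    }
}

function render_title_label( Demo_Widget $widget ) {
    // Late escaping: the value is escaped right where it is output.
    return '<label for="' . esc_attr( $widget->get_field_id( 'title' ) ) . '">Title:</label>';
}

$widget          = new Demo_Widget();
$widget->id_base = '"><script>alert("Greetings! You have been hacked.");</script>"<';

echo render_title_label( $widget ), "\n";
```

With the escaping in place, the injected markup ends up as inert text inside the attribute value instead of breaking out of it.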

Here is a more real world case illustrating how easy it is to get to a point where we’re outputting values of indeterminate origin:

add_action( 'widget_form_callback', function( $instance, $widget ){
    My_Widget_Controller::setup_widget_form( $instance, $widget );

    return $instance;
}, 10, 2);

// ...

class My_Widget_Controller {
    static function setup_widget_form( $instance, $widget ) {
        $widget->id_base    .= self::get_widget_id_base( $instance, $widget );
        $widget->name       .= self::get_widget_name( $instance, $widget );
    }

    static function get_widget_id_base( $instance, $widget ) {
        global $my_config_object;

        return get_option( 'my_widget_id_base_prefix' ) . '_' . $my_config_object['current_site']['widgets']['id_base'];
    }

    static function get_widget_name( $instance, $widget ) {
        $name = '';

        // ... arbitrary processing to arrive at a $name

        return $name;
    }
}

Now we’re down a rabbit hole, and it’s not so clear that get_field_id( 'title' ) will give us safe values.

Even values that are ‘100% safe and there is no way this could ever be abused’ need to be escaped, because future refactorings can introduce hard-to-detect vulnerabilities if there is unescaped code hanging around:

$class = ( 'featured' == $category ) ? 'home-featured' : 'standard';

?>

<div class="<?php echo $class; ?>">...

Seems harmless enough – $class can only ever have two values. Great, we’re safe!

Until 6 months from now, when a new business need refactors this to:

function my_get_post_class( $post ) {
   // ... arbitrary processing to determine a post class. Maybe we pull it from meta now?
   return get_post_meta( $post->ID, 'custom_post_class', true );
}

// ...

$class = my_get_post_class( $post );

?>

<div class="<?php echo $class; ?>">...

Hmmm, now we’re outputting meta values directly, and there is no way to know that without following a potentially complex program flow – a recipe for an exploitable site.
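A short sketch of why escaping at the point of output survives this kind of refactoring: the output line stays the same whether $class is one of two hard-coded strings or something pulled from storage. The esc_attr() below is a stand-in built on htmlspecialchars() so the example runs outside WordPress, and my_get_post_class_demo() is a hypothetical replacement for the meta lookup.

```php
<?php
// Stand-in for WordPress's esc_attr(); an approximation for demonstration only.
if ( ! function_exists( 'esc_attr' ) ) {
    function esc_attr( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
    }
}

// Hypothetical stand-in for the refactored helper: imagine this value now comes
// from post meta and someone has stored something hostile in it.
function my_get_post_class_demo() {
    return 'standard" onmouseover="alert(1)';
}

function render_wrapper_open( $class ) {
    // Escaped at output, so the source of $class no longer matters to a reviewer.
    return '<div class="' . esc_attr( $class ) . '">';
}

echo render_wrapper_open( 'home-featured' ), "\n";
echo render_wrapper_open( my_get_post_class_demo() ), "\n";
```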

What about constants? Those are the foolproof, never changing pillars of security, right? Consider the following:

// let's say this is for setting a class name, depending on the site we're on
$my_setting = get_option( 'safe_data' );

// ... elsewhere

define( 'MY_SAFE_CONSTANT', $my_setting );

// ...

<div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

Later down the line, our option gets updated (somehow):

update_option( 'safe_data', '"><script>alert("hax0rd");</script>' );

Another example of how constants can be exploited is conditional constants:

if ( ! defined( 'MY_SAFE_CONSTANT' ) ) {
    define( 'MY_SAFE_CONSTANT', 'safe-value' );
}

// ... elsewhere

<div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

As a hacker, all I need to do to inject anything I like into the page is to add this somewhere before the previous code:

define( 'MY_SAFE_CONSTANT', 'unsafe value' );

What About Core Functions?

This concept applies to nearly all code in a theme, including many core functions that return a value. Some core functions that output, such as bloginfo(), have output escaping applied automatically, but we recommend using the equivalent ‘return’ function and escaping manually.

Example: bloginfo( 'name' ); could be rewritten as esc_html( get_bloginfo( 'name' ) );. This approach ensures everything is properly escaped and removes ambiguity.

A post on the merits of escaping would be incomplete without addressing the fact that most esc_*() functions in WordPress apply a filter before returning. While true, the simple answer is: Filters on the escaping functions simply are not allowed on WP.com, and would be quickly caught during code review. Your site is always much safer when escaping all output.

The Bottom Line

If it’s not escaped on output, it’s potentially exploitable. Never underestimate the abilities of an attacker – they’re experts at finding the way to make the ‘this should never, ever, be possible’ things happen 🙂. For maximum security, we must escape all the things.

A WordPress Agile Journey Through the Eyes of a Project Manager – Big Media & Enterprise Meetup Toronto

Joey Ryken, Rogers Digital Media, presented “A WordPress Agile Journey Through the Eyes of a Project Manager” at the recent Big Media & Enterprise Meetup in Toronto, Canada.

WordPress and Olympic.ca – Big Media & Enterprise Meetup Toronto

Anthony Moore, TrewKnowledge, presented “WordPress and Olympic.ca” at the recent Big Media & Enterprise Meetup in Toronto, Canada. We’ve also featured the official Canadian Olympic Committee site before on VIP News.

You can see a copy of his presentation online at TrewKnowledge.

How to Set a Vagrant Development System – Big Media & Enterprise Meetup Toronto

Paul Bearne presented “How to Set a Vagrant Development System” at MetroNews.ca, at the recent Big Media & Enterprise Meetup in Toronto, Canada.

One Theme, One Multisite, 30+ Unique Websites – Big Media & Enterprise Meetup NYC

Simon Dickson and Simon Wheatley, Code for the People, presented “One Theme, One Multisite, 30+ Unique Websites” at the recent Big Media & Enterprise Meetup in New York City.

Big Media & Enterprise WordPress Meetup: Seeing your content as WordPress sees it

Simon Dickson, Code for the People, Director, presented at the Big Media & Enterprise WordPress Meetup in London, with his presentation “Seeing your content as WordPress sees it.”

Simon explores using WordPress for a potential election campaign site and how visualizing the data from a slightly different viewpoint makes it easier to see how it can fit in with WordPress’ data structures and taxonomies.

Watch the video of his presentation and see his slide deck below!

Defining New Urban Media with WordPress – Big Media & Enterprise Meetup NYC

Dave McKinley, CTO of Oomph, and Grant Cerny, SVP Products & Studios at Interactive One, presented “Defining New Urban Media with WordPress” at the recent Big Media & Enterprise Meetup in New York City.

The presentation focuses on the technology and business challenges the teams faced in launching the 75+ sites in their network. Below are the video and slides from their presentation.

Big Media & Enterprise WordPress Meetup: RUFFLR – not just another WordPress site

Ed Coke-Steel, founder of Rufflr, presented at the recent Big Media & Enterprise Meetup in London.

His presentation, “Rufflr, not just another WordPress site,” focuses on the online wardrobe site he founded which allows users to share and follow fashion online. The site features popular users like musicians from Sony Music who share their fashion looks and fashion bloggers and other noted personalities. WordPress powers the entire site which features rich interaction in the form of profiles, collections, rankings, ratings, and more.