Austin Smith on Elastic Search on WordPress.com – Now With Full Transcript

Austin Smith is a managing partner at Alley Interactive, a VIP Featured Partner Agency. At our August Big Media Meetup, he gave a short “flash talk” on Elastic Search on WordPress.com in Action, which we’ve shared previously, and we’re publishing it again now with full transcript below. You can read more about the VIP Search Add-On here, and see it in action at KFF.org.

My name is Austin Smith, I’m a partner at a consulting firm called Alley Interactive, and my main project there is for the Kaiser Family Foundation (KFF), for whom I’m a developer. We went live on VIP in May – feels like so long ago. So what Elastic Search does for KFF is it replaces WordPress core search wholesale and it replaces the technology they were using before which was Google custom search clients.

Using a JSN on their new site would have been really tricky because of the nested nature of the data that we migrated for them and it also just wouldn’t surface as much information as they wanted to surface. They do facets, kind of, but it’s hard. Working with the team of VIP, we built it on Elastic Search, which has tremendous ability to filter, facet and limit.

So the search bar being bold and prominent, if you’re going to have a search bar that big, you should probably have the search engine that’s that good.

So this is the default site service screen. Another cool thing we were able to do was to quickly build other kinds of pages, things you would normally use a WordPress loop for, maybe, we were able to swap in Elastic Search so now you have a loop with facets, which is really a cool way to browse a website. Sites like Amazon.com have been doing it for years; using facets on the left panel to filter down.

We were able to swap in Elastic Search so now you have a loop with facets, which is really a cool way to browse a website.

With Elastic Search, you can run a search that has no keyword and maybe doesn’t even look like a search and we took that to an even further level by making it power the “Also of interest” spots on article pages and it took some tweaking, but we have it working pretty effectively and I’ll show you in the code that generates that, it’s actually really slick. So I’m going to break into my browser here.

So the search bar being bold and prominent, if you’re going to have a search bar that big, you should probably have the search engine that’s that good. They (KFF) write up about healthcare topics, so I’m going to search for “affordable care” and I get a ton of results and it comes back pretty quickly. So we’re doing a lot here: Date filtering – you can specify one or the other or both, Topics – that’s their word for category, they banished the word category from the entire site.

Filtering it is pretty fast. Tags – same things and there are a lot of tags, so we built an expander widget and it ranks them and then Content type, which became kind of an interesting topic for us. Whereas, generally when we had previously architectured a WordPress site, we would have decided what content types to deploy based on shared functionality and we would have used categories and tags to differentiate between them in the site hierarchy.

But in this case, we knew we could get a free facet out of this so we made different content types do the same thing, so that they could have their own facets. They think of their documents, like even if it’s a report, this kind of report is an issue brief, that kind of report is a poll finding and this kind of report is a factsheet. And then they’re all supposed to be called a report.

We could have had one content type, but instead we have four. But I think it’s easier for them to use on the backend, because they know what kind each thing is and it’s much easier on the front end, for them anyway, I don’t know how many other people know the difference between an issue brief and a factsheet, they do.

We also built one other thing for them, right into the search engine. It’s here; I didn’t even have to search for anything else. This is sort of like Google AdWords where they can sponsor their own search results and drive you down a path they think might be more useful. So, if you search for “teens”, well they don’t use the word teens, they use the word adolescents and it will suggest you search for adolescent. So that’s the site’s main search.

This is just like one giant search engine query right here, it’s all Elastic Search.

But there are a number of sections in the site and a lot of them have their own search engine. “State Health Facts” – I’ll show you what this would have looked like on the main site section.  We broke out the result into everything and then “Health Facts”, which are collections of data about healthcare in the United States and around the world which resulted in graphs and maps, giant tables of data and there are about a 1,000 of them and they match just a ton of common keywords, ’cause they’re about common health topics, so they all wanted that in there. They also don’t look as nice because they don’t have the teaser. And then slides, there are like tens of thousands of slides and they just don’t want those to be in the same thing.

Again, because of the control we have here, we’re able to separate out the interface based on each tab and I don’t know if VIP knows we’re doing this, maybe I shouldn’t tell you. Every time you load a search page, it does three Elastic Search queries, the second two by AJAX, because the tabs have counts, so the global results, the global steady data that has to go back to Elastic Search and say “well, if I were to search for this, how many would I get” and it’s pretty fast, I don’t notice them coming in, it’s almost instant. So then if I were to search “health reform”, this specific search engine, it takes me back to the main site search but with a particular facet turned on, the further example of that is in this slide search engine here, this is just like one giant search engine query right here, it’s all Elastic Search.

Working with the team of VIP, we built it on Elastic Search, which has tremendous ability to filter, facet and limit.

I think this is particularly funny. The one thing on their site that looks kind of like a blog is the “Perspectives”, it’s a column which their CEO writes and it’s also powered by Elastic Search, so I think we’re maybe using the loop in a couple places but I couldn’t tell you where. If you click into a Perspective here, you’d see the “also of interest” is again dynamically generated by Elastic Search, not in real time, because nothing changes that fast, it’s all cached. The way that we do “also of interest”, which I think is the coolest bit of code you can do with Elastic Search that you can’t really do with a conventional database is we take taxonomies in priority order and then we take tags in priority order. You’ll notice this is not the standard WordPress taxonomy widget, these are re-orderable drop downs.

The tags are here, it’s an autocomplete field, but you can’t add a new tag, they don’t want you to be able to add a new tag, they actually have a taxonomy committee that approves changes. I’m not kidding. Taxonomy committees are great, they’re very very helpful. We’re basically using the term order column, which is already in the WordPress schema, to store the order of every individual taxonomy term, which allows us to send it to Elastic Search in that order and the code to do it is actually very small very elegant. It’s this here: Takes the terms with the post, it does some sort of building an array before this that I won’t show you because you’ve all seen the add something to an array operator.

But the actual query here is this “should” thing, I’m going to give you a list of things that would be cool if they matched and match as many of them and return result in the order of as many of them match, I’m sending you category with an id and tag with an id, and another tag with an id. It’s going to return a match for all 3 first and then a match for the category and the first tag second and the category in the second, third. That’s a big reason why they control their taxonomy so tightly because if they had people adding terms left and right, this would stop being useful because you’d end up with posts with a tag, and it’s the only post with that tag.

The actual search configuration, also pretty simple, this we had to do a lot of background on. VIP wrote a wrapper for the Elastic Search API, we wrote a wrapper for VIP’s wrapper and the result of it is this: which we can use to create a search engine of a given URL by saying “set default, we’re telling our plug in, we want to use this configuration for the core site search. So if you search using a WordPress search mechanism, it’s going to use this. Not in the admin area yet but we’d like to do that too, because it would be very helpful for their administrators.

And then for taxonomies it’s this easy, so we can do some really fast facet configuration, but to add another search engine, it’s that simple, so this creates a search engine that uses search, each search engine is affiliated with a post, because they could have like a teaser, like use this search engine to find XYZ, and then a set up of the facets like news posts get daily news tags and that much code is as much as it takes to create this entire search engine and we had to make it that abstract because I only had 13 minutes.

See the presentations from previous Big Media & Enterprise WordPress Meetups. For Big Media & Enterprise WordPress Meetup groups in other cities, see the full list on VIP Events and join your local group. 

Want more information about WordPress services for media or enterprise sites? Get in touch.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s