Data Sync on VIP Go

Anyone who runs an enterprise WordPress application knows that copying production data comes with significant challenges. Even when that data runs to tens or hundreds of thousands of articles, contains sensitive “live” data, and is accompanied by tens of gigabytes of images, you often need a complete copy of it to test new functionality or to reproduce a persnickety bug.

Today we’re pleased to announce a speedy, streamlined, and structured Data Sync process for VIP clients. This is a step in our larger effort to make copying large amounts of production data entirely self-service, which we will also be rolling out soon. In the meantime, and even after self-service becomes available, we are happy to sync data on behalf of our clients.

Read on for details on how our new process works.

 

As light as a feather

Copying data must never affect the operation of the production site. It cannot place load on the database or impact performance in any way. To remove the impact on our production servers we hook into our backup mechanism, and use the hourly backup data we keep for all production sites.

Fast, complete, and working data

For the large datasets we expect from many of our clients, copying everything over can take a long time, and subsequent operations on the data can take even longer. Data Sync replicates a client’s production data completely, and we wanted the operation to be as fast as possible.

To sync the data we use the reliable, well-tested functionality of our backup systems. Our backups are fast to restore and internally consistent (for example, they contain no partly completed data operations), which makes them ideal for this purpose.

As well as restoring the data, we need to replace any URLs using the production domain with URLs for the new non-production environment. Traditionally this is done using the WP-CLI tool, which provides a command line interface and tools for managing a WordPress install. While this works for the majority of WordPress sites out there, this method is simply too slow for the massive datasets typically used by a high scale WordPress.com VIP client. The slowdowns are caused by the interactions between PHP and the database layer – many hundreds, thousands, or tens of thousands of reads and writes will necessarily take some time!

To replace the URLs in the data at the speed VIP customers demand, our team wrote a Golang tool, “go-search-replace”. In our tests, go-search-replace is at least forty times quicker than the equivalent search and replace using WP-CLI commands, cutting operations that took many hours down to minutes at most. (We apologize if you were expecting to kick back with a long and refreshing beverage during the Data Sync.)

Massive media libraries

Of course the database is just one part of the story. Many WordPress sites we host include tens, even hundreds, of gigabytes of data and hundreds of thousands of files on our VIP Go Files Service. Copying such a significant amount of data would take many hours, so instead of copying it, our cloud platform provides a service we call UnionFS.

UnionFS works by making the files for the production site available to all non-production sites in read-only mode. Files shared by UnionFS in this way are served from the same infrastructure and have the same caching rules applied.

Tailored to your WordPress application

Production data often includes connections to APIs and services that should not be active in non-production environments, such as API keys for live payment gateways and connections to mailing lists. To ensure you have confidence in the data, and to be sure you get the same results every time, we provide a WordPress action hook so your code can swap API keys, clear production orders, and perform any other custom operations that are specific to your WordPress application.

How do I try this?

As we finalize everything that will make this process fully self-service, we will continue to support VIP client Data Sync needs as they arise. If you want the data from your VIP Go WordPress site copied into a non-production environment, please contact our support team and we will be happy to help.

You can read more in our Data Sync documentation.

A VIP Infrastructure for WordPress Cron

We’re happy to announce a new Cron infrastructure for our VIP Cloud Hosting Service platform. In this post we’ll take you through why we did this, how we did it, and what problems it solves for our VIP clients.

The VIP platform provides performance, speed, and scale to the highest traffic sites. Each component and service we support plays a role in that mission. The new VIP Cron infrastructure ensures your site can schedule one-off tasks, offload intensive processing, and run repeated actions reliably, on time, and without additional developer effort. Our Cron implementation builds on the core WordPress Cron API, for maximum code portability from the WordPress ecosystem and familiarity for your engineering teams.

High Traffic Sites and Cron

The WordPress Cron system allows scheduling of asynchronous events, such as publishing a post at a future date or sending out a survey a few days after completing an order. It also facilitates running repeated tasks, such as syndicating content between sites or ingesting videos from third party video services. The core WordPress Cron system works well for many WordPress sites every day.

Traditionally, WordPress Cron is triggered by normal traffic to your WordPress site. When a regular visitor loads a page, WordPress sends a non-blocking HTTP request back to the server (to wp-cron.php), which identifies and runs pending tasks. This approach works great for many sites, as it has no additional dependencies or setup requirements. However, ease-of-use comes with a few trade-offs:

  • Unreliable triggers – cron is only triggered when there is traffic to your site
  • Shared resources – the jobs run on the same server as regular web requests, so intensive cron jobs can negatively affect site performance
  • Hard to scale – difficult to process many jobs in parallel, or handle very large numbers of scheduled events

VIP sites rely on Cron for mission-critical functionality that must work reliably every time. Our new Cron infrastructure is designed to ensure the reliability and scalability of cron events on every VIP site.

Smarts, Brawn, and Confidence – pick three

We’ve improved three main areas of Cron for our VIPs:

Smarter process control. By default, WordPress Cron processes events serially. This is fine for sparse queues composed of light tasks but enterprise sites often require offloading a long running task to Cron for asynchronous processing. These events function like slow moving traffic on a single lane highway. Subsequent events can be processed late due to being “stuck behind” a slow moving task. An enterprise WordPress Cron needs to be able to process offloaded tasks efficiently without impacting the regular operation of the site.

Handling giant queues. Core WordPress keeps the Cron queue in a single option, so a large queue can outgrow what a single option and the object cache can hold. An enterprise hosting platform must handle enterprise-sized queues.

Mission-critical scheduling. Core WordPress Cron relies on unrelated web requests to trigger events. This dependency can make event processing irregular or late. An enterprise WordPress Cron solution must run scheduled events on time, every time.

In short, we wanted to ensure that the Cron infrastructure for each VIP site was reliable, powerful, and dedicated to that site, just like the rest of the VIP Cloud Hosting Service. We wanted resource intensive tasks to be offloaded to dedicated containers, rather than running on the same resources used to serve web requests. We wanted to ensure tasks for one site did not interfere with other tasks, or with the operation of another site.

It was also important to us that we fully supported the core WordPress Cron API, so our clients can utilize existing plugins and themes without refactoring code or learning a new API.

A Better Cron

Our Cron Control plugin (open source code) builds on the core WordPress Cron system, and is the basis for our Cron enhancements. Cron Control provides a carefully optimized SQL table for WordPress Cron events. This approach satisfies the highly concurrent querying we commonly see on VIP sites. Each named event in the queue is handled in parallel with other events, allowing a much greater event handling capacity.

Cron events on a VIP site run on dedicated containers using an “event runner” written in Golang (open source Golang runner code). Using our container-based infrastructure allows us to scale the number of containers to meet the demands of the particular site, independently of the site’s web traffic.

The Cron Control event runner first spawns a batch of “event retrievers” which collect events to be run. In the case of a WordPress multisite this means spawning parallel event retrievers to collect the events for each individual subsite within the multisite. Once events are all retrieved, they are farmed out to a dedicated pool of “event workers” which execute WP-CLI commands to run each event.

Busy sites may have several Cron runners in separate containers all processing the queue simultaneously. Our VIP Cron infrastructure takes particular care to orchestrate the activity of the event workers in the different containers, to avoid clashes with two workers processing the same event.
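One common way to avoid two workers picking up the same event is to make each worker claim the event atomically before running it. The sketch below shows that idea in Go with an in-memory map guarded by a mutex. It is only an illustration: the `claims` type is hypothetical, and coordination across separate containers would need shared storage rather than process memory.

```go
package main

import (
	"fmt"
	"sync"
)

// claims records which event IDs are currently being processed.
// Assumption: an in-memory map stands in for whatever shared store
// a multi-container deployment would actually use.
type claims struct {
	mu    sync.Mutex
	taken map[string]string // event ID -> worker that claimed it
}

func newClaims() *claims {
	return &claims{taken: make(map[string]string)}
}

// tryClaim atomically claims id for worker. It reports false if another
// worker already holds the claim, in which case the caller skips the event.
func (c *claims) tryClaim(id, worker string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.taken[id]; ok {
		return false
	}
	c.taken[id] = worker
	return true
}

// release frees the claim once the event has finished running.
func (c *claims) release(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.taken, id)
}

func main() {
	c := newClaims()
	results := make([]bool, 4)
	var wg sync.WaitGroup
	// Four workers race for the same event; only one claim can succeed.
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = c.tryClaim("event-42", fmt.Sprintf("worker-%d", i))
		}(i)
	}
	wg.Wait()
	won := 0
	for _, ok := range results {
		if ok {
			won++
		}
	}
	fmt.Println("workers that claimed event-42:", won)
}
```

The check and the write happen under the same lock, so there is no window in which two workers can both see the event as unclaimed.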

While the event runner is written in Golang, it interacts closely with WordPress through WP-CLI commands provided by the Cron Control plugin. All configuration (such as enabling or disabling cron itself and setting parallelization limits) is done via WordPress hooks in the site code, which makes controlling cron processing easy and familiar for WordPress developers.

Ensuring scheduled posts are published on time is a particular concern for many of our clients. Cron Control prioritizes scheduled post events so that they are run when they are found, and keeps the list of scheduled posts up to date.

Good monitoring, smooth operations

The Cron Control system is monitored by a Node.js application, itself hosted on VIP Go (yes, we host Node apps too!). The monitor uses a series of dedicated authenticated REST API endpoints on each VIP site (and each subsite on each WordPress multisite) to ensure that event queues remain within acceptable parameters, that the events within the queue are executed in a timely manner, and that execution is proceeding smoothly. If any issues are detected, the VIP team is alerted and investigates the problem.

On time, every time

Our new Cron infrastructure serves the complex and mission-critical needs of some of the most demanding enterprise applications on the web. Contact us now to find out how you can benefit from the same peace of mind and let VIP give you the freedom to publish.

For existing clients, we have a separate VIP Lobby post where we take you through the steps to take advantage of our new VIP Cron infrastructure.

 
