Best Practices for Full Server Crash and Migration

This page has had its content updated on September 21, 2023 EDT by Jordan

Affiliate Link

Please note that this post contains one or more affiliate links. Please review our Sponsor page for our Affiliate Disclaimer

Introduction

This article stems from a post in the Self Managed WordPress (by GridPane) Facebook group asking about best practices for easing a full server crash or migration. It was supposed to be a Facebook reply, but the advice will likely be useful to many people, and having it buried in a Facebook thread would be a tragedy.

To be clear, whether it’s GridPane, Runcloud or Cloudways, a server crash or issue can occur regardless of the control panel software or the underlying instance provider. I’m only using GridPane as an example here because of the timing.

Local, Remote Backups and Restoration Time

I thought that local backups and Blogvault were adequate, but in this case remote GP backups would have saved a lot of time. I had to use GP ZIP exports, copy those to the new server, and then import them. This wasn’t too bad, but remote backups would have streamlined the process. I’ll probably set that up with Backblaze this week.

I don’t use the remote/local backup feature from GridPane or other providers, not because it has an issue, but because I need more from a remote backup service: email alerts on failures, automatic adding of new sites (staging included), and a non-GridPane backup in case GridPane’s systems fail. Avoiding the “all eggs in one basket” scenario is important, so if you’re using a SaaS platform to manage your servers and it provides backups, you should look at using another tool or service for a secondary backup.

In addition to backing up WordPress sites, I also make sure to grab all system files. This can help in certain circumstances when you face data corruption, and restoring a file would bring your server back online in minutes versus spinning up a new server and restoring it.

I’m currently using Snapshooter (Affiliate Link) to achieve this as it supports WordPress. However, there are many other backup providers that can provide similar features.
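For readers who want to see what "grabbing system files" can look like without a backup service, here is a minimal sketch using only Python's standard library. The function name `archive_paths` and the example paths are my own assumptions for illustration; they are not what Snapshooter or GridPane do internally, and the right list of paths depends on your stack.

```python
import tarfile
from pathlib import Path

def archive_paths(paths, dest):
    """Pack the given files/directories into a gzip-compressed tar archive.

    Paths that do not exist are skipped, so one list can be reused
    across servers with slightly different layouts.
    """
    with tarfile.open(str(dest), "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():
                # Store entries relative to "/" so the archive can be
                # unpacked anywhere for inspection.
                tar.add(str(p), arcname=str(p).lstrip("/"))
    return str(dest)

# Hypothetical usage (paths are assumptions, adjust for your server):
# archive_paths(["/etc/nginx", "/etc/mysql"], "/root/system-files.tar.gz")
```

The archive still needs to be copied off the server to count as a remote backup.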

SSL Certificates and Let’s Encrypt

Be aware of the Let’s Encrypt limit of 10 sites per 3 hours. I had 22 certificates to reissue and unfortunately hit this limit. There is no way around it that I’m aware of.

I’m speaking about GridPane’s platform, but I know other platforms have the same issue.
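To put numbers on that, here is a small worked example of how long a reissue takes under such a limit. It uses the figures quoted above (10 per 3 hours), which I have not checked against Let’s Encrypt’s published limits, and it assumes the limit resets cleanly at the end of each window. The function name `reissue_schedule` is hypothetical.

```python
import math

def reissue_schedule(total_certs, per_window=10, window_hours=3.0):
    """Return (batch sizes, hours until the last batch can start).

    Assumes a simple fixed window: `per_window` certificates may be
    issued, then you wait `window_hours` before the next batch.
    """
    if total_certs <= 0:
        return [], 0.0
    batches = math.ceil(total_certs / per_window)
    last_batch = total_certs - per_window * (batches - 1)
    sizes = [per_window] * (batches - 1) + [last_batch]
    wait_hours = (batches - 1) * window_hours
    return sizes, wait_hours

sizes, wait = reissue_schedule(22)
print(sizes, wait)  # [10, 10, 2] 6.0
```

Under those assumptions, the 22 certificates need three batches, and the last site does not get its certificate until at least 6 hours after the first.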

Include Certificates in Backups

To work around the Let’s Encrypt limit, GridPane site backups should include a copy of the site’s valid Let’s Encrypt certificate. The files are extremely small and wouldn’t affect the backup size. On import, if no SSL certificate is present, the backed-up copy would be used if it is still valid. I created a feature request for this.

https://roadmap.gridpane.com/b/stack-feature-requests/include-site-ssl-certificates-in-local-and-remote-backups

Enable Let’s Encrypt Test Certificates

In addition, you should be able to run the SSL provision against Let’s Encrypt’s test (staging) environment. Granted, the result is an invalid certificate that would give an error in your browser. Still, it would help with debugging and would let you enable the HTTPS portion of a site. Without HTTPS enabled at all, you can’t use Cloudflare to provide a valid SSL certificate. (Just make sure you disable Strict SSL Mode at Cloudflare.)

Self Signed Certificates

Unless you successfully generate a Let’s Encrypt certificate, you can’t even visit your site over HTTPS. GridPane should support and deploy new sites with a self-signed SSL certificate, enabling HTTPS by default for all new sites. This allows the use of Cloudflare’s proxy, which will provide a valid SSL certificate. (Just make sure you disable Strict SSL Mode at Cloudflare.)

https://roadmap.gridpane.com/b/stack-feature-requests/enable-ssl-by-default-for-all-new-sites-using-self-signed-certificate

Support Third Party SSL Certificate Vendors

Supporting other SSL certificate vendors would avoid this issue and let you use paid vendors that have no limits. For instance, you can use Cloudflare origin certificates for your site, which are free. I will talk about that next. Here’s a feature request for expanding past Let’s Encrypt to third-party and self-signed certificates.

https://roadmap.gridpane.com/b/stack-feature-requests/enable-third-party-ssl-certificate-authorities

Use Cloudflare Origin Certificates

If you’re in a real bind and using Cloudflare, you can provision Cloudflare Origin Certificates using GridPane’s Custom SSL Certificate instructions (https://gridpane.com/kb/setting-up-a-custom-ssl/), which can get you past the Let’s Encrypt limits. Granted, these will expire, so make sure that you address the Let’s Encrypt issue. If you use Cloudflare’s proxy, you can also use a self-signed certificate as a custom SSL certificate.
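Since any custom certificate will eventually expire, it helps to track how many days are left. Below is a minimal sketch that converts a certificate’s `notAfter` value into days remaining. It assumes the date is in the format printed by `openssl x509 -enddate` and returned by Python’s `getpeercert()` (for example `Jan  5 09:34:43 2018 GMT`); the function name `days_until_expiry` is my own.

```python
import ssl
import time
from typing import Optional

def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (default: current time) and the cert's notAfter date.

    `not_after` must look like "Jan  5 09:34:43 2018 GMT".
    A negative result means the certificate has already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400

# Example with a fixed "now" so the result is reproducible:
start = ssl.cert_time_to_seconds("Jan  1 00:00:00 2024 GMT")
print(days_until_expiry("Jan 31 00:00:00 2024 GMT", now=start))  # 30.0
```

You could run this from a cron job and alert when the value drops below a threshold of your choosing.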

Staging Site Backups

Make sure you enable backups on staging sites too! We were developing one staging site on that box and lost about a day of work.

As mentioned previously, if you utilize an external full server backup solution, these would be automatically backed up.

Client DNS Access

If you don’t have direct DNS control of all of your sites, be prepared to have those sites down longer. I had to email 4 clients, and those sites are still offline as we wait for them. I’m considering setting these up with floating/reserved IPs through DigitalOcean to prevent that part of the issue from recurring. Does anyone else do this or have another workaround?

Cloudflare Member Access

I always ensure I have DNS access; this is a requirement to start working with a client. I don’t have DNS or domain ownership, just access to make changes. Setting up Cloudflare for clients and sharing access to your account is a great way of maintaining access but not ownership. You can also do this whole process for them: create the account with a temporary email address that forwards to both you and your client, verify the email (required to give access to a domain), give yourself access, then change the email to the client’s address and have them verify it.

CNAME Flattening

Another option, if the client’s DNS provider allows CNAME flattening, is for them to create their apex record (ex: managingwp.io) as a CNAME pointing to a hostname that has an A record, such as server1.managingwp.io. If your server’s IP changes, you simply need to update the A record for server1.managingwp.io to the new IP address.
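The indirection is easier to see with a toy example. This sketch simulates the lookup with a plain dictionary rather than real DNS queries; `client.example` is a hypothetical client domain, the IP addresses are documentation-range placeholders, and `resolve` is a hypothetical helper.

```python
def resolve(name, records, max_hops=8):
    """Follow CNAME records until an A record is reached."""
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value
        name = value  # CNAME: continue with the target hostname
    raise RuntimeError("CNAME chain too long")

records = {
    # Managed by the client at their DNS provider; never changes.
    "client.example": ("CNAME", "server1.managingwp.io"),
    # Managed by you; the only record touched during a migration.
    "server1.managingwp.io": ("A", "203.0.113.10"),
}

print(resolve("client.example", records))  # 203.0.113.10

# Server migration: you update your own A record only.
records["server1.managingwp.io"] = ("A", "203.0.113.20")
print(resolve("client.example", records))  # 203.0.113.20
```

The client’s record is untouched, yet their site follows the server to its new IP.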

Cloudflare Partial (CNAME) Setup (Partner/Enterprise)

The Cloudflare Partial (CNAME) setup allows your client to point their apex (ex. managingwp.io) and any other records to Cloudflare without changing the name servers on their domain or any other records. Unfortunately, this is only available for Partners and Enterprise customers. You can read more in this article: Using Cloudflare Without Changing your Name Servers

Monitoring WordPress sites that are Cached

Make sure your uptime monitors are set for wp-admin, not the homepage. Because of caching, Better Uptime never picked up on the outage. Most sites even appeared fully functional during the 6 hours because they were fully cached.

Cache Buster URLs (Query String to Bypass Cache)

This sucks; you set up your monitoring for all your WordPress sites, and a client calls saying that their site’s about page is not working. You pull up their main page, and it’s working, but once you hit their about page, you get a server error. Looking deeper, you notice the server or site is having major issues.

The easiest way around this is to configure a cache-busting URL. Adding an arbitrary query string is not always enough on its own, because some web servers will cache URLs with query strings as well; OpenLiteSpeed and LiteSpeed do this. So instead, you can add a specific query string to the “Do not Cache” portion of the LSCache plugin or to your Nginx config. Update your monitors to use it, and you’ve effectively bypassed the cache for all your monitors.
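If you have many monitors to update, a small helper can rewrite the URLs for you. This is a sketch using Python’s standard library; the parameter name `nocache` is an assumption and must match whatever query string you excluded from caching on the server.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_cache_bypass(url, param="nocache", value="1"):
    """Append the cache-bypass query string to a monitor URL.

    Existing query parameters are kept; a previous copy of `param`
    is replaced so the helper can be applied more than once.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, value))
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_cache_bypass("https://managingwp.io/about/"))
# https://managingwp.io/about/?nocache=1
print(add_cache_bypass("https://managingwp.io/?s=test"))
# https://managingwp.io/?s=test&nocache=1
```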

Cache Busting using Timestamp URLs (Always Random Query String)

If you don’t want to set up a query string bypass in your server’s caching plugin or software, some monitoring providers (Better Uptime, Robot.alp) provide an option to pass the current timestamp in the URL, so each request is unique. You can reach out to your monitoring company’s support to confirm whether this is possible and how to use it.

PS. For Better Uptime, it’s {timestamp} in the request URL.
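If you run your own monitoring script, the same idea is a one-line substitution. The sketch below shows what expanding a `{timestamp}` placeholder could look like; it is my illustration of the concept, not a description of how Better Uptime implements it, and the query parameter name `t` is arbitrary.

```python
import time
from typing import Optional

def expand_timestamp(template: str, now: Optional[float] = None) -> str:
    """Replace {timestamp} with the current Unix time in whole seconds."""
    ts = int(time.time() if now is None else now)
    return template.replace("{timestamp}", str(ts))

# Fixed time used here so the output is reproducible.
print(expand_timestamp("https://managingwp.io/?t={timestamp}", now=1700000000))
# https://managingwp.io/?t=1700000000
```

Because the value changes every second, each check requests a URL the cache has not seen before. On servers that ignore query strings when building the cache key, this may not bypass the cache, so verify it against your own setup.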

Additional Advice

Failover or Backup Server

You can set up a failover server in GridPane and simply use it as a means to restore sites instead of using GridPane’s local or remote backups. You can use a cheap instance, as it will not be serving traffic and is used primarily as a snapshot server; it simply needs enough storage to hold the data.

As for other providers, most don’t provide a failover option. GridPane is unique in this manner, and it’s actually a really great feature.

Rollover your Instances (GridPane Specific)

If you’re running an instance that is older than two years, GridPane advises rolling it over. Granted, this makes sense for full OS upgrades like Ubuntu 18 to Ubuntu 20, but it shouldn’t be necessary simply to have a functioning server. There are even servers running on beta instances that aren’t being updated and should be rolled over.

Unfortunately, to my knowledge there is no server versioning, which would help here by letting users tell whether their server is actually being updated or needs to be rolled over. I created a feature request.

https://roadmap.gridpane.com/b/feature-requests/implement-server-versioning/

Conclusion

It’s important to have backups and to have the access you need to migrate sites when a major underlying outage occurs.

I will add more tips as they come to me, but please feel free to leave a comment if you have other ideas or feedback.
