Last week, a fire at an OVH data center in Strasbourg, France, took down millions of sites, including ours. After the initial shock and with no time to spare, our team worked relentlessly to get everything up and running. To better understand the impact of the fire on our services and how we managed to bounce back, we spoke to Jean-Baptiste Marchand-Arvier, co-founder and CEO of WP Rocket.
Can You Give Us a Rundown of What Happened?
On March 10th, an OVH data center located in Strasbourg burned down. Thankfully, no one was hurt, but almost all of our websites and services were hosted there.
As a result of the fire, our servers were no longer reachable, and this had massive consequences for WP Rocket:
- Our main website, where people can usually buy and download WP Rocket, was down
- Our licensing system to update and add WP Rocket to new websites was out of order
- The Critical Path CSS feature was not working anymore
The services of WP Media, the parent company of WP Rocket, were also affected, in particular:
- Imagify, our image optimization service, was down
- RocketCDN, our CDN service, was missing some features: website creation, deletion and cache purge
- Our internal services such as invoicing and data analysis were unavailable
What Was the Strategy in Place to Get Everything Back Up?
Our strategy was a simple yet effective one:
- First, we started by listing all the affected services
- We then listed all the backups we had and performed integrity checks
- We prioritized which services we needed to have back up first
- Finally, we configured our servers from scratch and imported the data
While doing what was necessary to get our services up and running, we also tried to be as transparent as possible with the whole team as well as with our users about what was happening. In particular, we posted regular updates on social media, through in-app messages, and by email.
By keeping as many users as possible in the loop and up to date with the latest developments, we wanted to avoid any sort of panic that might have resulted in an overwhelming flow of messages to our support team.
Did OVH Provide Any Sort of Support?
Not really. They were very busy working on their own issues, and their web panel was not working, which made it extremely difficult to order new servers.
Fortunately, we were able to order a few, and our friends at O2Switch also provided some powerful servers to help us out.
Were There Any Backups?
Yes, we had backups that were generated every day and uploaded to another data center. Of course, thanks to Murphy’s law, there were issues with several of these backups. It turned out that some of them were corrupted, and we were not able to retrieve all of our data.
Fortunately, thanks to other services we were using and Google Cache, we were able to recreate our missing databases and content.
Is Everything Back to Normal Now?
For our users, the short answer is yes. We have been able to restore all our services. However, they are not yet performing at quite the same level as before the incident. We’re on it, and right now our focus is on making sure that our services work as expected. We are also working on getting our internal services back online.
What Are the Takeaways From the Incident?
There are many! The first is that when the worst does happen, you are never fully prepared for it. Even with backups and plans in place, a crisis is a crisis, and you are forced to deal with the unexpected. On the bright side, this incident has pushed us to put a more robust infrastructure in place and to be far more careful when it comes to backups and backup integrity checks.
Do You Have Any Insights to Share With Others?
The only advice I can give is very basic, yet it is truly what makes the difference. You need to keep backups in an external location and perform regular integrity checks. Take some time, even if it’s just once a month, to manually check that everything is working as it is supposed to.
In our case, we had a live replication of our database in the same data center, which was great in case we were to experience an issue with one of the servers. However, we didn’t really think of the possibility of the whole data center burning down. That’s the worst-case scenario that everyone should be ready for.
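To make the idea of a regular integrity check more concrete, here is a minimal sketch in Python. It is not the tooling used at WP Rocket; it only illustrates one common approach, which is to record a SHA-256 checksum for each backup file when it is created and compare against it later. The function names, the directory layout, and the JSON manifest format are all assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir: Path, manifest_path: Path) -> dict:
    """Record a checksum for every file in backup_dir (hypothetical layout)."""
    manifest = {
        p.name: sha256_of(p)
        for p in sorted(backup_dir.iterdir())
        if p.is_file() and p != manifest_path
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_backups(backup_dir: Path, manifest_path: Path) -> list:
    """Return (file name, problem) pairs for missing or altered backups."""
    manifest = json.loads(manifest_path.read_text())
    problems = []
    for name, expected in manifest.items():
        candidate = backup_dir / name
        if not candidate.is_file():
            problems.append((name, "missing"))
        elif sha256_of(candidate) != expected:
            problems.append((name, "checksum mismatch"))
    return problems
```

In this sketch, `write_manifest` would run right after the backups are generated, and `verify_backups` would run on the copy stored in the external location; an empty list means every file still matches. A checksum only shows that a file has not changed since the manifest was written. It does not prove that the backup can actually be restored, so a periodic manual restore test is still worthwhile.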
I want to thank both our users and partners for their understanding and patience while we worked on getting everything back to normal. It has been absolutely amazing to witness the outpouring of support over the last few days. We’ve had companies reach out to us to offer help, and customers sharing kind words with us on social media. In the midst of a crisis, that sort of encouragement is invaluable.