SPO Backup Plan

Many small SPOs dream of their first block. Once that block shows up, a weight is lifted and the real fun begins, like waiting to pull those first rewards. But how many SPOs dream about a backup plan? You've tweaked your pool and she's purring like a Lambo on the autobahn. So what could go wrong? And if something does, would you know how to recover? Well, stop staring at your pool metrics and let's explore some basic backup plan options using the 3-2-1 methodology.

What Could Go Wrong?

A better question to ask is when. Anything could go wrong, and if it impacts your pool it's a big deal. You gotta be ready to deal with it as quickly as the circumstances allow. Your next block is in 1 hour and you just lost your core node. GO!

What types of disruptions can cause emergencies?

  • Natural Disaster

  • War or Terrorism

  • Civil Disruption or unrest

  • Accidents or human error

  • Cyber Attacks

What are the most important points of failure to an SPO?

  • Power

  • Internet outages

  • Network

  • Critical operational Data, secret keys, and files

  • Human error

What is a 3-2-1 Backup Plan?

Simple: 3 copies of all your stuff, with the 2 backup copies on two different media types and 1 copy completely offsite. This 3-2-1 backup plan is a great way to keep your stake pool and its essential data safe.

3 Copies of Data

Following the 3-2-1 plan, we keep three distinct copies of our stake pool's operational/production data. One copy is the current data used to operate the stake pool (i.e., keys, certs, metadata, wallets, etc.). The other two copies are backups of that operational data.

An important aspect of keeping your pool's data safe and recoverable is to store all three copies (the operational copy and the two backups) in such a manner that if one or more of the copies fails or is lost, you always have another copy safe and intact to recover from.

Lastly, it is vital to keep all your data copies updated and in sync with the current operational data. Do not update one copy and leave the other two out of sync. For example, if you update the current operational data and leave the backups out of sync, you may not be able to recover your stake pool in a crisis. All copies should contain the same data from the exact same point in time.
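One way to check that copies are in sync is to compare file hashes between the operational directory and a backup. The following is a minimal sketch, not a definitive tool: it assumes each copy is a plain directory that is reachable from the machine running the check, and the function names `digest_tree` and `out_of_sync` are made up for illustration.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def out_of_sync(operational, backup):
    """Return the relative paths that are missing from one side or differ."""
    live, copy = digest_tree(operational), digest_tree(backup)
    return sorted(k for k in live.keys() | copy.keys() if live.get(k) != copy.get(k))
```

An empty list means the two directories hold the same files with the same contents. Anything else names the files to refresh before the backup can be trusted.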

2 Media Types

For the two backup copies we use two different media types: one is a hard drive and the other is cloud-based storage. That way, if one of the copies is lost or fails, we can still recover from the other. It is recommended to keep the cloud-based backup in a different region, not near your local copies.

1 Offsite Location

In general, offsite means remote. It is safe enough if you keep at least 1 backup stored somewhere a long distance away, i.e. not onsite. Hard drives fail eventually, so a good place for the offsite copy would be a cloud drive, or a NAS or network share hosted at another location.

Physical storage may be damaged by human error, flood, or earthquake, or it may be stolen, but this rarely happens to network drives, especially cloud storage offered by well-known service providers. Believe it or not, they have stricter ways to ensure data security.
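The 3-2-1 rule described above can be written down as a small check. This is only a sketch under stated assumptions: the `Copy` record, its fields, and the media labels are hypothetical names chosen for illustration, and "two media types" is applied to the two backup copies, as in the section above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str               # hypothetical label, e.g. "live" or "usb-stick"
    media: str              # e.g. "ssd", "hdd", "usb", "cloud"
    offsite: bool           # True if stored away from the pool's location
    is_backup: bool = True  # False for the live operational copy


def satisfies_3_2_1(copies):
    """Check: at least 3 copies, backups on 2+ media types, 1+ backup offsite."""
    backups = [c for c in copies if c.is_backup]
    enough_copies = len(copies) >= 3
    two_media = len({c.media for c in backups}) >= 2
    one_offsite = any(c.offsite for c in backups)
    return enough_copies and two_media and one_offsite
```

For example, a live copy on the node's SSD, one backup on a local hard drive, and one backup in cloud storage in another region would pass. Two local hard drives and nothing offsite would not.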

Why Do I Need A Plan?

Sure, you can be that guy or gal without one, but you're introducing a lot of risk. The goal is to minimize downtime and risk. The longer your pool is down, the more risk you have of missing a block and the longer it'll take to sync back up to the chain. Once you're sitting on a solid plan you can use it to your pool's advantage. Advertise it as a means to draw delegates. Share your plan in your circle of influence so others can benefit as well. A good decentralized blockchain needs SPOs who are serious about minimizing downtime.

What Should I Backup?

What files are important to an SPO to recover from a crisis?

  • Node.vkey (cold)

  • Node.skey (cold)

  • Node.opcert.counter (cold)

  • Node.kes.vkey (hot)

  • Node.kes.skey (hot)

  • Node.opcert (hot)

  • Node.vrf.vkey (cold)

  • Node.vrf.skey (hot)

  • Payment.vkey (cold)

  • Payment.skey (cold)

  • Stake.vkey (cold)

  • Stake.skey (cold)

  • Stake.address

  • Payment.address (hot)

  • Stake.cert (hot)

  • Metadata.json

  • poolMetadataHash.txt

  • MetadataUrl

  • Pool.registration.cert

  • Deleg.cert (hot)

  • DB snapshot (backup)
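A backup is only useful if it actually contains everything on the list. Here is a rough sketch of a manifest check. The file names are copied from the list above, and it is an assumption that they match the names on your disk and that everything sits in one flat directory, so adjust `REQUIRED_FILES` to your own setup. The metadata URL and the DB snapshot are left out because they are not single files.

```python
from pathlib import Path

# Assumption: names copied from the list above; rename to match your pool.
REQUIRED_FILES = [
    "Node.vkey", "Node.skey", "Node.opcert.counter",
    "Node.kes.vkey", "Node.kes.skey", "Node.opcert",
    "Node.vrf.vkey", "Node.vrf.skey",
    "Payment.vkey", "Payment.skey", "Payment.address",
    "Stake.vkey", "Stake.skey", "Stake.address", "Stake.cert",
    "Metadata.json", "poolMetadataHash.txt",
    "Pool.registration.cert", "Deleg.cert",
]


def missing_files(backup_dir, required=REQUIRED_FILES):
    """Return the required names that are absent or empty in backup_dir."""
    root = Path(backup_dir)
    return [
        name for name in required
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]
```

Running this against each of your backup copies after every update is a cheap way to catch a copy that silently lost a key or cert.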

How Should I Backup?

The three main types of backups are full, incremental, and differential, each with its own advantages and disadvantages. We will briefly go over each one and recommend the one that is most suitable for running your stake pool operation.

Full Backup:

A full backup is a complete "point-in-time" copy of your system and the data needed for running your stake pool, written to a local and/or remote storage device. This is fine for a single stake pool operator with a limited amount of data to back up on a daily basis. It is recommended that for every stake pool you have at least one full backup of both the OS/image used on your node and your production data (keys, certs, metadata, wallets, etc.). You could just do a full backup to a USB stick, repo, or cloud server every day and be fine. You can find our full USB stick backup script and guide here to learn more. A benefit of this method is that it is the most reliable way to ensure your data is correctly and safely backed up and ready to use at a moment's notice to recover from a disaster. The main drawback of the full backup is that it uses more resources on your local or cloud servers, which may increase the cost of running the pool depending on your setup.
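As an illustration of a full backup of the production data, the sketch below packs a whole directory into a timestamped archive. It is not the USB stick script mentioned above, just a minimal example using Python's standard `tarfile` module. The function name `full_backup` and the directory layout are assumptions. The archive is not encrypted, so it should only be written to media you trust or encrypted separately before it leaves the machine.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def full_backup(source_dir, dest_dir):
    """Write a point-in-time .tar.gz of source_dir into dest_dir."""
    source = Path(source_dir)
    dest = Path(dest_dir)  # keep this outside source_dir
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name
        tar.add(source, arcname=source.name)
    return archive
```

Because every run produces a complete, self-contained archive, restoring means unpacking a single file, which is the reliability benefit described above.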

Incremental Backup:

Unlike a full backup, where you copy the entire system and its data on a scheduled basis, an incremental backup only copies the data that has changed since the last backup of any kind (full or incremental). This can be a much more efficient way to back up your data if you are a small to medium size company that needs an efficient, cost effective, reliable, and scalable backup solution. While this is a great solution for most businesses with a decent amount of data, it is not recommended for a stake pool operator with little data that changes (other than KES certs), as it may be overkill. Linux users can use incremental backup tools like Timeshift, macOS users can use rsnapshot or Time Machine, and Windows 10 users can use System Restore.
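To make the idea concrete, here is a toy sketch of an incremental pass that copies only the files whose contents changed since the previous run. It is not how Timeshift, rsnapshot, or Time Machine work internally. The function name and the JSON state file are assumptions for illustration, and the state file must live outside the source directory.

```python
import hashlib
import json
import shutil
from pathlib import Path


def incremental_backup(source_dir, dest_dir, state_file):
    """Copy files that changed since the previous run; return their paths."""
    source, dest, state_path = Path(source_dir), Path(dest_dir), Path(state_file)
    # Digests recorded by the previous run (empty on the very first run)
    previous = json.loads(state_path.read_text()) if state_path.exists() else {}
    current, copied = {}, []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source).as_posix()
        current[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
        if previous.get(rel) != current[rel]:
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(rel)
    # Remember this run so the next one only picks up newer changes
    state_path.write_text(json.dumps(current, indent=2))
    return copied
```

The first run copies everything. Later runs copy only what changed, which is why restoring needs the first copy plus every increment after it.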

Differential Backup:

Where Should I Backup?

Similar to how you should back up your data, there are three main backup policies or plans to consider: local, hybrid, and cloud.

The local backup strategy may work for some pools, but it is risky even for the smallest of pools. In an extreme event like a natural disaster, war, civil unrest, theft/robbery, or even a human error, you may lose your entire stake pool and its relevant data if you are not prepared.

The hybrid backup strategy is a combination of local and cloud backup. It can be one of the most reliable backup strategies and is the most cost effective for almost any stake pool.

Finally, we have the cloud backup strategy, which is also very reliable but is less cost effective in most cases and requires you to give up full ownership of your pool's hardware and sometimes even its data.

How To Recover?
