
Cardano-Heartbeat (CEM) 💞

Welcome to the Cardano Heartbeat (CEM)

The Cardano Heartbeat (Crisis Event Management Platform) is a Project Catalyst fund-6 proposal to develop a platform for managing the Cardano blockchain in disaster or crisis events.

This Project is a work in progress and is currently being developed by a team from the Armada Alliance organization.

View the full proposal on IdeaScale

USB Full backup

Daily backup to stick

Daily backup of your core's hot keys and operational files to a local or remote USB stick with rsync.

Source Disk Setup

Log in and open a root shell

Tail syslog before inserting your drive. This will print some information that can help you identify the disk.

Attach the external drive and take note of the assigned device node, e.g. /dev/sdb.

If the target drive has no partition table, syslog may not print the device node assignment. In that case you can print a list of drives with fdisk -l, which will still show it.

Example output:

In my case it is /dev/sdb. Yours may be /dev/sdc, /dev/sdd or so on. /dev/sda is usually the system drive. Do not format your system drive by accident.
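If the fdisk listing is long, filtering by transport can make the USB stick easier to spot. The following is only a minimal sketch: usb_disks is a made-up helper name, and it assumes your lsblk (util-linux) reports the bus in its TRAN column.

```shell
#!/bin/bash
# usb_disks is a hypothetical helper, not part of the original guide.
# It reads "NAME TRAN" lines (as printed by `lsblk -dn -o NAME,TRAN`) on stdin
# and prints the device node of every whole disk whose transport is usb.
usb_disks() {
    awk '$2 == "usb" { print "/dev/" $1 }'
}

# Typical use on the node:
#   lsblk -dn -o NAME,TRAN | usb_disks
```

Treat the result as a hint only and still compare size and model against the fdisk output before you touch the disk.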

Baremetal Core

Create a new GUID Partition Table (GPT)

This will wipe the disk

Type ? to list options

  1. Enter o for new GPT

  2. Enter n to add a new partition and accept defaults to create a partition that spans the entire disk.

  3. Enter w to write changes to disk and exit gdisk.

Your new partition can be found at /dev/sdb1, the first partition on sdb.

Optionally check the drive for bad blocks (takes a couple of hours)

Format the partition as ext4

Make the USB backup drive always available to our backup job. Since it will hold sensitive data, we will mount it so that only root and the user that cardano-node runs as can access it.

Run blkid and pipe it through awk to get the UUID of the filesystem we just created.

Example output:

In my case the UUID is 55e3346a-a7ba-4b60-bd68-fa8f86b8f8ca.

Drop back into your regular user's shell.

Add a mount entry to the bottom of fstab with your new partition's UUID and the full system path to your backup folder. For this guide we set the path to a folder we will create in our home directory: /home/<user>/core-backup

Replace <user> with the user cardano-node runs as.

nofail allows the server to boot if the drive is not inserted.
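To avoid copy and paste mistakes, the fstab entry can be assembled from the UUID that blkid reports. This is a sketch under assumptions: fstab_entry is a hypothetical helper name, and the mount options simply mirror the ones used in this guide.

```shell
#!/bin/bash
# fstab_entry is a hypothetical helper, not part of the original guide.
# It prints an fstab line for an ext4 backup partition.
#   $1 = filesystem UUID
#   $2 = absolute path of the mount point
fstab_entry() {
    printf 'UUID=%s %s ext4 nosuid,nodev,nofail 0 1\n' "$1" "$2"
}

# On the node you could feed it the real UUID (blkid needs root):
#   fstab_entry "$(sudo blkid -s UUID -o value /dev/sdb1)" "$HOME/core-backup"
```

Review the printed line before appending it to /etc/fstab by hand.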

Create the mountpoint and set the default permission mask for new files and folders with umask.

Mount the drive.

Take ownership of the filesystem.

Reboot the server and confirm the system mounted the drive at boot.
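One way to confirm the mount after the reboot is the mountpoint utility from util-linux. The sketch below assumes that tool is installed; backup_mounted is a made-up name used only for illustration.

```shell
#!/bin/bash
# backup_mounted is a hypothetical helper, not part of the original guide.
# mountpoint -q exits 0 only when the given path is an active mount point.
backup_mounted() {
    if mountpoint -q "$1"; then
        echo "mounted: $1"
    else
        echo "not mounted: $1"
        return 1
    fi
}

# After the reboot:
#   backup_mounted "$HOME/core-backup"
```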

Remote core to local machine exFAT

Create a new GUID Partition Table (GPT)

This will wipe the disk

Type ? to list options

  1. Enter o for new GPT

  2. Enter n to add a new partition and accept defaults to create a partition that spans the entire disk.

  3. Enter w to write changes to disk and exit gdisk.

Set the msftdata flag on the exFAT partition (also taken from Thawn's answer). Since we have only one partition, apply the command to partition 1.

Your new partition can be found at /dev/sdb1, the first partition on sdb.

Optionally check the drive for bad blocks (takes a couple of hours)

Mount the drive at boot

We want this drive to always be available to our backup job. Since it will hold sensitive data, we will mount it so that only root and the user that cardano-node runs as can access it.

Run blkid and pipe it through awk to get the UUID of the filesystem we just created.

Example output:

In my case the UUID is 7FFD-F67C.

Drop back into your regular user's shell.

Add a mount entry to the bottom of fstab with your new partition's UUID and the full system path to your backup folder. For this guide we set the path to a folder we will create in our home directory: /home/<user>/core-backup

Identify your user ID and group ID and substitute them for <xxx> in the fstab entry.

nofail allows the server to boot if the drive is not inserted.

Create the mountpoint and set the default permission mask for new files and folders with umask.

Scheduled Backups

Backup what you want with Rsync as frequently as you want.

Create a script that will only backup if the drive is mounted.

Create an exclude-list.txt file so rsync can rip through, grab everything we need, and skip the rest.
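Before trusting an exclude list, it can help to preview what rsync would copy. The sketch below uses rsync's dry-run mode; preview_backup is a hypothetical helper name and the paths in the usage line are placeholders.

```shell
#!/bin/bash
# preview_backup is a hypothetical helper, not part of the original guide.
# -n makes rsync a dry run, so nothing is written to the destination.
# --out-format='%n' prints one path per item that would be transferred.
#   $1 = source, $2 = destination, $3 = exclude list file
preview_backup() {
    rsync -an --out-format='%n' --exclude-from="$3" "$1" "$2"
}

# Example:
#   preview_backup "$HOME/" "$HOME/core-backup/" "$HOME/exclude-list.txt"
```

If a file you expected is missing from the output, or a large folder such as the chain database shows up, adjust the exclude list and preview again.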

Setup Cron

Open crontab and add the rule to the bottom.

Optional backup alias with mount check

Create an alias in .bashrc (or .adaenv, if present) to back up the core manually.

Add the following at the bottom, edit the paths and excludes as you see fit, and source the changes.

Now if you want to manually back up the hot keys, just type core-backup, for example after generating a new KES pair and node.cert.

sudo su
tail -f /var/log/syslog
fdisk -l
Disk /dev/sdb: 57.66 GiB, 61907927040 bytes, 120913920 sectors
Disk model: Cruzer
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EECA81B9-3683-4A59-BC63-02EEDC04FD21
gdisk /dev/sdb
Command (? for help): ?
b	back up GPT data to a file
c	change a partition's name
d	delete a partition
i	show detailed information on a partition
l	list known partition types
n	add a new partition
o	create a new empty GUID partition table (GPT)
p	print the partition table
q	quit without saving changes
r	recovery and transformation options (experts only)
s	sort partitions
t	change a partition's type code
v	verify disk
w	write table to disk and exit
x	extra functionality (experts only)
?	print this menu
badblocks -c 10240 -s -w -t random -v /dev/sdb
sudo mkfs.ext4 -L core-backup /dev/sdb1
sudo blkid /dev/sdb1 | awk -F'"' '{print $4}'
55e3346a-a7ba-4b60-bd68-fa8f86b8f8ca
exit
sudo nano /etc/fstab
UUID=55e3346a-a7ba-4b60-bd68-fa8f86b8f8ca /home/<user>/core-backup ext4 nosuid,nodev,nofail 0 1
cd; mkdir -p "$HOME/core-backup"; umask 022
sudo mount $HOME/core-backup
sudo chown -R $USER:$USER $HOME/core-backup
sudo gdisk /dev/sdb
Command (? for help): ?
b	back up GPT data to a file
c	change a partition's name
d	delete a partition
i	show detailed information on a partition
l	list known partition types
n	add a new partition
o	create a new empty GUID partition table (GPT)
p	print the partition table
q	quit without saving changes
r	recovery and transformation options (experts only)
s	sort partitions
t	change a partition's type code
v	verify disk
w	write table to disk and exit
x	extra functionality (experts only)
?	print this menu
sudo apt install exfatprogs

sudo mkfs.exfat -L core-backup /dev/sdb1
sudo parted /dev/sdb
set 1 msftdata on
q
badblocks -c 10240 -s -w -t random -v /dev/sdb
sudo blkid /dev/sdb1 | awk -F'"' '{print $4}'
7FFD-F67C
exit
id $USER
sudo nano /etc/fstab
UUID=7FFD-F67C /home/<user>/core-backup exfat defaults,auto,nofail,uid=<xxx>,gid=<xxx> 0 0
cd; mkdir -p "$HOME/core-backup"; umask 022
nano $HOME/core-backup-script.sh
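#!/bin/bash

# Local source
SOURCE="<path to your NODE_HOME>"

# Remote source: to pull from a remote core instead, replace the rsync line below with
# rsync -avi -e "ssh -i $HOME/.ssh/<private key>" --exclude-from="$HOME/exclude-list.txt" <user>@<server name or IP>:<path to your NODE_HOME> "$DESTINATION"

DESTINATION="<Path to your mounted USB stick>"

# Only run rsync when the backup drive is mounted
if grep -qs "$HOME/core-backup" /proc/mounts; then
   echo "Executing Rsync"
   # Absolute path to the exclude list so the script also works from cron
   rsync -av --exclude-from="$HOME/exclude-list.txt" "$SOURCE" "$DESTINATION"
else
   echo "Core backup drive is not mounted."
fi
exit 0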
chmod +x $HOME/core-backup-script.sh
cd; nano exclude-list.txt
.bash_history
.bash_logout
.bashrc
.cache
.config
.local/bin/cardano-node
.local/bin/cardano-service
.profile
.selected_editor
.ssh
.sudo_as_admin_successful
.wget-hsts
git
tmp
pi-pool/db
pi-pool/scripts
pi-pool/logs
usb-transfer
core-backup-script.sh
exclude-list.txt
crontab -e
# Replace with correct path to your pools working directory
#
# run 3am every day
0 3 * * * $HOME/core-backup-script.sh
cd; nano .bashrc
# .bashrc is sourced, so never call exit here: it would close your shell.
if grep -qs "$HOME/core-backup " /proc/mounts; then
    echo "Core backup drive is mounted."
    alias core-backup="rsync -a --exclude={'db/','scripts/','logs/'} $NODE_HOME $HOME/core-backup/"
else
    echo "Core backup drive is not mounted."
fi
source .bashrc
core-backup

Custom Dashboard

Here we provide a custom Grafana dashboard that monitors a specific Stake Pool and the Cardano Blockchain. We have made the dashboard available on GitHub.

Follow this link to see the dashboard live in action!

THIRA Threat Assessment

Our THIRA threat assessment report can be found here.

Here you can see the full THIRA spreadsheet used to create our report.

THIRA video overview

Resiliency Checklist

To determine the skills, hardware, and good practices a Stake Pool Operator needs to be more resilient to unforeseen events that may take their pool off the network, we have broken the checklist into the following sections: Stake Pool Operations Recommended Skills and Resources, Resilience Options, and Redundancy.

Stake Pool Operator Skills

Stake Pool Operation

Improved Resilience Options

Redundancy (data, software, infrastructure, and Hardware)

  • Data and Software Backup

Power Supply
  • Main stake pool/hardware bare metal in a secure location
  • Hardware

  • Internet

  • Power supply

    • Failover

  • Secondary cloud based ISP (AWS, Azure, GCP, etc)
SPO Backup Plan

    Many small SPOs dream of their first block. Then after that block shows up a weight is lifted and the real fun begins, like waiting to pull those first rewards. But how many SPOs dream about a backup plan? You've tweaked your pool and she's purring like a lambo on the autobahn. So what could go wrong? And if something does, would you know how to recover? Well, stop staring at your pool metrics and let's explore some basic backup plan options using the 3-2-1 methodology.

    What Could Go Wrong?

    A better question to ask is when. Anything could go wrong and if it impacts your pool it's a big deal. You gotta be ready to deal with it as quickly as possible given the circumstances at hand. Your next block is in 1 hour and you just lost your core node, GO!

    What types of disruptions can cause emergencies?

    • Natural Disaster

    • War or Terrorism

    • Civil Disruption or unrest

    What are the most important points of failure to an SPO?

    • Power

    • Internet outages

    • Network

    What is a 3-2-1 Backup Plan?

    Simple: 3 copies of all your stuff, where 2 copies are on two different media types and 1 copy is completely offsite. This 3-2-1 backup plan is a great way to keep your Stake Pool and its essential data safe.

    3 Copies of Data

    Following this 3-2-1 plan we have three distinct copies of our stake pool's operational/production data. One copy is the current data used for stake pool operation (i.e., keys, certs, metadata, wallets, etc.). The other two copies are backups of the pool's operational data.

    An important aspect of keeping your pool's data safe and recoverable is to store all three copies (the operational copy and the two backups) in such a manner that if one or more of the copies fails or is lost, you always have another copy safe and intact to recover from.

    Lastly, it is vital to keep all your data copies updated and in sync with the current operational data; do not update one copy and leave the other two out of sync. For example, if you update the current operational data and leave the backups out of sync, you will not be able to recover your stake pool in a crisis. All copies should contain the same data from the same exact point in time.
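One way to check that two copies really hold the same data is to compare checksums of every file. This is only a sketch, assuming GNU coreutils and findutils are available; copies_in_sync is a hypothetical name, not something from this guide.

```shell
#!/bin/bash
# copies_in_sync is a hypothetical helper, not part of the original guide.
# It hashes every regular file under two directories with sha256sum and
# succeeds only if both trees hold the same relative paths and contents.
#   $1 = first copy, $2 = second copy
copies_in_sync() {
    local a b
    a=$(cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
    b=$(cd "$2" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
    [ "$a" = "$b" ]
}

# Example:
#   copies_in_sync "$NODE_HOME" "$HOME/core-backup" || echo "backup is out of sync"
```

A check like this only compares file contents and paths. It will not tell you which copy is the current one.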

    2 Media Types

    For the two backup copies we use two different media types: one is a hard drive and the other is cloud-based storage. That way, if one of the copies is lost or fails, we can still recover from the other. It is recommended to keep the cloud-based backup in a different region, away from your local copies.

    1 Offsite Location

    In general, offsite means remote. It is enough to keep at least 1 backup stored in another place a long distance away, i.e. not onsite. Hard drives fail eventually, so a good place for the offsite copy would be a cloud drive, NAS, or network share.

    Physical storage may be damaged by human error, flood, or earthquake, or stolen, but that hardly ever happens to network drives, especially cloud storage offered by well-known service providers. Believe it or not, they have stricter ways to ensure data security.

    Why Do I Need A Plan?

    Sure, you can be that guy or gal, but you're introducing a lot of risk. The goal is to minimize downtime and risk. The longer your pool is down, the more risk you have of missing a block and the longer it'll take to sync back up to the chain. Once you're sitting on a solid plan you can use it to your pool's advantage. Advertise it as a means to draw delegates. Share your plan in your circle of influence so others can benefit as well. A good decentralized blockchain needs SPOs who are serious about minimizing downtime.

    What Should I Backup?

    What files are important to an SPO to recover from a crisis?

    • Node.vkey (cold)

    • Node.skey (cold)

    • Node.opcert.counter (cold)

    How Should I Backup?

    The three main types of backups are full, incremental, and differential, each with its own advantages and disadvantages. We will briefly go over each one and recommend the one that is most suitable for running your stake pool operation.

    Full Backup:

    A full backup is a complete "point-in-time" copy of your system and the data needed for running your stake pool to a local and/or remote storage device. This is fine for a single stake pool operator with a limited amount of data to back up on a daily basis. It is recommended that for every stake pool you have at least one full backup of both the OS/image used on your node and your production data (keys, certs, metadata, wallets, etc.). You could just do a full backup to a USB stick, repo, or cloud server every day and be fine; see our full USB stick backup script and guide to learn more. A benefit of this method is that it is the most reliable way to ensure your data is correctly and safely backed up, ready to be used at a moment's notice to recover from a disaster. The main drawback of the full backup is that it uses more resources on your local or cloud servers, which may increase the cost of running the pool depending on your setup.
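As a rough illustration of a full, point-in-time copy, the sketch below packs a whole directory into one dated archive with tar. full_backup and the archive naming are assumptions made for this example; the USB guide itself uses rsync instead.

```shell
#!/bin/bash
# full_backup is a hypothetical helper, not part of the original guide.
# It writes one dated, gzip-compressed tar archive of everything under the
# source directory and prints the path of the archive it created.
#   $1 = directory to back up
#   $2 = directory that receives the archive (must not be inside $1)
full_backup() {
    local archive
    archive="$2/full-backup-$(date +%Y-%m-%d).tar.gz"
    tar -czf "$archive" -C "$1" . && echo "$archive"
}

# Example:
#   full_backup "$NODE_HOME" "$HOME/core-backup"
```

Because each run produces a complete archive, storage use grows with every backup you keep, which is the resource cost described above.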

    Incremental Backup:

    Unlike a full backup, where you copy the entire system and its data on a schedule, an incremental backup only copies the data that has changed since the last backup of any type. This can be a much more efficient way to back up your data if you are a small to medium-sized company that needs efficient, cost-effective, reliable, and scalable backup solutions. While this is a great solution for most businesses with a decent amount of data, for a stake pool operator with little data that changes (other than KES certs) it is not recommended, as it may be overkill. Linux users can use incremental backup tools like Timeshift, macOS users can use rsnapshot or Time Machine, and Windows 10 users can use System Restore.
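For readers curious how an incremental snapshot can work with the rsync tool already used in this guide, here is a minimal sketch based on rsync's --link-dest option. incremental_backup and the snapshot folder names are hypothetical, and this is not the method the guide recommends for a stake pool.

```shell
#!/bin/bash
# incremental_backup is a hypothetical helper, not part of the original guide.
# With --link-dest, files that are unchanged since the previous snapshot are
# hard-linked instead of copied, so every snapshot looks complete on disk
# while only changed files take up new space.
#   $1 = source directory
#   $2 = previous snapshot (absolute path)
#   $3 = new snapshot to create
incremental_backup() {
    rsync -a --link-dest="$2" "$1/" "$3/"
}

# Example:
#   incremental_backup "$NODE_HOME" /mnt/backup/day1 /mnt/backup/day2
```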

    Differential Backup:

    Where Should I Backup?

    Similar to how you should back up your data, there are three main backup policies or plans that you should consider: local, hybrid, and cloud backup.

    The local backup strategy may work for some pools, but it is risky even for the smallest of pools: in an extreme event like a natural disaster, war, civil unrest, theft/robbery, or even human error, you may lose your entire stake pool and its relevant data if you are not prepared.

    The hybrid backup strategy is a combination of local and cloud backup. It can be one of the most reliable backup strategies and is the most cost effective for almost any stake pool.

    Finally, we have the cloud backup strategy, which is a very reliable backup strategy as well but less cost effective in most cases and requires you to give up full ownership of your pool's hardware and sometimes even data.

    How To Recover?

    Further disruptions that can cause emergencies:

    • Accidents or human error

    • Cyber Attacks

    Further points of failure for an SPO:

    • Critical operational Data, secret keys, and files

    • Human error

    Further files to back up:

    • Node.kes.vkey (hot)

    • Node.kes.skey (cold)

    • Node.opcert (hot)

    • Node.vrf.vkey (cold)

    • Node.vrf.skey (cold)

    • Payment.vkey (cold)

    • Payment.skey (cold)

    • Stake.vkey (cold)

    • Stake.skey (cold)

    • Stake.address

    • Payment.address (hot)

    • Stake.cert (hot)

    • Metadata.json

    • poolMetadataHash.txt

    • MetadataUrl

    • Pool.registration.cert

    • Deleg.cert (hot)

    • DB snapshot (backup)

      • Network Configs

        • ufw/iptables

          • sudo ufw status numbered

          • sudo iptables -S

        • wireguard config

          • /etc/wireguard/wg0.conf

          • /root/wg

        • Router config/snapshot

      • Pool Configs

        • mainnet-config.json

        • mainnet-alonzo-genesis.json

        • mainnet-byron-genesis.json

        • mainnet-shelley-genesis.json

        • mainnet-topology.json

      • Binaries

        • cardano-cli

        • cardano-node

      • Tools and Monitoring

        • gLiveView.sh

        • env

        • cardano-service (armada alliance optional)

        • armadaPing.sh (armada alliance optional)

        • topologyUpdater.sh

    Incremental backup tools: Timeshift, rsnapshot, Time Machine, System Restore

    References

    • Jeff Greenling's Backup Plan

    • msp360.com