Restore Cassandra data
Scenario: We take a backup (a snapshot of the data) from each node of the cluster, named node-1-data-DD-MM-YY for node 1, and upload it to storage (in our case, Azure Blob Storage). The following steps describe how to restore the data.
Note: This works only on a cluster with the same number of nodes. If you have a 7-node cluster, you need a 7-node cluster to restore the data. The VMs don't need to be the same size, though. For example, if a single node in prod runs on a 16-core 64G machine, a 2-core 4G machine is enough to restore the data. But the node count must be the same.
Note: To keep the procedure clear, I am assuming that you are restoring a 7-node cluster backup. The Cassandra directory is /var/lib/cassandra/ and the Cassandra configuration file is /etc/cassandra/cassandra.yaml.
Steps:
Download the data for each node onto its corresponding new node: node-1 data onto the new node-1 machine, node-2 data onto the new node-2 machine, and so on.
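As a small illustration of the naming convention from the scenario, the sketch below builds the backup name for a given node and date. The function name backup_name and the strict DD-MM-YY shape check are assumptions made for this example; they are not part of the original procedure.

```shell
# Hypothetical helper: print the backup name for a node, e.g. node-3-data-05-03-24.
# Usage: backup_name <node-number> <DD-MM-YY>
backup_name() {
  node="$1"
  stamp="$2"
  # The node index must be a non-empty string of digits.
  case "$node" in
    ''|*[!0-9]*) echo "node must be a number: $node" >&2; return 1 ;;
  esac
  # The date stamp must look like DD-MM-YY (shape check only, not a calendar check).
  case "$stamp" in
    [0-3][0-9]-[01][0-9]-[0-9][0-9]) ;;
    *) echo "date must be DD-MM-YY: $stamp" >&2; return 1 ;;
  esac
  printf 'node-%s-data-%s\n' "$node" "$stamp"
}
```

On the new node-3 machine, for instance, backup_name 3 05-03-24 gives the name of the backup to fetch from storage for that node.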
Run the following steps on all nodes:
Stop the Cassandra cluster:
sudo systemctl stop cassandra
Remove all data:
sudo rm -rf /var/lib/cassandra/*
In every backup folder, there is a tokenring.txt file, which contains the token ring for that node. Copy the content of that file and paste it into /etc/cassandra/cassandra.yaml as initial_token: <data copied>.
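The copy-and-paste step can also be scripted. The following is a minimal sketch, assuming tokenring.txt holds the tokens on a single comma-separated line; the function name set_initial_token is hypothetical. It removes any existing initial_token line (commented out or not) and appends the restored value.

```shell
# Hypothetical helper: write the saved tokens into cassandra.yaml as initial_token.
# Usage: set_initial_token <tokenring.txt> <cassandra.yaml>
set_initial_token() {
  token_file="$1"
  yaml_file="$2"
  # Assumption: tokenring.txt holds the tokens on a single comma-separated line.
  tokens="$(cat "$token_file")"
  if [ -z "$tokens" ]; then
    echo "no tokens found in $token_file" >&2
    return 1
  fi
  tmp_file="$(mktemp)"
  # Drop any existing initial_token line (commented out or not) ...
  grep -Ev '^[[:space:]]*#?[[:space:]]*initial_token:' "$yaml_file" > "$tmp_file" || true
  # ... and append the restored value.
  printf 'initial_token: %s\n' "$tokens" >> "$tmp_file"
  # Overwrite in place so the file keeps its owner and mode.
  cat "$tmp_file" > "$yaml_file"
  rm -f "$tmp_file"
}
```

On a real node this would be called with the path to tokenring.txt in your backup folder and /etc/cassandra/cassandra.yaml, from a root shell, since the configuration file is normally not writable by ordinary users.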
Ref initial_token: https://docs.datastax.com/en/cassandra-oss/2.1/cassandra/configuration/configCassandra_yaml_r.html#configCassandra_yaml_r__initial_token
Change the ownership of the Cassandra directory:
sudo chown -R cassandra:cassandra /var/lib/cassandra
Start Cassandra:
sudo systemctl start cassandra
After some time, check nodetool status; all nodes should be in UN status.
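The UN check can be automated by parsing the nodetool status output. The sketch below is one way to do it, under the assumption that each node line starts with a two-letter state code such as UN or DN, which is the usual layout of that output. The function name all_nodes_un is hypothetical.

```shell
# Hypothetical helper: read `nodetool status` output on stdin and succeed only
# if exactly <expected> nodes are listed and every one of them is UN.
# Usage: nodetool status | all_nodes_un 7
all_nodes_un() {
  expected="$1"
  awk -v expected="$expected" '
    # Node lines have a two-letter state code (Up/Down + Normal/Leaving/Joining/Moving)
    # as their first field; header lines do not.
    $1 ~ /^[UD][NLJM]$/ { total++; if ($1 == "UN") up_normal++ }
    END { exit !(total + 0 == expected + 0 && up_normal + 0 == expected + 0) }
  '
}
```

For the 7-node example, nodetool status | all_nodes_un 7 returns success once all seven nodes report UN, so it can be polled in a loop until the cluster has settled.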
Run on only one node:
Restoring schema: On any node, go to backupfolder/cassandra_backup/, where there will be a db_schema.sql file.
To restore the schema, run: cqlsh -f db_schema.sql
Once that operation is done, follow the steps below.
Run on all nodes:
Stop Cassandra:
sudo systemctl stop cassandra
Download the restoration script
Restore the data:
sudo python3 ./cassandra_restore_v2.py --snapshotdir /path/to/backup_dir/cassandra_backup --datadirectory /var/lib/cassandra/data
It runs a series of operations and copies the data into the Cassandra data folder.
Note: These copy operations are hard links, so you don't need double the disk space for the data.
Check the data size:
du -sh /var/lib/cassandra/data
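To illustrate why hard links don't double the disk usage: a hard link is a second directory entry for the same inode, so both names share one copy of the data. The sketch below uses temporary files with made-up names rather than real SSTables, and it is not taken from the restoration script.

```shell
# Illustration only: the file names below are made up, not real Cassandra files.
demo_dir="$(mktemp -d)"
printf 'pretend this is an SSTable\n' > "$demo_dir/snapshot.db"

# Create a hard link: a second name for the same inode, no data is copied.
ln "$demo_dir/snapshot.db" "$demo_dir/restored.db"

# -ef is true when both paths refer to the same device and inode.
if [ "$demo_dir/snapshot.db" -ef "$demo_dir/restored.db" ]; then
  link_status="same inode"
else
  link_status="separate copies"
fi
echo "$link_status"
```

Hard links cannot cross filesystems, so this space saving presumably relies on the backup directory and /var/lib/cassandra/data being on the same filesystem.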
Change the ownership of the data:
sudo chown -R cassandra:cassandra /var/lib/cassandra/*
Start Cassandra:
sudo systemctl start cassandra
Note: If any of the Cassandra nodes didn't start, stop and start Cassandra on that node again. This can happen because all the nodes were started at the same time.
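The stop-and-start from the note above can be wrapped in a small retry loop. This is a sketch only: retry_until and restart_cassandra are hypothetical helper names, and the 60-second wait before checking the service is an arbitrary assumption that should be tuned to how long your nodes take to come up.

```shell
# Hypothetical helper: run a command up to <attempts> times, pausing <delay>
# seconds after each failure. Returns 0 as soon as the command succeeds.
# Usage: retry_until <attempts> <delay-seconds> <command> [args...]
retry_until() {
  attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical wrapper around the stop/start from the note above.
# The 60-second wait before checking is an arbitrary assumption.
restart_cassandra() {
  sudo systemctl stop cassandra
  sudo systemctl start cassandra
  sleep 60
  systemctl is-active --quiet cassandra
}
```

On a node that failed to start, retry_until 3 10 restart_cassandra would try the stop/start cycle up to three times and report failure if the service is still not active afterwards.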