Introduction

This blog is a follow-up to my previous post: MariaDB Repair Replication through DB Restore. A client recently experienced a crash on the database slave in their MariaDB cluster within their QA environment. I successfully restored the database using the procedure detailed in the earlier blog. However, a few weeks later, the database slave crashed again, this time in their production environment. I attempted to use the same mysqldump procedure, but the restore process was exceedingly slow (the exact cause of which is still under investigation). We decided to halt the restore and instead used a physical backup made with the mariabackup tool. This approach significantly sped up the restoration process, leading me to write this new blog, which explores an alternative solution with the same goal.

In this comprehensive guide, we will dive into the process of restoring a MariaDB database and repairing replication using mariabackup.

Prerequisites

Before starting the restoration and replication repair process, it’s important to make sure you have all the necessary preparations in place:

  • Access to the MariaDB server with appropriate privileges.
  • Basic understanding of SQL commands for database management.

Step 1: Control the status of the master

Before initiating a backup with Mariabackup, it’s crucial to ensure the instance’s health and verify the availability of binary log files. Checking the instance’s health confirms that the database is functioning properly and is in a stable state for backup. Additionally, confirming the availability of binary log files ensures that all transactions are properly logged, enabling point-in-time recovery and accurate replication. This proactive approach safeguards against potential data inconsistencies and ensures the reliability of the backup process.

MariaDB [(none)]> show master status \G
***** 1. row *****
File: mariadb-bin.014290
Position: 234461992
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.000 sec)

MariaDB [(none)]> show binary logs;
+--------------------+-----------+
| Log_name | File_size |
+--------------------+-----------+
| mariadb-bin.014288 | 4871759 |
| mariadb-bin.014289 | 222270 |
| mariadb-bin.014290 | 234497590 |
+--------------------+-----------+
3 rows in set (0.000 sec)

Step 2: Create and prepare the backup

Before starting, it’s recommend to have a user with the correct permissions to run mariabackup commands. If you don’t have it yet, please create it. You can find the documentation about it there https://mariadb.com/kb/en/mariabackup-overview/

CREATE USER 'mariabackup'@'localhost' IDENTIFIED BY 'mypassword';
GRANT RELOAD, PROCESS, LOCK TABLES, BINLOG MONITOR ON *.* TO 'mariabackup'@'localhost';

We will start by creating a backup with the mariabackup command. In this case, I’m creating my backup file in a shared directory accessible from the two servers of my cluster.

root@srv1:~# nohup time /usr/bin/mariabackup --backup --target-dir=${TARGET_DIR} --user=mariabackup --password="${PASSWORD}" >> ${BCKLOG_DIR} 2>>${BCKLOG_DIR}

After the backup creation, we need to prepare the backup. Mariabackup generates data files in the target directory, but they lack point-in-time consistency due to being copied at various times during the backup process. Attempting to restore from these files triggers InnoDB to detect inconsistencies and prevent potential corruption by crashing. To ensure data file consistency for restoration, you must first prepare the backup using the –prepare option.

root@srv1:~# nohup time /usr/bin/mariabackup --prepare --target-dir=${TARGET_DIR} >> ${BCKLOG} 2>>${BCKLOG}

You can monitor the status of your command execution for example by watching over the running processes.

root@srv1:~# ps -aux | grep back
root       110  0.0  0.0      0     0 ?        I<    2023   0:00 [writeback]
root      6282  2.6  0.0 2988760 135736 pts/1  Dl+  11:21   1:42 /usr/bin/mariabackup --prepare --target-dir=/backup/2024-05-21/
root     14258  0.0  0.0  13144  1080 pts/3    S+   12:26   0:00 grep --color=auto back

Step 3: Stop the slave instance

Mariadb service must be stopped before initiating any restoration activities with mariabackup. With the MariaDB service stopped, there are no file locks on the database files. This allows the restoration process to replace or modify database files without encountering issues related to file access permissions or locks.

root@srv2:~# systemctl stop mariadb.service
root@srv2:~# mysql
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

Step 4: Restore the backup

You have multiple way to restore a backup made with mariabackup: “mariabackup –copy-back”, “–move-back”, “cp”, “rsync”, … I let you find more information about this in the official documentation https://mariadb.com/kb/en/full-backup-and-restore-with-mariabackup/#restoring-the-backup.

In our case, we will use rsync, so we don’t need to delete the content of our datadir. Let’s start with the restore.

root@srv2:~# nohup time rsync -ra /backup/2024-05-21/*  /var/lib/mysql/

When restoring a database with mariabackup, it maintains file permissions but writes files as the restoring user. You may need to change the data directory ownership to mysql (both user and group). To do this recursively, use:

root@srv2:~# chown -R mysql:mysql /var/lib/mysql/

Step 5: Retrieve needed replication information

Before restarting the mariadb service, we are going to retrieve information needed to restart the replication process properly. The content of xtrabackup_binlog_info is used to retrieve information about the binary log file and the position within that log file at the time the backup was taken. The output typically looks like this:

root@srv2:/var/lib/mysql# cat xtrabackup_binlog_info
mariadb-bin.014290      253051056       0-1-7612720606

Step 6: Restart the service

You can now restart the mariadb service, then check that the instance has started properly.

root@srv2:~# systemctl start mariadb.service
root@srv2:~# systemctl status mariadb.service

Step 7: Restart the replication

Once the mariadb service has been properly restarted, we can now setup replication again. We first are going to clear the replica’s position in the master’s binary log for a fresh start using the reset slave statement. It is use for a clean and fresh start, it deletes the master.info and relay-log.info files, all relay log files, and starts a new relay log file. Ensure that the replica threads are stopped first.

MariaDB [(none)]> stop slave;
Query OK, 0 rows affected (0.042 sec)

MariaDB [(none)]> reset slave;
Query OK, 0 rows affected (0.031 sec)

Before restarting the slave again, we need to ensure that replication resumes from the correct point to avoid data inconsistencies. We have to use the information from the file xtrabackup_binlog_info to configure a replication slave to start replication from a specific point in the master’s binary log. By using this information, the replication slave can start replicating from the exact point where the backup was taken, ensuring seamless data continuity and consistency.

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='srv1',MASTER_USER='replication_user',MASTER_PASSWORD='$PASSWORD';
MariaDB [(none)]> CHANGE MASTER TO MASTER_LOG_FILE='mariadb-bin.014290', MASTER_LOG_POS=253051056;

After these steps, it’s now time to start the slave again.

MariaDB [(none)]> start slave;

Step 8: Verify Replication

To ensure that replication is functioning correctly, verify the replication status. Monitoring parameters such as “Slave_IO_Running” and “Slave_SQL_Running” helps in identifying any issues and ensuring smooth replication operation. You can also control “second_behind_master” which is useful to know if the replica has catch up with the master. (“Difference between the timestamp logged on the master for the event that the replica is currently processing, and the current timestamp on the replica. “)

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: srv1
                   Master_User: replication_user
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.014290
           Read_Master_Log_Pos: 342078122
                Relay_Log_File: relay-bin.000002
                 Relay_Log_Pos: 10544947
         Relay_Master_Log_File: mariadb-bin.014290
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
... ...
         Seconds_Behind_Master: 6985

Conclusion

By carefully following these steps, you can resolve issues with your MariaDB replication, ensuring your data remains secure and your database operates reliably. Regular monitoring of your database and performing consistent backups will help you prevent problems and maintain smooth operations.

When it comes to backup solutions, both mysqldump and mariabackup have their advantages and disadvantages. mysqldump is straightforward and easy to use, making it a good choice for smaller databases or less complex setups. It generates logical backups, which are human-readable SQL files that can be easily edited and restored.

On the other hand, mariabackup is designed for more complex scenarios, particularly with larger databases. It creates physical backups, which are faster and more efficient for large volumes of data, and it supports point-in-time recovery, making it ideal for minimizing downtime and ensuring data integrity. By understanding the strengths and limitations of each tool, you can choose the one that best fits your specific needs and ensure your database remains robust and reliable.

In case mysqldump suits you more, don’t forget to check my other blog. I hope you find these blogs helpful and would love to hear your feedback! 🙂