Clickhouse备份与恢复 异机远程备份脚本
Clickhouse 的数据备份有多种方式,其官方网站上就提供了多种备份方式可以参考,但是不同公司有不同的可用资源和业务需求,因此不存在一个通用的解决方案可以应对各种情况下的ClickHouse备份和恢复。 适用于 1GB 数据的方案可能并不适用于几十 PB 数据的情况。 有多种具备各自优缺点的可能方法,将在下面对其进行讨论。最好使用几种方法而不是仅仅使用一种方法来弥补它们的各种缺点。
背景:
某项目clickhouse 数据库迁移,数据量大约9亿条单表,版本为clickhouse20,迁移前我先在本地做了一些列的备份测试,下面我把我在生产中选用的备份方法和遇到的种种小插曲和大家分享下
实验环境:
官网下载的demo数据,大约有8G的测试数据
clickhouse 20.4.6
一、 文本文件导入导出
#说白了就是手动的把数据库里的数据导出成特定的格式,再导入;这种方式!!不推荐!!!!除非你的数据库真的很少,我当时从官网下载了一个demo 数据,手动导入导出,差点把我累死
导出:
clickhouse-client --password helloword --query="select * from iot_dev.t_indicator_record FORMAT CSV" > iot_dev.t_indicator_record.csv导入: 注意FORMAT后面大写
cat iot_dev.t_indicator_record.csv | clickhouse-client --port 9008 --password helloword --query="INSERT INTO iot_dev.t_indicator_record FORMAT CSV"二、 拷贝数据目录
#此方案和mysql 的冷数据恢复是一个方案,直接拷贝走clickhouse 的数据到另一台机器上,修改下相关配置就可以直接启动了,仔细的观察一下 ClickHouse 在文件系统上的目录结构(配置文件
/ect/clickhouse-server/config.xml 里面配置的 ),为了便于查看,只保留了 data 和 metadata 目录。
. ├── data │ ├── default │ ├── system │ │ ├── asynchronous_metric_log │ │ ├── metric_log │ │ ├── query_log │ │ ├── query_thread_log │ │ └── trace_log ├── metadata │ ├── default │ │ └── v_table_size.sql │ ├── default.sql │ ├── system │ │ ├── asynchronous_metric_log.sql │ │ ├── metric_log.sql │ │ ├── query_log.sql │ │ ├── query_thread_log.sql │ │ └── trace_log.sql
基于这个信息,直接把data和metadata目录(要排除 system 库)复制到新集群,即可实现数据迁移
实操步骤:
1、停止原先的clickhouse数据库,并打包好 对应数据库或表的 data 和 metadata 数据 2、拷贝到目标clickhouse数据库对应的目录,比如/var/lib/clickhouse 目录下 3、给clickhouse 赋予权限, chown -Rf clickhouse:clickhouse /var/lib/clickhouse/* chown -Rf clickhouse:clickhouse /var/lib/clickhouse 4、重启目标clickhouse数据库 5、验证数据 select count(1) form iot_dev.t_indicator_record;三、 使用第三方工具,clickhouse-backup
此方案,是我在测试的过程中使用最方便的备份与恢复工具,因此我会详细的说下此工具的使用方法及需要注意的地方。
clickhouse-backup 是社区开源的一个 ClickHouse 备份工具,可用于实现数据迁移。其原理是先创建一个备份,然后从备份导入数据,类似 MySQL 的 mysqldump + SOURCE。这个工具可以作为常规的异地冷备方案
# 使用限制:
支持1.1.54390以上的ClickHouse 仅MergeTree系列表引擎 不支持备份Tiered storage或storage_policy 云存储上的最大备份大小为5TB AWS S3上的parts数最大为10,000(1)、下载clickhouse-backup 软件包
官方提供了二进制版本和rpm包的方式,我这里采用了二进制安装
github地址: https://github.com/AlexAkulov/clickhouse-backup 下载地址: https://github.com/AlexAkulov/clickhouse-backup/releases/download/v1.0.0/clickhouse-backup.tar.gz #解压软件包 tar -xf clickhouse-backup.tar.gz -C /root #复制二进制文件到系统中 cp /root/clickhouse-backup /usr/bin #创建clickhouse-backup 配置文件目录 mkdir -p /etc/clickhouse-backup #拷贝模板配置文件到clickhouse-backup 配置文件目录下 cp /root/config.yml /etc/clickhouse-backup/(2)、修改clickhouse-backup 配置文件config.yml
# 根据clickhouse自身的配置来修改 此配置文件,比如 clickhouse的数据目录,数据库密码,监控地址及端口
官方的配置说明:
clickhouse-backup 除了备份到本机,此外还支持远程备份的方式,备份到s3 上【对象存储】,ftp,sftp 上,还支持 使用 api 接口 访问
general: remote_storage: none # REMOTE_STORAGE,远程备份的方式,对应下面的【azblo,s3,gcs,api,ftp,sftp】 max_file_size: 107374182400 # MAX_FILE_SIZE disable_progress_bar: false # DISABLE_PROGRESS_BAR backups_to_keep_local: 0 # BACKUPS_TO_KEEP_LOCAL, 本地备份保留个数 ,0为不限制 backups_to_keep_remote: 0 # BACKUPS_TO_KEEP_REMOTE,远程的备份保留个数,0为不限制 log_level: info # LOG_LEVEL,日志等级 allow_empty_backups: false # ALLOW_EMPTY_BACKUPS clickhouse: username: default # CLICKHOUSE_USERNAME,数据库用户名,默认 password: "" # CLICKHOUSE_PASSWORD,数据库密码 host: localhost # CLICKHOUSE_HOST,数据库实例地址 port: 9000 # CLICKHOUSE_PORT,数据库连接端口 disk_mapping: {} # CLICKHOUSE_DISK_MAPPING skip_tables: # CLICKHOUSE_SKIP_TABLES ,备份时,需要忽略的库表 - system.* timeout: 5m # CLICKHOUSE_TIMEOUT freeze_by_part: false # CLICKHOUSE_FREEZE_BY_PART secure: false # CLICKHOUSE_SECURE skip_verify: false # CLICKHOUSE_SKIP_VERIFY sync_replicated_tables: true # CLICKHOUSE_SYNC_REPLICATED_TABLES log_sql_queries: true # CLICKHOUSE_LOG_SQL_QUERIES config_dir: "/etc/clickhouse-server" # CLICKHOUSE_CONFIG_DIR restart_command: "systemctl restart clickhouse-server" # CLICKHOUSE_RESTART_COMMAND azblob: endpoint_suffix: "core.windows.net" # AZBLOB_ENDPOINT_SUFFIX account_name: "" # AZBLOB_ACCOUNT_NAME account_key: "" # AZBLOB_ACCOUNT_KEY sas: "" # AZBLOB_SAS container: "" # AZBLOB_CONTAINER path: "" # AZBLOB_PATH compression_level: 1 # AZBLOB_COMPRESSION_LEVEL compression_format: tar # AZBLOB_COMPRESSION_FORMAT sse_key: "" # AZBLOB_SSE_KEY s3: access_key: "" # S3_ACCESS_KEY,<AWS访问密钥> secret_key: "" # S3_SECRET_KEY bucket: "" # S3_BUCKET,<存储桶BUCKET名称> endpoint: "" # S3_ENDPOINT region: us-east-1 # S3_REGION acl: private # S3_ACL force_path_style: false # S3_FORCE_PATH_STYLE path: "" # S3_PATH , <备份路径> disable_ssl: false # S3_DISABLE_SSL compression_level: 1 # S3_COMPRESSION_LEVEL # supports tar, gzip, zstd, brotli compression_format: tar # S3_COMPRESSION_FORMAT # empty (default), AES256, or aws:kms sse: AES256 # S3_SSE disable_cert_verification: false # S3_DISABLE_CERT_VERIFICATION storage_class: STANDARD # S3_STORAGE_CLASS debug: false # S3_DEBUG gcs: credentials_file: "" # GCS_CREDENTIALS_FILE credentials_json: "" # GCS_CREDENTIALS_JSON bucket: "" # GCS_BUCKET path: "" # GCS_PATH compression_level: 1 # GCS_COMPRESSION_LEVEL compression_format: tar # GCS_COMPRESSION_FORMAT cos: url: "" # COS_URL timeout: 2m # COS_TIMEOUT secret_id: "" # COS_SECRET_ID secret_key: "" # COS_SECRET_KEY path: "" # COS_PATH compression_format: tar # COS_COMPRESSION_FORMAT compression_level: 1 # COS_COMPRESSION_LEVEL api: listen: "localhost:7171" # API_LISTEN enable_metrics: true # API_ENABLE_METRICS enable_pprof: false # API_ENABLE_PPROF username: "" # API_USERNAME password: "" # API_PASSWORD secure: false # API_SECURE certificate_file: "" # API_CERTIFICATE_FILE private_key_file: "" # API_PRIVATE_KEY_FILE create_integration_tables: false # API_CREATE_INTEGRATION_TABLES ftp: address: "" # FTP_ADDRESS timeout: 2m # FTP_TIMEOUT username: "" # FTP_USERNAME password: "" # FTP_PASSWORD tls: false # FTP_TLS path: "" # FTP_PATH compression_format: tar # FTP_COMPRESSION_FORMAT compression_level: 1 # FTP_COMPRESSION_LEVEL debug: false # FTP_DEBUG sftp: address: "" # SFTP_ADDRESS username: "" # SFTP_USERNAME password: "" # SFTP_PASSWORD key: "" # SFTP_KEY path: "" # SFTP_PATH compression_format: tar # SFTP_COMPRESSION_FORMAT compression_level: 1 # SFTP_COMPRESSION_LEVEL我本次使用的配置文件,就是最简单的配置,直接本地备份,然后通过执行脚本的方式去推送备份文件到备份主机上
[root@localhost clickhouse-backup]# cat config.yml general: max_file_size: 1099511627776 disable_progress_bar: false backups_to_keep_local: 0 backups_to_keep_remote: 0 log_level: info allow_empty_backups: false clickhouse: username: default password: "" host: localhost port: 9000 disk_mapping: {} skip_tables: - system.* - default.* timeout: 5m freeze_by_part: false secure: false skip_verify: false sync_replicated_tables: true skip_sync_replica_timeouts: true log_sql_queries: false(3)、查看clickhouse-backup 相关命令
NAME: clickhouse-backup - Tool for easy backup of ClickHouse with cloud support USAGE: clickhouse-backup <command> [-t, --tables=<db>.<table>] <backup_name> VERSION: 1.0.0 DESCRIPTION: Run as root or clickhouse user COMMANDS: tables Print list of tables create Create new backup create_remote Create and upload upload Upload backup to remote storage list Print list of backups download Download backup from remote storage restore Create schema and restore data from backup restore_remote Download and restore delete Delete specific backup default-config Print default config server Run API server help, h Shows a list of commands or help for one command GLOBAL OPTIONS: --config FILE, -c FILE Config FILE name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG] --help, -h show help --version, -v print the version1、 查看全部默认的配置项
clickhouse-backup default-config2、 查看可备份的表【已在配置文件中过滤掉system和default 库下面的所有表】
[root@localhost clickhouse-backup]# clickhouse-backup tablesbrdatasets.hits_v1 1.50GiB default3、 创建备份
#全库备份
clickhouse-backup create 2021/09/06 10:03:23 info done backup=2021-09-06T14-03-23 operation=create table=datasets.hits_v1 2021/09/06 10:03:23 info done backup=2021-09-06T14-03-23 operation=create备份存储在中 $data_path/backup 下,备份名称默认为时间戳,可手动指定备份名称
clickhouse-backup create fxkjnj_0906备份包含两个目录:
metadata目录: 包含重新创建所需的DDL SQL shadow目录: 包含作为ALTER TABLE ... FREEZE操作结果的数据 [root@localhost backup]# ll /var/lib/clickhouse/backup/2021-09-06T14-03-23/ total 4 drwxr-x--- 3 clickhouse clickhouse 22 Sep 6 10:03 metadata -rw-r----- 1 clickhouse clickhouse 865 Sep 6 10:03 metadata.json drwxr-x--- 3 clickhouse clickhouse 22 Sep 6 10:03 shadow#单表备份
clickhouse-backup create [-t, --tables=<db>.<table>] <backup_name>备份表datasets.hits_v1
clickhouse-backup create -t datasets.hits_v1备份多个表datasets.hits_v1, datasets.hits_v2
clickhouse-backup create -t datasets.hits_v1,datasets.hits_v24、查看备份记录
[root@localhost datasets]# clickhouse-backup list 2021-09-06T14-03-23 1.50GiB 06/09/2021 14:03:23 local 2021/09/06 10:10:50 error InvalidParameter: 1 validation error(s) found. - minimum field size of 1, ListObjectsV2Input.Bucket.5、删除备份文件
[root@localhost datasets]# clickhouse-backup delete local 2021-09-06T14-03-23(4)、数据恢复
语法:
clickhouse-backup restore 备份名
[root@localhost shadow]# clickhouse-backup restore --help NAME: clickhouse-backup restore - Create schema and restore data from backup USAGE: clickhouse-backup restore [-t, --tables=<db>.<table>] [-s, --schema] [-d, --data] [--rm, --drop] <backup_name> OPTIONS: --config FILE, -c FILE Config FILE name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG] --table value, --tables value, -t value --schema, -s Restore schema only --data, -d Restore data only --rm, --drop Drop table before restore参数:
--table 只恢复特定表,可使用正则。如针对特定的数据库:--table=dbname.* --schema 只还原表结构 --data 只还原数据 --rm 数据恢复之前,先删除需要恢复的表四、使用clickhouse-backup备份与恢复数据-实战
4.1 、本机备份与恢复
1、备份前,先查看数据库里的数据量,一共是17747796条数据
1、备份前,先查看数据库里的数据量,一共是17747796条数据 [root@localhost ~]# clickhouse-client ClickHouse client version 20.4.6.53 (official build). Connecting to localhost:9000 as user default. Connected to ClickHouse server version 20.4.6 revision 54434. localhost :) localhost :) select count(1) from datasets.hits_v1; SELECT count(1) FROM datasets.hits_v1 ┌─count(1)─┐ │ 17747796 │ └──────────┘ 1 rows in set. Elapsed: 0.016 sec. localhost :) exit 2、创建clickhouse-backup配置文件,本地备份文件最多保留7个 [root@localhost ~]# vim /etc/clickhouse-backup/config.yml general: max_file_size: 1099511627776 disable_progress_bar: false backups_to_keep_local: 7 log_level: info allow_empty_backups: false clickhouse: username: default password: "helloword" host: localhost port: 9000 disk_mapping: {} skip_tables: - system.* - default.* timeout: 5m freeze_by_part: false secure: false skip_verify: false sync_replicated_tables: true skip_sync_replica_timeouts: true log_sql_queries: false 3、查看可备份的表 [root@localhost backup]# clickhouse-backup tables datasets.hits_v1 1.50GiB default 4、使用clickhouse-backup 做个备份 [root@localhost clickhouse]# clickhouse-backup create 2021/09/06 10:50:42 info done backup=2021-09-06T14-50-42 operation=create table=datasets.hits_v1 2021/09/06 10:50:42 info done backup=2021-09-06T14-50-42 operation=create 5、模拟进行数据破坏,!!!生产环境勿用!!!此处为测试 [root@localhost clickhouse]# clickhouse-client ClickHouse client version 20.4.6.53 (official build). Connecting to localhost:9000 as user default. Connected to ClickHouse server version 20.4.6 revision 54434. localhost :) drop database datasets; DROP DATABASE datasets Ok. 0 rows in set. Elapsed: 0.002 sec. SHOW DATABASES ┌─name───────────────────────────┐ │ _temporary_and_external_tables │ │ default │ │ system │ test └────────────────────────────────┘ 3 rows in set. Elapsed: 0.002 sec. 6、恢复数据库 [root@localhost backup]# clickhouse-backup restore 2021-09-06T14-50-42 -s -d --rm 2021/09/06 10:57:04 info done backup=2021-09-06T14-50-42 operation=restore table=datasets.hits_v1 2021/09/06 10:57:04 info done backup=2021-09-06T14-50-42 operation=restore 7、验证数据库,查看数据是否复原,数据量是否匹配 [root@localhost backup]# clickhouse-client ClickHouse client version 20.4.6.53 (official build). Connecting to localhost:9000 as user default. Connected to ClickHouse server version 20.4.6 revision 54434. localhost :) show databases; SHOW DATABASES ┌─name───────────────────────────┐ │ _temporary_and_external_tables │ │ datasets │ │ default │ │ system │ │ test │ └────────────────────────────────┘ 5 rows in set. Elapsed: 0.002 sec. localhost :) select count(1) from datasets.hits_v1; SELECT count(1) FROM datasets.hits_v1 ┌─count(1)─┐ │ 17747796 │ └──────────┘ 1 rows in set. Elapsed: 0.020 sec. localhost :)4.2、异机远程备份与恢复
所谓异机远程备份,就是指将备份好的文件,同步传输到远程 专业的存储服务器,往往这些存储服务器都是大存储,可以备份存储1年甚至几年的数据库备份都没问题
环境:
clickhouse 数据库 192.168.99.102
存储服务器 192.168.99.101
1、备份前,先查看数据库里的数据量,一共是17747796条数据 [root@localhost ~]# clickhouse-client ClickHouse client version 20.4.6.53 (official build). Connecting to localhost:9000 as user default. Connected to ClickHouse server version 20.4.6 revision 54434. localhost :) localhost :) select count(1) from datasets.hits_v1; SELECT count(1) FROM datasets.hits_v1 ┌─count(1)─┐ │ 17747796 │ └──────────┘ 1 rows in set. Elapsed: 0.016 sec. localhost :) exit 2、创建clickhouse-backup配置文件,本地备份文件最多保留7个 [root@localhost ~]# vim /etc/clickhouse-backup/config.yml general: max_file_size: 1099511627776 disable_progress_bar: false backups_to_keep_local: 7 log_level: info allow_empty_backups: false clickhouse: username: default password: "helloword" host: localhost port: 9000 disk_mapping: {} skip_tables: - system.* - default.* timeout: 5m freeze_by_part: false secure: false skip_verify: false sync_replicated_tables: true skip_sync_replica_timeouts: true log_sql_queries: false 3、查看可备份的表 [root@localhost backup]# clickhouse-backup tables datasets.hits_v1 1.50GiB default 4、使用clickhouse-backup 做个备份 [root@localhost clickhouse]# clickhouse-backup create 2021/09/06 10:50:42 info done backup=2021-09-06T14-50-42 operation=create table=datasets.hits_v1 2021/09/06 10:50:42 info done backup=2021-09-06T14-50-42 operation=create 5、使用SCP 将备份文件,传输到存储服务器上 [root@localhost clickhouse]# scp -rp /var/lib/clickhouse/backup/2021-09-06T14-50-42 fxkjnj@192.168.99.101:/data/clickhouse-backup/ 6、登录到存储服务器 ,查看备份数据是否存在 [root@localhost clickhouse]# ll /data/clickhouse-backup/2021-09-06T14-50-42 total 4 drwxr-x--- 3 clickhouse clickhouse 22 Sep 6 10:50 metadata -rw-r----- 1 clickhouse clickhouse 865 Sep 6 10:50 metadata.json drwxr-x--- 3 clickhouse clickhouse 22 Sep 6 10:50 shadow五、使用脚本定期异机远程备份
环境: clickhouse 数据库 192.168.99.102 存储服务器 192.168.99.101
条件:
存储服务器 建立备份目录,/data/clickhouse-back clickhouse 数据库 可以免密到 存储服务器 上,免密传输备份文件 vim /var/lib/clickhouse/clickhouse-backup.sh #!/bin/bash #Author fxkjnj.com #################################################### ## ## clickhouse-back script ## backup data at remote host ## you should config ssh trust ## #################################################### MSNAME=yz BAKFILE=$MSNAME-`date +%Y%m%d%H%M%S` LOCAL_BAKDIR=/var/lib/clickhouse/backup REMOTE_BAKDIR=/data/clickhouse-back REMOTE_HOST=root@192.168.99.101 #备份到本地 /usr/bin/clickhouse-backup create $BAKFILE if [[ $? != 0 ]]; then echo "clickhouse-backup Create FAILED" > /var/log/clickhouse-backup.log exit else #SCP备份到远程主机 scp -rp $LOCAL_BAKDIR/$BAKFILE $REMOTE_HOST:$REMOTE_BAKDIR/ if [[ $? != 0 ]]; then echo "clickhouse-backup FAILED" > /var/log/clickhouse-backup.log else echo "clickhouse-backup successful" > /var/log/clickhouse-backup.log fi fi #定期删除远程备份文件 ssh $REMOTE_HOST "find $REMOTE_BAKDIR/yz* -maxdepth 0 -mtime +30 -type d | xargs rm -rf {}"六、常见问题
1、问题现象:使用clickhouse-backup 恢复数据时,提示UUID 问题
clickhouse-backup restore 2021-08-21T06-35-10 -s -d --rm 2021/08/21 14:40:51 error cant create table `default`.`t`: code: 57, message: Directory for table data store/c57/c5780d8a-7d5a-47a3-8578-0d8a7d5a37a3/ already exists after 1 times, please check your schema depencncies解决方法:
去掉 备份文件中 ${backup_path}/2021-08-21T06-35-10/metadata/default/t.json 中的UUID UUID 80ea6411-9c37-4d47-80ea-64119c374d47 再次执行恢复 clickhouse-backup restore 2021-08-21T06-35-10 -s -d --rm SELECT count(1) FROM datasets.hits_v1 ┌─count(1)─┐ │ 17747796 │ └──────────┘ 1 rows in set. Elapsed: 0.016 sec. localhost :) exit扫一扫,关注我们