Etcd Backup and Restore in a Kubernetes Cluster, Plus a Backup Script
First, a word on why backups matter at all. My etcd is deployed as a three-node cluster, which on paper already counts as highly available.
As everyone knows, etcd is the configuration store at the center of a k8s cluster. It talks to the api-server, and the data from every write operation ultimately lands in etcd, which shows just how important etcd is to the cluster.
High availability, however, does not protect against human error: an accidental operation by a teammate in k8s can still destroy data, so backing up etcd remains necessary for data safety.
When backing up and restoring etcd data, mind which API version you use; there are v2 and v3, and here I use ETCD API v3.
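If you are unsure which version your etcdctl speaks, you can check it first (the binary path matches the layout used throughout this post):
# Print the etcdctl version and the API version it uses
ETCDCTL_API=3 /opt/etcd/bin/etcdctl version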
Environment:
k8s cluster deployed from binaries
etcd v3.4
etcd-1 192.168.31.61
etcd-2 192.168.31.63
etcd-3 192.168.31.66
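Before taking a snapshot, it is worth confirming the cluster is actually healthy; a quick check using the endpoints and certificate paths from this environment:
# All three endpoints should report "is healthy"
ETCDCTL_API=3 /opt/etcd/bin/etcdctl endpoint health \
--endpoints=https://192.168.31.61:2379,https://192.168.31.63:2379,https://192.168.31.66:2379 \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem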
Backup:
#Create the backup directory on every node
[root@master-1 ~]#mkdir /opt/etcd/bak
#Take the backup on the etcd-1 node with the command below
#ETCDCTL_API=3 declares that etcd API version 3 is used
#/opt/etcd/bak/snap-2020-0101.db is the path where the snapshot is saved
#Specify the etcd certificate and key, plus the CA certificate
[root@master-1 bin]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot save /opt/etcd/bak/snap-2020-0101.db \
--endpoints=https://192.168.31.61:2379 \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem
{"level":"info","ts":1609649619.228145,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/opt/etcd/bak/snap-2020-0101.db.part"}
{"level":"info","ts":"2021-01-03T12:53:39.249+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1609649619.2496758,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://192.168.31.61:2379"}
{"level":"info","ts":"2021-01-03T12:53:39.361+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1609649619.379133,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://192.168.31.61:2379","size":"4.3 MB","took":0.150711783}
{"level":"info","ts":1609649619.3793566,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/opt/etcd/bak/snap-2020-0101.db"}
Snapshot saved at /opt/etcd/bak/snap-2020-0101.db
#Inspect the saved snapshot
[root@master-1 ~]# ll /opt/etcd/bak
total 4212
-rw------- 1 root root 4309024 Jan 3 12:53 snap-2020-0101.db
#Check the snapshot status
[root@master-1 bak]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot status snap-2020-0101.db
ad0f41b3, 174429, 1547, 4.3 MB
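The four comma-separated fields are the snapshot hash, revision, total key count, and total size. For a labeled view, etcdctl can print the same information as a table:
# Same status check, with column headers
ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot status snap-2020-0101.db --write-out=table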
#Copy the snapshot to the backup directory on the other hosts
[root@master-1 bak]# scp snap-2020-0101.db root@192.168.31.63:/opt/etcd/bak/
snap-2020-0101.db 100% 4208KB 23.4MB/s 00:00
[root@master-1 bak]# scp snap-2020-0101.db root@192.168.31.66:/opt/etcd/bak/
snap-2020-0101.db
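A copy is only useful if it arrived intact, so it is worth comparing checksums after copying (assuming sha256sum is available on all nodes; all three outputs must match):
# Compare local and remote checksums of the snapshot
sha256sum /opt/etcd/bak/snap-2020-0101.db
ssh root@192.168.31.63 sha256sum /opt/etcd/bak/snap-2020-0101.db
ssh root@192.168.31.66 sha256sum /opt/etcd/bak/snap-2020-0101.db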
Restore:
Simulate a failure:
#Check the current deployment and pods
[root@master-1 cfg]# kubectl get pods,deployment
NAME READY STATUS RESTARTS AGE
pod/web-test-5cdbd79b55-87pqt 1/1 Running 1 4d15h
pod/web-test-5cdbd79b55-p54nq 1/1 Running 1 4d15h
pod/web-test-5cdbd79b55-r9swh 1/1 Running 1 4d14h
pod/web-test-5cdbd79b55-t8pcx 1/1 Running 1 4d14h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/web-test 4/4 4 4 5d20h
#Delete the deployment
[root@master-1 bak]# kubectl delete deployments web-test
deployment.apps "web-test" deleted
#Check the pods again; no Pods are left in the default namespace
[root@master-1 bak]# kubectl get pods
No resources found in default namespace.
Restore the snapshot:
1. Stop kube-apiserver and etcd on every node
# Note: how you stop them depends on how the cluster was deployed:
- With a kubeadm deployment, you have to remove the etcd and kube-apiserver yaml files (their static Pod manifests); merely deleting a deployment controller does nothing for these components (see the sketch after this list)
- With a binary deployment, just stop the kube-apiserver process on the master node and the etcd process on each etcd node
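For reference, on a kubeadm cluster "removing the yaml files" means moving the static Pod manifests out of the kubelet's manifest directory (/etc/kubernetes/manifests by default); the kubelet then stops those Pods. A minimal sketch:
# kubeadm only: park the manifests so the kubelet stops etcd and kube-apiserver
mkdir -p /tmp/k8s-manifests-bak
mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/k8s-manifests-bak/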
#My cluster here is deployed from binaries
Master node:
[root@master-1 bak]# systemctl stop kube-apiserver
Each etcd node:
[root@master-1 bak]# systemctl stop etcd
2. Remove the etcd data directory on each node (better to keep it around by renaming it, as done below, than to delete it outright)
[root@master-1 bak]# mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd-2020-0101-bak
3. Run the restore on every node
#Mind the etcd node name, the IP addresses, and the cluster token; every environment differs. And don't forget to point at the backup snapshot
etcd-1:
[root@master-1 / ]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore /opt/etcd/bak/snap-2020-0101.db \
--name etcd-1 \
--initial-cluster="etcd-1=https://192.168.31.61:2380,etcd-2=https://192.168.31.63:2380,etcd-3=https://192.168.31.66:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://192.168.31.61:2380 \
--data-dir=/var/lib/etcd/default.etcd
{"level":"info","ts":1609651738.9956837,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
{"level":"info","ts":1609651739.1263688,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":173190}
{"level":"info","ts":1609651739.1434276,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"72130f86e474b7bb","added-peer-peer-urls":["https://192.168.31.66:2380"]}
{"level":"info","ts":1609651739.1435857,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b10f0bac3883a232","added-peer-peer-urls":["https://192.168.31.61:2380"]}
{"level":"info","ts":1609651739.143635,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b46624837acedac9","added-peer-peer-urls":["https://192.168.31.63:2380"]}
{"level":"info","ts":1609651739.1764905,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
etcd-2:
[root@node-1 / ]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore /opt/etcd/bak/snap-2020-0101.db \
--name etcd-2 \
--initial-cluster="etcd-1=https://192.168.31.61:2380,etcd-2=https://192.168.31.63:2380,etcd-3=https://192.168.31.66:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://192.168.31.63:2380 \
--data-dir=/var/lib/etcd/default.etcd
{"level":"info","ts":1609651738.9956837,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
{"level":"info","ts":1609651739.1263688,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":173190}
{"level":"info","ts":1609651739.1434276,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"72130f86e474b7bb","added-peer-peer-urls":["https://192.168.31.66:2380"]}
{"level":"info","ts":1609651739.1435857,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b10f0bac3883a232","added-peer-peer-urls":["https://192.168.31.61:2380"]}
{"level":"info","ts":1609651739.143635,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b46624837acedac9","added-peer-peer-urls":["https://192.168.31.63:2380"]}
{"level":"info","ts":1609651739.1764905,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
etcd-3:
[root@node-2 / ]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore /opt/etcd/bak/snap-2020-0101.db \
--name etcd-3 \
--initial-cluster="etcd-1=https://192.168.31.61:2380,etcd-2=https://192.168.31.63:2380,etcd-3=https://192.168.31.66:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://192.168.31.66:2380 \
--data-dir=/var/lib/etcd/default.etcd
{"level":"info","ts":1609651738.9956837,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
{"level":"info","ts":1609651739.1263688,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":173190}
{"level":"info","ts":1609651739.1434276,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"72130f86e474b7bb","added-peer-peer-urls":["https://192.168.31.66:2380"]}
{"level":"info","ts":1609651739.1435857,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b10f0bac3883a232","added-peer-peer-urls":["https://192.168.31.61:2380"]}
{"level":"info","ts":1609651739.143635,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b46624837acedac9","added-peer-peer-urls":["https://192.168.31.63:2380"]}
{"level":"info","ts":1609651739.1764905,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
#Start kube-apiserver and etcd on each node
Master node:
[root@master-1 bak]# systemctl start kube-apiserver
Each etcd node:
[root@master-1 bak]# systemctl start etcd
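Before checking Kubernetes objects, it is worth confirming that the restored etcd cluster itself formed correctly (same endpoints and certificates as in the backup step; all three members should be listed and healthy):
# List the cluster members and check endpoint health after the restore
ETCDCTL_API=3 /opt/etcd/bin/etcdctl member list \
--endpoints=https://192.168.31.61:2379 \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem
ETCDCTL_API=3 /opt/etcd/bin/etcdctl endpoint health \
--endpoints=https://192.168.31.61:2379,https://192.168.31.63:2379,https://192.168.31.66:2379 \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem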
#Verify that the data was restored:
[root@master-1 cfg]# kubectl get pods,deployment
NAME READY STATUS RESTARTS AGE
pod/web-test-5cdbd79b55-87pqt 1/1 Running 1 4d15h
pod/web-test-5cdbd79b55-p54nq 1/1 Running 1 4d15h
pod/web-test-5cdbd79b55-r9swh 1/1 Running 1 4d14h
pod/web-test-5cdbd79b55-t8pcx 1/1 Running 1 4d14h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/web-test 4/4 4 4 5d20h
You can see that the deleted Pods have come back.
Etcd Scheduled Backup Script
#Adapt the paths and endpoints to your own environment
[root@master-1 /]# vim /opt/etcd/back_etcd.sh
#!/bin/bash
set -e

# Append all output (including errors) to the backup log; make sure the log directory exists
mkdir -p /opt/etcd/log
exec >> /opt/etcd/log/backup_etcd.log 2>&1

Date=$(date +%Y-%m-%d-%H-%M)
EtcdEndpoints="https://192.168.31.61:2379"
EtcdCmd="/opt/etcd/bin/etcdctl"
BackupDir="/opt/etcd/bak"
BackupFile="snapshot.db.$Date"
cacertfile="/opt/etcd/ssl/ca.pem"
certfile="/opt/etcd/ssl/server.pem"
keyfile="/opt/etcd/ssl/server-key.pem"

# Make sure the backup directory exists before saving the snapshot
mkdir -p "$BackupDir"

echo "$(date) backup etcd..."
export ETCDCTL_API=3
$EtcdCmd snapshot save "$BackupDir/$BackupFile" \
  --endpoints="$EtcdEndpoints" \
  --cacert="$cacertfile" \
  --cert="$certfile" \
  --key="$keyfile"
echo "$(date) backup done!"
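To actually run this on a schedule, add crontab entries on the backup node; the 6-hour interval and 7-day retention below are assumptions, so adjust them to your own policy:
# crontab -e
# Take a snapshot every 6 hours
0 */6 * * * /bin/bash /opt/etcd/back_etcd.sh
# Prune snapshots older than 7 days
0 3 * * * find /opt/etcd/bak -name "snapshot.db.*" -mtime +7 -delete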
#### One more note ####
Don't keep backups on only one node. etcd itself is a cluster, but if the single node holding the backups is suddenly lost, there will be nothing left to restore from when you need it.
So keep backups on at least two nodes, and monitor the backup files so you notice right away if backups stop being produced; one way to handle the copying is sketched below.
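A minimal sketch that could be appended to the end of the script above, pushing each new snapshot to a secondary host right after it is written. The host 192.168.31.63 is just an example from this environment, and passwordless SSH between the nodes is assumed:
# Hypothetical secondary backup host; requires passwordless SSH
RemoteHost="root@192.168.31.63"
scp "$BackupDir/$BackupFile" "$RemoteHost:$BackupDir/" && echo "$(date) copied $BackupFile to $RemoteHost"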
That's all for this post.
2021/1/4, Nanjing