Backing Up and Restoring etcd in a Kubernetes Cluster, with a Backup Script

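First, why back up at all? My etcd runs as a 3-node cluster, which on paper already counts as highly available.

As everyone knows, etcd is the configuration store of a k8s cluster and communicates with the api-server. The data from every write operation ultimately lands in etcd, which shows how important etcd is to the cluster.

But high availability does not protect against human error: a mistaken operation by a teammate in k8s is replicated to all three members. So the etcd data still needs to be backed up to improve data safety.

When backing up and restoring etcd data, pay attention to which API version is in use. There are two versions, 2 and 3; here I use etcd v3.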

Environment:

k8s cluster, deployed from binaries

etcd v3.4

etcd-1 192.168.31.61

etcd-2 192.168.31.63

etcd-3 192.168.31.66

 

Backup:

# Create the backup directory on every node

[root@master-1 ~]# mkdir -p /opt/etcd/bak

 

# Take the backup with the following command on the etcd-1 node

# ETCDCTL_API=3 declares that version 3 of the etcd API is used

# /opt/etcd/bak/snap-2020-0101.db is where the snapshot is saved

# Specify the etcd certificate, private key, and CA certificate

 

[root@master-1 bin]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot save /opt/etcd/bak/snap-2020-0101.db \
--endpoints=https://192.168.31.61:2379 \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem

{"level":"info","ts":1609649619.228145,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/opt/etcd/bak/snap-2020-0101.db.part"}

{"level":"info","ts":"2021-01-03T12:53:39.249+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}

{"level":"info","ts":1609649619.2496758,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://192.168.31.61:2379"}

{"level":"info","ts":"2021-01-03T12:53:39.361+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}

{"level":"info","ts":1609649619.379133,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://192.168.31.61:2379","size":"4.3 MB","took":0.150711783}

{"level":"info","ts":1609649619.3793566,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/opt/etcd/bak/snap-2020-0101.db"}

Snapshot saved at /opt/etcd/bak/snap-2020-0101.db
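
The TLS flags above have to be repeated on every etcdctl call. One way around that, sketched here, is to export them once: etcdctl reads a flag from an environment variable named ETCDCTL_ plus the upper-cased flag name. The helper names etcd_env and snapshot_path are made up for this illustration; the paths and endpoint are the ones from this post.

```shell
#!/bin/bash
# Sketch: export the connection settings once instead of repeating the flags.
# etcd_env and snapshot_path are hypothetical helper names.

etcd_env() {
    export ETCDCTL_API=3
    export ETCDCTL_ENDPOINTS="https://192.168.31.61:2379"
    export ETCDCTL_CACERT="/opt/etcd/ssl/ca.pem"
    export ETCDCTL_CERT="/opt/etcd/ssl/server.pem"
    export ETCDCTL_KEY="/opt/etcd/ssl/server-key.pem"
}

# Build a dated snapshot path of the form <dir>/snap-YYYY-MMDD.db
snapshot_path() {
    local dir=$1
    printf '%s/snap-%s.db\n' "$dir" "$(date +%Y-%m%d)"
}

# Usage (needs a reachable etcd, so it is left commented out):
#   etcd_env
#   /opt/etcd/bin/etcdctl snapshot save "$(snapshot_path /opt/etcd/bak)"
```

With the variables exported, do not pass the same settings as flags again in the same call.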

 

# List the saved snapshot

[root@master-1 ~]# ll /opt/etcd/bak

total 4212

-rw------- 1 root root 4309024 Jan 3 12:53 snap-2020-0101.db

 

# Check the snapshot status

[root@master-1 bak]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot status snap-2020-0101.db

ad0f41b3, 174429, 1547, 4.3 MB
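
The four comma-separated values in that output are the snapshot hash, the revision, the total number of keys, and the total size. As a small sketch, the helper below (label_snapshot_status is a name I made up) labels them; etcdctl can also print the same fields with headers if you add --write-out=table.

```shell
#!/bin/bash
# Sketch: label the four fields printed by `etcdctl snapshot status`.
# label_snapshot_status is a hypothetical helper name.

label_snapshot_status() {
    local hash revision keys size
    IFS=',' read -r hash revision keys size <<< "$1"
    # Strip the single space that follows each comma.
    printf 'hash=%s revision=%s total_keys=%s total_size=%s\n' \
        "${hash# }" "${revision# }" "${keys# }" "${size# }"
}

label_snapshot_status "ad0f41b3, 174429, 1547, 4.3 MB"
# prints: hash=ad0f41b3 revision=174429 total_keys=1547 total_size=4.3 MB
```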

 

# Copy the snapshot to the backup directory on the other hosts

[root@master-1 bak]# scp snap-2020-0101.db root@192.168.31.63:/opt/etcd/bak/

snap-2020-0101.db 100% 4208KB 23.4MB/s 00:00

[root@master-1 bak]# scp snap-2020-0101.db root@192.168.31.66:/opt/etcd/bak/

snap-2020-0101.db
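
Since every node will be restored from its own copy of the file, it may be worth confirming that the copies are byte-identical. A minimal sketch, assuming sha256sum (GNU coreutils) is installed; same_checksum is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: succeed only if two files have the same SHA-256 checksum.
# same_checksum is a hypothetical helper name.

same_checksum() {
    local a b
    a=$(sha256sum "$1" 2>/dev/null | awk '{print $1}')
    b=$(sha256sum "$2" 2>/dev/null | awk '{print $1}')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Comparing against a remote copy (needs SSH, so it is left commented out):
#   local_sum=$(sha256sum snap-2020-0101.db | awk '{print $1}')
#   remote_sum=$(ssh root@192.168.31.63 sha256sum /opt/etcd/bak/snap-2020-0101.db | awk '{print $1}')
#   [ "$local_sum" = "$remote_sum" ] && echo "copy verified"
```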

 

 

Restore:

Simulate a failure:

# Look at the current Deployment and pods

[root@master-1 cfg]# kubectl get pods,deployment

NAME READY STATUS RESTARTS AGE

pod/web-test-5cdbd79b55-87pqt 1/1 Running 1 4d15h

pod/web-test-5cdbd79b55-p54nq 1/1 Running 1 4d15h

pod/web-test-5cdbd79b55-r9swh 1/1 Running 1 4d14h

pod/web-test-5cdbd79b55-t8pcx 1/1 Running 1 4d14h

NAME READY UP-TO-DATE AVAILABLE AGE

deployment.apps/web-test 4/4 4 4 5d20h

 

# Delete the Deployment

[root@master-1 bak]# kubectl delete deployments web-test

deployment.apps "web-test" deleted

 

# Check the pods again: there are no pods left in the default namespace

[root@master-1 bak]# kubectl get pods

No resources found in default namespace.

 

 

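Restore the snapshot:

1. Stop kube-apiserver and etcd on every node

# Note: how you stop them depends on the deployment:

  • With a kubeadm deployment, you have to remove the etcd and kube-apiserver YAML files; only then do the two stop. Just deleting a Deployment has no effect
  • With a binary deployment, you only need to stop the kube-apiserver process on the master nodes and the etcd process on each etcd node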
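
I did not run the kubeadm case myself, so take this as a sketch only. It moves the two YAML files aside instead of deleting them, so they can be put back after the restore. The function names are made up, and /etc/kubernetes/manifests is the kubeadm default manifest directory, which may differ on your nodes.

```shell
#!/bin/bash
# Sketch for kubeadm clusters: move the etcd and kube-apiserver manifests
# out of the manifest directory, and move them back after the restore.
# park_manifests / unpark_manifests are hypothetical helper names.

park_manifests() {
    local manifests=$1 holding=$2 f
    mkdir -p "$holding"
    for f in etcd.yaml kube-apiserver.yaml; do
        if [ -f "$manifests/$f" ]; then
            mv "$manifests/$f" "$holding/$f"
        fi
    done
}

unpark_manifests() {
    local manifests=$1 holding=$2 f
    for f in etcd.yaml kube-apiserver.yaml; do
        if [ -f "$holding/$f" ]; then
            mv "$holding/$f" "$manifests/$f"
        fi
    done
}

# Usage on a kubeadm control-plane node (commented out on purpose):
#   park_manifests /etc/kubernetes/manifests /root/manifests-parked
#   ... restore the snapshot ...
#   unpark_manifests /etc/kubernetes/manifests /root/manifests-parked
```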

 

# My k8s cluster here is a binary deployment

Master node:

[root@master-1 bak]# systemctl stop kube-apiserver

 

Each etcd node:

[root@master-1 bak]# systemctl stop etcd

 

 

2. Remove the etcd data directory on each node (better to move it aside under a new name as a backup)

[root@master-1 bak]# mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd-2020-0101-bak
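
The same move has to be repeated on every etcd node. Here is a small sketch that adds a timestamp to the new name so repeated attempts do not collide; set_aside_data_dir is a hypothetical helper name, and the suffix format is my own choice.

```shell
#!/bin/bash
# Sketch: rename the etcd data directory with a timestamp suffix and print
# the new name. set_aside_data_dir is a hypothetical helper name.

set_aside_data_dir() {
    local data_dir=$1 target
    if [ ! -d "$data_dir" ]; then
        echo "no such directory: $data_dir" >&2
        return 1
    fi
    target="${data_dir}-$(date +%Y-%m%d-%H%M%S)-bak"
    mv "$data_dir" "$target" && echo "$target"
}

# Usage, after etcd has been stopped (commented out on purpose):
#   set_aside_data_dir /var/lib/etcd/default.etcd
```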

 

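3. Restore on every node

# (Mind the etcd member name, the IP address, and the cluster token; every environment is different.) Don't forget to point the command at the snapshot you backed up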

etcd-1:

[root@master-1 / ]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore /opt/etcd/bak/snap-2020-0101.db \
--name etcd-1 \
--initial-cluster="etcd-1=https://192.168.31.61:2380,etcd-2=https://192.168.31.63:2380,etcd-3=https://192.168.31.66:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://192.168.31.61:2380 \
--data-dir=/var/lib/etcd/default.etcd

{"level":"info","ts":1609651738.9956837,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}

{"level":"info","ts":1609651739.1263688,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":173190}

{"level":"info","ts":1609651739.1434276,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"72130f86e474b7bb","added-peer-peer-urls":["https://192.168.31.66:2380"]}

{"level":"info","ts":1609651739.1435857,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b10f0bac3883a232","added-peer-peer-urls":["https://192.168.31.61:2380"]}

{"level":"info","ts":1609651739.143635,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b46624837acedac9","added-peer-peer-urls":["https://192.168.31.63:2380"]}

{"level":"info","ts":1609651739.1764905,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}

 

etcd-2:

[root@node-1 / ]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore /opt/etcd/bak/snap-2020-0101.db \
--name etcd-2 \
--initial-cluster="etcd-1=https://192.168.31.61:2380,etcd-2=https://192.168.31.63:2380,etcd-3=https://192.168.31.66:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://192.168.31.63:2380 \
--data-dir=/var/lib/etcd/default.etcd

{"level":"info","ts":1609651738.9956837,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}

{"level":"info","ts":1609651739.1263688,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":173190}

{"level":"info","ts":1609651739.1434276,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"72130f86e474b7bb","added-peer-peer-urls":["https://192.168.31.66:2380"]}

{"level":"info","ts":1609651739.1435857,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b10f0bac3883a232","added-peer-peer-urls":["https://192.168.31.61:2380"]}

{"level":"info","ts":1609651739.143635,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b46624837acedac9","added-peer-peer-urls":["https://192.168.31.63:2380"]}

{"level":"info","ts":1609651739.1764905,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}

 

etcd-3:

[root@node-2 / ]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore /opt/etcd/bak/snap-2020-0101.db \
--name etcd-3 \
--initial-cluster="etcd-1=https://192.168.31.61:2380,etcd-2=https://192.168.31.63:2380,etcd-3=https://192.168.31.66:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://192.168.31.66:2380 \
--data-dir=/var/lib/etcd/default.etcd

{"level":"info","ts":1609651738.9956837,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}

{"level":"info","ts":1609651739.1263688,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":173190}

{"level":"info","ts":1609651739.1434276,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"72130f86e474b7bb","added-peer-peer-urls":["https://192.168.31.66:2380"]}

{"level":"info","ts":1609651739.1435857,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b10f0bac3883a232","added-peer-peer-urls":["https://192.168.31.61:2380"]}

{"level":"info","ts":1609651739.143635,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"bc5dd24e13e697c0","local-member-id":"0","added-peer-id":"b46624837acedac9","added-peer-peer-urls":["https://192.168.31.63:2380"]}

{"level":"info","ts":1609651739.1764905,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/bak/snap-2020-0101.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
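
Across the three nodes only --name and --initial-advertise-peer-urls change; the rest of the command is identical. To avoid copy-and-paste slips, the command can be generated from the member name and IP. This is just a sketch: restore_cmd is a hypothetical helper, and it only prints the command, it does not run it.

```shell
#!/bin/bash
# Sketch: print the restore command for one member.
# restore_cmd is a hypothetical helper name; values are the ones from this post.

INITIAL_CLUSTER="etcd-1=https://192.168.31.61:2380,etcd-2=https://192.168.31.63:2380,etcd-3=https://192.168.31.66:2380"

restore_cmd() {
    local name=$1 ip=$2 snapshot=$3
    echo "ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore $snapshot" \
         "--name $name" \
         "--initial-cluster=\"$INITIAL_CLUSTER\"" \
         "--initial-cluster-token=etcd-cluster" \
         "--initial-advertise-peer-urls=https://$ip:2380" \
         "--data-dir=/var/lib/etcd/default.etcd"
}

restore_cmd etcd-1 192.168.31.61 /opt/etcd/bak/snap-2020-0101.db
restore_cmd etcd-2 192.168.31.63 /opt/etcd/bak/snap-2020-0101.db
restore_cmd etcd-3 192.168.31.66 /opt/etcd/bak/snap-2020-0101.db
```

Each printed line can then be reviewed and run on the matching node.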

 

# Start etcd on each node first, then kube-apiserver

Each etcd node:

[root@master-1 bak]# systemctl start etcd

 

Master node:

[root@master-1 bak]# systemctl start kube-apiserver
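
After the start, the etcd members need a moment to find each other and elect a leader, so a health check run immediately may fail. A small retry wrapper, sketched below, helps with that. The retry function is my own helper, not part of etcdctl; etcdctl endpoint health is the real subcommand it is meant to wrap.

```shell
#!/bin/bash
# Sketch: run a command until it succeeds or the attempts run out.
# retry <attempts> <delay-seconds> <command...> is a hypothetical helper.

retry() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (needs the running cluster, so it is left commented out):
#   export ETCDCTL_API=3
#   retry 10 3 /opt/etcd/bin/etcdctl endpoint health \
#       --endpoints=https://192.168.31.61:2379,https://192.168.31.63:2379,https://192.168.31.66:2379 \
#       --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem
```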

 

# Verify that the data has been restored:

[root@master-1 cfg]# kubectl get pods,deployment

NAME READY STATUS RESTARTS AGE

pod/web-test-5cdbd79b55-87pqt 1/1 Running 1 4d15h

pod/web-test-5cdbd79b55-p54nq 1/1 Running 1 4d15h

pod/web-test-5cdbd79b55-r9swh 1/1 Running 1 4d14h

pod/web-test-5cdbd79b55-t8pcx 1/1 Running 1 4d14h

NAME READY UP-TO-DATE AVAILABLE AGE

deployment.apps/web-test 4/4 4 4 5d20h

As you can see, the deleted pods have been restored.

 

 

etcd regular backup script

# Adjust to your own environment

[root@master-1 /]# vim /opt/etcd/back_etcd.sh

#!/bin/bash

# Abort on the first failing command
set -e

# Make sure the log directory exists, then append stdout and stderr to the log
# so that error messages are captured too
mkdir -p /opt/etcd/log

exec >> /opt/etcd/log/backup_etcd.log 2>&1

Date=$(date +%Y-%m-%d-%H-%M)

EtcdEndpoints="https://192.168.31.61:2379"

EtcdCmd="/opt/etcd/bin/etcdctl"

BackupDir="/opt/etcd/bak"

BackupFile="snapshot.db.$Date"

cacertfile="/opt/etcd/ssl/ca.pem"

certfile="/opt/etcd/ssl/server.pem"

keyfile="/opt/etcd/ssl/server-key.pem"

echo "$(date) backup etcd..."

export ETCDCTL_API=3

"$EtcdCmd" snapshot save "$BackupDir/$BackupFile" --endpoints="$EtcdEndpoints" --cacert="$cacertfile" --cert="$certfile" --key="$keyfile"

echo "$(date) backup done!"
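
The script takes one snapshot per run, so it needs something like cron to call it, and old snapshots will pile up in /opt/etcd/bak. Below is a sketch of a cleanup helper that keeps only the newest N files. It relies on the snapshot.db.YYYY-MM-DD-HH-MM naming, which sorts chronologically; prune_snapshots and the cron schedule in the comment are my own assumptions, not part of the original script.

```shell
#!/bin/bash
# Sketch: keep only the newest <keep> snapshots in <dir>.
# prune_snapshots is a hypothetical helper name.

prune_snapshots() {
    local dir=$1 keep=$2
    # Names embed the timestamp, so a reverse sort lists the newest first;
    # everything after the first <keep> lines is deleted.
    ls -1 "$dir"/snapshot.db.* 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}

# Example crontab entry (hypothetical schedule: every two hours):
#   0 */2 * * * /opt/etcd/back_etcd.sh
# Example cleanup call, keeping the 24 newest snapshots:
#   prune_snapshots /opt/etcd/bak 24
```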

 

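#### Additional notes ####

It is best not to take backups on a single node only. Yes, it is a cluster, but if one day that node quietly stops producing backups, you will be stuck when you later need to restore.

So it is best to back up on >= 2 nodes, and to monitor the backup files continuously.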
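
As one possible shape for that monitoring, here is a sketch of a freshness check that a cron job or monitoring agent could call. backup_is_fresh is a hypothetical helper, and the 180-minute threshold in the usage comment is only an example.

```shell
#!/bin/bash
# Sketch: succeed if <dir> holds a non-empty snapshot modified within the
# last <max-age-minutes> minutes. backup_is_fresh is a hypothetical helper.

backup_is_fresh() {
    local dir=$1 max_age_min=$2
    find "$dir" -maxdepth 1 -type f -name 'snapshot.db.*' \
        -mmin -"$max_age_min" -size +0c 2>/dev/null | grep -q .
}

# Usage (commented out; wire the failure branch to your own alerting):
#   if ! backup_is_fresh /opt/etcd/bak 180; then
#       echo "etcd backup is stale on $(hostname)" >&2
#   fi
```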

 

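That concludes this post.

2021/1/4   Nanjing

Copyright of this article belongs to the author, 飞翔沫沫情. Reposting is welcome, but unless the author agrees otherwise this notice must be kept and a link to the original article must be shown prominently on the page. For questions, please contact the author by email. When reposting, credit the source: https://www.fxkjnj.com/?p=2551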
