Kubernetes common errors (continuously updated)
1. k8s node1 is NotReady
Oct 29 17:56:17 k8s-node1 kubelet[48455]: E1029 17:56:17.983157 48455 kubelet_node_status.go:94] Unable to register node "k8s-node3" with API server: nodes "k8s-node3" is forbidden: node "k8s-node1" is not allowed to modify node "k8s-node3"
Oct 29 17:56:18 k8s-node1 kubelet[48455]: E1029 17:56:18.307200 48455 reflector.go:123] object-"default"/"default-token-8njxz": Failed to list *v1.Secret: secrets "default-token-8njxz" is forbidden: User "system:node:k8s-node1" cannot list resource "secrets" in API group "" in the namespace "default": no relationship found between node "k8s-node1" and this object
Fix: redeploy the node1 node. After redeploying, reapply the network configuration on master1 to test; alternatively, you can first try simply reapplying the network, without redeploying node1:
kubectl apply -f kube-flannel.yaml
kubectl apply -f apiserver-to-kubelet-rbac.yaml
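If you would rather not redeploy, a likely root cause (an inference from the log above, not part of the original fix) is that node1's kubelet config was copied from k8s-node3 without updating the node name. A minimal sketch, assuming a binary-install layout under /opt/kubernetes (file names and paths are assumptions; adjust to your deployment):
# Find where the stale node name is still referenced:
grep -rn "k8s-node3" /opt/kubernetes/cfg/
# Fix the name, drop the old kubeconfig so kubelet re-bootstraps under the correct identity, restart:
sed -i 's/k8s-node3/k8s-node1/g' /opt/kubernetes/cfg/kubelet.conf
rm -f /opt/kubernetes/cfg/kubelet.kubeconfig
systemctl restart kubelet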
2. kubectl and helm command auto-completion
kubectl auto-completion:
yum install -y bash-completion
locate bash_completion
Add the following to /etc/profile:
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
helm auto-completion:
source <(helm completion bash)
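To make both persistent in one step, a minimal sketch (assuming bash-completion is already installed as above):
cat >> /etc/profile <<'EOF'
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
source <(helm completion bash)
EOF
source /etc/profile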
3. Adding a new Node
Deploy kubelet and kube-proxy: copy the config files from another Node (modifying them for the new node), copy the systemd unit files, and copy the SSL certificates (only ca.pem, kube-proxy-key.pem, and kube-proxy.pem are needed); see the sketch after the commands below.
Create the flannel working directory and the config file directory:
[root@node1 ~]# mkdir /opt/cni/bin /etc/cni/net.d -p
# /opt/cni/bin holds the plugin binaries
# /etc/cni/net.d/XX.conf holds the config file for a given network
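A rough sketch of the copy steps; the /opt/kubernetes layout, host names, and file names are assumptions based on the binary install used in this post, so adjust them to your environment:
# Run on the new node:
mkdir -p /opt/kubernetes/{bin,cfg,ssl,logs}
scp node1:/opt/kubernetes/bin/{kubelet,kube-proxy} /opt/kubernetes/bin/
scp node1:/opt/kubernetes/cfg/* /opt/kubernetes/cfg/    # then edit the node name/IP in each file
scp node1:/opt/kubernetes/ssl/{ca.pem,kube-proxy.pem,kube-proxy-key.pem} /opt/kubernetes/ssl/
scp node1:/usr/lib/systemd/system/{kubelet,kube-proxy}.service /usr/lib/systemd/system/
systemctl daemon-reload && systemctl enable --now kubelet kube-proxy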
4. etcd node fails to start
Check the etcd cluster health:
[root@master1 cfg]# /opt/etcd/bin/etcdctl --ca-file=/opt/etcd/ssl/ca.pem --cert-file=/opt/etcd/ssl/server.pem --key-file=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.31.63:2379,https://192.168.31.65:2379,https://192.168.31.66:2379" cluster-health
failed to check the health of member 72130f86e474b7bb on https://192.168.31.66:2379: Get https://192.168.31.66:2379/health: dial tcp 192.168.31.66:2379: connect: connection refused
member 72130f86e474b7bb is unreachable: [https://192.168.31.66:2379] are all unreachable
member b46624837acedac9 is healthy: got healthy result from https://192.168.31.63:2379
member fd9073b56d4868cb is healthy: got healthy result from https://192.168.31.65:2379
cluster is degraded
On the affected node, check etcd's detailed errors:
-- Subject: Unit etcd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd.service has begun starting up.
Jul 31 22:01:43 node2 etcd[39920]: recognized environment variable ETCD_NAME, but unused: shadowed by corresponding flag
Jul 31 22:01:43 node2 etcd[39920]: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag
Jul 31 22:01:43 node2 etcd[39920]: recognized environment variable ETCD_LISTEN_PEER_URLS, but unused: shadowed by corresponding flag
Jul 31 22:01:43 node2 etcd[39920]: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag
Jul 31 22:01:43 node2 etcd[39920]: recognized environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS, but unused: shadowed by corresponding flag
Jul 31 22:01:43 node2 etcd[39920]: recognized environment variable ETCD_ADVERTISE_CLIENT_URLS, but unused: shadowed by corresponding flag
Jul 31 22:01:43 node2 etcd[39920]: recognized environment variable ETCD_INITIAL_CLUSTER, but unused: shadowed by corresponding flag
Jul 31 22:01:43 node2 etcd[39920]: recognized environment variable ETCD_INITIAL_CLUSTER_TOKEN, but unused: shadowed by corresponding flag
Jul 31 22:01:43 node2 etcd[39920]: recognized environment variable ETCD_INITIAL_CLUSTER_STATE, but unused: shadowed by corresponding flag
Jul 31 22:01:43 node2 etcd[39920]: etcd Version: 3.3.13
Jul 31 22:01:43 node2 etcd[39920]: Git SHA: 98d3084
Jul 31 22:01:43 node2 etcd[39920]: Go Version: go1.10.8
Jul 31 22:01:43 node2 etcd[39920]: Go OS/Arch: linux/amd64
Jul 31 22:01:43 node2 etcd[39920]: setting maximum number of CPUs to 4, total number of available CPUs is 4
Jul 31 22:01:43 node2 etcd[39920]: the server is already initialized as member before, starting as etcd member…
Jul 31 22:01:43 node2 etcd[39920]: peerTLS: cert = /opt/etcd/ssl/server.pem, key = /opt/etcd/ssl/server-key.pem, ca = , trusted-ca = /opt/etcd/ssl/ca.pem, client-cert-auth = false, crl-file =
Jul 31 22:01:43 node2 etcd[39920]: listening for peers on https://192.168.31.66:2380
Jul 31 22:01:43 node2 etcd[39920]: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
Jul 31 22:01:43 node2 etcd[39920]: listening for client requests on 127.0.0.1:2379
Jul 31 22:01:43 node2 etcd[39920]: listening for client requests on 192.168.31.66:2379
Jul 31 22:01:43 node2 etcd[39920]: recovered store from snapshot at index 1000012
Jul 31 22:01:43 node2 etcd[39920]: restore compact to 916650
Jul 31 22:01:43 node2 etcd[39920]: name = etcd-3
Jul 31 22:01:43 node2 etcd[39920]: data dir = /var/lib/etcd/default.etcd
Jul 31 22:01:43 node2 etcd[39920]: member dir = /var/lib/etcd/default.etcd/member
Jul 31 22:01:43 node2 etcd[39920]: heartbeat = 100ms
Jul 31 22:01:43 node2 etcd[39920]: election = 1000ms
Jul 31 22:01:43 node2 etcd[39920]: snapshot count = 100000
Jul 31 22:01:43 node2 etcd[39920]: advertise client URLs = https://192.168.31.66:2379
Jul 31 22:01:43 node2 etcd[39920]: read wal error (walpb: crc mismatch) and cannot be repaired
Jul 31 22:01:43 node2 systemd[1]: etcd.service: main process exited, code=exited, status=1/FAILURE
Jul 31 22:01:43 node2 systemd[1]: Failed to start Etcd Server.
-- Subject: Unit etcd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd.service has failed.
--
-- The result is failed.
I then deleted the etcd data on that node:
rm -rf /var/lib/etcd/default.etcd/member/snap/*
rm -rf /var/lib/etcd/default.etcd/member/wal/*
Restarting etcd on the node still failed:
[root@node2 wal]# journalctl -xe
Jul 31 22:05:30 node2 etcd[42059]: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag
Jul 31 22:05:30 node2 etcd[42059]: recognized environment variable ETCD_LISTEN_PEER_URLS, but unused: shadowed by corresponding flag
Jul 31 22:05:30 node2 etcd[42059]: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag
Jul 31 22:05:30 node2 etcd[42059]: recognized environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS, but unused: shadowed by corresponding flag
Jul 31 22:05:30 node2 etcd[42059]: recognized environment variable ETCD_ADVERTISE_CLIENT_URLS, but unused: shadowed by corresponding flag
Jul 31 22:05:30 node2 etcd[42059]: recognized environment variable ETCD_INITIAL_CLUSTER, but unused: shadowed by corresponding flag
Jul 31 22:05:30 node2 etcd[42059]: recognized environment variable ETCD_INITIAL_CLUSTER_TOKEN, but unused: shadowed by corresponding flag
Jul 31 22:05:30 node2 etcd[42059]: recognized environment variable ETCD_INITIAL_CLUSTER_STATE, but unused: shadowed by corresponding flag
Jul 31 22:05:30 node2 etcd[42059]: etcd Version: 3.3.13
Jul 31 22:05:30 node2 etcd[42059]: Git SHA: 98d3084
Jul 31 22:05:30 node2 etcd[42059]: Go Version: go1.10.8
Jul 31 22:05:30 node2 etcd[42059]: Go OS/Arch: linux/amd64
Jul 31 22:05:30 node2 etcd[42059]: setting maximum number of CPUs to 4, total number of available CPUs is 4
Jul 31 22:05:30 node2 etcd[42059]: the server is already initialized as member before, starting as etcd member…
Jul 31 22:05:30 node2 etcd[42059]: peerTLS: cert = /opt/etcd/ssl/server.pem, key = /opt/etcd/ssl/server-key.pem, ca = , trusted-ca = /opt/etcd/ssl/ca.pem, client-cert-auth = false, crl-file =
Jul 31 22:05:30 node2 etcd[42059]: listening for peers on https://192.168.31.66:2380
Jul 31 22:05:30 node2 etcd[42059]: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
Jul 31 22:05:30 node2 etcd[42059]: listening for client requests on 127.0.0.1:2379
Jul 31 22:05:30 node2 etcd[42059]: listening for client requests on 192.168.31.66:2379
Jul 31 22:05:30 node2 etcd[42059]: member 72130f86e474b7bb has already been bootstrapped
Jul 31 22:05:30 node2 systemd[1]: etcd.service: main process exited, code=exited, status=1/FAILURE
Jul 31 22:05:30 node2 systemd[1]: Failed to start Etcd Server.
-- Subject: Unit etcd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd.service has failed.
--
-- The result is failed.
After searching online: the service did not start successfully, and the key message is: member 72130f86e474b7bb has already been bootstrapped
According to the etcd documentation:
One of the member was bootstrapped via discovery service. You must remove the previous data-dir to clean up the member information. Or the member will ignore the new configuration and start with the old configuration. That is why you see the mismatch.
So the problem is now clear: the startup failure is caused by a mismatch between the information recorded in the data-dir (/var/lib/etcd/default.etcd) and the information given by etcd's startup flags.
Fixing the problem
Option 1:
We can fix this class of error by changing a startup flag. Since the data-dir already records the member information, there is no need for the redundant bootstrap configuration in the startup options.
Specifically, change the --initial-cluster-state flag:
[root@node2 member]# vim /usr/lib/systemd/system/etcd.service
Change --initial-cluster-state=new to --initial-cluster-state=existing, then restart and it comes up fine.
Once the restart succeeds, change it back.
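The same edit as a one-liner, a minimal sketch using the unit file path shown above:
sed -i 's/--initial-cluster-state=new/--initial-cluster-state=existing/' /usr/lib/systemd/system/etcd.service
systemctl daemon-reload && systemctl restart etcd
# Once the member has rejoined, revert the flag the same way and daemon-reload again.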
Option 2:
Copy the contents of the data-dir from another node, use it as the base to force-start a single-member cluster with --force-new-cluster, then restore the cluster by adding the members back. A rough sketch follows.
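This sketch reuses the member names, IPs, and etcdctl v2-API flags from this cluster's output above; verify the steps against the etcd disaster-recovery docs before running them:
# On the failed node (192.168.31.66):
systemctl stop etcd
rm -rf /var/lib/etcd/default.etcd
scp -r 192.168.31.63:/var/lib/etcd/default.etcd /var/lib/etcd/
# Temporarily add --force-new-cluster to ExecStart in etcd.service, then:
systemctl daemon-reload && systemctl start etcd
# Remove the flag and restart, then re-add the other members one by one, e.g.:
/opt/etcd/bin/etcdctl --ca-file=/opt/etcd/ssl/ca.pem --cert-file=/opt/etcd/ssl/server.pem --key-file=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.31.66:2379" member add etcd-1 https://192.168.31.63:2380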
5. YAML file fails to create a Pod
Error:
[root@k8sm90 demo]# kubectl create -f tomcat-deployment.yaml
error: unable to recognize "tomcat-deployment.yaml": no matches for kind "Deployment" in version "extensions/v1beta1"
Fix:
Edit the YAML file and change the apiVersion field to apps/v1.
For example:
[root@k8sm90 demo]# cat tomcat-deployment.yaml
apiVersion: apps/v1
kind: Deployment
…
This is because my k8s version is v1.18.5, and in this version Deployment is no longer served from extensions/v1beta1; it has moved to apps/v1:
DaemonSet, Deployment, StatefulSet, and ReplicaSet resources will no longer be served from extensions/v1beta1, apps/v1beta1, or apps/v1beta2 by default in v1.16.
If it still fails after that change:
[root@host131 prometheus]# kubectl create -f prometheus-deployment.yaml
error: error validating "prometheus-deployment.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false
Fix: the cause is equally clear: add a selector to Deployment.spec:
selector:
  matchLabels:
    k8s-app: influxdb
# Note: the labels in Deployment.spec.selector must match the labels in the template. It is defined as k8s-app: influxdb here to match the label already present in the template; with that, the Deployment is created successfully.
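Putting both fixes together, a minimal complete manifest might look like this (the image and label names are illustrative, not from the original file):
apiVersion: apps/v1            # not extensions/v1beta1
kind: Deployment
metadata:
  name: tomcat-deployment
spec:
  replicas: 2
  selector:                    # required field in apps/v1
    matchLabels:
      app: tomcat
  template:
    metadata:
      labels:
        app: tomcat            # must match spec.selector.matchLabels
    spec:
      containers:
      - name: tomcat
        image: tomcat:9.0
        ports:
        - containerPort: 8080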
6. Problems signing k8s certificates
Error:
{"code":5100,"message":"Invalid policy: no key usage available"}
1. It may be that the signing CA certificate is being reused: etcd, the apiserver, and the kubelet each need their own separately created CA certificate. Do NOT reuse one CA!
2. It may be that the "CN" name of the CA does not match the "CN" name in the certificate signing request file. The names must be kept consistent.
3. It may be that the profile name under "profiles" in the CA's ca-config.json does not match the profile name passed to cfssl when signing the certificate. The names must be kept consistent, as in the sketch below.
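A minimal sketch of a matching pair; the profile name "kubernetes" and the file names here are illustrative. If the profile is missing or defines no "usages", cfssl reports exactly the "no key usage available" error above.
# ca-config.json: the profile name must match -profile below
{
  "signing": {
    "default": { "expiry": "87600h" },
    "profiles": {
      "kubernetes": {
        "expiry": "87600h",
        "usages": ["signing", "key encipherment", "server auth", "client auth"]
      }
    }
  }
}
# Signing with the same profile name:
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes server-csr.json | cfssljson -bare server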
7. Running kubectl run nginx --image=nginx --replicas=2 --port=80 returns:
Flag --replicas has been deprecated, has no effect and will be removed in the future.
and only a single nginx container instance is created, with no replicas.
Cause:
Since k8s v1.18.0, --replicas is deprecated for kubectl run; the recommended way to create replicated Pods is a Deployment, as sketched below.
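A sketch of the Deployment-based equivalent (note: in v1.18, kubectl create deployment does not take --replicas either, so scale in a second step):
kubectl create deployment nginx --image=nginx
kubectl scale deployment nginx --replicas=2
kubectl expose deployment nginx --port=80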
8. Creating an Ingress rule fails with: no matches for kind "Ingress" in version "networking.k8s.io/v1"
Cause:
networking.k8s.io/v1beta1: 1.14 to 1.18
networking.k8s.io/v1: 1.19+
Since my k8s cluster's API version is 1.18.6, the Ingress rule YAML can only use apiVersion: networking.k8s.io/v1beta1.
To use apiVersion: networking.k8s.io/v1, the API server version must be 1.19 or later.
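A quick way to confirm which version serves Ingress on your own cluster; note that checking the group alone is not enough (the apiservices table further below lists v1.networking.k8s.io even on 1.18, since that version serves NetworkPolicy). kubectl explain errors out if the kind is not served under the given version:
kubectl explain ingress --api-version=networking.k8s.io/v1beta1 | head -3
kubectl explain ingress --api-version=networking.k8s.io/v1 | head -3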
Below are the two Ingress rule YAML files from the official Kubernetes documentation:
For API versions 1.14 to 1.18:
[root@k8sm90 demo]# cat ingress-rule.yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: test-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          serviceName: test
          servicePort: 80
For API versions 1.19+:
[root@k8sm90 demo]# cat ingress-rule.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80
# In addition, kubectl get apiservices shows all APIs served by the current cluster:
[root@master-1 templates]# kubectl get apiservices
NAME                                   SERVICE   AVAILABLE   AGE
v1.                                    Local     True        6d23h
v1.admissionregistration.k8s.io        Local     True        6d23h
v1.apiextensions.k8s.io                Local     True        6d23h
v1.apps                                Local     True        6d23h
v1.authentication.k8s.io               Local     True        6d23h
v1.authorization.k8s.io                Local     True        6d23h
v1.autoscaling                         Local     True        6d23h
v1.batch                               Local     True        6d23h
v1.coordination.k8s.io                 Local     True        6d23h
v1.networking.k8s.io                   Local     True        6d23h
v1.rbac.authorization.k8s.io           Local     True        6d23h
v1.scheduling.k8s.io                   Local     True        6d23h
v1.storage.k8s.io                      Local     True        6d23h
v1beta1.admissionregistration.k8s.io   Local     True        6d23h
v1beta1.apiextensions.k8s.io           Local     True        6d23h
v1beta1.authentication.k8s.io          Local     True        6d23h
v1beta1.authorization.k8s.io           Local     True        6d23h
v1beta1.batch                          Local     True        6d23h
v1beta1.certificates.k8s.io            Local     True        6d23h
v1beta1.coordination.k8s.io            Local     True        6d23h
v1beta1.discovery.k8s.io               Local     True        6d23h
v1beta1.events.k8s.io                  Local     True        6d23h
v1beta1.extensions                     Local     True        6d23h
v1beta1.networking.k8s.io              Local     True        6d23h
v1beta1.node.k8s.io                    Local     True        6d23h
v1beta1.policy                         Local     True        6d23h
v1beta1.rbac.authorization.k8s.io      Local     True        6d23h
v1beta1.scheduling.k8s.io              Local     True        6d23h
v1beta1.storage.k8s.io                 Local     True        6d23h
v2beta1.autoscaling                    Local     True        6d23h
v2beta2.autoscaling                    Local     True        6d23h