====== Rancher Troubleshooting Notes ======

===== How to Diagnose a Failed Startup =====

  * Check the pod status in the cattle-system namespace:

    rkeuser@iiidevops4:~$ kubectl get pod -n cattle-system
    NAME                                    READY   STATUS             RESTARTS   AGE
    cattle-cluster-agent-6bf6f8fcc4-sznpp   1/1     Running            0          18m
    cattle-node-agent-79nrh                 1/1     Running            23         67d
    cattle-node-agent-ch6pn                 1/1     Running            23         67d
    cattle-node-agent-jr5bq                 1/1     Running            7          7d20h
    cattle-node-agent-k2fcs                 1/1     Running            26         67d
    rancher-98d8d5cf5-hbjjv                 1/1     Running            1          25m
    rancher-98d8d5cf5-nhlwz                 0/1     CrashLoopBackOff   8          25m
    rancher-98d8d5cf5-zjbzs                 0/1     Running            0          105s

  - Find out which rancher pod is the leader:

    $ kubectl describe configMap cattle-controllers -n kube-system
    Name:         cattle-controllers
    Namespace:    kube-system
    Labels:       <none>
    Annotations:  control-plane.alpha.kubernetes.io/leader: {"holderIdentity":"rancher-98d8d5cf5-hbjjv","leaseDurationSeconds":45,"acquireTime":"2021-09-08T06:40:25Z","renewTime":"2021-09-08T07:02:5...
    Data
    ====
    Events:  <none>

  - The current leader is rancher-98d8d5cf5-hbjjv, so check that pod's logs:

    $ kubectl logs rancher-98d8d5cf5-hbjjv -n cattle-system
    2021/09/08 06:38:27 [INFO] Rancher version v2.4.15 (cdb64d640) is starting
    2021/09/08 06:38:27 [INFO] Rancher arguments {ACMEDomains:[] AddLocal:auto Embedded:false HTTPListenPort:80 HTTPSListenPort:443 K8sMode:auto Debug:false Trace:false NoCACerts:false AuditLogPath:/var/log/auditlog/rancher-api-audit.log AuditLogMaxage:10 AuditLogMaxsize:100 AuditLogMaxbackup:10 AuditLevel:0 Features:}
    2021/09/08 06:38:27 [INFO] Listening on /tmp/log.sock
    I0908 06:38:27.719747       6 http.go:122] HTTP2 has been explicitly disabled
    :
    2021/09/08 06:56:18 [ERROR] AppController p-gn54t/test-20210831-master-sq [helm-controller] failed with : Get "https://10.43.0.1:443/apis/project.cattle.io/v3/namespaces/p-gn54t/apprevisions?labelSelector=io.cattle.field%!F(MISSING)appId%!D(MISSING)test-20210831-master-sq&timeout=30s": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    2021/09/08 06:57:04 [ERROR] PipelineExecutionController p-gn54t/p-qp9qq-1 [pipeline-execution-controller] failed with : 
    pipeline.project.cattle.io "p-gn54t/p-qp9qq" not found
    2021/09/08 07:01:20 [ERROR] PipelineExecutionController p-gn54t/p-qp9qq-1 [pipeline-execution-controller] failed with : pipeline.project.cattle.io "p-gn54t/p-qp9qq" not found

===== Accidentally Deleted the Pipeline's Jenkins Pod =====

  * If the jenkins pod shown below goes missing, the pipeline can no longer start:

    ~$ kubectl get namespace | grep pipeline
    cattle-pipeline      Active   66d
    p-gn54t-pipeline     Active   66d
    ~$ kubectl get pod -n p-gn54t-pipeline
    NAME                               READY   STATUS    RESTARTS   AGE
    docker-registry-57fbddc6cc-drt29   1/1     Running   4          66d
    jenkins-75cf8d9966-m2vc8           1/1     Running   0          168m
    minio-7b7866c65f-7hpl5             1/1     Running   0          167m

  * Delete the project's pipeline namespace (e.g. p-gn54t-pipeline); it is recreated automatically, including the jenkins pod.
  * Reference - https://github.com/rancher/rancher/issues/18779

===== Reinstalling Rancher When It Fails to Start =====

  * Environment: Rancher installed with rke / helm
  * Running helm uninstall followed by helm install did not help; Rancher still failed to start properly.
  * After consulting **[[https://www.cnblogs.com/37yan/p/14275214.html|Cleanly removing Rancher]]** and **[[https://rancher.com/blog/2018/2018-07-09-rancher-management-plane-architecture/|CRDs in Rancher]]**, the following steps resolved the problem:
    - Delete the CRD dynamicschemas.management.cattle.io
    - Delete the cert-manager and cattle-system namespaces
    - Reinstall Rancher

===== How to Change the Rancher Server URL =====

  * Reference - https://gist.github.com/janeczku/d3b9eed3b1dee7863b66fba3367a1bd4
  * Open the Rancher advanced settings page: https://<rancher-server>/g/settings/advanced
  * Find server-url and edit it

{{tag>rancher}}
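
As a rough sketch of the leader lookup in the first section, the holder name can be extracted from the leader annotation with a small shell helper instead of reading it by eye. The function name leader_from_annotation is hypothetical, and the sed pattern assumes the annotation JSON arrives on a single line with holderIdentity written as in the kubectl describe output shown there.

```shell
#!/bin/sh
# Hypothetical helper: read the leader annotation JSON on stdin and
# print only the value of "holderIdentity".
# Assumption: the JSON is on one line, as in the kubectl output above.
leader_from_annotation() {
  sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p'
}

# Possible usage against a live cluster (not executed here):
#   kubectl get configmap cattle-controllers -n kube-system \
#     -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}' \
#     | leader_from_annotation
# The printed pod name can then be passed to:
#   kubectl logs <pod-name> -n cattle-system
```

The helper only parses text, so it can be tried on a copied annotation string before pointing it at a cluster.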