S_lion's Studio

Errors after setting the Docker cgroup driver to systemd

2023/02/02

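The official Kubernetes documentation recommends setting the cgroup driver to systemd to improve system stability:

When systemd is the init system, the cgroupfs driver is not recommended, because systemd expects a single cgroup manager on the system. In addition, if you use cgroup v2, use the systemd cgroup driver instead of cgroupfs.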
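The rule quoted above can be written down as a small decision helper. This is only a sketch: the function name `suggest_cgroup_driver` is hypothetical, and it assumes the two inputs are gathered with `ps -p 1 -o comm=` (name of PID 1) and `stat -fc %T /sys/fs/cgroup` (which reports `cgroup2fs` on a cgroup v2 host).

```shell
#!/bin/sh
# Hypothetical helper: suggest a cgroup driver from two observed facts.
#   $1 = name of PID 1, e.g. the output of: ps -p 1 -o comm=
#   $2 = filesystem type of /sys/fs/cgroup, e.g.: stat -fc %T /sys/fs/cgroup
suggest_cgroup_driver() {
    init_name=$1
    cgroup_fs=$2
    # systemd as init, or cgroup v2 in use: prefer the systemd driver.
    if [ "$init_name" = "systemd" ] || [ "$cgroup_fs" = "cgroup2fs" ]; then
        echo systemd
    else
        echo cgroupfs
    fi
}

# Usage on a live host (left commented out so the sketch has no side effects):
# suggest_cgroup_driver "$(ps -p 1 -o comm=)" "$(stat -fc %T /sys/fs/cgroup)"
```

Passing the two facts in as arguments keeps the rule itself separate from how the facts are collected on a particular host.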

The operating system information is as follows:

# cat /etc/os-release 
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"

# uname -r
4.19.90-23.8.v2101.ky10.aarch64

Docker version:

# docker version
Client: Docker Engine - Community
Version: 19.03.4
API version: 1.40
Go version: go1.12.10
Git commit: 9013bf5
Built: Fri Oct 18 15:52:52 2019
OS/Arch: linux/arm64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 19.03.4
API version: 1.40 (minimum version 1.12)
Go version: go1.12.10
Git commit: 9013bf5
Built: Fri Oct 18 15:51:29 2019
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.3.7
GitCommit: 8fba4e9a7d01810a393d5d25a3621dc101981175
runc:
Version: 1.0.0-rc3
GitCommit: fabf83fd21f205c801571df4074024179eb03b44
docker-init:
Version: 0.18.0
GitCommit: fec3683

docker info:

# docker info
Client:
Debug Mode: false

Server:
Containers: 25
Running: 0
Paused: 0
Stopped: 25
Images: 22
Server Version: 19.03.4
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
runc version: fabf83fd21f205c801571df4074024179eb03b44
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.19.90-23.8.v2101.ky10.aarch64
Operating System: Kylin Linux Advanced Server V10 (Tercel)
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 7.647GiB
Name: sdb01
ID: GOQ4:YEDR:YKEZ:UT6V:TBSF:FAWG:H5HJ:RM74:BY6F:Q77D:CTGW:WDHR
Docker Root Dir: /data/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
192.9.42.51:9980
acaas-registry.agree:9980
127.0.0.0/8
Registry Mirrors:
https://acaas-registry.agree:9980/
Live Restore Enabled: false
Product License: Community Engine

docker

On a Kunpeng server, after changing Docker's Cgroup Driver to systemd, containers can no longer be started. The error is as follows:

# cat /etc/docker/daemon.json 
{
...
"exec-opts": ["native.cgroupdriver=systemd"],
...
}

# docker run -itd --name mynginx nginx:latest
docker: Error response from daemon: OCI runtime create failed: systemd cgroup flag passed, but systemd support for managing cgroups is not available: unknown.
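To confirm which driver the daemon actually picked up after editing `daemon.json`, the `Cgroup Driver` line of `docker info` can be extracted. The sketch below is a minimal filter, not part of the original procedure; the function name `cgroup_driver_from_info` is hypothetical and it reads the `docker info` text from standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Cgroup Driver" line
# from `docker info` text supplied on stdin.
cgroup_driver_from_info() {
    sed -n 's/^[[:space:]]*Cgroup Driver:[[:space:]]*//p'
}

# Usage on a live host (commented out so the sketch has no side effects):
# docker info 2>/dev/null | cgroup_driver_from_info
```

On a host with Docker installed, `docker info --format '{{.CgroupDriver}}'` should give the same value without any text parsing.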

Looking into it, this error is returned by runc. The relevant source is:

// We default to cgroupfs, and can only use systemd if the system is a
// systemd box.
cgroupManager := libcontainer.Cgroupfs
if context.GlobalBool("systemd-cgroup") {
    if systemd.UseSystemd() {
        cgroupManager = libcontainer.SystemdCgroups
    } else {
        return nil, fmt.Errorf("systemd cgroup flag passed, but systemd support for managing cgroups is not available")
    }
}
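The error is raised when `systemd.UseSystemd()` returns false. As far as I understand, one precondition behind that call is that the host was booted with systemd, which is commonly detected by testing that `/run/systemd/system` is a real directory. The sketch below reproduces only that one check from the shell; the function name `is_running_systemd` and its optional path argument are illustrative, and the post does not establish that this is the check that failed with rc3 on this host.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given path (default /run/systemd/system)
# is a real directory and not a symlink, mirroring an lstat-style check.
is_running_systemd() {
    dir=${1:-/run/systemd/system}
    [ -d "$dir" ] && [ ! -L "$dir" ]
}

# Usage on a live host (commented out so the sketch has no side effects):
# if is_running_systemd; then echo "systemd is the init system"; fi
```

If this check passes and runc still reports the error, the cause is more likely in the runc version itself than in the host setup.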

Check the version of the runc that was installed by default:

# runc --version
runc version 1.0.0-rc3
commit: fabf83fd21f205c801571df4074024179eb03b44
spec: 1.0.0-rc5

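That version is quite old. According to the runc releases on GitHub, several later versions improved support for the systemd and cgroupfs drivers, so I replaced runc with a newer version: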

# runc --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev
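When comparing release candidates such as rc3 and rc10, a plain string comparison puts rc10 before rc3, so it helps to pull out the numeric part first. The helper below is a sketch that assumes the `runc version 1.0.0-rcN` output format shown above; the name `runc_rc_number` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `runc --version` output on stdin and print
# the N of a "-rcN" suffix on the "runc version" line (nothing if absent).
runc_rc_number() {
    sed -n 's/^runc version .*-rc\([0-9][0-9]*\)$/\1/p'
}

# Usage on a live host (commented out so the sketch has no side effects):
# rc=$(runc --version | runc_rc_number)
# [ -n "$rc" ] && [ "$rc" -ge 10 ] && echo "runc is at least rc10"
```

The threshold of 10 in the usage lines only reflects the version that worked in this post, not a documented minimum.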

After restarting Docker, containers run normally:

# systemctl daemon-reload
# systemctl restart docker

kubelet

The kubelet's cgroup-driver also needs to be changed to systemd:

# cat /etc/sysconfig/kubelet 
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"

After restarting the kubelet, pods fail to start with the following error:

# kubectl describe pod -n default nginx-c21d979d6-dnwgv
OCI message: "process_linux.go:264: applying cgroup configuration for process caused \"No such device or address\""

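This happens because changing the cgroup driver on a node that still has running pods is not supported.
Knowing this, I changed the kubelet procedure for the remaining nodes to:
01. Evict the pods from the node first
02. Record the node's labels, then delete the node
03. Reset the node
04. Change the cgroup-driver and restart the kubelet
05. Join the node again and restore its original labels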
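The steps above can be sketched as a command plan. The function below only prints the commands and does not run them; the name `print_worker_plan` and the angle-bracket placeholders are hypothetical and must be filled in for a real cluster. It assumes a kubeadm-managed node, with the `kubectl` commands run from a machine that has cluster access and the `kubeadm` and `systemctl` commands run on the node itself.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) the commands for moving one
# worker node to the systemd cgroup driver.
#   $1 = node name
print_worker_plan() {
    node=$1
    # Between "kubeadm reset" and the kubelet restart, set
    # --cgroup-driver=systemd in /etc/sysconfig/kubelet by hand.
    cat <<EOF
kubectl drain $node --ignore-daemonsets
kubectl get node $node --show-labels
kubectl delete node $node
kubeadm reset
systemctl restart kubelet
kubeadm join <endpoint> --token <token> --discovery-token-ca-cert-hash <hash>
kubectl label node $node <key>=<value>
EOF
}

# Example: print the plan for a node called worker1.
# print_worker_plan worker1
```

Printing the plan first makes it easy to review the order before touching a node; depending on the workloads, `kubectl drain` may need additional flags.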

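For a master node in a high-availability cluster, the cgroup driver change goes as follows:
01. Switch the VIP away from the node
02. Remove the node's member information from etcd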

# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key  /etc/kubernetes/pki/etcd/server.key endpoint status --cluster -w table

# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove xxxx
03. Record the node's labels, then delete the node
04. Reset the node
05. Change the cgroup driver for docker and kubelet, then restart both services
06. Copy the CA certificates and keys
    [root@k8s-master1 ~]# scp /etc/kubernetes/pki/ca.crt k8s-master2:/etc/kubernetes/pki/
    [root@k8s-master1 ~]# scp /etc/kubernetes/pki/ca.key k8s-master2:/etc/kubernetes/pki/
    [root@k8s-master1 ~]# scp /etc/kubernetes/pki/sa.key k8s-master2:/etc/kubernetes/pki/
    [root@k8s-master1 ~]# scp /etc/kubernetes/pki/sa.pub k8s-master2:/etc/kubernetes/pki/
    [root@k8s-master1 ~]# scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-master2:/etc/kubernetes/pki/
    [root@k8s-master1 ~]# scp /etc/kubernetes/pki/front-proxy-ca.key k8s-master2:/etc/kubernetes/pki/
    [root@k8s-master2 ~]# mkdir -p /etc/kubernetes/pki/etcd/
    [root@k8s-master1 ~]# scp /etc/kubernetes/pki/etcd/ca.crt k8s-master2:/etc/kubernetes/pki/etcd/
    [root@k8s-master1 ~]# scp /etc/kubernetes/pki/etcd/ca.key k8s-master2:/etc/kubernetes/pki/etcd/
07. Join the node as a master again and restore its labels
    [root@k8s-master2 ~]# kubeadm join xxx.xxx.xxx.xxx:8443 --token dsak.xasdkadasoqe2 --discovery-token-ca-cert-hash sha256:kdkasdksakdqlwelqkfeacd1efa36a9c5c71a897517d8fb6f6c9db8ee314 --control-plane
08. Bring the VIP back up

References

https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#cgroup-drivers

https://github.com/moby/moby/issues/38753

https://github.com/kubernetes/kubernetes/issues/114539

https://github.com/cri-o/cri-o/issues/832

https://github.com/kubernetes/kubernetes/issues/98006

https://blog.csdn.net/qq_15138049/article/details/122231353

https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/#%E8%BF%81%E7%A7%BB%E5%88%B0-systemd-%E9%A9%B1%E5%8A%A8
