迁移kubelet、docker和containerd工作目录

news/2024/7/4 7:43:50 标签: kubelet, docker, 云原生

文章目录

  • 问题背景
  • 迁移
    • Docker
      • 停止 Docker 服务
      • 修改配置
      • 移动文件
      • 重新启动 Docker 服务
    • containerd
      • 停止服务
      • 修改配置
      • 移动文件
      • 重新启动服务
    • kubelet(遇到问题待解决)
      • 停止服务
      • 修改配置
      • 移动文件(遇到问题待解决)
      • 重新启动服务
  • 使用的版本

问题背景

kubeletdocker和containerd 的工作目录默认都在 /var/lib 下。
但是我们学校实验室租的线上机器挂载在 / 的磁盘空间很小,挂载在 /mnt/data_mnt/ 的数据盘空间大。
应该是因为工作目录的原因,当 /占用超过 80% 时, kubelet 会认为磁盘空间不足,因为 DiskPressure 而进入 NotReady 状态。

(以下是迁移后)

root@iZhp3hqett0mw795req5b2Z:~# df -h | head
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs            16G   19M   16G   1% /run
/dev/vda1        99G   48G   46G  51% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vdb1       493G  120G  348G  26% /mnt/data_mnt
overlay          99G   48G   46G  51% /var/lib/containers/storage/overlay/54a47bbff1442f521326770cab94eb3221d82b0ff9e997c1b2efe6cad811b21b/merged
overlay          99G   48G   46G  51% /var/lib/containers/storage/overlay/a74d553e701c85c5ad25fd14a8fd30383e0dc21f4b567bc81e6b7ac74bc73524/merged

迁移

Docker

停止 Docker 服务

删除所有容器后。

systemctl stop docker

修改配置

Docker配置文件在 /etc/docker/daemon.json,增加字段设置数据目录。

参考官网文档 https://docs.docker.com/config/daemon/#daemon-data-directory

修改后示例:

{
 "registry-mirrors": [
     "https://dockerhub.azk8s.cn",
     "https://hub-mirror.c.163.com",
     "https://reg-mirror.qiniu.com"
 ],

   "builder": {
       "gc": {
         "defaultKeepStorage": "20GB",
         "enabled": true
       }
   },
   "experimental": true,
   "features": {
     "buildkit": false
   },
   "dns": ["8.8.8.8", "8.8.4.4"],
   "data-root": "/mnt/data_mnt/var/lib/docker"
}

移动文件

/var/lib/docker 复制到 /mnt/data_mnt/var/lib/docker

重新启动 Docker 服务

systemctl start docker

# 跑一个 nginx 看看
docker run -p 80:80 nginx

# 查看服务状态
systemctl status dockerdocker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2023-10-17 22:28:47 CST; 12h ago
     Docs: https://docs.docker.com
 Main PID: 3917580 (dockerd)
    Tasks: 25
   Memory: 1.0G
      CPU: 1min 16.247s
   CGroup: /system.slice/docker.service
           ├─ 370428 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 5050 -container-ip 172.17.0.2 -container-port 5000
           └─3917580 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Oct 18 10:11:41 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:11:41.715286425+08:00" level=error msg="Handler for POST /v1.41/containers/f66c7e907176ccd2abe010253448ab6dcab286c60f893b4cde72184215747d90/start returned error: driver 
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.451142888+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5000/v2/\": http: server gave HTTP response to HTTPS client
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455921606+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455953643+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455930600+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.456010183+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.456058582+08:00" level=info msg="Attempting next endpoint for push after error: no basic auth credentials"
Oct 18 10:18:56 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:18:56.354196507+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
Oct 18 10:19:02 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:19:02.439702060+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
Oct 18 10:19:07 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:19:07.267669420+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client

containerd

停止服务

systemctl stop containerd

修改配置

配置文件在 /etc/containerd/config.toml

可以看到root = "/mnt/data_mnt/var/lib/containerd",可见工作目录默认在 /var/lib/containerd

万一不小心改乱了,可以重新生成默认配置:

containerd config default > /etc/containerd/config.toml

修改后例如:

version = 2
root = "/mnt/data_mnt/var/lib/containerd"
state = "/run/containerd"
oom_score = 0

[grpc]
  address = "/run/containerd/containerd.sock"
  uid = 0
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[debug]
  address = "/run/containerd/containerd-debug.sock"
  uid = 0
  gid = 0
  level = "warn"

[timeouts]
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "sealos.hub:5000/pause:3.9"
    max_container_log_line_size = -1
    max_concurrent_downloads = 20
    disable_apparmor = false
    [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "overlayfs"
      default_runtime_name = "runc"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
      [plugins."io.containerd.grpc.v1.cri".registry.configs]
          [plugins."io.containerd.grpc.v1.cri".registry.configs."sealos.hub:5000".auth]
            username = "admin"
            password = "passw0rd"

移动文件

/mnt/data_mnt/var/lib/containerd 复制到 /var/lib/containerd

重新启动服务

systemctl start containerd

systemctl status containerd

kubelet_180">kubelet(遇到问题待解决)

停止服务

systemctl stop kubelet

修改配置

kubelet 服务的配置,我的配置在 /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

注意同一目录可能还有个文件 /etc/systemd/system/kubelet.service.d/override.conf 实际运行中会用 override.conf 覆盖 10-kubeadm.conf 的内容。

修改后内容示例:

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/mnt/data_mnt/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/mnt/data_mnt/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
Environment="KUBELET_EXTRA_ARGS= \
               \
               \
              --runtime-request-timeout=15m --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --image-service-endpoint=unix:///var/run/image-cri-shim.sock"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

另外还要修改 /etc/kubernetes/kubelet.conf 中配置的密钥地址,修改后示例(部分)

# 以上省略
users:
- name: system:node:izhp3hqett0mw795req5b2z
  user:
    client-certificate: /mnt/data_mnt/var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /mnt/data_mnt/var/lib/kubelet/pki/kubelet-client-current.pem

另外还要建软连接,因为读取密钥时,是通过名为“当前”的软连接找实际特定版本的密钥,移动后就乱套了。

ln -s  kubelet-client-2023-10-07-11-14-02.pem kubelet-client-current.pem 

移动文件(遇到问题待解决)

有些文件删除不了……

root@iZhp3hqett0mw795req5b2Z:~# rm -rf /var/lib/kubelet
rm: cannot remove '/var/lib/kubelet/pods/30c0099f-dfcc-4e6f-893e-eacc6ed44021/volumes/kubernetes.io~projected/kube-api-access-6jt8n': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/30c0099f-dfcc-4e6f-893e-eacc6ed44021/volumes/kubernetes.io~empty-dir/tmp-volume': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/54e7cb22-fdab-4e33-afb3-c8ba88d153a2/volumes/kubernetes.io~projected/kube-api-access-j84xs': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/d1a3fba3-3ab8-4ef9-b61c-6479b26c79f7/volumes/kubernetes.io~projected/kube-api-access-lf5tx': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/5e38f3a0-7f59-4d2e-98f4-1ec915e6ba89/volumes/kubernetes.io~projected/kube-api-access-prz4v': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/0f02517c-01c3-4b58-9f85-be169a92a31d/volumes/kubernetes.io~projected/kube-api-access-r4kxp': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/7098d438-0a9d-40df-aee1-ec4884ba262f/volumes/kubernetes.io~projected/kube-api-access-rqtwq': Device or resource busy

重新启动服务

systemctl start kubelet

systemctl status kubelet

使用的版本

日期:2023年10月18日

版本

root@iZhp3hqett0mw795req5b2Z:~# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}

root@iZhp3hqett0mw795req5b2Z:~# docker version
Client:
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.1
 Git commit:        20.10.21-0ubuntu1~18.04.3
 Built:             Thu Apr 27 05:50:21 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.1
  Git commit:       20.10.21-0ubuntu1~18.04.3
  Built:            Thu Apr 27 05:36:22 2023
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.12-0ubuntu1~18.04.1
  GitCommit:        
 runc:
  Version:          1.1.4-0ubuntu1~18.04.2
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        

root@iZhp3hqett0mw795req5b2Z:~# containerd --version
containerd github.com/containerd/containerd 1.6.12-0ubuntu1~18.04.1 

http://www.niftyadmin.cn/n/5102232.html

相关文章

Flutter之Widget生命周期

目录 初始化构造函数initStatedidChangeDependencies 运行时builddidUpdateWidget 组件移除deactivatedisposereassemble 函数生命周期说明:实际场景App生命周期 前言:生命周期是一个组件加载到卸载的整个周期,熟悉生命周期可以让我们在合适的…

香港服务器的速度为什么比较快

租用过海外服务器的用户的都知道,在这么多免备案的服务器产品中,租用香港服务器的速度是最快的,对于身在国内的网站 运营者或者企业租用香港服务器搭建网站,针对大陆用户不仅仅体验是最好的,其次也方便网站的管理者对于…

Python--随机出拳(random)--if判断--综合案例练习:石头剪刀布

注:涉及相关链接: Python:if判断--综合案例练习:石头剪刀布-CSDN博客 Python语言非常的强大,强大之处就在于其拥有很多模块(module),这些模块中拥有很多别人已经开发好的代码&…

SEO业务适合什么代理IP?2023海外代理IP推荐排名

随着数字营销趋势的变化,搜索引擎优化仍然是企业在网络世界中努力繁荣的重要组成部分。为了实现 SEO 成功,从搜索引擎获取准确且多样化的数据至关重要,然而可能会受到诸如基于位置的限制和被检测风险等限制的阻碍,IP代理则可以帮助…

微服务负载均衡实践

概述 本文介绍微服务的服务调用和负载均衡,使用spring cloud的loadbalancer及openfeign两种技术来实现。 本文的操作是在微服务的初步使用的基础上进行。 环境说明 jdk1.8 maven3.6.3 mysql8 spring cloud2021.0.8 spring boot2.7.12 idea2022 步骤 改造Eu…

Python pip 替换国内镜像源

pip它还有一个非常好的特点,当你安装一个库的时候,它会自动帮你安装所有这个库的依赖库。完全一键式操作。非常方便。但是由于pipy网站是国外网站,很容易会被墙,导致经常下载速度非常慢,经常超时。 解决办法&#xff…

数据抓取代码示例

以下是一个使用lua-http和Lua编写的爬虫程序,用于爬取内容。此程序使用了https://www.duoip.cn/get_proxy的代码。 -- 引入lua-http库 local http require "http" ​ -- 定义get\_proxy函数 local function get_proxy()-- 使用https://www.duoip.cn/get…