Skip to content

ESXi安装k8s集群

最近ESXi 虚拟机安装完毕,开始折腾k8s集群。

k8s 集群拓扑

我先安装了Debian 11(bullseye),但是里面的软件比如containerd都很老了,只能跑1.4.3没办法设定指定镜像,换成Debian 12和Ubuntu 22.04都有网络没办法互通的问题……怎么改iptables都不行,只好用Fedora 36。这下才终于成功了,下面记录一下我遇到的坑。

首先对某墙只想唱:“听我说,谢谢你”,大家安装过程可以用阿里云的镜像。官方文档https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/ 其实已经非常全面和详细了,不要漏了任何一步,我就漏了加载overlay 模块导致的集群起不来的尴尬……

给路由器配置了自动DHCP,这样局域网里可以直接用fed-k8s-master这种域名访问了。

Fedora登入后第一步就是设置机器名称

hostnamectl set-hostname fed-k8s-master

然后是系统配置,比如必要的内核模块,sysctl一些东西,fedora还有个问题,就是要关掉防火墙

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system
sudo systemctl stop firewalld && sudo systemctl disable firewalld
sudo dnf remove zram-generator-defaults

# Set SELinux in permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

接着配置k8s fedora源,记得把里面的proxy换成你科学上网用的(用aliyun的也问题不大,就是我不太喜欢……)

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
proxy=<你的代理地址>
EOF

sudo yum install -y kubelet kubeadm kubectl containerd iproute-tc --disableexcludes=kubernetes
sudo systemctl enable --now kubelet containerd

接下来就是配置containerd,这里有3个坑:

  • 一个是cgroup driver要配置成systemd
  • runtime 的type 得手动配置
  • 还有就是坑爹的sandbox镜像源(可以理解成占位)

最后使用的配置文件就是:/etc/containerd/config.toml

version = 2

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/usr/libexec/cni/"
      conf_dir = "/etc/cni/net.d"
  [plugins."io.containerd.internal.v1.opt"]
    path = "/var/lib/containerd/opt"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = true

开始初始化master control-plane(控制面主机),10.0.0.39 改成你自己的主机ip

kubeadm init --apiserver-advertise-address 10.0.0.39 --image-repository registry.aliyuncs.com/google_containers

执行成功会返回一个token

kubeadm join 10.0.0.39:6443 --token xlstub.uvzof5s4mz504ev1 \
--discovery-token-ca-cert-hash sha256:9cc767e5d1e2f2707d4e1f5a1270c569aeca3a49185aa591a9d2142f8d352198

先拷下来,然后不管他,因为我们还需要CNI(container network interface),简单来说就是容器间互相访问的网络, 我这里选了flannel,下载flanneld二进制,并应用配置

mkdir -p /opt/bin && wget https://github.com/flannel-io/flannel/releases/download/v0.19.0/flanneld-amd64 -O /opt/bin/flanneld
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

这里也有个坑:不管是coredns还是flanneld都一直起不来。其实就是k8s的一个bug……效果是

[root@fed-k8s-master ~]# kubectl get pods --all-namespaces
NAMESPACE      NAME                                     READY   STATUS              RESTARTS        AGE
kube-flannel   kube-flannel-ds-95sv7                    0/1     CrashLoopBackOff    7 (3m52s ago)   15m
kube-flannel   kube-flannel-ds-qjjcn                    0/1     CrashLoopBackOff    7 (3m36s ago)   15m
kube-system    coredns-74586cf9b6-fkd5x                 0/1     ContainerCreating   0               79m
kube-system    coredns-74586cf9b6-ktvk7                 0/1     ContainerCreating   0               79m
kube-system    etcd-fed-k8s-master                      1/1     Running             0               79m
kube-system    kube-apiserver-fed-k8s-master            1/1     Running             0               79m
kube-system    kube-controller-manager-fed-k8s-master   1/1     Running             0               79m
kube-system    kube-proxy-cl2l6                         1/1     Running             0               79m
kube-system    kube-proxy-llln9                         1/1     Running             0               30m
kube-system    kube-scheduler-fed-k8s-master            1/1     Running             0               79m

要是crictl logs 看为啥起不来,就会发现:

I0721 03:54:55.082767       1 match.go:206] Determining IP address of default interface
I0721 03:54:55.083292       1 match.go:259] Using interface with name ens192 and address 10.0.0.39
I0721 03:54:55.083350       1 match.go:281] Defaulting external address to interface address (10.0.0.39)
I0721 03:54:55.083485       1 vxlan.go:138] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0721 03:54:55.084170       1 main.go:330] Error registering network: failed to acquire lease: node "fed-k8s-master" pod
 cidr not assigned
I0721 03:54:55.084272       1 main.go:447] Stopping shutdownHandler...
W0721 03:54:55.084438       1 reflector.go:436] github.com/flannel-io/flannel/subnet/kube/kube.go:403: watch of *v1.Node
 ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented
the request from succeeding

这是个k8s的bug !,需要手动调整/etc/kubernetes/manifests/kube-controller-manager.yaml 这个文件,加上这两个选项……

--allocate-node-cidrs=true
--cluster-cidr=10.244.0.0/16

重启kubelet,master就配置好了。

在另一台一样配置好的fedora机器上,执行刚才的join命令,这样,一个简单的集群就配置好了

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.