最近ESXi 虚拟机安装完毕,开始折腾k8s集群。
k8s 集群拓扑
我先安装了Debian 11(bullseye),但是里面的软件比如containerd都很老了,只能跑1.4.3没办法设定指定镜像,换成Debian 12和Ubuntu 22.04都有网络没办法互通的问题……怎么改iptables都不行,只好用Fedora 36。这下才终于成功了,下面记录一下我遇到的坑。
首先对某墙只想唱:“听我说,谢谢你” ,大家安装过程可以用阿里云的镜像。官方文档https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/ 其实已经非常全面和详细了,不要漏了任何一步,我就漏了加载overlay 模块导致的集群起不来的尴尬……
给路由器配置了自动DHCP,这样局域网里可以直接用fed-k8s-master
这种域名访问了。
Fedora登入后第一步就是设置机器名称
hostnamectl set-hostname fed-k8s-master
然后是系统配置,比如必要的内核模块,sysctl一些东西,fedora还有个问题,就是要关掉防火墙
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
sudo systemctl stop firewalld && sudo systemctl disable firewalld
sudo dnf remove zram-generator-defaults
# Set SELinux in permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
接着配置k8s fedora源,记得把里面的proxy换成你科学上网用的(用aliyun的也问题不大,就是我不太喜欢……)
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
proxy=<你的代理地址>
EOF
sudo yum install -y kubelet kubeadm kubectl containerd iproute-tc --disableexcludes=kubernetes
sudo systemctl enable --now kubelet containerd
接下来就是配置containerd,这里有3个坑:
一个是cgroup driver要配置成systemd runtime 的type 得手动配置 还有就是坑爹的sandbox镜像源(可以理解成占位)
最后使用的配置文件就是:/etc/containerd/config.toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/usr/libexec/cni/"
conf_dir = "/etc/cni/net.d"
[plugins."io.containerd.internal.v1.opt"]
path = "/var/lib/containerd/opt"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
开始初始化master control-plane(控制面主机),10.0.0.39 改成你自己的主机ip
kubeadm init --apiserver-advertise-address 10.0.0.39 --image-repository registry.aliyuncs.com/google_containers
执行成功会返回一个token
kubeadm join 10.0.0.39:6443 --token xlstub.uvzof5s4mz504ev1 \
--discovery-token-ca-cert-hash sha256:9cc767e5d1e2f2707d4e1f5a1270c569aeca3a49185aa591a9d2142f8d352198
先拷下来,然后不管他,因为我们还需要CNI(container network interface),简单来说就是容器间互相访问的网络, 我这里选了flannel,下载flanneld二进制,并应用配置
mkdir -p /opt/bin && wget https://github.com/flannel-io/flannel/releases/download/v0.19.0/flanneld-amd64 -O /opt/bin/flanneld
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
这里也有个坑:不管是coredns还是flanneld都一直起不来。其实就是k8s的一个bug……效果是
[root@fed-k8s-master ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-95sv7 0/1 CrashLoopBackOff 7 (3m52s ago) 15m
kube-flannel kube-flannel-ds-qjjcn 0/1 CrashLoopBackOff 7 (3m36s ago) 15m
kube-system coredns-74586cf9b6-fkd5x 0/1 ContainerCreating 0 79m
kube-system coredns-74586cf9b6-ktvk7 0/1 ContainerCreating 0 79m
kube-system etcd-fed-k8s-master 1/1 Running 0 79m
kube-system kube-apiserver-fed-k8s-master 1/1 Running 0 79m
kube-system kube-controller-manager-fed-k8s-master 1/1 Running 0 79m
kube-system kube-proxy-cl2l6 1/1 Running 0 79m
kube-system kube-proxy-llln9 1/1 Running 0 30m
kube-system kube-scheduler-fed-k8s-master 1/1 Running 0 79m
要是crictl logs 看为啥起不来,就会发现:
I0721 03:54:55.082767 1 match.go:206] Determining IP address of default interface
I0721 03:54:55.083292 1 match.go:259] Using interface with name ens192 and address 10.0.0.39
I0721 03:54:55.083350 1 match.go:281] Defaulting external address to interface address (10.0.0.39)
I0721 03:54:55.083485 1 vxlan.go:138] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0721 03:54:55.084170 1 main.go:330] Error registering network: failed to acquire lease: node "fed-k8s-master" pod
cidr not assigned
I0721 03:54:55.084272 1 main.go:447] Stopping shutdownHandler...
W0721 03:54:55.084438 1 reflector.go:436] github.com/flannel-io/flannel/subnet/kube/kube.go:403: watch of *v1.Node
ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented
the request from succeeding
这是个k8s的bug ! ,需要手动调整/etc/kubernetes/manifests/kube-controller-manager.yaml
这个文件,加上这两个选项……
--allocate-node-cidrs=true
--cluster-cidr=10.244.0.0/16
重启kubelet,master就配置好了。
在另一台一样配置好的fedora机器上,执行刚才的join命令,这样,一个简单的集群就配置好了