Kubespray: one-command production deployment of a Kubernetes v1.25.6 cluster

Series - K8s series

What is Kubespray?

Kubespray (formerly kube-spray) is an Ansible-driven tool that helps deploy Kubernetes clusters. It supports multi-node, highly available configurations and is suitable for production environments.

  • Can deploy clusters on AWS, GCE, Azure, OpenStack, vSphere, Equinix Metal (bare metal), Oracle Cloud Infrastructure (experimental), or your own bare-metal/private-cloud hosts
  • Supports highly available cluster configurations
  • Supports composable add-on configuration (e.g. choice of network plugin)
  • Supports the most popular Linux distributions
  • Continuous-integration tested

Pros: supports many infrastructure platforms, and deploying with Ansible keeps configuration flexible. Cons: users unfamiliar with Ansible need to learn it first, though I can confidently say Ansible is easy to pick up.

A Kubernetes deployment walkthrough

This article uses kubespray v2.21.0 to deploy Kubernetes v1.25.6.

I ran this experiment on virtual machines: k8s-master and k8s-node1 form the cluster, and a third machine (console) serves as the Ansible control host for the deployment.

Hostname     IP               Specs                            Notes
console      192.168.19.199   CPU: 1C  RAM: 2GB  Disk: 50GB   control host; no kernel upgrade needed
k8s-master   192.168.19.130   CPU: 1C  RAM: 2GB  Disk: 50GB   kernel 6.8.1
k8s-node1    192.168.19.131   CPU: 1C  RAM: 2GB  Disk: 50GB   kernel 6.8.1
  • Set the hostnames on k8s-master and k8s-node1:
# hostnamectl set-hostname k8s-master # shown for one host; repeat with the matching name on the other
  • Synchronize the clocks on all three machines; if they drift, kubectl will report certificate errors after installation
# ntpdate ntp.aliyun.com
  • Disable swap
# sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab && sudo swapoff -a
  • Disable the firewall
# sudo systemctl stop firewalld && sudo systemctl disable firewalld
  • Upgrade the kernel on k8s-master and k8s-node1; I upgraded to kernel 6.8.1
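The fstab edit above comments out the swap line in place; it can be rehearsed safely on a scratch copy first. A minimal sketch (the fstab content below is made up for the demo):

```shell
# Build a throwaway two-line "fstab" (hypothetical devices) and apply the same sed
printf '/dev/sda1 / ext4 defaults 0 1\n/dev/sda2 swap swap defaults 0 0\n' > /tmp/fstab.demo
sed -i '/ swap / s/^\(.*\)$/#\1/g' /tmp/fstab.demo
cat /tmp/fstab.demo
# the root mount is untouched; only the swap line gains a leading '#'
```

`swapoff -a` handles the running system; the fstab edit is what keeps swap off across reboots.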

Upgrading the Linux kernel

# wget https://elrepo.org/linux/kernel/el7/x86_64/RPMS/kernel-ml-6.8.1-1.el7.elrepo.x86_64.rpm --no-check-certificate
# wget https://elrepo.org/linux/kernel/el7/x86_64/RPMS/kernel-ml-devel-6.8.1-1.el7.elrepo.x86_64.rpm --no-check-certificate
# rpm -ivh kernel-ml-6.8.1-1.el7.elrepo.x86_64.rpm
# rpm -ivh kernel-ml-devel-6.8.1-1.el7.elrepo.x86_64.rpm
# yum install perl # kernel-ml-devel may complain about a missing perl dependency; install it with yum
# cat /etc/grub2.cfg # inspect the boot entries
# awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg # find the new kernel's position in the boot menu, usually 0
# grub2-set-default 0 # make the newest kernel the default boot entry
# grub2-mkconfig -o /boot/grub2/grub.cfg # regenerate the grub configuration
# reboot # a reboot is required for the new kernel to take effect
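The awk one-liner above is the only non-obvious step; here it is exercised against a sample grub2.cfg fragment (entries invented for the demo) so you can see the index it prints:

```shell
# Two sample menuentry lines, newest kernel first, as an elrepo install typically produces
cat > /tmp/grub.demo <<'EOF'
menuentry 'CentOS Linux (6.8.1-1.el7.elrepo.x86_64) 7 (Core)' --class centos {
menuentry 'CentOS Linux (3.10.0-1160.el7.x86_64) 7 (Core)' --class centos {
EOF
# split each line on single quotes; field 2 is the entry title, i counts from 0
awk -F\' '$1=="menuentry " {print i++ " : " $2}' /tmp/grub.demo
```

Index 0 is what `grub2-set-default 0` selects, which is why the new kernel must appear first in the menu.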

Kernel tuning

# cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
fs.may_detach_mounts = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720

net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl =15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.ip_conntrack_max = 65536
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384

net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.lo.disable_ipv6 = 0
net.ipv6.conf.all.forwarding = 1
EOF

# sysctl --system # apply the settings
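Before applying the drop-in, it is worth checking it for duplicated keys; with sysctl the last occurrence silently wins, which can mask a typo. A small sketch on a scratch file (sample content for the demo):

```shell
# A deliberately duplicated key, to show what the check reports
cat > /tmp/k8s.conf.demo <<'EOF'
net.ipv4.tcp_max_syn_backlog = 16384
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 16384
EOF
# prints any key that is defined more than once
cut -d= -f1 /tmp/k8s.conf.demo | tr -d ' ' | sort | uniq -d
```

Run the same pipeline against /etc/sysctl.d/k8s.conf; empty output means no duplicates.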

Set up passwordless SSH from the console host to the two k8s hosts

# ssh-keygen  # on the console control host
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.19.130 # copy the public key to k8s-master
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.19.131 # copy the public key to k8s-node1
  • Prepare the environment on the console control host

Ansible needs a Python environment to run, so install Python first. I used Python 3.10.14 (the project officially recommends Python 3.6 or later). Python 3.10.14 depends on a recent OpenSSL, so a few packages have to be built beforehand:

# wget https://www.openssl.org/source/openssl-1.1.1q.tar.gz --no-check-certificate
# tar zxf openssl-1.1.1q.tar.gz
# cd openssl-1.1.1q
# ./config --prefix=/usr/local/openssl-1.1.1 # set the install prefix
# make && sudo make install # build and install
# ll /usr/local/openssl-1.1.1/ # confirm the install location

Installing Python 3.10.14

# wget https://www.python.org/ftp/python/3.10.14/Python-3.10.14.tgz
# tar zxf Python-3.10.14.tgz && cd Python-3.10.14
# ./configure --enable-optimizations --with-openssl=/usr/local/openssl-1.1.1 --with-openssl-rpath=auto
# make && make install
# ln -s /usr/local/bin/python3.10 /usr/bin/python3 # create the python3 symlink
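A quick smoke test that the build actually picked up the custom OpenSSL: the `ssl` module must import cleanly, or pip will later fail against HTTPS package indexes.

```shell
# If this import fails, rebuild Python with the correct --with-openssl prefix
python3 -c 'import ssl; print(ssl.OPENSSL_VERSION)'
```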

Use Ansible to run the playbooks in the kubespray project for a one-command deployment

  • Download the matching release, kubespray v2.21.0:
# wget https://github.com/kubernetes-sigs/kubespray/archive/refs/tags/v2.21.0.tar.gz
  • Extract the archive
# tar -xvf v2.21.0.tar.gz
# mv kubespray-2.21.0 kubespray # GitHub tag archives extract without the leading "v"
  • Create a Python virtual environment
# python3 -m venv venv
# source venv/bin/activate
  • Install the kubespray dependencies
# pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
  • Edit the kubespray configuration
#  cd kubespray
#  cp -rfp inventory/sample inventory/mycluster
#  vim inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_network_plugin: cilium # choose the network plugin; cilium, calico, weave and flannel are supported
kube_service_addresses: 10.233.0.0/18 # Service CIDR
kube_pods_subnet: 10.233.64.0/18 # Pod CIDR
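The two /18 CIDRs are a sizing decision: each leaves 14 host bits, i.e. room for 16384 Service IPs and 16384 Pod IPs, and the two ranges must not overlap each other or the node network:

```shell
# addresses available in a /18: 14 free host bits
echo $(( 1 << (32 - 18) ))   # → 16384
```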

The relevant configuration files:

  • inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
  • inventory/mycluster/group_vars/all/containerd.yml
  • inventory/mycluster/group_vars/all/cri-o.yml
  • inventory/mycluster/group_vars/all/docker.yml
# cat inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml

# docker, crio and containerd are supported; containerd is recommended.
container_manager: containerd

# whether to enable Kata Containers
kata_containers_enabled: false

Change the container data directory

# vim ./inventory/mycluster/group_vars/all/containerd.yml
containerd_storage_dir: "/data/containerd"

Configure container registry mirrors

# vim ./inventory/mycluster/group_vars/all/containerd.yml

containerd_registries:
    "docker.io":
    - "http://hub-mirror.c.163.com"
    - "https://mirror.aliyuncs.com"
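Under containerd, kubespray renders these mirrors into the CRI plugin's registry section of /etc/containerd/config.toml; the result should look roughly like the fragment below (a sketch based on containerd's `registry.mirrors` format; the exact template output may differ between kubespray versions):

```toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  endpoint = ["http://hub-mirror.c.163.com", "https://mirror.aliyuncs.com"]
```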

On CentOS 7 you must enable containerd_snapshotter: "native", otherwise kubelet fails to start

# sed -i 's@# containerd_snapshotter: "native"@containerd_snapshotter: "native"@g' inventory/mycluster/group_vars/all/containerd.yml
# after the change:
# cat inventory/mycluster/group_vars/all/containerd.yml
containerd_snapshotter: "native"

Change the etcd data directory

# vim inventory/mycluster/group_vars/all/etcd.yml

etcd_data_dir: /data/etcd

Cluster certificates are valid for one year by default

# sed -i 's@auto_renew_certificates: false@auto_renew_certificates: true@g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# after the change:
# cat inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# whether to renew certificates automatically; recommended.
auto_renew_certificates: true

Show logs for troubleshooting

# vim inventory/mycluster/group_vars/all/all.yml

unsafe_show_logs: true

Use an external load balancer. By default kubespray does not put an HA load balancer in front of kube-apiserver's HTTPS endpoint; here an external HAProxy balances kube-apiserver.

Example HAProxy configuration:

listen kubernetes-apiserver-https
  bind 0.0.0.0:8443 # must match loadbalancer_apiserver.port configured below
  mode tcp
  option tcplog
  option log-health-checks
  balance roundrobin
  timeout client 3h
  timeout server 3h
  server k8s-master 192.168.19.130:6443 check check-ssl verify none inter 10000

Defining loadbalancer_apiserver automatically disables loadbalancer_apiserver_localhost:

# vim ./inventory/mycluster/group_vars/all/all.yml
apiserver_loadbalancer_domain_name: "apiserver.magic.com"
loadbalancer_apiserver:
  address: 192.168.19.200
  port: 8443

Configure the host inventory

# vim inventory/mycluster/inventory.ini

[all]
master ansible_host=192.168.19.130  # ip=10.3.0.1 etcd_member_name=etcd1
node1 ansible_host=192.168.19.131  # ip=10.3.

[kube_control_plane]
master

[etcd]
master

[kube_node]
node1

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node

For installs from mainland China, use the DaoCloud mirrors; this is still an online install (Internet access is all you need)

# back up the file
cp inventory/mycluster/group_vars/all/offline.yml{,.bak}
# set files_repo
sed -i 's@^# files_repo: .*@files_repo: "https://files.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
# set the image repos
sed -i 's@^# kube_image_repo: .*@kube_image_repo: "k8s.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# gcr_image_repo: .*@gcr_image_repo: "gcr.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# github_image_repo: .*@github_image_repo: "ghcr.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# docker_image_repo: .*@docker_image_repo: "docker.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
sed -i 's@^# quay_image_repo: .*@quay_image_repo: "quay.m.daocloud.io"@g' inventory/mycluster/group_vars/all/offline.yml
# uncomment the lines that reference files_repo and registry_host
sed -i -E '/# .*\{\{ files_repo/s/^# //g' inventory/mycluster/group_vars/all/offline.yml
sed -i -E '/# .*\{\{ registry_host/s/^# //g' inventory/mycluster/group_vars/all/offline.yml
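The sed pair can be dry-run against a two-line sample to confirm what it does (the sample lines approximate the real offline.yml, which ships with these settings commented out):

```shell
# Sample: a commented default plus a commented URL that references files_repo
printf '# files_repo: ""\n# kubeadm_download_url: "{{ files_repo }}/..."\n' > /tmp/offline.demo
sed -i 's@^# files_repo: .*@files_repo: "https://files.m.daocloud.io"@g' /tmp/offline.demo
sed -i -E '/# .*\{\{ files_repo/s/^# //g' /tmp/offline.demo
cat /tmp/offline.demo
# both lines are now active: files_repo is set and the URL line is uncommented
```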

Here is the finished file; you can paste the contents below straight into offline.yml:

# cat inventory/mycluster/group_vars/all/offline.yml

files_repo: "https://files.m.daocloud.io"

## Container Registry overrides
kube_image_repo: "k8s.m.daocloud.io" 
gcr_image_repo: "gcr.m.daocloud.io"
github_image_repo: "ghcr.m.daocloud.io"
docker_image_repo: "docker.m.daocloud.io"
quay_image_repo: "quay.m.daocloud.io"

## Kubernetes components
kubeadm_download_url: "{{ files_repo }}/storage.googleapis.com/kubernetes-release/release/{{ kubeadm_version }}/bin/linux/{{ image_arch }}/kubeadm"
kubectl_download_url: "{{ files_repo }}/storage.googleapis.com/kubernetes-release/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubectl"
kubelet_download_url: "{{ files_repo }}/storage.googleapis.com/kubernetes-release/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubelet"

## CNI Plugins
cni_download_url: "{{ files_repo }}/github.com/containernetworking/plugins/releases/download/{{ cni_version }}/cni-plugins-linux-{{ image_arch }}-{{ cni_version }}.tgz"

## cri-tools
crictl_download_url: "{{ files_repo }}/github.com/kubernetes-sigs/cri-tools/releases/download/{{ crictl_version }}/crictl-{{ crictl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz"

## [Optional] etcd: only if you **DON'T** use etcd_deployment=host
etcd_download_url: "{{ files_repo }}/github.com/etcd-io/etcd/releases/download/{{ etcd_version }}/etcd-{{ etcd_version }}-linux-{{ image_arch }}.tar.gz"

# [Optional] Calico: If using Calico network plugin
calicoctl_download_url: "{{ files_repo }}/github.com/projectcalico/calico/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"
calicoctl_alternate_download_url: "{{ files_repo }}/github.com/projectcalico/calicoctl/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"
# [Optional] Calico with kdd: If using Calico network plugin with kdd datastore
calico_crds_download_url: "{{ files_repo }}/github.com/projectcalico/calico/archive/{{ calico_version }}.tar.gz"

# [Optional] Cilium: If using Cilium network plugin
ciliumcli_download_url: "{{ files_repo }}/github.com/cilium/cilium-cli/releases/download/{{ cilium_cli_version }}/cilium-linux-{{ image_arch }}.tar.gz"

# [Optional] Flannel: If using Flannel network plugin
flannel_cni_download_url: "{{ files_repo }}/kubernetes/flannel/{{ flannel_cni_version }}/flannel-{{ image_arch }}"

# [Optional] helm: only if you set helm_enabled: true
helm_download_url: "{{ files_repo }}/get.helm.sh/helm-{{ helm_version }}-linux-{{ image_arch }}.tar.gz"

# [Optional] crun: only if you set crun_enabled: true
crun_download_url: "{{ files_repo }}/github.com/containers/crun/releases/download/{{ crun_version }}/crun-{{ crun_version }}-linux-{{ image_arch }}"

# [Optional] kata: only if you set kata_containers_enabled: true
kata_containers_download_url: "{{ files_repo }}/github.com/kata-containers/kata-containers/releases/download/{{ kata_containers_version }}/kata-static-{{ kata_containers_version }}-{{ ansible_architecture }}.tar.xz"

# [Optional] cri-dockerd: only if you set container_manager: docker
cri_dockerd_download_url: "{{ files_repo }}/github.com/Mirantis/cri-dockerd/releases/download/v{{ cri_dockerd_version }}/cri-dockerd-{{ cri_dockerd_version }}.{{ image_arch }}.tgz"

# [Optional] cri-o: only if you set container_manager: crio
# crio_download_base: "download.opensuse.org/repositories/devel:kubic:libcontainers:stable"
# crio_download_crio: "http://{{ crio_download_base }}:/cri-o:/"

# [Optional] runc,containerd: only if you set container_runtime: containerd
runc_download_url: "{{ files_repo }}/github.com/opencontainers/runc/releases/download/{{ runc_version }}/runc.{{ image_arch }}"
containerd_download_url: "{{ files_repo }}/github.com/containerd/containerd/releases/download/v{{ containerd_version }}/containerd-{{ containerd_version }}-linux-{{ image_arch }}.tar.gz"
nerdctl_download_url: "{{ files_repo }}/github.com/containerd/nerdctl/releases/download/v{{ nerdctl_version }}/nerdctl-{{ nerdctl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz"
  • Deploying the cluster

The steps above cover the problems I hit while installing on a fresh environment and how to handle them; they exist so that the one-command install below succeeds. They look tedious, but everything can be copied and pasted directly.

Deployment command (the one-command install):

# ansible-playbook -i inventory/mycluster/inventory.ini    --user=root -b -v cluster.yml # I installed as root; change --user for a different account


Fetch the kubeconfig onto the console control host

# ansible -i /data/k8s/kubespray/inventory/mycluster/inventory.ini master  -m fetch  -a 'src=/root/.kube/config dest=kubeconfig flat=yes' -b
# vim kubeconfig # replace the server IP with the k8s master address, 192.168.19.130
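Hand-editing works, but the rewrite can also be scripted; a sketch on a scratch file (the address originally found on the server: line depends on your setup):

```shell
# Replace whatever host sits in the server: URL with the master's address
printf '    server: https://127.0.0.1:6443\n' > /tmp/kubeconfig.demo
sed -i 's@server: https://[^:]*:@server: https://192.168.19.130:@' /tmp/kubeconfig.demo
cat /tmp/kubeconfig.demo
```

Apply the same sed to ./kubeconfig instead of the demo file.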

Now kubectl can query the cluster:

[root@console k8s]# kubectl get node --kubeconfig ./kubeconfig
NAME     STATUS   ROLES           AGE   VERSION
master   Ready    control-plane   11h   v1.25.6
node1    Ready    <none>          11h   v1.25.6

At this point the deployment is complete.

  • I once forgot to sync the clocks on the three hosts, which produced invalid certificates after installation; the fix was to reset and redeploy.
# ansible-playbook -i inventory/mycluster/inventory.ini reset.yml -b -vvv # reset the k8s cluster
# ansible-playbook -i inventory/mycluster/inventory.ini    --user=root -b -v cluster.yml # redeploy
  • References
# Mirror / offline deployment configuration
https://github.com/kubernetes-sigs/kubespray/blob/master/docs/mirror.md

# DaoCloud (China)
https://github.com/DaoCloud/public-binary-files-mirror