一江春水向东流 (https://open010.com)

KubeVirt uses custom virtual machines
https://open010.com/2025/10/22/kubevirt-uses-custom-virtual-machines/
Wed, 22 Oct 2025 09:34:07 +0000

Background
Looking back through my notes: I first touched KubeVirt, the virtual-machine project for Kubernetes, at the end of 2023, when I only did a basic deployment with a few ready-made images. I used it on and off through 2024, and only in August 2025, when I built my own images, did I really get the hang of it. The time span is long, so this post writes it all down.

Preparation
KVM virtualization
On a physical Ubuntu host:
apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils -y
In an Ubuntu virtual machine:
apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager -y
In a Rocky Linux virtual machine:
yum install -y qemu-kvm libvirt virt-install bridge-utils
Nested virtualization
If the test machine is itself a virtual machine, there is one more step: enable nested virtualization.
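On an Intel host this can be checked and enabled roughly as follows (a sketch: for AMD CPUs the module is kvm_amd with the same nested parameter, and the module should be reloaded with no VMs running):

```shell
# Check whether nested virtualization is currently enabled ("Y" or "1" means on).
cat /sys/module/kvm_intel/parameters/nested

# Enable it persistently, then reload the module so the option takes effect.
echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm-nested.conf
modprobe -r kvm_intel && modprobe kvm_intel
```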

Modify the GRUB settings
After installing the KVM packages above, the GRUB settings also need changing.

vi /etc/default/grub and add the last part:
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=rl/root intel_iommu=on"
Regenerate the GRUB configuration file
# For BIOS systems:
grub2-mkconfig -o /boot/grub2/grub.cfg
# For UEFI systems:
grub2-mkconfig -o /boot/efi/EFI/rocky/grub.cfg
# To tell BIOS from UEFI:
ls /sys/firmware/efi # if this directory exists, the system boots via UEFI
Check the QEMU status
After installation, run the virt-host-validate command to check that virtualization works:

virt-host-validate qemu
QEMU: Checking for hardware virtualization : PASS
QEMU: Checking if device /dev/kvm exists : PASS
QEMU: Checking if device /dev/kvm is accessible : PASS
QEMU: Checking if device /dev/vhost-net exists : PASS
QEMU: Checking if device /dev/net/tun exists : PASS
QEMU: Checking for cgroup 'cpu' controller support : PASS
QEMU: Checking for cgroup 'cpuacct' controller support : PASS
QEMU: Checking for cgroup 'cpuset' controller support : PASS
QEMU: Checking for cgroup 'memory' controller support : PASS
QEMU: Checking for cgroup 'devices' controller support : PASS
QEMU: Checking for cgroup 'blkio' controller support : PASS
QEMU: Checking for device assignment IOMMU support : PASS
QEMU: Checking if IOMMU is enabled by kernel : PASS
QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure Guest support)

Deploying KubeVirt
The deployment follows the official quickstart at https://kubevirt.io/quickstart_cloud/:

Deploy KubeVirt from YAML
KubeVirt can be installed using the KubeVirt operator, which manages the lifecycle of all the KubeVirt core components.
Check the latest KubeVirt release:
export VERSION=$(curl -s https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)
echo $VERSION
v1.6.0

# Deploy the kubevirt-operator
kubectl create -f "https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-operator.yaml"

# The output:
namespace/kubevirt created
customresourcedefinition.apiextensions.k8s.io/kubevirts.kubevirt.io created
priorityclass.scheduling.k8s.io/kubevirt-cluster-critical created
clusterrole.rbac.authorization.k8s.io/kubevirt.io:operator created
serviceaccount/kubevirt-operator created
role.rbac.authorization.k8s.io/kubevirt-operator created
rolebinding.rbac.authorization.k8s.io/kubevirt-operator-rolebinding created
clusterrole.rbac.authorization.k8s.io/kubevirt-operator created
clusterrolebinding.rbac.authorization.k8s.io/kubevirt-operator created
deployment.apps/virt-operator created

# Deploy the custom resource
# Again use kubectl to deploy the KubeVirt custom resource definitions:

kubectl create -f "https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-cr.yaml"

# Verify components
# By default KubeVirt will deploy 7 pods, 3 services, 1 daemonset, 3 deployment apps, 3 replica sets.

kubectl get kubevirt.kubevirt.io/kubevirt -n kubevirt -o=jsonpath="{.status.phase}"

Deploying   # still in progress, wait a little longer

Check the components:

kubectl get all -n kubevirt

Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME READY STATUS RESTARTS AGE
pod/virt-api-67778d48b6-7kjhm 0/1 ContainerCreating 0 19s
pod/virt-api-67778d48b6-z8lrq 0/1 ContainerCreating 0 20s
pod/virt-operator-b87fbb945-n7287 1/1 Running 0 3m35s
pod/virt-operator-b87fbb945-xl7pg 1/1 Running 0 3m34s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubevirt-operator-webhook ClusterIP 10.233.48.98 443/TCP 26s
service/kubevirt-prometheus-metrics ClusterIP None 443/TCP 26s
service/virt-api ClusterIP 10.233.49.2 443/TCP 26s
service/virt-exportproxy ClusterIP 10.233.29.96 443/TCP 26s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/virt-api 0/2 2 0 21s
deployment.apps/virt-operator 2/2 2 2 3m36s

NAME DESIRED CURRENT READY AGE
replicaset.apps/virt-api-67778d48b6 2 2 0 21s
replicaset.apps/virt-operator-b87fbb945 2 2 2 3m36s

NAME AGE PHASE
kubevirt.kubevirt.io/kubevirt 92s Deploying

About nine minutes later the deployment finished; most of the time went to pulling images:
kubectl get kubevirt.kubevirt.io/kubevirt -n kubevirt -o=jsonpath="{.status.phase}"
Deployed

The full check passes as well:
kubectl get all -n kubevirt
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME READY STATUS RESTARTS AGE
pod/virt-api-67778d48b6-7kjhm 1/1 Running 0 5m59s
pod/virt-api-67778d48b6-z8lrq 1/1 Running 0 6m
pod/virt-controller-8c6b9f8f4-4xhgd 1/1 Running 0 4m56s
pod/virt-controller-8c6b9f8f4-jcqzd 1/1 Running 0 4m56s
pod/virt-handler-642df 1/1 Running 0 4m55s
pod/virt-handler-crjvl 1/1 Running 0 4m55s
pod/virt-handler-tbmv7 1/1 Running 0 4m55s
pod/virt-operator-b87fbb945-n7287 1/1 Running 0 9m15s
pod/virt-operator-b87fbb945-xl7pg 1/1 Running 0 9m14s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubevirt-operator-webhook ClusterIP 10.233.48.98 443/TCP 6m5s
service/kubevirt-prometheus-metrics ClusterIP None 443/TCP 6m5s
service/virt-api ClusterIP 10.233.49.2 443/TCP 6m5s
service/virt-exportproxy ClusterIP 10.233.29.96 443/TCP 6m5s

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/virt-handler 3 3 3 3 3 kubernetes.io/os=linux 4m56s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/virt-api 2/2 2 2 6m
deployment.apps/virt-controller 2/2 2 2 4m56s
deployment.apps/virt-operator 2/2 2 2 9m15s

NAME DESIRED CURRENT READY AGE
replicaset.apps/virt-api-67778d48b6 2 2 2 6m
replicaset.apps/virt-controller-8c6b9f8f4 2 2 2 4m56s
replicaset.apps/virt-operator-b87fbb945 2 2 2 9m15s

NAME AGE PHASE
kubevirt.kubevirt.io/kubevirt 7m11s Deployed
Install the virtctl tool
VERSION=$(kubectl get kubevirt.kubevirt.io/kubevirt -n kubevirt -o=jsonpath="{.status.observedKubeVirtVersion}")
ARCH=$(uname -s | tr A-Z a-z)-$(uname -m | sed 's/x86_64/amd64/') || windows-amd64.exe
echo ${ARCH}
curl -L -o virtctl https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/virtctl-${VERSION}-${ARCH}
chmod +x virtctl
sudo install virtctl /usr/local/bin
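The `${ARCH}` value above is just an `<os>-<arch>` suffix derived from `uname`; the mapping can be seen with stand-in values (the extra aarch64-to-arm64 substitution is my addition, following the same pattern):

```shell
# Stand-ins for $(uname -s) and $(uname -m) so this runs anywhere.
os=$(echo Linux | tr 'A-Z' 'a-z')
cpu=$(echo x86_64 | sed 's/x86_64/amd64/;s/aarch64/arm64/')
echo "${os}-${cpu}"   # prints: linux-amd64
```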
Deploy KubeVirt CDI
Following https://kubevirt.io/labs/kubernetes/lab2.html:

export VERSION=$(basename $(curl -s -w %{redirect_url} https://github.com/kubevirt/containerized-data-importer/releases/latest))
kubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-operator.yaml
kubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-cr.yaml
Check the CDI output:
kubectl -n cdi get all
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME READY STATUS RESTARTS AGE
pod/cdi-operator-ccb895984-w4b6n 1/1 Running 0 3m8s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cdi-operator 1/1 1 1 3m8s

NAME DESIRED CURRENT READY AGE
replicaset.apps/cdi-operator-ccb895984 1 1 1 3m8s
[root@gm-10-29-221-9 ~]# kubectl get cdi cdi -n cdi
NAME AGE PHASE
cdi 7m15s Deployed
[root@gm-10-29-221-9 ~]# kubectl get pods -n cdi
NAME READY STATUS RESTARTS AGE
cdi-apiserver-6c76687b66-6l7d2 1/1 Running 1 (5m48s ago) 7m11s
cdi-deployment-5f6ff949d7-mlrth 1/1 Running 0 7m9s
cdi-operator-ccb895984-w4b6n 1/1 Running 0 10m
cdi-uploadproxy-b499c7956-lsh5r 1/1 Running 0 7m7s
Check that the cluster has a StorageClass:
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
hwameistor-storage-lvm-hdd (default) lvm.hwameistor.io Retain WaitForFirstConsumer true 100d
local-path rancher.io/local-path Delete WaitForFirstConsumer false 100d

Test with a Fedora image, filling in the storageClassName and related fields:
cat <<EOF > dv_fedora.yml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: "fedora"
spec:
  storage:
    resources:
      requests:
        storage: 5Gi
    storageClassName: hwameistor-storage-lvm-hdd
    accessModes:
      - ReadWriteOnce
    volumeMode: Filesystem
  source:
    http:
      url: "https://download.fedoraproject.org/pub/fedora/linux/releases/40/Cloud/x86_64/images/Fedora-Cloud-Base-AmazonEC2.x86_64-40-1.14.raw.xz"
EOF

kubectl create -f dv_fedora.yml
Start a vm1 virtual machine
wget https://kubevirt.io/labs/manifests/vm1_pvc.yml
cat vm1_pvc.yml

Before modification:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  creationTimestamp: 2018-07-04T15:03:08Z
  generation: 1
  labels:
    kubevirt.io/os: linux
  name: vm1
spec:
  runStrategy: Always
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/domain: vm1
    spec:
      domain:
        cpu:
          cores: 2
        devices:
          disks:
          - disk:
              bus: virtio
            name: disk0
          - cdrom:
              bus: sata
              readonly: true
            name: cloudinitdisk
        machine:
          type: q35
        resources:
          requests:
            memory: 1024M
      volumes:
      - name: disk0
        persistentVolumeClaim:
          claimName: fedora
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            hostname: vm1
            ssh_pwauth: True
            disable_root: false
            ssh_authorized_keys:
            - ssh-rsa YOUR_SSH_PUB_KEY_HERE
        name: cloudinitdisk

# Generate a password-less SSH key using the default location.
ssh-keygen
PUBKEY=$(cat ~/.ssh/id_rsa.pub)
sed -i "s%ssh-rsa.*%$PUBKEY%" vm1_pvc.yml
kubectl create -f vm1_pvc.yml
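The sed substitution above swaps the placeholder key line for the real public key; a self-contained check with a dummy key:

```shell
# Dummy key standing in for $(cat ~/.ssh/id_rsa.pub).
PUBKEY="ssh-rsa AAAAB3-dummy-key user@host"
# The % delimiter avoids clashing with the '/' characters inside a real key.
echo "      - ssh-rsa YOUR_SSH_PUB_KEY_HERE" | sed "s%ssh-rsa.*%$PUBKEY%"
# prints: "      - ssh-rsa AAAAB3-dummy-key user@host"
```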
Fedora is up.
virtctl console vm1 drops into the VM instance.

The user is fedora with no password; ssh fedora@<vm-ip> also works since the SSH key was injected above.
Expose a NodePort
virtctl expose vmi vm1 --name=vm1-ssh --port=20222 --target-port=22 --type=NodePort
That completes the first round of testing. Next: building my own Rocky Linux 8.10 image and a Windows 10 LTSC 2021 (build 6216) image.

Building custom images
This time I packaged two images myself: Rocky Linux and Windows LTSC 2021.

The Rocky Linux image is simple: rockylinux.org provides official cloud images, so just download one and repackage it.

wget -c https://dl.rockylinux.org/pub/rocky/8/images/x86_64/Rocky-8-GenericCloud-Base.latest.x86_64.qcow2

vi Dockerfile

FROM scratch
ADD --chown=107:107 Rocky-8-GenericCloud-Base.latest.x86_64.qcow2 /disk/

docker build -t 10.29.221.9/public/rockylinux:v810 .
docker push 10.29.221.9/public/rockylinux:v810
Building the rocky810 image
First download the Rocky-8-GenericCloud-Base.latest.x86_64.qcow2 cloud image from the Rocky Linux site, then build locally:

wget -c https://dl.rockylinux.org/pub/rocky/8/images/x86_64/Rocky-8-GenericCloud-Base.latest.x86_64.qcow2

vi Dockerfile
FROM scratch
ADD --chown=107:107 Rocky-8-GenericCloud-Base.latest.x86_64.qcow2 /disk/

nerdctl build --platform=amd64 -t 10.29.221.9/public/rockylinux:v810 .

Handling the Rocky Linux 8 DataVolume
kubectl get datavolumes.cdi.kubevirt.io rocky810-rootdisk -oyaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  labels:
    kubevirt.io/created-by: 2782666c-6914-4395-8362-0067661a02b0
  name: rocky810-rootdisk
  namespace: default
spec:
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
    storageClassName: hwameistor-storage-lvm-hdd
  source:
    registry:
      url: docker://10.29.221.9/public/rockylinux:v810
rocky810 VM configuration
kubectl get vm rocky810 -oyaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1
    virtnest.io/alias-name: ""
    virtnest.io/image-secret: ""
    virtnest.io/image-source: docker
    virtnest.io/os-image: 10.29.221.9/public/rockylinux:v810
  labels:
    virtnest.io/os-family: rocky
    virtnest.io/os-version: "810"
  name: rocky810
  namespace: default
spec:
  dataVolumeTemplates:
  - metadata:
      creationTimestamp: null
      name: rocky810-rootdisk
      namespace: default
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        storageClassName: hwameistor-storage-lvm-hdd
      source:
        registry:
          url: docker://10.29.221.9/public/rockylinux:v810
  runStrategy: Always
  template:
    metadata:
      creationTimestamp: null
    spec:
      architecture: amd64
      domain:
        cpu:
          cores: 1
          sockets: 1
          threads: 1
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: virtio
            name: rootdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
          interfaces:
          - masquerade: {}
            name: default
        machine:
          type: q35
        memory:
          guest: 2Gi
        resources: {}
      networks:
      - name: default
        pod: {}
      volumes:
      - dataVolume:
          name: rocky810-rootdisk
        name: rootdisk
      - cloudInitNoCloud:
          userDataBase64: I2Nsb3VkLWNvbmZpZwpzc2hfcHdhdXRoOiB0cnVlCmRpc2FibGVfcm9vdDogZmFsc2UKY2hwYXNzd2Q6IHsibGlzdCI6ICJyb290OkRhb2Nsb3VkLjIwMjMiLCBleHBpcmU6IEZhbHNlfQoKd3JpdGVfZmlsZXM6CiAgLSBwYXRoOiAvZXRjL3N5c3RlbWQvc3lzdGVtL2RoY2xpZW50LW9uLWJvb3Quc2VydmljZQogICAgcGVybWlzc2lvbnM6ICIwNjQ0IgogICAgY29udGVudDogfAogICAgICBbVW5pdF0KICAgICAgRGVzY3JpcHRpb249UnVuIGRoY2xpZW50IG9uIGJvb3QKICAgICAgQWZ0ZXI9bmV0d29yay50YXJnZXQKCiAgICAgIFtTZXJ2aWNlXQogICAgICBUeXBlPW9uZXNob3QKICAgICAgRXhlY1N0YXJ0PS9zYmluL2RoY2xpZW50CiAgICAgIFJlbWFpbkFmdGVyRXhpdD10cnVlCgogICAgICBbSW5zdGFsbF0KICAgICAgV2FudGVkQnk9bXVsdGktdXNlci50YXJnZXQKcnVuY21kOgogIC0gc3lzdGVtY3RsIGRhZW1vbi1yZWxvYWQKICAtIHN5c3RlbWN0bCBlbmFibGUgZGhjbGllbnQtb24tYm9vdC5zZXJ2aWNlCiAgLSBzeXN0ZW1jdGwgc3RhcnQgZGhjbGllbnQtb24tYm9vdC5zZXJ2aWNlCiAgLSBzZWQgLWkgIi8jXD9QZXJtaXRSb290TG9naW4vcy9eLiokL1Blcm1pdFJvb3RMb2dpbiB5ZXMvZyIgL2V0Yy9zc2gvc3NoZF9jb25maWc=
        name: cloudinitdisk
kubectl apply -f rocky810.yaml
Check the VM status:
kubectl get vm -A
NAMESPACE NAME AGE STATUS READY
default rocky810 50d Running True
default win10 46d Running True
The Windows 10 LTSC 2021 image
Then the focus of this round of tinkering: the Windows 10 LTSC 2021 image.

The default most people use is the Cloudbase Windows 10 Pro image, which feels a bit sluggish in use, so I tried the Windows 10 LTSC 2021 I had used before.

First check the CDI upload-proxy service installed earlier:
kubectl -n cdi get svc -l cdi.kubevirt.io=cdi-uploadproxy
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cdi-uploadproxy ClusterIP 10.233.7.11 443/TCP 3h24m
Then upload the original ISO with virtctl. The real file name is
19044.6216.250813-0800.vb_refresh_enterprise_ltsc_2021_x64freo_zh-cn_1b2d3b40.iso, renamed here to win10_ltsc.iso:
virtctl image-upload --image-path=win10_ltsc.iso --storage-class hwameistor-storage-lvm-hdd pvc iso-win10 --size=7Gi --insecure --uploadproxy-url=https://10.233.7.11 --force-bind

The output log:

PVC default/iso-win10 not found
PersistentVolumeClaim default/iso-win10 created
Waiting for PVC iso-win10 upload pod to be ready…
Pod now ready
Uploading data to https://10.233.7.11

4.64 GiB / 4.64 GiB [————————————————————————————-] 100.00% 42.76 MiB p/s 1m51s

Uploading data completed successfully, waiting for processing to complete, you can hit ctrl-c without interrupting the progress
Processing completed successfully
Uploading win10_ltsc.iso completed successfully
Then deploy the VM with this YAML:
cat win10.vm.yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: win10
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/domain: win10
    spec:
      domain:
        cpu:
          cores: 4
        devices:
          disks:
          - bootOrder: 1
            cdrom:
              bus: sata
            name: cdromiso
          - disk:
              bus: virtio
            name: harddrive
          - cdrom:
              bus: sata
            name: virtiocontainerdisk
          interfaces:
          - masquerade: {}
            model: e1000
            name: default
        machine:
          type: q35
        resources:
          requests:
            memory: 16G
      networks:
      - name: default
        pod: {}
      volumes:
      - name: cdromiso
        persistentVolumeClaim:
          claimName: iso-win10
      - name: harddrive
        hostDisk:
          capacity: 50Gi
          path: /data/disk.img
          type: DiskOrCreate
      - containerDisk:
          image: dce-boot.io/docker.io/kubevirt/virtio-container-disk
        name: virtiocontainerdisk
kubectl apply -f win10.vm.yaml
Then proxy the VNC endpoint out and start the installation with a VNC client:
virtctl vnc --proxy-only win10

It has been years since I last installed LTSC; there is now an IoT edition of LTSC as well.

By default the installer sees no usable disk.

Confirm, then point it at the driver location.

Pick the second one; compared with years ago there is now an extra group of passthrough drivers.

The screenshots here only show a 25G disk; another environment was later grown to 50G.

kubectl get pvc disk-windows -oyaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"disk-windows","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"25Gi"}},"storageClassName":"local-path"}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rancher.io/local-path
    volume.kubernetes.io/selected-node: gm-10-29-221-9
    volume.kubernetes.io/storage-provisioner: rancher.io/local-path
  creationTimestamp: "2025-08-16T04:04:44Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: disk-windows
  namespace: default
  resourceVersion: "32202616"
  uid: d06116f4-606e-4567-884f-cb11bb44e8c0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 25Gi
  storageClassName: local-path
  volumeMode: Filesystem
  volumeName: pvc-d06116f4-606e-4567-884f-cb11bb44e8c0
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 25Gi
  phase: Bound
Then the installation proceeds normally.

It hit a login-interaction error once.

After a forced reboot, the familiar screens appeared (the "海内存知己" greeting).

Privacy, network, and similar settings.

On the first desktop the globe icon in the tray had a cross: no network.

Device Manager shows no NIC, so drivers need installing.

The PCI devices show question marks.

Install the drivers.

Just click through.

The NIC now shows up; handle the other question-marked PCI devices the same way.

Enable Remote Desktop.

Shut down, then convert the system's disk.img into a standard image:
cd /opt/local-path-provisioner/pvc-d06116f4-606e-4567-884f-cb11bb44e8c0_default_disk-windows/
qemu-img convert -O qcow2 disk.img win10.ltsc.qcow2
This yields a 12G qcow2 image.

# Dockerfile for the build
FROM scratch

COPY --chown=107:107 win10.ltsc.qcow2 /disk/win10.ltsc.qcow2

Build and push:
nerdctl build --platform=amd64 -t dce-boot.io/public/win10.ltsc:v1 --insecure-registry .

Leftover issues
The dce-boot.io/public/win10.ltsc:v1 image has not been through Cloudbase-style optimization; as I recall it cannot simply be handed to other environments and booted directly. I will document that when I get a chance.

Update 2025-10-13

Today I used this image in another environment. It works directly in the VM module without the user re-initializing it, but with two small issues. First, it keeps downloading Windows updates, which makes the machine very laggy (and reportedly Windows 10 stops receiving updates after 2025-10-14). Second, disk usage is heavy: 20G of the 30G disk is gone, so the image needs trimming. Next time I may build from a slimmed-down edition from the start.
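One way to trim it (a sketch, assuming qemu-img is available; virt-sparsify comes from libguestfs and is optional):

```shell
# Rewrite the image with compressed clusters; this usually shrinks a Windows
# qcow2 considerably, at the cost of some CPU when reading compressed blocks.
qemu-img convert -O qcow2 -c disk.img win10.ltsc.trimmed.qcow2

# If libguestfs is installed, sparsify too, so blocks deleted inside the guest are dropped:
# virt-sparsify --compress disk.img win10.ltsc.trimmed.qcow2
```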

After updating, the system-information screen looked unfamiliar to me; screenshot kept for the record.

References
https://kubevirt.io/quickstart_cloud/
https://icloudnative.io/posts/use-kubevirt-to-manage-windows-on-kubernetes/
https://www.ctyun.cn/document/10027726/10747218
https://www.geminiopencloud.com/zh-tw/blog/kubevirt-2/

FROM: https://blog.wanjie.info/2025/09/kubevirt-uses-custom-virtual-machines/

Using yq
https://open010.com/2024/03/07/yq%e7%94%a8%e6%b3%95/
Thu, 07 Mar 2024 07:35:12 +0000

Introducing yq

Before we start using yq, we need to install it. A Google search for yq turns up two projects/repositories.

The first, https://github.com/kislyuk/yq, is a wrapper around jq (the JSON processor). If you are already familiar with jq, you may want to grab this one and reuse the syntax you already know.

In this article, though, we will use https://github.com/mikefarah/yq. Its syntax is not a 100% match for jq, but it has the advantage of having no dependencies (it does not depend on jq). For more context on the differences, see the GitHub issue below.

https://github.com/mikefarah/yq/issues/193

For detailed usage of the mikefarah version of yq, see the docs: https://mikefarah.gitbook.io/yq/

Practice at work

The requirement

Our product has to be deployed in air-gapped environments, so we build offline installation sources for every component: system packages (rpm/apt) as well as standalone binaries and tarballs. Previously, all offline sources were merged by hand into a single YAML file. Now each component keeps a packages.yaml file in its own repository; we fetch them via git and aggregate them into the final full list.

Before aggregating, a config.yaml is generated to select which components to include. Here is the full configuration:

---
downMode: git
project:
  - name: offline-packages
    url: https://git.xxx.com/PaaS/offline-packages.git
    branch: autoupdate-git
    item:
  - name: yks
    url: https://git.xxx.com/PaaS/paas-installer.git
    branch: develop
    item:
  - name: middleware
    url: https://git.xxx.com/PaaS/middleware.git
    branch: develop
    item:
      - redis
      - nginx
      - elasticsearch
      - kafka
      - kibana
      - license-server
      - minio

Pipeline parameters, passed as a semicolon-separated string, drive the generation of a custom config_custom.yaml:

# Generate the config.yml used for aggregation
    if [ "$middleware_names" != "all" ]; then
      middleware_pattern="middleware|$(echo ${middleware_names} |sed 's/;/\|/g')"
      # Keep only the selected top-level groups, e.g. 'yks, middleware, offline-packages'
      yq 'del(.project[] | select(.name | test("'$middleware_pattern'")|not))' config.yml > ./config_custom_tmp.yml
      # Filter again: keep only the selected sub-items under the middleware group, e.g. 'redis, nginx'
      yq 'del(.project[].item.[] | select(. |test("'$middleware_pattern'")|not) )' config_custom_tmp.yml > ./config_custom.yml
      config_file=config_custom.yml
      rm -rf config_custom_tmp.yml
    else
      config_file=config.yml
    fi
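The `middleware_pattern` construction can be sanity-checked on its own; semicolons from the pipeline parameter become the `|` alternation that yq's `test` expects:

```shell
# Semicolon-separated pipeline input, e.g. the chosen middleware components.
middleware_names="redis;nginx"
middleware_pattern="middleware|$(echo ${middleware_names} | sed 's/;/|/g')"
echo "$middleware_pattern"   # prints: middleware|redis|nginx
```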

Explanation:

Since yq's contains does not support multiple matches, test is used here to match several alternatives separated by '|'.

Usage of del:

Before deleting with del, first make sure a plain query prints exactly the part you want to delete; only then wrap it in del. For example:

If only the middleware and paas-installer groups are selected, first query the parts whose name matches neither (the pattern here is "middleware|yks", since the paas-installer group is named yks):

[root@jenkins1-iuap-hb2-ali offline-packages]# cat config.yml |yq '.project[]| select(.name | test("middleware|yks")|not)'
name: offline-packages
url: https://git.xxx.com/paas/offline-packages.git
branch: develop
item:

Then delete them with del:

[root@jenkins1 offline-packages]# cat config.yml |yq 'del(.project[]| select(.name | test("middleware|yks")|not))'
---
downMode: git
project:
  - name: paas
    url: https://git.xxx.com/PaaS/paas-installer.git
    branch: develop
    item:
  - name: middleware
    url: https://git.xxx.com/PaaS/middleware.git
    branch: develop
    item:
      - redis
      - nginx
      - elasticsearch
      - kafka
      - kibana
      - license-server
      - minio

 

If only paas, nginx and redis are selected, filter again to keep only the chosen sub-items under the middleware group, e.g. 'redis, nginx':

[root@jenkins1 offline-packages]# yq 'del(.project[].item.[] | select(. |test("redis|nginx")|not) )' config_custom_tmp.yml
---
downMode: git
project:
  - name: yks
    url: https://git.xxx.com/PaaS/paas-installer.git
    branch: develop
    item: []
  - name: middleware
    url: https://git.xxx.com/PaaS/middleware.git
    branch: develop
    item:
      - redis
      - nginx

A similar awk one-liner maps each selected OS name to its package manager (apt for Ubuntu, yum for everything else):

    osInfo=$(awk 'BEGIN{split("'${os_names}'", arr, ",");for(key in arr) if(arr[key] ~ "ubuntu") print arr[key]":apt";else print arr[key]":yum"}')
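With sample values in place of the pipeline's `$os_names`, the mapping behaves like this (note that awk's `for (key in arr)` iterates in unspecified order):

```shell
os_names="ubuntu22.04,centos7"
# Split on commas; Ubuntu entries get ":apt", everything else ":yum".
awk 'BEGIN{split("'${os_names}'", arr, ",");for(key in arr) if(arr[key] ~ "ubuntu") print arr[key]":apt"; else print arr[key]":yum"}'
```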

The final aggregated file looks like this:

os: kylin
version: v10-sp3
package:
- name:
  - telnet
  - curl
  - wget
- name:
  - docker-ce-20.10.17
  - docker-ce-cli-20.10.17
  - containerd.io-1.6.28
  arch: amd64
  repo:
  - https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
  - http://mirrors.aliyun.com/repo/Centos-7.repo
  replace:
  - item: repo
    from: $releasever
    to: "7"
- name:
  - docker-ce-20.10.17
  - docker-ce-cli-20.10.17
  - containerd.io-1.6.28
  arch: arm64
  repo:
  - https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
  - http://mirrors.aliyun.com/repo/Centos-altarch-7.repo
  replace:
  - item: repo
    from: $releasever
    to: "7"
file:
- name: cni-plugins
  arch: amd64
  src: http://bucket.oss-cn-beijing.aliyuncs.com/download/nexus2/raw-apis/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
  dest: raw-apis/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz

We then use yq again to extract and filter:

  • Extract entries that need no custom repo, with arch equal to amd64/x86_64 or no arch key at all:

yq -o=j -I=0 '.package[] | select(. | has("repo")|not) | select(.arch == "amd64" or .arch == "x86_64" or has("arch")|not)' packages.yaml

Here -o=j selects JSON output and -I=0 sets the output indent level to 0 (the default is 2). The result is:

{
  "name": [
    "telnet",
    "curl",
    "wget",
     ...
  ]
}

  • Extract entries that do need a custom repo (those with a repo key), with arch equal to amd64/x86_64 or no arch key at all:

yq -o=j -I=0 '.package[] | select(. | contains({"repo":""})) | select(.arch == "amd64" or .arch == "x86_64" or has("arch")|not) ' packages.yaml

Because RHEL 8 uses a different docker repo, aggregation produces several docker package lists carrying repos:

os: rhel
version: "8.8"
package:
- name:
  - telnet
- name:
  - docker-ce-20.10.17
  - docker-ce-cli-20.10.17
  - containerd.io-1.6.28
  arch: amd64
  repo:
  - https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
  - http://mirrors.aliyun.com/repo/Centos-7.repo
  replace:
  - item: repo
    from: $releasever
    to: "7"
- name:
  - docker-ce-20.10.17
  - docker-ce-cli-20.10.17
  - containerd.io-1.6.28
  arch: arm64
  repo:
  - https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
  - http://mirrors.aliyun.com/repo/Centos-altarch-7.repo
  replace:
  - item: repo
    from: $releasever
    to: "7"
- name:
  - ntp
  - ntpdate
  - libselinux-python
  repo:
  - https://mirrors.aliyun.com/repo/Centos-7.repo
  replace:
  - item: repo
    from: $releasever
    to: "7"
- name:
  - docker-ce-25.0.3
  - docker-ce-cli-25.0.3
  - containerd.io-1.6.28
  arch: amd64
  repo:
  - https://mirrors.aliyun.com/docker-ce/linux/rhel/docker-ce.repo
  - http://mirrors.aliyun.com/repo/Centos-8.repo
  replace:
  - item: repo
    from: $releasever
    to: "8"

We need to extract only the entries used for the RHEL install:

yq 'del( .package[] | select(. | contains({"repo":""})) | select(.name[] =="*docker*") | select(.repo[] == "*/linux/centos/*") )'  all_packages.yaml

Here is the Dockerfile used during the build:
ARG FROM=""
FROM $FROM as packages

ARG FROM=""
WORKDIR /packages

# download packages
RUN set -x \
    && ARCH=$(uname -m); echo "ARCH=$ARCH" > .info \
    && ARCH_ALIAS=$(echo $ARCH | sed 's/x86_64/amd64/;s/aarch64/arm64/'); echo "ARCH_ALIAS=$ARCH_ALIAS" >> .info \
    && BASE_URL=http://bucket.oss.aliyuncs.com/download \
    && image=$(echo $FROM|awk -F'/' '{print $NF}') \
    && image_name=$(echo $image|awk -F':' '{print $1}') \
    && image_tag=$(echo $image|awk -F':' '{print $2}') \
    && os_name=${image_name} \
    && os_version=$(echo $image_tag|sed 's/-arm64//;s/-amd64//') \
    && echo "os_name=$os_name" >> .info \
    && echo "image_name=$image_name" >> .info \
    && echo "image_tag=$image_tag" >> .info \ 
    && echo "os_version=$os_version" >> .info
    # && curl -o /usr/bin/yq $BASE_URL/binary/yq_linux_$ARCH && chmod +x /usr/bin/yq

COPY all_packages.yaml .

RUN set -x; echo "download package with no custom repo firstly" \
    && source ./.info \
    && declare -A packages \
    && packages=$(yq -o=j -I=0 '.package[] | select(. | has("repo")|not) | select(.arch == "'$ARCH_ALIAS'" or .arch == "'$ARCH'" or has("arch")|not)' all_packages.yaml) \
    && for pkg in ${packages}; do \
        if (which rpm &> /dev/null); then \
            echo "$pkg"|yq '.name[]' | sort -u | xargs repotrack -d 9 --destdir "$os_name/$os_version/os/$ARCH/Packages"; \
        elif (which apt &> /dev/null); then \
            echo $(echo "$pkg"|yq '.name[]') $(dpkg --get-selections | grep -v deinstall | cut -f1 | cut -d ':' -f1) | sed -e 's/ntp\|ntpdate/ /g' | \
            sort -u | xargs apt-get install --reinstall --print-uris | awk -F "'" '{print $2}' | grep -v '^$' > packages_1.urls; \
            apt-get install --reinstall --print-uris ntp ntpdate | awk -F "'" '{print $2}' | grep -v '^$' >> packages_1.urls; \
            wget -q -x -P $os_name/$os_version -i packages_1.urls; \
        fi; \
       done

RUN set -x; echo "download package with custom repo secondly" \
    && source ./.info \
    && declare -A packagesWithRepo \
    && packagesWithRepo=$(yq -o=j -I=0 '.package[] | select(. | contains({"repo":""})) | select(.arch == "'$ARCH_ALIAS'" or .arch == "'$ARCH'" or has("arch")|not) ' all_packages.yaml) \
    && for pkg in ${packagesWithRepo}; do \
        repo=$(echo "$pkg"|yq '.repo[]'); \
        replace=$(echo "$pkg"|yq -o=j -I=0 '.replace[]'); \
        echo "$repo"|while read ro; do \
            if (which rpm &> /dev/null); then \
                repo_name=${ro##*/}; \
                curl -k -o /etc/yum.repos.d/${repo_name} ${ro}; \
                for r in ${replace}; do \
                    from=$(echo "$r"|yq '.from'); \
                    to=$(echo "$r"|yq '.to'); \
                    sed -i "s@$from@$to@g" /etc/yum.repos.d/${repo_name}; \
                done; \
            elif (which apt &> /dev/null); then \
                eval echo "$ro" > /etc/apt/sources.list.d/custom.list; \
            fi; \
        done; \
        if (which rpm &> /dev/null); then \
            yum makecache; \
            echo "$pkg"|yq '.name[]' | sort -u | xargs repotrack -d 9 --destdir "$os_name/$os_version/os/$ARCH/Packages"; \
        elif (which apt &> /dev/null); then \
            echo "$pkg"|yq '.name[]' | xargs apt-get --allow-unauthenticated install --reinstall --print-uris | awk -F "'" '{print $2}' | grep -v '^$' | sort -u > packages_2.urls; \
            wget -q -x -P $os_name/$os_version -i packages_2.urls; \
        fi; \
       done

# create repo
RUN set -x \
    && source ./.info \
    && if (which rpm &> /dev/null); then \
        createrepo -d "$os_name/$os_version/os/$ARCH"; \
        if [ "$image_name" == "rhel" -o "$image_name" == "nfs" ]; then \
            # Fix: "No available modular metadata for modular package"
            repo2module -s stable $os_name/$os_version/os/$ARCH modules.yaml \
            && modifyrepo_c --mdtype=modules modules.yaml $os_name/$os_version/os/$ARCH/repodata/ \
            && rm -rf modules.yaml; \
        fi; \
        printf '\
['$os_name''$os_version'] \n\
name='$os_name''$os_version' \n\
baseurl=http://{{ nexus_access_address|default(127.0.0.1) }}/nexus/content/repositories/'$os_name'/'$os_version'/os/$basearch/ \n\
enabled=1 \n\
gpgcheck=0 \n\
sslverify=0 \n\
' > ${os_name}.repo.j2; \
       elif (which apt &> /dev/null); then \
        dpkg-scanpackages $os_name/$os_version | gzip -9c > $os_name/$os_version/Packages.gz; \
        printf '\
deb [trusted=yes] http://{{ nexus_access_address|default(127.0.0.1) }}/nexus/content/repositories/'$os_name'/'$ARCH_ALIAS' '$os_version'/ \n\
' > ${os_name}.list.j2; \
       fi \
    \
    # clean up finally
    && rm -rf packages_1.urls packages_2.urls


# download files
RUN set -x;source ./.info \
    && item=$(yq -o=j -I=0 '.file[] | select(.arch == "'$ARCH_ALIAS'" or .arch == "'$ARCH'" or has("arch")|not)' all_packages.yaml) \
    && if [ "X$item" != "X" ]; then \
        for i in $item; do \
            src=$(echo "$i"|yq '.src'); \
            dest=$(echo "$i"|yq '.dest'); \
            dest_path=${dest%/*}; \
            # dest_file=${dest##*/}; \
            mkdir -p files/$dest_path; \
            wget -c -q -O files/$dest "$src"; \
        done; \
    fi

FROM scratch
COPY --from=packages /packages ./packages

 

Auto-detecting the NIC name on Linux
https://open010.com/2024/02/26/linux%e8%87%aa%e5%8a%a8%e6%8e%a2%e6%b5%8b%e7%bd%91%e5%8d%a1%e5%90%8d%e7%a7%b0/
Mon, 26 Feb 2024 07:21:57 +0000

On Linux we sometimes need to auto-detect which NIC is used for communication, since a host may have multiple NICs.

Approach 1

Extract the default route with the ip command to find the NIC used for default communication.

Command to get the IP:

if [[ "$useIPV6" == "true" ]]; then
        ip6_r=$(ip -6 r g 2000::1)
        LOCAL_IP=$(grep -oP "src \S+ "  < <(echo $ip6_r) | sed 's/src //' | awk '{gsub(/^\s+|\s+$/, "");print}')
        if [ "X$LOCAL_IP" == "X" ]; then
            LOCAL_IP=$(ip add | grep -w inet6 | grep -v ::1 | awk NR==1'{print $2}' | cut -d "/" -f 1)
        fi
else
        ip_r=$(ip r g 1.0.0.0)
        LOCAL_IP=$(grep -oP "src \S+ "  < <(echo $ip_r) | sed 's/src //' | awk '{gsub(/^\s+|\s+$/, "");print}')
        if [ "X$LOCAL_IP" == "X" ] || (! echo $LOCAL_IP | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'); then
            LOCAL_IP=$(ip add | grep -w inet | grep -v 127.0.0.1 | awk NR==1'{print $2}' | cut -d "/" -f 1)
        fi
fi

 

Getting the NIC name:

function get_cfgname(){
    ipa_info=$(ip a)
    line=$(echo "${ipa_info}" | sed -n -e "/\<${LOCAL_IP}\>/=")
    detect_cfgname=$(echo "${ipa_info}" | sed -n "1,${line}p" | grep '^[0-9]' | sed -n '$p' | awk -F ':| ' '{print $3}')
    echo $detect_cfgname
}
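To see what `get_cfgname` does, here is a self-contained run against simulated `ip a` output (hypothetical interface names): it finds the line containing the IP, then takes the last interface header at or above it.

```shell
# Two fake interfaces; the target IP sits under eth0.
ipa_info='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500
    inet 10.0.0.5/24 scope global eth0'
LOCAL_IP=10.0.0.5
# Line number of the address line, then the last "N: name:" header before it.
line=$(echo "${ipa_info}" | sed -n -e "/\<${LOCAL_IP}\>/=")
echo "${ipa_info}" | sed -n "1,${line}p" | grep '^[0-9]' | sed -n '$p' | awk -F ':| ' '{print $3}'
# prints: eth0
```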

function interface_check() {
    cfgname=$(get_cfgname)
    # The NIC name may be passed in via an environment variable; if not, derive it from the IP
    if [ "X$ifCfgName" == "X" ]; then
        ifCfgName=$cfgname
        if [ "X$ifCfgName" != "X" ]; then
            echo_color info "Auto-detected server NIC name: ${ifCfgName}"
        else
            echo_color error "Server NIC name ${ifCfgName} does not exist, please double-check!"
            exit 1
        fi
    elif [ "$ifCfgName" != "$cfgname" ]; then
        detect_ipaddr=$(ip addr | awk '/inet/ && ! /\/32/ {ip[$NF] = $2; sub(/\/.*$/,"",ip[$NF])} END {for(i in ip){if(i ~ "'$ifCfgName'") print ip[i]}}')
        if [ "$LOCAL_IP" != "$detect_ipaddr" ]; then
            echo_color warning "The given NIC name $ifCfgName does not match the detected NIC name $cfgname, please double-check!"
            exit 1
        fi
    else
        echo_color info "Configured server NIC name: ${ifCfgName}"
    fi

    write_restore_end
}

 

Approach 2

Via the ansible_default_ipv4 fact

Getting the IP:

if [[ "$useIPV6" == "true" ]]; then
    filter=ansible_default_ipv6
else        
    filter=ansible_default_ipv4
fi
# define parameters of address and interface
eval $(ansible localhost -m setup -a 'gather_subset=!all,network filter='$filter''|egrep "\"interface\"|\"address\""|awk -F': ' '{gsub(/"|,| /,"",$0);gsub(/:/,"=",$0);print $0}')

LOCAL_IP=$address
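The eval line condenses ansible's JSON-ish output into shell variables; here is the same parsing applied to a captured sample fragment (values are made up):

```shell
# Sample fragment of `ansible localhost -m setup` output.
sample='        "address": "10.0.0.5",
        "interface": "eth0",'
# Strip quotes/commas/spaces, turn ':' into '=', then eval the assignments.
eval $(echo "$sample" | egrep '"interface"|"address"' | awk -F': ' '{gsub(/"|,| /,"",$0);gsub(/:/,"=",$0);print $0}')
echo "$interface" "$address"   # prints: eth0 10.0.0.5
```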

Getting the NIC name:

# The NIC name may come in via an environment variable; if not, read the ansible_default_ipv4 fact
    if [ "X$ifCfgName" == "X" ]; then
        ifCfgName=$interface
        echo_color info "Auto-detected server NIC name: ${ifCfgName}"
    elif [ "$ifCfgName" != "$interface" ]; then
        echo_color warning "The given NIC name $ifCfgName does not match the detected NIC name $interface, please double-check!"
        exit 1
    else
        echo_color info "Configured server NIC name: ${ifCfgName}"
    fi

 

]]>
https://open010.com/2024/02/26/linux%e8%87%aa%e5%8a%a8%e6%8e%a2%e6%b5%8b%e7%bd%91%e5%8d%a1%e5%90%8d%e7%a7%b0/feed/ 0
Preventing accidental deletion of system files on Linux with safe-rm https://open010.com/2024/02/21/linux%e4%bd%bf%e7%94%a8safe-rm%e9%98%b2%e6%ad%a2%e8%af%af%e5%88%a0%e7%b3%bb%e7%bb%9f%e6%96%87%e4%bb%b6/ https://open010.com/2024/02/21/linux%e4%bd%bf%e7%94%a8safe-rm%e9%98%b2%e6%ad%a2%e8%af%af%e5%88%a0%e7%b3%bb%e7%bb%9f%e6%96%87%e4%bb%b6/#respond Wed, 21 Feb 2024 09:41:51 +0000 https://open010.com/?p=1889 Preface

safe-rm is an open-source replacement for the unsafe rm. A protection list configured in /etc/safe-rm.conf defines which files rm may not delete, guarding against files being wiped out by an accidental rm -rf.

Installing safe-rm

The 0.x releases are implemented as a shell script, while the 1.x releases are written in Rust and must be compiled first.

Downloading and installing version 0.12:

# Download the tarball (-c enables resuming interrupted downloads)
# wget -c https://launchpadlibrarian.net/188958703/safe-rm-0.12.tar.gz

# Unpack it
# tar -xvf safe-rm-0.12.tar.gz

# Copy the executable into place
# cd safe-rm
# cp safe-rm /bin/

# Back up rm and create the symlink
# mv /bin/rm /bin/rm.bak
# ln -s /bin/safe-rm /bin/rm

Create the safe-rm config file and add the protection list

# safe-rm reads the following two config files; neither exists by default, so create them yourself

Global config: /etc/safe-rm.conf
Per-user config: ~/.safe-rm

# Create the global config file
# touch /etc/safe-rm.conf

# Add the protection list
# vim /etc/safe-rm.conf
/
/*
/etc
/etc/*
/usr
/usr/*
/usr/local
/usr/local/*
/usr/local/bin
/usr/local/bin/*
/bin/*
/boot/*
/dev/*
/home/*
/lib/*
/lib64/*
/media/*
/mnt/*
/opt/*
/proc/*
/root/*
/run/*
/sbin/*
/srv/*
/sys/*
/var/*

Testing whether safe-rm works

# Create a test file
# touch /home/test.txt

# Append the path you want protected to the config file
# vim /etc/safe-rm.conf
/home/test.txt

# Try deleting the protected path; a "skipping" message means safe-rm is working
# rm /home/test.txt
# rm -rf /home/test.txt
safe-rm: skipping /home/test.txt

# Notes:
# An entry like /etc only prevents "rm -rf /etc" itself; "rm -rf /etc/app" can still delete app
# To protect everything under a directory, write /etc/* in the config, but the wildcard form cannot protect symlinks inside the directory
# Directories such as /lib or /lib64 contain many symlinks to library files, and safe-rm 0.x cannot keep those symlinks from being deleted
# Add /etc/safe-rm.conf itself to the protection list, so the config cannot be deleted and the protection silently disabled

# Calling the original binary directly bypasses safe-rm entirely:
# /usr/bin/rm -rf /etc/app

 

Downloading and installing version 1.1.0:

The 1.x series fixes the symlink problem: symlinks inside protected directories can no longer be deleted.

Project page: https://launchpad.net/safe-rm/trunk/1.1.0

# Download the safe-rm source
wget -c https://launchpad.net/safe-rm/trunk/1.1.0/+download/safe-rm-1.1.0.tar.gz

# Unpack
tar -zxvf safe-rm-1.1.0.tar.gz
# Install the build dependency
yum install cargo
cd safe-rm-1.1.0/
# Compile (the resulting safe-rm binary is about 11 MB)
make
# Move the binary into a system directory
mv target/release/safe-rm /usr/local/bin/
# Replace the rm command
echo 'alias rm="safe-rm -i"' >> /etc/bashrc
# Reload the modified file
source /etc/bashrc
# Create a symlink so safe-rm shadows rm
ln -s /usr/local/bin/safe-rm /usr/local/bin/rm
echo 'PATH=/usr/local/bin:$PATH' >>  /etc/profile
source /etc/profile
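
After the replacement it is worth confirming which rm the shell will actually run; the /usr/local/bin symlink has to come before /usr/bin in $PATH for the protection to take effect. A quick check:

```shell
# Shows the path of the rm that a plain `rm` would execute in a new shell
command -v rm    # after the steps above this should print /usr/local/bin/rm
# Lists the alias plus every rm found on $PATH, in lookup order
type -a rm
```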

 

]]>
https://open010.com/2024/02/21/linux%e4%bd%bf%e7%94%a8safe-rm%e9%98%b2%e6%ad%a2%e8%af%af%e5%88%a0%e7%b3%bb%e7%bb%9f%e6%96%87%e4%bb%b6/feed/ 0
Installing certbot via snap on CentOS 7 for free, auto-renewing HTTPS certificates https://open010.com/2024/02/04/1877/ https://open010.com/2024/02/04/1877/#respond Sun, 04 Feb 2024 02:12:37 +0000 https://open010.com/?p=1877 I had long used certbot-auto to obtain HTTPS certificates. After moving to a new domain, regenerating the certificate failed with: "Your system is not supported by certbot-auto anymore."
It turns out the certbot-auto team no longer has the resources to maintain every operating system, so many systems, CentOS 7 included, have been dropped. Certbot now discourages repository installs on CentOS 7 and officially recommends installing and updating certbot via snap. The official wording:

While the Certbot team tries to keep the Certbot packages offered by various operating systems working in the most basic sense, due to distribution policies and/or the limited resources of distribution maintainers, Certbot OS packages often have problems that other distribution mechanisms do not. The packages are often old resulting in a lack of bug fixes and features and a worse TLS configuration than is generated by newer versions of Certbot. They also may not configure certificate renewal for you or have all of Certbot’s plugins available. For reasons like these, we recommend most users follow the instructions at https://certbot.eff.org/instructions and OS packages are only documented here as an alternative.

Installing snapd

The official docs live at https://snapcraft.io/docs/installing-snapd; the CentOS part is summarized below.

Snap is available on CentOS 7.6 and later.

sudo yum install epel-release

With the EPEL repository added, install snapd:

sudo yum install snapd

After installation, enable the systemd unit that manages the snap communication socket:

sudo systemctl enable --now snapd.socket

To enable classic snap support, create the following symlink:

sudo ln -s /var/lib/snapd/snap /snap

Log out and back in, or reboot, so that snap's paths are updated correctly. snapd is now installed.

Installing certbot

Updating snap

Run the following to make sure the snap core is up to date:

sudo snap install core
sudo snap refresh core

Removing certbot and other certbot OS packages

If any certbot packages were installed with the OS package manager (apt, dnf, yum, ...), remove them before installing the certbot snap, so that running certbot uses the snap rather than the OS package. The exact command depends on your OS; common examples are sudo apt-get remove certbot, sudo dnf remove certbot, or sudo yum remove certbot.

If you previously used certbot through the certbot-auto script, remove that installation as well, following the official instructions.

Installing certbot

Run the following commands in order:

# Remove any previous installation
yum -y remove certbot
rm -rf /opt/scripts/certbot

snap install --classic certbot

Running snap install --classic certbot may produce this error:

[root@localhost ~]# snap install --classic certbot
error: cannot install "certbot": classic confinement requires snaps under /snap
       or symlink from /snap to /var/lib/snapd/snap

Fix:

Create a symlink:

ln -s /var/lib/snapd/snap /snap

 

Run the following so the certbot command is available on PATH:

ln -s /snap/bin/certbot /usr/bin/certbot

 

Issue an SSL certificate covering two domains:

certbot certonly --standalone --email [email protected] -d open010.com -d www.open010.com

Of course, a Let's Encrypt wildcard SSL certificate is the most convenient option, since every subdomain can then share a single certificate.

Official docs: https://certbot.eff.org/docs/using.html

The install command from the front page alone cannot use the newer Let's Encrypt v2 API, so add --server https://acme-v02.api.letsencrypt.org/directory. Wildcard issuance also requires DNS validation, hence --preferred-challenges dns; my DNS provider has no API for automatic validation, so --manual is needed as well. Also note that the certificate must cover both *.xxx.com and xxx.com: with only *.xxx.com, the subdomains work but the bare xxx.com does not. The final command:

certbot certonly --preferred-challenges dns --manual  -d *.open010.com -d open010.com --server https://acme-v02.api.letsencrypt.org/directory

During issuance certbot asks you to add a TXT record: open your registrar's console, add the record as instructed, wait for it to propagate, then press Enter and wait for issuance to complete.

[root@summer sean]# certbot certonly --preferred-challenges dns --manual  -d *.open010.com --server https://acme-v02.api.letsencrypt.org/directory
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Requesting a certificate for *.open010.com

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/open010.com-0002/fullchain.pem
Key is saved at:         /etc/letsencrypt/live/open010.com-0002/privkey.pem
This certificate expires on 2024-05-04.
These files will be updated when the certificate renews.

NEXT STEPS:
- This certificate will not be renewed automatically. Autorenewal of --manual certificates requires the use of an authentication hook script (--manual-auth-hook) but one was not provided. To renew this certificate, repeat this same certbot command before the certificate's expiry date.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you like Certbot, please consider supporting our work by:
 * Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
 * Donating to EFF:                    https://eff.org/donate-le
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

 

 

Issuing a certificate creates a config file open010.com.conf under /etc/letsencrypt/renewal; delete the old config first if one exists.

On success, the certificate is saved under /etc/letsencrypt/live/open010.com.

The nginx configuration below references /etc/ssl/private/dhparam.pem, a PEM-format parameter file used in the TLS handshake to strengthen SSL security. Generate it with:

mkdir /etc/ssl/private/
cd /etc/ssl/private/
openssl dhparam -out dhparam.pem 2048

nginx configuration:

ssl_dhparam /etc/ssl/private/dhparam.pem;
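
Putting the pieces together, a minimal HTTPS server block might look like this (a sketch only; the certificate paths follow the live directory created above, and the domain names should be replaced with your own):

```nginx
server {
    listen 443 ssl;
    server_name open010.com www.open010.com;

    ssl_certificate     /etc/letsencrypt/live/open010.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/open010.com/privkey.pem;
    ssl_dhparam         /etc/ssl/private/dhparam.pem;
}
```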

Automatic renewal

The certbot package ships a cron job or systemd timer that renews certificates before they expire; unless you change the configuration, you never need to run certbot again. Test automatic renewal with:

sudo certbot renew --dry-run

 

Let's Encrypt certificates are valid for 90 days. Set up a cron job to renew them (note from the output above: a certificate issued with --manual and no auth hook cannot be auto-renewed this way):

0 2  1 *  *  certbot renew --quiet && nginx -s reload

The system will now renew the certificate at 02:00 on the 1st of every month.

]]>
https://open010.com/2024/02/04/1877/feed/ 0
Fixing df hangs caused by a mounted NFS directory https://open010.com/2024/01/23/%e6%8c%82%e8%bd%bdnfs%e7%9b%ae%e5%bd%95%e5%af%bc%e8%87%b4-df-%e5%91%bd%e4%bb%a4%e5%8d%a1%e6%ad%bb%e9%97%ae%e9%a2%98%e8%a7%a3%e5%86%b3/ https://open010.com/2024/01/23/%e6%8c%82%e8%bd%bdnfs%e7%9b%ae%e5%bd%95%e5%af%bc%e8%87%b4-df-%e5%91%bd%e4%bb%a4%e5%8d%a1%e6%ad%bb%e9%97%ae%e9%a2%98%e8%a7%a3%e5%86%b3/#respond Tue, 23 Jan 2024 07:09:12 +0000 https://meaninglive.com/?p=1858 Problem:

A Linux server had an NFS directory containing a huge number of files mounted, and df hung when run.

First confirm the NFS directory really is the cause: trace the command with strace df -h to see which syscall blocks. The trace stops right at the hang:

[root@node1 ~]# strace df -h
execve("/usr/bin/df", ["df", "-h"], 0x7ffefb6cbde8 /* 23 vars */) = 0
brk(NULL)                               = 0x230b000
...
...
statfs("/data/nfs",    # hangs here: it is indeed the mounted NFS directory

Locate the mount with the nfsstat -m command.

It turned out the server had mounted an NFS server that no longer existed.

Fix:

  1. List the mounted directories: mount -l
  2. Force-unmount the directory: umount -f -l <mountpoint>, e.g. umount -f -l /data/nfs
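
Two defensive habits help while a dead NFS mount may be present; this assumes GNU coreutils, where df supports -x and the timeout wrapper is available:

```shell
# Cap df at 5 seconds and skip NFS filesystems entirely
timeout 5 df -h -x nfs -x nfs4

# For future mounts, soft/timeo options make I/O return an error
# instead of hanging forever when the server disappears:
# mount -t nfs -o soft,timeo=30,retrans=2 server:/export /data/nfs
```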
]]>
https://open010.com/2024/01/23/%e6%8c%82%e8%bd%bdnfs%e7%9b%ae%e5%bd%95%e5%af%bc%e8%87%b4-df-%e5%91%bd%e4%bb%a4%e5%8d%a1%e6%ad%bb%e9%97%ae%e9%a2%98%e8%a7%a3%e5%86%b3/feed/ 0
MySQL: backing up and modifying tables https://open010.com/2024/01/18/1853/ https://open010.com/2024/01/18/1853/#respond Thu, 18 Jan 2024 02:10:03 +0000 https://meaninglive.com/?p=1853 MySQL: backing up and modifying tables

#!/bin/bash

DB_USER='ro_all_db'
DATE=`date -d"today" +%Y%m%d`
TIME=`date "+%Y-%m-%d %H:%M:%S"`
MYSQL_SOURCE='172.20.x.x'
# DB_PASSWORD='xxxx'

echo '-------------- Backup starting at '$TIME
BEGIN=`date "+%Y-%m-%d %H:%M:%S"`
BEGIN_T=`date -d "$BEGIN" +%s`
echo '===> Backing up the mysql instance on '$MYSQL_SOURCE', start time '$BEGIN
BACKUP_DIR=/data/backup/$DATE/$MYSQL_SOURCE;
mkdir -p  $BACKUP_DIR;

for i in `mysql --ssl-mode=DISABLED -h$MYSQL_SOURCE -u$DB_USER -p'mypasswd' -B iuap_apcom_supportcenter -e "show tables;" | grep 'multilang_'`; do
	echo "--> Backing up table $i"
	mysqldump --ssl-mode=DISABLED --skip-opt -h$MYSQL_SOURCE -u$DB_USER -p'mypasswd' iuap_apcom_supportcenter $i > $BACKUP_DIR/$i.sql
	echo "--> Table ${i} backed up"
done

cd $BACKUP_DIR
mkdir $BACKUP_DIR/replace_sql
for i in `ls *.sql`; do
	cat $i | sed -n '/^INSERT INTO/p'|sed 's/^INSERT INTO/REPLACE INTO/g'> $BACKUP_DIR/replace_sql/$i
done


cd $BACKUP_DIR/replace_sql
let num=1
echo "-------------- Modification starting at `date "+%Y-%m-%d %H:%M:%S"`"
for i in `ls *.sql`; do
	echo "$num--> Modifying table $i, start `date "+%Y-%m-%d %H:%M:%S"`"
	mysql -h172.20.1.1 -upremises -pmypasswd iuap_apcom_supportcenter < $i
	echo "$num--> Finished table $i at `date "+%Y-%m-%d %H:%M:%S"`"
	let num+=1
done
TIME=`date "+%Y-%m-%d %H:%M:%S"`
echo "-------------- Modification finished at `date "+%Y-%m-%d %H:%M:%S"`"
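
The two sed calls in the script keep only the INSERT statements from each dump and rewrite them into idempotent REPLACE statements. A stand-alone illustration of that rewrite:

```shell
# One dump line in, one REPLACE statement out
echo "INSERT INTO t VALUES (1,'a');" \
  | sed -n '/^INSERT INTO/p' | sed 's/^INSERT INTO/REPLACE INTO/g'
# prints: REPLACE INTO t VALUES (1,'a');
```

REPLACE INTO deletes and re-inserts the row on a key conflict, so applying the same file twice is safe.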

 

]]>
https://open010.com/2024/01/18/1853/feed/ 0
Building Kubernetes components with GitHub Actions https://open010.com/2023/12/26/%e4%bd%bf%e7%94%a8-github-actions-%e7%bc%96%e8%af%91-kubernetes-%e7%bb%84%e4%bb%b6/ https://open010.com/2023/12/26/%e4%bd%bf%e7%94%a8-github-actions-%e7%bc%96%e8%af%91-kubernetes-%e7%bb%84%e4%bb%b6/#respond Tue, 26 Dec 2023 08:37:31 +0000 https://meaninglive.com/?p=1850 When working with Kubernetes you occasionally need to patch the official source and rebuild. Using "make kubeadm issue certificates valid for 100 years by default" as the example, this post shows how to build the binaries with GitHub Actions and publish them as releases.

Build

clone repo

Fork the official kubernetes source into your own repo, then:

$ git clone https://github.com/k8sli/kubernetes.git
$ cd kubernetes
$ git remote add upstream https://github.com/kubernetes/kubernetes.git
$ git fetch --all
$ git checkout upstream/release-1.32
$ git checkout -B kubeadm-1.32

Note: if the relevant branch does not exist on your GitHub fork, first create one (e.g. release-1.32) based on upstream.

workflow

  • .github/workflows/kubeadm.yaml

---
name: Build kubeadm binary

on:
  push:
    tag:
      - 'v*'
  # 手动触发事件
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-20.04
    # trigger the job only when a tag is pushed
    if: startsWith(github.ref, 'refs/tags/')
    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Build kubeadm binary
        shell: bash
        run: |
          # run the build/run.sh build script to compile binaries for each platform
          build/run.sh make kubeadm KUBE_BUILD_PLATFORMS=linux/amd64
          build/run.sh make kubeadm KUBE_BUILD_PLATFORMS=linux/arm64

      # the built binaries land in _output/dockerized/bin/
      # rename them after the target OS + CPU architecture
      - name: Prepare for upload
        shell: bash
        run: |
          mv _output/dockerized/bin/linux/amd64/kubeadm kubeadm-linux-amd64
          mv _output/dockerized/bin/linux/arm64/kubeadm kubeadm-linux-arm64
          sha256sum kubeadm-linux-{amd64,arm64} > sha256sum.txt

      # use softprops/action-gh-release to upload the artifacts to a GitHub release
      - name: Release and upload packages
        uses: softprops/action-gh-release@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          files: |
            sha256sum.txt
            kubeadm-linux-amd64
            kubeadm-linux-arm64

  • build/run.sh

build/run.sh runs a command inside a build docker container. Common invocations:

  • build/run.sh make: Build just linux binaries in the container. Pass options and packages as necessary.
  • build/run.sh make cross: Build all binaries for all platforms. To build only a specific platform, add KUBE_BUILD_PLATFORMS=<os>/<arch>
  • build/run.sh make kubectl KUBE_BUILD_PLATFORMS=darwin/amd64: Build the specific binary for the specific platform (kubectl and darwin/amd64 respectively in this example)
  • build/run.sh make test: Run all unit tests
  • build/run.sh make test-integration: Run integration test
  • build/run.sh make test-cmd: Run CLI tests

Patching the source

  • cmd/kubeadm/app/constants/constants.go

Find the CertificateValidity constant and append two zeros to the 365 days: the certificates now live for 100 years.

// CertificateValidity defines the validity for all the signed certificates generated by kubeadm
-	CertificateValidity = time.Hour * 24 * 365
+	CertificateValidity = time.Hour * 24 * 36500

 	// CACertAndKeyBaseName defines certificate authority base name
 	CACertAndKeyBaseName = "ca"

  • staging/src/k8s.io/client-go/util/cert

Find NotAfter; the generated CA certificate defaults to 10 years, change it to 100.

func NewSelfSignedCACert(cfg Config, key crypto.Signer) (*x509.Certificate, error) {
	now := time.Now()
	tmpl := x509.Certificate{
		SerialNumber: new(big.Int).SetInt64(0),
		Subject: pkix.Name{
			CommonName:   cfg.CommonName,
			Organization: cfg.Organization,
		},
		DNSNames:  []string{cfg.CommonName},
		NotBefore: now.UTC(),
		// NotAfter:              now.Add(duration365d * 10).UTC(),
		// extend ca cert to 100 years
		NotAfter:              now.Add(duration365d * 100).UTC(),
		KeyUsage:              x509.KeyUsageKeyEncipherment | x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
		IsCA:                  true,
	}

  • cmd/kubeadm/app/phases/upgrade/policy.go (optional)
    Raise the skew limits below (the example uses 15) so upgrades are not restricted to a single minor version at a time

const (
	// MaximumAllowedMinorVersionUpgradeSkew describes how many minor versions kubeadm can upgrade the control plane version in one go
	// was 1, raised to 15
	// MaximumAllowedMinorVersionUpgradeSkew = 1
	MaximumAllowedMinorVersionUpgradeSkew = 15

	// MaximumAllowedMinorVersionDowngradeSkew describes how many minor versions kubeadm can downgrade the control plane version in one go
	// was 1, raised to 15
	// MaximumAllowedMinorVersionDowngradeSkew = 1
	MaximumAllowedMinorVersionDowngradeSkew = 15

	// MaximumAllowedMinorVersionKubeletSkew describes how many minor versions the control plane version and the kubelet can skew in a kubeadm cluster
	// was 3, raised to 15
	// MaximumAllowedMinorVersionKubeletSkew = 3
	MaximumAllowedMinorVersionKubeletSkew = 15
)

  • cmd/kubeadm/app/constants/constants.go:
    • Relax the minimum version limits:
    • MinimumControlPlaneVersion = getSkewedKubernetesVersion(-15)
    • MinimumKubeletVersion = getSkewedKubernetesVersion(-15)

Commit:

  • git add .gitignore cmd/kubeadm/app/constants/constants.go cmd/kubeadm/app/phases/upgrade/policy.go staging/src/k8s.io/client-go/util/cert/cert.go .github/workflows/
  • git commit -m 'Extend certificates to 100 years; allow kubeadm to upgrade across multiple minor versions'

cherry-pick

With the change committed on the branch, cherry-pick it onto the other tags and retag. Taking v1.21.4 as the example: cherry-pick the change on top of the v1.21.4 tag, then apply a new tag.

  • Grab the commit ID of the change

$ COMMIT_ID=$(git rev-parse HEAD)

  • Check out the v1.21.4 tag

$ git checkout v1.21.4
Note: checking out 'v1.21.4'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

HEAD is now at 3cce4a82b44 Release commit for Kubernetes v1.21.4

  • Cherry-pick the change onto the current tag

$ git cherry-pick $COMMIT_ID
[detached HEAD baadbe03458] Update kubeadm CertificateValidity time to ten years
 Date: Tue Aug 24 16:32:49 2021 +0800
 2 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 .github/workflows/kubeadm.yaml

  • Apply a new tag, e.g. v1.21.4-patch-1.0

$ git tag v1.21.4-patch-1.0 -f
Updated tag 'v1.21.4-patch-1.0' (was 70bcbd6de6c)

  • Push the tag to the repo to trigger the workflow

$ git push origin --tags -f
Enumerating objects: 17, done.
Counting objects: 100% (17/17), done.
Delta compression using up to 4 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (10/10), 1.13 KiB | 192.00 KiB/s, done.
Total 10 (delta 7), reused 0 (delta 0)
remote: Resolving deltas: 100% (7/7), completed with 7 local objects.
To github.com:k8sli/kubernetes.git
 + c2a633e07ec...baadbe03458 v1.21.4-patch-1.0 -> v1.21.4-patch-1.0 (forced update)

Summary

The flow above builds one tag at a time; other kubeadm versions follow exactly the same steps. A small shell script makes batching trivial:

#!/bin/bash

set -o errexit
set -o nounset

# the commit to cherry-pick
: ${COMMIT:="48e4b4c7c62a84ab4ec363588721011b73ee77e6"}

# the versions to rebuild
: ${TAGS:="v1.22.1 v1.22.0 v1.21.4 v1.21.3 v1.20.10 v1.19.14 v1.18.10"}

for tag in ${TAGS}; do
    git reset --hard ${tag}
    git cherry-pick ${COMMIT}
    git tag ${tag}-patch-1.0
    git push origin ${tag}-patch-1.0
done

The nice thing about GitHub Actions is that it handles both code and artifact management for us: the built binaries live in GitHub releases, easy to download and use, with no need to maintain a separate build machine or storage server.
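
The sha256sum.txt uploaded by the workflow lets users verify their downloads with sha256sum -c. A self-contained demo of the mechanism (the payload here is fake):

```shell
# Pretend this is a downloaded release binary
echo 'demo payload' > kubeadm-linux-amd64
# The workflow generates this file; recreate it for the demo
sha256sum kubeadm-linux-amd64 > sha256sum.txt
# Verification: prints "kubeadm-linux-amd64: OK" and exits 0 on a match
sha256sum -c sha256sum.txt
```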

Update:

Next time a change is needed:

Check out the previously pushed tag: git checkout v1.32.2-patch-1.0

Create a new branch from it: git checkout -B kubeadm-1.32

After making the change, git add/commit, then:

COMMIT_ID=$(git rev-parse HEAD)

git checkout v1.32.2-patch-1.0

git cherry-pick $COMMIT_ID

git tag v1.32.2-patch-1.0 -f

git push origin --tags -f

This force-pushes over the remote tag without renaming it; pushing under a brand-new tag name works just as well.

]]>
https://open010.com/2023/12/26/%e4%bd%bf%e7%94%a8-github-actions-%e7%bc%96%e8%af%91-kubernetes-%e7%bb%84%e4%bb%b6/feed/ 0
Using GitHub personal access tokens https://open010.com/2023/12/26/github%e7%9a%84token%e4%bd%bf%e7%94%a8%e6%96%b9%e6%b3%95/ https://open010.com/2023/12/26/github%e7%9a%84token%e4%bd%bf%e7%94%a8%e6%96%b9%e6%b3%95/#respond Tue, 26 Dec 2023 06:20:00 +0000 https://meaninglive.com/?p=1847 Using GitHub personal access tokens
Today a push from my local machine to GitHub failed with this error:

remote: Support for password authentication was removed on August 13, 2021. Please use a personal access token instead.

GitHub no longer accepts passwords for authentication; it now requires a personal access token.

This post covers:

  • how to generate a token
  • how to use the token on the command line

Generating a token on GitHub

GitHub's official documentation explains how to generate a personal token; see the GitHub docs on creating a token.

 

Using the token on the command line

GitHub used to authenticate with username and password; it now authenticates with username and token.

For example, in GitHub's own sample below, cloning a repository prompts for username and password; supply the token generated above as the password.

$ git clone https://github.com/username/repo.git
Username: your_username
Password: your_token

There is one problem, though: nobody can remember a token that long.

To solve this, GitHub provides the gh tool: authenticate once with gh, and no further credentials are needed afterwards.

Only the Ubuntu installation of gh is shown here.

$ sudo apt update
$ sudo apt install snapd
$ sudo apt install gh

Then authenticate with gh:

$ gh auth login
# enter your username and token

Use the arrow keys to select an option and Enter to confirm. Choose GitHub.com, then HTTPS (if that is the protocol you use), then either browser-based or token-based authentication. If you are logged in over ssh and cannot open a browser on the remote machine, token authentication is the only choice. For browser authentication: copy the one-time code printed on the command line (5C38-D954 in my run), press Enter, and a browser opens; enter the one-time code and authorize to finish. For token authentication, simply paste your token.

If you switch to another machine, generate a new token and run gh auth login again.
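
If you would rather not install gh, git's built-in credential helper is an alternative; the assumption here is that you accept the token being stored in plain text in ~/.git-credentials:

```shell
# Remember https credentials after their first use
git config --global credential.helper store
# The next https clone/push asks for username + token once, then reuses them
```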

]]>
https://open010.com/2023/12/26/github%e7%9a%84token%e4%bd%bf%e7%94%a8%e6%96%b9%e6%b3%95/feed/ 0
Fixing ansible mitogen 0.3.0+ not rendering the ansible_ssh_common_args template variable https://open010.com/2023/12/07/%e8%a7%a3%e5%86%b3ansible-mitogen-0-3-0-%e6%8f%92%e4%bb%b6%e6%9c%aa%e6%b8%b2%e6%9f%93ansible_ssh_common_args%e6%a8%a1%e6%9d%bf%e5%8f%98%e9%87%8f%e9%97%ae%e9%a2%98/ https://open010.com/2023/12/07/%e8%a7%a3%e5%86%b3ansible-mitogen-0-3-0-%e6%8f%92%e4%bb%b6%e6%9c%aa%e6%b8%b2%e6%9f%93ansible_ssh_common_args%e6%a8%a1%e6%9d%bf%e5%8f%98%e9%87%8f%e9%97%ae%e9%a2%98/#respond Thu, 07 Dec 2023 09:47:22 +0000 https://meaninglive.com/?p=1841 Problem

Installing kubespray with the ansible mitogen 0.3.4 plugin failed with:

EOF on stream; last 100 lines received:\nssh: Could not resolve hostname {%: Name or service not known

Analysis

Debugging showed that kubespray defines the following variable template by default:

ansible_ssh_common_args: |
"{% if 'bastion' in groups['all'] %} -o ProxyCommand='ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p 
-p {{ hostvars['bastion']['ansible_port'] | default(22) }} {{ hostvars['bastion']['ansible_user'] }}@{{ hostvars['bastion']['ansible_host'] }} {% if ansible_ssh_private_key_file is defined %}
-i {{ ansible_ssh_private_key_file }}{% endif %} ' {% endif %}"

But when executed through an ansible role, the ssh command mitogen built never had the variable rendered:

mitogen.parent: command line for Connection(None): ssh -o "LogLevel ERROR" -l root -p 22 -o "Compression yes" -o "ServerAliveInterval 30" -o "ServerAliveCountMax 10" 
-o "StrictHostKeyChecking no" -o "UserKnownHostsFile /dev/null" -o "GlobalKnownHostsFile /dev/null" -C -o ControlMaster=auto 
-o ControlPersist=60s {% if bastion in groups[all] %} -o "ProxyCommand=ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p 
-p {{ hostvars[bastion][ansible_port] | default(22) }} {{ hostvars[bastion][ansible_user] }}@{{ hostvars[bastion][ansible_host] }} {% if ansible_ssh_private_key_file is defined %}
-i {{ ansible_ssh_private_key_file }}{% endif %} " {% endif %} 172.20.59.209 /usr/bin/python -c

As you can see, mitogen passed ansible_ssh_common_args to ssh verbatim.

To verify, patch the source: add debug output to transport_config.py:
    def ssh_args(self):
        print('------------>')
        print('ssh_args: ', C.config.get_config_value("ssh_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {})))
        print('ssh_common_args: ', C.config.get_config_value("ssh_common_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {})))
        print('ssh_extra_args: ', C.config.get_config_value("ssh_extra_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {})))
        print('----------->')
        return [
            mitogen.core.to_text(term)
            for s in (
                C.config.get_config_value("ssh_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {})),
                C.config.get_config_value("ssh_common_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {})),
                C.config.get_config_value("ssh_extra_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {}))
                # C.config.get_config_value("ssh_args", plugin_type="connection", plugin_name="ssh", variables=local_vars),
                # C.config.get_config_value("ssh_common_args", plugin_type="connection", plugin_name="ssh", variables=local_vars),
                # C.config.get_config_value("ssh_extra_args", plugin_type="connection", plugin_name="ssh", variables=local_vars)
            )
            for term in ansible.utils.shlex.shlex_split(s or '')
        ]

 

The output:

------------>
ssh_args:  -o ControlMaster=auto -o ControlPersist=1d -o UserKnownHostsFile=/dev/null -o ConnectTimeout=30 -o Compression=yes -o TCPKeepAlive=yes -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o AddKeysToAgent=yes -o ControlPath=~/.ssh/%r@%h-%p
ssh_common_args:  {% if 'bastion' in groups['all'] %} -o ProxyCommand='ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p -p {{ hostvars['bastion']['ansible_port'] | default(22) }} {{ hostvars['bastion']['ansible_user'] }}@{{ hostvars['bastion']['ansible_host'] }} {% if ansible_ssh_private_key_file is defined %}-i {{ ansible_ssh_private_key_file }}{% endif %} ' {% endif %}
ssh_extra_args:  
----------->

 

The ssh_common_args value was clearly not rendered.

Fix

Change _task_vars.get("vars", {}) to _task_vars.get("hostvars", {}) so the values are taken from hostvars.

commit ac34252bcccb60b50e6a8ed3a3b2f42d256d62e0

Summary

The original code fetched the ssh parameters via _task_vars.get("vars", {}), so the template was never resolved correctly; fetching them via _task_vars.get("hostvars", {}) fixes it.
For whatever reason the author never merged this fix, so even the current master branch still has the problem.
]]>
https://open010.com/2023/12/07/%e8%a7%a3%e5%86%b3ansible-mitogen-0-3-0-%e6%8f%92%e4%bb%b6%e6%9c%aa%e6%b8%b2%e6%9f%93ansible_ssh_common_args%e6%a8%a1%e6%9d%bf%e5%8f%98%e9%87%8f%e9%97%ae%e9%a2%98/feed/ 0