云脉芯联网卡 Offload 支持¶
Kube-OVN 在最终的数据平面使用 OVS 来完成流量转发,相关的流表匹配,隧道封装等功能为 CPU 密集型,在大流量下会消耗大量 CPU 资源并导致 延迟上升和吞吐量下降。云脉芯联的 metaScale 系列智能网卡可以将 OVS 相关的操作卸载到硬件网卡上执行。该技术可以在无需对 OVS 控制平面进行修改的情况下,缩短数据路径,避免对主机 CPU 资源的使用,大幅降低延迟并显著提升吞吐量。
目前云脉芯联只适配了 1.11 系列版本的 Kube-OVN,部分最新功能无法使用。
前置条件¶
- 云脉芯联 metaScale 系列智能网卡。
- MCR 驱动安装包。
- BIOS 开启 SR-IOV 和 VT-d。
安装指南¶
启用 hw-offload 模式安装 Kube-OVN¶
- 下载 安装脚本
wget https://github.com/yunsilicon/kube-ovn/blob/release-1.11/dist/images/install.sh
- 配置节点
修改每个节点的 /opt/ovs-config/ovs-dpdk-config
文件:
# specify log level for ovs dpdk, the value is info or dbg, default is info
VLOG=info
# specify nic offload, the value is true or false, default is true
HW_OFFLOAD=true
# specify cpu mask for ovs dpdk, not specified by default
CPU_MASK=0x02
# specify socket memory, not specified by default
SOCKET_MEM="2048,2048"
# specify encap IP
ENCAP_IP=6.6.6.208/24
# specify pci device
DPDK_DEV=0000:b3:00.0
# specify mtu, default is 1500
PF_MTU=1500
# specify bond name if bond enabled, not specified by default
BR_PHY_BOND_NAME=bond0
- 安装 Kube-OVN
bash install.sh
配置 SR-IOV¶
- 找到 metaScale 设备的设备 ID,下面是
b3:00.0
:
[root@k8s-master ~]# lspci -d 1f67:
b3:00.0 Ethernet controller: Device 1f67:1111 (rev 02)
b3:00.1 Ethernet controller: Device 1f67:1111 (rev 02)
- 找到与设备 ID 相关的接口,下面是
p3p1
:
ls -l /sys/class/net/ | grep b3:00.0
lrwxrwxrwx 1 root root 0 May 7 16:30 p3p1 -> ../../devices/pci0000:b2/0000:b2:00.0/0000:b3:00.0/net/p3p1
- 检查可用的 VF 数量:
cat /sys/class/net/p3p1/device/sriov_totalvfs
512
- 创建 VF:
echo '10' > /sys/class/net/p3p1/device/sriov_numvfs
- 确认 VF 创建成功:
lspci -d 1f67:
b3:00.0 Ethernet controller: Device 1f67:1111 (rev 02)
b3:00.1 Ethernet controller: Device 1f67:1111 (rev 02)
b3:00.2 Ethernet controller: Device 1f67:1112
b3:00.3 Ethernet controller: Device 1f67:1112
b3:00.4 Ethernet controller: Device 1f67:1112
b3:00.5 Ethernet controller: Device 1f67:1112
b3:00.6 Ethernet controller: Device 1f67:1112
b3:00.7 Ethernet controller: Device 1f67:1112
b3:01.0 Ethernet controller: Device 1f67:1112
b3:01.1 Ethernet controller: Device 1f67:1112
b3:01.2 Ethernet controller: Device 1f67:1112
b3:01.3 Ethernet controller: Device 1f67:1112
- 启用 switchdev 模式:
devlink dev eswitch set pci/0000:b3:00.0 mode switchdev
安装 SR-IOV Device Plugin¶
- 创建 SR-IOV 资源 ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: sriovdp-config
namespace: kube-system
data:
config.json: |
{
"resourceList": [{
"resourceName": "xsc_sriov",
"resourcePrefix": "yunsilicon.com",
"selectors": {
"vendors": ["1f67"],
"devices": ["1012", "1112"]
}}
]
}
-
根据 SR-IOV Device Plugin 部署 DevicePlugin。
-
检查被自动识别出来的 SR-IOV 设备:
# kubectl describe node <node name> | grep yunsilicon.com/xsc_sriov
yunsilicon.com/xsc_sriov: 10
yunsilicon.com/xsc_sriov: 10
yunsilicon.com/xsc_sriov 0 0
安装 Multus-CNI¶
- 参考 Multus-CNI 来部署 Multus-CNI
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset.yml
- 创建 NetworkAttachmentDefinition
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: sriov-net1
namespace: default
annotations:
k8s.v1.cni.cncf.io/resourceName: yunsilicon.com/xsc_sriov
spec:
config: '{
"cniVersion": "0.3.1",
"name": "kube-ovn",
"plugins":[
{
"type":"kube-ovn",
"server_socket":"/run/openvswitch/kube-ovn-daemon.sock",
"provider": "sriov-net1.default.ovn"
},
{
"type":"portmap",
"capabilities":{
"portMappings":true
}
}
]
}'
创建使用 SR-IOV 的 Pod¶
apiVersion: v1
kind: Pod
metadata:
name: nginx
annotations:
v1.multus-cni.io/default-network: default/sriov-net1
spec:
containers:
- name: nginx
image: docker.io/library/nginx:alpine
resources:
requests:
yunsilicon.com/xsc_sriov: '1'
limits:
yunsilicon.com/xsc_sriov: '1'
Offload 验证¶
可通过在 Pod 运行节点的 ovs-ovn
容器中运行下面的命令观察卸载是否成功:
ovs-appctl dpctl/dump-flows type=offloaded
flow-dump from pmd on cpu core: 9
ct_state(-new+est-rel+rpl+trk),ct_mark(0/0x3),recirc_id(0x2d277),in_port(15),packet_type(ns=0,id=0),eth(src=00:00:00:9d:fb:1a,dst=00:00:00:ce:cf:b9),eth_type(0x0800),ipv4(dst=10.16.0.14,frag=no), packets:6, bytes:588, used:7.276s, actions:ct(zone=4,nat),recirc(0x2d278)
ct_state(-new+est-rel-rpl+trk),ct_mark(0/0x3),recirc_id(0x2d275),in_port(8),packet_type(ns=0,id=0),eth(src=00:00:00:ce:cf:b9,dst=00:00:00:9d:fb:1a),eth_type(0x0800),ipv4(dst=10.16.0.18,frag=no), packets:5, bytes:490, used:7.434s, actions:ct(zone=6,nat),recirc(0x2d276)
ct_state(-new+est-rel-rpl+trk),ct_mark(0/0x1),recirc_id(0x2d276),in_port(8),packet_type(ns=0,id=0),eth(src=00:00:00:ce:cf:b9,dst=00:00:00:9d:fb:1a/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:5, bytes:490, used:7.434s, actions:15
recirc_id(0),in_port(15),packet_type(ns=0,id=0),eth(src=00:00:00:9d:fb:1a/01:00:00:00:00:00,dst=00:00:00:ce:cf:b9),eth_type(0x0800),ipv4(dst=10.16.0.14/255.192.0.0,frag=no), packets:6, bytes:588, used:7.277s, actions:ct(zone=6,nat),recirc(0x2d277)
recirc_id(0),in_port(8),packet_type(ns=0,id=0),eth(src=00:00:00:ce:cf:b9/01:00:00:00:00:00,dst=00:00:00:9d:fb:1a),eth_type(0x0800),ipv4(dst=10.16.0.18/255.192.0.0,frag=no), packets:6, bytes:588, used:7.434s, actions:ct(zone=4,nat),recirc(0x2d275)
ct_state(-new+est-rel+rpl+trk),ct_mark(0/0x1),recirc_id(0x2d278),in_port(15),packet_type(ns=0,id=0),eth(dst=00:00:00:ce:cf:b9/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:6, bytes:588, used:7.277s, actions:8
如果有流表内容证明卸载成功。