跳转至

kubectl 插件使用

为了方便日常的运维操作,Kube-OVN 提供了 kubectl 插件工具,网络管理员 可以通过该命令进行日常操作,例如:查看 OVN 数据库信息和状态,OVN 数据库 备份和恢复,OVS 相关信息查看,tcpdump 特定容器,特定链路逻辑拓扑展示, 网络问题诊断和性能优化。

插件安装

Kube-OVN 安装时默认会部署插件到每个节点,若执行 kubectl 的机器不在集群内, 或需要重装插件,可参考下面的步骤:

下载 kubectl-ko 文件:

wget https://raw.githubusercontent.com/kubeovn/kube-ovn/release-1.11/dist/images/kubectl-ko

将该文件移动至 $PATH 目录下:

mv kubectl-ko /usr/local/bin/kubectl-ko

增加可执行权限:

chmod +x /usr/local/bin/kubectl-ko

检查插件是否可以正常使用:

# kubectl plugin list
The following compatible plugins are available:

/usr/local/bin/kubectl-ko

插件使用

运行 kubectl ko 会展示该插件所有可用的命令和用法描述,如下所示:

# kubectl ko
kubectl ko {subcommand} [option...]
Available Subcommands:
  [nb|sb] [status|kick|backup|dbstatus|restore]     ovn-db operations show cluster status, kick stale server, backup database, get db consistency status or restore ovn nb db when met 'inconsistent data' error
  nbctl [ovn-nbctl options ...]    invoke ovn-nbctl
  sbctl [ovn-sbctl options ...]    invoke ovn-sbctl
  vsctl {nodeName} [ovs-vsctl options ...]   invoke ovs-vsctl on the specified node
  ofctl {nodeName} [ovs-ofctl options ...]   invoke ovs-ofctl on the specified node
  dpctl {nodeName} [ovs-dpctl options ...]   invoke ovs-dpctl on the specified node
  appctl {nodeName} [ovs-appctl options ...]   invoke ovs-appctl on the specified node
  tcpdump {namespace/podname} [tcpdump options ...]     capture pod traffic
  trace {namespace/podname} {target ip address} [target mac address] {icmp|tcp|udp} [target tcp/udp port]    trace ICMP/TCP/UDP
  trace {namespace/podname} {target ip address} [target mac address] arp {request|reply}                     trace ARP request/reply
  trace {node//nodename} {target ip address} [target mac address] {icmp|tcp|udp} [target tcp/udp port]       trace ICMP/TCP/UDP
  trace {node//nodename} {target ip address} [target mac address] arp {request|reply}                        trace ARP request/reply
  diagnose {all|node|subnet} [nodename|subnetName]    diagnose connectivity of all nodes or a specific node or specify subnet's ds pod
  tuning {install-fastpath|local-install-fastpath|remove-fastpath|install-stt|local-install-stt|remove-stt} {centos7|centos8}} [kernel-devel-version]  deploy  kernel optimisation components to the system
  reload    restart all kube-ovn components
  log {kube-ovn|ovn|ovs|linux|all}    save log to ./kubectl-ko-log/
  perf [image] performance test default image is kubeovn/test:v1.12.0  

下面将介绍每个命令的具体功能和使用。

[nb | sb] [status | kick | backup | dbstatus | restore]

该子命令主要对 OVN 北向或南向数据库进行操作,包括数据库集群状态查看,数据库节点下线, 数据库备份,数据库存储状态查看和数据库修复。

数据库集群状态查看

该命令会在对应 OVN 数据库的 leader 节点执行 ovs-appctl cluster/status 展示集群状态:

# kubectl ko nb status
306b
Name: OVN_Northbound
Cluster ID: 9a87 (9a872522-3e7d-47ca-83a3-d74333e1a7ca)
Server ID: 306b (306b256b-b5e1-4eb0-be91-4ca96adf6bad)
Address: tcp:[172.18.0.2]:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Last Election started 280309 ms ago, reason: timeout
Last Election won: 280309 ms ago
Election timer: 5000
Log: [139, 139]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-8723 ->8723 <-85d6 ->85d6
Disconnections: 0
Servers:
    85d6 (85d6 at tcp:[172.18.0.4]:6643) next_index=139 match_index=138 last msg 763 ms ago
    8723 (8723 at tcp:[172.18.0.3]:6643) next_index=139 match_index=138 last msg 763 ms ago
    306b (306b at tcp:[172.18.0.2]:6643) (self) next_index=2 match_index=138
status: ok

Server 下的 match_index 出现较大差别,且 last msg 时间较长则对应 Server 可能长时间没有响应, 需要进一步查看。

数据库节点下线

该命令会将某个节点从 OVN 数据库中移除,在节点下线或更换节点时需要用到。 下面将以上一条命令所查看到的集群状态为例,下线 172.18.0.3 节点:

# kubectl ko nb kick 8723
started removal

再次查看数据库集群状态确认节点已移除:

# kubectl ko nb status
306b
Name: OVN_Northbound
Cluster ID: 9a87 (9a872522-3e7d-47ca-83a3-d74333e1a7ca)
Server ID: 306b (306b256b-b5e1-4eb0-be91-4ca96adf6bad)
Address: tcp:[172.18.0.2]:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Last Election started 324356 ms ago, reason: timeout
Last Election won: 324356 ms ago
Election timer: 5000
Log: [140, 140]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-85d6 ->85d6
Disconnections: 2
Servers:
    85d6 (85d6 at tcp:[172.18.0.4]:6643) next_index=140 match_index=139 last msg 848 ms ago
    306b (306b at tcp:[172.18.0.2]:6643) (self) next_index=2 match_index=139
status: ok

数据库备份

该子命令会备份当前 OVN 数据库至本地,可用于灾备和恢复:

# kubectl ko nb backup
tar: Removing leading `/' from member names
backup ovn-nb db to /root/ovnnb_db.060223191654183154.backup

数据库存储状态查看

该命令用来查看数据库文件是否存在损坏:

# kubectl ko nb dbstatus
status: ok

若异常则显示 inconsistent data 需要使用下面的命令进行修复。

数据库修复

若数据库状态进入 inconsistent data 可使用该命令进行修复:

# kubectl ko nb restore
deployment.apps/ovn-central scaled
ovn-central original replicas is 3
first nodeIP is 172.18.0.5
ovs-ovn pod on node 172.18.0.5 is ovs-ovn-8jxv9
ovs-ovn pod on node 172.18.0.3 is ovs-ovn-sjzb6
ovs-ovn pod on node 172.18.0.4 is ovs-ovn-t87zk
backup nb db file
restore nb db file, operate in pod ovs-ovn-8jxv9
deployment.apps/ovn-central scaled
finish restore nb db file and ovn-central replicas
recreate ovs-ovn pods
pod "ovs-ovn-8jxv9" deleted
pod "ovs-ovn-sjzb6" deleted
pod "ovs-ovn-t87zk" deleted

[nbctl | sbctl] [options ...]

该子命令会直接进入 OVN 北向数据库或南向数据库 的 leader 节点分别执行 ovn-nbctlovn-sbctl 命令。 更多该命令的详细用法请查询上游 OVN 的官方文档 ovn-nbctl(8)ovn-sbctl(8)

# kubectl ko nbctl show
switch c7cd17e8-ceee-4a91-9bb3-e5a313fe1ece (snat)
    port snat-ovn-cluster
        type: router
        router-port: ovn-cluster-snat
switch 20e0c6d0-023a-4756-aec5-200e0c60f95d (join)
    port node-liumengxin-ovn3-192.168.137.178
        addresses: ["00:00:00:64:FF:A8 100.64.0.4"]
    port node-liumengxin-ovn1-192.168.137.176
        addresses: ["00:00:00:AF:98:62 100.64.0.2"]
    port node-liumengxin-ovn2-192.168.137.177
        addresses: ["00:00:00:D9:58:B8 100.64.0.3"]
    port join-ovn-cluster
        type: router
        router-port: ovn-cluster-join
switch 0191705c-f827-427b-9de3-3c3b7d971ba5 (central)
    port central-ovn-cluster
        type: router
        router-port: ovn-cluster-central
switch 2a45ff05-388d-4f85-9daf-e6fccd5833dc (ovn-default)
    port alertmanager-main-0.monitoring
        addresses: ["00:00:00:6C:DF:A3 10.16.0.19"]
    port kube-state-metrics-5d6885d89-4nf8h.monitoring
        addresses: ["00:00:00:6F:02:1C 10.16.0.15"]
    port fake-kubelet-67c55dfd89-pv86k.kube-system
        addresses: ["00:00:00:5C:12:E8 10.16.19.177"]
    port ovn-default-ovn-cluster
        type: router
        router-port: ovn-cluster-ovn-default
router 212f73dd-d63d-4d72-864b-a537e9afbee1 (ovn-cluster)
    port ovn-cluster-snat
        mac: "00:00:00:7A:82:8F"
        networks: ["172.22.0.1/16"]
    port ovn-cluster-join
        mac: "00:00:00:F8:18:5A"
        networks: ["100.64.0.1/16"]
    port ovn-cluster-central
        mac: "00:00:00:4D:8C:F5"
        networks: ["192.101.0.1/16"]
    port ovn-cluster-ovn-default
        mac: "00:00:00:A3:F8:18"
        networks: ["10.16.0.1/16"]

vsctl {nodeName} [options ...]

该命令会进入对应 nodeName 上的 ovs-ovn 容器,并执行相应的 ovs-vsctl 命令,查询并配置 vswitchd。 更多该命令的详细用法请查询上游 OVS 的官方文档 ovs-vsctl(8)

# kubectl ko vsctl kube-ovn-01 show
0d4c4675-c9cc-440a-8c1a-878e17f81b88
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port a2c1a8a8b83a_h
            Interface a2c1a8a8b83a_h
        Port "4fa5c4cbb1a5_h"
            Interface "4fa5c4cbb1a5_h"
        Port ovn-eef07d-0
            Interface ovn-eef07d-0
                type: stt
                options: {csum="true", key=flow, remote_ip="192.168.137.178"}
        Port ovn0
            Interface ovn0
                type: internal
        Port mirror0
            Interface mirror0
                type: internal
        Port ovn-efa253-0
            Interface ovn-efa253-0
                type: stt
                options: {csum="true", key=flow, remote_ip="192.168.137.177"}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.17.2"

ofctl {nodeName} [options ...]

该命令会进入对应 nodeName 上的 ovs-ovn 容器,并执行相应的 ovs-ofctl 命令,查询或管理 OpenFlow。 更多该命令的详细用法请查询上游 OVS 的官方文档 ovs-ofctl(8)

# kubectl ko ofctl kube-ovn-01 dump-flows br-int
NXST_FLOW reply (xid=0x4): flags=[more]
 cookie=0xcf3429e6, duration=671791.432s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=100,in_port=2 actions=load:0x4->NXM_NX_REG13[],load:0x9->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],load:0x4->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)
 cookie=0xc91413c6, duration=671791.431s, table=0, n_packets=907489, n_bytes=99978275, idle_age=0, hard_age=65534, priority=100,in_port=7 actions=load:0x1->NXM_NX_REG13[],load:0x9->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],load:0x4->OXM_OF_METADATA[],load:0x4->NXM_NX_REG14[],resubmit(,8)
 cookie=0xf180459, duration=671791.431s, table=0, n_packets=17348582, n_bytes=2667811214, idle_age=0, hard_age=65534, priority=100,in_port=6317 actions=load:0xa->NXM_NX_REG13[],load:0x9->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],load:0x4->OXM_OF_METADATA[],load:0x9->NXM_NX_REG14[],resubmit(,8)
 cookie=0x7806dd90, duration=671791.431s, table=0, n_packets=3235428, n_bytes=833821312, idle_age=0, hard_age=65534, priority=100,in_port=1 actions=load:0xd->NXM_NX_REG13[],load:0x9->NXM_NX_REG11[],load:0xb->NXM_NX_REG12[],load:0x4->OXM_OF_METADATA[],load:0x3->NXM_NX_REG14[],resubmit(,8)
...

dpctl {nodeName} [options ...]

该命令会进入对应 nodeName 上的 ovs-ovn 容器,并执行相应的 ovs-dpctl 命令,查询或管理 OVS datapath。 更多该命令的详细用法请查询上游 OVS 的官方文档 ovs-dpctl(8)

# kubectl ko dpctl kube-ovn-01 show
system@ovs-system:
  lookups: hit:350805055 missed:21983648 lost:73
  flows: 105
  masks: hit:1970748791 total:22 hit/pkt:5.29
  port 0: ovs-system (internal)
  port 1: ovn0 (internal)
  port 2: mirror0 (internal)
  port 3: br-int (internal)
  port 4: stt_sys_7471 (stt: packet_type=ptap)
  port 5: eeb4d9e51b5d_h
  port 6: a2c1a8a8b83a_h
  port 7: 4fa5c4cbb1a5_h

appctl {nodeName} [options ...]

该命令会进入对应 nodeName 上的 ovs-ovn 容器,并执行相应的 ovs-appctl 命令,来操作相关 daemon 进程。 更多该命令的详细用法请查询上游 OVS 的官方文档 ovs-appctl(8)

# kubectl ko appctl kube-ovn-01 vlog/list
                 console    syslog    file
                 -------    ------    ------
backtrace          OFF        ERR       INFO
bfd                OFF        ERR       INFO
bond               OFF        ERR       INFO
bridge             OFF        ERR       INFO
bundle             OFF        ERR       INFO
bundles            OFF        ERR       INFO
...

tcpdump {namespace/podname} [tcpdump options ...]

该命令会进入 namespace/podname 所在机器的 kube-ovn-cni 容器,并执行 tcpdump 抓取对应容器 veth 网卡 端的流量,可以方便排查网络相关问题,如下所示:

# kubectl ko tcpdump default/ds1-l6n7p icmp
+ kubectl exec -it kube-ovn-cni-wlg4s -n kube-ovn -- tcpdump -nn -i d7176fe7b4e0_h icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on d7176fe7b4e0_h, link-type EN10MB (Ethernet), capture size 262144 bytes
06:52:36.619688 IP 100.64.0.3 > 10.16.0.4: ICMP echo request, id 2, seq 1, length 64
06:52:36.619746 IP 10.16.0.4 > 100.64.0.3: ICMP echo reply, id 2, seq 1, length 64
06:52:37.619588 IP 100.64.0.3 > 10.16.0.4: ICMP echo request, id 2, seq 2, length 64
06:52:37.619630 IP 10.16.0.4 > 100.64.0.3: ICMP echo reply, id 2, seq 2, length 64
06:52:38.619933 IP 100.64.0.3 > 10.16.0.4: ICMP echo request, id 2, seq 3, length 64
06:52:38.619973 IP 10.16.0.4 > 100.64.0.3: ICMP echo reply, id 2, seq 3, length 64

trace {namespace/podname} {target ip address} [target mac address] {icmp|tcp|udp} [target tcp or udp port]

该命令将会打印 Pod 通过特定协议访问某地址时对应的 OVN 逻辑流表和最终的 Openflow 流表, 方便开发或运维时定位流表相关问题。

# kubectl ko trace default/ds1-l6n7p 8.8.8.8 icmp
+ kubectl exec ovn-central-5bc494cb5-np9hm -n kube-ovn -- ovn-trace --ct=new ovn-default 'inport == "ds1-l6n7p.default" && ip.ttl == 64 && icmp && eth.src == 0a:00:00:10:00:05 && ip4.src == 10.16.0.4 && eth.dst == 00:00:00:B8:CA:43 && ip4.dst == 8.8.8.8'
# icmp,reg14=0xf,vlan_tci=0x0000,dl_src=0a:00:00:10:00:05,dl_dst=00:00:00:b8:ca:43,nw_src=10.16.0.4,nw_dst=8.8.8.8,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0

ingress(dp="ovn-default", inport="ds1-l6n7p.default")
-----------------------------------------------------
 0. ls_in_port_sec_l2 (ovn-northd.c:4143): inport == "ds1-l6n7p.default" && eth.src == {0a:00:00:10:00:05}, priority 50, uuid 39453393
    next;
 1. ls_in_port_sec_ip (ovn-northd.c:2898): inport == "ds1-l6n7p.default" && eth.src == 0a:00:00:10:00:05 && ip4.src == {10.16.0.4}, priority 90, uuid 81bcd485
    next;
 3. ls_in_pre_acl (ovn-northd.c:3269): ip, priority 100, uuid 7b4f4971
    reg0[0] = 1;
    next;
 5. ls_in_pre_stateful (ovn-northd.c:3396): reg0[0] == 1, priority 100, uuid 36cdd577
    ct_next;

ct_next(ct_state=new|trk)
-------------------------
 6. ls_in_acl (ovn-northd.c:3759): ip && (!ct.est || (ct.est && ct_label.blocked == 1)), priority 1, uuid 7608af5b
    reg0[1] = 1;
    next;
10. ls_in_stateful (ovn-northd.c:3995): reg0[1] == 1, priority 100, uuid 2aba1b90
    ct_commit(ct_label=0/0x1);
    next;
16. ls_in_l2_lkup (ovn-northd.c:4470): eth.dst == 00:00:00:b8:ca:43, priority 50, uuid 5c9c3c9f
    outport = "ovn-default-ovn-cluster";
    output;

...

若 trace 对象为运行于 Underlay 网络下的虚拟机,需要添加额外参数来指定目的 Mac 地址:

kubectl ko trace default/virt-handler-7lvml 8.8.8.8 82:7c:9f:83:8c:01 icmp

diagnose {all|node|subnet} [nodename|subnetName]

诊断集群网络组件状态,并去对应节点的 kube-ovn-pinger 检测当前节点到其他节点和关键服务的连通性和网络延迟:

# kubectl ko diagnose all
switch c7cd17e8-ceee-4a91-9bb3-e5a313fe1ece (snat)
    port snat-ovn-cluster
        type: router
        router-port: ovn-cluster-snat
switch 20e0c6d0-023a-4756-aec5-200e0c60f95d (join)
    port node-liumengxin-ovn3-192.168.137.178
        addresses: ["00:00:00:64:FF:A8 100.64.0.4"]
    port node-liumengxin-ovn1-192.168.137.176
        addresses: ["00:00:00:AF:98:62 100.64.0.2"]
    port join-ovn-cluster
        type: router
        router-port: ovn-cluster-join
switch 0191705c-f827-427b-9de3-3c3b7d971ba5 (central)
    port central-ovn-cluster
        type: router
        router-port: ovn-cluster-central
switch 2a45ff05-388d-4f85-9daf-e6fccd5833dc (ovn-default)
    port ovn-default-ovn-cluster
        type: router
        router-port: ovn-cluster-ovn-default
    port prometheus-k8s-1.monitoring
        addresses: ["00:00:00:AA:37:DF 10.16.0.23"]
router 212f73dd-d63d-4d72-864b-a537e9afbee1 (ovn-cluster)
    port ovn-cluster-snat
        mac: "00:00:00:7A:82:8F"
        networks: ["172.22.0.1/16"]
    port ovn-cluster-join
        mac: "00:00:00:F8:18:5A"
        networks: ["100.64.0.1/16"]
    port ovn-cluster-central
        mac: "00:00:00:4D:8C:F5"
        networks: ["192.101.0.1/16"]
    port ovn-cluster-ovn-default
        mac: "00:00:00:A3:F8:18"
        networks: ["10.16.0.1/16"]
Routing Policies
     31000                            ip4.dst == 10.16.0.0/16           allow
     31000                           ip4.dst == 100.64.0.0/16           allow
     30000                         ip4.dst == 192.168.137.177         reroute                100.64.0.3
     30000                         ip4.dst == 192.168.137.178         reroute                100.64.0.4
     29000                 ip4.src == $ovn.default.fake.6_ip4         reroute               100.64.0.22
     29000                 ip4.src == $ovn.default.fake.7_ip4         reroute               100.64.0.21
     29000                 ip4.src == $ovn.default.fake.8_ip4         reroute               100.64.0.23
     29000 ip4.src == $ovn.default.liumengxin.ovn3.192.168.137.178_ip4         reroute                100.64.0.4
     20000 ip4.src == $ovn.default.liumengxin.ovn1.192.168.137.176_ip4 && ip4.dst != $ovn.cluster.overlay.subnets.IPv4         reroute                100.64.0.2
     20000 ip4.src == $ovn.default.liumengxin.ovn2.192.168.137.177_ip4 && ip4.dst != $ovn.cluster.overlay.subnets.IPv4         reroute                100.64.0.3
     20000 ip4.src == $ovn.default.liumengxin.ovn3.192.168.137.178_ip4 && ip4.dst != $ovn.cluster.overlay.subnets.IPv4         reroute                100.64.0.4
IPv4 Routes
Route Table <main>:
                0.0.0.0/0                100.64.0.1 dst-ip
UUID                                    LB                  PROTO      VIP                     IPs
e9bcfd9d-793e-4431-9073-6dec96b75d71    cluster-tcp-load    tcp        10.100.209.132:10660    192.168.137.176:10660
                                                            tcp        10.101.239.192:6641     192.168.137.177:6641
                                                            tcp        10.101.240.101:3000     10.16.0.7:3000
                                                            tcp        10.103.184.186:6642     192.168.137.177:6642
35d2b7a5-e3a7-485a-a4b7-b4970eb0e63b    cluster-tcp-sess    tcp        10.100.158.128:8080     10.16.0.10:8080,10.16.0.5:8080,10.16.63.30:8080
                                                            tcp        10.107.26.215:8080      10.16.0.19:8080,10.16.0.20:8080,10.16.0.21:8080
                                                            tcp        10.107.26.215:9093      10.16.0.19:9093,10.16.0.20:9093,10.16.0.21:9093
                                                            tcp        10.98.187.99:8080       10.16.0.22:8080,10.16.0.23:8080
                                                            tcp        10.98.187.99:9090       10.16.0.22:9090,10.16.0.23:9090
f43303e4-89aa-4d3e-a3dc-278a552fe27b    cluster-udp-load    udp        10.96.0.10:53           10.16.0.4:53,10.16.0.9:53
_uuid               : 06776304-5a96-43ed-90c4-c4854c251699
addresses           : []
external_ids        : {vendor=kube-ovn}
name                : node_liumengxin_ovn2_192.168.137.177_underlay_v6

_uuid               : 62690625-87d5-491c-8675-9fd83b1f433c
addresses           : []
external_ids        : {vendor=kube-ovn}
name                : node_liumengxin_ovn1_192.168.137.176_underlay_v6

_uuid               : b03a9bae-94d5-4562-b34c-b5f6198e180b
addresses           : ["10.16.0.0/16", "100.64.0.0/16", "172.22.0.0/16", "192.101.0.0/16"]
external_ids        : {vendor=kube-ovn}
name                : ovn.cluster.overlay.subnets.IPv4

_uuid               : e1056f3a-24cc-4666-8a91-75ee6c3c2426
addresses           : []
external_ids        : {vendor=kube-ovn}
name                : ovn.cluster.overlay.subnets.IPv6

_uuid               : 3e5d5fff-e670-47b2-a2f5-a39f4698a8c5
addresses           : []
external_ids        : {vendor=kube-ovn}
name                : node_liumengxin_ovn3_192.168.137.178_underlay_v6
_uuid               : 2d85dbdc-d0db-4abe-b19e-cc806d32b492
action              : drop
direction           : from-lport
external_ids        : {}
label               : 0
log                 : false
match               : "inport==@ovn.sg.kubeovn_deny_all && ip"
meter               : []
name                : []
options             : {}
priority            : 2003
severity            : []

_uuid               : de790cc8-f155-405f-bb32-5a51f30c545f
action              : drop
direction           : to-lport
external_ids        : {}
label               : 0
log                 : false
match               : "outport==@ovn.sg.kubeovn_deny_all && ip"
meter               : []
name                : []
options             : {}
priority            : 2003
severity            : []
Chassis "e15ed4d4-1780-4d50-b09e-ea8372ed48b8"
    hostname: liumengxin-ovn1-192.168.137.176
    Encap stt
        ip: "192.168.137.176"
        options: {csum="true"}
    Port_Binding node-liumengxin-ovn1-192.168.137.176
    Port_Binding perf-6vxkn.default
    Port_Binding kube-state-metrics-5d6885d89-4nf8h.monitoring
    Port_Binding alertmanager-main-0.monitoring
    Port_Binding kube-ovn-pinger-6ftdf.kube-system
    Port_Binding fake-kubelet-67c55dfd89-pv86k.kube-system
    Port_Binding prometheus-k8s-0.monitoring
Chassis "eef07da1-f8ad-4775-b14d-bd6a3b4eb0d5"
    hostname: liumengxin-ovn3-192.168.137.178
    Encap stt
        ip: "192.168.137.178"
        options: {csum="true"}
    Port_Binding kube-ovn-pinger-7twb4.kube-system
    Port_Binding prometheus-adapter-86df476d87-rl88g.monitoring
    Port_Binding prometheus-k8s-1.monitoring
    Port_Binding node-liumengxin-ovn3-192.168.137.178
    Port_Binding perf-ff475.default
    Port_Binding alertmanager-main-1.monitoring
    Port_Binding blackbox-exporter-676d976865-tvsjd.monitoring
Chassis "efa253c9-494d-4719-83ae-b48ab0f11c03"
    hostname: liumengxin-ovn2-192.168.137.177
    Encap stt
        ip: "192.168.137.177"
        options: {csum="true"}
    Port_Binding grafana-6c4c6b8fb7-pzd2c.monitoring
    Port_Binding node-liumengxin-ovn2-192.168.137.177
    Port_Binding alertmanager-main-2.monitoring
    Port_Binding coredns-6789c94dd8-9jqsz.kube-system
    Port_Binding coredns-6789c94dd8-25d4r.kube-system
    Port_Binding prometheus-operator-7bbc99fc8b-wgjm4.monitoring
    Port_Binding prometheus-adapter-86df476d87-gdxmc.monitoring
    Port_Binding perf-fjnws.default
    Port_Binding kube-ovn-pinger-vh2xg.kube-system
ds kube-proxy ready
kube-proxy ready
deployment ovn-central ready
deployment kube-ovn-controller ready
ds kube-ovn-cni ready
ds ovs-ovn ready
deployment coredns ready
ovn-nb leader check ok
ovn-sb leader check ok
ovn-northd leader check ok
### kube-ovn-controller recent log

### start to diagnose node liumengxin-ovn1-192.168.137.176
#### ovn-controller log:
2022-06-03T00:56:44.897Z|16722|inc_proc_eng|INFO|User triggered force recompute.
2022-06-03T01:06:44.912Z|16723|inc_proc_eng|INFO|User triggered force recompute.
2022-06-03T01:16:44.925Z|16724|inc_proc_eng|INFO|User triggered force recompute.
2022-06-03T01:26:44.936Z|16725|inc_proc_eng|INFO|User triggered force recompute.
2022-06-03T01:36:44.959Z|16726|inc_proc_eng|INFO|User triggered force recompute.
2022-06-03T01:46:44.974Z|16727|inc_proc_eng|INFO|User triggered force recompute.
2022-06-03T01:56:44.988Z|16728|inc_proc_eng|INFO|User triggered force recompute.
2022-06-03T02:06:45.001Z|16729|inc_proc_eng|INFO|User triggered force recompute.
2022-06-03T02:16:45.025Z|16730|inc_proc_eng|INFO|User triggered force recompute.
2022-06-03T02:26:45.040Z|16731|inc_proc_eng|INFO|User triggered force recompute.

#### ovs-vswitchd log:
2022-06-02T23:03:00.137Z|00079|dpif(handler1)|WARN|system@ovs-system: execute ct(commit,zone=14,label=0/0x1,nat(src)),8 failed (Invalid argument) on packet icmp,vlan_tci=0x0000,dl_src=00:00:00:f8:07:c8,dl_dst=00:00:00:fa:1e:50,nw_src=10.16.0.5,nw_dst=10.16.0.10,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 icmp_csum:f9d1
 with metadata skb_priority(0),tunnel(tun_id=0x160017000004,src=192.168.137.177,dst=192.168.137.176,ttl=64,tp_src=38881,tp_dst=7471,flags(csum|key)),skb_mark(0),ct_state(0x21),ct_zone(0xe),ct_tuple4(src=10.16.0.5,dst=10.16.0.10,proto=1,tp_src=8,tp_dst=0),in_port(4) mtu 0
2022-06-02T23:23:31.840Z|00080|dpif(handler1)|WARN|system@ovs-system: execute ct(commit,zone=14,label=0/0x1,nat(src)),8 failed (Invalid argument) on packet icmp,vlan_tci=0x0000,dl_src=00:00:00:f8:07:c8,dl_dst=00:00:00:fa:1e:50,nw_src=10.16.0.5,nw_dst=10.16.0.10,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 icmp_csum:15b2
 with metadata skb_priority(0),tunnel(tun_id=0x160017000004,src=192.168.137.177,dst=192.168.137.176,ttl=64,tp_src=38881,tp_dst=7471,flags(csum|key)),skb_mark(0),ct_state(0x21),ct_zone(0xe),ct_tuple4(src=10.16.0.5,dst=10.16.0.10,proto=1,tp_src=8,tp_dst=0),in_port(4) mtu 0
2022-06-03T00:09:15.659Z|00081|dpif(handler1)|WARN|system@ovs-system: execute ct(commit,zone=14,label=0/0x1,nat(src)),8 failed (Invalid argument) on packet icmp,vlan_tci=0x0000,dl_src=00:00:00:dc:e3:63,dl_dst=00:00:00:fa:1e:50,nw_src=10.16.63.30,nw_dst=10.16.0.10,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 icmp_csum:e5a5
 with metadata skb_priority(0),tunnel(tun_id=0x150017000004,src=192.168.137.178,dst=192.168.137.176,ttl=64,tp_src=9239,tp_dst=7471,flags(csum|key)),skb_mark(0),ct_state(0x21),ct_zone(0xe),ct_tuple4(src=10.16.63.30,dst=10.16.0.10,proto=1,tp_src=8,tp_dst=0),in_port(4) mtu 0
2022-06-03T00:30:13.409Z|00064|dpif(handler2)|WARN|system@ovs-system: execute ct(commit,zone=14,label=0/0x1,nat(src)),8 failed (Invalid argument) on packet icmp,vlan_tci=0x0000,dl_src=00:00:00:f8:07:c8,dl_dst=00:00:00:fa:1e:50,nw_src=10.16.0.5,nw_dst=10.16.0.10,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 icmp_csum:6b4a
 with metadata skb_priority(0),tunnel(tun_id=0x160017000004,src=192.168.137.177,dst=192.168.137.176,ttl=64,tp_src=38881,tp_dst=7471,flags(csum|key)),skb_mark(0),ct_state(0x21),ct_zone(0xe),ct_tuple4(src=10.16.0.5,dst=10.16.0.10,proto=1,tp_src=8,tp_dst=0),in_port(4) mtu 0
2022-06-03T02:02:33.832Z|00082|dpif(handler1)|WARN|system@ovs-system: execute ct(commit,zone=14,label=0/0x1,nat(src)),8 failed (Invalid argument) on packet icmp,vlan_tci=0x0000,dl_src=00:00:00:f8:07:c8,dl_dst=00:00:00:fa:1e:50,nw_src=10.16.0.5,nw_dst=10.16.0.10,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0 icmp_csum:a819
 with metadata skb_priority(0),tunnel(tun_id=0x160017000004,src=192.168.137.177,dst=192.168.137.176,ttl=64,tp_src=38881,tp_dst=7471,flags(csum|key)),skb_mark(0),ct_state(0x21),ct_zone(0xe),ct_tuple4(src=10.16.0.5,dst=10.16.0.10,proto=1,tp_src=8,tp_dst=0),in_port(4) mtu 0

#### ovs-vsctl show results:
0d4c4675-c9cc-440a-8c1a-878e17f81b88
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port a2c1a8a8b83a_h
            Interface a2c1a8a8b83a_h
        Port "4fa5c4cbb1a5_h"
            Interface "4fa5c4cbb1a5_h"
        Port ovn-eef07d-0
            Interface ovn-eef07d-0
                type: stt
                options: {csum="true", key=flow, remote_ip="192.168.137.178"}
        Port ovn0
            Interface ovn0
                type: internal
        Port "04d03360e9a0_h"
            Interface "04d03360e9a0_h"
        Port eeb4d9e51b5d_h
            Interface eeb4d9e51b5d_h
        Port mirror0
            Interface mirror0
                type: internal
        Port "8e5d887ccd80_h"
            Interface "8e5d887ccd80_h"
        Port ovn-efa253-0
            Interface ovn-efa253-0
                type: stt
                options: {csum="true", key=flow, remote_ip="192.168.137.177"}
        Port "17512d5be1f1_h"
            Interface "17512d5be1f1_h"
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.17.2"

#### pinger diagnose results:
I0603 10:35:04.349404   17619 pinger.go:19]
-------------------------------------------------------------------------------
Kube-OVN:
  Version:       v1.11.14
  Build:         2022-04-24_08:02:50
  Commit:        git-73f9d15
  Go Version:    go1.17.8
  Arch:          amd64
-------------------------------------------------------------------------------
I0603 10:35:04.376797   17619 config.go:166] pinger config is &{KubeConfigFile: KubeClient:0xc000493380 Port:8080 DaemonSetNamespace:kube-system DaemonSetName:kube-ovn-pinger Interval:5 Mode:job ExitCode:0 InternalDNS:kubernetes.default ExternalDNS: NodeName:liumengxin-ovn1-192.168.137.176 HostIP:192.168.137.176 PodName:kube-ovn-pinger-6ftdf PodIP:10.16.0.10 PodProtocols:[IPv4] ExternalAddress: NetworkMode:kube-ovn PollTimeout:2 PollInterval:15 SystemRunDir:/var/run/openvswitch DatabaseVswitchName:Open_vSwitch DatabaseVswitchSocketRemote:unix:/var/run/openvswitch/db.sock DatabaseVswitchFileDataPath:/etc/openvswitch/conf.db DatabaseVswitchFileLogPath:/var/log/openvswitch/ovsdb-server.log DatabaseVswitchFilePidPath:/var/run/openvswitch/ovsdb-server.pid DatabaseVswitchFileSystemIDPath:/etc/openvswitch/system-id.conf ServiceVswitchdFileLogPath:/var/log/openvswitch/ovs-vswitchd.log ServiceVswitchdFilePidPath:/var/run/openvswitch/ovs-vswitchd.pid ServiceOvnControllerFileLogPath:/var/log/ovn/ovn-controller.log ServiceOvnControllerFilePidPath:/var/run/ovn/ovn-controller.pid}
I0603 10:35:04.449166   17619 exporter.go:75] liumengxin-ovn1-192.168.137.176: exporter connect successfully
I0603 10:35:04.554011   17619 ovn.go:21] ovs-vswitchd and ovsdb are up
I0603 10:35:04.651293   17619 ovn.go:33] ovn_controller is up
I0603 10:35:04.651342   17619 ovn.go:39] start to check port binding
I0603 10:35:04.749613   17619 ovn.go:135] chassis id is 1d7f3d6c-eec5-4b3c-adca-2969d9cdfd80
I0603 10:35:04.763487   17619 ovn.go:49] port in sb is [node-liumengxin-ovn1-192.168.137.176 perf-6vxkn.default kube-state-metrics-5d6885d89-4nf8h.monitoring alertmanager-main-0.monitoring kube-ovn-pinger-6ftdf.kube-system fake-kubelet-67c55dfd89-pv86k.kube-system prometheus-k8s-0.monitoring]
I0603 10:35:04.763583   17619 ovn.go:61] ovs and ovn-sb binding check passed
I0603 10:35:05.049309   17619 ping.go:259] start to check apiserver connectivity
I0603 10:35:05.053666   17619 ping.go:268] connect to apiserver success in 4.27ms
I0603 10:35:05.053786   17619 ping.go:129] start to check pod connectivity
I0603 10:35:05.249590   17619 ping.go:159] ping pod: kube-ovn-pinger-6ftdf 10.16.0.10, count: 3, loss count 0, average rtt 16.30ms
I0603 10:35:05.354135   17619 ping.go:159] ping pod: kube-ovn-pinger-7twb4 10.16.63.30, count: 3, loss count 0, average rtt 1.81ms
I0603 10:35:05.458460   17619 ping.go:159] ping pod: kube-ovn-pinger-vh2xg 10.16.0.5, count: 3, loss count 0, average rtt 1.92ms
I0603 10:35:05.458523   17619 ping.go:83] start to check node connectivity

如果 diagnose 的目标指定为 subnet 该脚本会在 subnet 上建立 daemonset,由 kube-ovn-pinger 去探测这个 daemonset 的所有 pod 的连通性和网络延时,测试完后自动销毁该 daemonset。

tuning {install-fastpath|local-install-fastpath|remove-fastpath|install-stt|local-install-stt|remove-stt} {centos7|centos8}} [kernel-devel-version]

该命令执行性能调优相关操作,具体使用请参考性能调优

reload

该命令重启所有 Kube-OVN 相关组件:

# kubectl ko reload
pod "ovn-central-8684dd94bd-vzgcr" deleted
Waiting for deployment "ovn-central" rollout to finish: 0 of 1 updated replicas are available...
deployment "ovn-central" successfully rolled out
pod "ovs-ovn-bsnvz" deleted
pod "ovs-ovn-m9b98" deleted
pod "kube-ovn-controller-8459db5ff4-64c62" deleted
Waiting for deployment "kube-ovn-controller" rollout to finish: 0 of 1 updated replicas are available...
deployment "kube-ovn-controller" successfully rolled out
pod "kube-ovn-cni-2klnh" deleted
pod "kube-ovn-cni-t2jz4" deleted
Waiting for daemon set "kube-ovn-cni" rollout to finish: 0 of 2 updated pods are available...
Waiting for daemon set "kube-ovn-cni" rollout to finish: 1 of 2 updated pods are available...
daemon set "kube-ovn-cni" successfully rolled out
pod "kube-ovn-pinger-ln72z" deleted
pod "kube-ovn-pinger-w8lrk" deleted
Waiting for daemon set "kube-ovn-pinger" rollout to finish: 0 of 2 updated pods are available...
Waiting for daemon set "kube-ovn-pinger" rollout to finish: 1 of 2 updated pods are available...
daemon set "kube-ovn-pinger" successfully rolled out
pod "kube-ovn-monitor-7fb67d5488-7q6zb" deleted
Waiting for deployment "kube-ovn-monitor" rollout to finish: 0 of 1 updated replicas are available...
deployment "kube-ovn-monitor" successfully rolled out

log

使用该命令会抓取 kube-ovn 所有节点上的 Kube-OVN,OVN,Openvswitch 的 log 以及 linux 常用的一些 debug 信息。

# kubectl ko log all
Collecting kube-ovn logging files
Collecting ovn logging files
Collecting openvswitch logging files
Collecting linux dmesg files
Collecting linux iptables-legacy files
Collecting linux iptables-nft files
Collecting linux route files
Collecting linux link files
Collecting linux neigh files
Collecting linux memory files
Collecting linux top files
Collecting linux sysctl files
Collecting linux netstat files
Collecting linux addr files
Collecting linux ipset files
Collecting linux tcp files
Collected files have been saved in the directory /root/kubectl-ko-log

目录如下:

# tree kubectl-ko-log/
kubectl-ko-log/
|-- kube-ovn-control-plane
|   |-- kube-ovn
|   |   |-- kube-ovn-cni.log
|   |   |-- kube-ovn-monitor.log
|   |   `-- kube-ovn-pinger.log
|   |-- linux
|   |   |-- addr.log
|   |   |-- dmesg.log
|   |   |-- ipset.log
|   |   |-- iptables-legacy.log
|   |   |-- iptables-nft.log
|   |   |-- link.log
|   |   |-- memory.log
|   |   |-- neigh.log
|   |   |-- netstat.log
|   |   |-- route.log
|   |   |-- sysctl.log
|   |   |-- tcp.log
|   |   `-- top.log
|   |-- openvswitch
|   |   |-- ovs-vswitchd.log
|   |   `-- ovsdb-server.log
|   `-- ovn
|       |-- ovn-controller.log
|       |-- ovn-northd.log
|       |-- ovsdb-server-nb.log
|       `-- ovsdb-server-sb.log

perf [image]

该命令会去测试 Kube-OVN 的一些性能指标如下:

  1. 容器网络的性能指标;
  2. Hostnetwork 网络性能指标;
  3. 容器网络组播报文性能指标;
  4. OVN-NB, OVN-SB, OVN-Northd leader 删除恢复所需时间。

参数 image 用于指定性能测试 pod 所用的镜像,默认情况下是 kubeovn/test:v1.12.0, 设置该参数主要是为了离线场景,将镜像拉到内网环境可能会有镜像名变化。

# kubectl ko perf
pod/test-client created
pod/test-host-client created
pod/test-server created
pod/test-host-server created
pod/test-client condition met
pod/test-host-client condition met
pod/test-host-server condition met
pod/test-server condition met
Start doing pod network performance
=================================== unicast performance test =============================================================
Size            TCP Latency     TCP Bandwidth   UDP Latency     UDP Lost Rate   UDP Bandwidth
64              93.7 us         93.3 Mbits/sec  74.5 us         (0%)            8.28 Mbits/sec
128             83 us           155 Mbits/sec   69.8 us         (0%)            16.5 Mbits/sec
512             85 us           400 Mbits/sec   70.7 us         (0%)            62.7 Mbits/sec
1k              83.5 us         622 Mbits/sec   71.1 us         (0%)            130 Mbits/sec
4k              137 us          979 Mbits/sec   71.9 us         (0%)            508 Mbits/sec
=========================================================================================================================
Start doing host network performance
=================================== unicast performance test =============================================================
Size            TCP Latency     TCP Bandwidth   UDP Latency     UDP Lost Rate   UDP Bandwidth
64              51.1 us         114 Mbits/sec   40.5 us         (0%)            18.6 Mbits/sec
128             50.7 us         207 Mbits/sec   38.5 us         (0%)            36.5 Mbits/sec
512             50.5 us         579 Mbits/sec   38.5 us         (0%)            146 Mbits/sec
1k              50.4 us         926 Mbits/sec   38.4 us         (0%)            295 Mbits/sec
4k              74.4 us         1.96 Gbits/sec  39.2 us         (0%)            1.18 Gbits/sec
=========================================================================================================================
Start doing pod multicast network performance
=================================== multicast performance test =========================================================
Size            UDP Latency     UDP Lost Rate   UDP Bandwidth
64              0.015 ms        (0%)            5.73 Mbits/sec
128             0.012 ms        (0%)            11.1 Mbits/sec
512             0.018 ms        (0%)            44.1 Mbits/sec
1k              0.022 ms        (0.058%)        85.0 Mbits/sec
4k              0.017 ms        (1%)            130 Mbits/sec
=========================================================================================================================
Start doing leader recovery time test
Delete ovn central nb pod
pod "ovn-central-7bb79c5c57-m82vv" deleted
Waiting for ovn central nb pod running
================================  OVN nb recover takes 4.107957572 s ==================================
Delete ovn central sb pod
pod "ovn-central-7bb79c5c57-97474" deleted
Waiting for ovn central sb pod running
================================  OVN sb recover takes 3.713956419 s ==================================
Delete ovn central northd pod
pod "ovn-central-7bb79c5c57-59gh4" deleted
Waiting for ovn central northd pod running
================================  OVN northd recover takes 3.701646071 s ==================================
pod "test-client" deleted
pod "test-host-client" deleted
pod "test-host-server" deleted
pod "test-server" deleted

微信群 Slack Twitter Support


最后更新: 2023年6月18日
创建日期: 2022年5月24日

评论

回到页面顶部