Bug 2028493
Field | Value
---|---
Summary | OVN-migration failed - ovnkube-node: error waiting for node readiness: timed out waiting for the condition
Product | OpenShift Container Platform
Reporter | Yurii Prokulevych <yprokule>
Component | Networking
Assignee | Tim Rozet <trozet>
Networking sub component | ovn-kubernetes
QA Contact | Ross Brattain <rbrattai>
Status | CLOSED ERRATA
Severity | high
Priority | high
CC | achernet, anusaxen, cgoncalves, ctrautma, dceara, eglottma, fpaoline, jcaamano, jiji, tonyg, trozet
Version | 4.10
Target Release | 4.11.0
Hardware | Unspecified
OS | Unspecified
Doc Type | If docs needed, set a value
Last Closed | 2022-08-10 10:40:31 UTC
Type | Bug
Bug Blocks | 2077129
Description
Yurii Prokulevych
2021-12-02 13:56:16 UTC
I think there's something odd in ovs. Here are my findings:

Ovnkube node complains about the missing patchport for br-ex:

    I1202 15:04:22.319087 434538 ovs.go:206] exec(2095): /usr/bin/ovs-vsctl --timeout=15 --if-exists get interface patch-br-ex_openshift-master-1-to-br-int ofport
    I1202 15:04:22.322518 434538 ovs.go:209] exec(2095): stdout: ""
    I1202 15:04:22.322524 434538 ovs.go:210] exec(2095): stderr: ""

There's a check that verifies the result being different from "":
https://github.com/openshift/ovn-kubernetes/blob/29b2532df414f4af3519a9da205146380a8dac55/go-controller/pkg/node/gateway.go#L258

The patchport is not there:

    [root@openshift-master-1 ~]# /usr/bin/ovs-vsctl list-ports br-ex1
    bond0.113
    patch-br-ex1_openshift-master-1-to-br-int
    [root@openshift-master-1 ~]# /usr/bin/ovs-vsctl list-ports br-ex
    bond0

The physnet option is set on the port:

    _uuid               : 9201aac4-c512-42dd-9e46-4746c34296e1
    addresses           : [unknown]
    dhcpv4_options      : []
    dhcpv6_options      : []
    dynamic_addresses   : []
    enabled             : []
    external_ids        : {}
    ha_chassis_group    : []
    name                : br-ex_openshift-master-1
    options             : {network_name=physnet}
    parent_name         : []
    port_security       : []
    tag                 : []
    tag_request         : 0
    type                : localnet
    up                  : false

The mappings are in place:

    [root@openshift-master-1 ~]# /usr/bin/ovs-vsctl list Open_vSwitch .
    _uuid               : 0a6ea54b-e583-42eb-8d63-a9df6ab56eb0
    bridges             : [0beb018b-4939-4a26-964d-530f2e433570, 45c2d5cd-3945-4250-9297-7f51a50e91dd, 7ac2fb5e-dd11-4d71-bbc7-a79b3e28b919, b3cc26fc-e209-486d-a959-1de6d33575f7]
    cur_cfg             : 266
    datapath_types      : [netdev, system]
    datapaths           : {system=b272b3d3-2e73-4ca8-9257-b446c03cf558}
    db_version          : "8.2.0"
    dpdk_initialized    : false
    dpdk_version        : "DPDK 20.11.1"
    external_ids        : {hostname=openshift-master-1, ovn-bridge-mappings="physnet:br-ex,exgwphysnet:br-ex1", ovn-enable-lflow-cache="true", ovn-encap-ip="10.1.208.21", ovn-encap-type=geneve, ovn-limit-lflow-cache-kb="1048576", ovn-monitor-all="true", ovn-openflow-probe-interval="180", ovn-remote="ssl:10.1.208.20:9642,ssl:10.1.208.21:9642,ssl:10.1.208.22:9642", ovn-remote-probe-interval="180000", rundir="/var/run/openvswitch", system-id="0d2b1da5-a235-463a-8055-c71272cacbf6"}
    iface_types         : [bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
    manager_options     : []
    next_cfg            : 266
    other_config        : {vlan-limit="0"}
    ovs_version         : "2.15.2"
    ssl                 : e37ab694-b1d6-4367-992b-e9e0b8f714b5
    statistics          : {}
    system_type         : rhcos
    system_version      : "4.10"

This should trigger some magic in ovs to create the patchport, afaik.
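The comparison made by hand above (bridge mappings on one side, the ports actually present on each bridge on the other) can be sketched in a few lines of Python. This is only an illustration, not how ovn-controller or ovn-kubernetes does it: the helper names are hypothetical, and the `patch-<port name>-to-br-int` naming pattern is inferred from the port names quoted in this report rather than taken from OVN documentation. The input values are copied from the command output in this bug.

```python
def parse_bridge_mappings(value):
    """Parse 'physnet:br-ex,exgwphysnet:br-ex1' into {network_name: bridge}."""
    mappings = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        network, _, bridge = item.partition(":")
        mappings[network] = bridge
    return mappings


def expected_patch_ports(mappings, localnet_ports):
    """Return {bridge: patch_port_name} for each localnet port with a mapping.

    localnet_ports maps logical switch port name -> network_name.
    The naming pattern is an assumption inferred from this report.
    """
    expected = {}
    for lsp, network in localnet_ports.items():
        bridge = mappings.get(network)
        if bridge is not None:
            expected[bridge] = f"patch-{lsp}-to-br-int"
    return expected


def missing_patch_ports(expected, ports_by_bridge):
    """List (bridge, patch_port) pairs absent from `ovs-vsctl list-ports`."""
    return [
        (bridge, port)
        for bridge, port in sorted(expected.items())
        if port not in ports_by_bridge.get(bridge, [])
    ]


# Values copied from the output quoted in this report.
mappings = parse_bridge_mappings("physnet:br-ex,exgwphysnet:br-ex1")
localnet_ports = {
    "br-ex_openshift-master-1": "physnet",
    "br-ex1_openshift-master-1": "exgwphysnet",
}
ports_by_bridge = {
    "br-ex": ["bond0"],
    "br-ex1": ["bond0.113", "patch-br-ex1_openshift-master-1-to-br-int"],
}

missing = missing_patch_ports(
    expected_patch_ports(mappings, localnet_ports), ports_by_bridge
)
print(missing)  # [('br-ex', 'patch-br-ex_openshift-master-1-to-br-int')]
```

With the data from this node, only the br-ex patch port is reported missing, which matches the empty `ofport` result in the ovnkube-node log.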
OVN SB chassis:

    _uuid               : 6143a1d8-e278-41af-a60e-d0d2f697906b
    encaps              : [9a7af96b-db0a-4fe1-9c23-4e70767b259d]
    external_ids        : {datapath-type=system, iface-types="bareudp,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", ovn-bridge-mappings="physnet:br-ex,exgwphysnet:br-ex1", ovn-chassis-mac-mappings="", ovn-cms-options="", ovn-enable-lflow-cache="true", ovn-limit-lflow-cache="", ovn-memlimit-lflow-cache-kb="", ovn-monitor-all="true", ovn-trim-limit-lflow-cache="", ovn-trim-wmark-perc-lflow-cache="", port-up-notif="true"}
    hostname            : openshift-master-1
    name                : "0d2b1da5-a235-463a-8055-c71272cacbf6"
    nb_cfg              : 0
    other_config        : {datapath-type=system, iface-types="bareudp,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", ovn-bridge-mappings="physnet:br-ex,exgwphysnet:br-ex1", ovn-chassis-mac-mappings="", ovn-cms-options="", ovn-enable-lflow-cache="true", ovn-limit-lflow-cache="", ovn-memlimit-lflow-cache-kb="", ovn-monitor-all="true", ovn-trim-limit-lflow-cache="", ovn-trim-wmark-perc-lflow-cache="", port-up-notif="true"}
    transport_zones     : []
    vtep_logical_switches: []

NB switch ports:

    [root@fedora ~]# ovn-nbctl list logical_switch_port br-ex1_openshift-master-1
    _uuid               : 12ff7c78-5607-4f26-85a1-be2ddf3b4ee8
    addresses           : [unknown]
    dhcpv4_options      : []
    dhcpv6_options      : []
    dynamic_addresses   : []
    enabled             : []
    external_ids        : {}
    ha_chassis_group    : []
    name                : br-ex1_openshift-master-1
    options             : {network_name=exgwphysnet}
    parent_name         : []
    port_security       : []
    tag                 : []
    tag_request         : []
    type                : localnet
    up                  : false

    [root@fedora ~]# ovn-nbctl list logical_switch_port 9201aac4-c512-42dd-9e46-4746c34296e1
    _uuid               : 9201aac4-c512-42dd-9e46-4746c34296e1
    addresses           : [unknown]
    dhcpv4_options      : []
    dhcpv6_options      : []
    dynamic_addresses   : []
    enabled             : []
    external_ids        : {}
    ha_chassis_group    : []
    name                : br-ex_openshift-master-1
    options             : {network_name=physnet}
    parent_name         : []
    port_security       : []
    tag                 : []
    tag_request         : 0
    type                : localnet
    up                  : false

Per offline discussion, this is a bug on the ovn-kubernetes side, incorrectly assigning the same LSP to multiple logical switches.

    [root@fedora ~]# ovn-nbctl show exgw-ext_openshift-master-1
    switch 4825e419-f614-4ff2-b1e7-3dd506fa386a (exgw-ext_openshift-master-1)
        port br-ex1_openshift-master-1
            type: localnet
            addresses: ["unknown"]
        port etor-GR_openshift-master-1
            type: router
            addresses: ["0c:42:a1:ee:7f:12"]
            router-port: rtoe-GR_openshift-master-1

    [root@fedora ~]# ovn-nbctl show ext_openshift-master-1
    switch f361acea-b980-416b-b518-5fe24c6f3413 (ext_openshift-master-1)
        port etor-GR_openshift-master-1
            type: router
            addresses: ["0c:42:a1:ee:7f:12"]
            router-port: rtoe-GR_openshift-master-1
        port br-ex_openshift-master-1
            type: localnet
            addresses: ["unknown"]

Same router port on multiple switches, this is a regression due to libovsdb changes.

Verified on 4.11.0-0.nightly-2022-05-20-213928 BM IPI dual-stack

I see different routers on different switches.
exgw-rtoe-GR_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com
rtoe-GR_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com

    sh-4.4# for f in $(ovn-nbctl show | awk '/ext_/ {gsub("[()]", "", $3) ; print $3}' | sort ) ; do ovn-nbctl show $f ; done
    switch 131438be-352a-46f5-a4ce-c3e00ef26ed6 (exgw-ext_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com)
        port br-ex1_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: localnet
            addresses: ["unknown"]
        port exgw-etor-GR_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: router
            addresses: ["52:54:00:12:c1:38"]
            router-port: exgw-rtoe-GR_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com
    switch 904e96f7-9362-4737-a490-25b8997c647f (exgw-ext_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com)
        port exgw-etor-GR_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: router
            addresses: ["52:54:00:64:68:ed"]
            router-port: exgw-rtoe-GR_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com
        port br-ex1_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: localnet
            addresses: ["unknown"]
    switch ccc43ce6-2a4a-49d8-8fa1-37fee2c32a2d (exgw-ext_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com)
        port exgw-etor-GR_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: router
            addresses: ["52:54:00:3a:b8:e2"]
            router-port: exgw-rtoe-GR_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com
        port br-ex1_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: localnet
            addresses: ["unknown"]
    switch 4d4789d1-c0b5-44c9-aa2e-6e4c90dee061 (ext_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com)
        port etor-GR_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: router
            addresses: ["52:54:00:12:c1:38"]
            router-port: rtoe-GR_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com
        port br-ex_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: localnet
            addresses: ["unknown"]
    switch 68fa87b5-0a8b-4e4a-83d1-3aa1ac3c7be7 (ext_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com)
        port br-ex_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: localnet
            addresses: ["unknown"]
        port etor-GR_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: router
            addresses: ["52:54:00:64:68:ed"]
            router-port: rtoe-GR_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com
    switch d2b74a48-bdbf-4ba6-b632-4d6222cdd6ab (ext_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com)
        port etor-GR_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: router
            addresses: ["52:54:00:3a:b8:e2"]
            router-port: rtoe-GR_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com
        port br-ex_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com
            type: localnet
            addresses: ["unknown"]

    sh-4.4# for f in $(ovn-nbctl show | awk '/ext_/ {gsub("[()]", "", $3) ; print $3}' | sort ) ; do echo -e "\n$f" ; ovn-nbctl lsp-list $f ; echo ; done

    exgw-ext_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com
    42b0d986-d7c8-4b73-8af5-800b747172ea (br-ex1_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com)
    ebea449e-a3d9-4780-88a9-f6830488c22b (exgw-etor-GR_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com)

    exgw-ext_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com
    94d6b35e-ce13-4874-b7e9-1e94f0543b25 (br-ex1_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com)
    7cfd0622-06f6-4d57-982a-460e6e861175 (exgw-etor-GR_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com)

    exgw-ext_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com
    aa1368a6-3267-4620-9dca-16d71bbcd594 (br-ex1_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com)
    7c0d1f38-9b14-4e49-8afc-46da64e9385d (exgw-etor-GR_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com)

    ext_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com
    8f0e1d40-d1fe-408c-8dce-ac71532cb7b7 (br-ex_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com)
    31e0c30a-bdb5-4ba4-ab7e-cfabb7e83bd5 (etor-GR_master-0-0.rbrattai-o411e1db-0.qe.lab.redhat.com)

    ext_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com
    103d0e52-f3a7-4f41-a843-98e4db199d2a (br-ex_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com)
    62e3f17a-e051-46b4-a723-67d7605f3bae (etor-GR_master-0-1.rbrattai-o411e1db-0.qe.lab.redhat.com)

    ext_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com
    8c6441c6-b8a9-452f-aabf-448d65d78001 (br-ex_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com)
    887e2721-5399-49b7-a704-37e151e863ab (etor-GR_master-0-2.rbrattai-o411e1db-0.qe.lab.redhat.com)

I don't have F5 so can't test that.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
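When checking a cluster for the condition diagnosed in this bug (the same logical switch port listed under more than one logical switch), the eyeball comparison of `ovn-nbctl show` output can be automated. The following is a minimal sketch, assuming the `switch <uuid> (<name>)` / `port <name>` layout seen in the output quoted in this report; the function name is hypothetical, and the sample text is the broken state from openshift-master-1 with indentation reconstructed.

```python
import re
from collections import defaultdict

# Matches lines such as: switch 4825e419-... (exgw-ext_openshift-master-1)
SWITCH_RE = re.compile(r"^switch\s+\S+\s+\((?P<name>[^)]+)\)")


def ports_on_multiple_switches(show_output):
    """Return {port_name: [switch, ...]} for ports listed under >1 switch."""
    switches_by_port = defaultdict(list)
    current_switch = None
    for raw in show_output.splitlines():
        line = raw.strip()
        match = SWITCH_RE.match(line)
        if match:
            current_switch = match.group("name")
        elif line.startswith("router "):
            # Ports that follow belong to a logical router, not a switch.
            current_switch = None
        elif line.startswith("port ") and current_switch is not None:
            switches_by_port[line.split()[1]].append(current_switch)
    return {p: s for p, s in switches_by_port.items() if len(s) > 1}


# Broken state copied from this report (openshift-master-1).
BROKEN = """\
switch 4825e419-f614-4ff2-b1e7-3dd506fa386a (exgw-ext_openshift-master-1)
    port br-ex1_openshift-master-1
        type: localnet
        addresses: ["unknown"]
    port etor-GR_openshift-master-1
        type: router
        addresses: ["0c:42:a1:ee:7f:12"]
        router-port: rtoe-GR_openshift-master-1
switch f361acea-b980-416b-b518-5fe24c6f3413 (ext_openshift-master-1)
    port etor-GR_openshift-master-1
        type: router
        addresses: ["0c:42:a1:ee:7f:12"]
        router-port: rtoe-GR_openshift-master-1
    port br-ex_openshift-master-1
        type: localnet
        addresses: ["unknown"]
"""

duplicates = ports_on_multiple_switches(BROKEN)
print(duplicates)
```

On the broken sample this reports `etor-GR_openshift-master-1` under both `exgw-ext_openshift-master-1` and `ext_openshift-master-1`. On a fixed cluster, where the exgw switch carries its own `exgw-etor-GR_...` port, the result should be empty.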