Bug 2048352 - [OVN] node does not recover after NetworkManager restart, NotReady and unreachable
Summary: [OVN] node does not recover after NetworkManager restart, NotReady and unreac...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.11.0
Assignee: Andreas Karis
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-01-31 04:05 UTC by Ross Brattain
Modified: 2022-08-10 10:46 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Release Note
Doc Text:
Known issue with OVNKubernetes and NetworkManager restart: If the NetworkManager service ever restarts the node will lose network connectivity and must be recovered.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:45:53 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2941 0 None open Bug 2048352: ovn-kubernetes: Fixed ofport_request for physical prots 2022-02-02 19:25:30 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:46:22 UTC

Description Ross Brattain 2022-01-31 04:05:53 UTC
Description of problem:

After link flap or NetworkManager restart node goes NotReady, is unreachable on the network and does not recover without reboot.


Version-Release number of selected component (if applicable):

4.10.0-0.nightly-2022-01-30-073053


How reproducible:

Always

Steps to Reproduce:
1. oc debug node/rbrattai-o410i77-9zd98-master-0 --  chroot /host systemctl restart NetworkManager
OR
2. oc debug node/rbrattai-o410i77-9zd98-master-0 -- bash -c 'ip l s ens5 down; sleep 5 ; ip l s ens5 up'


Actual results:

Node goes NotReady state and stays in that state until reboot.  Node is unreachable on the network.

several NetworkManager error messages

Jan 30 16:53:43.742653 ip-10-0-154-177 NetworkManager[240416] <info>  [1643561623.7426] dhcp4 (br-ex): state changed unknown -> timeout
Jan 30 16:53:43.742705 ip-10-0-154-177 NetworkManager[240416] <info>  [1643561623.7426] device (br-ex): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Jan 30 16:53:43.743121 ip-10-0-154-177 NetworkManager[240416] <info>  [1643561623.7431] manager: NetworkManager state is now CONNECTED_LOCAL
Jan 30 16:53:43.743278 ip-10-0-154-177 NetworkManager[240416] <info>  [1643561623.7432] device (br-ex): releasing ovs interface br-ex
Jan 30 16:53:43.743693 ip-10-0-154-177 NetworkManager[240416] <info>  [1643561623.7436] device (br-ex): released from master device br-ex
Jan 30 16:53:43.744282 ip-10-0-154-177 NetworkManager[240416] <warn>  [1643561623.7442] device (br-ex): Activation: failed for connection 'ovs-if-br-ex'
Jan 30 16:53:43.744671 ip-10-0-154-177 NetworkManager[240416] <info>  [1643561623.7446] device (br-ex): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Jan 30 16:53:43.746528 ip-10-0-154-177 NetworkManager[240416] <info>  [1643561623.7465] dhcp4 (br-ex): canceled DHCP transaction
Jan 30 16:53:43.746540 ip-10-0-154-177 NetworkManager[240416] <info>  [1643561623.7465] dhcp4 (br-ex): state changed timeout -> done

OVN pods might have issues.

Jan 30 16:53:15.902150 ip-10-0-154-177 hyperkube[2095]: E0130 16:53:15.902134    2095 pod_workers.go:919] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"ovnkube-node\" with CrashLoopBackOff: \"back-off 10s restarting failed container=ovnkube-node pod=ovnkube-node-zlhsx_openshift-ovn-kubernetes(5cb9804c-3f6b-4362-ab23-4281bbdeee41)\"" pod="openshift-ovn-kubernetes/ovnkube-node-zlhsx" podUID=5cb9804c-3f6b-4362-ab23-4281bbdeee41


Expected results:

Network recovers, Node recovers, TCP connections recover.

Additional info:

Reproduced on AWS and IBM Cloud

Comment 2 Ross Brattain 2022-01-31 04:37:43 UTC
Both clusters were IPI.   systemctl NetworkManager restart reliably causes the issue.

ip link down and up doesn't seem to trigger the issue.

tcpdump was running on ens5 while NetworkManager restarted during the first occurrence on AWS, so ens5 was in promiscuous mode.

Comment 3 Dan Williams 2022-01-31 16:13:07 UTC
@ross can you set NM logging to 'trace' level and reproduce?

nmcli log level trace

Comment 5 Ross Brattain 2022-01-31 17:02:16 UTC
start of TRACE

Jan 31 16:42:28.532437 rbrattai-o410i11-kvds9-master-1 NetworkManager[1759] <info>  [1643647348.5324] manager: logging: level 'TRACE' 

Jan 31 16:44:01.248819 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <warn>  [1643647441.2487] dhcp4 (br-ex): request timed out
Jan 31 16:44:01.248847 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <info>  [1643647441.2488] dhcp4 (br-ex): state changed unknown -> timeout
Jan 31 16:44:01.248881 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2488] device[1af75820cc171c57] (br-ex): new DHCPv4 client state 3
Jan 31 16:44:01.248893 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2488] device[1af75820cc171c57] (br-ex): DHCPv4 failed (ip_state conf, was_active 0)
Jan 31 16:44:01.248908 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2489] device[1af75820cc171c57] (br-ex): activation-stage: schedule activate_stage4_ip_config_timeout_4,v4 (id 445)
Jan 31 16:44:01.248938 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2489] device[1af75820cc171c57] (br-ex): activation-stage: invoke activate_stage4_ip_config_timeout_4,v4 (id 445)
Jan 31 16:44:01.248958 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <info>  [1643647441.2489] device (br-ex): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Jan 31 16:44:01.248968 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2489] device[1af75820cc171c57] (br-ex): add_pending_action (2): 'in-state-change'
Jan 31 16:44:01.248980 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2489] device[1af75820cc171c57] (br-ex): stable-id: clear
Jan 31 16:44:01.249356 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2493] active-connection[0x564e83a835b0] set state deactivated (was activating)
Jan 31 16:44:01.249483 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <info>  [1643647441.2494] manager: NetworkManager state is now CONNECTED_LOCAL
Jan 31 16:44:01.249600 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2495] active-connection[0x564e83a835b0] check-master-ready: already signalled (state deactivated, master 0x564e83949dc0 is in state activated)
Jan 31 16:44:01.249614 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2496] device[1af75820cc171c57] (br-ex): remove_pending_action (1): 'activation-5'
Jan 31 16:44:01.249705 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2496] device[13509969e3080d4b] (br-ex): slave br-ex state change 70 (ip-config) -> 120 (failed)
Jan 31 16:44:01.249716 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2497] device[13509969e3080d4b] (br-ex): master: release one slave 0x564e83a5d780/br-ex (enslaved) (configure)
Jan 31 16:44:01.249751 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <info>  [1643647441.2497] device (br-ex): releasing ovs interface br-ex
Jan 31 16:44:01.249764 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2497] ovsdb: call[095be9f13b8df817] new: del-interface interface=br-ex
Jan 31 16:44:01.250162 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2501] ovsdb: call[095be9f13b8df817] send: call-id=5, {"id": 5, "method": "transact", "params": ["Open_vSwitch", {"op": "mutate", "table": "Open_vSwitch", "mutations": [["ne>
Jan 31 16:44:01.250441 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2504] device[13509969e3080d4b] (br-ex): mtu: commit-mtu...
Jan 31 16:44:01.250461 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2504] device[1af75820cc171c57] (br-ex): ifindex: set ifindex 0
Jan 31 16:44:01.251398 rbrattai-o410i11-kvds9-master-1 ovs-vswitchd[1239] ovs|01133|bridge|INFO|bridge br-ex: deleted interface br-ex on port 65534
Jan 31 16:44:01.251902 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <info>  [1643647441.2505] device (br-ex): released from master device br-ex
Jan 31 16:44:01.252004 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2519] active-connection[0x564e83949dc0] set state-flags is-master,is-slave,layer2-ready,lifetime-bound-to-profile-visibility (was is-master,is-slave,layer2-ready,master-has>
Jan 31 16:44:01.252137 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2521] device[1af75820cc171c57] (br-ex): unmanaged: flags set to [!sleeping,!by-type,!platform-init,!user-explicit,!user-settings=0x0/0x79/managed], forget [is-slave=0x1000]>
Jan 31 16:44:01.252153 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2521] device[13509969e3080d4b] (br-ex): last slave removed; remaining activated
Jan 31 16:44:01.252172 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2521] policy: connection 'ovs-if-br-ex' failed to autoconnect; 3 tries left
Jan 31 16:44:01.252185 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2521] settings-connection[ee1b3e5130e70cab,a87c5ca0-696e-48a9-80e9-069102fba7d6] autoconnect: retries set 3
Jan 31 16:44:01.252534 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2524] settings: update[a87c5ca0-696e-48a9-80e9-069102fba7d6] clear-secrets: update profile "ovs-if-br-ex" (not persisted)
Jan 31 16:44:01.252574 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2525] settings: storage[a87c5ca0-696e-48a9-80e9-069102fba7d6,a659895e1f528596/keyfile] change event with connection "ovs-if-br-ex" (file "/etc/NetworkManager/system-connect>
Jan 31 16:44:01.252607 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2525] settings: update[a87c5ca0-696e-48a9-80e9-069102fba7d6] updating connection "ovs-if-br-ex" (a659895e1f528596/keyfile)
Jan 31 16:44:01.253091 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <trace> [1643647441.2530] settings-connection[ee1b3e5130e70cab,a87c5ca0-696e-48a9-80e9-069102fba7d6] update agent secrets: secrets cleared
Jan 31 16:44:01.253810 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <debug> [1643647441.2537] manager: (ovs-if-br-ex) already created virtual interface name br-ex
Jan 31 16:44:01.253977 rbrattai-o410i11-kvds9-master-1 NetworkManager[421414] <warn>  [1643647441.2539] device (br-ex): Activation: failed for connection 'ovs-if-br-ex'

Comment 9 Andreas Karis 2022-01-31 22:23:40 UTC
Mini-update: it's possible to recover the node without a reboot by restarting openvswitch `systemctl restart openvswitch`

Comment 10 Andreas Karis 2022-01-31 23:27:13 UTC
It must be something in the flows. When I push an action=normal and then restart immediately, this works --- so my best guess so far is that dhcp is somehow being dropped.
~~~
[akaris@linux data]$ oc debug node/ip-10-0-148-33.ec2.internal 
chStarting pod/ip-10-0-148-33ec2internal-debug ...
To use host binaries, run `chroot /host`
root /host
Pod IP: 10.0.148.33
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ovs-ofctl add-flow br-ex action=normal
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=86.094s, table=0, n_packets=2632, n_bytes=4304089, idle_age=28, priority=0 actions=NORMAL
 cookie=0x0, duration=4.547s, table=0, n_packets=463, n_bytes=706732, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=65.510s, table=0, n_packets=0, n_bytes=0, idle_age=65, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=65.510s, table=0, n_packets=0, n_bytes=0, idle_age=65, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=65.510s, table=0, n_packets=161, n_bytes=29816, idle_age=12, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=65.510s, table=0, n_packets=157, n_bytes=289337, idle_age=12, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=65.509s, table=0, n_packets=248, n_bytes=52394, idle_age=4, priority=205,udp,in_port=1,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=65.509s, table=0, n_packets=0, n_bytes=0, idle_age=65, priority=200,udp,in_port=1,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=65.509s, table=0, n_packets=192, n_bytes=107864, idle_age=4, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:1
sh-4.4# nmcli conn
NAME            UUID                                  TYPE           DEVICE 
ovs-if-br-ex    8a55c503-4638-4390-b70f-83837e7776ac  ovs-interface  br-ex  
br-ex           0f3d0d7b-62b7-4586-8439-2dc678ffe71c  ovs-bridge     br-ex  
ovs-if-phys0    6c600c36-6997-40ef-8089-831dfcdea6df  ethernet       ens5   
ovs-port-br-ex  a9187acf-8445-4f96-a9c7-a5f5053b9d01  ovs-port       br-ex  
ovs-port-phys0  75223cff-0568-4046-849e-df8c0421dc4a  ovs-port       ens5   
sh-4.4# systemctl restart NetworkManager
sh-4.4# nmcli conn
NAME            UUID                                  TYPE           DEVICE 
ovs-if-br-ex    8a55c503-4638-4390-b70f-83837e7776ac  ovs-interface  br-ex  
br-ex           0f3d0d7b-62b7-4586-8439-2dc678ffe71c  ovs-bridge     br-ex  
ovs-if-phys0    6c600c36-6997-40ef-8089-831dfcdea6df  ethernet       ens5   
ovs-port-br-ex  a9187acf-8445-4f96-a9c7-a5f5053b9d01  ovs-port       br-ex  
ovs-port-phys0  75223cff-0568-4046-849e-df8c0421dc4a  ovs-port       ens5   
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0xdeff105, duration=80.096s, table=0, n_packets=0, n_bytes=0, idle_age=80, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=80.096s, table=0, n_packets=0, n_bytes=0, idle_age=80, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=80.096s, table=0, n_packets=190, n_bytes=34147, idle_age=1, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=80.096s, table=0, n_packets=183, n_bytes=333606, idle_age=0, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=3.287s, table=0, n_packets=18, n_bytes=3635, idle_age=1, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=3.287s, table=0, n_packets=0, n_bytes=0, idle_age=3, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=3.287s, table=0, n_packets=209, n_bytes=122217, idle_age=1, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
 cookie=0xb06186a3a85d964d, duration=3.287s, table=0, n_packets=0, n_bytes=0, idle_age=3, priority=110,tcp,in_port=4,tp_dst=31619 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x208c9ca3e7c6379a, duration=3.287s, table=0, n_packets=0, n_bytes=0, idle_age=3, priority=110,tcp,in_port=4,tp_dst=31926 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
sh-4.4# ovs-ofctl add-flow br-ex action=normal
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0xdeff105, duration=91.893s, table=0, n_packets=0, n_bytes=0, idle_age=91, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=91.893s, table=0, n_packets=0, n_bytes=0, idle_age=91, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=91.893s, table=0, n_packets=198, n_bytes=34857, idle_age=11, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=91.893s, table=0, n_packets=188, n_bytes=334109, idle_age=11, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=15.084s, table=0, n_packets=73, n_bytes=13799, idle_age=3, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=15.084s, table=0, n_packets=0, n_bytes=0, idle_age=15, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=15.084s, table=0, n_packets=251, n_bytes=139185, idle_age=3, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
 cookie=0xb06186a3a85d964d, duration=15.085s, table=0, n_packets=0, n_bytes=0, idle_age=15, priority=110,tcp,in_port=4,tp_dst=31619 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x208c9ca3e7c6379a, duration=15.085s, table=0, n_packets=0, n_bytes=0, idle_age=15, priority=110,tcp,in_port=4,tp_dst=31926 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
sh-4.4# ovs-ofctl dump-flows br-ex | less
sh-4.4# 
sh-4.4# 
sh-4.4# 
(reverse-i-search)`ad': ovs-ofctl dump-flows br-ex | he^C
sh-4.4# ovs-ofctl add-flow br-ex action=normal
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=154.017s, table=0, n_packets=2639, n_bytes=4304443, idle_age=30, priority=0 actions=NORMAL
 cookie=0x0, duration=2.214s, table=0, n_packets=851, n_bytes=1763257, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=133.433s, table=0, n_packets=0, n_bytes=0, idle_age=133, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=133.433s, table=0, n_packets=0, n_bytes=0, idle_age=133, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=133.433s, table=0, n_packets=251, n_bytes=47238, idle_age=8, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=133.433s, table=0, n_packets=235, n_bytes=353153, idle_age=8, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=56.624s, table=0, n_packets=1070, n_bytes=919977, idle_age=2, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=56.624s, table=0, n_packets=0, n_bytes=0, idle_age=56, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=56.624s, table=0, n_packets=1240, n_bytes=493875, idle_age=2, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=156.072s, table=0, n_packets=2639, n_bytes=4304443, idle_age=32, priority=0 actions=NORMAL
 cookie=0x0, duration=4.269s, table=0, n_packets=1295, n_bytes=2548151, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=135.488s, table=0, n_packets=0, n_bytes=0, idle_age=135, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=135.488s, table=0, n_packets=0, n_bytes=0, idle_age=135, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=135.488s, table=0, n_packets=251, n_bytes=47238, idle_age=10, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=135.488s, table=0, n_packets=235, n_bytes=353153, idle_age=10, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=58.679s, table=0, n_packets=1070, n_bytes=919977, idle_age=4, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=58.679s, table=0, n_packets=0, n_bytes=0, idle_age=58, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=58.679s, table=0, n_packets=1240, n_bytes=493875, idle_age=4, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=157.031s, table=0, n_packets=2639, n_bytes=4304443, idle_age=33, priority=0 actions=NORMAL
 cookie=0x0, duration=5.228s, table=0, n_packets=1399, n_bytes=2704718, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=136.447s, table=0, n_packets=0, n_bytes=0, idle_age=136, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=136.447s, table=0, n_packets=0, n_bytes=0, idle_age=136, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=136.447s, table=0, n_packets=251, n_bytes=47238, idle_age=11, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=136.447s, table=0, n_packets=235, n_bytes=353153, idle_age=11, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=59.638s, table=0, n_packets=1070, n_bytes=919977, idle_age=5, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=59.638s, table=0, n_packets=0, n_bytes=0, idle_age=59, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=59.638s, table=0, n_packets=1240, n_bytes=493875, idle_age=5, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0xdeff105, duration=138.192s, table=0, n_packets=0, n_bytes=0, idle_age=138, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=138.192s, table=0, n_packets=0, n_bytes=0, idle_age=138, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=138.192s, table=0, n_packets=251, n_bytes=47238, idle_age=13, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=138.192s, table=0, n_packets=235, n_bytes=353153, idle_age=13, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=61.383s, table=0, n_packets=1091, n_bytes=940135, idle_age=0, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=61.383s, table=0, n_packets=0, n_bytes=0, idle_age=61, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=61.383s, table=0, n_packets=1323, n_bytes=503229, idle_age=0, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
 cookie=0xb06186a3a85d964d, duration=61.383s, table=0, n_packets=0, n_bytes=0, idle_age=61, priority=110,tcp,in_port=4,tp_dst=31619 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x208c9ca3e7c6379a, duration=61.383s, table=0, n_packets=0, n_bytes=0, idle_age=61, priority=110,tcp,in_port=4,tp_dst=31926 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
sh-4.4# 
sh-4.4# 
sh-4.4# 
sh-4.4# 
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0xdeff105, duration=141.587s, table=0, n_packets=0, n_bytes=0, idle_age=141, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=141.587s, table=0, n_packets=0, n_bytes=0, idle_age=141, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=141.587s, table=0, n_packets=254, n_bytes=47510, idle_age=1, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=141.587s, table=0, n_packets=238, n_bytes=353808, idle_age=1, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=64.778s, table=0, n_packets=1177, n_bytes=1041676, idle_age=0, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=64.778s, table=0, n_packets=0, n_bytes=0, idle_age=64, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=64.778s, table=0, n_packets=1399, n_bytes=548432, idle_age=0, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
 cookie=0xb06186a3a85d964d, duration=64.778s, table=0, n_packets=0, n_bytes=0, idle_age=64, priority=110,tcp,in_port=4,tp_dst=31619 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x208c9ca3e7c6379a, duration=64.778s, table=0, n_packets=0, n_bytes=0, idle_age=64, priority=110,tcp,in_port=4,tp_dst=31926 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
sh-4.4# ovs-ofctl add-flow br-ex action=normal
sh-4.4# TOOLBOX_IMAGE=registry.fedoraproject.org/f33/fedora-toolbox
sh-4.4# ^C
sh-4.4# ^C
sh-4.4# ^C
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=168.519s, table=0, n_packets=2639, n_bytes=4304443, idle_age=44, priority=0 actions=NORMAL
 cookie=0x0, duration=3.948s, table=0, n_packets=1171, n_bytes=1831499, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=147.935s, table=0, n_packets=0, n_bytes=0, idle_age=147, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=147.935s, table=0, n_packets=0, n_bytes=0, idle_age=147, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=147.935s, table=0, n_packets=254, n_bytes=47510, idle_age=7, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=147.935s, table=0, n_packets=238, n_bytes=353808, idle_age=7, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=71.126s, table=0, n_packets=1215, n_bytes=1075568, idle_age=4, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=71.126s, table=0, n_packets=0, n_bytes=0, idle_age=71, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=71.126s, table=0, n_packets=1435, n_bytes=556871, idle_age=4, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
sh-4.4# systemctl restart NetworkManager
sh-4.4# nmcli conn
NAME            UUID                                  TYPE           DEVICE 
ovs-if-br-ex    8a55c503-4638-4390-b70f-83837e7776ac  ovs-interface  br-ex  
br-ex           0f3d0d7b-62b7-4586-8439-2dc678ffe71c  ovs-bridge     br-ex  
ovs-if-phys0    6c600c36-6997-40ef-8089-831dfcdea6df  ethernet       ens5   
ovs-port-br-ex  a9187acf-8445-4f96-a9c7-a5f5053b9d01  ovs-port       br-ex  
ovs-port-phys0  75223cff-0568-4046-849e-df8c0421dc4a  ovs-port       ens5   
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=184.848s, table=0, n_packets=2639, n_bytes=4304443, idle_age=61, priority=0 actions=NORMAL
 cookie=0x0, duration=20.277s, table=0, n_packets=4812, n_bytes=9589412, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=164.264s, table=0, n_packets=0, n_bytes=0, idle_age=164, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=164.264s, table=0, n_packets=0, n_bytes=0, idle_age=164, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=164.264s, table=0, n_packets=254, n_bytes=47510, idle_age=24, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=164.264s, table=0, n_packets=238, n_bytes=353808, idle_age=24, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=87.455s, table=0, n_packets=1215, n_bytes=1075568, idle_age=20, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=87.455s, table=0, n_packets=0, n_bytes=0, idle_age=87, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=87.455s, table=0, n_packets=1435, n_bytes=556871, idle_age=20, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=187.150s, table=0, n_packets=2639, n_bytes=4304443, idle_age=63, priority=0 actions=NORMAL
 cookie=0x0, duration=22.579s, table=0, n_packets=4984, n_bytes=9714213, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=166.566s, table=0, n_packets=0, n_bytes=0, idle_age=166, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=166.566s, table=0, n_packets=0, n_bytes=0, idle_age=166, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=166.566s, table=0, n_packets=254, n_bytes=47510, idle_age=26, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=166.566s, table=0, n_packets=238, n_bytes=353808, idle_age=26, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=89.757s, table=0, n_packets=1215, n_bytes=1075568, idle_age=22, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=89.757s, table=0, n_packets=0, n_bytes=0, idle_age=89, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=89.757s, table=0, n_packets=1435, n_bytes=556871, idle_age=22, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=189.304s, table=0, n_packets=2639, n_bytes=4304443, idle_age=65, priority=0 actions=NORMAL
 cookie=0x0, duration=24.733s, table=0, n_packets=5249, n_bytes=9870828, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=168.720s, table=0, n_packets=0, n_bytes=0, idle_age=168, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=168.720s, table=0, n_packets=0, n_bytes=0, idle_age=168, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=168.720s, table=0, n_packets=254, n_bytes=47510, idle_age=28, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=168.720s, table=0, n_packets=238, n_bytes=353808, idle_age=28, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=91.911s, table=0, n_packets=1215, n_bytes=1075568, idle_age=24, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=91.911s, table=0, n_packets=0, n_bytes=0, idle_age=91, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=91.911s, table=0, n_packets=1435, n_bytes=556871, idle_age=24, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=190.462s, table=0, n_packets=2639, n_bytes=4304443, idle_age=66, priority=0 actions=NORMAL
 cookie=0x0, duration=25.891s, table=0, n_packets=5394, n_bytes=9945465, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=169.878s, table=0, n_packets=0, n_bytes=0, idle_age=169, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=169.878s, table=0, n_packets=0, n_bytes=0, idle_age=169, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=169.878s, table=0, n_packets=254, n_bytes=47510, idle_age=29, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=169.878s, table=0, n_packets=238, n_bytes=353808, idle_age=29, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=93.069s, table=0, n_packets=1215, n_bytes=1075568, idle_age=26, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=93.069s, table=0, n_packets=0, n_bytes=0, idle_age=93, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=93.069s, table=0, n_packets=1435, n_bytes=556871, idle_age=26, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=191.329s, table=0, n_packets=2639, n_bytes=4304443, idle_age=67, priority=0 actions=NORMAL
 cookie=0x0, duration=26.758s, table=0, n_packets=5508, n_bytes=10065032, idle_age=0, actions=NORMAL
 cookie=0xdeff105, duration=170.745s, table=0, n_packets=0, n_bytes=0, idle_age=170, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=170.745s, table=0, n_packets=0, n_bytes=0, idle_age=170, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=170.745s, table=0, n_packets=254, n_bytes=47510, idle_age=30, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=170.745s, table=0, n_packets=238, n_bytes=353808, idle_age=30, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=93.936s, table=0, n_packets=1215, n_bytes=1075568, idle_age=26, priority=205,udp,in_port=4,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=93.936s, table=0, n_packets=0, n_bytes=0, idle_age=93, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=93.936s, table=0, n_packets=1435, n_bytes=556871, idle_age=26, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
sh-4.4# systemctl restart NetworkManager
sh-4.4# nmcli conn
NAME            UUID                                  TYPE           DEVICE 
ovs-if-br-ex    8a55c503-4638-4390-b70f-83837e7776ac  ovs-interface  br-ex  
br-ex           0f3d0d7b-62b7-4586-8439-2dc678ffe71c  ovs-bridge     br-ex  
ovs-if-phys0    6c600c36-6997-40ef-8089-831dfcdea6df  ethernet       ens5   
ovs-port-br-ex  a9187acf-8445-4f96-a9c7-a5f5053b9d01  ovs-port       br-ex  
ovs-port-phys0  75223cff-0568-4046-849e-df8c0421dc4a  ovs-port       ens5   
sh-4.4# ovs-ofctl dump-flows br-ex | head
NXST_FLOW reply (xid=0x4):
 cookie=0xdeff105, duration=179.125s, table=0, n_packets=0, n_bytes=0, idle_age=179, priority=500,ip,in_port=3,nw_src=10.0.148.33,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=10.0.148.33))
 cookie=0xdeff105, duration=179.125s, table=0, n_packets=0, n_bytes=0, idle_age=179, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=179.125s, table=0, n_packets=323, n_bytes=59269, idle_age=0, priority=500,ip,in_port=LOCAL,nw_dst=172.30.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=179.125s, table=0, n_packets=310, n_bytes=551240, idle_age=0, priority=500,ip,in_port=3,nw_src=172.30.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=3.560s, table=0, n_packets=71, n_bytes=76497, idle_age=0, priority=205,udp,in_port=6,dl_dst=0e:b5:e0:d1:73:1d,tp_dst=6081 actions=LOCAL
 cookie=0xb06186a3a85d964d, duration=3.560s, table=0, n_packets=0, n_bytes=0, idle_age=3, priority=110,tcp,in_port=6,tp_dst=31619 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x208c9ca3e7c6379a, duration=3.560s, table=0, n_packets=0, n_bytes=0, idle_age=3, priority=110,tcp,in_port=6,tp_dst=31926 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=3.560s, table=0, n_packets=0, n_bytes=0, idle_age=3, priority=200,udp,in_port=6,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=3.560s, table=0, n_packets=1494, n_bytes=569715, idle_age=0, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:6
sh-4.4# systemctl restart NetworkManager

--- > disconnected
~~~

Comment 11 Andreas Karis 2022-01-31 23:55:58 UTC
After lots of banging my head, I think I found the offending flow:
~~~
sh-4.4# ovs-ofctl dump-flows br-ex  | grep priority=100
 cookie=0xdeff105, duration=588.277s, table=0, n_packets=27796, n_bytes=3028594, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:1
 cookie=0xdeff105, duration=588.277s, table=0, n_packets=14225, n_bytes=6544765, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),output:1
 cookie=0xdeff105, duration=588.277s, table=1, n_packets=34152, n_bytes=105626161, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=588.277s, table=1, n_packets=15418, n_bytes=41699230, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=588.277s, table=1, n_packets=0, n_bytes=0, idle_age=588, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=588.277s, table=1, n_packets=0, n_bytes=0, idle_age=588, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
sh-4.4# ovs-ofctl del-flows --strict br-ex 'priority=100,ip,in_port=LOCAL'
sh-4.4# ovs-ofctl dump-flows br-ex  | grep priority=100
 cookie=0xdeff105, duration=622.159s, table=0, n_packets=29255, n_bytes=3172329, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:1
 cookie=0xdeff105, duration=622.159s, table=1, n_packets=36001, n_bytes=111509524, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=622.159s, table=1, n_packets=16339, n_bytes=41878382, idle_age=1, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=622.159s, table=1, n_packets=0, n_bytes=0, idle_age=622, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=622.159s, table=1, n_packets=0, n_bytes=0, idle_age=622, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
sh-4.4# systemctl restart NetworkManager
sh-4.4# 
~~~

Comment 12 Andreas Karis 2022-02-01 00:09:02 UTC
Likely here:
~~~
1018                 // table 0, packets coming from host Commit connections with ct_mark ctMarkHost
1019                 // so that reverse direction goes back to the host.
1020                 dftFlows = append(dftFlows,
1021                         fmt.Sprintf("cookie=%s, priority=100, in_port=%s, ip, "+
1022                                 "actions=ct(commit, zone=%d, exec(set_field:%s->ct_mark)), output:%s",
1023                                 defaultOpenFlowCookie, ofPortHost, config.Default.ConntrackZone, ctMarkHost, ofPortPhys))
1024 
1025                 // table 0, packets coming from external. Send it through conntrack and
1026                 // resubmit to table 1 to know the state and mark of the connection.
1027                 dftFlows = append(dftFlows,
 NORMAL  ᚠ 5e1ad49  pkg/node/gateway_shared_intf.go  
~~~

I'll follow up tomorrow.

Comment 13 Andreas Karis 2022-02-01 11:39:33 UTC
Example of what is happening:
####################################################################

Let's look at the issue on a specific node in detail.

Before NetworkManager restart - we can see that flows from LOCAL were programmed to have actions=output:2, that is port ens5:
~~~
sh-4.4# ovs-ofctl dump-flows br-ex | egrep 'priority=100|priority=110' | grep ct
 cookie=0x7d956a9aff1e82d9, duration=806.004s, table=0, n_packets=0, n_bytes=0, idle_age=806, priority=110,tcp,in_port=2,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=806.004s, table=0, n_packets=559, n_bytes=72898, idle_age=6, priority=110,tcp,in_port=2,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=806.004s, table=0, n_packets=0, n_bytes=0, idle_age=806, priority=110,tcp,in_port=3,tp_src=30906 actions=output:2
 cookie=0xa9f3b7acc211a828, duration=806.004s, table=0, n_packets=460, n_bytes=159705, idle_age=6, priority=110,tcp,in_port=3,tp_src=32648 actions=output:2
 cookie=0xdeff105, duration=1001.391s, table=0, n_packets=48851, n_bytes=5208677, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:2

# this flow here
 cookie=0xdeff105, duration=1001.391s, table=0, n_packets=44786, n_bytes=9825273, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),output:2
# flow above

 cookie=0xdeff105, duration=1001.391s, table=1, n_packets=54915, n_bytes=176089659, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1001.391s, table=1, n_packets=130997, n_bytes=625065266, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1001.391s, table=1, n_packets=0, n_bytes=0, idle_age=1001, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1001.391s, table=1, n_packets=0, n_bytes=0, idle_age=1001, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
sh-4.4# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 2(ens5): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 3(patch-br-ex_ip-): addr:ae:71:7f:b8:a9:8b
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
sh-4.4# ip a ls dev br-ex
7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0e:5e:00:0a:14:47 brd ff:ff:ff:ff:ff:ff
    inet 10.0.155.59/20 brd 10.0.159.255 scope global dynamic noprefixroute br-ex
       valid_lft 2296sec preferred_lft 2296sec
    inet6 fe80::3350:966d:18ae:143e/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
~~~

After NetworkManager restart - note how the flows still point out port number 2, which does not exist any more because NM unplugged and replugged all ports out of/into the bridge:
~~~
[root@ip-10-0-155-59 tmp]# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 3(patch-br-ex_ip-): addr:ae:71:7f:b8:a9:8b
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 4(ens5): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
[root@ip-10-0-155-59 tmp]# ip a ls dev br-ex
24: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0e:5e:00:0a:14:47 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::3350:966d:18ae:143e/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@ip-10-0-155-59 tmp]#  ovs-ofctl dump-flows br-ex | egrep 'priority=100|priority=110' | grep ct
 cookie=0x7d956a9aff1e82d9, duration=1149.827s, table=0, n_packets=0, n_bytes=0, idle_age=1149, priority=110,tcp,in_port=2,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=1149.827s, table=0, n_packets=920, n_bytes=118853, idle_age=58, priority=110,tcp,in_port=2,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=1149.827s, table=0, n_packets=0, n_bytes=0, idle_age=1149, priority=110,tcp,in_port=3,tp_src=30906 actions=output:2
 cookie=0xa9f3b7acc211a828, duration=1149.827s, table=0, n_packets=754, n_bytes=326659, idle_age=0, priority=110,tcp,in_port=3,tp_src=32648 actions=output:2
 cookie=0xdeff105, duration=1345.214s, table=0, n_packets=64333, n_bytes=6926605, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:2

# this flow here
 cookie=0xdeff105, duration=1345.214s, table=0, n_packets=48981, n_bytes=13472381, idle_age=2, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),output:2
# flow above

 cookie=0xdeff105, duration=1345.214s, table=1, n_packets=72044, n_bytes=228122244, idle_age=57, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1345.214s, table=1, n_packets=135125, n_bytes=626407905, idle_age=57, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1345.214s, table=1, n_packets=0, n_bytes=0, idle_age=1345, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1345.214s, table=1, n_packets=0, n_bytes=0, idle_age=1345, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
~~~

And NM cannot get the IP address, because the packets never make it to ens5 due to the now broken flows:
~~~
[root@ip-10-0-155-59 tmp]# nmcli conn
NAME                UUID                                  TYPE           DEVICE 
ovs-if-br-ex        2fa1deca-aed4-4b54-b96e-7114f3c135b3  ovs-interface  br-ex         # <--- yellow, all others are green
br-ex               4ff63c37-b94d-40a0-a084-c8a4e28ce549  ovs-bridge     br-ex  
ovs-if-phys0        d8643a0f-f9f6-497b-a2ae-81160e0b43ee  ethernet       ens5   
ovs-port-br-ex      cb60e9a7-9aa3-4a45-9397-d86b311cb326  ovs-port       br-ex  
ovs-port-phys0      32c84cb1-7023-41ac-8f9e-d9780ad74057  ovs-port       ens5   
Wired connection 1  67ebaae8-b025-361b-aafc-dd1061c2a4ba  ethernet       -- 
~~~

A tcpdump at this point in time does show DHCP requests on br-ex, but they never make it to ens5 --- logically:
~~~
sh-5.0# tcpdump -nne -i br-ex port 68 or port 67
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex, link-type EN10MB (Ethernet), capture size 262144 bytes
22:56:06.552361 0e:b5:e0:d1:73:1d > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 324: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 0e:b5:e0:d1:73:1d, length 282
22:56:10.640169 0e:b5:e0:d1:73:1d > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 324: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 0e:b5:e0:d1:73:1d, length 282
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
sh-5.0# timeout 90 tcpdump -nne -i ens5 port 68 or port 67
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel
~~~

Now, upon deletion of the flow in question, DHCP will work again:
~~~
[root@ip-10-0-155-59 tmp]# ]# ovs-ofctl del-flows --strict br-ex 'priority=100,ip,in_port=LOCAL'
[root@ip-10-0-155-59 tmp]# ]# ovs-ofctl dump-flows br-ex | egrep 'priority=100|p cookie=0x7d956a9aff1e82d9, duration=1248.874s, table=0, n_packets=0, n_bytes=0, idle_age=1248, priority=110,tcp,in_port=2,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=1248.874s, table=0, n_packets=920, n_bytes=118853, idle_age=157, priority=110,tcp,in_port=2,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=1248.874s, table=0, n_packets=0, n_bytes=0, idle_age=1248, priority=110,tcp,in_port=3,tp_src=30906 actions=output:2
 cookie=0xa9f3b7acc211a828, duration=1248.875s, table=0, n_packets=756, n_bytes=326791, idle_age=68, priority=110,tcp,in_port=3,tp_src=32648 actions=output:2
 cookie=0xdeff105, duration=1444.262s, table=0, n_packets=68235, n_bytes=7254084, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:2
 cookie=0xdeff105, duration=1444.262s, table=1, n_packets=72044, n_bytes=228122244, idle_age=156, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1444.262s, table=1, n_packets=135125, n_bytes=626407905, idle_age=156, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1444.262s, table=1, n_packets=0, n_bytes=0, idle_age=1444, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1444.262s, table=1, n_packets=0, n_bytes=0, idle_age=1444, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
[root@ip-10-0-155-59 tmp]# 
~~~

Network manager can connect and get an IP address:
~~~
[root@ip-10-0-155-59 tmp]# nmcli conn
NAME                UUID                                  TYPE           DEVICE 
ovs-if-br-ex        2fa1deca-aed4-4b54-b96e-7114f3c135b3  ovs-interface  br-ex      # <---- green
br-ex               4ff63c37-b94d-40a0-a084-c8a4e28ce549  ovs-bridge     br-ex  
ovs-if-phys0        d8643a0f-f9f6-497b-a2ae-81160e0b43ee  ethernet       ens5   
ovs-port-br-ex      cb60e9a7-9aa3-4a45-9397-d86b311cb326  ovs-port       br-ex  
ovs-port-phys0      32c84cb1-7023-41ac-8f9e-d9780ad74057  ovs-port       ens5   
Wired connection 1  67ebaae8-b025-361b-aafc-dd1061c2a4ba  ethernet       --  
~~~

After a couple of minutes, ovnkube will reconfigure the flows (look at the output ports):
~~~
[root@ip-10-0-155-59 tmp]# ovs-ofctl dump-flows br-ex | egrep 'priority=100|prior                                                                                   everse-i-search)`dump-flow': ovs-ofctl 
 cookie=0x7d956a9aff1e82d9, duration=18.791s, table=0, n_packets=0, n_bytes=0, idle_age=18, priority=110,tcp,in_port=4,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=18.791s, table=0, n_packets=0, n_bytes=0, idle_age=18, priority=110,tcp,in_port=4,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=18.791s, table=0, n_packets=0, n_bytes=0, idle_age=1296, priority=110,tcp,in_port=3,tp_src=30906 actions=output:4
 cookie=0xa9f3b7acc211a828, duration=18.791s, table=0, n_packets=756, n_bytes=326791, idle_age=115, priority=110,tcp,in_port=3,tp_src=32648 actions=output:4
 cookie=0xdeff105, duration=18.791s, table=0, n_packets=73693, n_bytes=7969151, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:4
 cookie=0xdeff105, duration=18.791s, table=0, n_packets=840, n_bytes=466458, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),output:4
 cookie=0xdeff105, duration=1491.452s, table=1, n_packets=77346, n_bytes=244035518, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1491.452s, table=1, n_packets=135824, n_bytes=629032462, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1491.452s, table=1, n_packets=0, n_bytes=0, idle_age=1491, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1491.452s, table=1, n_packets=0, n_bytes=0, idle_age=1491, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
~~~

Root cause
###################################################################

The problem here is that ens5 is disconnected from the OVS bridge, and that ens5's internal port ID inside OVS changes. However, OVS flows are programmed by port ID with the exception of the LOCAL match ((the bridge's internal port). So, the flows that were programmed earlier to steer packets coming from in_port=LOCAL to out_port=x will break after a NetworkManager restart, because ens5 is no longer at OVS internal id x but at a new ID y.

In my opinion, the main problem is that NetworkManager upon a restart deletes all interface <-> OVS connections. NetworkManager is designed to simply pull out and replug everything. Instead, it should shutdown the linux ports, and keep the interfaces which are connected to the OVS bridges shutdown instead of ripping them out and plugging them in completely. 

Just as an example for the "power" that NM has: NetworkManager rips out the internal bridge interface when it restarts. That is something that administrators **cannot** do. For an admin, it's impossible to do `ip link del br-ex`, but NetworkManager achieves that feat and adds a new internal port to OVS bridge br-ex (the interface index increases to a new value (as shown by ip link ls dev br-ex), clearly showing that the OVS internal interface is removed from the kernel, then readded). But that's actually not the problem in this specific case, so let's focus on the essential.

I'd argue that pulling all interfaces from an OVS bridge and plugging them back in on a running system is an invalid operation for OVS and should not be done. This IMO clearly is a NetworkManager bug or design problem.
Removing and recreating ports that were plugged into OVS (not just shutting them down) has the potential to lead to issues. When an interface is removed and readded to OVS, the OVS interface ID will change as well, thus wreaking havoc to installed OVS flows which identify the in/out port via OF port id.

Again, I personally believe that this is a NetworkManager design issue and that it must be addressed in NetworkManager eventually. NetworkManager runs totally against administrator intuition here and
unnecessarily pulls out and replugs ports into OVS. An NM restart should *not* lead to such an action

With that said, the issue can only harm us due to the fact that ovnkube-node programs OVS flows and connects to the rest of the cluster via br-ex. So, our controller uses the dataplane that it programs itself, leading to chicken/egg style issues when flows are broken. When the node sends out a DHCP request to get its IP address after a NetworkManager restart, the DHCP request is steered into the non-existing out_port=x and is dropped. The node never gets an IP address. Without an IP address, ovnkube-node cannot contact its control plane. And without that, it will not reprogram the OpenFlow flows to match the new port numbers..

However, with regards to the symptoms that we are seeing, and as to why we are only observing them as of OCP 4.10: packets outbound from br-ex (in_port=LOCAL) used to hit action=normal and would go out any port, regardless of its port_id.

A quick comparison of OCP 4.9 to 4.10 for the flows in question shows:

OpenShift 4.9 (logic for IPv4 shown only):
~~~
func commonFlows(ofPortPhys, bridgeMacAddress, ofPortPatch string) []string {
	var dftFlows []string
	maxPktLength := getMaxFrameLength()

	// table 0, we check to see if this dest mac is the shared mac, if so flood to both ports
	dftFlows = append(dftFlows,
		fmt.Sprintf("cookie=%s, priority=10, table=0, in_port=%s, dl_dst=%s, actions=output:%s,output:LOCAL",
			defaultOpenFlowCookie, ofPortPhys, bridgeMacAddress, ofPortPatch))

	if config.IPv4Mode {
		// table 0, packets coming from pods headed externally. Commit connections
		// so that reverse direction goes back to the pods.
		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=100, in_port=%s, ip, "+
				"actions=ct(commit, zone=%d), output:%s",
				defaultOpenFlowCookie, ofPortPatch, config.Default.ConntrackZone, ofPortPhys))

		// table 0, packets coming from external. Send it through conntrack and
		// resubmit to table 1 to know the state of the connection.
		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=50, in_port=%s, ip, "+
				"actions=ct(zone=%d, table=1)", defaultOpenFlowCookie, ofPortPhys, config.Default.ConntrackZone))
	}
~~~

OpenShift 4.10 (logic for IPv4 shown only):
~~~
	if config.IPv4Mode {
		// table 0, packets coming from pods headed externally. Commit connections with ct_mark ctMarkOVN
		// so that reverse direction goes back to the pods.
		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=100, in_port=%s, ip, "+
				"actions=ct(commit, zone=%d, exec(set_field:%s->ct_mark)), output:%s",
				defaultOpenFlowCookie, ofPortPatch, config.Default.ConntrackZone, ctMarkOVN, ofPortPhys))

		// table 0, packets coming from host Commit connections with ct_mark ctMarkHost
		// so that reverse direction goes back to the host.
		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=100, in_port=%s, ip, "+
				"actions=ct(commit, zone=%d, exec(set_field:%s->ct_mark)), output:%s",
				defaultOpenFlowCookie, ofPortHost, config.Default.ConntrackZone, ctMarkHost, ofPortPhys))

		// table 0, packets coming from external. Send it through conntrack and
		// resubmit to table 1 to know the state and mark of the connection.
		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=50, in_port=%s, ip, "+
				"actions=ct(zone=%d, table=1)", defaultOpenFlowCookie, ofPortPhys, config.Default.ConntrackZone))
	}
~~~


The commit that introduced the problematic flows is:
https://github.com/ovn-org/ovn-kubernetes/commit/126b132f863bcc86fb06e1557cd54a46c4ad9b02
~~~
Shared gateway use ct_mark to classify host/ovn traffic

Today, openflow rules on the shared gateway bridge
send incoming traffic from uplink to conntrack on zone
64000 to decide if the traffic needs to be steered towards
OVN or the host.

established/related connections are sent to OVN via
the patch port to OVN integration bridge while new
connections are set to the host.

This poses an issue for NICs that are capable of offloading
conneciton tracking rules as only established connections are
offloadable.

This commit modifies the openflow rules on the physical
bridge to allow the use of ct_mark on ct zone 64000 to
classify traffic going to host and OVN

- table 0: set up higher priority rule (205/200) to steer tunneled traffic
  based on udp encap port
- table 0: commit traffic coming out of ovn (patch to br-int) to zone 64000
  with ct_mark=0x1
- table 0: commit traffic coming from host (LOCAL) to zone 64000 with
  ct_mark=0x2
- table 1: send established/related connections on zone 64000 to
  either OVN or host based on ct_mark
~~~

That means before that commit, the host could still send a DHCP request and it hit the normal action and thus was flooded out all ports on OVS bridge br-ex. The host thus could get an IP address and eventually fix the broken flows.

Since that commit, however, outbound host flows are marked with  ct_mark=0x2 and are steered explicitly out of ofPortHost (our out_port=x) instead of having action=normal. When network manager restarts, the flows remains the same, but the OVS ID of ens5 is not port=y, and thus the flow does not match ens5 any more:
~~~
	if config.IPv4Mode {
(...)
		// table 1, established and related connections in zone 64000 with ct_mark ctMarkHost go to host
		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=100, table=1, ip, ct_state=+trk+est, ct_mark=%s, "+
				"actions=output:%s",
				defaultOpenFlowCookie, ctMarkHost, ofPortHost))

		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=100, table=1, ip, ct_state=+trk+rel, ct_mark=%s, "+
				"actions=output:%s",
				defaultOpenFlowCookie, ctMarkHost, ofPortHost))
(...)
~~~

The return flows matching that mark ct_mark=0x2 are then steered with the following flows, to LOCAL:
~~~
(...)
	if config.IPv4Mode {
		// table 1, established and related connections in zone 64000 with ct_mark ctMarkOVN go to OVN
		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=100, table=1, ip, ct_state=+trk+est, ct_mark=%s, "+
				"actions=%s",
				defaultOpenFlowCookie, ctMarkOVN, actions))

		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=100, table=1, ip, ct_state=+trk+rel, ct_mark=%s, "+
				"actions=%s",
				defaultOpenFlowCookie, ctMarkOVN, actions))

		// table 1, established and related connections in zone 64000 with ct_mark ctMarkHost go to host
		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=100, table=1, ip, ct_state=+trk+est, ct_mark=%s, "+
				"actions=output:%s",
				defaultOpenFlowCookie, ctMarkHost, ofPortHost))

		dftFlows = append(dftFlows,
			fmt.Sprintf("cookie=%s, priority=100, table=1, ip, ct_state=+trk+rel, ct_mark=%s, "+
				"actions=output:%s",
				defaultOpenFlowCookie, ctMarkHost, ofPortHost))
(...)
~~~

Possible solutions:
##########################################################################

Short term:
We can hence fix this on the ovnkube side by creating a bypass flow specifically for DHCP traffic (a), or by modifying the action to normal and by having normal switch logic handle this for outbound packets from the host (b). 
The return traffic will then either hit the NORMAL flow (a), or it will match the conntrack return flow with actions=LOCAL as the out port (b). 

Long term:
Fix NetworkManager.

Comment 15 Andreas Karis 2022-02-01 14:33:02 UTC
Here's a fix for this specific issue:
https://github.com/ovn-org/ovn-kubernetes/pull/2785

With that fix in OCP 4.10, NetworkManager can be safely restarted:
~~~
sh-4.4# ovs-ofctl dump-flows br-ex | grep LOCAL | grep priority=100
 cookie=0xdeff105, duration=1029.107s, table=0, n_packets=177772, n_bytes=150655268, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),NORMAL
 cookie=0xdeff105, duration=12824.653s, table=1, n_packets=354589, n_bytes=1148457012, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=12824.653s, table=1, n_packets=0, n_bytes=0, idle_age=12824, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
sh-4.4# systemctl restart NetworkManager
sh-4.4# 
sh-4.4# nmcli conn
NAME                UUID                                  TYPE           DEVICE 
ovs-if-br-ex        2fa1deca-aed4-4b54-b96e-7114f3c135b3  ovs-interface  br-ex  
br-ex               4ff63c37-b94d-40a0-a084-c8a4e28ce549  ovs-bridge     br-ex  
ovs-if-phys0        d8643a0f-f9f6-497b-a2ae-81160e0b43ee  ethernet       ens5   
ovs-port-br-ex      cb60e9a7-9aa3-4a45-9397-d86b311cb326  ovs-port       br-ex  
ovs-port-phys0      32c84cb1-7023-41ac-8f9e-d9780ad74057  ovs-port       ens5   
Wired connection 1  67ebaae8-b025-361b-aafc-dd1061c2a4ba  ethernet       --  
~~~

There's a connected issue though and that again has to do with the port changes. Upon NetworkManager restart, traffic to pods / to OVN might break until the new flows are actually installed on br-ex.

That issue can be shown with:
~~~
ovs-ofctl show br-ex
ovs-ofctl dump-flows br-ex | egrep 'in_port=2|output:2|in_port=4|output:4'
systemctl restart NetworkManager
sleep 5
ovs-ofctl show br-ex
ovs-ofctl dump-flows br-ex | egrep 'in_port=2|output:2|in_port=4|output:4'
for i in {0..10}; do sleep 5; date;  ovs-ofctl dump-flows br-ex | egrep 'priority=100|priority=110'; done 
~~~

~~~
sh-4.4# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 2(ens5): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 3(patch-br-ex_ip-): addr:5e:f1:2f:35:7d:ef
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
sh-4.4# ovs-ofctl dump-flows br-ex | egrep 'in_port=2|output:2|in_port=4|output:4'
 cookie=0xdeff105, duration=970.121s, table=0, n_packets=25139, n_bytes=23707979, idle_age=0, priority=205,udp,in_port=2,dl_dst=0e:5e:00:0a:14:47,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=970.121s, table=0, n_packets=0, n_bytes=0, idle_age=970, priority=200,udp,in_port=2,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=970.121s, table=0, n_packets=27462, n_bytes=46434632, idle_age=0, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:2
 cookie=0x7d956a9aff1e82d9, duration=970.121s, table=0, n_packets=7, n_bytes=632, idle_age=371, priority=110,tcp,in_port=2,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=970.121s, table=0, n_packets=596, n_bytes=75460, idle_age=23, priority=110,tcp,in_port=2,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=970.121s, table=0, n_packets=6, n_bytes=3041, idle_age=371, priority=110,tcp,in_port=3,tp_src=30906 actions=output:2
 cookie=0xa9f3b7acc211a828, duration=970.121s, table=0, n_packets=480, n_bytes=272857, idle_age=23, priority=110,tcp,in_port=3,tp_src=32648 actions=output:2
 cookie=0xdeff105, duration=970.121s, table=0, n_packets=44893, n_bytes=4918796, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:2
 cookie=0xdeff105, duration=970.121s, table=0, n_packets=73368, n_bytes=198575001, idle_age=0, priority=50,ip,in_port=2 actions=ct(table=1,zone=64000)
 cookie=0xdeff105, duration=970.121s, table=0, n_packets=1, n_bytes=42, idle_age=969, priority=10,in_port=2,dl_dst=0e:5e:00:0a:14:47 actions=output:3,LOCAL
 cookie=0xdeff105, duration=970.121s, table=1, n_packets=0, n_bytes=0, idle_age=970, priority=13,udp,in_port=2,tp_dst=3784 actions=output:3,LOCAL
sh-4.4# systemctl restart NetworkManager
sh-4.4# sleep 5
sh-4.4# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 3(patch-br-ex_ip-): addr:5e:f1:2f:35:7d:ef
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 4(ens5): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
sh-4.4# ovs-ofctl dump-flows br-ex | egrep 'in_port=2|output:2|in_port=4|output:4'
 cookie=0xdeff105, duration=975.604s, table=0, n_packets=25139, n_bytes=23707979, idle_age=5, priority=205,udp,in_port=2,dl_dst=0e:5e:00:0a:14:47,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=975.604s, table=0, n_packets=0, n_bytes=0, idle_age=975, priority=200,udp,in_port=2,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=975.604s, table=0, n_packets=27563, n_bytes=46450303, idle_age=0, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:2
 cookie=0x7d956a9aff1e82d9, duration=975.604s, table=0, n_packets=7, n_bytes=632, idle_age=377, priority=110,tcp,in_port=2,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=975.604s, table=0, n_packets=596, n_bytes=75460, idle_age=28, priority=110,tcp,in_port=2,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=975.604s, table=0, n_packets=6, n_bytes=3041, idle_age=377, priority=110,tcp,in_port=3,tp_src=30906 actions=output:2
 cookie=0xa9f3b7acc211a828, duration=975.604s, table=0, n_packets=480, n_bytes=272857, idle_age=28, priority=110,tcp,in_port=3,tp_src=32648 actions=output:2
 cookie=0xdeff105, duration=975.604s, table=0, n_packets=45063, n_bytes=4957973, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:2
 cookie=0xdeff105, duration=975.604s, table=0, n_packets=73504, n_bytes=198862321, idle_age=4, priority=50,ip,in_port=2 actions=ct(table=1,zone=64000)
 cookie=0xdeff105, duration=975.604s, table=0, n_packets=1, n_bytes=42, idle_age=975, priority=10,in_port=2,dl_dst=0e:5e:00:0a:14:47 actions=output:3,LOCAL
 cookie=0xdeff105, duration=975.604s, table=1, n_packets=0, n_bytes=0, idle_age=975, priority=13,udp,in_port=2,tp_dst=3784 actions=output:3,LOCAL
sh-4.4# for i in {0..10}; do sleep 5; date;  ovs-ofctl dump-flows br-ex | egrep 'priority=100|priority=110'; done 
Tue Feb  1 14:27:07 UTC 2022
 cookie=0xa9f3b7acc211a828, duration=2.394s, table=0, n_packets=16, n_bytes=1676, idle_age=0, priority=110,tcp,in_port=4,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=2.394s, table=0, n_packets=0, n_bytes=0, idle_age=2, priority=110,tcp,in_port=4,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=2.394s, table=0, n_packets=491, n_bytes=280481, idle_age=0, priority=110,tcp,in_port=3,tp_src=32648 actions=output:4
 cookie=0x7d956a9aff1e82d9, duration=2.394s, table=0, n_packets=6, n_bytes=3041, idle_age=383, priority=110,tcp,in_port=3,tp_src=30906 actions=output:4
 cookie=0xdeff105, duration=981.790s, table=0, n_packets=19768, n_bytes=12769972, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),NORMAL
 cookie=0xdeff105, duration=2.394s, table=0, n_packets=45734, n_bytes=5036639, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:4
 cookie=0xdeff105, duration=981.790s, table=1, n_packets=55338, n_bytes=161809601, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=981.790s, table=1, n_packets=18915, n_bytes=39471177, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=981.790s, table=1, n_packets=0, n_bytes=0, idle_age=981, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=981.790s, table=1, n_packets=0, n_bytes=0, idle_age=981, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
^C
~~~

~~~
ovs-ofctl show br-ex
ovs-ofctl dump-flows br-ex | egrep 'in_port=4|output:4|in_port=5|output:5'
systemctl restart NetworkManager
sleep 5
ovs-ofctl show br-ex
ovs-ofctl dump-flows br-ex | egrep 'in_port=4|output:4|in_port=5|output:5'
for i in {0..10}; do sleep 5; date;  ovs-ofctl dump-flows br-ex | egrep 'priority=100|priority=110'; done 
~~~

It depends a bit on the timing, but this can potentially last quite a long time:
~~~
sh-4.4# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 3(patch-br-ex_ip-): addr:5e:f1:2f:35:7d:ef
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 4(ens5): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
sh-4.4# ovs-ofctl dump-flows br-ex | egrep 'in_port=4|output:4|in_port=5|output:5'
 cookie=0xdeff105, duration=94.075s, table=0, n_packets=2815, n_bytes=2644550, idle_age=0, priority=205,udp,in_port=4,dl_dst=0e:5e:00:0a:14:47,tp_dst=6081 actions=LOCAL
 cookie=0xa9f3b7acc211a828, duration=94.075s, table=0, n_packets=112, n_bytes=13715, idle_age=5, priority=110,tcp,in_port=4,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=94.075s, table=0, n_packets=0, n_bytes=0, idle_age=94, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=94.075s, table=0, n_packets=30780, n_bytes=52486447, idle_age=0, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
 cookie=0x7d956a9aff1e82d9, duration=94.075s, table=0, n_packets=0, n_bytes=0, idle_age=94, priority=110,tcp,in_port=4,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=94.075s, table=0, n_packets=565, n_bytes=322054, idle_age=5, priority=110,tcp,in_port=3,tp_src=32648 actions=output:4
 cookie=0x7d956a9aff1e82d9, duration=94.075s, table=0, n_packets=6, n_bytes=3041, idle_age=474, priority=110,tcp,in_port=3,tp_src=30906 actions=output:4
 cookie=0xdeff105, duration=94.075s, table=0, n_packets=50248, n_bytes=5499600, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:4
 cookie=0xdeff105, duration=94.075s, table=0, n_packets=9312, n_bytes=29858394, idle_age=0, priority=50,ip,in_port=4 actions=ct(table=1,zone=64000)
 cookie=0xdeff105, duration=94.075s, table=0, n_packets=0, n_bytes=0, idle_age=94, priority=10,in_port=4,dl_dst=0e:5e:00:0a:14:47 actions=output:3,LOCAL
 cookie=0xdeff105, duration=94.075s, table=1, n_packets=0, n_bytes=0, idle_age=94, priority=13,udp,in_port=4,tp_dst=3784 actions=output:3,LOCAL
sh-4.4# systemctl restart NetworkManager
sh-4.4# sleep 5
sh-4.4# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000e5e000a1447
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 3(patch-br-ex_ip-): addr:5e:f1:2f:35:7d:ef
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 5(ens5): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ex): addr:0e:5e:00:0a:14:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
sh-4.4# ovs-ofctl dump-flows br-ex | egrep 'in_port=4|output:4|in_port=5|output:5'
 cookie=0xdeff105, duration=99.381s, table=0, n_packets=2817, n_bytes=2646678, idle_age=5, priority=205,udp,in_port=4,dl_dst=0e:5e:00:0a:14:47,tp_dst=6081 actions=LOCAL
 cookie=0xa9f3b7acc211a828, duration=99.381s, table=0, n_packets=112, n_bytes=13715, idle_age=10, priority=110,tcp,in_port=4,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=99.381s, table=0, n_packets=0, n_bytes=0, idle_age=99, priority=200,udp,in_port=4,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=99.381s, table=0, n_packets=30915, n_bytes=52537336, idle_age=0, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:4
 cookie=0x7d956a9aff1e82d9, duration=99.381s, table=0, n_packets=0, n_bytes=0, idle_age=99, priority=110,tcp,in_port=4,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=99.381s, table=0, n_packets=565, n_bytes=322054, idle_age=10, priority=110,tcp,in_port=3,tp_src=32648 actions=output:4
 cookie=0x7d956a9aff1e82d9, duration=99.381s, table=0, n_packets=6, n_bytes=3041, idle_age=480, priority=110,tcp,in_port=3,tp_src=30906 actions=output:4
 cookie=0xdeff105, duration=99.381s, table=0, n_packets=50378, n_bytes=5521679, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:4
 cookie=0xdeff105, duration=99.381s, table=0, n_packets=9387, n_bytes=29996178, idle_age=5, priority=50,ip,in_port=4 actions=ct(table=1,zone=64000)
 cookie=0xdeff105, duration=99.381s, table=0, n_packets=0, n_bytes=0, idle_age=99, priority=10,in_port=4,dl_dst=0e:5e:00:0a:14:47 actions=output:3,LOCAL
 cookie=0xdeff105, duration=99.381s, table=1, n_packets=0, n_bytes=0, idle_age=99, priority=13,udp,in_port=4,tp_dst=3784 actions=output:3,LOCAL
sh-4.4# for i in {0..10}; do sleep 5; date;  ovs-ofctl dump-flows br-ex | egrep 'priority=100|priority=110'; done 
Tue Feb  1 14:28:51 UTC 2022
 cookie=0xa9f3b7acc211a828, duration=106.011s, table=0, n_packets=112, n_bytes=13715, idle_age=17, priority=110,tcp,in_port=4,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=106.011s, table=0, n_packets=0, n_bytes=0, idle_age=106, priority=110,tcp,in_port=4,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=106.011s, table=0, n_packets=565, n_bytes=322054, idle_age=17, priority=110,tcp,in_port=3,tp_src=32648 actions=output:4
 cookie=0x7d956a9aff1e82d9, duration=106.011s, table=0, n_packets=6, n_bytes=3041, idle_age=486, priority=110,tcp,in_port=3,tp_src=30906 actions=output:4
 cookie=0xdeff105, duration=1085.407s, table=0, n_packets=22996, n_bytes=14260812, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),NORMAL
 cookie=0xdeff105, duration=106.011s, table=0, n_packets=50574, n_bytes=5562731, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:4
 cookie=0xdeff105, duration=1085.407s, table=1, n_packets=60870, n_bytes=178000227, idle_age=11, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1085.407s, table=1, n_packets=21842, n_bytes=50374376, idle_age=11, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1085.407s, table=1, n_packets=0, n_bytes=0, idle_age=1085, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1085.407s, table=1, n_packets=0, n_bytes=0, idle_age=1085, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
Tue Feb  1 14:28:56 UTC 2022
 cookie=0xa9f3b7acc211a828, duration=111.016s, table=0, n_packets=112, n_bytes=13715, idle_age=22, priority=110,tcp,in_port=4,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=111.016s, table=0, n_packets=0, n_bytes=0, idle_age=111, priority=110,tcp,in_port=4,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=111.016s, table=0, n_packets=565, n_bytes=322054, idle_age=22, priority=110,tcp,in_port=3,tp_src=32648 actions=output:4
 cookie=0x7d956a9aff1e82d9, duration=111.016s, table=0, n_packets=6, n_bytes=3041, idle_age=491, priority=110,tcp,in_port=3,tp_src=30906 actions=output:4
 cookie=0xdeff105, duration=1090.412s, table=0, n_packets=23167, n_bytes=14312909, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),NORMAL
 cookie=0xdeff105, duration=111.016s, table=0, n_packets=50730, n_bytes=5612034, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:4
 cookie=0xdeff105, duration=1090.412s, table=1, n_packets=60870, n_bytes=178000227, idle_age=16, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1090.412s, table=1, n_packets=21842, n_bytes=50374376, idle_age=16, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1090.412s, table=1, n_packets=0, n_bytes=0, idle_age=1090, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1090.412s, table=1, n_packets=0, n_bytes=0, idle_age=1090, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
Tue Feb  1 14:29:01 UTC 2022
 cookie=0xa9f3b7acc211a828, duration=116.027s, table=0, n_packets=112, n_bytes=13715, idle_age=27, priority=110,tcp,in_port=4,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=116.027s, table=0, n_packets=0, n_bytes=0, idle_age=116, priority=110,tcp,in_port=4,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=116.027s, table=0, n_packets=565, n_bytes=322054, idle_age=27, priority=110,tcp,in_port=3,tp_src=32648 actions=output:4
 cookie=0x7d956a9aff1e82d9, duration=116.027s, table=0, n_packets=6, n_bytes=3041, idle_age=496, priority=110,tcp,in_port=3,tp_src=30906 actions=output:4
 cookie=0xdeff105, duration=1095.423s, table=0, n_packets=23240, n_bytes=14322781, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),NORMAL
 cookie=0xdeff105, duration=116.027s, table=0, n_packets=50861, n_bytes=5650827, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:4
 cookie=0xdeff105, duration=1095.423s, table=1, n_packets=60870, n_bytes=178000227, idle_age=21, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1095.423s, table=1, n_packets=21842, n_bytes=50374376, idle_age=21, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1095.423s, table=1, n_packets=0, n_bytes=0, idle_age=1095, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1095.423s, table=1, n_packets=0, n_bytes=0, idle_age=1095, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
Tue Feb  1 14:29:06 UTC 2022
 cookie=0xa9f3b7acc211a828, duration=121.035s, table=0, n_packets=112, n_bytes=13715, idle_age=32, priority=110,tcp,in_port=4,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=121.035s, table=0, n_packets=0, n_bytes=0, idle_age=121, priority=110,tcp,in_port=4,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=121.035s, table=0, n_packets=570, n_bytes=322504, idle_age=0, priority=110,tcp,in_port=3,tp_src=32648 actions=output:4
 cookie=0x7d956a9aff1e82d9, duration=121.035s, table=0, n_packets=6, n_bytes=3041, idle_age=501, priority=110,tcp,in_port=3,tp_src=30906 actions=output:4
 cookie=0xdeff105, duration=1100.431s, table=0, n_packets=23362, n_bytes=14638804, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),NORMAL
 cookie=0xdeff105, duration=121.035s, table=0, n_packets=50977, n_bytes=5684377, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:4
 cookie=0xdeff105, duration=1100.431s, table=1, n_packets=60870, n_bytes=178000227, idle_age=26, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1100.431s, table=1, n_packets=21842, n_bytes=50374376, idle_age=26, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1100.431s, table=1, n_packets=0, n_bytes=0, idle_age=1100, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1100.431s, table=1, n_packets=0, n_bytes=0, idle_age=1100, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
Tue Feb  1 14:29:11 UTC 2022
 cookie=0x7d956a9aff1e82d9, duration=3.653s, table=0, n_packets=0, n_bytes=0, idle_age=3, priority=110,tcp,in_port=5,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=3.653s, table=0, n_packets=1, n_bytes=54, idle_age=0, priority=110,tcp,in_port=5,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=3.653s, table=0, n_packets=6, n_bytes=3041, idle_age=506, priority=110,tcp,in_port=3,tp_src=30906 actions=output:5
 cookie=0xa9f3b7acc211a828, duration=3.653s, table=0, n_packets=572, n_bytes=322684, idle_age=0, priority=110,tcp,in_port=3,tp_src=32648 actions=output:5
 cookie=0xdeff105, duration=1105.436s, table=0, n_packets=23847, n_bytes=14715284, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),NORMAL
 cookie=0xdeff105, duration=3.653s, table=0, n_packets=51520, n_bytes=5757170, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:5
 cookie=0xdeff105, duration=1105.436s, table=1, n_packets=61499, n_bytes=180129102, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1105.436s, table=1, n_packets=21966, n_bytes=50723481, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1105.436s, table=1, n_packets=0, n_bytes=0, idle_age=1105, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1105.436s, table=1, n_packets=0, n_bytes=0, idle_age=1105, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
Tue Feb  1 14:29:16 UTC 2022
 cookie=0x7d956a9aff1e82d9, duration=8.659s, table=0, n_packets=0, n_bytes=0, idle_age=8, priority=110,tcp,in_port=5,tp_dst=30906 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xa9f3b7acc211a828, duration=8.659s, table=0, n_packets=1, n_bytes=54, idle_age=5, priority=110,tcp,in_port=5,tp_dst=32648 actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0x7d956a9aff1e82d9, duration=8.659s, table=0, n_packets=6, n_bytes=3041, idle_age=511, priority=110,tcp,in_port=3,tp_src=30906 actions=output:5
 cookie=0xa9f3b7acc211a828, duration=8.659s, table=0, n_packets=572, n_bytes=322684, idle_age=5, priority=110,tcp,in_port=3,tp_src=32648 actions=output:5
 cookie=0xdeff105, duration=1110.442s, table=0, n_packets=23986, n_bytes=14735332, idle_age=0, priority=100,ip,in_port=LOCAL actions=ct(commit,zone=64000,exec(load:0x2->NXM_NX_CT_MARK[])),NORMAL
 cookie=0xdeff105, duration=8.659s, table=0, n_packets=51926, n_bytes=5803708, idle_age=0, priority=100,ip,in_port=3 actions=ct(commit,zone=64000,exec(load:0x1->NXM_NX_CT_MARK[])),output:5
 cookie=0xdeff105, duration=1110.442s, table=1, n_packets=62019, n_bytes=181795696, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1110.442s, table=1, n_packets=22108, n_bytes=51405496, idle_age=0, priority=100,ct_state=+est+trk,ct_mark=0x2,ip actions=LOCAL
 cookie=0xdeff105, duration=1110.442s, table=1, n_packets=0, n_bytes=0, idle_age=1110, priority=100,ct_state=+rel+trk,ct_mark=0x1,ip actions=check_pkt_larger(8915)->NXM_NX_REG0[0],resubmit(,11)
 cookie=0xdeff105, duration=1110.442s, table=1, n_packets=0, n_bytes=0, idle_age=1110, priority=100,ct_state=+rel+trk,ct_mark=0x2,ip actions=LOCAL
^C
~~~

That timing might have to do with the ovnkube-node pod restarts on that node:
~~~
[akaris@linux ipi-us-east-1]$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS    RESTARTS        AGE   IP             NODE                           NOMINATED NODE   READINESS GATES
ovnkube-master-b887s   6/6     Running   0               51m   10.0.164.163   ip-10-0-164-163.ec2.internal   <none>           <none>
ovnkube-master-csttd   6/6     Running   0               53m   10.0.147.226   ip-10-0-147-226.ec2.internal   <none>           <none>
ovnkube-master-pmvfc   6/6     Running   0               55m   10.0.132.217   ip-10-0-132-217.ec2.internal   <none>           <none>
ovnkube-node-74pl5     5/5     Running   0               54m   10.0.135.122   ip-10-0-135-122.ec2.internal   <none>           <none>
ovnkube-node-8cj8d     5/5     Running   0               53m   10.0.162.245   ip-10-0-162-245.ec2.internal   <none>           <none>
ovnkube-node-bbx5q     5/5     Running   0               55m   10.0.147.226   ip-10-0-147-226.ec2.internal   <none>           <none>
ovnkube-node-gl2xk     5/5     Running   14 (2m6s ago)   54m   10.0.155.59    ip-10-0-155-59.ec2.internal    <none>           <none>
ovnkube-node-qfmbm     5/5     Running   0               55m   10.0.164.163   ip-10-0-164-163.ec2.internal   <none>           <none>
ovnkube-node-swms6     5/5     Running   0               53m   10.0.132.217   ip-10-0-132-217.ec2.internal   <none>           <none>
~~~

~~~
oc describe pod -n openshift-ovn-kubernetes ovnkube-node-gl2xk
(...)
I0201 14:27:05.433194   21059 node_ip_handler_linux.go:213] Skipping non-useable IP address for host: fe80::b89d:c4ff:fe14:65fa
I0201 14:27:05.433199   21059 node_ip_handler_linux.go:213] Skipping non-useable IP address for host: fe80::4ad:4aff:fed6:e0c7
I0201 14:27:05.433204   21059 node_ip_handler_linux.go:213] Skipping non-useable IP address for host: fe80::5ce4:8dff:fe37:dad
I0201 14:27:05.433209   21059 node_ip_handler_linux.go:213] Skipping non-useable IP address for host: fe80::a05f:40ff:fe01:2555
I0201 14:27:05.433213   21059 node_ip_handler_linux.go:213] Skipping non-useable IP address for host: fe80::ec5e:deff:fe0d:9b0a
I0201 14:27:05.433219   21059 node_ip_handler_linux.go:213] Skipping non-useable IP address for host: fe80::1447:6dff:fe9f:ef0
I0201 14:27:05.433223   21059 node_ip_handler_linux.go:213] Skipping non-useable IP address for host: fe80::e4bd:f3ff:feac:3b97
I0201 14:27:05.433227   21059 node_ip_handler_linux.go:213] Skipping non-useable IP address for host: fe80::7b93:10e:313b:fa77
I0201 14:27:05.439621   21059 helper_linux.go:74] Found default gateway interface br-ex 10.0.144.1
I0201 14:27:05.464040   21059 ovn_db.go:382] Not registering OVN DB Metrics on this Node since OVN DBs are not running on this node.
W0201 14:27:05.481993   21059 management-port_linux.go:309] missing management port nat rule in chain OVN-KUBE-SNAT-MGMTPORT, adding it
I0201 14:28:39.649559   21059 node_ip_handler_linux.go:63] Removing IP: 10.0.155.59, from node IP manager
I0201 14:28:39.649612   21059 kube.go:99] Setting annotations map[k8s.ovn.org/host-addresses:[]] on node ip-10-0-155-59.ec2.internal
I0201 14:28:40.553095   21059 node_ip_handler_linux.go:49] Adding IP: 10.0.155.59, to node IP manager
I0201 14:28:40.553122   21059 kube.go:99] Setting annotations map[k8s.ovn.org/host-addresses:["10.0.155.59"]] on node ip-10-0-155-59.ec2.internal
E0201 14:28:50.622980   21059 healthcheck.go:364] Fatal error: phys port ens5 ofport changed from 4 to 5
~~~

I also hit a case during my tests where NM disconnected ens5 from br-ex and would never reconnect it until I manually fixed the issue with:
~~~
ovs-vsctl add-port br-ex ens5
~~~

And that then lead to ovnkube-node crash looping:
~~~
I0201 14:06:57.863647  214702 ovs.go:204] exec(14): /usr/bin/ovs-vsctl --timeout=15 br-get-external-id br-ex bridge-uplink
I0201 14:06:57.869210  214702 ovs.go:207] exec(14): stdout: ""
I0201 14:06:57.869230  214702 ovs.go:208] exec(14): stderr: ""
I0201 14:06:57.869243  214702 ovs.go:204] exec(15): /usr/bin/ovs-vsctl --timeout=15 get interface -ex ofport
I0201 14:06:57.874760  214702 ovs.go:207] exec(15): stdout: ""
I0201 14:06:57.874779  214702 ovs.go:208] exec(15): stderr: "ovs-vsctl: no row \"-ex\" in table Interface\n"
I0201 14:06:57.874785  214702 ovs.go:210] exec(15): err: exit status 1
F0201 14:06:57.874823  214702 ovnkube.go:133] Bridge for interface failed for br-ex: Failed to find intfName for br-ex: failed to get ofport of -ex, stderr: "ovs-vsctl: no row \"-ex\" in table Interface\n", error: exit status 1
~~~

Comment 16 Andreas Karis 2022-02-01 14:35:53 UTC
Fix upstream: https://github.com/ovn-org/ovn-kubernetes/pull/2785

Comment 19 Ross Brattain 2022-02-01 20:42:58 UTC
Unable to reproduce the issue with simulated link flap (`ip l s ens3 down` on host or virtual network adapter disconnect/reconnect on vSphere).  Have not tested link flap on baremetal.  It looks like link flaps can't trigger the issue.

I'm not sure what endogenous events could cause NetworkManager to stop or restart.  This is a low probability, high impact issue.  If it happens nodes will eventually fail connectivity health checks in things like AWS UI, so admin will see node as unhealthy and reboot in the rare case that it happens.

We could document something like "if NetworkManager ever restarts the node will lose network connectivity and must be recovered."

Customers should be advised to not manually restart NetworkManager without atomically restarting openvswitch as well, as indicated in Comment 9.

Comment 21 Ross Brattain 2022-02-08 15:01:51 UTC
Removing "link flap" from summary, issue does not occur on link flap.

Clarifying release note text, issue only occurs on OVNKubernetes.

Doc Text verified.

Comment 24 Ross Brattain 2022-03-04 18:57:57 UTC
Verified on 4.11.0-0.nightly-2022-03-04-063157

Starting pod/rbrattai-o411i53-j7xqh-master-0-debug ...
/run/ofport_requests.br-ex:declare -A INTERFACES=([ens3]="2" )

Starting pod/rbrattai-o411i53-j7xqh-master-1-debug ...
/run/ofport_requests.br-ex:declare -A INTERFACES=([ens3]="2" )

Starting pod/rbrattai-o411i53-j7xqh-master-2-debug ...
/run/ofport_requests.br-ex:declare -A INTERFACES=([ens3]="2" )

Starting pod/rbrattai-o411i53-j7xqh-worker-1-q9lvk-debug ...
/run/ofport_requests.br-ex:declare -A INTERFACES=([ens3]="1" )

Starting pod/rbrattai-o411i53-j7xqh-worker-2-2xqt4-debug ...
/run/ofport_requests.br-ex:declare -A INTERFACES=([ens3]="1" )

Starting pod/rbrattai-o411i53-j7xqh-worker-3-9mf6k-debug ...
/run/ofport_requests.br-ex:declare -A INTERFACES=([ens3]="2" )

Mar 04 17:52:55.273869 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[172425]: + INTERFACE_NAME=ens3
Mar 04 17:52:55.274703 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[172425]: ++ awk -F : '{if($1=="ens3" && $2!~/^ovs*/) print $NF}'
Mar 04 17:52:56.257974 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[172425]: + ovs-vsctl list interface ens3
Mar 04 17:52:56.284102 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[172425]: declare -A INTERFACES=([ens3]="2" )
Mar 04 17:52:56.284341 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[172425]: ++ INTERFACES=([ens3]="2")
Mar 04 17:52:56.284632 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[172425]: + ovs-vsctl set Interface ens3 ofport_request=2
Mar 04 18:13:34.749790 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[223669]: + INTERFACE_NAME=ens3
Mar 04 18:13:34.750666 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[223669]: ++ awk -F : '{if($1=="ens3" && $2!~/^ovs*/) print $NF}'
Mar 04 18:13:35.780342 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[223669]: + ovs-vsctl list interface ens3
Mar 04 18:13:35.805120 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[223669]: declare -A INTERFACES=([ens3]="2" )
Mar 04 18:13:35.805293 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[223669]: ++ INTERFACES=([ens3]="2")
Mar 04 18:13:35.805505 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[223669]: + ovs-vsctl set Interface ens3 ofport_request=2
Mar 04 18:30:45.785195 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[267177]: + INTERFACE_NAME=ens3
Mar 04 18:30:45.786035 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[267177]: ++ awk -F : '{if($1=="ens3" && $2!~/^ovs*/) print $NF}'
Mar 04 18:30:46.804640 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[267177]: + ovs-vsctl list interface ens3
Mar 04 18:30:46.829820 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[267177]: declare -A INTERFACES=([ens3]="2" )
Mar 04 18:30:46.830011 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[267177]: ++ INTERFACES=([ens3]="2")
Mar 04 18:30:46.830184 rbrattai-o411i53-j7xqh-master-0 nm-dispatcher[267177]: + ovs-vsctl set Interface ens3 ofport_request=2

Comment 26 errata-xmlrpc 2022-08-10 10:45:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069


Note You need to log in before you can comment on or make changes to this bug.