Description of problem:
CNO changes to configure EgressIP timeout

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Verified in 4.12.0-0.nightly-2022-07-20-030220

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-20-030220   True        False         3h4m    Cluster version is 4.12.0-0.nightly-2022-07-20-030220

$ oc label node jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal labeled

$ oc label node jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal labeled

$ oc new-project test

$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101

$ oc edit networks.operator.openshift.io -oyaml

$ oc get networks.operator.openshift.io -oyaml
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: Network
  metadata:
    annotations:
      networkoperator.openshift.io/ovn-cluster-initiator: 10.0.0.5
    creationTimestamp: "2022-07-20T21:15:45Z"
    generation: 92
    name: cluster
    resourceVersion: "41478"
    uid: a3642a14-7e54-4402-87b2-52d6a221477a
  spec:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    defaultNetwork:
      ovnKubernetesConfig:
        egressIPConfig:
          reachabilityTotalTimeoutSeconds: 10

$ oc scale --replicas 1 -n openshift-cluster-version deployments/cluster-version-operator
deployment.apps/cluster-version-operator scaled

$ oc get -n openshift-cluster-version deployments/cluster-version-operator
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
cluster-version-operator   1/1     1            1           3h27m

$ oc debug node/jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal
Warning: would violate PodSecurity "restricted:v1.24": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jechen-0720d-l56lv-worker-a-cv5d9copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4#
sh-4.4#
sh-4.4#
sh-4.4# shutdown
Shutdown scheduled for Wed 2022-07-20 22:25:06 UTC, use 'shutdown -c' to cancel.

$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101
$ date
Wed Jul 20 18:25:33 EDT 2022

$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101
$ date
Wed Jul 20 18:25:37 EDT 2022

$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101
$ date
Wed Jul 20 18:25:40 EDT 2022

$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip1   10.0.128.101
$ date
Wed Jul 20 18:25:45 EDT 2022

$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal   10.0.128.101

Measured the time it took to fail over to the second egress node: about 10s.

# Repeated the steps above with reachabilityTotalTimeoutSeconds increased to 20s. The failover to the other egress node then took about 20s.

==> Verified reachabilityTotalTimeoutSeconds
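
The manual `oc get egressip` / `date` loop used for this verification could be approximated with a small polling script. The following is only a rough sketch, not the procedure that was actually run: the helper names `assigned_node` and `measure_failover` are hypothetical, it assumes `oc` is on PATH and logged in to the cluster, and it assumes the EgressIP object reports its host under `.status.items[*].node`. Timing starts when `measure_failover` is called, so it would need to be started at about the moment the original egress node goes down.

```python
import subprocess
import time
from typing import Callable, Optional


def assigned_node(name: str = "egressip1") -> str:
    """Return the node currently hosting the EgressIP, or "" if unassigned.

    Assumption: the EgressIP status lists assignments under
    .status.items[*].node (one entry per assigned egress IP).
    """
    result = subprocess.run(
        ["oc", "get", "egressip", name,
         "-o", "jsonpath={.status.items[*].node}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def measure_failover(
    get_node: Callable[[], str],
    original_node: str,
    interval: float = 1.0,
    max_wait: float = 120.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the EgressIP shows up on a node other than original_node.

    Returns the seconds elapsed since this function was called, or None
    if no reassignment was observed within max_wait seconds. The result
    is only as precise as the polling interval.
    """
    start = clock()
    while clock() - start <= max_wait:
        node = get_node()
        # An empty string means the IP is currently unassigned; keep waiting.
        if node and node != original_node:
            return clock() - start
        sleep(interval)
    return None
```

A run against the cluster above might look like `measure_failover(assigned_node, "jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal")`. The `clock` and `sleep` parameters exist only so the loop can be exercised without a cluster.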
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days