Bug 2074544

Summary: e2e-metal-ipi-ovn-ipv6 failing due to recent CEO changes
Product: OpenShift Container Platform Reporter: Casey Callendrello <cdc>
Component: Networking Assignee: jamo luhrsen <jluhrsen>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: dcbw, dwest, jluhrsen, jniu, shardy
Version: 4.11 Flags: jluhrsen: needinfo-
jluhrsen: needinfo-
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 11:06:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Casey Callendrello 2022-04-12 13:15:47 UTC
There appears to be something wrong with etcd on this job. It has a 99% failure rate -- https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6

I have filed a PR to make it optional. Once we fix the issue, we can re-enable it.

Comment 1 Steven Hardy 2022-04-12 13:22:20 UTC
> I have filed a PR to make it optional. Once we fix the issue, we can re-enable it.

It's a real regression, so I'd prefer we fix the issue. I pushed https://github.com/openshift/cluster-etcd-operator/pull/785, but investigation and testing are ongoing to confirm whether that's the only issue.

Comment 2 Casey Callendrello 2022-04-12 13:39:22 UTC
Steven --

Agreed, this is a real regression. However, thanks to the interesting nature of prow and merge pools, it is essentially impossible to merge any PRs until this is fixed. Hence the blocker bug.

With less than two weeks to go before feature freeze, we can't afford to be stuck for another week.

Feel free to file a PR re-enabling the job when things are stable.

Comment 3 Steven Hardy 2022-04-12 16:12:21 UTC
It seems my fix isn't sufficient, so I'll remove my assignment; hopefully this can then be triaged/investigated by the etcd team.

I triggered e2e-metal-ipi-ovn-ipv6 on https://github.com/openshift/cluster-etcd-operator/pull/785 so we can hopefully collect more details re the remaining issues

Comment 4 Steven Hardy 2022-04-12 17:57:18 UTC
Spotted some similar issues with https://github.com/openshift/cluster-etcd-operator/pull/784, so I updated my PR with another fix and am re-testing.

Comment 5 Steven Hardy 2022-04-12 18:29:25 UTC
I haven't got the fixes working yet, so I'm trying a revert: https://github.com/openshift/cluster-etcd-operator/pull/786 (this did work locally for me, but let's confirm in CI).

Comment 7 Casey Callendrello 2022-04-25 10:26:30 UTC
According to testgrid [1], this job finally went green on 4/22. So, yes, I think we can set it as blocking for CNO if desired.

1: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6

Comment 8 melbeher 2022-04-25 17:45:33 UTC
@cdc Would you kindly change the component if possible? I think it is no longer an etcd problem, is it?

cc @dwest

Comment 9 Casey Callendrello 2022-04-26 10:51:18 UTC
Agreed, we can kick this back to the SDN team. (It should have been a BLOCKS bug for the etcd team anyway.) Thanks for the prompt fix.

Comment 10 jamo luhrsen 2022-05-10 21:28:56 UTC
This job is no longer failing at such a high rate. sippy [0] is showing that it's passing more than 50% of
the time. This PR [1] will revert the initial change that made these jobs optional in CNO and OVNK. Also,
it's good to see that the job is also a payload blocker again [2], as it was also moved to informing/optional
by TRT when it was failing so often.


[0] https://sippy.dptools.openshift.org/sippy-ng/jobs/4.11/analysis?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6%22%7D%5D%7D
[1] https://github.com/openshift/release/pull/28469
[2] https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/#4.11.0-0.nightly
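
The sippy link [0] carries its job filter as percent-encoded JSON in the query string, which is hard to read at a glance. A minimal sketch, using only the Python standard library, of how that filter could be decoded locally to confirm which job it selects. The function name decode_filters is illustrative and not part of sippy; the URL is copied from [0].

```python
import json
from urllib.parse import urlsplit, parse_qs

# URL copied from reference [0]; split across lines only for readability.
SIPPY_URL = (
    "https://sippy.dptools.openshift.org/sippy-ng/jobs/4.11/analysis"
    "?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C"
    "%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22"
    "periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6"
    "%22%7D%5D%7D"
)


def decode_filters(url):
    """Return the JSON object stored in the URL's 'filters' query parameter.

    Hypothetical helper for illustration; it only parses the URL and makes
    no network request.
    """
    query = urlsplit(url).query
    # parse_qs percent-decodes values and returns a list per parameter name.
    raw = parse_qs(query)["filters"][0]
    return json.loads(raw)


filters = decode_filters(SIPPY_URL)
for item in filters["items"]:
    # Each item is a single column/operator/value condition.
    print(item["columnField"], item["operatorValue"], item["value"])
```

The decoded filter holds a single condition: the job name must equal the periodic 4.11 e2e-metal-ipi-ovn-ipv6 job discussed in this bug.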

Comment 18 errata-xmlrpc 2022-08-10 11:06:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069