Description of problem:
With Kuryr, CNI requests can take considerable time because they have to wait for a VIF from Neutron. We've seen KuryrCNISlow warning alerts being raised and reported by the following test: "Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured". This test failure causes the Kuryr upgrade to fail.

How reproducible:
Upgrade from OCP 4.9 to OCP 4.10.
Verified with the following steps:
- Installed OCP 4.10.0-0.nightly-2022-04-27-212741 on top of RHOS-16.1-RHEL-8-20220329.n.1 with Kuryr.
- Made sure the cluster is up and that the Watchdog and AlertmanagerReceiversNotConfigured alerts exist:
```
(shiftstack) [stack@undercloud-0 ~]$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname'
"Watchdog"
"NodeClockNotSynchronising"
"NodeClockNotSynchronising"
"NodeClockNotSynchronising"
"NodeClockNotSynchronising"
"NodeClockNotSynchronising"
"APIRemovedInNextEUSReleaseInUse"
"APIRemovedInNextEUSReleaseInUse"
"AlertmanagerReceiversNotConfigured"
```
- Upgraded successfully to 4.11.0-0.nightly-2022-04-26-181148 using the upgrade command:
```
$ oc adm upgrade --to-image="registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-04-26-181148" --allow-explicit-upgrade --force=true
```
- Made sure the cluster is up.
- Checked the alerts: the Watchdog and AlertmanagerReceiversNotConfigured alerts exist, but KuryrCNISlow does not:
```
(shiftstack) [stack@undercloud-0 ~]$ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname'
"NodeClockNotSynchronising"
"NodeClockNotSynchronising"
"NodeClockNotSynchronising"
"NodeClockNotSynchronising"
"NodeClockNotSynchronising"
"AlertmanagerReceiversNotConfigured"
"Watchdog"
```
- Kept checking the alerts to make sure KuryrCNISlow is not raised.
- Destroyed the cluster and created a new one with OCP 4.11.0-0.nightly-2022-04-26-181148.
- Kept checking the alerts to make sure KuryrCNISlow is not raised.
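The manual alert checks in the verification steps above (list the alert names, confirm that only the allowed ones are firing, and keep watching for KuryrCNISlow) could be scripted. The following is a minimal sketch, assuming bash, `curl`, and `jq` are available and that the Prometheus route and `$token` are the same as in the commands above. The function names `firing_alert_names`, `unexpected_alerts`, `kuryr_cni_slow_raised`, and `watch_for_kuryr_cni_slow`, as well as the default polling interval and count, are hypothetical and not part of OpenShift or its e2e suite. Unlike the commands above, it filters on `.state == "firing"`, because the Prometheus alerts API also returns pending alerts while the test only counts firing ones.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OpenShift or its e2e suite) for repeating
# the manual alert checks from the verification steps.

# Print the names of firing alerts, one per line.
# $1: Prometheus alerts API URL, $2: bearer token.
# -k skips TLS verification, as in the original curl commands.
firing_alert_names() {
  curl -sk -H "Authorization: Bearer $2" "$1" \
    | jq -r '.data.alerts[] | select(.state == "firing") | .labels.alertname // empty'
}

# Read alert names on stdin and print, de-duplicated, any that are not in the
# test's allow-list. Return 1 if any were found, 0 otherwise.
unexpected_alerts() {
  local found
  found=$(grep -Ev '^(Watchdog|AlertmanagerReceiversNotConfigured)$' | sort -u)
  [ -z "$found" ] && return 0
  printf '%s\n' "$found"
  return 1
}

# Return 0 if KuryrCNISlow appears among the alert names on stdin.
kuryr_cni_slow_raised() {
  grep -qx 'KuryrCNISlow'
}

# Poll the alerts API $4 times (default 30), $3 seconds apart (default 60),
# and return 1 as soon as KuryrCNISlow is firing.
watch_for_kuryr_cni_slow() {
  local url="$1" token="$2" interval="${3:-60}" count="${4:-30}" i=0
  while [ "$i" -lt "$count" ]; do
    if firing_alert_names "$url" "$token" | kuryr_cni_slow_raised; then
      echo "KuryrCNISlow raised on check $((i + 1))" >&2
      return 1
    fi
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
  echo "KuryrCNISlow not raised in $count checks"
}
```

For example, `firing_alert_names "$url" "$token" | unexpected_alerts` lists what would make the test fail (here it would print NodeClockNotSynchronising if that alert is firing), and `watch_for_kuryr_cni_slow "$url" "$token" 60 30` replaces the "keep checking" steps. The network calls are kept in one function so the filtering logic can be checked against saved output without a cluster.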
A similar issue is seen with version 4.8.45.

Description of problem:
The test "[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" fails consistently on the latest 4.8.45 build.

Version-Release number of selected component (if applicable):
```
[root@rdr-zscurst-348a-bastion-0 ~]# oc version
Client Version: 4.8.44
Server Version: 4.8.45
Kubernetes Version: v1.21.11+6b3cbdd
```

How reproducible:
Deploy the new 4.8.45 build on the Power platform and run the e2e tests.

Actual results:
The test fails.
```
Flaky invariants:
[sig-arch] Monitor cluster while tests execute

Failing tests:
[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
```

Expected results:
The test should pass without errors.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069