Bug 2010663 - OpenShift Alerting Rules Style-Guide Compliance (ovn-kubernetes subcomponent)
Summary: OpenShift Alerting Rules Style-Guide Compliance (ovn-kubernetes subcomponent)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: All
OS: All
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.10.0
Assignee: Martin Kennelly
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-05 10:10 UTC by Brad Ison
Modified: 2022-03-28 19:45 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:16:32 UTC
Target Upstream Version:
Embargoed:
mkennell: needinfo-



Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 1246 0 None open Bug 2010663: OVN-K alerts: conform to monitoring team style guide 2021-12-09 15:08:51 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:16:43 UTC

Description Brad Ison 2021-10-05 10:10:20 UTC
Hello,

The OpenShift Monitoring Team has published a set of guidelines for
writing alerting rules in OpenShift, including a basic style guide.
You can find these here:

  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md
  https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide

A subset of these are now being enforced in OpenShift End-to-End
tests [1], with temporary exceptions for existing non-compliant rules.

This component was found to have the following issues:

* Alerts without summary and/or description annotations:

  - NetworkPodsCrashLooping
  - NoOvnMasterLeader
  - NoRunningOvnMaster
  - NodeWithoutOVNKubeNodePodRunning
  - NorthboundStale
  - SouthboundStale
  - V4SubnetAllocationThresholdExceeded
  - V6SubnetAllocationThresholdExceeded

Alerts MUST include summary and description annotations.

Think of summary as the first line of a commit message, or an email
subject line. It should be brief but informative. The description is
the longer, more detailed explanation of the alert.

The enhancement document linked above has examples of alerts with
these annotations.
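
As a rough illustration of the requirement above, the sketch below flags
alerting rules that lack a summary or description annotation. It is not the
actual end-to-end test referenced in [1]. The function name
`missing_annotations`, the group name, the placeholder expressions, and the
annotation text are all assumptions made for the example; only the alert
names come from the list above.

```python
# Hypothetical checker; NOT the real openshift/origin end-to-end test.
REQUIRED_ANNOTATIONS = ("summary", "description")


def missing_annotations(rule_groups):
    """Map each alert name to the required annotations it lacks."""
    problems = {}
    for group in rule_groups:
        for rule in group.get("rules", []):
            alert = rule.get("alert")
            if alert is None:
                continue  # recording rules have no "alert" key; skip them
            annotations = rule.get("annotations") or {}
            missing = [
                key for key in REQUIRED_ANNOTATIONS
                if not (annotations.get(key) or "").strip()
            ]
            if missing:
                problems[alert] = missing
    return problems


# Illustrative data only: expressions and annotation text are placeholders.
example_groups = [
    {
        "name": "example.rules",
        "rules": [
            {
                # No annotations at all, so both keys are reported missing.
                "alert": "NoRunningOvnMaster",
                "expr": "vector(1)",
                "labels": {"severity": "warning"},
            },
            {
                # Carries both required annotations, so it is not reported.
                "alert": "NorthboundStale",
                "expr": "vector(1)",
                "labels": {"severity": "warning"},
                "annotations": {
                    "summary": "Placeholder one-line summary.",
                    "description": "Placeholder longer explanation.",
                },
            },
            {"record": "example:recording_rule", "expr": "vector(1)"},
        ],
    }
]

print(missing_annotations(example_groups))
# -> {'NoRunningOvnMaster': ['summary', 'description']}
```

The input mirrors the `groups` list of a Prometheus rule file after it has
been parsed into plain dictionaries, so the same function could be pointed at
real rule data once loaded.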

Thank you!

Repo: openshift/cluster-network-operator (ovn-kubernetes subcomponent)

[1]: https://github.com/openshift/origin/commit/097e7a6

Comment 1 Martin Kennelly 2021-12-09 12:51:14 UTC
Do we need to rename existing metrics so that they are prefixed with the component name? I see that's a requirement in the style guide.
We could break any client or consumer scripts if we do this.

Comment 2 Brad Ison 2021-12-10 11:00:37 UTC
No, that's not a strict requirement. No need to change existing metric names in this case.

Comment 6 Martin Kennelly 2022-01-21 11:08:21 UTC
@mik

Comment 7 Martin Kennelly 2022-01-21 11:11:15 UTC
@mifiedle hey, I didn't see it in the changelog either, but I launched the latest nightly (4.10.0-0.nightly-2022-01-21-074618) and I could see the results of my PR in OpenShift console -> observability -> alerts -> alerts rules.

Comment 8 Mike Fiedler 2022-01-26 16:40:00 UTC
Verified on 4.10.0-0.nightly-2022-01-25-023600 via Alert Rule inspection in the console.

Note: the bz description references NetworkPodsCrashLooping, but no rule with that name exists in the console, nor was one changed in the PR for this bug.

Comment 11 errata-xmlrpc 2022-03-10 16:16:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

