Bug 2015386 - Possibility to add labels to the built-in OCP alerts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.10.0
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-10-19 05:38 UTC by Vedanti Jaypurkar
Modified: 2024-12-20 21:26 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-12 04:39:16 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub: openshift/cluster-monitoring-operator pull 1439 (open): Bug 2015386: jsonnet: Add PodDisruptionBudget to KSM metric allow list (last updated 2021-10-19 11:24:09 UTC)
- GitHub: openshift/cluster-monitoring-operator pull 1516 (open): Bug 2015386: Enable PDB label metric (last updated 2021-12-17 17:51:50 UTC)
- Red Hat Product Errata: RHSA-2022:0056 (last updated 2022-03-12 04:39:32 UTC)

Comment 1 David J. M. Karlsen 2021-10-19 10:06:40 UTC
This will be particularly useful for PDBs. Today these only get the namespace label, but if arbitrary labels could be picked up from the PDB, it would make route-based alerting in Alertmanager much easier.
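For context, a minimal sketch of what such routing could look like in an Alertmanager configuration, assuming the alert carried a hypothetical label_team label picked up from the PDB (illustrative only, not an actual OCP config):

route:
  receiver: default
  routes:
  # deliver alerts that carry the team label sourced from the PDB
  - match:
      label_team: zookeeper
    receiver: zookeeper-oncall
receivers:
- name: default
- name: zookeeper-oncall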

Comment 2 Philip Gough 2021-10-20 13:14:30 UTC
Hi David, I've created a PR (linked) that exposes the PDB metrics via the allow-list.

However, if you take a look at the upstream documentation (https://github.com/kubernetes/kube-state-metrics/blob/master/docs/poddisruptionbudget-metrics.md), you can see that this particular resource has no additional labels available to bolt on.

We could treat this as the first step of an RFE: I can propose upstream that we add "kube_poddisruptionbudget_labels" and "kube_poddisruptionbudget_annotations" metrics, which I believe is the first step toward solving your requirements. These are already available on many other resources, such as namespaces: https://github.com/kubernetes/kube-state-metrics/blob/master/docs/namespace-metrics.md
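For illustration, such a series would follow the same shape as the existing per-namespace one; with a hypothetical user label team=payments on a PDB named my-pdb it would look roughly like:

kube_poddisruptionbudget_labels{namespace="default", poddisruptionbudget="my-pdb", label_team="payments"} 1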

Comment 4 David J. M. Karlsen 2021-10-27 18:57:28 UTC
OK, so what you are saying is that https://github.com/openshift/cluster-monitoring-operator/pull/1439/files#diff-b61f7d6e3529525eef15693c9529b4e065ac3e9d1af6308573e42e825fc1218bR37 won't expose the labels on the PDB, because KSM does not expose them: https://github.com/kubernetes/kube-state-metrics/blob/master/docs/poddisruptionbudget-metrics.md ?

Having kube_poddisruptionbudget_labels/annotations makes sense, as one can then provide labels and route on them in Alertmanager, which is very useful and is how we do routing.

Another option would be to "join" this metric with the namespace labels, so that one could simply label the namespace to obtain the routing - but that's not how the other alerts are designed in OCP, so I guess we don't want to go down that route?
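(For reference, a minimal sketch of such a join in PromQL, assuming a hypothetical team label on the namespace:

  kube_poddisruptionbudget_status_pod_disruptions_allowed
    * on (namespace) group_left (label_team)
    kube_namespace_labels

This copies label_team from kube_namespace_labels onto the PDB series, matching on the namespace label.)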

Comment 5 Philip Gough 2021-11-01 10:14:50 UTC
Hi David, yes, you are correct: KSM will expose no additional series or labels beyond those documented at https://github.com/kubernetes/kube-state-metrics/blob/v2.2.3/docs/poddisruptionbudget-metrics.md.

We have now merged https://github.com/kubernetes/kube-state-metrics/pull/1623 to move this RFE forward and expose those additional series. I'll also merge https://github.com/openshift/cluster-monitoring-operator/pull/1439.

As for the comment about the join: that would indeed work, but the majority of the alerts are pulled from upstream, and I don't think that is the road we want to go down to cover individual use cases. Hopefully that is understandable. I think the above changes, in combination with https://issues.redhat.com/browse/OBSDA-2, will allow you to tweak the alerts according to your specific needs.

Let me know if that satisfies this RFE and we can close it.

Thanks

Comment 6 David J. M. Karlsen 2021-11-01 10:27:35 UTC
I think this is as good as it can get at this stage, thanks!
This can be closed.

Comment 10 Philip Gough 2021-11-04 10:02:41 UTC
As mentioned, we need to wait for a release of KSM to be cut that includes https://github.com/kubernetes/kube-state-metrics/pull/1623 and pull it into our downstream fork before verifying this change.

Comment 11 Philip Gough 2021-11-18 14:37:50 UTC
Reassigning to @filip, since the final piece of this ticket requires cutting a new release of KSM, which is scheduled for mid-December. That, in conjunction with the ability to override the default alerts (https://github.com/openshift/enhancements/pull/958) and https://github.com/openshift/cluster-monitoring-operator/pull/1439, should provide the customer with the ability to achieve what they want and close the RFE.

Comment 13 Junqi Zhao 2021-12-20 11:20:17 UTC
Tested with 4.10.0-0.nightly-2021-12-18-034942: kube_poddisruptionbudget_annotations and kube_poddisruptionbudget_labels are added, but we could only see the PDB labels via kube_poddisruptionbudget_labels; the PDB annotations are not visible via kube_poddisruptionbudget_annotations.
# oc -n openshift-monitoring get deploy kube-state-metrics -oyaml | grep metric-labels-allowlist
        - --metric-labels-allowlist=pods=[*],nodes=[*],namespaces=[*],persistentvolumes=[*],persistentvolumeclaims=[*],poddisruptionbudgets=[*],poddisruptionbudget=[*]


# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep poddisruptionbudget
    "kube_poddisruptionbudget_annotations",
    "kube_poddisruptionbudget_labels",
    "kube_poddisruptionbudget_status_current_healthy",
    "kube_poddisruptionbudget_status_desired_healthy",
    "kube_poddisruptionbudget_status_expected_pods",
    "kube_poddisruptionbudget_status_observed_generation",
    "kube_poddisruptionbudget_status_pod_disruptions_allowed",

PDB file:
**********************
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
  annotations:
    imageregistry: "https://hub.docker.com/"
    contactor: help
  labels:
    app.kubernetes.io/component: zookeeper
    app.kubernetes.io/instance: main
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper
**********************
# oc -n default get pdb zk-pdb -oyaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    contactor: help
    imageregistry: https://hub.docker.com/
  creationTimestamp: "2021-12-20T10:50:51Z"
  generation: 1
  labels:
    app.kubernetes.io/component: zookeeper
    app.kubernetes.io/instance: main
  name: zk-pdb
  namespace: default
  resourceVersion: "211532"
  uid: ef4b4060-314c-46de-85fb-592b098c8c93
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper
status:
  conditions:
  - lastTransitionTime: "2021-12-20T10:50:51Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 0
  desiredHealthy: 2
  disruptionsAllowed: 0
  expectedPods: 0
  observedGeneration: 1
**********************
The PDB labels are visible via kube_poddisruptionbudget_labels:
kube_poddisruptionbudget_labels{container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", label_app_kubernetes_io_component="zookeeper", label_app_kubernetes_io_instance="main", namespace="default", poddisruptionbudget="zk-pdb", service="kube-state-metrics"} 1

The PDB annotations cannot be found via kube_poddisruptionbudget_annotations:
kube_poddisruptionbudget_annotations{container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="default", poddisruptionbudget="zk-pdb", service="kube-state-metrics"}  1

Also found: we cannot get annotations from the other kube_*_annotations metrics either, for example kube_daemonset_annotations and kube_deployment_annotations:
# oc -n openshift-monitoring get ds node-exporter -o jsonpath="{.metadata.annotations}"
{"deprecated.daemonset.template.generation":"1"}

Result from Prometheus:
kube_daemonset_annotations{container="kube-rbac-proxy-main", daemonset="node-exporter", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", service="kube-state-metrics"} 1

# oc -n openshift-monitoring get deploy cluster-monitoring-operator -o jsonpath="{.metadata.annotations}"
{"deployment.kubernetes.io/revision":"1","include.release.openshift.io/self-managed-high-availability":"true","include.release.openshift.io/single-node-developer":"true"}

Result from Prometheus:
kube_deployment_annotations{container="kube-rbac-proxy-main", deployment="cluster-monitoring-operator", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", prometheus="openshift-monitoring/k8s", service="kube-state-metrics"}  1
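
A likely explanation, not confirmed in this ticket: in kube-state-metrics v2, annotation labels are gated behind a separate allowlist flag, which is not set on this deployment, e.g.:

        - --metric-annotations-allowlist=poddisruptionbudgets=[*]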

Comment 15 Simon Pasquier 2021-12-22 09:56:02 UTC
@Junqi it is expected that the annotations aren't present on kube_poddisruptionbudget_annotations; we chose to expose the kube_poddisruptionbudget_labels values only, which should be enough to filter on.
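A minimal sketch of such a filter in PromQL, joining the PDB labels from comment 13 onto one of the status series:

  kube_poddisruptionbudget_status_pod_disruptions_allowed
    * on (namespace, poddisruptionbudget) group_left (label_app_kubernetes_io_component)
    kube_poddisruptionbudget_labels{label_app_kubernetes_io_component="zookeeper"}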

Comment 16 Junqi Zhao 2021-12-22 11:03:15 UTC
based on Comment 14 and 15, set to VERIFIED

Comment 17 Junqi Zhao 2021-12-22 11:04:00 UTC
(In reply to Junqi Zhao from comment #16)
> based on Comment 14 and 15, set to VERIFIED

change to
based on Comment 13 and 15, set to VERIFIED

Comment 21 errata-xmlrpc 2022-03-12 04:39:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

