Bug 2073937 - Invalid retention time and invalid retention size should be validated at one place and have error log in one place for UMW
Summary: Invalid retention time and invalid retention size should be validated at one ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.11
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: ---
: 4.11.0
Assignee: Jayapriya Pai
QA Contact: hongyan li
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-04-11 08:04 UTC by hongyan li
Modified: 2022-08-10 11:05 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:05:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:05:58 UTC

Description hongyan li 2022-04-11 08:04:01 UTC
Description of problem:
There is no error in prometheus-operator pod when giving retentionSize: 10

Version-Release number of selected component (if applicable):
4.11.0-0.ci.test-2022-04-11-064701-ci-ln-xzxri3t-latest
PR:https://github.com/openshift/cluster-monitoring-operator/pull/1630

How reproducible:
always

Steps to Reproduce:
% oc apply -f - <<EOF                                                     
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      retentionSize: 10
EOF
configmap/user-workload-monitoring-config configured

% oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-6cc998bd66-mlz2c   2/2     Running   0          26m
% oc -n openshift-user-workload-monitoring logs prometheus-operator-6cc998bd66-mlz2c
------
level=info ts=2022-04-11T07:37:34.689907077Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"
level=info ts=2022-04-11T07:37:50.677822089Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"
level=info ts=2022-04-11T07:37:52.314321869Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"
level=info ts=2022-04-11T07:37:52.484480228Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"
level=info ts=2022-04-11T07:38:08.420120921Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"

Actual results:
There is no error in prometheus-operator pod when giving retentionSize: 10


Expected results:
There is error in prometheus-operator pod when giving retentionSize: 10


Additional info:

Comment 1 hongyan li 2022-04-11 08:55:08 UTC
For latest ocp 4.11 without your pr, given invalid retention time value, will get error in the log

Comment 2 Jayapriya Pai 2022-04-11 09:41:28 UTC
W0411 09:29:18.951211       1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: reconciling UserWorkload Prometheus object failed: updating Prometheus object failed: Prometheus.monitoring.coreos.com "user-workload" is invalid: spec.retentionSize: Invalid value: "10": spec.retentionSize in body should match '(^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$'


I see error in cluster-monitoring-operator logs

Comment 3 hongyan li 2022-04-12 02:04:59 UTC
Retention time and retention size should be verified at the same place.

Comment 4 Jayapriya Pai 2022-04-12 08:22:29 UTC
We had added openapi validation at CRD level to retentionSize https://github.com/prometheus-operator/prometheus-operator/pull/4661 in prometheus-operator

That change is synced to cmo as well as part of kube-prometheus update https://github.com/openshift/cluster-monitoring-operator/pull/1615/files. So RetentionSize now is validated by the API server. If the value is invalid CMO throws an error when applying the Prometheus spec, since its rejected prometheus-operator never even gets to see the value hence no logging is done there.

Once https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged you will see same effect for retention as well which will log error to cmo log

Comment 5 hongyan li 2022-04-12 08:41:48 UTC
For validate of retention time will be updated to be consistent with retention size, so I changed target release.

Comment 6 hongyan li 2022-04-13 05:01:33 UTC
Test with ocp version 4.11.0-0.nightly-2022-04-12-072444
prometheus version 2.34.0
prometheus operator version 0.55.1

Comment 7 Jayapriya Pai 2022-04-18 04:44:59 UTC
https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged 
https://github.com/openshift/cluster-monitoring-operator/pull/1643 once jsonnet dependencies synced time interval fields in prometheus will be logged to cmo logs itself

Comment 8 Jayapriya Pai 2022-04-22 02:41:32 UTC
https://github.com/openshift/cluster-monitoring-operator/pull/1643 is merged which contains the retention time changes as well

Comment 10 hongyan li 2022-04-24 02:58:04 UTC
Test with payload 4.11.0-0.nightly-2022-04-23-153426

When give invalid retention time and invalid retention size, error information of both is logged in CMO

% oc -n openshift-monitoring logs cluster-monitoring-operator-bdb67f748-fwznk cluster-monitoring-operator
....
W0424 02:55:21.677523       1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: reconciling UserWorkload Prometheus object failed: updating Prometheus object failed: Prometheus.monitoring.coreos.com "user-workload" is invalid: [spec.retention: Invalid value: "10": spec.retention in body should match '^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$', spec.retentionSize: Invalid value: "10": spec.retentionSize in body should match '(^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$']
I0424 02:55:21.693870       1 tasks.go:74] ran task 10 of 14: Updating prometheus-adapter

Comment 14 errata-xmlrpc 2022-08-10 11:05:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069


Note You need to log in before you can comment on or make changes to this bug.