Description of problem: There is no error in prometheus-operator pod when giving retentionSize: 10 Version-Release number of selected component (if applicable): 4.11.0-0.ci.test-2022-04-11-064701-ci-ln-xzxri3t-latest PR:https://github.com/openshift/cluster-monitoring-operator/pull/1630 How reproducible: always Steps to Reproduce: % oc apply -f - <<EOF apiVersion: v1 kind: ConfigMap metadata: name: user-workload-monitoring-config namespace: openshift-user-workload-monitoring data: config.yaml: | prometheus: retentionSize: 10 EOF configmap/user-workload-monitoring-config configured % oc -n openshift-user-workload-monitoring get pod NAME READY STATUS RESTARTS AGE prometheus-operator-6cc998bd66-mlz2c 2/2 Running 0 26m % oc -n openshift-user-workload-monitoring logs prometheus-operator-6cc998bd66-mlz2c ------ level=info ts=2022-04-11T07:37:34.689907077Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus" level=info ts=2022-04-11T07:37:50.677822089Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus" level=info ts=2022-04-11T07:37:52.314321869Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus" level=info ts=2022-04-11T07:37:52.484480228Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus" level=info ts=2022-04-11T07:38:08.420120921Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus" Actual results: There is no error in prometheus-operator pod when giving retentionSize: 10 Expected results: There is error in prometheus-operator pod when giving retentionSize: 10 Additional info:
For latest ocp 4.11 without your pr, given invalid retention time value, will get error in the log
W0411 09:29:18.951211 1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: reconciling UserWorkload Prometheus object failed: updating Prometheus object failed: Prometheus.monitoring.coreos.com "user-workload" is invalid: spec.retentionSize: Invalid value: "10": spec.retentionSize in body should match '(^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$' I see error in cluster-monitoring-operator logs
Retention time and retention size should be verified at the same place.
We had added openapi validation at CRD level to retentionSize https://github.com/prometheus-operator/prometheus-operator/pull/4661 in prometheus-operator That change is synced to cmo as well as part of kube-prometheus update https://github.com/openshift/cluster-monitoring-operator/pull/1615/files. So RetentionSize now is validated by the API server. If the value is invalid CMO throws an error when applying the Prometheus spec, since its rejected prometheus-operator never even gets to see the value hence no logging is done there. Once https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged you will see same effect for retention as well which will log error to cmo log
For validate of retention time will be updated to be consistent with retention size, so I changed target release.
Test with ocp version 4.11.0-0.nightly-2022-04-12-072444 prometheus version 2.34.0 prometheus operator version 0.55.1
https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged https://github.com/openshift/cluster-monitoring-operator/pull/1643 once jsonnet dependencies synced time interval fields in prometheus will be logged to cmo logs itself
https://github.com/openshift/cluster-monitoring-operator/pull/1643 is merged which contains the retention time changes as well
Test with payload 4.11.0-0.nightly-2022-04-23-153426 When give invalid retention time and invalid retention size, error information of both is logged in CMO % oc -n openshift-monitoring logs cluster-monitoring-operator-bdb67f748-fwznk cluster-monitoring-operator .... W0424 02:55:21.677523 1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: reconciling UserWorkload Prometheus object failed: updating Prometheus object failed: Prometheus.monitoring.coreos.com "user-workload" is invalid: [spec.retention: Invalid value: "10": spec.retention in body should match '^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$', spec.retentionSize: Invalid value: "10": spec.retentionSize in body should match '(^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$'] I0424 02:55:21.693870 1 tasks.go:74] ran task 10 of 14: Updating prometheus-adapter
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069