2073937 – Invalid retention time and invalid retention size should be validated at one place and have error log in one place for UMW

Bug 2073937 - Invalid retention time and invalid retention size should be validated at one place and have error log in one place for UMW

Summary: Invalid retention time and invalid retention size should be validated at one ...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	4.11.0
Assignee:	Jayapriya Pai
QA Contact:	hongyan li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2022-04-11 08:04 UTC by hongyan li
Modified:	2022-08-10 11:05 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-10 11:05:44 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2022:5069	0	None	None	None	2022-08-10 11:05:58 UTC

Description hongyan li 2022-04-11 08:04:01 UTC

Description of problem:
There is no error in prometheus-operator pod when giving retentionSize: 10

Version-Release number of selected component (if applicable):
4.11.0-0.ci.test-2022-04-11-064701-ci-ln-xzxri3t-latest
PR:https://github.com/openshift/cluster-monitoring-operator/pull/1630

How reproducible:
always

Steps to Reproduce:
% oc apply -f - <<EOF                                                     
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      retentionSize: 10
EOF
configmap/user-workload-monitoring-config configured

% oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-6cc998bd66-mlz2c   2/2     Running   0          26m
% oc -n openshift-user-workload-monitoring logs prometheus-operator-6cc998bd66-mlz2c
------
level=info ts=2022-04-11T07:37:34.689907077Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"
level=info ts=2022-04-11T07:37:50.677822089Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"
level=info ts=2022-04-11T07:37:52.314321869Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"
level=info ts=2022-04-11T07:37:52.484480228Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"
level=info ts=2022-04-11T07:38:08.420120921Z caller=operator.go:1228 component=prometheusoperator key=openshift-user-workload-monitoring/user-workload msg="sync prometheus"

Actual results:
There is no error in prometheus-operator pod when giving retentionSize: 10


Expected results:
There is error in prometheus-operator pod when giving retentionSize: 10


Additional info:

Comment 1 hongyan li 2022-04-11 08:55:08 UTC

For latest ocp 4.11 without your pr, given invalid retention time value, will get error in the log

Comment 2 Jayapriya Pai 2022-04-11 09:41:28 UTC

W0411 09:29:18.951211       1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: reconciling UserWorkload Prometheus object failed: updating Prometheus object failed: Prometheus.monitoring.coreos.com "user-workload" is invalid: spec.retentionSize: Invalid value: "10": spec.retentionSize in body should match '(^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$'


I see error in cluster-monitoring-operator logs

Comment 3 hongyan li 2022-04-12 02:04:59 UTC

Retention time and retention size should be verified at the same place.

Comment 4 Jayapriya Pai 2022-04-12 08:22:29 UTC

We had added openapi validation at CRD level to retentionSize https://github.com/prometheus-operator/prometheus-operator/pull/4661 in prometheus-operator

That change is synced to cmo as well as part of kube-prometheus update https://github.com/openshift/cluster-monitoring-operator/pull/1615/files. So RetentionSize now is validated by the API server. If the value is invalid CMO throws an error when applying the Prometheus spec, since its rejected prometheus-operator never even gets to see the value hence no logging is done there.

Once https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged you will see same effect for retention as well which will log error to cmo log

Comment 5 hongyan li 2022-04-12 08:41:48 UTC

For validate of retention time will be updated to be consistent with retention size, so I changed target release.

Comment 6 hongyan li 2022-04-13 05:01:33 UTC

Test with ocp version 4.11.0-0.nightly-2022-04-12-072444
prometheus version 2.34.0
prometheus operator version 0.55.1

Comment 7 Jayapriya Pai 2022-04-18 04:44:59 UTC

https://github.com/prometheus-operator/prometheus-operator/pull/4684 is merged 
https://github.com/openshift/cluster-monitoring-operator/pull/1643 once jsonnet dependencies synced time interval fields in prometheus will be logged to cmo logs itself

Comment 8 Jayapriya Pai 2022-04-22 02:41:32 UTC

https://github.com/openshift/cluster-monitoring-operator/pull/1643 is merged which contains the retention time changes as well

Comment 10 hongyan li 2022-04-24 02:58:04 UTC

Test with payload 4.11.0-0.nightly-2022-04-23-153426

When give invalid retention time and invalid retention size, error information of both is logged in CMO

% oc -n openshift-monitoring logs cluster-monitoring-operator-bdb67f748-fwznk cluster-monitoring-operator
....
W0424 02:55:21.677523       1 tasks.go:71] task 5 of 14: Updating Prometheus-user-workload failed: reconciling UserWorkload Prometheus object failed: updating Prometheus object failed: Prometheus.monitoring.coreos.com "user-workload" is invalid: [spec.retention: Invalid value: "10": spec.retention in body should match '^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$', spec.retentionSize: Invalid value: "10": spec.retentionSize in body should match '(^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$']
I0424 02:55:21.693870       1 tasks.go:74] ran task 10 of 14: Updating prometheus-adapter

Comment 14 errata-xmlrpc 2022-08-10 11:05:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Note You need to log in before you can comment on or make changes to this bug.