Description of problem:
Parsing of catalogsource values is not done.

Version-Release number of selected component (if applicable):
Openshift 4.7

How reproducible:
Create a catalogsource with the following yaml, example of incorrect input: "interval: 45mError code"

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: ibm-operator-catalog
  publisher: IBM Content
  sourceType: grpc
  image: docker.io/ibmcom/ibm-operator-catalog
  updateStrategy:
    registryPoll:
      interval: 45mError code

The catalog source gets created but the marketplace operator cannot handle the invalid interval and so the logs are full with this error:

reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1alpha1.CatalogSource: v1alpha1.CatalogSourceList.Items: []v1alpha1.CatalogSource: v1alpha1.CatalogSource.v1alpha1.CatalogSource.Spec: v1alpha1.CatalogSourceSpec.UpdateStrategy: v1alpha1.UpdateStrategy.RegistryPoll: v1alpha1.RegistryPoll.Interval: unmarshalerDecoder: time: unknown unit "mCopy code" in duration "45mError code", error found in #10 byte of ...|Copy code"}}}},{"api|..., bigger context ...|rategy":{"registryPoll":{"interval":"45mCopy code"}}}},{"apiVersion":"operators.coreos.com/v1alpha1"|...

Actual results:
No parsing. All of the catalog source pods are constantly being recreated as soon as they become ready which means no operators are available in the operator hub. There should be a validating webhook on catalogsources to catch this sort of issue, to prevent cluster errors.

Expected results:
Parsing of expected values.

Additional info:
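For illustration, here is a minimal Go sketch of why a single malformed interval can make the whole list call fail, as the "Failed to list" log line suggests. The `strictDuration` and `catalog` types and the `decodeList` helper are hypothetical stand-ins, not OLM's actual types; the only assumption about the real code is that the interval is decoded through a Go duration parser, which the error text indicates.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// strictDuration is a hypothetical stand-in for a duration field whose JSON
// decoder rejects anything time.ParseDuration cannot parse.
type strictDuration struct {
	time.Duration
}

// UnmarshalJSON decodes a JSON string such as "45m" into a duration.
func (d *strictDuration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	d.Duration = parsed
	return nil
}

// catalog is a simplified, hypothetical item type, not the real CatalogSource.
type catalog struct {
	Name     string         `json:"name"`
	Interval strictDuration `json:"interval"`
}

// decodeList decodes a whole list in one call. If any item fails to decode,
// no items are returned at all.
func decodeList(data []byte) ([]catalog, error) {
	var items []catalog
	if err := json.Unmarshal(data, &items); err != nil {
		return nil, err
	}
	return items, nil
}

func main() {
	good := []byte(`[{"name":"a","interval":"45m"},{"name":"b","interval":"10m"}]`)
	bad := []byte(`[{"name":"a","interval":"45m"},{"name":"b","interval":"45mError code"}]`)

	items, err := decodeList(good)
	fmt.Println(len(items), err)

	// The valid item "a" is lost too, because the list is decoded as a unit.
	items, err = decodeList(bad)
	fmt.Println(len(items), err)
}
```

In this sketch the valid entry is dropped along with the invalid one, which is one plausible way a single bad object can affect a consumer that lists every catalog source.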
I don't think that adding a validating webhook is something that would be accepted as a backportable patch, but it certainly seems like other catalog sources shouldn't be forcefully recreated in this case. It's probably reasonable for us to backport a fix for that issue and include something on the status, and try to come up with some more holistic upfront validation in a future release.
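As a rough idea of what upfront validation could check, here is a minimal Go sketch. The `validateInterval` function is hypothetical and is not part of OLM; the rule that the interval must be positive is an extra assumption added for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// validateInterval is a hypothetical admission-time check for
// spec.updateStrategy.registryPoll.interval. It is not an OLM function.
func validateInterval(raw string) error {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return fmt.Errorf("spec.updateStrategy.registryPoll.interval: %w", err)
	}
	// Assumption for this sketch: a zero or negative interval is rejected.
	if d <= 0 {
		return fmt.Errorf("spec.updateStrategy.registryPoll.interval: must be positive, got %s", d)
	}
	return nil
}

func main() {
	for _, raw := range []string{"45m", "45mError code", "-5m"} {
		if err := validateInterval(raw); err != nil {
			fmt.Println("reject:", err)
			continue
		}
		fmt.Println("accept:", raw)
	}
}
```

A check like this would reject the object at creation time instead of letting the controllers discover the problem later.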
I think the focus at least should be that this should not affect other catalogue sources in case any syntax issue is done. Let me know if you need anything else.
Change is open, pull requests still need review
PR has been merged - ready for QA
1, Create an OCP cluster that contains the fixed PR: https://github.com/openshift/operator-framework-olm/pull/279

mac:~ jianzhang$ oc adm release info registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-04-11-055105 -a .dockerconfigjson --commits|grep olm
  operator-lifecycle-manager  https://github.com/openshift/operator-framework-olm  ed7cf0db6fe1f5e91990ca2c02593ba7d1e3cc2e
  operator-registry           https://github.com/openshift/operator-framework-olm  ed7cf0db6fe1f5e91990ca2c02593ba7d1e3cc2e

mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-11-055105   True        False         42m     Cluster version is 4.11.0-0.nightly-2022-04-11-055105

2, Create a CatalogSource that contains the syntax issue, as follows,

mac:~ jianzhang$ cat cs-issue.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: ibm-operator-catalog
  publisher: IBM Content
  sourceType: grpc
  image: docker.io/ibmcom/ibm-operator-catalog
  updateStrategy:
    registryPoll:
      interval: 45mError code

mac:~ jianzhang$ oc create -f cs-issue.yaml
catalogsource.operators.coreos.com/ibm-operator-catalog created

3, Check the marketplace-operator logs and the CatalogSource status.
I can find some errors in the marketplace-operator logs, but there is no reason/message in the status of the faulty catalogsource, as follows:

W0411 07:02:07.511359 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1alpha1.CatalogSource: time: unknown unit "mError code" in duration "45mError code"
E0411 07:02:07.511403 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1alpha1.CatalogSource: failed to list *v1alpha1.CatalogSource: time: unknown unit "mError code" in duration "45mError code"

mac:~ jianzhang$ oc get catalogsource ibm-operator-catalog -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2022-04-11T07:01:30Z"
  generation: 1
  name: ibm-operator-catalog
  namespace: openshift-marketplace
  resourceVersion: "41957"
  uid: 8c217d55-44ca-45e2-829a-da0f90c2d9a4
spec:
  displayName: ibm-operator-catalog
  image: docker.io/ibmcom/ibm-operator-catalog
  publisher: IBM Content
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 45mError code
status:
  connectionState:
    address: ibm-operator-catalog.openshift-marketplace.svc:50051
    lastConnect: "2022-04-11T07:01:55Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2022-04-11T07:19:47Z"
  registryService:
    createdAt: "2022-04-11T07:01:31Z"
    port: "50051"
    protocol: grpc
    serviceName: ibm-operator-catalog
    serviceNamespace: openshift-marketplace

Change the status to ASSIGNED.

PS:
> I think the focus at least should be that this should not affect other catalogue sources in case any syntax issue is done.
> We should preferably return an error here, but the immediate fix would be to ensure that this doesn't cause problems for other catalogs on the cluster.

I also tried to test this bug on a cluster without the fixed PR, but I couldn't reproduce it. The other catalogsources work well.
mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-08-205307   True        False         7h59m   Cluster version is 4.11.0-0.nightly-2022-04-08-205307

1, Create the faulty catalogsource.

mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace
NAME                   DISPLAY                TYPE   PUBLISHER      AGE
certified-operators    Certified Operators    grpc   Red Hat        8h
community-operators    Community Operators    grpc   Red Hat        8h
ibm-operator-catalog   ibm-operator-catalog   grpc   IBM Content    67m
qe-app-registry        Production Operators   grpc   OpenShift QE   7h57m
redhat-marketplace     Red Hat Marketplace    grpc   Red Hat        8h
redhat-operators       Red Hat Operators      grpc   Red Hat        8h

mac:~ jianzhang$ oc get catalogsource -n openshift-marketplace ibm-operator-catalog -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2022-04-11T06:30:17Z"
  generation: 1
  name: ibm-operator-catalog
  namespace: openshift-marketplace
  resourceVersion: "195259"
  uid: eb81a878-7bf3-459e-bd50-c70bc04fc179
spec:
  displayName: ibm-operator-catalog
  image: docker.io/ibmcom/ibm-operator-catalog
  publisher: IBM Content
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 45mError code
status:
  connectionState:
    address: ibm-operator-catalog.openshift-marketplace.svc:50051
    lastConnect: "2022-04-11T06:36:51Z"
    lastObservedState: READY
  latestImageRegistryPoll: "2022-04-11T07:24:49Z"
  registryService:
    createdAt: "2022-04-11T06:30:17Z"
    port: "50051"
    protocol: grpc
    serviceName: ibm-operator-catalog
    serviceNamespace: openshift-marketplace

mac:~ jianzhang$ oc get pods -n openshift-marketplace
NAME                                                              READY   STATUS      RESTARTS   AGE
02e3403ec6e01f1c5f1ed01afd671c795f8412982e73971271b39b1d16rh5tc   0/1     Completed   0          7h56m
725fb0713557581cb01780e1cdfbc0d7492ca604a54b9e773fb39be15ewdqdl   0/1     Completed   0          7h56m
certified-operators-9l2bc                                         1/1     Running     0          59s
community-operators-5vmwt                                         1/1     Running     0          8h
e8c9651078ae45ddb2807e3a07727d459b82d7def5572a7b7ccaae332b2lxqd   0/1     Completed   0          4m22s
ibm-operator-catalog-7mr5q                                        1/1     Running     0          66m
marketplace-operator-59f5d78dcf-ddsk9                             1/1     Running     0          8h
qe-app-registry-5j67j                                             1/1     Running     0          6h38m
redhat-marketplace-bdxhj                                          1/1     Running     0          8h
redhat-operators-5j5cq                                            1/1     Running     0          8h

> Actual results: No parsing. All of the catalog source pods are constantly being recreated as soon as they become ready which means no operators are available in the operator hub.

Sorry, I did not encounter this. It seems that all the other catalog source pods worked well. I tried subscribing to an operator provided by another catalogsource, and it worked well.

mac:~ jianzhang$ oc get sub -n jian
NAME         PACKAGE   SOURCE                CHANNEL
etcd-0.9.4   etcd      community-operators   singlenamespace-alpha

mac:~ jianzhang$ oc get ip -n jian
NAME            CSV                   APPROVAL    APPROVED
install-w78hb   etcdoperator.v0.9.4   Automatic   true

mac:~ jianzhang$ oc get csv -n jian
NAME                               DISPLAY                            VERSION     REPLACES              PHASE
elasticsearch-operator.5.4.0-143   OpenShift Elasticsearch Operator   5.4.0-143                         Succeeded
etcdoperator.v0.9.4                etcd                               0.9.4       etcdoperator.v0.9.2   Succeeded
@pegoncal looks like the PR https://github.com/openshift/operator-framework-olm/pull/279/commits doesn't contain the commit for the fix for this bz (https://github.com/operator-framework/operator-lifecycle-manager/pull/2447/commits). So the fix hasn't been pulled downstream yet from upstream, and there needs to be another downstream sync PR that pulls in the commits for the fix.

I verified the test Jian was running works as expected on olm's main branch:

```
status:
  connectionState:
    address: operatorhubio-catalog.olm.svc:50051
    lastConnect: "2022-04-11T12:27:29Z"
    lastObservedState: READY
  message: 'error parsing spec.updateStrategy.registryPoll.interval. Using the default value of 15m0s instead. Error: time: unknown unit "mError code" in duration "45mError code"'
  reason: InvalidIntervalError
  registryService:
    createdAt: "2022-04-11T12:27:01Z"
    port: "50051"
    protocol: grpc
    serviceName: operatorhubio-catalog
    serviceNamespace: olm
```
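The status above implies a "fall back to the default and report it" behavior. A minimal Go sketch of that pattern follows. It is not the code from the upstream PR: the `resolveInterval` function and the `pollStatus` type are hypothetical names, and only the 15m default, the `InvalidIntervalError` reason and the message wording are taken from the status output.

```go
package main

import (
	"fmt"
	"time"
)

// defaultInterval mirrors the 15m0s default mentioned in the status message.
const defaultInterval = 15 * time.Minute

// pollStatus is a hypothetical container for the reason and message fields.
type pollStatus struct {
	Reason  string
	Message string
}

// resolveInterval parses the raw interval. On failure it returns the default
// interval and a status describing the problem instead of returning an error.
func resolveInterval(raw string) (time.Duration, pollStatus) {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return defaultInterval, pollStatus{
			Reason: "InvalidIntervalError",
			Message: fmt.Sprintf(
				"error parsing spec.updateStrategy.registryPoll.interval. Using the default value of %s instead. Error: %s",
				defaultInterval, err),
		}
	}
	return d, pollStatus{}
}

func main() {
	for _, raw := range []string{"45m", "45mError code"} {
		interval, status := resolveInterval(raw)
		fmt.Printf("raw=%q interval=%s reason=%q\n", raw, interval, status.Reason)
		if status.Message != "" {
			fmt.Println("  message:", status.Message)
		}
	}
}
```

The design choice in this sketch is that a bad value degrades one catalog source to the default polling interval and surfaces the cause on its status, rather than failing outright.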
I must have done something wrong. I'm really sorry =( I'm pulling it in in this sync PR: https://github.com/openshift/operator-framework-olm/pull/285
1, Create an OCP 4.11 cluster that contains the fixed PR.

mac:~ jianzhang$ oc adm release info registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-04-14-080015 -a .dockerconfigjson --commits|grep olm
  operator-lifecycle-manager  https://github.com/openshift/operator-framework-olm  698c23184c1c3440dc2f591be7ecf3d99fb0d227
  operator-registry           https://github.com/openshift/operator-framework-olm  698c23184c1c3440dc2f591be7ecf3d99fb0d227

mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-14-080015   True        False         12m     Cluster version is 4.11.0-0.nightly-2022-04-14-080015

2, Create the faulty catalogsource.

mac:~ jianzhang$ oc create -f cs-issue.yaml
catalogsource.operators.coreos.com/ibm-operator-catalog created

mac:~ jianzhang$ cat cs-issue.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog
  namespace: openshift-marketplace
spec:
  displayName: ibm-operator-catalog
  publisher: IBM Content
  sourceType: grpc
  image: docker.io/ibmcom/ibm-operator-catalog
  updateStrategy:
    registryPoll:
      interval: 45mError code

3, Check the marketplace-operator logs and the CatalogSource status.

W0414 10:12:28.684940 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1alpha1.CatalogSource: time: unknown unit "mError code" in duration "45mError code"
E0414 10:12:28.684970 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1alpha1.CatalogSource: failed to list *v1alpha1.CatalogSource: time: unknown unit "mError code" in duration "45mError code"
time="2022-04-14T10:12:43Z" level=info msg="[status] Previous and current ClusterOperator Status are the same, the ClusterOperator Status will not be updated."
mac:~ jianzhang$ oc get catalogsource ibm-operator-catalog -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2022-04-14T10:11:47Z"
  generation: 1
  name: ibm-operator-catalog
  namespace: openshift-marketplace
  resourceVersion: "33761"
  uid: 358507b1-effe-456f-b6d6-5739acd5921f
spec:
  displayName: ibm-operator-catalog
  image: docker.io/ibmcom/ibm-operator-catalog
  publisher: IBM Content
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 45mError code
status:
  connectionState:
    address: ibm-operator-catalog.openshift-marketplace.svc:50051
    lastConnect: "2022-04-14T10:12:10Z"
    lastObservedState: READY
  message: 'error parsing spec.updateStrategy.registryPoll.interval. Using the default value of 15m0s instead. Error: time: unknown unit "mError code" in duration "45mError code"'
  reason: InvalidIntervalError
  registryService:
    createdAt: "2022-04-14T10:11:47Z"
    port: "50051"
    protocol: grpc
    serviceName: ibm-operator-catalog
    serviceNamespace: openshift-marketplace

I can see the error message in the status. LGTM, marking it as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069