Bug 2098508 - Control-plane-machine-set-operator report panic
Summary: Control-plane-machine-set-operator report panic
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Joel Speed
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-20 02:55 UTC by sunzhaohua
Modified: 2022-08-10 11:18 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:18:42 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-control-plane-machine-set-operator pull 48 | 0 | None | open | Bug 2098508: Resolve panics and fix webhooks and logs and watches | 2022-06-20 09:07:36 UTC
Github | openshift cluster-control-plane-machine-set-operator pull 49 | 0 | None | open | Bug 2098508: Remove duplicate service manifest | 2022-06-21 08:14:13 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:18:56 UTC

Description sunzhaohua 2022-06-20 02:55:33 UTC
Description of problem:
The ControlPlaneMachineSet (CPMS) name, spec.replicas, and selector can all be changed, and the control-plane-machine-set-operator panics.

Version-Release number of selected component (if applicable):
Not in payload yet

How reproducible:
Always

Steps to Reproduce:
1. Create ControlPlaneMachineSet
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: "2022-06-17T09:54:40Z"
  generation: 2
  name: cluster
  namespace: openshift-machine-api
  resourceVersion: "108717"
  uid: a82d5430-c764-4cb3-a523-4da1d3d42707
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  strategy:
    type: RollingUpdate
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        aws:
        - placement:
            availabilityZone: us-east-2a
          subnet:
            filters:
            - name: tag:Name
              values:
              - weinliu4117-h6gsk-private-us-east-2a
            type: filters
        - placement:
            availabilityZone: us-east-2b
          subnet:
            filters:
            - name: tag:Name
              values:
              - weinliu4117-h6gsk-private-us-east-2b
            type: filters
        - placement:
            availabilityZone: us-east-2c
          subnet:
            filters:
            - name: tag:Name
              values:
              - weinliu4117-h6gsk-private-us-east-2c
            type: filters
        platform: AWS
      metadata: {}
      spec:
        providerSpec:
          value:
            ami:
              id: ami-01990fc3bdf30bc13
            apiVersion: machine.openshift.io/v1beta1
            blockDevices:
            - ebs:
                encrypted: true
                iops: 0
                kmsKey:
                  arn: ""
                volumeSize: 120
                volumeType: gp3
            credentialsSecret:
              name: aws-cloud-credentials
            deviceIndex: 0
            iamInstanceProfile:
              id: weinliu4117-h6gsk-master-profile
            instanceType: m6i.xlarge
            kind: AWSMachineProviderConfig 
            loadBalancers:
            - name: weinliu4117-h6gsk-int
              type: network
            - name: weinliu4117-h6gsk-ext
              type: network
            metadata:
              creationTimestamp: null
            metadataServiceOptions: {}
            placement:
              region: us-east-2
            securityGroups:
            - filters:
              - name: tag:Name
                values:
                - weinliu4117-h6gsk-master-sg
            tags:
            - name: kubernetes.io/cluster/weinliu4117-h6gsk
              value: owned
            userDataSecret:
              name: master-user-data
2. Change the CPMS name, spec.replicas, and selector

Actual results:
The CPMS name, spec.replicas, and selector can all be changed, and the control-plane-machine-set-operator panics.

$ oc create -f ~/data/master/controlplanemachineset.yaml               
controlplanemachineset.machine.openshift.io/cluster created 
$ oc create -f ~/data/master/controlplanemachineset.yaml              
controlplanemachineset.machine.openshift.io/cluster1 created
$ oc get controlplanemachineset                                                                                                    
NAME       DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   AGE
cluster    3                                                   2m23s
cluster1   5                                                   67s


1.6554603373737745e+09    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference    {"controller": "controlplanemachineset", "controllerGroup": "machine.openshift.io", "controllerKind": "ControlPlaneMachineSet", "controlPlaneMachineSet": {"name":"cluster","namespace":"openshift-machine-api"}, "namespace": "openshift-machine-api", "name": "cluster", "reconcileID": "2e45d8a7-939f-480d-80af-c9b7c35548f1"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x140812d]

goroutine 402 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x1560a80, 0x24bb970})
    /usr/lib/golang/src/runtime/panic.go:838 +0x207
github.com/openshift/cluster-control-plane-machine-set-operator/pkg/controllers/controlplanemachineset.(*ControlPlaneMachineSetReconciler).patchClusterOperatorConditions(0xc00025a000, {0x19a8c30, 0xc00049e240}, {{0x19aa660?, 0xc00049e270?}, 0x24dde40?}, 0xc0006a6340, {0xc0003c5560, 0x1, 0x1})
    /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/controllers/controlplanemachineset/cluster_operator.go:136 +0x64d
github.com/openshift/cluster-control-plane-machine-set-operator/pkg/controllers/controlplanemachineset.(*ControlPlaneMachineSetReconciler).updateClusterOperatorStatus(0xc00025a000?, {0x19a8c30, 0xc00049e240}, {{0x19aa660?, 0xc00049e270?}, 0x173b01a?}, 0xc0006a6000)
    /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/controllers/controlplanemachineset/cluster_operator.go:84 +0x768
github.com/openshift/cluster-control-plane-machine-set-operator/pkg/controllers/controlplanemachineset.(*ControlPlaneMachineSetReconciler).Reconcile(0xc00025a000, {0x19a8c30, 0xc00049e240}, {{{0x174b22d?, 0x10?}, {0x173b01a?, 0x413c87?}}})
    /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/controllers/controlplanemachineset/controller.go:135 +0x5cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x19a8b88?, {0x19a8c30?, 0xc00049e240?}, {{{0x174b22d?, 0x16787e0?}, {0x173b01a?, 0x409514?}}})
    /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000120000, {0x19a8b88, 0xc0004add80}, {0x15b9ea0?, 0xc00002c980?})
    /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000120000, {0x19a8b88, 0xc0004add80})
    /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
    /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:230 +0x325 
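The trace above places the nil pointer dereference inside patchClusterOperatorConditions, but it does not show which value was nil. As a general illustration only, and not the change made in the linked pull requests, a function of this kind can return an error instead of dereferencing a nil pointer. The clusterOperator type and patchConditions function below are hypothetical stand-ins, not the operator's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterOperator is a hypothetical stand-in for the object whose status
// conditions are patched. It is not the real OpenShift API type.
type clusterOperator struct {
	Conditions []string
}

// errNilClusterOperator is returned when the caller passes a nil object.
var errNilClusterOperator = errors.New("cluster operator is nil")

// patchConditions appends conditions to the object. It checks for nil first,
// so a missing object surfaces as an error rather than a panic.
func patchConditions(co *clusterOperator, conds ...string) error {
	if co == nil {
		return errNilClusterOperator
	}
	co.Conditions = append(co.Conditions, conds...)
	return nil
}

func main() {
	// A nil object yields an error that a reconciler could log and retry on.
	fmt.Println(patchConditions(nil, "Available"))

	// A non-nil object is updated normally.
	co := &clusterOperator{}
	fmt.Println(patchConditions(co, "Available"), co.Conditions)
}
```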


Expected results:
The CPMS name, spec.replicas, and selector cannot be changed, and the control-plane-machine-set-operator runs normally without panicking.

Additional info:

Comment 2 sunzhaohua 2022-06-21 02:53:43 UTC
https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/manifests/0000_31_control-plane-machine-set-operator_02_service.yaml and https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/manifests/0000_31_control-plane-machine-set-operator_05_service.yaml define the same Service; only the ports differ.

$ oc create -f ~/data/master/0000_31_control-plane-machine-set-operator_05_service.yaml
Error from server (AlreadyExists): error when creating "/Users/sunzhaohua/data/master/0000_31_control-plane-machine-set-operator_05_service.yaml": services "control-plane-machine-set-operator" already exists
$ oc create -f ~/data/master/controlplanemachineset.yaml                                                              
Error from server (InternalError): error when creating "/Users/sunzhaohua/data/master/controlplanemachineset.yaml": Internal error occurred: failed calling webhook "controlplanemachineset.machine.openshift.io": failed to call webhook: Post "https://control-plane-machine-set-operator.openshift-machine-api.svc:9443/validate-machine-openshift-io-v1-controlplanemachineset?timeout=10s": no service port 9443 found for service "control-plane-machine-set-operator"
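The two errors above come from two manifests that define a Service with the same name: the second cannot be created, so the port the webhook needs is never exposed. A minimal sketch of how such a collision could be detected before applying manifests is shown here. The manifest type and duplicates function are hypothetical helpers written for this illustration; they are not part of the operator or of oc.

```go
package main

import "fmt"

// manifest is a hypothetical, minimal view of a manifest's identity.
type manifest struct {
	File, Kind, Namespace, Name string
}

// duplicates returns, for each kind/namespace/name identity that is defined
// in more than one file, the list of files that define it.
func duplicates(ms []manifest) map[string][]string {
	byKey := map[string][]string{}
	for _, m := range ms {
		key := m.Kind + "/" + m.Namespace + "/" + m.Name
		byKey[key] = append(byKey[key], m.File)
	}
	// Drop identities that appear only once.
	for key, files := range byKey {
		if len(files) < 2 {
			delete(byKey, key)
		}
	}
	return byKey
}

func main() {
	ms := []manifest{
		{"0000_31_control-plane-machine-set-operator_02_service.yaml", "Service", "openshift-machine-api", "control-plane-machine-set-operator"},
		{"0000_31_control-plane-machine-set-operator_05_service.yaml", "Service", "openshift-machine-api", "control-plane-machine-set-operator"},
	}
	for key, files := range duplicates(ms) {
		fmt.Printf("%s is defined in %d files: %v\n", key, len(files), files)
	}
}
```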

1. Created the resources in https://github.com/openshift/cluster-control-plane-machine-set-operator/tree/main/manifests except 0000_31_control-plane-machine-set-operator_02_service.yaml
2. The ControlPlaneMachineSet below is created successfully and the log shows no panic. The name, replicas, and selector can't be changed.
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  name: cluster
  namespace: openshift-machine-api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      metadata:
        labels:
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
          machine.openshift.io/cluster-api-cluster: zhsunaws-n52dn
      failureDomains:
        platform: AWS
        aws:
        - placement:
            availabilityZone: us-east-2a
          subnet:
            type: filters
            filters:
            - name: tag:Name
              values:
              - zhsunaws-n52dn-private-us-east-2a
        - placement:
            availabilityZone: us-east-2b
          subnet:
            type: filters
            filters:
            - name: tag:Name
              values:
              - zhsunaws-n52dn-private-us-east-2b
        - placement:
            availabilityZone: us-east-2c
          subnet:
            type: filters
            filters:
            - name: tag:Name
              values:
              - zhsunaws-n52dn-private-us-east-2c
      spec:
        providerSpec:
          value:
            ami:
              id: ami-01990fc3bdf30bc13
            apiVersion: machine.openshift.io/v1beta1
            blockDevices:
            - ebs:
                encrypted: true
                iops: 0
                kmsKey:
                  arn: ""
                volumeSize: 120
                volumeType: gp3
            credentialsSecret:
              name: aws-cloud-credentials
            deviceIndex: 0
            iamInstanceProfile:
              id: zhsunaws-n52dn-master-profile
            instanceType: m6i.xlarge
            kind: AWSMachineProviderConfig
            loadBalancers:
            - name: zhsunaws-n52dn-int
              type: network
            - name: zhsunaws-n52dn-ext
              type: network
            metadata:
              creationTimestamp: null
            metadataServiceOptions: {}
            placement:
              region: us-east-2
            securityGroups:
            - filters:
              - name: tag:Name
                values:
                - zhsunaws-n52dn-master-sg
            tags:
            - name: kubernetes.io/cluster/zhsunaws-n52dn
              value: owned
            userDataSecret:
              name: master-user-data
$ oc get controlplanemachineset                                             
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   AGE
cluster   3                                                   46s
                             
$ oc create -f ~/data/master/controlplanemachineset.yaml                      
Error from server (name: Invalid value: "cluster1": control plane machine set name must be cluster): error when creating "/Users/sunzhaohua/data/master/controlplanemachineset.yaml": admission webhook "controlplanemachineset.machine.openshift.io" denied the request: name: Invalid value: "cluster1": control plane machine set name must be cluster

$ oc edit controlplanemachineset cluster                                       
error: controlplanemachinesets.machine.openshift.io "cluster" could not be patched: admission webhook "controlplanemachineset.machine.openshift.io" denied the request: spec.replicas: Forbidden: control plane machine set replicas cannot be changed

$ oc edit controlplanemachineset cluster                                      
error: controlplanemachinesets.machine.openshift.io "cluster" could not be patched: admission webhook "controlplanemachineset.machine.openshift.io" denied the request: [spec.selector: Forbidden: control plane machine set selector is immutable, spec.template.machines_v1beta1_machine_openshift_io.metadata.labels: Invalid value: map[string]string{"machine.openshift.io/cluster-api-cluster":"zhsunaws-n52dn", "machine.openshift.io/cluster-api-machine-role":"master", "machine.openshift.io/cluster-api-machine-type":"master"}: selector does not match template labels]
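
The rejections above can be summarised as four rules: the name must be cluster, spec.replicas cannot change, spec.selector is immutable, and the selector must match the template labels. The sketch below expresses those rules in plain Go, reusing the message text from the webhook output. It is an assumption-laden illustration, not the operator's webhook code: the cpms type, its fields, and both validate functions are hypothetical simplifications.

```go
package main

import (
	"fmt"
	"reflect"
)

// cpms is a hypothetical, simplified stand-in for the ControlPlaneMachineSet
// fields involved in the rejections above. It is not the real API type.
type cpms struct {
	Name           string
	Replicas       int32
	Selector       map[string]string // stands in for spec.selector.matchLabels
	TemplateLabels map[string]string // stands in for the template metadata.labels
}

// selectorMatches reports whether every selector label is present in labels
// with the same value.
func selectorMatches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// validateCreate checks the fixed name and the selector/label match.
func validateCreate(obj cpms) []string {
	var errs []string
	if obj.Name != "cluster" {
		errs = append(errs, fmt.Sprintf("name: Invalid value: %q: control plane machine set name must be cluster", obj.Name))
	}
	if !selectorMatches(obj.Selector, obj.TemplateLabels) {
		errs = append(errs, "selector does not match template labels")
	}
	return errs
}

// validateUpdate checks that replicas and selector are unchanged and that the
// new selector still matches the new template labels.
func validateUpdate(oldObj, newObj cpms) []string {
	var errs []string
	if oldObj.Replicas != newObj.Replicas {
		errs = append(errs, "spec.replicas: Forbidden: control plane machine set replicas cannot be changed")
	}
	if !reflect.DeepEqual(oldObj.Selector, newObj.Selector) {
		errs = append(errs, "spec.selector: Forbidden: control plane machine set selector is immutable")
	}
	if !selectorMatches(newObj.Selector, newObj.TemplateLabels) {
		errs = append(errs, "selector does not match template labels")
	}
	return errs
}

func main() {
	labels := map[string]string{"machine.openshift.io/cluster-api-machine-role": "master"}
	current := cpms{Name: "cluster", Replicas: 3, Selector: labels, TemplateLabels: labels}

	renamed := current
	renamed.Name = "cluster1"
	fmt.Println(validateCreate(renamed))

	scaled := current
	scaled.Replicas = 5
	fmt.Println(validateUpdate(current, scaled))
}
```

Changing the selector alone typically produces two messages in this sketch, the immutability error and the label mismatch, which mirrors the bracketed pair of errors in the last oc edit output.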

Comment 4 sunzhaohua 2022-06-22 03:48:10 UTC
Checked the change, moving to verified.

Comment 6 errata-xmlrpc 2022-08-10 11:18:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

