Description of problem:
After creating a machineset with an invalid image, the machine is stuck in the "Provisioning" phase.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-05-20-213928

How reproducible:
Always

Steps to Reproduce:
1. Create a machineset with an invalid image, for example:

image:
  name: invalid
  type: name

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-n9-659pd-t5 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                          PHASE          TYPE   REGION   ZONE   AGE
huliu-n9-659pd-master-0       Running                               7h55m
huliu-n9-659pd-master-1       Running                               7h55m
huliu-n9-659pd-master-2       Running                               7h55m
huliu-n9-659pd-t5-q45k8       Provisioning                          7m29s
huliu-n9-659pd-worker-4747l   Running                               7h52m
huliu-n9-659pd-worker-w4s9r   Running                               7h52m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-n9-659pd-t5-q45k8 -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  creationTimestamp: "2022-05-25T09:37:39Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: huliu-n9-659pd-t5-
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-n9-659pd
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: huliu-n9-659pd-t5
  name: huliu-n9-659pd-t5-q45k8
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: huliu-n9-659pd-t5
    uid: fdc7a156-2e88-44de-b738-81b93fdd433b
  resourceVersion: "186615"
  uid: eb1336f7-2955-4fd8-ac77-c8c2949f586a
spec:
  lifecycleHooks: {}
  metadata: {}
  providerSpec:
    value:
      apiVersion: machine.openshift.io/v1
      cluster:
        type: uuid
        uuid: 0005d9a4-8e4f-7c33-58d1-e9d0e2d48853
      credentialsSecret:
        name: nutanix-credentials
      image:
        name: invalid
        type: name
      kind: NutanixMachineProviderConfig
      memorySize: 16Gi
      metadata:
        creationTimestamp: null
      subnets:
      - type: uuid
        uuid: ae6e2fd8-79fe-4a88-a0d0-7d66cc45bdb1
      systemDiskSize: 120Gi
      userDataSecret:
        name: worker-user-data
      vcpuSockets: 4
      vcpusPerSocket: 1
status:
  conditions:
  - lastTransitionTime: "2022-05-25T09:37:40Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-05-25T09:37:40Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-05-25T09:37:40Z"
    status: "True"
    type: Terminable
  lastUpdated: "2022-05-25T09:37:40Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - message: 'failed to create VM: Failed to find image by name "invalid". error: %!w(<nil>)'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation
    - message: Machine instance is not ready
      reason: Machine instance is not ready
      status: "False"
      type: MachineInstanceReady
liuhuali@Lius-MacBook-Pro huali-test %

Actual results:
The machine is stuck in the "Provisioning" phase, with no InvalidConfiguration error.

Expected results:
The machine should be in the "Failed" phase.

Additional info:
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-8678477b8c-gfdt9 -c machine-controller |grep huliu-n9-659pd-t5-q45k8
I0525 09:37:39.660291 1 controller.go:175] huliu-n9-659pd-t5-q45k8: reconciling Machine
I0525 09:37:39.670899 1 controller.go:175] huliu-n9-659pd-t5-q45k8: reconciling Machine
I0525 09:37:39.670963 1 actuator.go:114] huliu-n9-659pd-t5-q45k8: actuator checking if machine exists
I0525 09:37:39.671217 1 vm.go:190] Checking if VM with name "huliu-n9-659pd-t5-q45k8" exists.
{"filter":"vm_name==huliu-n9-659pd-t5-q45k8"}
{"api_version":"3.1","metadata":{"filter": "vm_name==huliu-n9-659pd-t5-q45k8", "total_matches": 0, "kind": "vm", "length": 0, "offset": 0},"entities":[]}
E0525 09:37:40.007433 1 vm.go:202] Not Found VM by name "huliu-n9-659pd-t5-q45k8". error: VM_NOT_FOUND
I0525 09:37:40.007453 1 controller.go:379] huliu-n9-659pd-t5-q45k8: setting phase to Provisioning and requeuing
I0525 09:37:40.007467 1 controller.go:504] huliu-n9-659pd-t5-q45k8: going into phase "Provisioning"
I0525 09:37:40.018433 1 controller.go:175] huliu-n9-659pd-t5-q45k8: reconciling Machine
I0525 09:37:40.018456 1 actuator.go:114] huliu-n9-659pd-t5-q45k8: actuator checking if machine exists
I0525 09:37:40.018565 1 vm.go:190] Checking if VM with name "huliu-n9-659pd-t5-q45k8" exists.
{"filter":"vm_name==huliu-n9-659pd-t5-q45k8"}
{"api_version":"3.1","metadata":{"filter": "vm_name==huliu-n9-659pd-t5-q45k8", "total_matches": 0, "kind": "vm", "length": 0, "offset": 0},"entities":[]}
E0525 09:37:40.318424 1 vm.go:202] Not Found VM by name "huliu-n9-659pd-t5-q45k8". error: VM_NOT_FOUND
I0525 09:37:40.318447 1 controller.go:386] huliu-n9-659pd-t5-q45k8: reconciling machine triggers idempotent create
I0525 09:37:40.318455 1 actuator.go:76] huliu-n9-659pd-t5-q45k8: actuator creating machine
I0525 09:37:40.318619 1 reconciler.go:41] huliu-n9-659pd-t5-q45k8: creating machine
E0525 09:37:40.911558 1 reconciler.go:56] huliu-n9-659pd-t5-q45k8: error creating machine vm. error: Failed to find image by name "invalid". error: %!w(<nil>)
I0525 09:37:40.911565 1 machine_scope.go:210] huliu-n9-659pd-t5-q45k8: Updating providerStatus
I0525 09:37:40.911576 1 machine_scope.go:153] huliu-n9-659pd-t5-q45k8: patching machine
E0525 09:37:40.930520 1 actuator.go:67] error: huliu-n9-659pd-t5-q45k8: reconciler failed to Create machine: failed to create VM: Failed to find image by name "invalid". error: %!w(<nil>)
W0525 09:37:40.930586 1 controller.go:388] huliu-n9-659pd-t5-q45k8: failed to create machine: huliu-n9-659pd-t5-q45k8: reconciler failed to Create machine: failed to create VM: Failed to find image by name "invalid". error: %!w(<nil>)
E0525 09:37:40.930635 1 controller.go:317] controller/machine_controller "msg"="Reconciler error" "error"="huliu-n9-659pd-t5-q45k8: reconciler failed to Create machine: failed to create VM: Failed to find image by name \"invalid\". error: %!w(<nil>)" "name"="huliu-n9-659pd-t5-q45k8" "namespace"="openshift-machine-api"
I0525 09:37:40.930729 1 logr.go:252] events "msg"="Warning" "message"="huliu-n9-659pd-t5-q45k8: reconciler failed to Create machine: failed to create VM: Failed to find image by name \"invalid\". error: %!w(<nil>)" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"huliu-n9-659pd-t5-q45k8","uid":"eb1336f7-2955-4fd8-ac77-c8c2949f586a","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"186615"} "reason"="FailedCreate"
I0525 09:37:40.931004 1 controller.go:175] huliu-n9-659pd-t5-q45k8: reconciling Machine
I0525 09:37:40.931025 1 actuator.go:114] huliu-n9-659pd-t5-q45k8: actuator checking if machine exists

Similar to https://bugzilla.redhat.com/show_bug.cgi?id=2062579
The fix adds validation of the VM configuration fields in machine.spec.providerSpec before the Prism API is called to create a VM. If validation fails, an InvalidMachineConfiguration error is returned that lists all of the VM configuration errors, so the machine goes into the "Failed" phase instead of being requeued in "Provisioning".
Verified on 4.11.0-0.nightly-2022-06-15-161625

Steps:
1. Create a machineset with an invalid image.

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-n19-xtt9d-1 created

2. Check that the machine goes into the Failed phase and shows an InvalidConfiguration error.

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n19-xtt9d-1-ppvpf        Failed                           4s
huliu-n19-xtt9d-master-0       Running                          62m
huliu-n19-xtt9d-master-1       Running                          62m
huliu-n19-xtt9d-master-2       Running                          62m
huliu-n19-xtt9d-worker-96w2l   Running                          57m
huliu-n19-xtt9d-worker-x44fk   Running                          57m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-n19-xtt9d-1-ppvpf -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2022-06-16T02:38:37Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-06-16T02:38:37Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-06-16T02:38:37Z"
    status: "True"
    type: Terminable
  errorMessage: 'huliu-n19-xtt9d-1-ppvpf: failed in validating machine providerSpec: spec.providerSpec.value.image.name: Invalid value: "huliu-n19-xtt9d-rhcosqqq": Failed to find image with name "huliu-n19-xtt9d-rhcosqqq". error: Failed to find image by name "huliu-n19-xtt9d-rhcosqqq". error: %!w(<nil>)'
  errorReason: InvalidConfiguration
  lastUpdated: "2022-06-16T02:38:38Z"
  phase: Failed
  providerStatus:
    conditions:
    - message: 'huliu-n19-xtt9d-1-ppvpf: failed in validating machine providerSpec: spec.providerSpec.value.image.name: Invalid value: "huliu-n19-xtt9d-rhcosqqq": Failed to find image with name "huliu-n19-xtt9d-rhcosqqq". error: Failed to find image by name "huliu-n19-xtt9d-rhcosqqq". error: %!w(<nil>)'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation
    - message: Machine instance is not ready
      reason: Machine instance is not ready
      status: "False"
      type: MachineInstanceReady
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069