Description of problem:
Completed pods are not releasing the IP addresses that were allocated to them while they were running, and those addresses are not reused for new pods.

Version-Release number of selected component (if applicable):
RHOCP 4.9 with OVNKubernetes.

How reproducible:
- Spawn enough pods that reach the Completed state to fill the host subnet on a specific node.
- Once the subnet is full, the error below appears.

Actual results:
- New pods cannot be created and report the following error:
  Warning  ErrorAddingLogicalPort  3m32s (x52 over 53m)  control plane  failed to assign pod addresses for pod aaa_pod453 on node: master, err: the range is full

Expected results:
- Completed pods should not hold on to an IP address, since they no longer need network connectivity.
- New pods should reuse the IP addresses of completed pods and start without any error.

Additional info:
This issue is not present with OpenShiftSDN, which is able to reuse IPs from completed pods.
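For triage, a minimal hedged sketch of how to spot Completed pods that still appear to hold an OVN IP on a given node is below; the k8s.ovn.org/pod-networks annotation key is taken from OVN-Kubernetes conventions and the node/pod names are placeholders, so adjust them for your cluster.

# Hedged sketch: list Completed (Succeeded) pods on one node, then check whether a
# given completed pod still carries the OVN-Kubernetes pod-network annotation (i.e. an IP).
$ NODE=master
$ oc get pods -A -o wide --field-selector status.phase=Succeeded,spec.nodeName=$NODE
$ oc get pod <completed-pod> -n <namespace> -o jsonpath='{.metadata.annotations.k8s\.ovn\.org/pod-networks}'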
Sounds like I can reproduce this issue on 4.10, version 4.10.0-0.nightly-2022-04-19-145842.

1. Set the node max pods to 520:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 520

2. Create about 500 pods on the node with an RC:

{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "apiVersion": "v1",
      "kind": "ReplicationController",
      "metadata": {
        "labels": {
          "name": "max-pods"
        },
        "name": "max-pods"
      },
      "spec": {
        "replicas": 500,
        "template": {
          "metadata": {
            "labels": {
              "name": "max-pods"
            }
          },
          "spec": {
            "containers": [
              {
                "name": "max-pod",
                "image": "quay.io/openshifttest/nonexist"
              }
            ],
            "nodeName": "node-name"
          }
        }
      }
    }
  ]
}

3. There will be a lot of pods in 'OutOfpods' state:

max-pods-zxz2w   0/1   OutOfpods   0   95m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zz5hr   0/1   OutOfpods   0   61m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zz86q   0/1   OutOfpods   0   105m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzdhg   0/1   OutOfpods   0   106m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzftf   0/1   OutOfpods   0   99m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzgl2   0/1   OutOfpods   0   100m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzjlv   0/1   OutOfpods   0   52m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzlpj   0/1   OutOfpods   0   82m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzm9h   0/1   OutOfpods   0   83m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzn7h   0/1   OutOfpods   0   97m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzpvk   0/1   OutOfpods   0   107m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzqgr   0/1   OutOfpods   0   94m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzqrc   0/1   OutOfpods   0   97m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzsfv   0/1   OutOfpods   0   108m   <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
max-pods-zzzsb   0/1   OutOfpods   0   64m    <none>   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>

$ oc get pod | grep OutOfpods | wc -l
9785

$ oc get pod | grep -v OutOfpods
NAME             READY   STATUS              RESTARTS   AGE
max-pods-4bg4d   0/1     ContainerCreating   0          13m

$ oc describe pod max-pods-4bg4d
Name:           max-pods-4bg4d
Namespace:      g3ami
Priority:       0
Node:           openshift-qe-028.lab.eng.rdu2.redhat.com/10.8.1.181
Start Time:     Wed, 20 Apr 2022 20:17:07 +0800
Labels:         name=max-pods
Annotations:    openshift.io/scc: restricted
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicationController/max-pods
Containers:
  max-pod:
    Container ID:
    Image:          quay.io/openshifttest/nonexist
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rnfdc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-rnfdc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From          Message
  ----     ------                  ----                 ----          -------
  Warning  FailedCreatePodSandBox  12m                  kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(ae15d81c680c0b3e319ceeff8b95f43c4070cdb9cc9ae541b7b94909e0188f40): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d ae15d81c680c0b3e319ceeff8b95f43c4070cdb9cc9ae541b7b94909e0188f40] [g3ami/max-pods-4bg4d ae15d81c680c0b3e319ceeff8b95f43c4070cdb9cc9ae541b7b94909e0188f40] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '
  Warning  FailedCreatePodSandBox  10m                  kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(db8ef8efcf1468884f7dd2a8410dc12e2b4da4611f825f420aa35ff3d1f15f81): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d db8ef8efcf1468884f7dd2a8410dc12e2b4da4611f825f420aa35ff3d1f15f81] [g3ami/max-pods-4bg4d db8ef8efcf1468884f7dd2a8410dc12e2b4da4611f825f420aa35ff3d1f15f81] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '
  Warning  FailedCreatePodSandBox  8m5s                 kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(e3b947b7b737042a536a145a8cef00c06fce83157646c24e7e033aa99872aba9): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d e3b947b7b737042a536a145a8cef00c06fce83157646c24e7e033aa99872aba9] [g3ami/max-pods-4bg4d e3b947b7b737042a536a145a8cef00c06fce83157646c24e7e033aa99872aba9] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '
  Warning  ErrorAddingLogicalPort  6m30s (x8 over 14m)  controlplane  failed to assign pod addresses for pod g3ami_max-pods-4bg4d on node: openshift-qe-028.lab.eng.rdu2.redhat.com, err: range is full
  Warning  FailedCreatePodSandBox  5m52s                kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(0f22b1be69addc700684a01eb0f2070d11fc173941c52851e2aea4f44157aa3d): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d 0f22b1be69addc700684a01eb0f2070d11fc173941c52851e2aea4f44157aa3d] [g3ami/max-pods-4bg4d 0f22b1be69addc700684a01eb0f2070d11fc173941c52851e2aea4f44157aa3d] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '
  Warning  FailedCreatePodSandBox  3m40s                kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(deb1b60387e543427ef37ab36144908675d36e77f10f517b067a17e4bc6f3bf4): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d deb1b60387e543427ef37ab36144908675d36e77f10f517b067a17e4bc6f3bf4] [g3ami/max-pods-4bg4d deb1b60387e543427ef37ab36144908675d36e77f10f517b067a17e4bc6f3bf4] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '
  Warning  ErrorAddingLogicalPort  119s                 controlplane  failed to assign pod addresses for pod g3ami_max-pods-4bg4d on node: openshift-qe-028.lab.eng.rdu2.redhat.com, err: range is full
  Warning  FailedCreatePodSandBox  86s                  kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_max-pods-4bg4d_g3ami_dde7499b-c532-4e83-80aa-320e9796e296_0(6dd85a4bc9e003b2baabae7a772b9774aab941234a5701c95a9dc58a99ae20db): error adding pod g3ami_max-pods-4bg4d to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [g3ami/max-pods-4bg4d/dde7499b-c532-4e83-80aa-320e9796e296:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[g3ami/max-pods-4bg4d 6dd85a4bc9e003b2baabae7a772b9774aab941234a5701c95a9dc58a99ae20db] [g3ami/max-pods-4bg4d 6dd85a4bc9e003b2baabae7a772b9774aab941234a5701c95a9dc58a99ae20db] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
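As a side note for anyone else hitting the same "range is full" state, a quick hedged check of how much of the node's OVN host subnet is actually consumed could look like the following; the k8s.ovn.org/node-subnets annotation key is assumed from OVN-Kubernetes conventions, and the node name and 10.131. prefix are taken from this reproduction, so adjust them for your environment.

# Hedged sketch: print the node's OVN host subnet, then count pod IPs currently assigned on it.
# Annotation key, node name and the 10.131. prefix are assumptions for illustration only.
$ NODE=openshift-qe-028.lab.eng.rdu2.redhat.com
$ oc get node $NODE -o jsonpath='{.metadata.annotations.k8s\.ovn\.org/node-subnets}'; echo
$ oc get pods -A -o wide --field-selector spec.nodeName=$NODE | grep -c ' 10\.131\.'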
Verified this bug on 4.11.0-0.nightly-2022-04-15-153812.

1. There are some Completed pods in openshift-operator-lifecycle-manager:

openshift-operator-lifecycle-manager   collect-profiles-27508695-prrwb   0/1   Completed   0   38m     10.131.1.162   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
openshift-operator-lifecycle-manager   collect-profiles-27508710-2nllh   0/1   Completed   0   23m     10.131.1.25    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
openshift-operator-lifecycle-manager   collect-profiles-27508725-67gcz   0/1   Completed   0   8m45s   10.131.1.39    openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>

2. Updated the max pods to 520 on the node.

3. Then apply the pod yaml below and scale up 100 -> 150 -> 200 -> 300 -> 400 -> 493:

{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "apiVersion": "v1",
      "kind": "ReplicationController",
      "metadata": {
        "labels": {
          "name": "max-pods"
        },
        "name": "max-pods"
      },
      "spec": {
        "replicas": 50,
        "template": {
          "metadata": {
            "labels": {
              "name": "max-pods"
            }
          },
          "spec": {
            "containers": [
              {
                "command": [
                  "/bin/true"
                ],
                "name": "max-pod",
                "image": "quay.io/openshifttest/hello-sdn@sha256:2af5b5ec480f05fda7e9b278023ba04724a3dd53a296afcd8c13f220dec52197"
              }
            ],
            "nodeName": "openshift-qe-028.lab.eng.rdu2.redhat.com"
          }
        }
      }
    }
  ]
}

4. Make all 510 IPs used:

$ oc get pod -A -o wide | grep 10.131 | wc -l
510

5. Then create one normal test pod on the node:

$ oc get pod -n z1 -o wide
z1   test-rc-xwvm8   1/1   Running   0   13m   10.131.1.39   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>

We can see the pod IP is the same as the one from step 1:

$ oc get pod -A -o wide | grep openshift-qe-028.lab.eng.rdu2.redhat.com | grep 10.131.1.39
openshift-operator-lifecycle-manager   collect-profiles-27508725-67gcz   0/1   Completed   0   31m   10.131.1.39   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>
z1                                     test-rc-xwvm8                     1/1   Running     0   13m   10.131.1.39   openshift-qe-028.lab.eng.rdu2.redhat.com   <none>   <none>

6. Check the pod is working well:

$ oc rsh -n z1 test-rc-mmqgt
~ $ curl 10.131.1.39:8080
Hello OpenShift!
~ $

Moving this to Verified.
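As a small aside, a hedged helper that makes this kind of check easier to eyeball is below; it only parses the plain 'oc get pods -A -o wide' output used in the steps above, and the node name, column positions and the 10.131. prefix are assumptions taken from this environment (the RESTARTS column can shift when it contains "(... ago)", so treat the result as approximate).

# Hedged sketch: count pods per STATUS that currently hold an IP in the node's 10.131.x.x
# host subnet, assuming the default column order of 'oc get pods -A -o wide'.
$ oc get pods -A -o wide --field-selector spec.nodeName=openshift-qe-028.lab.eng.rdu2.redhat.com \
    | awk '$7 ~ /^10\.131\./ {count[$4]++} END {for (s in count) print s, count[s]}'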
@trozet Please ignore my question in comment 15; the IP will be released after the pods are deleted.
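For clusters still running a version without the fix, a hedged workaround sketch along those lines would be to clean up Succeeded pods so their addresses return to the node's pool; whether that is acceptable depends on whether the completed pods (for example Job history) are still needed, so treat this as an illustration rather than a recommendation.

# Hedged workaround sketch: delete Completed (Succeeded) pods cluster-wide so that their
# IP allocations are released; narrow the scope with -n <namespace> or a label selector as needed.
$ oc delete pods -A --field-selector status.phase=Succeeded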
Are there plans to backport this to 4.10?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days