Bug 2016175 - Pods get stuck in ContainerCreating state when attaching volumes fails on SNO clusters.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Kir Kolyshkin
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-20 20:44 UTC by Jeff Uphoff
Modified: 2022-03-10 16:21 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:21:07 UTC
Target Upstream Version:
Embargoed:
imiller: needinfo-
ehashman: needinfo-




Links
System ID                              Private  Priority  Status  Summary                                                                             Last Updated
Github google cadvisor pull 2979       0        None      open    container/libcontainer: fix schedulerStatsFromProcs hogging memory and wrong stats  2021-10-26 23:09:06 UTC
Red Hat Product Errata RHSA-2022:0056  0        None      None    None                                                                                2022-03-10 16:21:22 UTC

Comment 14 Sunil Choudhary 2021-12-06 10:57:05 UTC
Verified on two separate clusters, with builds 4.10.0-0.ci-2021-12-05-113922 and 4.10.0-0.nightly-2021-12-03-213835.

On each cluster I collected kubelet heap (pprof) data, ran tests with pods continuously running sleep processes, and then collected pprof data again. I do not see any significant increase in kubelet memory usage.
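The captures in this comment fetch the kubelet heap profile through the API server's node proxy with `oc get --raw`, targeting the kubelet port 10250 on a named node. A minimal Python sketch of how that raw path is put together, assuming the same port and the standard `/debug/pprof/<profile>` endpoints used in the commands here; the helper name `kubelet_pprof_path` is hypothetical and it only formats a string, it does not contact the cluster.

```python
def kubelet_pprof_path(node: str, profile: str = "heap", port: int = 10250) -> str:
    """Build the API-server node-proxy path for a kubelet pprof endpoint.

    Hypothetical helper: it only formats the path passed to `oc get --raw`.
    """
    # "<node-name>:<port>" tells the node proxy which kubelet port to reach.
    return f"/api/v1/nodes/{node}:{port}/proxy/debug/pprof/{profile}"


if __name__ == "__main__":
    print(kubelet_pprof_path("ip-10-0-59-18.us-east-2.compute.internal"))
```

When the path carries a query string, quoting it on the `oc` command line keeps the shell from treating `?` as a glob character.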

$ oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.ci-2021-12-05-113922   True        False         20m     Cluster version is 4.10.0-0.ci-2021-12-05-113922

$ oc get nodes -o wide
NAME                                        STATUS   ROLES    AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-53-253.us-east-2.compute.internal   Ready    worker   97m    v1.22.3+ffbb954   10.0.53.253   <none>        Red Hat Enterprise Linux 8.4 (Ootpa)                            4.18.0-348.2.1.el8_5.x86_64    cri-o://1.22.1-8.rhaos4.9.gite059965.el8
ip-10-0-59-18.us-east-2.compute.internal    Ready    master   146m   v1.22.1+6859754   10.0.59.18    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-61-145.us-east-2.compute.internal   Ready    worker   97m    v1.22.3+ffbb954   10.0.61.145   <none>        Red Hat Enterprise Linux 8.4 (Ootpa)                            4.18.0-348.2.1.el8_5.x86_64    cri-o://1.22.1-8.rhaos4.9.gite059965.el8
ip-10-0-62-167.us-east-2.compute.internal   Ready    worker   137m   v1.22.1+6859754   10.0.62.167   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-67-101.us-east-2.compute.internal   Ready    worker   137m   v1.22.1+6859754   10.0.67.101   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-69-204.us-east-2.compute.internal   Ready    worker   137m   v1.22.1+6859754   10.0.69.204   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-74-12.us-east-2.compute.internal    Ready    master   146m   v1.22.1+6859754   10.0.74.12    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8
ip-10-0-77-27.us-east-2.compute.internal    Ready    master   145m   v1.22.1+6859754   10.0.77.27    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8

$ oc get --raw /api/v1/nodes/ip-10-0-59-18.us-east-2.compute.internal:10250/proxy/debug/pprof/heap?sleep=60s > heap-pprof-old.out 

$ go tool pprof heap-pprof-old.out 
File: kubelet
Build ID: a9000aed487134ddc9148718abccf0500fd7e772
Type: inuse_space
Time: Dec 6, 2021 at 2:37pm (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 26.56MB, 60.55% of 43.87MB total
Showing top 10 nodes out of 297
      flat  flat%   sum%        cum   cum%
   13.03MB 29.69% 29.69%    13.03MB 29.69%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/libcontainer.newContainerStats
    3.02MB  6.89% 36.58%     3.02MB  6.89%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/summary.(*SamplesBuffer).Add
    2.42MB  5.51% 42.09%     2.42MB  5.51%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*ConfigMap).Unmarshal
       2MB  4.56% 46.65%        2MB  4.56%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/libcontainer.processLimitsFile
    1.10MB  2.50% 49.15%     1.60MB  3.64%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*Secret).Unmarshal
       1MB  2.28% 51.43%        1MB  2.28%  k8s.io/kubernetes/pkg/volume/secret.(*secretPlugin).NewMounter
       1MB  2.28% 53.71%        1MB  2.28%  runtime.allocm
       1MB  2.28% 55.99%    16.03MB 36.53%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/libcontainer.(*Handler).GetStats
       1MB  2.28% 58.27%     1.50MB  3.42%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*PodSpec).DeepCopyInto
       1MB  2.28% 60.55%        1MB  2.28%  k8s.io/kubernetes/vendor/k8s.io/utils/inotify.(*Watcher).readEvents


$ oc get --raw /api/v1/nodes/ip-10-0-59-18.us-east-2.compute.internal:10250/proxy/debug/pprof/heap?sleep=60s > heap-pprof.out

$ go tool pprof heap-pprof.out 
File: kubelet
Build ID: a9000aed487134ddc9148718abccf0500fd7e772
Type: inuse_space
Time: Dec 6, 2021 at 4:14pm (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 20697.95kB, 57.78% of 35819kB total
Showing top 10 nodes out of 280
      flat  flat%   sum%        cum   cum%
 7182.01kB 20.05% 20.05%  8206.21kB 22.91%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/libcontainer.newContainerStats
 3612.07kB 10.08% 30.14%  3612.07kB 10.08%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/summary.(*SamplesBuffer).Add
 2473.98kB  6.91% 37.04%  2473.98kB  6.91%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*ConfigMap).Unmarshal
 1536.27kB  4.29% 41.33%  1536.27kB  4.29%  k8s.io/kubernetes/vendor/golang.org/x/net/http2.(*clientConnReadLoop).handleResponse
 1121.67kB  3.13% 44.46%  1633.72kB  4.56%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*Secret).Unmarshal
    1025kB  2.86% 47.32%     1025kB  2.86%  runtime.allocm
 1024.36kB  2.86% 50.18%  1536.67kB  4.29%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*PodSpec).DeepCopyInto
 1024.20kB  2.86% 53.04%  1024.20kB  2.86%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/libcontainer.DiskStatsCopy
 1024.20kB  2.86% 55.90%  1024.20kB  2.86%  k8s.io/kubernetes/vendor/k8s.io/utils/inotify.(*Watcher).readEvents
  674.18kB  1.88% 57.78%   674.18kB  1.88%  k8s.io/kubernetes/pkg/kubelet/status.NewManager
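To put a number on the first cluster's result, here is a minimal Python sketch that reads the `Showing nodes accounting for ..., of ... total` header of each `top` listing above and compares the total in-use heap. It assumes pprof's binary size units (1 kB = 1024 B, 1 MB = 1024 kB); the helper names `to_bytes`, `heap_total`, and `percent_change` are hypothetical.

```python
import re

# Assumption: pprof prints memory sizes with binary (1024) multipliers.
_UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3}
_SIZE = re.compile(r"^([0-9.]+)(B|kB|MB|GB)$")


def to_bytes(size: str) -> float:
    """Convert a pprof size such as '43.87MB' or '35819kB' to bytes."""
    m = _SIZE.match(size)
    if m is None:
        raise ValueError(f"unrecognized size: {size!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def heap_total(header: str) -> float:
    """Extract Z from a 'Showing nodes accounting for X, Y% of Z total' line."""
    m = re.search(r"of (\S+) total", header)
    if m is None:
        raise ValueError("not a pprof top header")
    return to_bytes(m.group(1))


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


# Header lines copied from the two `top` listings of the first cluster.
before = heap_total("Showing nodes accounting for 26.56MB, 60.55% of 43.87MB total")
after = heap_total("Showing nodes accounting for 20697.95kB, 57.78% of 35819kB total")
print(f"total in-use heap: {percent_change(before, after):+.1f}%")
```

With these two headers it prints `total in-use heap: -20.3%`. A single before/after pair of heap snapshots also depends on GC timing, so it supports, rather than proves, the conclusion that kubelet memory did not grow during the test.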





$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-03-213835   True        False         141m    Cluster version is 4.10.0-0.nightly-2021-12-03-213835

$ oc get nodes -o wide
NAME                                                 STATUS   ROLES           AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
sunilc06410-rpjmg-master-0.c.openshift-qe.internal   Ready    master,worker   163m   v1.22.1+29f497c   10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 410.84.202112031341-0 (Ootpa)   4.18.0-305.28.1.el8_4.x86_64   cri-o://1.23.0-89.rhaos4.10.git367232b.el8

$ oc get --raw /api/v1/nodes/sunilc06410-rpjmg-master-0.c.openshift-qe.internal:10250/proxy/debug/pprof/heap?sleep=60s > heap-pprof-old.out

$ go tool pprof heap-pprof-old.out 
File: kubelet
Build ID: 0930d0658c22b5744d061251bed2240c3fd4eb66
Type: inuse_space
Time: Dec 6, 2021 at 2:53pm (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 40.80MB, 55.23% of 73.86MB total
Showing top 10 nodes out of 404
      flat  flat%   sum%        cum   cum%
   16.53MB 22.38% 22.38%    17.03MB 23.06%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/libcontainer.newContainerStats
    6.05MB  8.18% 30.57%     6.05MB  8.18%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/summary.(*SamplesBuffer).Add
    4.50MB  6.10% 36.67%     4.50MB  6.10%  runtime.allocm
    3.54MB  4.80% 41.46%     3.54MB  4.80%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*ConfigMap).Unmarshal
    2.06MB  2.79% 44.26%     2.06MB  2.79%  k8s.io/kubernetes/vendor/google.golang.org/protobuf/internal/strs.(*Builder).AppendFullName
       2MB  2.71% 46.97%     4.50MB  6.10%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*PodSpec).Unmarshal
       2MB  2.71% 49.68%     2.50MB  3.39%  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1.(*ObjectMeta).Unmarshal
    1.50MB  2.03% 51.71%     1.50MB  2.03%  runtime.malg
    1.50MB  2.03% 53.74%     1.50MB  2.03%  path.(*lazybuf).string
    1.11MB  1.50% 55.23%     1.11MB  1.50%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*Secret).Unmarshal


$ oc get --raw /api/v1/nodes/sunilc06410-rpjmg-master-0.c.openshift-qe.internal:10250/proxy/debug/pprof/heap?sleep=60s > heap-pprof.out

$ go tool pprof heap-pprof.out 
File: kubelet
Build ID: 0930d0658c22b5744d061251bed2240c3fd4eb66
Type: inuse_space
Time: Dec 6, 2021 at 4:04pm (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 36.68MB, 53.65% of 68.38MB total
Showing top 10 nodes out of 377
      flat  flat%   sum%        cum   cum%
   12.52MB 18.32% 18.32%    13.02MB 19.05%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/libcontainer.newContainerStats
    5.54MB  8.11% 26.42%     5.54MB  8.11%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/summary.(*SamplesBuffer).Add
    4.50MB  6.59% 33.01%     4.50MB  6.59%  runtime.allocm
    3.54MB  5.18% 38.19%     3.54MB  5.18%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*ConfigMap).Unmarshal
    2.06MB  3.02% 41.21%     2.06MB  3.02%  k8s.io/kubernetes/vendor/google.golang.org/protobuf/internal/strs.(*Builder).AppendFullName
       2MB  2.93% 44.14%     4.50MB  6.58%  k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*PodSpec).Unmarshal
       2MB  2.93% 47.06%     3.03MB  4.44%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/libcontainer.processLimitsFile
    1.50MB  2.19% 49.26%     1.50MB  2.19%  runtime.malg
    1.50MB  2.19% 51.45%        2MB  2.93%  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/apis/meta/v1.(*ObjectMeta).Unmarshal
    1.50MB  2.19% 53.65%     1.50MB  2.19%  strings.(*Builder).grow
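The per-function rows of a `top` listing can also be compared directly. A minimal Python sketch, assuming pprof's binary size units (1 kB = 1024 B), that parses rows in the `flat flat% sum% cum cum% function` layout shown above and compares the flat in-use size of `libcontainer.newContainerStats` between the two captures on the single-node cluster. Only two rows of each listing are pasted in for brevity; `parse_top` and `FN` are hypothetical names.

```python
import re

# Assumption: pprof memory sizes use binary multipliers (1 kB = 1024 B).
_UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3}
# Columns: flat, flat%, sum%, cum, cum%, function name.
_ROW = re.compile(
    r"^\s*([0-9.]+)(B|kB|MB|GB)\s+[0-9.]+%\s+[0-9.]+%\s+"
    r"([0-9.]+)(B|kB|MB|GB)\s+[0-9.]+%\s+(\S+)\s*$"
)


def parse_top(listing: str) -> dict:
    """Map function name -> (flat bytes, cum bytes) for each row of a `top` listing."""
    rows = {}
    for line in listing.splitlines():
        m = _ROW.match(line)
        if m:  # header and blank lines do not match and are skipped
            flat = float(m.group(1)) * _UNITS[m.group(2)]
            cum = float(m.group(3)) * _UNITS[m.group(4)]
            rows[m.group(5)] = (flat, cum)
    return rows


FN = "k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/libcontainer.newContainerStats"

before = parse_top(f"""
   16.53MB 22.38% 22.38%    17.03MB 23.06%  {FN}
    6.05MB  8.18% 30.57%     6.05MB  8.18%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/summary.(*SamplesBuffer).Add
""")
after = parse_top(f"""
   12.52MB 18.32% 18.32%    13.02MB 19.05%  {FN}
    5.54MB  8.11% 26.42%     5.54MB  8.11%  k8s.io/kubernetes/vendor/github.com/google/cadvisor/summary.(*SamplesBuffer).Add
""")

flat_before, flat_after = before[FN][0], after[FN][0]
print(f"newContainerStats flat: {(flat_after - flat_before) / flat_before * 100:+.1f}%")
```

For the rows above it prints `newContainerStats flat: -24.3%`.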

Comment 17 errata-xmlrpc 2022-03-10 16:21:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

