Bug 2108686 - rpm-ostreed: start limit hit easily
Summary: rpm-ostreed: start limit hit easily
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Colin Walters
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 2108320
Blocks:
 
Reported: 2022-07-19 17:15 UTC by OpenShift BugZilla Robot
Modified: 2022-08-10 11:21 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:21:24 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift os pull 900 | 0 | None | Merged | [release-4.11] Bug 2108686: Greatly raise `StartLimitBurst` for `rpm-ostreed.service` | 2022-07-21 17:45:39 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:21:47 UTC

Description OpenShift BugZilla Robot 2022-07-19 17:15:25 UTC
+++ This bug was initially created as a clone of Bug #2108320 +++

See https://github.com/openshift/os/pull/898

A recent MCO PR, openshift/machine-config-operator#3243, tipped things over the edge,
and we now see rpm-ostreed start-limit failures much more often.

For an example, see https://bugzilla.redhat.com/show_bug.cgi?id=2104978

--- Additional comment from skumari on 2022-07-19 13:14:55 UTC ---

*** Bug 2108488 has been marked as a duplicate of this bug. ***
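
For illustration, the start limit can be modeled as a fixed window that admits at most StartLimitBurst start attempts per interval. The sketch below is a simplified model, not systemd's implementation. `count_refused_starts` is a hypothetical helper, the burst of 5 per 10 seconds is systemd's stock default (this report does not state the values in effect on RHCOS), and treating every client call as one start attempt is an assumption.

```shell
#!/bin/bash
set -euo pipefail

# count_refused_starts BURST INTERVAL_SEC TS...
# Simplified fixed-window model of a start rate limit (an approximation, not
# systemd's code): the window opens at the first start attempt, at most BURST
# attempts are admitted inside it, and a new window opens once more than
# INTERVAL_SEC seconds have passed since the window began.
# TS values are start-attempt timestamps in whole seconds, in ascending order.
count_refused_starts() {
  local burst=$1 interval=$2
  shift 2
  local begin="" num=0 refused=0 ts
  for ts in "$@"; do
    if [[ -z "$begin" ]] || (( ts - begin > interval )); then
      # First attempt, or the previous window has expired: open a new window.
      begin=$ts
      num=1
    elif (( num < burst )); then
      num=$(( num + 1 ))
    else
      refused=$(( refused + 1 ))
    fi
  done
  echo "$refused"
}

# Hypothetical workload: 100 start attempts that all land in the same second.
attempts=$(for i in $(seq 100); do echo 0; done)

# Word splitting of $attempts is intentional here.
count_refused_starts 5 10 $attempts      # stock default burst
count_refused_starts 1000 10 $attempts   # burst raised as in this fix
```

Under this model, 100 attempts in one second leave 95 refused at a burst of 5 and none at a burst of 1000.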

Comment 3 Michael Nguyen 2022-07-21 13:48:50 UTC
Verified on RHCOS 411.86.202207210724-0. However, this has not yet made it into a nightly OCP build.

[core@cosa-devsh ~]$ cat test.sh
#!/bin/bash
set -euo pipefail
# https://github.com/coreos/rpm-ostree/pull/3523/commits/0556152adb14a8e1cdf6c5d6f234aacbe8dd4e3f
for x in $(seq 100); do rpm-ostree status >/dev/null; done
echo ok
[core@cosa-devsh ~]$ ./test.sh 
ok
[core@cosa-devsh ~]$ rpm-ostree status
State: idle
Deployments:
● 1994ffeef78d96e6af89e03552214df06465d75e3b4f8a4eb37aa6582814c00e
                   Version: 411.86.202207210724-0 (2022-07-21T07:27:48Z)
[core@cosa-devsh ~]$ systemctl status rpm-ostreed
● rpm-ostreed.service - rpm-ostree System Management Daemon
   Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static; vendor >
  Drop-In: /usr/lib/systemd/system/rpm-ostreed.service.d
           └─startlimit.conf
   Active: active (running) since Thu 2022-07-21 13:45:16 UTC; 52s ago
     Docs: man:rpm-ostree(1)
 Main PID: 2059 (rpm-ostree)
   Status: "clients=0; idle exit in 52 seconds"
    Tasks: 12 (limit: 5559)
   Memory: 8.1M
   CGroup: /system.slice/rpm-ostreed.service
           └─2059 /usr/bin/rpm-ostree start-daemon

Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.259 unit:sess>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: In idle state; will auto-exit in 6>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: Allowing active client :1.261 (uid>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.261 unit:sess>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.261 unit:sess>
Jul 21 13:45:42 cosa-devsh rpm-ostree[2059]: In idle state; will auto-exit in 6>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: Allowing active client :1.263 (uid>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.263 unit:sess>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: client(id:cli dbus:1.263 unit:sess>
Jul 21 13:45:53 cosa-devsh rpm-ostree[2059]: In idle state; will auto-exit in 6>
[core@cosa-devsh ~]$ systemctl cat rpm-ostreed
# /usr/lib/systemd/system/rpm-ostreed.service
[Unit]
Description=rpm-ostree System Management Daemon
Documentation=man:rpm-ostree(1)
ConditionPathExists=/ostree
RequiresMountsFor=/boot

[Service]
Type=dbus
BusName=org.projectatomic.rpmostree1
# To use the read-only sysroot bits
MountFlags=slave
# We have no business accessing /var/roothome or /var/home.  In general
# the ostree design clearly avoids touching those, but since systemd offers
# us easy tools to toggle on protection, let's use them.  In the future
# it'd be nice to do something like using DynamicUser=yes for the main service,
# and have a system rpm-ostreed-transaction.service that runs privileged
# but as a subprocess.
ProtectHome=true
# Explicitly list paths here which we should never access.  The initial
# entry here ensures that the skopeo process we fork won't interact with
# application containers.
InaccessiblePaths=/var/lib/containers
NotifyAccess=main
ExecStart=/usr/bin/rpm-ostree start-daemon
ExecReload=/usr/bin/rpm-ostree reload

# /usr/lib/systemd/system/rpm-ostreed.service.d/startlimit.conf
[Unit]
# Work around for lack of https://github.com/coreos/rpm-ostree/pull/3523/commit>
# on older RHEL
StartLimitBurst=1000
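
The manual check above (reading the `systemctl cat` output for the drop-in value) could be scripted. The following is a minimal sketch under the assumption that the last `StartLimitBurst=` assignment in the concatenated output is the one that takes effect. `effective_start_limit_burst` is a hypothetical helper, not part of rpm-ostree or systemd.

```shell
#!/bin/bash
set -euo pipefail

# effective_start_limit_burst: read `systemctl cat`-style text on stdin and
# print the value of the last StartLimitBurst= assignment. Commented-out
# lines are ignored. Prints nothing if the key never appears.
effective_start_limit_burst() {
  awk -F= '
    /^[[:space:]]*StartLimitBurst=/ { value = $2 }
    END { if (value != "") print value }
  '
}
```

On a node it might be used as `systemctl cat rpm-ostreed | effective_start_limit_burst`, with a result of 1000 indicating that the drop-in is present.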

Comment 4 Colin Walters 2022-07-21 14:27:52 UTC
I don't think we technically need to update the boot images for this - just machine-os-content.

Firstboot may be a bit less reliable, but we have landed retry logic in the MCO.
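
The retry approach mentioned above could look roughly like the following in shell. This is only a sketch: the MCO's actual retry code is not shown in this report, and `retry`, its attempt count, and its delay are hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# retry ATTEMPTS DELAY_SEC CMD...
# Run CMD until it succeeds or ATTEMPTS runs are used up, sleeping DELAY_SEC
# seconds between runs. Returns 0 on the first success, 1 if every run fails.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local n
  for (( n = 1; n <= attempts; n++ )); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if (( n < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry 5 2 rpm-ostree status` would tolerate a daemon that is briefly unavailable, at the cost of hiding transient failures from the caller.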

Comment 5 Michael Nguyen 2022-07-21 17:46:03 UTC
From the summary in https://bugzilla.redhat.com/show_bug.cgi?id=2104978:

"So it looks like rpm-ostreed didn't start yet (but eventually was successful)"

Since it was eventually successful, I agree with Colin that updating machine-os-content should be enough, because the failure resolves itself.

Comment 9 Michael Nguyen 2022-07-22 18:40:35 UTC
Verified on 4.11.0-rc.5

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.5   True        False         107s    Cluster version is 4.11.0-rc.5
$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-mlj7g4t-72292-7jqbd-master-0         Ready    master   22m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-master-1         Ready    master   22m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-master-2         Ready    master   22m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-worker-a-plz2t   Ready    worker   12m   v1.24.0+9546431
ci-ln-mlj7g4t-72292-7jqbd-worker-b-4zdxv   Ready    worker   12m   v1.24.0+9546431
$ oc debug node/ci-ln-mlj7g4t-72292-7jqbd-worker-a-plz2t
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/ci-ln-mlj7g4t-72292-7jqbd-worker-a-plz2t-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c978380274ed551b3d6a8ca53ab2fc1408bfad00b8c235cc7dbe523dbc251d8
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202207210724-0 (2022-07-21T07:27:48Z)
sh-4.4# cat test.sh
#!/bin/bash
set -euo pipefail
# https://github.com/coreos/rpm-ostree/pull/3523/commits/0556152adb14a8e1cdf6c5d6f234aacbe8dd4e3f
for x in $(seq 100); do rpm-ostree status >/dev/null; done
echo ok
sh-4.4# chmod +x test.sh
sh-4.4# ./test.sh 
ok
sh-4.4# systemctl cat rpm-ostreed.service
# /usr/lib/systemd/system/rpm-ostreed.service
[Unit]
Description=rpm-ostree System Management Daemon
Documentation=man:rpm-ostree(1)
ConditionPathExists=/ostree
RequiresMountsFor=/boot

[Service]
Type=dbus
BusName=org.projectatomic.rpmostree1
# To use the read-only sysroot bits
MountFlags=slave
# We have no business accessing /var/roothome or /var/home.  In general
# the ostree design clearly avoids touching those, but since systemd offers
# us easy tools to toggle on protection, let's use them.  In the future
# it'd be nice to do something like using DynamicUser=yes for the main service,
# and have a system rpm-ostreed-transaction.service that runs privileged
# but as a subprocess.
ProtectHome=true
# Explicitly list paths here which we should never access.  The initial
# entry here ensures that the skopeo process we fork won't interact with
# application containers.
InaccessiblePaths=/var/lib/containers
NotifyAccess=main
ExecStart=/usr/bin/rpm-ostree start-daemon
ExecReload=/usr/bin/rpm-ostree reload

# /usr/lib/systemd/system/rpm-ostreed.service.d/startlimit.conf
[Unit]
# Work around for lack of https://github.com/coreos/rpm-ostree/pull/3523/commits/0556152adb14a8e1cdf6c5d6f234aacbe8dd4e3f
# on older RHEL
StartLimitBurst=1000
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...

Comment 10 errata-xmlrpc 2022-08-10 11:21:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

