Bug 1990506

Summary: Missing udev rules in initramfs for /dev/disk/by-id/scsi-* symlinks
Product: OpenShift Container Platform Reporter: Michal Dekan <mdekan>
Component: RHCOS Assignee: Renata Ravanelli <rravanel>
Status: CLOSED ERRATA QA Contact: HuijingHei <hhei>
Severity: high Docs Contact: Bob Furu <bfuru>
Priority: medium    
Version: 4.8 CC: apaladug, bgilbert, cverna, dornelas, hhei, jlebon, jligon, miabbott, mrussell, nstielau, smilner, travier
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: 4.10 Doc Type: Bug Fix
Doc Text:
Cause: Missing udev rules in initramfs for /dev/disk/by-id/scsi-* symlinks.
Consequence: When using /dev/disk/by-id/scsi-* symlinks in the Ignition config, boot of the installed system failed at the Ignition stage because the symlink was not present.
Fix: This bug fix adds the 63-scsi-sg3_symlink.rules file for SCSI rules in dracut.
Result: As a result, this issue no longer occurs.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-12 04:37:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2027501    
Bug Blocks:    

Description Michal Dekan 2021-08-05 14:45:14 UTC
---

OCP Version at Install Time: 48.84.202107202156-0
RHCOS Version at Install Time: 48.84.202107202156-0
Platform: bare metal
Architecture: x86_64


What are you trying to do? What is your use case?

Bare metal UPI OCP 4.8 deployment with custom ignition files, needed because there are 2 worker node groups (one group of workers has 2 disks attached, the other one has 3 disks attached).


What happened? What went wrong or what did you expect?

The idea is to generate 2 ignition files using butane (one for workers with 2 disks and the other for workers with 3 disks).


What are the steps to reproduce your issue? Please try to reduce these steps to something that can be reproduced with a single RHCOS node.

1) Generating ignition files with installer

#!/bin/bash

## Clean old clusterconfig
rm -Rf clusterconfig
mkdir clusterconfig
cp backup/install-config.yaml clusterconfig/

## Create Manifests
./openshift-install create manifests --dir clusterconfig/

## Set masters as unschedulable
sed -i 's/mastersSchedulable\: true/mastersSchedulable\: false/g'  clusterconfig/manifests/cluster-scheduler-02-config.yml

## OPTIONAL MACHINECONFIGS
#cp backup/98-var-partition-worker.yaml clusterconfig/openshift/
cp backup/98-var-partition-master.yaml clusterconfig/openshift/
#cp backup/98-var-partition-infra.yaml clusterconfig/openshift/

## Create Ignition Files
./openshift-install create ignition-configs --dir clusterconfig/

###4.6
## Remove old ignition files and replace with new
rm -f /var/www/html/openshift/ocp-datahub/*.ign
cp clusterconfig/*.ign /var/www/html/openshift/ocp-datahub/
chcon -R system_u:object_r:httpd_sys_content_t:s0 /var/www/html/
chown -R apache: /var/www/html/openshift/

./openshift-install version
./openshift-install 4.8.2
built from commit a5ddd2dd6c72d8a5ea0a5f17acd8b964b6a3d1be
release image quay.io/openshift-release-dev/ocp-release@sha256:0e82d17ababc79b10c10c5186920232810aeccbccf2a74c691487090a2c98ebc


2) Generating a custom worker.ign with butane, using the merge+local feature to include the default worker.ign generated by the OpenShift installer in step 1)

cat test_ocp.bu
variant: openshift
version: 4.8.0
metadata:
  name: 98-worker-var-partition
  labels:
    machineconfiguration.openshift.io/role: worker
ignition:
  config:
    merge:
    - local: worker_default.ign
storage:
  disks:
    - device: /dev/sdb
      wipe_table: true
      partitions:
        - number: 1
          label: var
  filesystems:
    - path: /var
      device: /dev/disk/by-partlabel/var
      format: xfs
      wipe_filesystem: true
      label: var
      with_mount_unit: true
systemd:
  units:
    - name: var.mount
      enabled: true
      contents: |
        [Unit]
        Before=local-fs.target
        [Mount]
        Where=/var
        What=/dev/disk/by-partlabel/var
        [Install]
        WantedBy=local-fs.target


podman run --rm --tty --interactive \
    --security-opt label=disable \
    --volume "${PWD}":/pwd --workdir /pwd \
    quay.io/coreos/butane:release \
    --pretty --strict test_ocp.bu --files-dir . --raw > worker.ign

3) Transferring the generated worker.ign to the HTTP server, from which it is passed to coreos-installer, leads to a worker node that cannot start kubelet-auto-node-size.service because /usr/local/sbin/dynamic-system-reserved-calc.sh does not exist:

[core@worker-001 ~]$ journalctl -u kubelet-auto-node-size.service

-- Reboot --
Aug 05 13:56:22 worker-001 systemd[1]: Starting Dynamically sets the system reserved for the kubelet...
Aug 05 13:56:22 worker-001 bash[3093]: /bin/bash: /usr/local/sbin/dynamic-system-reserved-calc.sh: No such file or directory
Aug 05 13:56:22 worker-001 systemd[1]: kubelet-auto-node-size.service: Main process exited, code=exited, status=127/n/a
Aug 05 13:56:22 worker-001 systemd[1]: kubelet-auto-node-size.service: Failed with result 'exit-code'.
Aug 05 13:56:22 worker-001 systemd[1]: Failed to start Dynamically sets the system reserved for the kubelet.
Aug 05 13:56:22 worker-001 systemd[1]: kubelet-auto-node-size.service: Consumed 1ms CPU time

The script is included in the 00-worker MachineConfig (mc):

oc get mc 00-worker -o yaml | grep -B10 dynamic-system-reserved-calc.sh
        path: /etc/modules-load.d/iptables.conf
      - contents:
          source: data:,NODE_SIZING_ENABLED%3Dfalse%0ASYSTEM_RESERVED_MEMORY%3D1Gi%0ASYSTEM_RESERVED_CPU%3D500m
        mode: 420
        overwrite: true
        path: /etc/node-sizing-enabled.env
      - contents:
          source: data:,%23!%2Fbin%2Fbash%0Aset%20-e%0ANODE_SIZES_ENV%3D%24%7BNODE_SIZES_ENV%3A-%2Fetc%2Fnode-sizing.env%7D%0Afunction%20dynamic_memory_sizing%20%7B%0A%20%20%20%20total_memory%3D%24(free%20-g%7Cawk%20'%2F%5EMem%3A%2F%7Bprint%20%242%7D')%0A%20%20%20%20%23%20total_memory%3D8%20test%20the%20recommended%20values%20by%20modifying%20this%20value%0A%20%20%20%20recommended_systemreserved_memory%3D0%0A%20%20%20%20if%20((%24total_memory%20%3C%3D%204))%3B%20then%20%23%2025%25%20of%20the%20first%204GB%20of%20memory%0A%20%20%20%20%20%20%20%20recommended_systemreserved_memory%3D%24(echo%20%24total_memory%200.25%20%7C%20awk%20'%7Bprint%20%241%20*%20%242%7D')%0A%20%20%20%20%20%20%20%20total_memory%3D0%0A%20%20%20%20else%0A%20%20%20%20%20%20%20%20recommended_systemreserved_memory%3D1%0A%20%20%20%20%20%20%20%20total_memory%3D%24((total_memory-4))%0A%20%20%20%20fi%0A%20%20%20%20if%20((%24total_memory%20%3C%3D%204))%3B%20then%20%23%2020%25%20of%20the%20next%204GB%20of%20memory%20(up%20to%208GB)%0A%20%20%20%20%20%20%20%20recommended_systemreserved_memory%3D%24(echo%20%24recommended_systemreserved_memory%20%24(echo%20%24total_memory%200.20%20%7C%20awk%20'%7Bprint%20%241%20*%20%242%7D')%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_memory%3D0%0A%20%20%20%20else%0A%20%20%20%20%20%20%20%20recommended_systemreserved_memory%3D%24(echo%20%24recommended_systemreserved_memory%200.80%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_memory%3D%24((total_memory-4))%0A%20%20%20%20fi%0A%20%20%20%20if%20((%24total_memory%20%3C%3D%208))%3B%20then%20%23%2010%25%20of%20the%20next%208GB%20of%20memory%20(up%20to%2016GB)%0A%20%20%20%20%20%20%20%20recommended_systemreserved_memory%3D%24(echo%20%24recommended_systemreserved_memory%20%24(echo%20%24total_memory%200.10%20%7C%20awk%20'%7Bprint%20%241%20*%20%242%7D')%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_memory%3D0%0A%20%20%20%20else%0A%20%2
0%20%20%20%20%20%20recommended_systemreserved_memory%3D%24(echo%20%24recommended_systemreserved_memory%200.80%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_memory%3D%24((total_memory-8))%0A%20%20%20%20fi%0A%20%20%20%20if%20((%24total_memory%20%3C%3D%20112))%3B%20then%20%23%206%25%20of%20the%20next%20112GB%20of%20memory%20(up%20to%20128GB)%0A%20%20%20%20%20%20%20%20recommended_systemreserved_memory%3D%24(echo%20%24recommended_systemreserved_memory%20%24(echo%20%24total_memory%200.06%20%7C%20awk%20'%7Bprint%20%241%20*%20%242%7D')%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_memory%3D0%0A%20%20%20%20else%0A%20%20%20%20%20%20%20%20recommended_systemreserved_memory%3D%24(echo%20%24recommended_systemreserved_memory%206.72%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_memory%3D%24((total_memory-112))%0A%20%20%20%20fi%0A%20%20%20%20if%20((%24total_memory%20%3E%3D%200))%3B%20then%20%23%202%25%20of%20any%20memory%20above%20128GB%0A%20%20%20%20%20%20%20%20recommended_systemreserved_memory%3D%24(echo%20%24recommended_systemreserved_memory%20%24(echo%20%24total_memory%200.02%20%7C%20awk%20'%7Bprint%20%241%20*%20%242%7D')%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20fi%0A%20%20%20%20echo%20%22SYSTEM_RESERVED_MEMORY%3D%24%7Brecommended_systemreserved_memory%7DGi%22%3E%3E%20%24%7BNODE_SIZES_ENV%7D%0A%7D%0Afunction%20dynamic_cpu_sizing%20%7B%0A%20%20%20%20total_cpu%3D%24(getconf%20_NPROCESSORS_ONLN)%0A%20%20%20%20recommended_systemreserved_cpu%3D0%0A%20%20%20%20if%20((%24total_cpu%20%3C%3D%201))%3B%20then%20%23%206%25%20of%20the%20first%20core%0A%20%20%20%20%20%20%20%20recommended_systemreserved_cpu%3D%24(echo%20%24total_cpu%200.06%20%7C%20awk%20'%7Bprint%20%241%20*%20%242%7D')%0A%20%20%20%20%20%20%20%20total_cpu%3D0%0A%20%20%20%20else%0A%20%20%20%20%20%20%20%20recommended_systemreserved_cpu%3D0.06%0A%20%20%20%20%20%20%20%20total_cpu%3D%24((total_cpu-1))%0A%20%
20%20%20fi%0A%20%20%20%20if%20((%24total_cpu%20%3C%3D%201))%3B%20then%20%23%201%25%20of%20the%20next%20core%20(up%20to%202%20cores)%0A%20%20%20%20%20%20%20%20recommended_systemreserved_cpu%3D%24(echo%20%24recommended_systemreserved_cpu%20%24(echo%20%24total_cpu%200.01%20%7C%20awk%20'%7Bprint%20%241%20*%20%242%7D')%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_cpu%3D0%0A%20%20%20%20else%0A%20%20%20%20%20%20%20%20recommended_systemreserved_cpu%3D%24(echo%20%24recommended_systemreserved_cpu%200.01%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_cpu%3D%24((total_cpu-1))%0A%20%20%20%20fi%0A%20%20%20%20if%20((%24total_cpu%20%3C%3D%202))%3B%20then%20%23%200.5%25%20of%20the%20next%202%20cores%20(up%20to%204%20cores)%0A%20%20%20%20%20%20%20%20recommended_systemreserved_cpu%3D%24(echo%20%24recommended_systemreserved_cpu%20%24(echo%20%24total_cpu%200.005%20%7C%20awk%20'%7Bprint%20%241%20*%20%242%7D')%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_cpu%3D0%0A%20%20%20%20else%0A%20%20%20%20%20%20%20%20recommended_systemreserved_cpu%3D%24(echo%20%24recommended_systemreserved_cpu%200.01%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20%20%20%20%20total_cpu%3D%24((total_cpu-2))%0A%20%20%20%20fi%0A%20%20%20%20if%20((%24total_cpu%20%3E%3D%200))%3B%20then%20%23%200.25%25%20of%20any%20cores%20above%204%20cores%0A%20%20%20%20%20%20%20%20recommended_systemreserved_cpu%3D%24(echo%20%24recommended_systemreserved_cpu%20%24(echo%20%24total_cpu%200.0025%20%7C%20awk%20'%7Bprint%20%241%20*%20%242%7D')%20%7C%20awk%20'%7Bprint%20%241%20%2B%20%242%7D')%0A%20%20%20%20fi%0A%20%20%20%20echo%20%22SYSTEM_RESERVED_CPU%3D%24%7Brecommended_systemreserved_cpu%7D%22%3E%3E%20%24%7BNODE_SIZES_ENV%7D%0A%7D%0Afunction%20dynamic_ephemeral_sizing%20%7B%0A%20%20%20%20echo%20%22Not%20implemented%20yet%22%0A%7D%0Afunction%20dynamic_pid_sizing%20%7B%0A%20%20%20%20echo%20%22Not%20implemented%20yet%22%0A%7
D%0Afunction%20dynamic_node_sizing%20%7B%0A%20%20%20%20rm%20-f%20%24%7BNODE_SIZES_ENV%7D%0A%20%20%20%20dynamic_memory_sizing%0A%20%20%20%20dynamic_cpu_sizing%0A%20%20%20%20%23dynamic_ephemeral_sizing%0A%20%20%20%20%23dynamic_pid_sizing%0A%7D%0Afunction%20static_node_sizing%20%7B%0A%20%20%20%20rm%20-f%20%24%7BNODE_SIZES_ENV%7D%0A%20%20%20%20echo%20%22SYSTEM_RESERVED_MEMORY%3D%241%22%20%3E%3E%20%24%7BNODE_SIZES_ENV%7D%0A%20%20%20%20echo%20%22SYSTEM_RESERVED_CPU%3D%242%22%20%3E%3E%20%24%7BNODE_SIZES_ENV%7D%0A%7D%0A%0Aif%20%5B%20%241%20%3D%3D%20%22true%22%20%5D%3B%20then%0A%20%20%20%20dynamic_node_sizing%0Aelif%20%5B%20%241%20%3D%3D%20%22false%22%20%5D%3B%20then%0A%20%20%20%20static_node_sizing%20%242%20%243%0Aelse%0A%20%20%20%20echo%20%22Unrecongnized%20command%20line%20option.%20Valid%20options%20are%20%5C%22true%5C%22%20or%20%5C%22false%5C%22%22%0Afi%0A
        mode: 493
        overwrite: true
        path: /usr/local/sbin/dynamic-system-reserved-calc.sh
--
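The `source: data:,...` values above are file contents percent-encoded into data URLs, which makes them hard to read in `oc get mc` output. A minimal sketch for decoding such a value into plain text, assuming bash; the function name `decode_data_url` is hypothetical and not part of `oc` or the Machine Config Operator. The sample input is the short `/etc/node-sizing-enabled.env` value shown above.

```bash
#!/usr/bin/env bash
# decode_data_url: hypothetical helper for illustration only.
# Strips a leading "data:," prefix and decodes the %XX escapes.
decode_data_url() {
    local payload="${1#data:,}"
    # Turn every %XX into \xXX and let printf's %b expand the escapes.
    # Caveat: a literal backslash in the input would also be interpreted.
    printf '%b\n' "${payload//%/\\x}"
}

decode_data_url 'data:,NODE_SIZING_ENABLED%3Dfalse%0ASYSTEM_RESERVED_MEMORY%3D1Gi%0ASYSTEM_RESERVED_CPU%3D500m'
```

The same function should work on the much longer `dynamic-system-reserved-calc.sh` value, provided the value is passed as one unbroken string.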



If you're having problems booting/installing RHCOS, please provide:
- the full contents of the serial console showing disk initialization, network configuration, and Ignition stage (see https://access.redhat.com/articles/7212 for information about configuring your serial console)
- Ignition JSON
- output of `journalctl -b`

Comment 2 Micah Abbott 2021-08-05 15:08:27 UTC
The journal provided doesn't show the Ignition stages running, so it is not possible to determine why the `dynamic-system-reserved-calc.sh` is not present on the system. It looks like the system has already completed install and has rebooted.

Please capture the output from the console during the first boot of the node, showing the Ignition stages executing.

Comment 3 Michal Dekan 2021-08-05 16:21:21 UTC
(In reply to Micah Abbott from comment #2)
> The journal provided doesn't show the Ignition stages running, so it is not
> possible to determine why the `dynamic-system-reserved-calc.sh` is not
> present on the system. It looks like the system has already completed
> install and has rebooted.
> 
> Please capture the output from the console during the first boot of the
> node, showing the Ignition stages executing.

I was kind of under the impression that this would be needed; I'm struggling to get the logs from the console...

The last thing I can see on the serial console is the grub menu; after that, I cannot see anything...


racadm>> console com2

        Press the spacebar to pause...

        KEY MAPPING FOR CONSOLE REDIRECTION:

        Use the <ESC><1> key sequence for <F1>
        Use the <ESC><2> key sequence for <F2>
        Use the <ESC><3> key sequence for <F3>
        Use the <ESC><0> key sequence for <F10>
        Use the <ESC><!> key sequence for <F11>
        Use the <ESC><@> key sequence for <F12>

        Use the <ESC><Ctrl><M> key sequence for <Ctrl><M>
        Use the <ESC><Ctrl><H> key sequence for <Ctrl><H>
        Use the <ESC><Ctrl><I> key sequence for <Ctrl><I>
        Use the <ESC><Ctrl><J> key sequence for <Ctrl><J>

        Use the <ESC><X><X> key sequence for <Alt><x>, where x is any letter
        key, and X is the upper case of that key

        Use the <ESC><R><ESC><r><ESC><R> key sequence for <Ctrl><Alt><Del>


F2  = System Setup
F10 = Lifecycle Controller
F11 = Boot Manager
F12 = PXE Boot
IPMI: Boot to  

Initializing Serial ATA devices...


Avago Technologies MPT SAS3 BIOS
MPT3BIOS-8.37.02.00 (2020.03.02)
Copyright 2000-2020 Avago Technologies. All rights reserved

               
PCI  ENCL LUN VENDOR   PRODUCT          PRODUCT      SIZE \
SLOT SLOT NUM NAME     IDENTIFIER       REVISION     NVDATA
---- ---- --- -------- ---------------- ------------ ----------
  0           Dell Inc Dell SAS HBA     16.00.11.00   0E:01:00:39
  0    7   0  ATA      SSDSC2KG480G8R   DL69         447.1 GiB
  0    8   0  ATA      SSDSC2KG480G8R   DL69         447.1 GiB
2 supportable devices are presented for system boot selection!

Jumping to grub editing entry:

load_video                                                                     
set gfxpayload=keep                                                            
insmod gzio
linux ($root)/ostree/rhcos-4db93642d1d8298df8c1c9b655ba64df137a7572854a8dbc6ad\
c8ce6431a9cef/vmlinuz-4.18.0-305.10.2.el8_4.x86_64 random.trust_cpu=on console\
=tty0 console=ttyS0,115200n8 ignition.platform.id=metal  ostree=/ostree/boot.0\
/rhcos/4db93642d1d8298df8c1c9b655ba64df137a7572854a8dbc6adc8ce6431a9cef/0 root\
=UUID=19ac6124-7c6e-4fba-bda9-7d9408cd7897 rw rootflags=prjquota
initrd ($root)/ostree/rhcos-4db93642d1d8298df8c1c9b655ba64df137a7572854a8dbc6a\
dc8ce6431a9cef/initramfs-4.18.0-305.10.2.el8_4.x86_64.img

This is a Dell PowerEdge R640 with iDRAC9. I followed the iDRAC settings from https://andrewladlow.co.uk/2019/09/28/idrac-serial-console-over-ssh/ but it didn't help: once the grub menu is gone (after entering the grub menu entry edit section and pressing Ctrl+X), I don't see any output.

Comment 6 Michal Dekan 2021-08-05 17:38:48 UTC
I edited the grub menu entry and replaced console=tty0 console=ttyS0,115200n8 with console=tty1 console=ttyS1,115200n8. I can see the output now; it is provided in the attachment.

Comment 8 Micah Abbott 2021-08-05 20:51:03 UTC
In the first log, we can see the Ignition stages successfully running and the script was written to disk, service written to disk, and service enabled:

```
[  OK     47.154218] ignition[2332]: INFO     : files: createFilesystemsFiles: createFiles: op(1d): [started]  writing file "/sysroot/var/usrlocal/sbin/dynamic-system-reserved-calc"
0m] Reached targ[   47.171268] ignition[2332]: INFO     : files: createFilesystemsFiles: createFiles: op(1d): [finished] writing file "/sysroot/var/usrlocal/sbin/dynamic-system-res"

[   47.650502] ignition[2332]: INFO     : files: op(2c): [started]  processing unit "kubelet-auto-node-size.service"
         Startin[   47.660915] ignition[2332]: INFO     : files: op(2c): op(2d): [started]  writing unit "kubelet-auto-node-size.service" at "/sysroot/etc/systemd/system/kubelet-au"
g Cleaning Up an[   47.678617] ignition[2332]: INFO     : files: op(2c): op(2d): [finished] writing unit "kubelet-auto-node-size.service" at "/sysroot/etc/systemd/system/kubelet-au"
d Shutting Down [   47.696278] ignition[2332]: INFO     : files: op(2c): [finished] processing unit "kubelet-auto-node-size.service"

[   48.602430] ignition[2332]: INFO     : files: op(49): [started]  setting preset to enabled for "kubelet-auto-node-size.service"
[   48.615155] ignition[2332]: INFO     : files: op(49): [finished] setting preset to enabled for "kubelet-auto-node-size.service"
```

The system does eventually enter the real root and displays a login prompt; this is indicative of a successful first boot.

The second log shows the system coming up and then in the middle of starting the network, the logs are broken?


The `worker.ign` attached is missing the complete Ignition config that is being served to the node; the bulk of the config is being served from the cluster at `api-int.datahub-ocp4.prod.psi.redhat.com:22623/config/worker`


Where is the Ignition snippet that configures the `kubelet-auto-node-size.service`?  Or machine config YAML that defines it?

Could you hop on the failing node and do `systemctl cat kubelet-auto-node-size.service` and perhaps `ls -latrZ /usr/local/sbin`?


It's still not clear based on the data provided what is going wrong.

Comment 9 Michal Dekan 2021-08-05 21:33:17 UTC
[root@worker-dh-001 ~]# systemctl cat kubelet-auto-node-size.service
# /etc/systemd/system/kubelet-auto-node-size.service
[Unit]
Description=Dynamically sets the system reserved for the kubelet
Wants=network-online.target
After=network-online.target ignition-firstboot-complete.service
Before=kubelet.service crio.service
[Service]
# Need oneshot to delay kubelet
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=/etc/node-sizing-enabled.env
ExecStart=/bin/bash /usr/local/sbin/dynamic-system-reserved-calc.sh ${NODE_SIZING_ENABLED} ${SYSTEM_RESERVED_MEMORY} ${SYSTEM_RESERVED_CPU}
[Install]
RequiredBy=kubelet.service

[root@worker-dh-001 ~]# ls -latrZ /usr/local/sbin
total 4
drwxr-xr-x. 11 root root system_u:object_r:var_t:s0  114 Jul 16 16:03 ..
-rwxr-xr-x.  1 root root system_u:object_r:bin_t:s0 3003 Jul 16 16:03 set-valid-hostname.sh
drwxr-xr-x.  2 root root system_u:object_r:bin_t:s0   35 Jul 16 16:03 .

I've booted with rd.debug and can see that the script dynamic-system-reserved-calc.sh was actually created; however, it's missing from the preceding ls output (^^^):

[  105.335206] ignition[2259]: INFO     : files: createFilesystemsFiles: createFiles: op(1d): [started]  writing file "/sysroot/var/usrlocal/sbin/dynamic-system-reserved-calc.sh"
[  105.340782] ///usr/lib/dracut-lib.sh@291(getargnum): return
[  105.355243] ignition[2259]: INFO     : files: createFilesystemsFiles: createFiles: op(1d): [finished] writing file "/sysroot/var/usrlocal/sbin/dynamic-system-reserved-calc.sh"

Comment 14 Jonathan Lebon 2021-08-06 14:20:19 UTC
If you boot with `rd.break` and do `ls /sysroot/var/usrlocal/sbin/dynamic-system-reserved-calc.sh` before switchroot, is it there?

Also `lsblk` would help after the machine is fully booted to sanity-check mounts.

Aside: note that you don't need to specify a separate `var.mount` in your Butane config if using `with_mount_unit: true`. It doesn't seem like you require any additional options, so you can let Butane generate the unit for you.
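As a sketch of what that aside suggests, the following writes a trimmed variant of the `test_ocp.bu` config from the description, with the hand-written `systemd` section removed and `with_mount_unit: true` kept. The helper name `write_butane_config` and the default output file name are hypothetical; the YAML fields are copied from the original config.

```bash
#!/usr/bin/env bash
# write_butane_config: hypothetical helper that writes a trimmed-down
# variant of test_ocp.bu to the given path (default: test_ocp_simplified.bu).
write_butane_config() {
    local out="${1:-test_ocp_simplified.bu}"
    cat > "$out" <<'EOF'
variant: openshift
version: 4.8.0
metadata:
  name: 98-worker-var-partition
  labels:
    machineconfiguration.openshift.io/role: worker
ignition:
  config:
    merge:
    - local: worker_default.ign
storage:
  disks:
    - device: /dev/sdb
      wipe_table: true
      partitions:
        - number: 1
          label: var
  filesystems:
    - path: /var
      device: /dev/disk/by-partlabel/var
      format: xfs
      wipe_filesystem: true
      label: var
      # Butane generates the mount unit for /var from this flag,
      # so no separate unit under systemd.units is written here.
      with_mount_unit: true
EOF
}
```

Calling `write_butane_config` with no argument writes the file into the current directory, where it can be passed to the same butane invocation used for `test_ocp.bu`.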

Comment 15 Michal Dekan 2021-08-16 11:07:26 UTC
This is the output when booted with rd.break after the system had already been provisioned using the ignition file generated by butane (the script is missing):

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


switch_root:/# ls /sysroot/var/usrlocal/sbin/dynamic-system-reserved-calc.sh
ls: cannot access '/sysroot/var/usrlocal/sbin/dynamic-system-reserved-calc.sh': No such file or directory
switch_root:/# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 447.1G  0 disk 
|-sda1        8:1    0     1M  0 part 
|-sda2        8:2    0   127M  0 part 
|-sda3        8:3    0   384M  0 part 
`-sda4        8:4    0 446.6G  0 part /sysroot/sysroot
sdb           8:16   0 447.1G  0 disk 
`-sdb1        8:17   0 447.1G  0 part 
nvme0n1     259:0    0   1.5T  0 disk 
`-nvme0n1p1 259:1    0   1.5T  0 part 
switch_root:/# 

This is the output when the system was PXE booted, the sda disk was formatted, and the system booted from disk for the first time (the ignition file was applied and the script is there):

[   46.877631] ignition[2338]: INFO     : files: op(4f): [started]  setting preset to enabled for "kubelet-auto-node-size.service"
[   46.889221] ignition[2338]: INFO     : files: op(4f): [finished] setting preset to enabled for "kubelet-auto-node-size.service"
Press Enter for [   46.900849] ignition[2338]: INFO     : files: op(50): [started]  setting preset to enabled for "kubelet.service"
emergency shell [   46.912510] ignition[2338]: INFO     : files: op(50): [finished] setting preset to enabled for "kubelet.service"
or wait 5 minute[   46.924176] ignition[2338]: INFO     : files: op(51): [started]  setting preset to enabled for "machine-config-daemon-firstboot.service"
s for reboot.   [   46.937925] ignition[2338]: INFO     : files: op(51): [finished] setting preset to enabled for "machine-config-daemon-firstboot.service"
[   46.951827] systemd[1]: Started Reload Configuration from the Real Root.
[   46.959862] dracut-pre-pivot[2451]: Warning: Break before switch_root
[   46.966392] d[   46.966488] ignition[2338]: INFO     : files: op(52): [started]  relabeling 65 patterns
racut-pre-pivot[[   46.975960] ignition[2338]: DEBUG    : files: op(52): executing: "setfiles" "-vF0" "-r" "/sysroot" "/sysroot/etc/selinux/targeted/contexts/files/file_contexts" ""
2451]: Warning: [   46.992504] ignition[2338]: INFO     : files: op(52): [finished] relabeling 65 patterns
Break before swi[   47.001995] ignition[2338]: INFO     : files: files passed
tch_root
[   47.008952] ignition[2338]: INFO     : Ignition finished successfully
[   47.016347] systemd[1]: Reached target Initrd File Systems.
[   47.022056] systemd[1]: Reached target Initrd Default Target.
[   47.027923] systemd[1]: Starting dracut pre-pivot and cleanup hook...
[   47.034484] systemd[1]: Starting Setup Virtual Console...

switch_root:/# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 447.1G  0 disk 
|-sda1        8:1    0     1M  0 part 
|-sda2        8:2    0   127M  0 part 
|-sda3        8:3    0   384M  0 part 
`-sda4        8:4    0 446.6G  0 part /sysroot/sysroot
sdb           8:16   0 447.1G  0 disk 
`-sdb1        8:17   0 447.1G  0 part 
nvme0n1     259:0    0   1.5T  0 disk 
`-nvme0n1p1 259:1    0   1.5T  0 part /sysroot/var
switch_root:/# 

[   47.040006] systemd[1]: Stopped target Initrd Default Target.
[   47.045883] systemd[1]: Stopped target Ignition Complete.
[   47.051418] systemd[1]: Stopped target Ignition Boot Disk Setup.
[   47.057542] systemd[1]: systemd-vconsole-setup.service: Succeeded.
[   47.063836] systemd[1]: Started Setup Virtual Console.
[   47.069099] systemd[1]: Starting Dracut Emergency Shell...
Press Enter for emergency shell or wait 4 minutes 45 seconds for reboot.      

Generating "/run/initramfs/rdsosreport.txt"


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


switch_root:/# ls /sysroot/var/usrlocal/sbin/dynamic-system-reserved-calc.sh
/sysroot/var/usrlocal/sbin/dynamic-system-reserved-calc.sh
switch_root:/#

Comment 16 Jonathan Lebon 2021-08-17 20:51:24 UTC
Right yeah, the `rd.break` on subsequent boots won't work for this test, because it's only in the first boot that Ignition runs and that mountpoints are still active in the switchroot shell.

Looking at your second paste of `lsblk` though in the switchroot shell of the PXE boot, it looks like `nvme0n1p1` was mounted at `/var`, and not `sdb1`. Yet, by your Butane config I suspect you want `/dev/sdb1` mounted at `/var`, not the NVMe device.

So I think what's likely happening here is that `/dev/nvme0n1p1` also has partition label `var` (maybe from a previous installation attempt when that was the configuration?), and then in the real root `What=/dev/disk/by-partlabel/var` is racy and might sometimes point to the NVMe device (where nothing was written) instead of `/dev/sdb1`.

Can you try either changing all instances of `/dev/disk/by-partlabel/var` to `/dev/sdb1` in the MC, or alternatively extend the MC to nuke all filesystems and/or partitions on the NVMe device (using `wipe_table: true`)?
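One way to check for the suspected label collision is to look for partition labels that appear on more than one partition. A minimal sketch, assuming bash and awk are available; the helper name `find_duplicate_partlabels` is hypothetical, and the sample labels below are assumed for illustration rather than taken from the node.

```bash
#!/usr/bin/env bash
# find_duplicate_partlabels: hypothetical helper. Reads "NAME PARTLABEL"
# pairs on stdin and prints each label used by more than one partition,
# followed by the partitions that carry it. Lines without a label
# (fewer than two fields) are ignored.
find_duplicate_partlabels() {
    awk 'NF == 2 { count[$2]++; names[$2] = names[$2] " " $1 }
         END { for (l in count) if (count[l] > 1) print l ":" names[l] }'
}

# Sample input mirroring the layout discussed above (labels are assumed).
printf '%s\n' 'sda4 root' 'sdb1 var' 'nvme0n1p1 var' | find_duplicate_partlabels
```

On a running node the input could come from `lsblk -rno NAME,PARTLABEL`, assuming the installed lsblk supports the `PARTLABEL` column. Any label reported here is ambiguous when referenced through `/dev/disk/by-partlabel/`.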

Comment 17 Michal Dekan 2021-08-18 15:54:35 UTC
Correct. I removed the sdb1 and nvme0n1p1 partitions from inside CoreOS (cfdisk) and re-provisioned using the worker.ign generated by butane that was shared in this bz, and all worked fine:

1) provision worker-001 with 3 disks - sda, sdb, nvme0n1

- sda rootfs
- sdb not used
- nvme0n1 /var

{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "data:,%7B%22ignition%22%3A%7B%22config%22%3A%7B%22merge%22%3A%5B%7B%22source%22%3A%22https%3A%2F%2Fapi-int.XXXX.XXXX.XXXX.redhat.com%3A22623%2Fconfig%2Fworker%22%7D%5D%7D%2C%22security%22%3A%7B%22tls%22%3A%7B%22certificateAuthorities%22%3A%5B%7B%22source%22%3A%22data%3Atext%2Fplain%3Bcharset%3Dutf-8%3Bbase64%2CLS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURFRENDQWZpZ0F3SUJBZ0lJQ3hWeGpnSkFPN2N3RFFZSktvWklodmNOQVFFTEJRQXdKakVTTUJBR0ExVUUKQ3hNSmIzQmxibk5vYVdaME1SQXdEZ1lEVlFRREV3ZHliMjkwTFdOaE1CNFhEVEl4TURnd05UQTVNelExT0ZvWApEVE14TURnd016QTVNelExT0Zvd0pqRVNNQkFHQTFVRUN4TUpiM0JsYm5Ob2FXWjBNUkF3RGdZRFZRUURFd2R5CmIyOTBMV05oTUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUF1YTN5aDM1NE5FMFIKSktpR1lCZ2R3VUYzNVJBSW9wd2pKQW1LVXNPYzc4WDFKcnZCK3luZlVMbkZyazdoeEYxVXp6ekZmRGcxSnN6SQpaV1RRczN3azdDVHBocS90VWxxRkVyblE5c0FNNjBJZGYzRmRkMHJUMVFLN2RhelNxbGl6b2czRFFrYVFRT2tVCitqclJQZUJRd3JWejBuMml1S2dKR2l2ZVlJVzY5TWxWQWpXc3NQWDV6RDFmZjZBNTNsYXlRYU1wTlBjb1Z0dWIKZkhMZDZJV2RXOFNBM2Jnelg4R2V5YU90Z3RMT0QzNHJmS3pPRzFNU2dYRkhxZXNIbTl4OG5QSkJ1aXd5VnhVVAplMDhwU0Vib1ZJZko5WlJqSmlZWEJNM3hma2NHdi9yQm1ZVERUdjhyaFQ1OTZqMzBHZjZObnVoN1JpdzgwTDRkCkpGRFNwNkROSXdJREFRQUJvMEl3UURBT0JnTlZIUThCQWY4RUJBTUNBcVF3RHdZRFZSMFRBUUgvQkFVd0F3RUIKL3pBZEJnTlZIUTRFRmdRVXBkM1RTUGJTUEUrSXVFYVVXL0NHS21CdDFiVXdEUVlKS29aSWh2Y05BUUVMQlFBRApnZ0VCQUdIZGR1T2l2alJveThCM3VwU3dGMHhxSmM5NzROSitDT3AvUG5YL0l5MUM2Y0V2UEZ4QTJQelBLa2NOCldKSW1mcy8reDJwMEdIY0cwTDFjQ0dNeXl6TmIzNUdRNkpjYUx2c0N6dUpqWHhoeDAwQm82VlgyUXV6TzNocW0KNXNDMzhIWGoyMEI4Tyt6dXMzRjE4RlI1WFFiSGJxcVZRYmZ0Wi9aaHZRQ0cvUkYybDduSDR1bTZtT1BScm1kKwoycHlqeW4yL3dwVFRFUjRZYm5CNi9jR0JLK2FDeHhaVlRuazc5Q1V3U1lzMjJGNDFlOUhSMTAvNE56dlh3cUFEClZlTmRVTWxLOWwvcHZ1dDc2N1FBOWRhZXkxSnlLdmlZTHZhajRvYnlnUWhOR0JCVWtnY0FwaElnWHFIQS85YXIKeFZSRnI2OEtLTy93OUVkYW9FVUtXYzFiMzZnPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg%3D%3D%22%7D%5D%7D%7D%2C%22version%22%3A%223.2.0%22%7D%7D"
        }
      ]
    },
    "version": "3.2.0"
  },
  "storage": {
    "disks": [
      {
        "device": "/dev/nvme0n1",
        "partitions": [
          {
            "label": "var",
            "number": 1
          }
        ],
        "wipeTable": true
      }
    ],
    "filesystems": [
      {
        "device": "/dev/disk/by-partlabel/var",
        "format": "xfs",
        "label": "var",
        "path": "/var",
        "wipeFilesystem": true
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "[Unit]\nBefore=local-fs.target\n[Mount]\nWhere=/var\nWhat=/dev/disk/by-partlabel/var\n[Install]\nWantedBy=local-fs.target\n",
        "enabled": true,
        "name": "var.mount"
      }
    ]
  }
}

2) provision worker-004...006

- sda rootfs
- sdb /var
- nvme0n1 - not present

I used the same worker.ign shown above and only changed the device name in the storage section from /dev/nvme0n1 to /dev/sdb. So far all works; I deployed 4.8.2 like this and will try an upgrade to 4.8.4 tomorrow to see how it behaves.

Comment 18 Michal Dekan 2021-08-26 16:15:13 UTC
OK, I have another problem now when using a custom ignition file; however, this one is not exactly related to custom ignition files...

The master nodes on this bare metal system have one SAS controller with 2 slots for the disks. The SAS controller is attached to a PCI slot...

When I use sda/sdb in the ignition config, the disk names are not persistent across boots (first come, first served), so sometimes the disk in slot 7 gets sda, and sometimes the disk in slot 6 is detected first and gets sda...

In an attempt to solve this I've used /dev/disk/by-id names, which are unique for each disk, so now I have a dedicated ignition file for each master...

control-dh-001.ign
control-dh-002.ign
control-dh-003.ign
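To see which persistent names exist for each disk and which kernel device they currently resolve to, something like the following could be used. This is a minimal sketch assuming bash and a `readlink` that supports `-f`; the helper name `list_by_id` is hypothetical.

```bash
#!/usr/bin/env bash
# list_by_id: hypothetical helper. Prints "symlink-name -> resolved-device"
# for every symlink in the given directory (default: /dev/disk/by-id).
list_by_id() {
    local dir="${1:-/dev/disk/by-id}" link
    for link in "$dir"/*; do
        # Skip non-symlinks; this also covers the unmatched glob
        # when the directory is empty or does not exist.
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

list_by_id
```

Because the set of symlinks depends on which udev rules are active, a name that shows up in the running system is not guaranteed to exist in every boot stage, so it may be worth running the same check in the environment where Ignition executes.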

cat control-dh-001.ign


{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "data:,%7B%22ignition%22%3A%7B%22config%22%3A%7B%22merge%22%3A%5B%7B%22source%22%3A%22https%3A%2F%2Fapi-int.XXXXXXX.redhat.com%3A22623%2Fconfig%2Fmaster%22%7D%5D%7D%2C%22security%22%3A%7B%22tls%22%3A%7B%22certificateAuthorities%22%3A%5B%7B%22source%22%3A%22data%3Atext%2Fplain%3Bcharset%3Dutf-8%3Bbase64%2CLS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURFRENDQWZpZ0F3SUJBZ0lJTzlqaGpKU1pVZTB3RFFZSktvWklodmNOQVFFTEJRQXdKakVTTUJBR0ExVUUKQ3hNSmIzQmxibk5vYVdaME1SQXdEZ1lEVlFRREV3ZHliMjkwTFdOaE1CNFhEVEl4TURneU5qRTBOVGd4TVZvWApEVE14TURneU5ERTBOVGd4TVZvd0pqRVNNQkFHQTFVRUN4TUpiM0JsYm5Ob2FXWjBNUkF3RGdZRFZRUURFd2R5CmIyOTBMV05oTUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUF5OHFRbFFlTXJvV0MKTXhjc1hvRWhJSTFKOTJLazUvRjgxYkJhc1hQeTNTZTN6bnhzWjlnTlkwR1N1ZE9uNGxGNE9hQllmUUtncGd4Swprd005YnloMjFYSmVnYmNaVXBrQWtaYzEvNTQzVVFQTG9pUzhFdzZ3U0plTkxBSzRqTVdmNGwvN1dYR0F6NFhqCjNYd0NvRUp4S0pxQWNsS0x5cytLZ2ttdkpEQlR2WWFJKy9CNWw0WVhaVzBxd01tQWtPSGhMTS8vSlhoRWdCNWEKcktsblZzZHdQbVFjVjAvWjg3Ym1rSVZLcFR3MFAwZmZhQUo3OFNrL2FTR3U5b3ZlbTNjWGpxNFZXd0hQYVdwOQpIWXc0SHVJQUtIQ2pqUmxBSnF2K2phMmxZaWJHS2pWRnFJSWdsZkFnV2hlQW1VZFdrblVKTHZJMy9kS3ZDN3MzCnhiYVhobmdybFFJREFRQUJvMEl3UURBT0JnTlZIUThCQWY4RUJBTUNBcVF3RHdZRFZSMFRBUUgvQkFVd0F3RUIKL3pBZEJnTlZIUTRFRmdRVVcyM0ZibVpRRFZHTmh0d1Q1aFNBZCt5cVlMVXdEUVlKS29aSWh2Y05BUUVMQlFBRApnZ0VCQUZQaWpDMks3WVppNjkzaTUzTGs2T002YnM1eXNOb3JhdjUzU3FrODJ1bjJqSUNVbDllM2w3THNUb0FMCkVhM2k3eFZtT3ZYWS91UmtmMEhaTE9ubDJmMVNEN01aVXlCekZCVjVPUXpXcFFRWGpXajlnMTUzcUdMOTRUS2QKSnJrMnA1TE5FendKOEFFOENIZmRnVFgzS0Fmc2wxcDdGUmZNRXZzRVpWaHNEMm82NU9uUWNwUGlGeE1XSE9yVQpieU9CS3N5TXpUOEtOVmdRV0N6V3JMNW0ydnR6alhSd3cxekJ2VVBsVHB6TVFOajk3U0dwSUtlNExFcTdETTV3Cm5ZMk5TdGYyd0pKdURDeXBwWnVpVU5FMXJnTi9mZHIzS3M0ajcrejVvcll3N2UzS3Rwc0xhVzUvYnU0V2Z6YmcKOCtvNk9UblFPQ2VPRW1nc3B6ejVqa1pzZ3g4PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg%3D%3D%22%7D%5D%7D%7D%2C%22version%22%3A%223.2.0%22%7D%7D"
        }
      ]
    },
    "version": "3.2.0"
  },
  "storage": {
    "disks": [
      {
        "device": "/dev/disk/by-id/scsi-355cd2e41530ac5de",
        "partitions": [
          {
            "label": "var",
            "number": 1
          }
        ],
        "wipeTable": true
      }
    ],
    "filesystems": [
      {
        "device": "/dev/disk/by-partlabel/var",
        "format": "xfs",
        "label": "var",
        "path": "/var",
        "wipeFilesystem": true
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "[Unit]\nBefore=local-fs.target\n[Mount]\nWhere=/var\nWhat=/dev/disk/by-partlabel/var\n[Install]\nWantedBy=local-fs.target\n",
        "enabled": true,
        "name": "var.mount"
      }
    ]
  }
}
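
As a rough aid when reviewing a config like the one above, the following Python sketch lists the device paths the config refers to, which are the paths that have to exist when the config is applied. The function name device_paths and the trimmed SAMPLE config are illustrative assumptions, not part of Ignition or butane; only the storage.disks[].device and storage.filesystems[].device fields shown above are read.

```python
import json


def device_paths(config_text: str) -> list[str]:
    """Return the device paths named under storage.disks and storage.filesystems."""
    config = json.loads(config_text)
    storage = config.get("storage", {})
    paths: list[str] = []
    for section in ("disks", "filesystems"):
        for entry in storage.get(section, []):
            device = entry.get("device")
            # Keep first-seen order and skip duplicates.
            if device and device not in paths:
                paths.append(device)
    return paths


# Trimmed copy of the storage section above
# (the merge source and the systemd unit are omitted for brevity).
SAMPLE = """
{
  "ignition": {"version": "3.2.0"},
  "storage": {
    "disks": [{"device": "/dev/disk/by-id/scsi-355cd2e41530ac5de",
               "partitions": [{"label": "var", "number": 1}],
               "wipeTable": true}],
    "filesystems": [{"device": "/dev/disk/by-partlabel/var",
                     "format": "xfs", "label": "var", "path": "/var",
                     "wipeFilesystem": true}]
  }
}
"""

if __name__ == "__main__":
    for path in device_paths(SAMPLE):
        print(path)
```

Each printed path can then be checked by hand in the environment where the config is applied.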

However, when Ignition tries to create the partition on the disk identified by its "by-id" path, it times out after hitting the 1m 30s limit:


[   35.506140] ignition[1704]: Adding "root-ca" to list of CAs
[   35.511826] ignition[1704]: disks: createPartitions: op(1): [started]  waiting for devices [/dev/disk/by-id/scsi-355cd2e41530ac5de]
[ T[  125.712806] systemd[1]: dev-disk-by\x2did-scsi\x2d355cd2e41530ac5de.device: Job dev-disk-by\x2did-scsi\x2d355cd2e41530ac5de.device/start timed out.
IME ] Timed [  125.727414] systemd[1]: Timed out waiting for device dev-disk-by\x2did-scsi\x2d355cd2e41530ac5de.device.
out waiting for device dev-di…-scsi\x2d355cd2e41530ac5de.device.
[  125.746196] systemd[1]: dev-disk-by\x2did-scsi\x2d355cd2e41530ac5de.device: Job dev-disk-by\x2did-scsi\x2d355cd2e41530ac5de.device/start failed with result 'timeout'.
[  125.761201] systemd[1]: ignition-disks.service: Main process exited, code=exited, status=1/FAILURE
[FAILED[  125.770391] ignition[1704]: disks failedFull config:
] Failed to [  125.776647] ignition[1704]: {
start Ignition ([  125.781001] ignition[1704]:   "ignition": {
disks).
[  125.786584] ignition[1704]:     "config": {
[  125.791541] ignition[1704]:       "merge": [
See 'systemctl s[  125.795936] ignition[1704]:         {
tatus ignition-d[  125.801023] ignition[1704]:           "verification": {}
isks.service' fo[  125.807716] ignition[1704]:         },
r details.
[  125.812847] ignition[1704]:         {
[DEPEND[  125.817622] ignition[1704]: disks: createPartitions: op(1): [failed]   waiting for devices [/dev/disk/by-id/scsi-355cd2e41530ac5de]: device unit dev-disk-by\x2did-scsi\x2d355ct
] Dependency[  125.837161] systemd[1]: ignition-disks.service: Failed with result 'exit-code'.
 failed for Igni[  125.845834] ignition[1704]:           "source": "https://api-int.XXXXXXXX.redhat.com:22623/config/master",
tion Complete.
[  125.857996] ignition[1704]:           "verification": {}
[  125.864661] ignition[1704]:         }
[DEPEND ignition[1704]:       ],
[0m] Dependency [  125.873513] ignition[1704]:       "replace": {
failed for Initr[  125.879352] ignition[1704]:         "verification": {}
d Default Target[  125.885889] ignition[1704]:       }
.
[  125.890765] ignition[1704]:     },
[  125.894426] ignition[1704]:     "proxy": {},
[  125.898797] ignition[1704]:     "security": {
[DEPEND ignition[1704]:       "tls": {
[0m] Dependency [  125.908874] ignition[1704]:         "certificateAuthorities": [
failed for Ignit[  125.916193] ignition[1704]:           {
ion OSTree: Moun[  125.921449] ignition[1704]:             "verification": {}
t (firstboot) /s[  125.928302] ignition[1704]:           }
ysroot.
[  125.933534] ignition[1704]:         ]
[  125.937970] ignition[1704]:       }
[  125.941565] ignition[1704]:     },
[  OK    125.945095] ignition[1704]:     "timeouts": {},
0m] Stopped targ[  125.951036] ignition[1704]:     "version": "3.3.0-experimental"
et Timers.
[  125.958349] ignition[1704]:   },
[  125.962626] ignition[1704]:   "passwd": {
[  OK    125.966765] ignition[1704]:     "users": [
0m] Stopped Forw[  125.972268] ignition[1704]:       {
ard Password Req[  125.977244] ignition[1704]: Ignition failed: create partitions failed: failed to wait on disks devs: device unit dev-disk-by\x2did-scsi\x2d355cd2e41530ac5de.device tit
uests to Clevis [  125.993901] systemd[1]: Failed to start Ignition (disks).
Directory Watch.[  126.000658] ignition[1704]:         "gecos": "CoreOS Admin",
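
The device unit name in the log above is the systemd-escaped form of the path from the Ignition config. The Python sketch below is a simplified reimplementation of that path escaping, written only to show how the two names relate. It is an assumption-laden illustration: the function name is made up, and "." or ".." path components are not normalised. On a real system, systemd-escape --path --suffix=device does this job.

```python
def systemd_escape_path(path: str) -> str:
    """Simplified sketch of systemd's path escaping for unit names."""
    # Drop leading, trailing and repeated slashes.
    trimmed = "/".join(part for part in path.split("/") if part)
    if not trimmed:
        return "-"  # the root directory is represented as a single dash
    out = []
    for index, byte in enumerate(trimmed.encode("utf-8")):
        char = chr(byte)
        if char == "/":
            out.append("-")
        elif (char.isascii() and char.isalnum()) or char in ":_":
            out.append(char)
        elif char == "." and index > 0:
            out.append(char)  # a dot is only escaped in first position
        else:
            # Everything else, including "-", becomes a \xXX escape.
            out.append(f"\\x{byte:02x}")
    return "".join(out)


if __name__ == "__main__":
    device = "/dev/disk/by-id/scsi-355cd2e41530ac5de"
    print(systemd_escape_path(device) + ".device")
```

For the path used in this report, the result matches the unit that systemd timed out waiting for.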

Comment 20 Michal Dekan 2021-08-27 12:44:24 UTC
There is an inconsistency between the coreos-installer and dracut environments in how the udev rules for the disks are handled.

1) Booting from PXE, specifying the SCSI ID for the disk (coreos.inst.install_dev=/dev/disk/by-id/scsi-355cd2e41530aad4c) and using the custom Ignition file control-dh-001.ign generated by butane (content shared in comment 18):

kernel http://example.host.com/openshift/ocp-XXX/rhcos-4.8.2-x86_64-live-kernel-x86_64 coreos.live.rootfs_url=http://example.host.com/openshift/ocp-XXX/rhcos-4.8.2-x86_64-live-rootfs.x86_64.img coreos.inst.install_dev=/dev/disk/by-id/scsi-355cd2e41530aad4c coreos.inst.ignition_url=http://example.host.com/openshift/ocp-XXX/control-dh-001.ign ip=bond0:dhcp bond=bond0:eno1,eno2:mode=802.3ad,miimon=100,lacp_rate=fast rd.neednet=1 nameserver=10.11.5.19 nameserver=10.5.30.160
initrd http://example.host.com/openshift/ocp-XXX/rhcos-4.8.2-x86_64-live-initramfs.x86_64.img
boot

The installation passes, but when the system boots for the first time after the installation, ls -la /dev/disk/by-id/scsi-* shows that the disks are not available under that name inside the ramdisk.

Udev rules for SCSI disks are defined inside the ramdisk (see line 12 of the output below):


  1 :/# grep -e "sd\*" /lib/udev/rules.d/*.rules
  2 /lib/udev/rules.d/40-redhat.rules:KERNEL=="sd*", SUBSYSTEMS=="ccw", DRIVERS=="zfcp", ENV{.ID_ZFCP_BUS}="1"
  3 /lib/udev/rules.d/40-redhat.rules:KERNEL=="sd*[!0-9]", SUBSYSTEMS=="scsi", ENV{.ID_ZFCP_BUS}=="1", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-path/ccw-$attr{hba_id}-zfcp-$attr{wwpn}:$attr
    {"
  4 /lib/udev/rules.d/40-redhat.rules:KERNEL=="sd*[0-9]", SUBSYSTEMS=="scsi", ENV{.ID_ZFCP_BUS}=="1", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-path/ccw-$attr{hba_id}-zfcp-$attr{wwpn}:$
    a"
  5 /lib/udev/rules.d/60-block.rules:ACTION!="remove", SUBSYSTEM=="block", KERNEL=="loop*|nvme*|sd*|vd*|xvd*|pmem*|mmcblk*|dasd*", OPTIONS+="watch"
  6 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL!="loop*|mmcblk*[0-9]|msblk*[0-9]|mspblk*[0-9]|nvme*|sd*|sr*|vd*|xvd*|bcache*|cciss*|dasd*|ubd*|scm*|pmem*|nbd*", GOTO="persistent_s
    t"
  7 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="scsi", ATTRS{vendor}=="ATA", IMPORT{program}="ata_id --export $devnode"
  8 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="scsi", ATTRS{type}=="5", ATTRS{scsi_level}=="[6-9]*", IMPORT{program}="ata_id 
    -"
  9 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", ATTR{removable}=="0", SUBSYSTEMS=="usb", IMPORT{program}="ata_id --export $devnode"
 10 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="usb", IMPORT{builtin}="usb_id"
 11 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $devnode", ENV{ID_BUS}="scsi"
 12 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
 13 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"
 14 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL=="sd*[!0-9]|sr*", ATTRS{ieee1394_id}=="?*", SYMLINK+="disk/by-id/ieee1394-$attr{ieee1394_id}"
 15 /lib/udev/rules.d/60-persistent-storage.rules:KERNEL=="sd*[0-9]", ATTRS{ieee1394_id}=="?*", SYMLINK+="disk/by-id/ieee1394-$attr{ieee1394_id}-part%n"
 16 /lib/udev/rules.d/61-scsi-sg3_id.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SCSI_INQUIRY}!="?*", IMPORT{program}="/usr/bin/sg_inq --export --inhex=/sys/block/$kernel/device/inquiry --raw", 
    E"
 17 /lib/udev/rules.d/61-scsi-sg3_id.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SCSI}!="1", IMPORT{program}="/usr/bin/sg_inq --export $tempnode", ENV{ID_SCSI}="1"
 18 /lib/udev/rules.d/61-scsi-sg3_id.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SCSI}=="1", ENV{ID_SCSI_INQUIRY}=="1", IMPORT{program}="/usr/bin/sg_inq --export --inhex=/sys/block/$kernel/devic
    e"
 19 /lib/udev/rules.d/61-scsi-sg3_id.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SCSI}=="1", ENV{ID_SCSI_INQUIRY}!="1", IMPORT{program}="/usr/bin/sg_inq --export --page=sn $tempnode"
 20 /lib/udev/rules.d/61-scsi-sg3_id.rules:KERNEL=="sd*[!0-9]", ENV{ID_SCSI}=="1", ENV{ID_SCSI_INQUIRY}=="1", IMPORT{program}="/usr/bin/sg_inq --export --inhex=/sys/block/$kernel/device/vp
    d"
 21 /lib/udev/rules.d/61-scsi-sg3_id.rules:KERNEL=="sd*[!0-9]|sr*", ENV{ID_SCSI}=="1", ENV{ID_SCSI_INQUIRY}!="1", IMPORT{program}="/usr/bin/sg_inq --export --page=di $tempnode"
 22 /lib/udev/rules.d/62-multipath.rules:KERNEL!="sd*|dasd*|nvme*", GOTO="end_mpath"
 23 :/#

The rule for SCSI disks is probably not being applied because these are ATA disks.

The ATA rules come before the generic SCSI ones, so they run first and populate ENV{ID_SERIAL} with what ata_id returned (they also set ID_BUS to "ata" instead of "scsi"). Then the first generic SCSI rule runs:

KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $devnode", ENV{ID_BUS}="scsi"
Its ENV{ID_SERIAL}!="?*" condition no longer matches because ID_SERIAL is not empty any more, so the rule on line 12 creates an ata-* link rather than a scsi-* one.
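
The ordering effect described above can be modelled with a small Python sketch. This is a toy model of the rules on lines 7, 11 and 12 of the grep output, not how udev is implemented. The function name, the non-ATA vendor string and the value that scsi_id would report (here assumed to be 355cd2e41530ac5de) are assumptions made for illustration.

```python
def by_id_link(vendor: str, ata_serial: str, scsi_serial: str) -> str:
    """Toy model of how ID_SERIAL and ID_BUS determine the by-id link name."""
    env: dict[str, str] = {}

    # Modelled on line 7: for ATA devices, ata_id fills ID_SERIAL and ID_BUS first.
    if "ID_SERIAL" not in env and vendor == "ATA":
        env["ID_SERIAL"] = ata_serial
        env["ID_BUS"] = "ata"

    # Modelled on line 11: scsi_id is only consulted while ID_SERIAL is still empty.
    if "ID_SERIAL" not in env:
        env["ID_SERIAL"] = scsi_serial
        env["ID_BUS"] = "scsi"

    # Modelled on line 12: the link name is built from whatever was set above.
    return f"disk/by-id/{env['ID_BUS']}-{env['ID_SERIAL']}"


if __name__ == "__main__":
    # ATA disk from this report: the ATA branch wins, so no scsi-* link appears.
    print(by_id_link("ATA", "SSDSC2KG480G8R_PHYG040200XE480BGN", "355cd2e41530ac5de"))
    # Hypothetical non-ATA disk: only now does the scsi branch run.
    print(by_id_link("OTHER", "", "355cd2e41530ac5de"))
```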

2) Looking at the disk IDs inside the ramdisk, I can see this:

:/# ls -la /dev/disk/by-id/      
ata-SSDSC2KG480G8R_PHYG040105X3480BGN
ata-SSDSC2KG480G8R_PHYG040105X3480BGN-part1
ata-SSDSC2KG480G8R_PHYG040105X3480BGN-part2
ata-SSDSC2KG480G8R_PHYG040105X3480BGN-part3
ata-SSDSC2KG480G8R_PHYG040105X3480BGN-part4
ata-SSDSC2KG480G8R_PHYG040200XE480BGN
ata-SSDSC2KG480G8R_PHYG040200XE480BGN-part1
wwn-0x55cd2e41530aad4c
wwn-0x55cd2e41530aad4c-part1
wwn-0x55cd2e41530aad4c-part2
wwn-0x55cd2e41530aad4c-part3
wwn-0x55cd2e41530aad4c-part4
wwn-0x55cd2e41530ac5de
wwn-0x55cd2e41530ac5de-part1

3) Removing all the partitions on the /dev/sda and /dev/sdb disks inside the ramdisk using cfdisk (to make sure I'm starting from scratch and the installer formats the disk when sdX is not specified), and then PXE booting with coreos.inst.install_dev=/dev/disk/by-id/wwn-0x55cd2e41530aad4c: GRUB cannot boot after the PXE installation:

Booting from Hard drive C:
..
error: ../../grub-core/kern/disk.c:258:no such partition.
Entering rescue mode...
grub rescue>

Comment 21 Anand Paladugu 2021-08-30 14:17:03 UTC
Team, any update?

As stated above, if we are suggesting using the SCSI ID (to circumvent the sdX first-come, first-served problem), it looks like it can only help with SCSI disks. How are ATA disks to be handled?

Thx

Anand

Comment 23 Timothée Ravier 2021-08-31 15:11:06 UTC
To make progress here as we do not have access to your hardware, we need the output from `ls -la /dev/disk/by-id/` and `fdisk -l` and `blkid` from each environment (liveiso, initramfs) to be able to compare and understand where the issue is.

Comment 24 Timothée Ravier 2021-08-31 15:15:08 UTC
Please also try to make sure to remove all RHCOS EFI boot entries from your firmware and wipe the beginning of both disks before the installation.

Comment 25 Benjamin Gilbert 2021-09-01 03:56:03 UTC
The output of `udevadm info <device-path>` from each environment would also be helpful.

As an aside, have you checked whether /dev/disk/by-path contains any useful symlinks?  That might allow you to avoid per-machine Ignition configs.

Comment 27 Michal Dekan 2021-09-07 12:28:40 UTC
Indeed, when I used /dev/disk/by-id/wwn-0x55cd2e41530ac5de as the install device (the sda device at that time), GRUB was installed successfully: the GRUB menu appeared and the Ignition file was applied.

Then I rebooted, edited the CoreOS entry, and booted with rd.break.

For the initramfs:

switch_root:/# ls -la /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 320 Sep  7 12:16 .
drwxr-xr-x 8 root root 160 Sep  7 12:16 ..
lrwxrwxrwx 1 root root   9 Sep  7 12:16 ata-SSDSC2KG480G8R_PHYG040105X3480BGN -> ../../sdb
lrwxrwxrwx 1 root root  10 Sep  7 12:16 ata-SSDSC2KG480G8R_PHYG040105X3480BGN-part1 -> ../../sdb1
lrwxrwxrwx 1 root root   9 Sep  7 12:16 ata-SSDSC2KG480G8R_PHYG040200XE480BGN -> ../../sda
lrwxrwxrwx 1 root root  10 Sep  7 12:16 ata-SSDSC2KG480G8R_PHYG040200XE480BGN-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Sep  7 12:16 ata-SSDSC2KG480G8R_PHYG040200XE480BGN-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Sep  7 12:16 ata-SSDSC2KG480G8R_PHYG040200XE480BGN-part3 -> ../../sda3
lrwxrwxrwx 1 root root  10 Sep  7 12:16 ata-SSDSC2KG480G8R_PHYG040200XE480BGN-part4 -> ../../sda4
lrwxrwxrwx 1 root root   9 Sep  7 12:16 wwn-0x55cd2e41530aad4c -> ../../sdb
lrwxrwxrwx 1 root root  10 Sep  7 12:16 wwn-0x55cd2e41530aad4c-part1 -> ../../sdb1
lrwxrwxrwx 1 root root   9 Sep  7 12:16 wwn-0x55cd2e41530ac5de -> ../../sda
lrwxrwxrwx 1 root root  10 Sep  7 12:16 wwn-0x55cd2e41530ac5de-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Sep  7 12:16 wwn-0x55cd2e41530ac5de-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Sep  7 12:16 wwn-0x55cd2e41530ac5de-part3 -> ../../sda3
lrwxrwxrwx 1 root root  10 Sep  7 12:16 wwn-0x55cd2e41530ac5de-part4 -> ../../sda4

switch_root:/# fdisk -l
sh: fdisk: command not found


switch_root:/# udevadm info /dev/disk/by-id/wwn-0x55cd2e41530aad4c
P: /devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/block/sdb
N: sdb
S: disk/by-id/ata-SSDSC2KG480G8R_PHYG040105X3480BGN
S: disk/by-id/wwn-0x55cd2e41530aad4c
S: disk/by-path/pci-0000:18:00.0-sas-phy3-lun-0
E: DEVLINKS=/dev/disk/by-id/ata-SSDSC2KG480G8R_PHYG040105X3480BGN /dev/disk/by-path/pci-0000:18:00.0-sas-phy3-lun-0 /dev/disk/by-id/wwn-0x55cd2e41530aad4c
E: DEVNAME=/dev/sdb
E: DEVPATH=/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/block/sdb
E: DEVTYPE=disk
E: ID_ATA=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_ROTATION_RATE_RPM=0
E: ID_ATA_SATA=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=1
E: ID_BUS=ata
E: ID_MODEL=SSDSC2KG480G8R
E: ID_MODEL_ENC=SSDSC2KG480G8R\x20\x20
E: ID_PART_TABLE_TYPE=gpt
E: ID_PART_TABLE_UUID=18b94a42-548e-4099-8c4d-b8372d4f7082
E: ID_PATH=pci-0000:18:00.0-sas-phy3-lun-0
E: ID_PATH_TAG=pci-0000_18_00_0-sas-phy3-lun-0
E: ID_REVISION=DL69
E: ID_SCSI=1
E: ID_SCSI_INQUIRY=1
E: ID_SERIAL=SSDSC2KG480G8R_PHYG040105X3480BGN
E: ID_SERIAL_SHORT=PHYG040105X3480BGN
E: ID_TYPE=disk
E: ID_VENDOR=ATA
E: ID_VENDOR_ENC=ATA\x20\x20\x20\x20\x20
E: ID_WWN=0x55cd2e41530aad4c
E: ID_WWN_WITH_EXTENSION=0x55cd2e41530aad4c
E: MAJOR=8
E: MINOR=16
E: SCSI_IDENT_LUN_NAA_REG=55cd2e41530aad4c
E: SCSI_IDENT_SERIAL=PHYG040105X3480BGN
E: SCSI_MODEL=SSDSC2KG480G8R
E: SCSI_MODEL_ENC=SSDSC2KG480G8R\x20\x20
E: SCSI_REVISION=DL69
E: SCSI_TPGS=0
E: SCSI_TYPE=disk
E: SCSI_VENDOR=ATA
E: SCSI_VENDOR_ENC=ATA\x20\x20\x20\x20\x20
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=13208214

switch_root:/# 

switch_root:/# udevadm info /dev/disk/by-id/wwn-0x55cd2e41530ac5de
P: /devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda
N: sda
S: disk/by-id/ata-SSDSC2KG480G8R_PHYG040200XE480BGN
S: disk/by-id/wwn-0x55cd2e41530ac5de
S: disk/by-path/pci-0000:18:00.0-sas-phy7-lun-0
E: DEVLINKS=/dev/disk/by-id/ata-SSDSC2KG480G8R_PHYG040200XE480BGN /dev/disk/by-path/pci-0000:18:00.0-sas-phy7-lun-0 /dev/disk/by-id/wwn-0x55cd2e41530ac5de
E: DEVNAME=/dev/sda
E: DEVPATH=/devices/pci0000:17/0000:17:00.0/0000:18:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda
E: DEVTYPE=disk
E: ID_ATA=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_ROTATION_RATE_RPM=0
E: ID_ATA_SATA=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=1
E: ID_BUS=ata
E: ID_MODEL=SSDSC2KG480G8R
E: ID_MODEL_ENC=SSDSC2KG480G8R\x20\x20
E: ID_PART_TABLE_TYPE=gpt
E: ID_PART_TABLE_UUID=7f5c6179-32ea-4cd1-bad1-1b4e5cebbcc8
E: ID_PATH=pci-0000:18:00.0-sas-phy7-lun-0
E: ID_PATH_TAG=pci-0000_18_00_0-sas-phy7-lun-0
E: ID_REVISION=DL69
E: ID_SCSI=1
E: ID_SCSI_INQUIRY=1
E: ID_SERIAL=SSDSC2KG480G8R_PHYG040200XE480BGN
E: ID_SERIAL_SHORT=PHYG040200XE480BGN
E: ID_TYPE=disk
E: ID_VENDOR=ATA
E: ID_VENDOR_ENC=ATA\x20\x20\x20\x20\x20
E: ID_WWN=0x55cd2e41530ac5de
E: ID_WWN_WITH_EXTENSION=0x55cd2e41530ac5de
E: MAJOR=8
E: MINOR=0
E: SCSI_IDENT_LUN_NAA_REG=55cd2e41530ac5de
E: SCSI_IDENT_SERIAL=PHYG040200XE480BGN
E: SCSI_MODEL=SSDSC2KG480G8R
E: SCSI_MODEL_ENC=SSDSC2KG480G8R\x20\x20
E: SCSI_REVISION=DL69
E: SCSI_TPGS=0
E: SCSI_TYPE=disk
E: SCSI_VENDOR=ATA
E: SCSI_VENDOR_ENC=ATA\x20\x20\x20\x20\x20
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=13209795

switch_root:/#

Comment 28 Michal Dekan 2021-09-07 15:14:59 UTC
Part of the console log from the system where the install device, given as a /dev/disk/by-id/wwn-* path, was symlinked to the sdb device:

[  OK  ] Reached target Network is Online.
         Starting CoreOS Installer...
[  189.237972] coreos-installer-[  189.292421]  sdb:
service[2333]: coreos-installer [  189.344006]  sdb:
install /dev/disk/by-id/wwn-0x55cd2e41530ae1f2 --ignition-url http://example.host.com/openshift/ocp-datahub/control-dh-003.ign --insecure-ignition --firstboo0
[  189.715348] coreos-installer-service[2333]: Installing Red Hat Enterprise Linux CoreOS 48.84.202107202156-0 (Ootpa) x86_64 (512-byte sectors)
[  190.321015] coreos-installer-service[2333]: Read disk 182.1 MiB/3.7 GiB (4%)
[  191.320965] coreos-installer-service[2333]: Read disk 365.6 MiB/3.7 GiB (9%)
[  192.320749] coreos-installer-service[2333]: Read disk 557.0 MiB/3.7 GiB (14%)
[  193.321022] coreos-installer-service[2333]: Read disk 768.6 MiB/3.7 GiB (20%)
[  194.321182] coreos-installer-service[2333]: Read disk 980.9 MiB/3.7 GiB (26%)
[  195.321161] coreos-installer-service[2333]: Read disk 1.2 GiB/3.7 GiB (31%)
[  196.321111] coreos-installer-service[2333]: Read disk 1.3 GiB/3.7 GiB (36%)
[  197.321103] coreos-installer-service[2333]: Read disk 1.5 GiB/3.7 GiB (40%)
[  198.321228] coreos-installer-service[2333]: Read disk 1.6 GiB/3.7 GiB (43%)
[  199.321696] coreos-installer-service[2333]: Read disk 1.8 GiB/3.7 GiB (47%)
[  200.321656] coreos-installer-service[2333]: Read disk 1.9 GiB/3.7 GiB (51%)
[  201.321926] coreos-installer-service[2333]: Read disk 2.0 GiB/3.7 GiB (54%)
[  202.322248] coreos-installer-service[2333]: Read disk 2.1 GiB/3.7 GiB (58%)
[  203.322150] coreos-installer-service[2333]: Read disk 2.3 GiB/3.7 GiB (61%)
[  204.322330] coreos-installer-service[2333]: Read disk 2.4 GiB/3.7 GiB (65%)
[  205.322376] coreos-installer-service[2333]: Read disk 2.5 GiB/3.7 GiB (67%)
[  206.323324] coreos-installer-service[2333]: Read disk 2.6 GiB/3.7 GiB (70%)
[  207.323529] coreos-installer-service[2333]: Read disk 2.7 GiB/3.7 GiB (73%)
[  208.324576] coreos-installer-service[2333]: Read disk 2.8 GiB/3.7 GiB (75%)
[  209.326337] coreos-installer-service[2333]: Read disk 2.9 GiB/3.7 GiB (78%)
[  210.327056] coreos-installer-service[2333]: Read disk 3.0 GiB/3.7 GiB (82%)
[  211.327145] coreos-installer-service[2333]: Read disk 3.1 GiB/3.7 GiB (83%)
[  212.327114] coreos-installer-service[2333]: Read disk 3.2 GiB/3.7 GiB (87%)
[  213.327038] coreos-installer-service[2333]: Read disk 3.3 GiB/3.7 GiB (90%)
[  214.327415] coreos-installer-service[2333]: Read disk 3.4 GiB/3.7 GiB (92%)
[  215.327601] coreos-installer-service[2333]: Read disk 3.5 GiB/3.7 GiB (94%)
[  216.327575] coreos-installer-service[2333]: Read disk 3.5 GiB/3.7 GiB (95%)
[  217.327712] coreos-installer-service[2333]: Read disk 3.6 GiB/3.7 GiB (97%)
[  218.327614] coreos-installer-service[2333]: Read disk 3.6 GiB/3.7 GiB (98%)
[  218.494214] coreos-installer-service[2333]: Read disk 3.7 GiB/3.7 GiB (100%)
[  218.579250] coreos-installer-service[2333]: Read disk 3.7 GiB/3.7 GiB (100%)
[  218.928490] GPT:Primary header thinks Alt. header is not at the end of the disk.
[  219.017685] GPT:7718911 != 937703087
[  219.061119] GPT:Alternate GPT header not at the end of the disk.
[  219.133671] GPT:7718911 != 937703087
[  219.177023] GPT: Use GNU Parted to correct GPT errors.
[  219.239106]  sdb: sdb1 sdb2 sdb3 sdb4
[  219.537840] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
[  219.632627] coreos-installer-[  219.662263] GPT:Primary header thinks Alt. header is not at the end of the disk.
service[2333]: W[  219.755209] GPT:7718911 != 937703087
riting Ignition [  219.814574] GPT:Alternate GPT header not at the end of the disk.
config
[  219.903073] GPT:7718911 != 937703087
[  219.954131] GPT: Use GNU Parted to correct GPT errors.
[  219.954140]  sdb: sdb1 sdb2 sdb3 sdb4
[  OK  ] Started CoreOS Installer.
[  219.953713] coreos-installer-service[2333]: Writing first-boot kernel arguments
[  OK  ] Reached target CoreOS Installer Target.
[  220.198271] coreos-installer-service[2333]: Install complete.
[  OK  ] Started Reboot after CoreOS Installer.
[  OK  ] Reached target Finalize CoreOS Installer Target.
[  OK  ] Stopped target Network is Online.
[  OK  ] Stopped target Finalize CoreOS Installer Target.

The system is unable to boot because GRUB is pointing to the hd0,gpt3 partition, which is what sda would be:

Booting from Hard drive C:
..
error: ../../grub-core/kern/disk.c:258:no such partition.
Entering rescue mode...

Indeed, ls shows no such partition:

grub rescue> ls
(hd0) (hd1) (hd1,gpt4) (hd1,gpt3) (hd1,gpt2) (hd1,gpt1)

The correct partition is (hd1,gpt3), i.e. the sdb disk used by coreos-installer above:

grub rescue> ls (hd1,gpt3)
(hd1,gpt3): Filesystem is ext2.

GRUB is trying to boot from (hd0,gpt3), but according to the ls output above there is no such partition:

grub rescue> set

prefix=(hd0,gpt3)/grub2
root=hd0,gpt3
grub rescue>

GRUB was able to boot from the disk after these steps:

grub rescue> set prefix=(hd1,gpt3)/grub2
grub rescue> set root=hd1,gpt3
grub rescue> insmod normal
grub rescue> normal

The GRUB menu appeared, the system booted, and the Ignition config was applied.

Comment 29 Benjamin Gilbert 2021-09-07 22:36:06 UTC
This bug now contains descriptions of three different problems (in comment 0, comment 18, and comment 27).  In general, please report separate problems as separate BZs to ease tracking.  In this case, let's continue tracking the second problem here; please file a separate BZ for the most recent problem of GRUB trying to boot from the wrong disk.

Comment 30 Jonathan Lebon 2021-09-08 15:11:27 UTC
OK, a lot going on here.

So to summarize:

1. when using /dev/disk/by-id/scsi-* symlinks in the Ignition config, boot of the installed system fails at the Ignition stage because the symlink is not present
2. when using the /dev/disk/by-id/wwn-* symlink to install to /dev/sdb, boot of the installed system fails at the GRUB stage

For 1, I think likely we're just missing some udev rules to create the symlink in the initramfs. And... looking at the diff of /usr/lib/udev/rules.d now between the initramfs and the real root, looks like it's 63-scsi-sg3_symlink.rules missing. It contains:

    # 2: IEEE Registered
    ENV{SCSI_IDENT_LUN_NAA_REG}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_NAA_REG}"
    ENV{SCSI_IDENT_LUN_NAA_REG}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_NAA_REG}-part%n"

which I think is what creates the symlink we want here.
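
To make the effect of those two rules concrete, here is a minimal Python sketch that derives the link name from the udev properties the same way the quoted rules do. It only mirrors the two rule lines quoted above; the function name and the dictionary-based interface are illustrative assumptions, and the property values are taken from the udevadm output in comment 27.

```python
from typing import Optional


def sg3_naa_reg_link(env: dict[str, str], partition: Optional[int] = None) -> Optional[str]:
    """Mirror the two 'IEEE Registered' rules quoted above."""
    naa = env.get("SCSI_IDENT_LUN_NAA_REG")
    if not naa:
        return None  # ENV{SCSI_IDENT_LUN_NAA_REG}=="?*" does not match
    if env.get("DEVTYPE") == "disk":
        return f"disk/by-id/scsi-3{naa}"
    if env.get("DEVTYPE") == "partition" and partition is not None:
        return f"disk/by-id/scsi-3{naa}-part{partition}"
    return None


if __name__ == "__main__":
    sda = {"DEVTYPE": "disk", "SCSI_IDENT_LUN_NAA_REG": "55cd2e41530ac5de"}
    print(sg3_naa_reg_link(sda))
```

With the sda properties from comment 27, this yields the scsi-355cd2e41530ac5de name that the Ignition config in comment 18 waits for.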

It's not in the FCOS' initramfs either. Will look at adding it.
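
For anyone who wants to repeat the comparison of rule files between the real root and the initramfs, a minimal Python sketch follows. It assumes the initramfs contents have already been extracted to a directory; the function name and the temporary directories used in the demo are hypothetical, and the demo creates empty placeholder files rather than real rules.

```python
import tempfile
from pathlib import Path


def missing_rules(real_root_rules: Path, initramfs_rules: Path) -> list[str]:
    """Return rule file names present in the real root but absent from the initramfs."""
    real = {p.name for p in real_root_rules.glob("*.rules")}
    initrd = {p.name for p in initramfs_rules.glob("*.rules")}
    return sorted(real - initrd)


if __name__ == "__main__":
    # Demo with placeholder files; on a real system the first argument would be
    # /usr/lib/udev/rules.d and the second the same path inside the extracted initramfs.
    with tempfile.TemporaryDirectory() as real, tempfile.TemporaryDirectory() as initrd:
        for name in ("61-scsi-sg3_id.rules", "63-scsi-sg3_symlink.rules"):
            (Path(real) / name).touch()
        (Path(initrd) / "61-scsi-sg3_id.rules").touch()
        print(missing_rules(Path(real), Path(initrd)))
```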

For 2, is it possible that the problem there is that GRUB is installed on both disks? When iterating on these tests, apart from the GPT partitions, did you also wipe the disks' MBRs?

Assuming that's the issue, I think you should be able to use the wwn-* links as they do exist in both the real root and the initramfs. But still, I'll look at adding the missing udev rules to FCOS and RHCOS.

As Benjamin mentioned, please open a separate BZ for any additional issues.

Comment 33 Renata Ravanelli 2021-11-29 20:14:45 UTC
The PR with the fix was merged on 21/10/20. The fix is already available in the latest 4.10 version.

Comment 34 RHCOS Bug Bot 2021-11-29 20:25:28 UTC
The fix for this bug will not be delivered to customers until it lands in an updated bootimage.  That process is tracked in bug 2027501, which has status ASSIGNED.  Moving this bug back to POST.

Comment 35 RHCOS Bug Bot 2021-11-30 12:47:09 UTC
This bug has been reported fixed in a new RHCOS build and is ready for QE verification.  To mark the bug verified, set the Verified field to Tested.  This bug will automatically move to MODIFIED once the fix has landed in a new bootimage.

Comment 36 Renata Ravanelli 2021-11-30 14:15:13 UTC
To validate it using qemu, you need to create the qemu disk as scsi.
Here is the info from the tests I did:

qemu-kvm -m 2048M -accel kvm -smp cores=4 -fw_cfg name=opt/com.coreos/config,file=/root/cosa.ign -drive file=/tmp/rhcos.qcow2 -net nic,model=virtio -net user,hostfwd=tcp::2222-:22 -nographic -device virtio-scsi-pci,id=scsi -drive file=cosa_disk,if=none,id=hd2 -device scsi-hd,drive=hd2 


[core@localhost ~]$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    5G  0 disk
`-sda1   8:1    0    5G  0 part /var
sdb      8:16   0   16G  0 disk
|-sdb1   8:17   0    1M  0 part
|-sdb2   8:18   0  127M  0 part
|-sdb3   8:19   0  384M  0 part /boot
`-sdb4   8:20   0 15.5G  0 part /sysroot


[core@localhost ~]$ ls -la /dev/disk/by-id/
total 0
drwxr-xr-x. 2 root root 480 Oct 19 01:13 .
drwxr-xr-x. 8 root root 160 Oct 19 01:13 ..
lrwxrwxrwx. 1 root root   9 Oct 19 01:15 ata-QEMU_HARDDISK_QM00001 -> ../../sdb
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 ata-QEMU_HARDDISK_QM00001-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 ata-QEMU_HARDDISK_QM00001-part2 -> ../../sdb2
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 ata-QEMU_HARDDISK_QM00001-part3 -> ../../sdb3
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 ata-QEMU_HARDDISK_QM00001-part4 -> ../../sdb4
lrwxrwxrwx. 1 root root   9 Oct 19 01:15 scsi-0ATA_QEMU_HARDDISK_QM00001 -> ../../sdb
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-0ATA_QEMU_HARDDISK_QM00001-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-0ATA_QEMU_HARDDISK_QM00001-part2 -> ../../sdb2
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-0ATA_QEMU_HARDDISK_QM00001-part3 -> ../../sdb3
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-0ATA_QEMU_HARDDISK_QM00001-part4 -> ../../sdb4
lrwxrwxrwx. 1 root root   9 Oct 19 01:15 scsi-0QEMU_QEMU_HARDDISK_hd2 -> ../../sda
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-0QEMU_QEMU_HARDDISK_hd2-part1 -> ../../sda1
lrwxrwxrwx. 1 root root   9 Oct 19 01:15 scsi-1ATA_QEMU_HARDDISK_QM00001 -> ../../sdb
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-1ATA_QEMU_HARDDISK_QM00001-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-1ATA_QEMU_HARDDISK_QM00001-part2 -> ../../sdb2
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-1ATA_QEMU_HARDDISK_QM00001-part3 -> ../../sdb3
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-1ATA_QEMU_HARDDISK_QM00001-part4 -> ../../sdb4
lrwxrwxrwx. 1 root root   9 Oct 19 01:15 scsi-SATA_QEMU_HARDDISK_QM00001 -> ../../sdb
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-SATA_QEMU_HARDDISK_QM00001-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-SATA_QEMU_HARDDISK_QM00001-part2 -> ../../sdb2
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-SATA_QEMU_HARDDISK_QM00001-part3 -> ../../sdb3
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 scsi-SATA_QEMU_HARDDISK_QM00001-part4 -> ../../sdb4


[core@localhost ~]$ ls -la /dev/disk/by-partlabel/
total 0
drwxr-xr-x. 2 root root 140 Oct 19 01:13 .
drwxr-xr-x. 8 root root 160 Oct 19 01:13 ..
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 BIOS-BOOT -> ../../sdb1
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 EFI-SYSTEM -> ../../sdb2
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 boot -> ../../sdb3
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 root -> ../../sdb4
lrwxrwxrwx. 1 root root  10 Oct 19 01:15 var -> ../../sda1


[core@localhost ~]$ grep -e "scsi*" /lib/udev/rules.d/*.rules | grep 63
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_SERIAL}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_SERIAL}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_REGEXT}=="?*", ENV{DEVTYPE}=="disk", SYMLIN"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_REGEXT}=="?*", ENV{DEVTYPE}=="partition", S"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_REG}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_REG}=="?*", ENV{DEVTYPE}=="partition", SYML"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_EXT}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_EXT}=="?*", ENV{DEVTYPE}=="partition", SYML"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_EUI64}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="d"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_EUI64}=="?*", ENV{DEVTYPE}=="partition", SYMLIN"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAME}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="di"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAME}=="?*", ENV{DEVTYPE}=="partition", SYMLINK"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_T10}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="dis"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_T10}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_LOCAL}=="?*", ENV{DEVTYPE}=="disk", SYMLINK"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_LOCAL}=="?*", ENV{DEVTYPE}=="partition", SY"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_VENDOR}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+=""
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_VENDOR}=="?*", ENV{DEVTYPE}=="partition", SYMLI"
/lib/udev/rules.d/66-azure-storage.rules:ATTRS{device_id}=="{f8b3781a-1e82-4818-a1c3-63d806ec15bb}", ENV{fabri"
/lib/udev/rules.d/66-azure-storage.rules:ATTRS{device_id}=="{f8b3781b-1e82-4818-a1c3-63d806ec15bb}", ENV{fabri"
/lib/udev/rules.d/66-azure-storage.rules:ATTRS{device_id}=="{f8b3781c-1e82-4818-a1c3-63d806ec15bb}", ENV{fabri"
/lib/udev/rules.d/66-azure-storage.rules:ATTRS{device_id}=="{f8b3781d-1e82-4818-a1c3-63d806ec15bb}", ENV{fabri"

Comment 37 HuijingHei 2021-12-03 13:44:10 UTC
Tested with RHCOS 410.84.202112012203: passed.
Tested with the latest RHCOS 4.8 (not fixed): the issue can be reproduced.

Steps:
1) Prepare test.ign
$ cat test.ign 
{
  "ignition": {
    "version": "3.2.0"
  },
  "storage": {
    "disks": [
      {
        "device": "/dev/disk/by-id/scsi-0ATA_QEMU_HARDDISK_QM00002",
        "partitions": [
          {
            "label": "var",
            "number": 1
          }
        ],
        "wipeTable": true
      }
    ],
    "filesystems": [
      {
        "device": "/dev/disk/by-partlabel/var",
        "format": "xfs",
        "label": "var",
        "path": "/var",
        "wipeFilesystem": true
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "[Unit]\nBefore=local-fs.target\n[Mount]\nWhere=/var\nWhat=/dev/disk/by-partlabel/var\n[Install]\nWantedBy=local-fs.target\n",
        "enabled": true,
        "name": "var.mount"
      },
      {
        "dropins": [
          {
            "contents": "[Service]\n# Override Execstart in main unit\nExecStart=\n# Add new Execstart with `-` prefix to ignore failure`\nExecStart=-/usr/sbin/agetty --autologin core --noclear %I $TERM\n",
            "name": "autologin-core.conf"
          }
        ],
        "name": "serial-getty"
      }
    ]
  }
}
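
For automated testing, the storage part of a config like the one above could be generated rather than edited by hand. The following is a minimal Python sketch using only the standard library. `build_var_disk_config` is a hypothetical helper name, not part of Ignition or Butane, and the sketch deliberately leaves out the systemd units shown above.

```python
import json


def build_var_disk_config(device: str, label: str = "var", path: str = "/var") -> dict:
    """Return an Ignition 3.2.0 config that partitions `device` and formats
    the first partition as XFS. Only the "storage" section is produced; the
    systemd units from the hand-written config are intentionally omitted."""
    return {
        "ignition": {"version": "3.2.0"},
        "storage": {
            "disks": [
                {
                    "device": device,
                    "partitions": [{"label": label, "number": 1}],
                    "wipeTable": True,
                }
            ],
            "filesystems": [
                {
                    # The filesystem is addressed through the partition label
                    # created just above, as in the hand-written config.
                    "device": f"/dev/disk/by-partlabel/{label}",
                    "format": "xfs",
                    "label": label,
                    "path": path,
                    "wipeFilesystem": True,
                }
            ],
        },
    }


if __name__ == "__main__":
    config = build_var_disk_config("/dev/disk/by-id/scsi-0ATA_QEMU_HARDDISK_QM00002")
    print(json.dumps(config, indent=2))
```

Calling the helper once per worker group, each with its own device path, would give one config per disk layout.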


2) Start VM with rhcos qcow2 image and another disk
qemu-kvm -m 2048M -accel kvm -smp 4 -fw_cfg name=opt/com.coreos/config,file=./test.ign -net nic,model=virtio -net user,hostfwd=tcp::2222-:22 -nographic -drive file=./rhcos.qcow2 -device virtio-scsi-pci,id=scsi0 -drive file=cosa_disk -device virtio-scsi-pci,id=scsi1

3) Check that the VM boots and Ignition applies the config successfully

=============================================
- Test with latest rhcos-48.84.202112022303-0: Ignition fails to apply the config, as expected

Dec 03 13:12:33 ignition[663]: disks: createPartitions: op(1): [started]  waiting for devices [/dev/disk/by-id/scsi-0ATA_QEMU_HARDDISK_QM00002]
Dec 03 13:14:03 systemd[1]: ignition-disks.service: Main process exited, code=exited, status=1/FAILURE
Dec 03 13:14:03 systemd[1]: ignition-disks.service: Failed with result 'exit-code'.
Dec 03 13:14:03 systemd[1]: Failed to start Ignition (disks).
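
When checking this failure in an automated test, the device Ignition was waiting for can be pulled out of the journal text. The following is a rough Python sketch, assuming the `waiting for devices [...]` wording shown in the log above. `devices_waited_for` is a hypothetical name, and splitting the bracketed list on whitespace is an assumption, since the log above contains a single device only.

```python
import re
from typing import List

# Matches the bracketed device list in lines such as:
#   ... op(1): [started]  waiting for devices [/dev/disk/by-id/...]
WAITING_RE = re.compile(r"waiting for devices \[([^\]]*)\]")


def devices_waited_for(journal_text: str) -> List[str]:
    """Return every device path named in a 'waiting for devices [...]' message."""
    devices: List[str] = []
    for match in WAITING_RE.finditer(journal_text):
        # Assumption: multiple devices, if any, are separated by whitespace.
        devices.extend(match.group(1).split())
    return devices


if __name__ == "__main__":
    line = (
        "Dec 03 13:12:33 ignition[663]: disks: createPartitions: op(1): "
        "[started]  waiting for devices "
        "[/dev/disk/by-id/scsi-0ATA_QEMU_HARDDISK_QM00002]"
    )
    print(devices_waited_for(line))
```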

:/# ls /dev/disk/by-id/ -al
total 0
drwxr-xr-x 2 root root 200 Dec  3 13:12 .
drwxr-xr-x 8 root root 160 Dec  3 13:12 ..
lrwxrwxrwx 1 root root   9 Dec  3 13:12 ata-QEMU_DVD-ROM_QM00003 -> ../../sr0
lrwxrwxrwx 1 root root   9 Dec  3 13:12 ata-QEMU_HARDDISK_QM00001 -> ../../sda
lrwxrwxrwx 1 root root  10 Dec  3 13:12 ata-QEMU_HARDDISK_QM00001-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Dec  3 13:12 ata-QEMU_HARDDISK_QM00001-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Dec  3 13:12 ata-QEMU_HARDDISK_QM00001-part3 -> ../../sda3
lrwxrwxrwx 1 root root  10 Dec  3 13:12 ata-QEMU_HARDDISK_QM00001-part4 -> ../../sda4
lrwxrwxrwx 1 root root   9 Dec  3 13:12 ata-QEMU_HARDDISK_QM00002 -> ../../sdb
lrwxrwxrwx 1 root root  10 Dec  3 13:12 ata-QEMU_HARDDISK_QM00002-part1 -> ../../sdb1

=============================================
- Test with latest rhcos-410.84.202112012203-0: Ignition applies the config successfully

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   16G  0 disk 
|-sda1   8:1    0    1M  0 part 
|-sda2   8:2    0  127M  0 part 
|-sda3   8:3    0  384M  0 part /boot
`-sda4   8:4    0 15.5G  0 part /sysroot
sdb      8:16   0    4G  0 disk 
`-sdb1   8:17   0    4G  0 part /var
sr0     11:0    1 1024M  0 rom  
[core@ibm-p8-kvm-03-guest-02 ~]$ ls -la /dev/disk/by-id/
total 0
drwxr-xr-x. 2 root root 620 Dec  3 13:20 .
drwxr-xr-x. 8 root root 160 Dec  3 13:20 ..
lrwxrwxrwx. 1 root root   9 Dec  3 13:20 ata-QEMU_DVD-ROM_QM00003 -> ../../sr0
lrwxrwxrwx. 1 root root   9 Dec  3 13:20 ata-QEMU_HARDDISK_QM00001 -> ../../sda
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 ata-QEMU_HARDDISK_QM00001-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 ata-QEMU_HARDDISK_QM00001-part2 -> ../../sda2
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 ata-QEMU_HARDDISK_QM00001-part3 -> ../../sda3
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 ata-QEMU_HARDDISK_QM00001-part4 -> ../../sda4
lrwxrwxrwx. 1 root root   9 Dec  3 13:20 ata-QEMU_HARDDISK_QM00002 -> ../../sdb
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 ata-QEMU_HARDDISK_QM00002-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root   9 Dec  3 13:20 scsi-0ATA_QEMU_HARDDISK_QM00001 -> ../../sda
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-0ATA_QEMU_HARDDISK_QM00001-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-0ATA_QEMU_HARDDISK_QM00001-part2 -> ../../sda2
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-0ATA_QEMU_HARDDISK_QM00001-part3 -> ../../sda3
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-0ATA_QEMU_HARDDISK_QM00001-part4 -> ../../sda4
lrwxrwxrwx. 1 root root   9 Dec  3 13:20 scsi-0ATA_QEMU_HARDDISK_QM00002 -> ../../sdb
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-0ATA_QEMU_HARDDISK_QM00002-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root   9 Dec  3 13:20 scsi-1ATA_QEMU_HARDDISK_QM00001 -> ../../sda
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-1ATA_QEMU_HARDDISK_QM00001-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-1ATA_QEMU_HARDDISK_QM00001-part2 -> ../../sda2
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-1ATA_QEMU_HARDDISK_QM00001-part3 -> ../../sda3
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-1ATA_QEMU_HARDDISK_QM00001-part4 -> ../../sda4
lrwxrwxrwx. 1 root root   9 Dec  3 13:20 scsi-1ATA_QEMU_HARDDISK_QM00002 -> ../../sdb
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-1ATA_QEMU_HARDDISK_QM00002-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root   9 Dec  3 13:20 scsi-SATA_QEMU_HARDDISK_QM00001 -> ../../sda
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-SATA_QEMU_HARDDISK_QM00001-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-SATA_QEMU_HARDDISK_QM00001-part2 -> ../../sda2
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-SATA_QEMU_HARDDISK_QM00001-part3 -> ../../sda3
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-SATA_QEMU_HARDDISK_QM00001-part4 -> ../../sda4
lrwxrwxrwx. 1 root root   9 Dec  3 13:20 scsi-SATA_QEMU_HARDDISK_QM00002 -> ../../sdb
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 scsi-SATA_QEMU_HARDDISK_QM00002-part1 -> ../../sdb1
[core@ibm-p8-kvm-03-guest-02 ~]$ ls -la /dev/disk/by-partlabel/
total 0
drwxr-xr-x. 2 root root 140 Dec  3 13:20 .
drwxr-xr-x. 8 root root 160 Dec  3 13:20 ..
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 BIOS-BOOT -> ../../sda1
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 EFI-SYSTEM -> ../../sda2
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 boot -> ../../sda3
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 root -> ../../sda4
lrwxrwxrwx. 1 root root  10 Dec  3 13:20 var -> ../../sdb1

$ grep -e "scsi*" /lib/udev/rules.d/*.rules | grep 63
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_SERIAL}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-S$env{SCSI_VENDOR}_$env{SCSI_MODEL}"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_SERIAL}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-S$env{SCSI_VENDOR}_$env{SCSI_M"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_REGEXT}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_NAA_RE"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_REGEXT}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_N"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_REG}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_NAA_REG}"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_REG}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_NAA_"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_EXT}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_NAA_EXT}"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_EXT}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_NAA_"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_EUI64}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-2$env{SCSI_IDENT_LUN_EUI64}"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_EUI64}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-2$env{SCSI_IDENT_LUN_EUI64}"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAME}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-8$env{SCSI_IDENT_LUN_NAME}"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAME}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-8$env{SCSI_IDENT_LUN_NAME}-p"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_T10}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-1$env{SCSI_IDENT_LUN_T10}"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_T10}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-1$env{SCSI_IDENT_LUN_T10}-par"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_LOCAL}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_NAA_LOC"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_NAA_LOCAL}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-3$env{SCSI_IDENT_LUN_NA"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_VENDOR}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-0$env{SCSI_VENDOR}_$env{SCSI_MO"
/lib/udev/rules.d/63-scsi-sg3_symlink.rules:ENV{SCSI_IDENT_LUN_VENDOR}=="?*", ENV{DEVTYPE}=="partition", SYMLINK+="disk/by-id/scsi-0$env{SCSI_VENDOR}_$env{SC"
/lib/udev/rules.d/66-azure-storage.rules:ATTRS{device_id}=="{f8b3781a-1e82-4818-a1c3-63d806ec15bb}", ENV{fabric_scsi_controller}="scsi0", GOTO="azure_datadis"
/lib/udev/rules.d/66-azure-storage.rules:ATTRS{device_id}=="{f8b3781b-1e82-4818-a1c3-63d806ec15bb}", ENV{fabric_scsi_controller}="scsi1", GOTO="azure_datadis"
/lib/udev/rules.d/66-azure-storage.rules:ATTRS{device_id}=="{f8b3781c-1e82-4818-a1c3-63d806ec15bb}", ENV{fabric_scsi_controller}="scsi2", GOTO="azure_datadis"
/lib/udev/rules.d/66-azure-storage.rules:ATTRS{device_id}=="{f8b3781d-1e82-4818-a1c3-63d806ec15bb}", ENV{fabric_scsi_controller}="scsi3", GOTO="azure_datadis"
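
To illustrate how one of these rules yields a link name, the following Python sketch expands the `$env{...}` references in a `SYMLINK+=` value. It simplifies what udev does and omits udev's other substitutions and its character sanitizing. `expand_symlink` is a hypothetical name, treating an unset property as an empty string is an assumption of the sketch, and the property value in the example is assumed from the `scsi-1ATA_QEMU_HARDDISK_QM00002` link in the listing above rather than read from a device.

```python
import re
from typing import Dict

# Matches udev-style property references such as $env{SCSI_IDENT_LUN_T10}.
ENV_REF = re.compile(r"\$env\{([A-Za-z0-9_]+)\}")


def expand_symlink(template: str, env: Dict[str, str]) -> str:
    """Replace each $env{KEY} in `template` with env[KEY].

    Assumption of this sketch: a property missing from `env` expands to "".
    """
    return ENV_REF.sub(lambda m: env.get(m.group(1), ""), template)


if __name__ == "__main__":
    # Template copied from the SCSI_IDENT_LUN_T10 disk rule in the grep output.
    template = "disk/by-id/scsi-1$env{SCSI_IDENT_LUN_T10}"
    # Assumed property value, inferred from the resulting link name.
    env = {"SCSI_IDENT_LUN_T10": "ATA_QEMU_HARDDISK_QM00002"}
    print(expand_symlink(template, env))
```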

Comment 38 HuijingHei 2021-12-03 13:57:49 UTC
Another question: do we have a plan to backport this to 4.9 or 4.8?

Comment 39 RHCOS Bug Bot 2021-12-09 05:25:25 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 2027501 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 41 HuijingHei 2021-12-14 14:49:26 UTC
Hi Jonathan, I failed to reproduce the issue with rhcos-48.84.202112092303-0 (which does not include the fix) using the command in Comment 36. As the fix is now included in 4.10, do you have any suggestions? Thanks!

Additionally, I asked KVM QE for help, and he said the qemu command in Comment 37 is not correct, but I do not know why it can still reproduce the issue.

Comment 42 Jonathan Lebon 2021-12-14 16:11:45 UTC
(In reply to HuijingHei from comment #41)
> Hi Jonathan, I failed to reproduce the issue with
> rhcos-48.84.202112092303-0(not include the fixed patch) with command in
> Comment 36, as the fixed patch is now include in 4.10, do you have some
> suggestions? Thanks!

Can you retry the steps in comment 36, but interrupting GRUB to add `rd.break`. Then from the `rd.break` shell, you can do e.g. `ls -la /dev/disk/by-id/`. You should see that on 4.8 some of the symlinks are missing compared to doing this on 4.10.

It's normal that in the real root the symlinks are always present in both 4.8 and 4.10. This RHBZ was about including the rules in the initrd so they also show up there.
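
Since Ignition runs in the initrd, any `/dev/disk/by-id/scsi-*` path in a config depends on the initrd udev rules rather than those of the real root. The following is a small Python sketch of how a config could be scanned for such paths before deploying on an unfixed release. `scsi_by_id_devices` is a hypothetical helper, and it inspects only the `storage.disks` and `storage.filesystems` entries.

```python
import json
from typing import List

SCSI_BY_ID_PREFIX = "/dev/disk/by-id/scsi-"


def scsi_by_id_devices(config: dict) -> List[str]:
    """Return device paths in an Ignition config that start with
    /dev/disk/by-id/scsi- (only disks and filesystems are inspected)."""
    storage = config.get("storage", {})
    devices = [disk.get("device", "") for disk in storage.get("disks", [])]
    devices += [fs.get("device", "") for fs in storage.get("filesystems", [])]
    return [dev for dev in devices if dev.startswith(SCSI_BY_ID_PREFIX)]


if __name__ == "__main__":
    # Trimmed-down version of the test.ign used earlier in this bug.
    text = """
    {
      "ignition": {"version": "3.2.0"},
      "storage": {
        "disks": [{"device": "/dev/disk/by-id/scsi-0ATA_QEMU_HARDDISK_QM00002"}],
        "filesystems": [{"device": "/dev/disk/by-partlabel/var"}]
      }
    }
    """
    print(scsi_by_id_devices(json.loads(text)))
```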

> Additionally, I asked help from KVM qe, and he said the qemu command in
> Comment 37 is not correct, but I do not why it can reproduce the issue.

Hmm, it looks OK to me, but I rarely directly use the QEMU cmdline these days.
Another easy way to get SCSI disks added is to add a multipath disk. E.g.:

```
$ cosa run -c --add-disk 1G:mpath --kargs rd.break
...
switch_root:/# ls -la /dev/disk/by-id/scsi-*
lrwxrwxrwx 1 root root 9 Dec 14 16:05 /dev/disk/by-id/scsi-0NVME_VirtualMultipath_disk1 -> ../../sda
lrwxrwxrwx 1 root root 9 Dec 14 16:05 /dev/disk/by-id/scsi-3c5cadec86ff5ebf9 -> ../../sda
lrwxrwxrwx 1 root root 9 Dec 14 16:05 /dev/disk/by-id/scsi-SNVME_VirtualMultipath_disk1 -> ../../sda
```

Comparing against 4.8:

```
$ cosa run --qemu-image rhcos-4.8.14-x86_64-qemu.x86_64.qcow2 -c --add-disk 1G:mpath --kargs rd.break
...
switch_root:/# ls -la /dev/disk/by-id/scsi-*
lrwxrwxrwx 1 root root 9 Dec 14 16:05 /dev/disk/by-id/scsi-3d6fb0870b8a01c1f -> ../../sdb
```

Comment 43 HuijingHei 2021-12-15 13:00:14 UTC
(In reply to Jonathan Lebon from comment #42)
> Can you retry the steps in comment 36, but interrupting GRUB to add
> `rd.break`. Then from the `rd.break` shell, you can do e.g. `ls -la
> /dev/disk/by-id/`. You should see that on 4.8 some of the symlinks are
> missing compared to doing this on 4.10.
> 
> It's normal that in the real root the symlinks are always present in both
> 4.8 and 4.10. This RHBZ was about including the rules in the initrd so they
> also show up there.

Thanks Jonathan for your reply!
1) With `rd.break`, some symlinks are missing in 4.8 but present in 4.10. Changing the bug status to VERIFIED.
2) I will try your cosa command and see how to write an automated script.

===========================
switch_root:/# cat /etc/os-release 
VERSION="410.84.202112062002-0 dracut-049-135.git20210121.el8"

switch_root:/# ls /dev/disk/by-id/ -l
total 0
lrwxrwxrwx 1 root root  9 Dec 15 08:31 ata-QEMU_HARDDISK_QM00001 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 15 08:31 ata-QEMU_HARDDISK_QM00001-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 15 08:31 ata-QEMU_HARDDISK_QM00001-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Dec 15 08:31 ata-QEMU_HARDDISK_QM00001-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Dec 15 08:31 ata-QEMU_HARDDISK_QM00001-part4 -> ../../sdb4
lrwxrwxrwx 1 root root  9 Dec 15 08:31 scsi-0ATA_QEMU_HARDDISK_QM00001 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-0ATA_QEMU_HARDDISK_QM00001-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-0ATA_QEMU_HARDDISK_QM00001-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-0ATA_QEMU_HARDDISK_QM00001-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-0ATA_QEMU_HARDDISK_QM00001-part4 -> ../../sdb4
lrwxrwxrwx 1 root root  9 Dec 15 08:31 scsi-0QEMU_QEMU_HARDDISK_hd2 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-0QEMU_QEMU_HARDDISK_hd2-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 Dec 15 08:31 scsi-1ATA_QEMU_HARDDISK_QM00001 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-1ATA_QEMU_HARDDISK_QM00001-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-1ATA_QEMU_HARDDISK_QM00001-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-1ATA_QEMU_HARDDISK_QM00001-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-1ATA_QEMU_HARDDISK_QM00001-part4 -> ../../sdb4
lrwxrwxrwx 1 root root  9 Dec 15 08:31 scsi-SATA_QEMU_HARDDISK_QM00001 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-SATA_QEMU_HARDDISK_QM00001-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-SATA_QEMU_HARDDISK_QM00001-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-SATA_QEMU_HARDDISK_QM00001-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Dec 15 08:31 scsi-SATA_QEMU_HARDDISK_QM00001-part4 -> ../../sdb4
      
===========================
switch_root:/# cat /etc/os-release 
VERSION="48.84.202112142303-0 dracut-049-135.git20210121.el8"

switch_root:/# ls /dev/disk/by-id/* -l
lrwxrwxrwx 1 root root  9 Dec 15 08:41 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 15 08:41 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 15 08:41 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Dec 15 08:41 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Dec 15 08:41 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part4 -> ../../sdb4
lrwxrwxrwx 1 root root  9 Dec 15 08:41 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_hd2 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 15 08:41 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_hd2-part1 -> ../../sda1
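
The comparison of the two listings above could be automated along the following lines. This is a minimal Python sketch that assumes `ls -l` style output in which link names contain no spaces. `parse_links` and `missing_links` are hypothetical helper names. Names are reduced to their basename so that the bare names in the 4.10 listing and the full paths in the 4.8 listing compare equal.

```python
import posixpath
from typing import Dict, List


def parse_links(ls_output: str) -> Dict[str, str]:
    """Map symlink name (basename only) to its target from `ls -l` output."""
    links: Dict[str, str] = {}
    for line in ls_output.splitlines():
        if " -> " not in line:
            continue  # skip "total 0", directories, and blank lines
        left, target = line.rsplit(" -> ", 1)
        # The link name is the last whitespace-separated field before " -> ".
        name = posixpath.basename(left.split()[-1])
        links[name] = target.strip()
    return links


def missing_links(reference: str, candidate: str) -> List[str]:
    """Return link names present in `reference` but absent from `candidate`."""
    return sorted(set(parse_links(reference)) - set(parse_links(candidate)))


if __name__ == "__main__":
    # Two lines taken from the 4.10 listing and one from the 4.8 listing.
    fixed = (
        "lrwxrwxrwx 1 root root  9 Dec 15 08:31 scsi-0ATA_QEMU_HARDDISK_QM00001 -> ../../sdb\n"
        "lrwxrwxrwx 1 root root  9 Dec 15 08:31 scsi-0QEMU_QEMU_HARDDISK_hd2 -> ../../sda\n"
    )
    unfixed = (
        "lrwxrwxrwx 1 root root  9 Dec 15 08:41 "
        "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_hd2 -> ../../sda\n"
    )
    print(missing_links(fixed, unfixed))
```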

Comment 46 errata-xmlrpc 2022-03-12 04:37:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056