OCP Version at Install Time: v4.7.0
RHCOS Version at Install Time: rhcos-47.83.202102090044-0
Platform: bare metal

What are you trying to do? What is your use case?
- Scaling up worker nodes in an OCP cluster (agnostic platform) with the workers as bare metal nodes (Cisco UCS servers with SAN multipath).

What happened? What went wrong, or what did you expect?
- After enabling multipathing, when the machine boots with the coreos install parameters, the coreos-installer service fails while reading the image and writing it back to the disk, with the error shown in [1].

[Edit]: Multipathing is being enabled as part of day-2 operations (using machine-config) and not via kernel command-line parameters.

[1]
~~~
[ 4905.276845] Buffer I/O error on dev sde, logical block 826485, lost async page write
[ 4905.285754] Buffer I/O error on dev sde, logical block 826486, lost async page write
[ 4905.294674] Buffer I/O error on dev sde, logical block 826487, lost async page write
[ 4905.303622] blk_update_request: I/O error, dev sde, sector 6596160 op 0x1:(WRITE) flags 0x4800 phys_seg 1 prio class 0
[ 4905.315884] blk_update_request: I/O error, dev sde, sector 6610288 op 0x1:(WRITE) flags 0x4800 phys_seg 1 prio class 0
[ 4905.328142] blk_update_request: I/O error, dev sde, sector 6614000 op 0x1:(WRITE) flags 0x4800 phys_seg 1 prio class 0
[ 4905.340401] blk_update_request: I/O error, dev sde, sector 6611952 op 0x1:(WRITE) flags 0x4800 phys_seg 1 prio class 0
[ 4905.352657] blk_update_request: I/O error, dev sde, sector 6612080 op 0x1:(WRITE) flags 0x4800 phys_seg 1 prio class 0
[ 4905.364915] blk_update_request: I/O error, dev sde, sector 6612208 op 0x1:(WRITE) flags 0x4800 phys_seg 1 prio class 0
[ 4905.377171] blk_update_request: I/O error, dev sde, sector 6612336 op 0x1:(WRITE) flags 0x4800 phys_seg 1 prio class 0
[ 4905.389447] blk_update_request: I/O error, dev sde, sector 6605424 op 0x1:(WRITE) flags 0x4800 phys_seg 1 prio class 0
[ 4905.401692] blk_update_request: I/O error, dev sde, sector 6605936 op 0x1:(WRITE) flags 0x4800 phys_seg 1 prio class 0
[ 5117.424531] blk_update_request: I/O error, dev sde, sector 6939488 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class
~~~

Attaching the latest serial console logs for reference. The customer has tried multiple methods to boot the node with multipathing enabled, but they encounter similar issues each time.
Multipath should not be turned on at install time. It must be enabled via machine config as documented (https://github.com/openshift/openshift-docs/pull/30697 adds more clarity on this point). So you can drop the rd.driver.pre=dm_multipath kernel argument (which AFAICT doesn't actually turn on multipath because multipathd doesn't do anything if /etc/multipath.conf is missing). Is /dev/sde the only path in the set? If not, can you try another path instead? Is it possible that /dev/sde is not an active path?
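To help answer that, a minimal sketch of how the sibling paths can be enumerated from the live environment (device names here are only the examples from this report; paths to the same LUN share a WWN):

~~~
# List block devices with their WWNs; all paths to the same LUN report the same WWN.
lsblk -o NAME,SIZE,TYPE,WWN

# The by-id links also collapse all paths of a LUN onto the same scsi-3<wwn> ID.
ls -l /dev/disk/by-id/ | grep scsi-3

# Check whether the kernel still considers a given path usable.
cat /sys/block/sde/device/state
~~~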
Had a chat with Benjamin Marzinski about this issue, and got some feedback:

1. What RHCOS is doing is completely fine (installing in non-multipath mode first and then turning it on later).
2. The errors seem to come from the underlying storage device, so even if multipath *were* enabled, it wouldn't have retried the request on a different path.

So IOW, at this point this seems to have nothing to do with multipath at all and is more related to the SCSI device(s) themselves.

I'm not sure why RHEL would install fine but not RHCOS. I wonder if it just manifests itself more easily on RHCOS because of the heavy I/O from OCP.
Reviewed the attached case and agree with comment #7. From the system logs, cmds are timing out on the storage device:

---
sd 1:0:2:0: [sdc] tag#178 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
sd 1:0:2:0: [sdc] tag#178 CDB: Read(10) 28 00 04 3a 3d 58 00 00 08 00
blk_update_request: I/O error, dev sdc, sector 70925656 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0

Further looking into one of the sosreports attached to the case, device sdc is an active/non-optimized path:

sdc: sysfs prio = 10

3600a098038304241465d4b67502d3455 dm-0 NETAPP,LUN C-Mode
size=200G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:0:0 sda 8:0  active ready running
| `- 1:0:1:0 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:2:0 sdc 8:32 active ready running
  `- 1:0:3:0 sdd 8:48 active ready running

I think point #1 is causing the issue, since paths aren't yet properly grouped and it seems I/O is going to a non-optimized path and not getting serviced within the cmd timeout. Sending the I/O to a non-optimized path may be causing some trespass scenario that delays completion.
---

> 1. What RHCOS is doing is completely fine (installing in non-multipath mode first and then turning it on later)
Can we have the customer choose one of these two paths at install time:

E: DEVLINKS=/dev/disk/by-path/fc-0x20000025b502009e-0x205000a09841d273-lun-0 /dev/disk/by-id/wwn-0x600a098038304241465d4b67502d3455 /dev/disk/by-id/scsi-3600a098038304241465d4b67502d3455 /dev/disk/by-path/pci-0000:86:00.0-fc-0x205000a09841d273-lun-0
E: DEVNAME=/dev/sda

E: DEVLINKS=/dev/disk/by-path/fc-0x20000025b502009e-0x204e00a09841d273-lun-0 /dev/disk/by-path/pci-0000:86:00.0-fc-0x204e00a09841d273-lun-0 /dev/disk/by-id/scsi-3600a098038304241465d4b67502d3455 /dev/disk/by-id/wwn-0x600a098038304241465d4b67502d3455
E: DEVNAME=/dev/sdb

And *not* one of these (which are the non-optimized paths):

E: DEVLINKS=/dev/disk/by-id/wwn-0x600a098038304241465d4b67502d3455 /dev/disk/by-path/pci-0000:86:00.0-fc-0x205400a09841d273-lun-0 /dev/disk/by-path/fc-0x20000025b502009e-0x205400a09841d273-lun-0 /dev/disk/by-id/scsi-3600a098038304241465d4b67502d3455
E: DEVNAME=/dev/sdc

E: DEVLINKS=/dev/disk/by-id/scsi-3600a098038304241465d4b67502d3455 /dev/disk/by-path/fc-0x20000025b502009e-0x205200a09841d273-lun-0 /dev/disk/by-id/wwn-0x600a098038304241465d4b67502d3455 /dev/disk/by-path/pci-0000:86:00.0-fc-0x205200a09841d273-lun-0
E: DEVNAME=/dev/sdd
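Since the sdX names can shuffle across reboots, a hedged sketch of how the customer could confirm, just before each install attempt, which node currently backs an optimized target port (the by-path names below are taken from the udev output above):

~~~
# Resolve the by-path links for the optimized target ports to the current device node.
readlink -f /dev/disk/by-path/pci-0000:86:00.0-fc-0x205000a09841d273-lun-0
readlink -f /dev/disk/by-path/pci-0000:86:00.0-fc-0x204e00a09841d273-lun-0

# Or dump the udev properties of a candidate device and inspect its DEVLINKS.
udevadm info --query=property --name=/dev/sda
~~~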
(In reply to Jonathan Lebon from comment #7)
> Had a chat with Benjamin Marzinski about this issue, and got some feedback:
> 1. What RHCOS is doing is completely fine (installing in non-multipath mode
> first and then turning it on later)
> 2. The errors seem to come from the underlying storage device, so even if
> multipath *were* enabled, it wouldn't have retried the request on a
> different path.

Looking into the kernel code, this error would have been retried on other paths if multipath had been running on top of this device, but it wasn't. Multipath doesn't appear to be configured to run (multipathd does start in the initramfs, but there is no /etc/multipath.conf so it does nothing, and multipathd doesn't appear to get started after the pivot-root).

So perhaps the issue is that multipath is not running, and a non-optimized path appeared first and was used by the system instead of the optimized path. In this case, configuring multipath so that it claims this device before the rest of the system starts using it may solve the issue.

> So IOW at this point, this seems to have nothing to do with multipath at all
> and is more related to the SCSI device(s) themselves.
>
> I'm not sure why RHEL would install fine but not RHCOS. I wonder if it just
> manifests itself more easily on RHCOS because of the heavy I/O from OCP.
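For anyone reproducing this, a simple way to confirm the observation above on an affected node (a sketch; exact output will differ per system):

~~~
# Is there a multipath configuration at all?
ls -l /etc/multipath.conf

# Is multipathd running after the pivot-root?
systemctl is-active multipathd

# Any assembled multipath maps? Empty output means the SCSI paths are
# being used directly, with no dm-multipath on top.
sudo multipath -ll
~~~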
Thanks all,

At this point it seems like we need to support multipath right from the first boot in order to support it correctly. I'll look at reviving the patchset at https://github.com/coreos/fedora-coreos-config/pull/815 and backporting it to 4.7.
I just finished discussing this with Jon and Jonathan.

Essentially, without multipath enabled the user needs to identify and select the correct path for their storage configuration to install on. Device names/paths are not guaranteed between reboots, and in this particular instance with this particular SAN configuration sda & sdb are the usable paths, whereas sdc & sdd are not. (Sorry for the oversimplification.)

The fact that device/path names are not guaranteed actually helps explain why the results from the various tests are so inconsistent/confusing. Installing to a non-optimized path would result in errors when running coreos-installer. Installing RHEL 8 most likely worked because multipath support is enabled in anaconda and the initial install target was the multipath device.

Even once a usable [optimized] path is chosen as the install target, upon reboot there is still the question of which device/path will be used as root during first boot. There will be a race between all of the underlying paths, and if one of the unusable [non-optimized] paths is chosen, the system will fail to boot. I believe this explains some of the random boot failures that were seen during testing.

We see a couple of short-term solutions here:

1. Unshare the non-optimized paths to these systems. This eliminates the possibility that they will be used as the install and/or boot device. A node can be provisioned as expected and end up with a "day-2" multipath configuration without hitting the above issues. At any point after that, the non-optimized paths can be shared/exported back to the systems.

2. Boot the live installer with 'rd.multipath=default', which enables multipathd in the live install environment. The multipath device can then be used as the install target without worrying about which individual path to use (rough sketch below). Upon reboot/first boot, multipath is not enabled (as expected), so the chance of booting off one of the non-optimized paths is still there. If that happens, rebooting the system again so that it eventually boots off an optimized path should work. At that point, first boot should continue as normal.

Longer term, as Jonathan said, we'll need to look at adding support for installing onto a multipath device from the start.
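A rough sketch of option 2 above, assuming the live environment assembles the LUN as /dev/mapper/mpatha (that device name and the Ignition URL are illustrative, not from this case; verify the actual map name with `multipath -ll` first):

~~~
# 1. Append rd.multipath=default to the live installer kernel command line
#    (PXE append line or ISO kernel arguments), then boot the live environment.

# 2. Install to the assembled multipath device rather than an individual sdX path.
sudo multipath -ll                      # confirm the map name for the LUN
sudo coreos-installer install /dev/mapper/mpatha \
    --ignition-url=https://example.com/worker.ign
~~~

As noted above, first boot will still race among the underlying paths, since multipath is not yet enabled on the installed system at that point.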
Hi Jatan,

We're working on fixing multipath support in RHCOS. Would the customers be open to testing a 4.8 bootimage scratch build?
Shouldn't have moved this to MODIFIED. Moving back to POST. Bot will pick it up once it's referenced in the bootimage bump PR.
The associated boot image bump was merged and the BZ was verified. This can be included as part of OCP 4.8 GA.
Verified on RHCOS 48.84.202106091622-0

Enabled multipath day-1 with the ISO:

qemu-img create -f qcow2 -o nocow=on /tmp/disk765592878 20G
qemu-nbd --format qcow2 --cache unsafe --discard unmap --socket /tmp/disk765592878.socket --share 2 /tmp/disk765592878 &
qemu-system-x86_64 -machine accel=kvm -cpu host -m 4096 -smp 1 \
  -object rng-random,filename=/dev/urandom,id=rng0 \
  -device virtio-rng-pci,rng=rng0 \
  -nographic \
  -drive file=/var/lib/libvirt/images/rhah/rhcos-48.84.202106091622-0-live.x86_64.iso,format=raw,if=none,readonly=on,id=installiso \
  -device ide-cd,drive=installiso,bootindex=3 \
  -device virtio-scsi-pci,id=scsi_mpath10 \
  -device scsi-hd,bus=scsi_mpath10.0,drive=mpath10,vendor=NVME,product=VirtualMultipath,wwn=17603150149934919455,serial=primary-disk,bootindex=1 \
  -drive if=none,id=mpath10,format=raw,file=nbd:unix:/tmp/disk765592878.socket,media=disk,auto-read-only=off,cache=unsafe \
  -device virtio-scsi-pci,id=scsi_mpath11 \
  -device scsi-hd,bus=scsi_mpath11.0,drive=mpath11,vendor=NVME,product=VirtualMultipath,wwn=17603150149934919455,serial=primary-disk,bootindex=2 \
  -drive if=none,id=mpath11,format=raw,file=nbd:unix:/tmp/disk765592878.socket,media=disk,auto-read-only=off,cache=unsafe

Boot into the live ISO, then:

sudo coreos-installer install /dev/sda --ignition-url=<url> \
    --append-karg rd.multipath=default \
    --append-karg root=/dev/disk/by-label/dm-mpath-root
systemctl reboot
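A couple of quick checks after first boot to confirm the day-1 multipath setup took effect (a sketch; output will vary by system):

~~~
# The kargs appended above should be present on the running system.
grep -o 'rd.multipath=default' /proc/cmdline
grep -o 'root=/dev/disk/by-label/dm-mpath-root' /proc/cmdline

# The root filesystem should sit on a dm-multipath device.
sudo multipath -ll
lsblk -o NAME,TYPE,MOUNTPOINT
~~~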
RHCOS 48.84.202106091622-0 is the boot image for registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-07-02-143616
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438