*** Bug 1957615 has been marked as a duplicate of this bug. ***
This bug is where most of the discussion of the problem has been happening, but since the comments are private I wanted to capture the status publicly so we can duplicate other bugs to this one. Here are the main points:

- This configuration was never officially supported, but in some cases it may have happened to work due to quirks of the old implementation.
- In 4.6 a change was made that caused behavior in multiple-NIC scenarios to be more consistent, but broke some environments that may have been working on 4.5.
- There were far more environments that had problems with the old behavior, so we can't just revert the change, but there is a workaround.
- Documentation of the workaround is not yet available (but is forthcoming). Please let us know if you need details in the meantime.
*** Bug 1897743 has been marked as a duplicate of this bug. ***
(In reply to comment #58)
> What are the next steps for this BZ?

1. We need a better solution for the future.

It seems to me that the fix is to make UPI work more like IPI; the node IP should be the IP from the interface that has the most direct route to the other nodes, not the interface that has the default route (assuming those are different).

Currently, for UPI, if the MCO knows the apiserver IP at install time, it will write it into the nodeip-configuration service, and so we will pick the IP with the most direct route to the apiserver (just like in the IPI case). However, in some cases (notably vSphere UPI), the MCO does not know the apiserver IP at install time and thus cannot do this. From what I can tell, though, the apiserver IP is still *known* (eg, to the installer) in this case; it's just not recorded anywhere that the MCO can see at install time. So if we fix the install-time config plumbing so that the MCO always has access to the configured apiserver IP, then it should always be able to pass that to nodeip-configuration, and we should always get the right IP.

(So this probably requires one or more additions to openshift/api, plus changes to openshift/installer to fill in the new API, and changes to openshift/machine-config-operator to consume the new API. It should not require any changes to openshift/baremetal-runtimecfg.)

(It is theoretically possible that "most direct route to the apiserver" is not identical to "most direct route to the other nodes" in some UPI configurations, though this seems like it would require a pretty weird network configuration... Maybe it would be better to pick "the interface that has the most direct route to the `machineNetwork`", but I'm not sure `machineNetwork` is guaranteed to be set/correct for all UPI platforms...)

2. We need a better solution for existing customers until we have the better future solution.

The workaround we've discussed here *works*, but it's ugly, and it has customer-specific bits which make it hard to document and to provide to other customers running into the same bug. If we think that "pick the IP from the interface with the most direct route to the other nodes" should work for everyone, then the next-best workaround would be to provide a standardized way to get that, so that instead of needing a complex customer-specific MachineConfig like in comment 42, they'd just have to write something like:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-upi-node-ip-override
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        - path: /etc/default/nodeip-configuration
          contents:
            source: data:,KUBE_APISERVER_HINT=192.168.162.3
          mode: 0644
          overwrite: true

and then we provide a more complicated MachineConfig that they install verbatim (eg, "curl http://access.redhat.com/... | oc apply -f -") which will read the file created by the MachineConfig above and pass it to `baremetal-runtimecfg node-ip` so that it will pick the corresponding local IP on the same network.
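(Aside, for anyone debugging one of these clusters by hand: the "most direct route" selection described above can be approximated with iproute2's `ip route get`, which asks the kernel which source address, and therefore which interface, it would use to reach a given destination. This is only an illustration of the selection logic, not the exact runtimecfg code path; 192.168.162.3 is the example apiserver IP from the MachineConfig above, and the interface names and addresses shown are hypothetical:

# Which source IP (and NIC) would be used to reach the apiserver?
$ ip route get 192.168.162.3
192.168.162.3 dev ens224 src 192.168.162.10 uid 0
    cache

# Compare with the default route, which may sit on a different NIC:
$ ip route show default
default via 10.0.0.1 dev ens192

On a multi-NIC UPI node those two commands can name different interfaces, which is exactly the ambiguity this whole bug is about.)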
(The next possible improvement after this would be to merge the "more complicated MachineConfig" into the existing nodeip-configuration service and then backport it, so you can just create the "99-upi-node-ip-override" MachineConfig without needing to manually create the other more complicated MachineConfig as well:

--- a/templates/common/on-prem/units/nodeip-configuration.service.yaml
+++ b/templates/common/on-prem/units/nodeip-configuration.service.yaml
@@ -26,4 +26,5 @@ contents: |
           node-ip \
           set --retry-on-failure \
+          $KUBE_APISERVER_HINT \
           {{ onPremPlatformAPIServerInternalIP . }}; \
         do \
@@ -32,4 +33,5 @@ contents: |
       ExecStart=/bin/systemctl daemon-reload
+      EnvironmentFile=-/etc/default/nodeip-configuration
 {{if .Proxy -}}
       EnvironmentFile=/etc/mco/proxy.env

But this is kind of "backporting a new feature" so people may not want to do it, unless we think it's going to be a long time before we can have the proper future fix available.)
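(To make the moving parts concrete — this is a sketch of how the two pieces in the proposal above would interact, not shipped behavior: the "99-upi-node-ip-override" MachineConfig writes a one-line environment file, and the leading "-" in `EnvironmentFile=-/etc/default/nodeip-configuration` tells systemd the file is optional, so clusters without the override are unaffected:

# /etc/default/nodeip-configuration, as written by the example
# MachineConfig in comment above; absent on clusters without the override
KUBE_APISERVER_HINT=192.168.162.3

When the file exists, the $KUBE_APISERVER_HINT reference in the unit expands to the hint and `node-ip set` receives it as an extra destination argument; when the file is absent, the unquoted variable expands to nothing and the service behaves exactly as it does today.)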
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days