Bug 2045087 - Failed to apply sriov policy on intel nics
Summary: Failed to apply sriov policy on intel nics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.11.0
Assignee: Sebastian Scheinkman
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 2047734
Depends On:
Blocks: 2053945
Reported: 2022-01-25 15:19 UTC by Sebastian Scheinkman
Modified: 2022-08-10 10:44 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2053945 2053947
Environment:
Last Closed: 2022-08-10 10:43:43 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | k8snetworkplumbingwg sriov-network-operator pull 233 | 0 | None | open | Implement a rebind to default driver as a w/a | 2022-01-25 15:21:55 UTC
Github | openshift sriov-network-operator pull 629 | 0 | None | open | Bug 2045087: Sync master 10 02 22 | 2022-02-13 13:58:55 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:44:05 UTC

Description Sebastian Scheinkman 2022-01-25 15:19:08 UTC
Description of problem:

The SR-IOV operator cannot finish configuring the virtual functions (VFs), and the sriov-config-daemon is stuck in a retry loop.


Unbinding the iavf driver from the affected VF and binding it again allows the operator to finish the configuration.
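
The unbind/bind step can be sketched as a small shell function. This is only an illustration of the manual workaround, not the operator's code; the function name rebind_vf and the SYSFS_ROOT override are hypothetical and exist only so the sketch can be tried against a scratch directory. On a real node it would need root and would use /sys.

```shell
# Sketch: detach a VF from its driver and attach it again.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
rebind_vf() {
    local pci_addr="$1"              # e.g. 0000:3b:02.1
    local driver="${2:-iavf}"        # driver named in this report
    local root="${SYSFS_ROOT:-/sys}"
    local drv_dir="$root/bus/pci/drivers/$driver"

    if [ ! -d "$drv_dir" ]; then
        echo "driver directory not found: $drv_dir" >&2
        return 1
    fi
    # Writing the PCI address to unbind/bind asks the kernel to
    # detach and then re-attach the driver for that device.
    echo "$pci_addr" > "$drv_dir/unbind" || return 1
    echo "$pci_addr" > "$drv_dir/bind" || return 1
}
```

Usage on an affected node would be along the lines of: rebind_vf 0000:3b:02.1 iavf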

Logs:

5 VFs are configured, but only 4 are available as kernel NICs (ens1f0v1 is missing):

ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:17:57:80 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 92:d2:9d:d8:ad:3a brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 4     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
3: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:17:57:81 brd ff:ff:ff:ff:ff:ff
4: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:17:43:d0 brd ff:ff:ff:ff:ff:ff
5: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:17:43:d1 brd ff:ff:ff:ff:ff:ff
6: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 0c:42:a1:55:e4:ce brd ff:ff:ff:ff:ff:ff
7: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 0c:42:a1:55:e4:cf brd ff:ff:ff:ff:ff:ff
8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 22:cf:75:9c:60:67 brd ff:ff:ff:ff:ff:ff
9: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 32:8d:3c:5a:2c:03 brd ff:ff:ff:ff:ff:ff
10: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f6:b0:90:d6:f2:66 brd ff:ff:ff:ff:ff:ff
11: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 8a:0f:31:55:53:25 brd ff:ff:ff:ff:ff:ff
17: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 0c:42:a1:55:e4:ce brd ff:ff:ff:ff:ff:ff
18: 471d5ff557e40cb@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
    link/ether d2:1e:53:04:2d:00 brd ff:ff:ff:ff:ff:ff link-netns 0d67ab9f-9165-47c9-9126-0a717546621e
19: 9976d0f8b156f38@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 52:8a:b2:d0:43:d5 brd ff:ff:ff:ff:ff:ff link-netns 6f92407f-6f8e-4c7e-98b3-51cc7562f4f6
20: 5a98a23c9cc8632@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether f2:0a:d3:21:2a:c9 brd ff:ff:ff:ff:ff:ff link-netns 04a14e55-a18d-47f9-b26a-de375ce6cbc4
21: 339f1e89447fc57@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 2a:1d:65:d1:21:32 brd ff:ff:ff:ff:ff:ff link-netns 5516899a-63cf-4c41-806b-483d2711cef7
27: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 9a:d5:8f:b6:42:bc brd ff:ff:ff:ff:ff:ff
28: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:22:18:c7:23:98 brd ff:ff:ff:ff:ff:ff
29: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 92:d2:9d:d8:ad:3a brd ff:ff:ff:ff:ff:ff
30: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8a:02:06:6e:30:67 brd ff:ff:ff:ff:ff:ff


dmesg and sysfs:

ls -la /sys/bus/pci/devices/0000\:3b\:02.1/net/
ls: cannot access '/sys/bus/pci/devices/0000:3b:02.1/net/': No such file or directory
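
The check above can be generalised to every VF of a PF. The following is a minimal sketch, assuming the standard sysfs layout in which each virtfnN entry under the PF device links to the VF's PCI device; the function name and the SYSFS_ROOT override are hypothetical. Note that a VF bound to vfio-pci also has no net/ directory, so an entry in the output is only a problem for VFs that are expected to be kernel NICs.

```shell
# Sketch: list the VFs of a PF that expose no kernel network interface.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
vfs_without_netdev() {
    local pf="$1"                    # PF interface name, e.g. ens1f0
    local root="${SYSFS_ROOT:-/sys}"
    local vf
    for vf in "$root/class/net/$pf/device"/virtfn*; do
        [ -e "$vf" ] || continue     # glob did not match: PF has no VFs
        # A VF driven by a kernel network driver has a net/ directory
        # containing its interface name.
        if [ -z "$(ls -A "$vf/net" 2>/dev/null)" ]; then
            # virtfnN is a symlink to the VF's PCI device directory;
            # print the VF entry and its PCI address.
            echo "${vf##*/} $(basename "$(readlink -f "$vf")")"
        fi
    done
}
```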


[root@cnfdd5 core]# dmesg | grep 0000:3b:02.1
[  336.586302] pci 0000:3b:02.1: [8086:154c] type 00 class 0x020000
[  336.592352] pci 0000:3b:02.1: enabling Extended Tags
[  336.597653] pci 0000:3b:02.1: Adding to iommu group 156
[  336.622048] iavf 0000:3b:02.1: enabling device (0000 -> 0002)
[  336.716570] iavf 0000:3b:02.1: Device is still in reset (-16), retrying
[  337.839372] iavf 0000:3b:02.1: Invalid MAC address 00:00:00:00:00:00, using random
[  337.848651] iavf 0000:3b:02.1: Multiqueue Enabled: Queue pair count = 4
[  337.965036] iavf 0000:3b:02.1: MAC address: 92:67:37:fd:42:25
[  337.972234] iavf 0000:3b:02.1: GRO is enabled
[  338.004527] iavf 0000:3b:02.1 ens1f0v1: renamed from eth0
[  338.195468] iavf 0000:3b:02.1: Reset warning received from the PF
[  338.211038] iavf 0000:3b:02.1: Scheduling reset task
[  353.590034] pci 0000:3b:02.1: Removing from iommu group 156
[  366.167547] pci 0000:3b:02.1: [8086:154c] type 00 class 0x020000
[  366.174623] pci 0000:3b:02.1: enabling Extended Tags
[  366.180037] pci 0000:3b:02.1: Adding to iommu group 156
[  366.185485] iavf 0000:3b:02.1: enabling device (0000 -> 0002)
[  366.265071] iavf 0000:3b:02.1: Device is still in reset (-16), retrying

workaround example:
[root@cnfdd5 core]# echo "0000:3b:02.1" > /sys/bus/pci/drivers/iavf/unbind
[root@cnfdd5 core]# echo "0000:3b:02.1" > /sys/bus/pci/drivers/iavf/bind
[root@cnfdd5 core]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:17:57:80 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 92:d2:9d:d8:ad:3a brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether d2:73:2c:3c:ed:f8 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 8a:02:06:6e:30:67 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 3     link/ether 9a:d5:8f:b6:42:bc brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 4     link/ether f2:22:18:c7:23:98 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
3: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:17:57:81 brd ff:ff:ff:ff:ff:ff
4: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:17:43:d0 brd ff:ff:ff:ff:ff:ff
5: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:a6:b7:17:43:d1 brd ff:ff:ff:ff:ff:ff
6: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 0c:42:a1:55:e4:ce brd ff:ff:ff:ff:ff:ff
7: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 0c:42:a1:55:e4:cf brd ff:ff:ff:ff:ff:ff
8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 22:cf:75:9c:60:67 brd ff:ff:ff:ff:ff:ff
9: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 32:8d:3c:5a:2c:03 brd ff:ff:ff:ff:ff:ff
10: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f6:b0:90:d6:f2:66 brd ff:ff:ff:ff:ff:ff
11: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 8a:0f:31:55:53:25 brd ff:ff:ff:ff:ff:ff
17: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 0c:42:a1:55:e4:ce brd ff:ff:ff:ff:ff:ff
18: 471d5ff557e40cb@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
    link/ether d2:1e:53:04:2d:00 brd ff:ff:ff:ff:ff:ff link-netns 0d67ab9f-9165-47c9-9126-0a717546621e
19: 9976d0f8b156f38@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 52:8a:b2:d0:43:d5 brd ff:ff:ff:ff:ff:ff link-netns 6f92407f-6f8e-4c7e-98b3-51cc7562f4f6
20: 5a98a23c9cc8632@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether f2:0a:d3:21:2a:c9 brd ff:ff:ff:ff:ff:ff link-netns 04a14e55-a18d-47f9-b26a-de375ce6cbc4
21: 339f1e89447fc57@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP mode DEFAULT group default 
    link/ether 2a:1d:65:d1:21:32 brd ff:ff:ff:ff:ff:ff link-netns 5516899a-63cf-4c41-806b-483d2711cef7
27: ens1f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 9a:d5:8f:b6:42:bc brd ff:ff:ff:ff:ff:ff
28: ens1f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:22:18:c7:23:98 brd ff:ff:ff:ff:ff:ff
29: ens1f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 92:d2:9d:d8:ad:3a brd ff:ff:ff:ff:ff:ff
30: ens1f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8a:02:06:6e:30:67 brd ff:ff:ff:ff:ff:ff
33: ens1f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d2:73:2c:3c:ed:f8 brd ff:ff:ff:ff:ff:ff


error from the sriov-config-daemon:

I0125 13:50:37.826247    7635 utils.go:222] configSriovDevice(): config interface 0000:17:00.0 with &{0000:17:00.0 64 9000 eno1   [{sriov_nics vfio-pci 0-63 sriov-network-policy 9000 false}]}
I0125 13:50:37.828858    7635 utils.go:573] setVfAdminMac(): VF 0000:17:02.0
I0125 13:50:37.828985    7635 utils.go:555] vfIsReady(): VF device 0000:17:02.0
E0125 13:50:37.829027    7635 utils.go:562] vfIsReady(): unable to get VF link for device 0000:17:02.0, "Link not found"
I0125 13:50:37.928783    7635 writer.go:132] setNodeStateStatus(): syncStatus: InProgress, lastSyncError: timed out waiting for the condition
E0125 13:50:38.830125    7635 utils.go:562] vfIsReady(): unable to get VF link for device 0000:17:02.0, "Link not found"
E0125 13:50:39.829407    7635 utils.go:562] vfIsReady(): unable to get VF link for device 0000:17:02.0, "Link not found"
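
The log shows vfIsReady() retrying roughly once per second until the sync reports "timed out waiting for the condition". As an illustration of that kind of bounded wait (not the operator's Go implementation), here is a shell sketch that polls sysfs for the VF's interface. The function name, the default timeout, and the SYSFS_ROOT override are all assumptions made for this example.

```shell
# Sketch: poll until the VF at a PCI address exposes a kernel interface.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
wait_for_vf_netdev() {
    local pci_addr="$1"              # e.g. 0000:17:02.0
    local timeout="${2:-10}"         # seconds; arbitrary default
    local root="${SYSFS_ROOT:-/sys}"
    local net_dir="$root/bus/pci/devices/$pci_addr/net"
    local waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -n "$(ls -A "$net_dir" 2>/dev/null)" ]; then
            ls "$net_dir"            # print the interface name
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "timed out waiting for a netdev on $pci_addr" >&2
    return 1
}
```

A non-zero return is the point at which the unbind/bind workaround described in this report would be tried.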

Comment 2 Marius Cornea 2022-01-31 09:02:46 UTC
*** Bug 2047734 has been marked as a duplicate of this bug. ***

Comment 8 elevin 2022-02-14 11:45:24 UTC
Server Version: 4.10.0-fc.2
=========================================================
We wanted to create 60 VFs, but 3 of them were stuck.
Applying the node policies below fixed that.
=========================================================
ip l 2>/dev/null | grep ens7f0v | wc -l
57

Apply nodePolicy

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  linkType: eth
  needVhostNet: true
  nicSelector:
    pfNames: ["ens7f0#0-29"]
  nodeSelector:
    kubernetes.io/hostname: "helix13.lab.eng.tlv2.redhat.com"
  numVfs: 60
  priority: 99
  resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nic-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  linkType: eth
  nicSelector:
    pfNames: ["ens7f0#30-59"]
  nodeSelector:
    kubernetes.io/hostname: "helix13.lab.eng.tlv2.redhat.com"
  numVfs: 60
  priority: 99
  resourceName: sriov_nic_1

 oc logs sriov-network-config-daemon-vnhpt -c sriov-network-config-daemon -n openshift-sriov-network-operator | grep RebindVfToDefaultDriver
I0214 11:37:02.502335 3475607 utils.go:754] RebindVfToDefaultDriver(): VF 0000:86:05.1
W0214 11:37:02.562319 3475607 utils.go:763] RebindVfToDefaultDriver(): workaround implemented for VF 0000:86:05.1
I0214 11:37:13.605509 3475607 utils.go:754] RebindVfToDefaultDriver(): VF 0000:86:05.2
W0214 11:37:13.665633 3475607 utils.go:763] RebindVfToDefaultDriver(): workaround implemented for VF 0000:86:05.2
I0214 11:37:24.707337 3475607 utils.go:754] RebindVfToDefaultDriver(): VF 0000:86:05.3
W0214 11:37:24.767222 3475607 utils.go:763] RebindVfToDefaultDriver(): workaround implemented for VF 0000:86:05.3


sh-4.4# ip l 2>/dev/null | grep ens7f0v | wc -l
30
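
A count of 30 kernel interfaces is what the two policies call for: VFs 0-29 are bound to vfio-pci and therefore have no kernel netdev, while VFs 30-59 remain netdevice VFs. A per-driver summary makes this easier to verify than counting interface names. The following is a minimal sketch, assuming the standard sysfs layout; the function name and the SYSFS_ROOT override are hypothetical.

```shell
# Sketch: count the VFs of a PF per bound driver.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
vf_driver_summary() {
    local pf="$1"                    # PF interface name, e.g. ens7f0
    local root="${SYSFS_ROOT:-/sys}"
    local vf
    for vf in "$root/class/net/$pf/device"/virtfn*; do
        [ -e "$vf" ] || continue     # glob did not match: PF has no VFs
        if [ -e "$vf/driver" ]; then
            # driver is a symlink to the bound driver's directory
            basename "$(readlink -f "$vf/driver")"
        else
            echo "unbound"
        fi
    done | sort | uniq -c | awk '{print $2, $1}'
}
```

With the policies above fully applied, one would expect this to print one line for vfio-pci and one for the netdevice driver, each with a count of 30.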

Comment 10 errata-xmlrpc 2022-08-10 10:43:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

