Bug 2005926 - PTP operator NodeOutOfPTPSync rule is using max offset from the master instead of openshift_ptp_clock_state metrics
Summary: PTP operator NodeOutOfPTPSync rule is using max offset from the master instea...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Aneesh Puttur
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-20 13:46 UTC by Aneesh Puttur
Modified: 2022-03-10 16:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:12:09 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift ptp-operator pull 142 0 None open Bug 2005926: Prometheus rules to use new metrics 2021-09-29 13:15:40 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:12:37 UTC

Description Aneesh Puttur 2021-09-20 13:46:33 UTC
Description of problem:
The PTP Operator defines Prometheus rules that raise out-of-sync alerts based on the maximum offset from the master.

1. When fast events are enabled, we should use the sync state metric "openshift_ptp_clock_state" to define the NodeOutOfSync alert.

2. When fast events are not enabled, we should use "openshift_ptp_offset_from_system" to define the NodeOutOfSync alert.



Version-Release number of selected component (if applicable):
4.9/4.10

Comment 1 Sebastian Scheinkman 2021-10-04 10:28:55 UTC
Hi Aneesh,

The problem with using `openshift_ptp_clock_state` is that for telco, if the offset is outside ±100, the synchronization should be marked as out of sync.

One option is to change NodeOutOfSync to use `clock_state` and create a new alert, NodeHighPtpOffsetSync.

WDYT?

Comment 3 obochan 2022-01-06 16:05:03 UTC
@aputtur

Comment 4 obochan 2022-01-06 16:18:17 UTC
@aputtur

We reviewed the issue described in the bug and tried to verify it according to the steps detailed above.

1. When fast events are enabled, we should use the sync state metric "openshift_ptp_clock_state" to define the NodeOutOfSync alert.

This means that when we pull the metrics using curl with fast events enabled, we should see only the openshift_ptp_clock_state metric.

2. When fast events are not enabled, we should use "openshift_ptp_offset_from_system" to define the NodeOutOfSync alert.

This means that when we pull the metrics using curl with fast events disabled, we should see only the openshift_ptp_offset_from_system metric.


What we currently see is:

When pulling the metrics with fast events enabled:

[marzianor@localhost ptp]$ oc -n openshift-ptp exec linuxptp-daemon-vvr4b -c cloud-event-proxy -- curl 127.0.0.1:9091/metrics | grep "openshift_ptp_clock"
# HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
# TYPE openshift_ptp_clock_state gauge
openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1
openshift_ptp_clock_state{iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1
openshift_ptp_clock_state{iface="ens5fx",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1

[marzianor@localhost ptp]$ oc -n openshift-ptp exec linuxptp-daemon-vvr4b -c cloud-event-proxy -- curl 127.0.0.1:9091/metrics | grep "openshift_ptp_offset"
# HELP openshift_ptp_offset_ns 
# TYPE openshift_ptp_offset_ns gauge
openshift_ptp_offset_ns{from="master",iface="ens5fx",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} -19
openshift_ptp_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} -12
openshift_ptp_offset_ns{from="phc",iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} -12

When pulling the metrics with fast events disabled:

[obochan@obochan ptp]$ oc -n openshift-ptp exec linuxptp-daemon-g96xb -c linuxptp-daemon-container -- curl 127.0.0.1:9091/metrics | grep "openshift_ptp_clock"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2237    0  2237    0     0  2184k      0 --:--:-- --:--:-- --:--:-- 2184k
# HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
# TYPE openshift_ptp_clock_state gauge
openshift_ptp_clock_state{iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
[obochan@obochan ptp]$ oc -n openshift-ptp exec linuxptp-daemon-g96xb -c linuxptp-daemon-container -- curl 127.0.0.1:9091/metrics
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
# TYPE openshift_ptp_clock_state gauge
openshift_ptp_clock_state{iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
# HELP openshift_ptp_delay_ns 
# TYPE openshift_ptp_delay_ns gauge
openshift_ptp_delay_ns{from="master",iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 87
openshift_ptp_delay_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 405
# HELP openshift_ptp_frequency_adjustment_ns 
# TYPE openshift_ptp_frequency_adjustment_ns gauge
openshift_ptp_frequency_adjustment_ns{from="master",iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} -2372
openshift_ptp_frequency_adjustment_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} -77826
# HELP openshift_ptp_interface_role 0 = PASSIVE, 1 = SLAVE, 2 = MASTER, 3 = FAULTY, 4 = UNKNOWN
# TYPE openshift_ptp_interface_role gauge
openshift_ptp_interface_role{iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
# HELP openshift_ptp_max_offset_ns 
# TYPE openshift_ptp_max_offset_ns gauge
openshift_ptp_max_offset_ns{from="master",iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} -5
openshift_ptp_max_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1
# HELP openshift_ptp_offset_ns 
# TYPE openshift_ptp_offset_ns gauge
openshift_ptp_offset_ns{from="master",iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} -5
openshift_ptp_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 44
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
100  2239    0  2239    0     0  2186k      0 --:--:-- --:--:-- --:--:-- 2186k


As you can see above, in both examples (enabled and disabled) the clock_state metric is present, but according to our understanding, when events are disabled we should have the offset metric.

Please advise whether this is the intended behavior or whether the issue was not fixed as described.
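
As a rough illustration of the check performed above with `curl ... | grep`, the following Python sketch parses Prometheus text-format output and lists which metric names are present. It is not part of the operator. The helper names (`parse_metrics`, `metric_names`) are hypothetical, and the sample text is a subset of the fast-event-enabled output in this comment.

```python
import re

# Subset of the fast-event-enabled scrape shown in this comment.
SAMPLE = '''\
# HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
# TYPE openshift_ptp_clock_state gauge
openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1
openshift_ptp_clock_state{iface="ens5fx",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
# HELP openshift_ptp_offset_ns
# TYPE openshift_ptp_offset_ns gauge
openshift_ptp_offset_ns{from="master",iface="ens5fx",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} -19
openshift_ptp_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} -12
'''

# name{labels} value  (labels are optional)
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples.

    Comment lines (# HELP / # TYPE) and anything that does not look like a
    sample, such as curl progress-meter lines, are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group('labels') or ''))
        samples.append((match.group('name'), labels, float(match.group('value'))))
    return samples


def metric_names(samples):
    """Return the distinct metric names found in the scrape, sorted."""
    return sorted({name for name, _, _ in samples})


if __name__ == '__main__':
    for name in metric_names(parse_metrics(SAMPLE)):
        print(name)
```

Running the same helper on the fast-event-disabled output would make the comparison in this comment explicit: both scrapes contain `openshift_ptp_clock_state` as well as `openshift_ptp_offset_ns`.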

Comment 7 obochan 2022-01-10 14:36:25 UTC
OK, I understand. So according to what you say, the logs show the correct information.

This means openshift_ptp_offset_from_system was changed to openshift_ptp_offset_ns. Can you confirm?

Ofer.

Comment 9 obochan 2022-01-12 06:33:58 UTC
OK. That bug is in VERIFIED state. I could mark this one as a duplicate and reopen the first bug. Please advise how you want to handle it.

Comment 11 obochan 2022-01-19 07:51:19 UTC
Should we reopen this issue, https://bugzilla.redhat.com/show_bug.cgi?id=2019198, or do we want to handle the naming convention as agreed in comments 6 and 7?

Please advise.

Comment 12 obochan 2022-01-23 13:25:55 UTC
From what I could see, when I changed the PtpOperatorConfig and disabled the events, the metrics stopped. Is that the expected behavior? Are the metrics only enabled when the sidecar (events) is enabled?

Please advise what the expected behavior is. Below is the output for the two options, disabled and enabled.

[obochan@obochan ptp]$ cat /tmp/event_disable
# HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
# TYPE openshift_ptp_clock_state gauge
openshift_ptp_clock_state{iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
# HELP openshift_ptp_delay_ns 
# TYPE openshift_ptp_delay_ns gauge
openshift_ptp_delay_ns{from="master",iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 83
openshift_ptp_delay_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 407
# HELP openshift_ptp_frequency_adjustment_ns 
# TYPE openshift_ptp_frequency_adjustment_ns gauge
openshift_ptp_frequency_adjustment_ns{from="master",iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} -2392
openshift_ptp_frequency_adjustment_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} -77616
# HELP openshift_ptp_interface_role 0 = PASSIVE, 1 = SLAVE, 2 = MASTER, 3 = FAULTY, 4 = UNKNOWN
# TYPE openshift_ptp_interface_role gauge
openshift_ptp_interface_role{iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
# HELP openshift_ptp_max_offset_ns 
# TYPE openshift_ptp_max_offset_ns gauge
openshift_ptp_max_offset_ns{from="master",iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 3
openshift_ptp_max_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 3
# HELP openshift_ptp_offset_ns 
# TYPE openshift_ptp_offset_ns gauge
openshift_ptp_offset_ns{from="master",iface="master",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 3
openshift_ptp_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 3
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 166
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
[obochan@obochan ptp]$ cat /tmp/event_enable 
# HELP cne_api_events_published Metric to get number of events published by the rest api
# TYPE cne_api_events_published gauge
cne_api_events_published{address="/cluster/node/cnfde7.ptp.lab.eng.bos.redhat.com/ptp",status="success"} 44
# HELP cne_api_publishers Metric to get number of publishers
# TYPE cne_api_publishers gauge
cne_api_publishers{status="active"} 1
# HELP cne_events_ack Metric to get number of events produced
# TYPE cne_events_ack gauge
cne_events_ack{status="success",type="/cluster/node/cnfde7.ptp.lab.eng.bos.redhat.com/ptp"} 44
# HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
# TYPE openshift_ptp_clock_state gauge
openshift_ptp_clock_state{iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1
openshift_ptp_clock_state{iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 1
openshift_ptp_clock_state{iface="ens5fx",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
# HELP openshift_ptp_delay_ns 
# TYPE openshift_ptp_delay_ns gauge
openshift_ptp_delay_ns{from="master",iface="ens5fx",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 82
openshift_ptp_delay_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 406
openshift_ptp_delay_ns{from="phc",iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 406
# HELP openshift_ptp_frequency_adjustment_ns 
# TYPE openshift_ptp_frequency_adjustment_ns gauge
openshift_ptp_frequency_adjustment_ns{from="master",iface="ens5fx",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} -2395
openshift_ptp_frequency_adjustment_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} -77590
openshift_ptp_frequency_adjustment_ns{from="phc",iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} -77590
# HELP openshift_ptp_interface_role 0 = PASSIVE, 1 = SLAVE, 2 = MASTER, 3 = FAULTY, 4 =  UNKNOWN
# TYPE openshift_ptp_interface_role gauge
openshift_ptp_interface_role{iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 1
# HELP openshift_ptp_max_offset_ns 
# TYPE openshift_ptp_max_offset_ns gauge
openshift_ptp_max_offset_ns{from="master",iface="ens5fx",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} 78
openshift_ptp_max_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 99
openshift_ptp_max_offset_ns{from="phc",iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 99
# HELP openshift_ptp_offset_ns 
# TYPE openshift_ptp_offset_ns gauge
openshift_ptp_offset_ns{from="master",iface="ens5fx",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="ptp4l"} -8
openshift_ptp_offset_ns{from="phc",iface="CLOCK_REALTIME",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 16
openshift_ptp_offset_ns{from="phc",iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",process="phc2sys"} 16
# HELP openshift_ptp_threshold 
# TYPE openshift_ptp_threshold gauge
openshift_ptp_threshold{iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",threshold="HoldOverTimeout"} 60
openshift_ptp_threshold{iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",threshold="MaxOffsetThreshold"} 100
openshift_ptp_threshold{iface="ens5f1",node="cnfde7.ptp.lab.eng.bos.redhat.com",threshold="MinOffsetThreshold"} -100
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 11
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
[obochan@obochan ptp]$

Comment 13 obochan 2022-01-25 07:24:27 UTC
When events are enabled, the NodeOutOfSync alert is generated via:

alert: NodeOutOfPtpSync
expr: openshift_ptp_clock_state
  != 1
for: 2m
labels:
  severity: warning
annotations:
  message: |
    {{ $labels.iface }} is not in sync 

When events are disabled, the NodeOutOfSync alert is generated via:

alert: HighPtpSyncOffset
expr: openshift_ptp_offset_ns
  > 100 or openshift_ptp_offset_ns < -100
for: 2m
labels:
  severity: warning
annotations:
  message: |
    All nodes should have ptp sync offset lower then 100
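
For illustration only, the two rule expressions above can be mirrored as plain Python predicates. This is a sketch of the comparison logic, not of how Prometheus evaluates rules. The function names are hypothetical, and the `for: 2m` clause is ignored here.

```python
LOCKED = 1  # openshift_ptp_clock_state: 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER


def node_out_of_ptp_sync(clock_state):
    """Mirror of: openshift_ptp_clock_state != 1"""
    return clock_state != LOCKED


def high_ptp_sync_offset(offset_ns, limit=100):
    """Mirror of: openshift_ptp_offset_ns > 100 or openshift_ptp_offset_ns < -100"""
    return offset_ns > limit or offset_ns < -limit


if __name__ == '__main__':
    # Offsets taken from the metric dumps earlier in this bug.
    for offset in (-19, -12, 3, 16):
        print(offset, high_ptp_sync_offset(offset))
    # Every clock state other than LOCKED satisfies the state-based rule.
    for state in (0, 1, 2):
        print(state, node_out_of_ptp_sync(state))
```

Both comparisons in the offset rule are strict, so an offset of exactly 100 or -100 does not satisfy the expression.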

Comment 14 obochan 2022-01-25 12:35:34 UTC
To validate the alert while events are disabled, we had to change the Prometheus threshold of the NodeOutOfSync alert:

[obochan@obochan ptp]$ oc edit prometheusrules.monitoring.coreos.com -n openshift-ptp ptp-rules
prometheusrule.monitoring.coreos.com/ptp-rules edited
[obochan@obochan ptp]$ oc edit prometheusrules.monitoring.coreos.com -n openshift-ptp ptp-rules
prometheusrule.monitoring.coreos.com/ptp-rules edited
[obochan@obochan ptp]$ oc get prometheusrules.monitoring.coreos.com -n openshift-ptp ptp-rules -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2022-01-24T14:11:34Z"
  generation: 8
  labels:
    prometheus: k8s
    role: alert-rules
  name: ptp-rules
  namespace: openshift-ptp
  ownerReferences:
  - apiVersion: ptp.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: PtpOperatorConfig
    name: default
    uid: 4cda9c30-aa26-48af-8a92-52ac2baf826e
  resourceVersion: "744823"
  uid: b27b772b-6c5f-4d86-bbb9-010327aa571f
spec:
  groups:
  - name: ptp.rules
    rules:
    - alert: HighPtpSyncOffset
      annotations:
        message: |
          All nodes should have ptp sync offset lower then 100
      expr: |
        openshift_ptp_offset_ns > 2 or openshift_ptp_offset_ns < -2
      for: 2m
      labels:
        severity: warning

HighPtpSyncOffset (2 active)
alert: HighPtpSyncOffset
expr: openshift_ptp_offset_ns
  > 2 or openshift_ptp_offset_ns < -2
for: 2m
labels:
  severity: warning
annotations:
  message: |
    All nodes should have ptp sync offset lower then 100
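
The following simplified Python sketch shows why lowering the threshold to 2 makes the alert observable, and what `for: 2m` adds: the expression has to hold on every evaluation for at least 120 seconds before the alert is firing. The function name, the 30-second evaluation interval and the sample timelines are assumptions made for illustration. Real Prometheus evaluation has more detail than this.

```python
def firing_times(samples, limit, hold_seconds=120):
    """Return the timestamps at which the alert would be firing.

    samples: list of (timestamp_seconds, offset_ns) in ascending time order,
             one entry per rule evaluation (interval is an assumption).
    limit:   absolute offset threshold, as in
             `openshift_ptp_offset_ns > limit or openshift_ptp_offset_ns < -limit`.
    """
    firing = []
    active_since = None  # time at which the condition first became true
    for ts, offset in samples:
        if offset > limit or offset < -limit:
            if active_since is None:
                active_since = ts  # alert becomes pending
            if ts - active_since >= hold_seconds:
                firing.append(ts)  # condition held for the whole `for` window
        else:
            active_since = None  # condition cleared, pending state resets
    return firing


if __name__ == '__main__':
    # A steady offset of 3 ns, sampled every 30 s (assumed interval).
    steady = [(t, 3) for t in range(0, 181, 30)]
    print(firing_times(steady, limit=2))
    print(firing_times(steady, limit=100))
```

With a steady offset of a few nanoseconds, such as the value 3 seen in the events-disabled dump, the condition never holds under the default threshold of 100 but holds continuously under a threshold of 2.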

Comment 19 errata-xmlrpc 2022-03-10 16:12:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

