Article ID: 316651
Issue/Introduction
Symptoms:
A few physical switches stop sending telemetry data and then reconnect. In some instances, unreachable/reachable events are raised for these switches, but no telemetry clear event follows, and the telemetry stays in the provisioned state.
Steps to reproduce:
- Configure SFD and deploy it.
- Leave SFD in the idle state overnight.
- Observed: some physical switches stop sending telemetry data and reconnect. Unreachable/reachable events may appear for these switches, but no telemetry clear event is raised, and the telemetry stays in the provisioned state.
Steps to verify:
- Verify that telemetry is enabled:
sc2-t9-s9100-s1# show running-configuration telemetry
!
telemetry
enable
!
destination-group sfd-collector
destination 10.173.225.182 50001
!
subscription-profile sfd-collector-profile
sensor-group oc-bgp
sensor-group oc-device
sensor-group oc-environment
sensor-group oc-interface
sensor-group oc-lag
sensor-group oc-system
destination-group sfd-collector
encoding gpb
transport grpc no-tls
- Run a packet capture on the collector port (50001):
root@sc2-t9-s9100-s1:~# tcpdump -vvv -i eth0 port 50001
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel
No data is being sent or received on this port.
- Log in to the switch as root and check the telemetry logs:
root@sc2-t9-s9100-s1:~# journalctl -f | grep tele
Mar 21 17:27:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:27:43 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:43 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
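The three checks above can be scripted roughly as follows. This is a minimal sketch, not an official tool: the function names are hypothetical, each function reads text on stdin (saved CLI output, tcpdump's closing statistics, or journal lines), and the interface, port, and log message are taken from the example in this article.

```shell
# Hypothetical helpers for the verification steps above. Each reads text on
# stdin so it can also be run against saved output.

# telemetry_config_check: reads "show running-configuration telemetry" output
# (saved from the switch CLI) and reports which expected pieces are missing.
telemetry_config_check() {
  local cfg rc
  cfg=$(cat)
  rc=0
  # A bare "enable" line under the telemetry block turns telemetry on.
  if ! printf '%s\n' "$cfg" | grep -qE '^[[:space:]]*enable[[:space:]]*$'; then
    echo "telemetry is not enabled"; rc=1
  fi
  # "destination <ip> <port>" is the collector the switch streams to.
  if ! printf '%s\n' "$cfg" | grep -qE '^[[:space:]]*destination [0-9.]+ [0-9]+'; then
    echo "no collector destination configured"; rc=1
  fi
  # The subscription profile binds the sensor groups to the destination group.
  if ! printf '%s\n' "$cfg" | grep -qE '^[[:space:]]*subscription-profile '; then
    echo "no subscription profile configured"; rc=1
  fi
  if [ "$rc" -eq 0 ]; then echo "telemetry configuration looks complete"; fi
  return "$rc"
}

# captured_packets: reads tcpdump's closing statistics and prints the
# "packets captured" count, or 0 if that line is absent.
captured_packets() {
  awk '/packets captured/ { n = $1; found = 1 } END { print (found ? n : 0) }'
}

# count_lock_failures: counts the gRPC send failures that app_telemetry logs
# while it is stuck.
count_lock_failures() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the
  # exit status at 0 so the number can be used directly.
  grep -c 'Send data failed due to lock acquire failure' || true
}

# Example usage on the switch (as root):
#   telemetry_config_check < telemetry.cfg   # telemetry.cfg: hypothetical file with the saved CLI output
#   timeout 60 tcpdump -nn -i eth0 -c 5 port 50001 2>&1 >/dev/null | captured_packets
#   journalctl --since "15 min ago" --no-pager | count_lock_failures
```

tcpdump normally prints its statistics to stderr both when it exits after `-c` packets and when `timeout` stops it with SIGTERM; if no statistics line reaches `captured_packets`, it reports 0, so treat a 0 together with a non-zero lock-failure count as the combination that matches this issue.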
Resolution
Restart the telemetry service:
root@sc2-t9-s9100-s1:~# systemctl restart app-telemetry.service
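A minimal sketch of the restart step that also confirms systemd reports the unit as active afterwards. The function name and the 5-second wait are assumptions; the unit name is the one used in the command above.

```shell
# restart_app_telemetry (hypothetical helper): restarts the telemetry agent
# and checks that systemd reports the unit as active again.
restart_app_telemetry() {
  local unit=app-telemetry.service
  systemctl restart "$unit" || return 1
  # Assumed settle time before checking the unit state.
  sleep 5
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is active"
  else
    echo "$unit is not active; inspect: journalctl -u $unit" >&2
    return 1
  fi
}
```

An active unit only shows that the agent process is running; the log check under Additional Information is what confirms that data is flowing to the collector again.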
Additional Information
To verify that the issue is resolved, check that the telemetry logs show data being scheduled and encoded:
root@sc2-t9-s9100-s1:~# journalctl -f | grep telemetry
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:756, Encoding system current status data
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-SYS-OC] common/ta_protobuf_system.cpp:set_sys_cpu_info:508, Total cpu total_util 9.28 user 5.94 system 3.11 iowait 0.06 softirq 0.17
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:784, Encoding BGP oper prfx cntrs data
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:779, Encoding BGP oper peer count data
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:712, Encoding base-pas device data
Mar 21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:769, Encoding interface data 0
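The healthy pattern above (encoding messages, and no lock-acquire failures) can be checked with a sketch like the following. The function name and the 5-minute window in the usage line are assumptions; the matched strings come from the log lines shown in this article.

```shell
# telemetry_recovered (hypothetical helper): reads app_telemetry journal lines
# on stdin. Succeeds when data is being encoded again and no gRPC
# "lock acquire failure" messages appear in the same window.
telemetry_recovered() {
  local log encoded failures
  log=$(cat)
  encoded=$(printf '%s\n' "$log" | grep -c 'encode_telemetry_data' || true)
  failures=$(printf '%s\n' "$log" | grep -c 'lock acquire failure' || true)
  echo "encoding events: $encoded, lock failures: $failures"
  [ "$encoded" -gt 0 ] && [ "$failures" -eq 0 ]
}

# Example usage on the switch (as root):
#   journalctl --since "5 min ago" --no-pager | grep app_telemetry | telemetry_recovered
```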
On SFD, check that the telemetry collector service log contains entries for the switch:
opt/vmware/nfc/logs/telemetry-collector-service#grep sc2-t9-s9100-s1 service.log
Binary file service.log matches
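To see the matching lines rather than only grep's binary-file notice, a sketch like the following may help. It is hypothetical: the function name is illustrative, it assumes service.log is appended in time order (so the last match is the most recent entry), and it does not interpret the collector's log format, which this article does not show.

```shell
# last_collector_entry (hypothetical helper): prints how many lines of the SFD
# telemetry collector log mention a switch, followed by the last such line.
# -a makes grep read the file as text instead of reporting only
# "Binary file ... matches".
last_collector_entry() {
  local switch="$1" logfile="${2:-service.log}"
  local count
  count=$(grep -a -c -- "$switch" "$logfile" || true)
  echo "$count lines mention $switch"
  grep -a -- "$switch" "$logfile" | tail -n 1
}

# Example usage, run from the collector log directory shown above:
#   last_collector_entry sc2-t9-s9100-s1 service.log
```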