Telemetry Data from Physical Switch gets disconnected and stuck in provisioned state after leaving SFD in Idle state

Article ID: 316651

Updated On:

Products

VMware

Issue/Introduction

Symptoms:

A few physical switches stop sending telemetry data and then reconnect. In some cases, unreachable/reachable events are raised for these switches but no telemetry clear event follows, and telemetry for the switch stays in the provisioned state.

Steps to reproduce:

  • Configure SFD and deploy.
  • Leave SFD in the idle state overnight.
  • Observed: a few physical switches stop sending telemetry data, no telemetry clear event is raised for them, and their telemetry stays in the provisioned state.


Steps to verify
 
  1. Verify that telemetry is enabled on the switch

sc2-t9-s9100-s1# show running-configuration telemetry
!
telemetry
enable
!
destination-group sfd-collector
destination 10.173.225.182 50001
!
subscription-profile sfd-collector-profile
sensor-group oc-bgp
sensor-group oc-device
sensor-group oc-environment
sensor-group oc-interface
sensor-group oc-lag
sensor-group oc-system
destination-group sfd-collector
encoding gpb
transport grpc no-tls
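
If the configuration output is saved to a text file, the check in this step can be scripted. The following is a minimal sketch, assuming the output has the same layout as the sample above; the function name `telemetry_destination` is hypothetical and is not part of the switch software.

```shell
# Hypothetical helper (not a switch command): reads saved
# "show running-configuration telemetry" output on stdin.
# Prints "IP PORT" of the collector and returns 0 only when
# an "enable" line and a "destination IP PORT" line are both present.
telemetry_destination() {
  awk '
    $1 == "enable"                 { enabled = 1 }
    $1 == "destination" && NF == 3 { dest = $2 " " $3 }
    END {
      if (enabled && dest != "") { print dest; exit 0 }
      exit 1
    }
  '
}
```

For the sample configuration above, `telemetry_destination < saved-config.txt` would print `10.173.225.182 50001`, where `saved-config.txt` is an assumed file name.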

 
  2. Run a packet capture on the telemetry destination port

root@sc2-t9-s9100-s1:~# tcpdump -vvv -i eth0 port 50001
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

     No packets are captured, which shows that no telemetry data is being sent or received.
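
To run the same check without watching the capture interactively, the summary that tcpdump prints on exit can be parsed. This is a sketch only: the function name `captured_count` is hypothetical, and it assumes the summary line reads `N packets captured` as in the output above.

```shell
# Hypothetical helper: reads tcpdump output on stdin and prints
# the number from the "N packet(s) captured" summary line.
captured_count() {
  sed -n 's/^\([0-9][0-9]*\) packets\{0,1\} captured$/\1/p'
}
```

One possible usage, assuming the coreutils `timeout` command is available on the switch, is `timeout 60 tcpdump -nn -i eth0 port 50001 2>&1 | captured_count`. A result of `0` after a full minute matches the symptom described here.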
 
  3. Log in to the switch as root and check the logs


root@sc2-t9-s9100-s1:~# journalctl -f | grep tele
Mar 21 17:27:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:27:43 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:42 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
Mar 21 17:28:43 sc2-t9-s9100-s1 app_telemetry[1842]: [APP_TELEMETRY:TA-TRANS-GRPC], Send data failed due to lock acquire failure
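
To see how often the failure recurs, the journal entries can be grouped by minute. The following is a minimal sketch, assuming each journal entry arrives as a single line in the default `journalctl` format (month, day, time, host, message); the function name `lock_failures_per_minute` is hypothetical.

```shell
# Hypothetical helper: reads journal text on stdin and prints one line
# per minute, "Mon DD HH:MM COUNT", counting only the lock failure message.
lock_failures_per_minute() {
  awk '/Send data failed due to lock acquire failure/ {
         # Key on month, day, and the HH:MM part of the timestamp.
         count[$1 " " $2 " " substr($3, 1, 5)]++
       }
       END { for (k in count) print k, count[k] }' | sort
}
```

A possible invocation is `journalctl -u app-telemetry.service --since "1 hour ago" --no-pager | lock_failures_per_minute`. A steady count in every minute suggests the telemetry process is stuck rather than failing intermittently.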


 


Resolution

Restart the telemetry service on the switch:

sc2-t9-s9100-s1:~# systemctl restart app-telemetry.service
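
When the restart has to be repeated on several switches, it can be wrapped so that the service state is checked afterwards. This is a sketch under assumptions: the function name `restart_telemetry` and the `SYSTEMCTL` and `SETTLE_SECONDS` variables are hypothetical conveniences, and the 5-second settle time is an arbitrary choice, not a documented value.

```shell
# Hypothetical wrapper: restarts the telemetry service and reports
# whether systemd considers it active afterwards.
# SYSTEMCTL and SETTLE_SECONDS are illustrative overrides, not real settings.
restart_telemetry() {
  ctl=${SYSTEMCTL:-systemctl}
  "$ctl" restart app-telemetry.service || return 1
  # Give the service a moment to start before checking its state.
  sleep "${SETTLE_SECONDS:-5}"
  "$ctl" is-active --quiet app-telemetry.service
}
```

Run as root on the switch, `restart_telemetry && echo "telemetry active"` prints the message only if both the restart and the state check succeed.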

Additional Information


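To verify that the issue is resolved, check the telemetry logs on the switch. After the restart, the logs should show scheduler and encoding activity instead of lock acquire failures: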


root@sc2-t9-s9100-s1:~# journalctl -f | grep telemetry
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-SCHEDULER] infra/ta_scheduler.cpp:scheduler_handler:298, Timer interval 15000 sub name sfd-collector-profile wait time 15000 elpased time 15000
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-CPS-UTILS] utils/ta_cps_utils.cpp:ta_cps_get_system_uptime:753, The uptime is 2758
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:756, Encoding system current status data
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-SYS-OC] common/ta_protobuf_system.cpp:set_sys_cpu_info:508, Total cpu total_util 9.28 user 5.94 system 3.11 iowait 0.06 softirq 0.17
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:784, Encoding BGP oper prfx cntrs data
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:779, Encoding BGP oper peer count data
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:712, Encoding base-pas device data
Mar  21 17:53:27 sc2-t9-s9100-s1 app_telemetry[11332]: [APP_TELEMETRY:TA-ENCODE-GPB-OC] common/ta_encode_protobuf_oc.cpp:encode_telemetry_data:769, Encoding interface data 0
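
The same check can be expressed as a pass/fail test on recent journal output. The following is a minimal sketch, assuming that `encode_telemetry_data` entries like those above indicate healthy operation and that any `lock acquire failure` entry indicates the problem is still present; the function name `telemetry_recovered` is hypothetical.

```shell
# Hypothetical helper: reads journal text on stdin.
# Returns 0 when at least one encoding entry is present and
# no lock acquire failure is present; returns 1 otherwise.
telemetry_recovered() {
  awk '/lock acquire failure/  { bad++ }
       /encode_telemetry_data/ { good++ }
       END { exit !(good > 0 && bad == 0) }'
}
```

For example, `journalctl -u app-telemetry.service --since "2 minutes ago" --no-pager | telemetry_recovered && echo recovered` prints `recovered` only when the recent entries look healthy.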


On SFD, confirm that the telemetry collector log contains entries for the switch:

opt/vmware/nfc/logs/telemetry-collector-service# grep sc2-t9-s9100-s1 service.log
Binary file service.log matches
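
The `Binary file service.log matches` message means grep found the switch name but treated the log as binary and did not print the matching lines. To read the actual entries, grep can be told to treat the file as text. This is a sketch assuming GNU grep on the SFD host; the function name `switch_log_tail` is hypothetical.

```shell
# Hypothetical helper: prints the last N log lines that mention a switch.
# $1 = switch name, $2 = log file, $3 = number of lines (default 5).
# -a treats a binary-looking file as text; -F matches the name literally.
switch_log_tail() {
  grep -a -F -- "$1" "$2" | tail -n "${3:-5}"
}
```

Run from the log directory, `switch_log_tail sc2-t9-s9100-s1 service.log` shows the five most recent entries for the switch, which makes it easier to confirm that new telemetry arrived after the restart.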