Traceflow fails with Error "Traceflow request timeout. Please retry."

Article ID: 375417

Updated On:

Products

VMware NSX

Issue/Introduction

Users cannot successfully initiate a new Traceflow. Every newly initiated Traceflow fails with the error "Traceflow request timeout. Please retry."

The nsxapi logs show entries similar to the following:

INFO Traceflow-Timeout-Thread TraceflowObservationServiceImpl 5108 POLICY [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Creating observation in policy XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXdc04 from config TraceflowConfig{segmentPortPath='/infra/segments/ns-arb-ft-dev-int-segment/ports/default:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXfa', timeout=10, packetData=null, isTransient=true, epPath=null, startTime=1723095439855}[policyPath=/infra/traceflows/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX3d8c, markedForDelete=false]
WARN Traceflow-Timeout-Thread TraceflowCacheManager 5108 MONITORING [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Encountered exception when running timeout executor
java.lang.NullPointerException: null
        at com.vmware.nsx.management.policy.traceflow.service.common.TraceflowServiceUtils.toNatRulePath(TraceflowServiceUtils.java:371) ~[?:?]
        at com.vmware.nsx.management.policy.traceflow.service.common.TraceflowServiceUtils.convertUuidToPolicyPaths(TraceflowServiceUtils.java:295) ~[?:?]
        at com.vmware.nsx.management.policy.traceflow.service.impl.TraceflowObservationServiceImpl.convertAndCreatePolicyObservation(TraceflowObservationServiceImpl.java:81) ~[?:?]
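To check whether a collected log contains this signature, a small script can look for the WARN message followed shortly by the toNatRulePath stack frame. The following is only a minimal sketch: the function name and the five-line window are illustrative choices, and the log lines must be supplied by the caller (the location of the nsxapi log on the manager is not stated in this article).

```python
# Marker strings copied from the log excerpt above.
WARN_MARKER = "Encountered exception when running timeout executor"
FRAME_MARKER = "TraceflowServiceUtils.toNatRulePath"


def has_traceflow_npe(lines, window=5):
    """Return True if the WARN line is followed within `window` lines
    by the toNatRulePath stack frame.

    `lines` is any iterable of log lines; `window` is an illustrative
    default, not a value taken from the product.
    """
    lines = list(lines)
    for i, line in enumerate(lines):
        if WARN_MARKER in line:
            following = lines[i + 1 : i + 1 + window]
            if any(FRAME_MARKER in nxt for nxt in following):
                return True
    return False
```

For example, passing an open file handle for a copy of the nsxapi log to has_traceflow_npe() returns True when the signature is present.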

Environment

All NSX versions before 4.2.1

Cause

Traceflow fails when a NAT rule on the Traceflow path has no intent path. Because the Traceflow that involves this NAT rule cannot be processed, all subsequent Traceflows remain pending, so users cannot successfully initiate any new Traceflow.

Resolution

This issue is fixed in NSX 4.2.1.


Workaround:
1. Remove all existing traceflows through the Policy API before initiating a new traceflow.
   a. List all traceflows with the REST API call GET https://<mgr-ip>/policy/api/v1/infra/traceflows. The ID of each traceflow is available in the response.
   b. Delete each traceflow returned by the previous call with the REST API call DELETE https://<mgr-ip>/policy/api/v1/infra/traceflows/<traceflow-id>.
2. Initiate the traceflow from the UI as usual.
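
Steps 1a and 1b of the workaround can be scripted. The following is a minimal sketch using only the Python standard library, not an official tool. It assumes HTTP Basic authentication, and it assumes that the list response is a JSON object with a "results" array whose items each carry an "id" field. It does not handle paginated responses. The function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

# API path taken from the workaround steps above.
BASE_PATH = "/policy/api/v1/infra/traceflows"


def traceflow_url(mgr_ip, traceflow_id=None):
    """Build the list URL, or the per-traceflow URL when an ID is given."""
    url = f"https://{mgr_ip}{BASE_PATH}"
    if traceflow_id is not None:
        url += "/" + urllib.parse.quote(traceflow_id, safe="")
    return url


def extract_ids(list_response):
    """Collect traceflow IDs from a parsed list response.

    Assumption: the response has a "results" array of objects with "id".
    """
    return [item["id"] for item in list_response.get("results", []) if "id" in item]


def _request(method, url, user, password, context=None):
    """Send one request with Basic auth (an assumption) and parse any JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, context=context) as resp:
        body = resp.read()
    return json.loads(body) if body else {}


def delete_all_traceflows(mgr_ip, user, password, context=None):
    """Step 1a: list traceflows. Step 1b: delete each one. Returns the deleted IDs."""
    listing = _request("GET", traceflow_url(mgr_ip), user, password, context)
    ids = extract_ids(listing)
    for tf_id in ids:
        _request("DELETE", traceflow_url(mgr_ip, tf_id), user, password, context)
    return ids
```

Calling delete_all_traceflows("<mgr-ip>", "<user>", "<password>") performs both steps against the manager. If the manager presents a certificate that the client does not trust, pass a suitably configured ssl.SSLContext as the context argument. After the call returns, continue with step 2.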