Smarts SAM/IP: Toposync takes a long time to complete; dmctl commands return "Remote command execution timeout"
Article ID: 331794
Products
VMware Smart Assurance
Issue/Introduction
Symptoms:
Normal functionality is unavailable: for example, dmctl commands return "Remote command execution timeout". In this case, the cause is that the topology sync (toposync) is using all resources.
Topology Sync uses DXA. If FastDXA is enabled, FastDXA is used instead of the regular DXA. The issue reported here results from the way FastDXA is implemented.
One of the differences between DXA and FastDXA is how long the repository is locked during the toposync transaction. FastDXA locks the repository for the duration of a transaction, whereas DXA is implemented entirely as an ASL script that syncs the topology instance by instance.
When FastDXA is enabled, most of the topology sync is done by a single call to getTopologyData(), which is implemented in C++. (NOTE: the call is prepared and invoked in ASL, but the invocation accounts for only a small part of the total cost of the topology sync.)
According to the code, one branch of getTopologyData() is wrapped in an MR_Transaction while the topology data is read out of the IP repository, and this read can take some time to complete. For the duration of the MR_Transaction, the thread holds a WRITE repository lock that blocks any other thread needing access to the repository. This is the source of the contention.
Regular DXA (also called normal DXA) is implemented as an ASL script, and the topology is synced one instance at a time. No MR_Transaction guards the entire sync process, so there is less risk of thread contention during topology sync.
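The difference between the two locking patterns can be illustrated with a small model. This is only a sketch, not Smarts code: `repo_lock`, `probe`, `fast_style_sync` and `regular_style_sync` are hypothetical names, a plain Python lock stands in for the WRITE repository lock, and a non-blocking acquire stands in for another thread (for example, one serving a dmctl request) asking for the repository.

```python
import threading

# Hypothetical stand-in for the repository WRITE lock.
repo_lock = threading.Lock()


def probe():
    """Model another request asking for the repository right now."""
    if repo_lock.acquire(blocking=False):
        repo_lock.release()
        return "served"
    return "blocked"


def fast_style_sync(instances):
    """One lock held around the whole read, as described for FastDXA."""
    results = []
    with repo_lock:
        for _ in instances:
            # A request arriving mid-sync finds the lock held.
            results.append(probe())
    return results


def regular_style_sync(instances):
    """Instance by instance, with no lock spanning the whole sync."""
    results = []
    for _ in instances:
        with repo_lock:
            pass  # copy a single instance while holding the lock
        # A request arriving between instances finds the lock free.
        results.append(probe())
    return results


if __name__ == "__main__":
    print(fast_style_sync(range(3)))
    print(regular_style_sync(range(3)))
```

In this model every probe made during the single long transaction is refused, while every probe made between per-instance steps succeeds. In the real product a refused request would wait rather than fail immediately, which is consistent with the timeout seen from dmctl.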
Environment
VMware Smart Assurance - SMARTS
Cause
This is caused by FastDXA holding a lock on the repository for the duration of the Topology Sync.
Resolution
Topology Sync using FastDXA will be changed so that the single toposync transaction is broken down into smaller transactions. This frees the repository for use by other threads, such as the polling threads, while the toposync completes.
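The idea of splitting one long transaction into smaller ones can be sketched as follows. This is an illustration under stated assumptions, not the actual fix: `batches`, `sync_in_batches`, `batch_size` and `on_lock_released` are hypothetical names, and a Python lock again stands in for the repository lock.

```python
import threading
from itertools import islice

# Hypothetical stand-in for the repository lock.
repo_lock = threading.Lock()


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def sync_in_batches(instances, batch_size, on_lock_released=None):
    """Sync in several short transactions instead of one long one."""
    synced = []
    for chunk in batches(instances, batch_size):
        with repo_lock:
            # One small transaction: the lock is held for this batch only.
            synced.extend(chunk)
        # The lock is free here, so waiting threads can run.
        if on_lock_released is not None:
            on_lock_released()
    return synced


if __name__ == "__main__":
    windows = []
    sync_in_batches(list(range(10)), 4, lambda: windows.append(repo_lock.locked()))
    print(len(windows), "windows in which other threads could take the lock")
```

The batch size is the trade-off in a design like this: smaller batches give other threads more frequent access to the repository, at the cost of more transaction overhead across the sync.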
This fix will be implemented in a future patch release of SAM 9.4. Please contact Customer Support for details on ETA.
As a workaround in the meantime, if this contention is experienced, you can disable FastDXA.