We have our FDM process load balanced with a solution that was set up and implemented by CA/Broadcom.
We use LAC-created APIs to call the FDM process on a different app server than GTDM/Portal. We have these FDM app servers restart every night because FDM Java processes do not close or shut down after the FDM job is complete ("zombie" processes). We typically see fewer than a dozen zombies accumulate in a day, but on Wednesday we had several hundred zombies on two different load-balancing app servers.
These processes consumed all of the available RAM on the application servers, so although FDM was being called over and over, no new FDM processes could start. Because there is no logging until FDM actually starts, we saw no error messages or logs. We found the problem when we ran a batch file (that kicks off FDM) from the command line instead of via an API call.
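Because nothing is logged until FDM starts, a pile-up like this is easier to catch by watching the process list. The following is a minimal sketch only, assuming the app servers run Windows, that the lingering FDM processes show up as java.exe, and that the process list is captured in the CSV layout produced by tasklist. The function name and the sample data are hypothetical.

```python
import csv
import io


def summarize_java_processes(tasklist_csv, image_name="java.exe"):
    """Return (process_count, total_memory_kb) for rows matching image_name.

    Assumes CSV columns in the order:
    Image Name, PID, Session Name, Session#, Mem Usage
    """
    count = 0
    total_kb = 0
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Skip the header, short rows, and unrelated processes.
        if len(row) < 5 or row[0].lower() != image_name.lower():
            continue
        count += 1
        # "Mem Usage" looks like "512,340 K"; keep only the digits.
        digits = "".join(ch for ch in row[4] if ch.isdigit())
        total_kb += int(digits) if digits else 0
    return count, total_kb


# Hypothetical sample data, not taken from the affected servers.
sample = '''"Image Name","PID","Session Name","Session#","Mem Usage"
"java.exe","1234","Services","0","512,340 K"
"java.exe","5678","Services","0","498,112 K"
"cmd.exe","4321","Console","1","3,204 K"
'''

count, total_kb = summarize_java_processes(sample)
print(f"{count} java.exe processes using {total_kb} KB")
```

On a Windows server the input could come from `tasklist /FO CSV /FI "IMAGENAME eq java.exe"`. A scheduled check could then raise an alert when the count goes well above the usual dozen or so, rather than waiting for the nightly restart.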
This is the first time this has happened, but we cannot have this happen again.
Release : 4.6
Component : CA Test Data Manager FDM on an LAC distributed Server Farm
I don't believe you have ever experienced this with FDM before in environments outside of LAC. As you stated, HUNDREDS of FDMs are being instantiated, and it is not FDM doing this.
Therefore, this does not seem to be an FDM issue but a LAC issue.
The LAC issue is fixed.
The issue was with the WAR file that was provided to us when the original LAC config was done in partnership with CA. Three DLL files were not created from the WAR file in PROD, although they were in DEV. We are not sure what happened on our PROD VMs, but we identified the missing files and copied them from DEV to PROD, and now all connections and processes are working.
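A missing-file problem like this can be checked by comparing the deployed directories in the two environments. The sketch below assumes both deployment directories are reachable from one machine (for example through a share or a copied listing). The function name, the directory layout and the `*.dll` pattern are illustrative and are not part of the LAC or FDM tooling.

```python
from pathlib import Path


def missing_files(reference_dir, target_dir, pattern="*.dll"):
    """List files matching pattern that exist under reference_dir
    but not under target_dir, as sorted relative POSIX-style paths."""
    reference_dir = Path(reference_dir)
    target_dir = Path(target_dir)
    # Collect relative paths so the two trees can be compared directly.
    reference = {p.relative_to(reference_dir).as_posix()
                 for p in reference_dir.rglob(pattern)}
    target = {p.relative_to(target_dir).as_posix()
              for p in target_dir.rglob(pattern)}
    return sorted(reference - target)
```

For example, `missing_files(dev_webapp_dir, prod_webapp_dir)`, with both paths supplied by the operator, would return the relative paths of any DLLs present in DEV but absent in PROD. Running a check like this after each WAR deployment could flag the gap before FDM calls start failing.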