Open PowerShell Session / Import Modules Hangs on Action
Article ID: 375719
Products
VMware Aria Suite
Issue/Introduction
When using the 'openSession' action to open a PowerShell session on a PowerShell host, the request intermittently hangs: it neither completes nor fails.
The action is in the "com.vmware.library.powershell" module.
This can cause builds to fail intermittently.
In one example, the action was stuck for 17 hours and never timed out. It can also occasionally get stuck on "Install/Import modules."
Conversely, the action can also time out and fail to complete.
Environment
Aria Automation 8.16.0
Cause
Analysis of the logs, specifically the thread dump, indicates what looks like a concurrency issue when caching the WinRM sessions.
To collect a thread dump, run the following command while the workflow or action is waiting:
kubectl -n prelude exec -it $(kubectl get pod -n prelude -l app=vco-app -o jsonpath="{.items[0].metadata.name}") -c vco-server-app -- bash -c "jstack -l 1 > /usr/lib/vco/app-server/logs/td1.out"
The dump will be available at /data/vco/usr/lib/vco/app-server/logs/td1.out
Resolution
To work around this issue, perform the following steps:
1. Configure the timeouts for the PowerShell host by using the "Update a PowerShell host" workflow. (Be aware that these values are set in milliseconds, not seconds.)
2. Set the "com.vmware.o11n.plugin.powershell.session.expirationTimeInSecond" property to 43200 (in the System Properties section of the Control Center). This increases the internal vRO timeout for unclosed PowerShell sessions from 2 hours to 12 hours.
The read and idle timeouts should be based on the customer's environment and network availability. A reasonable starting point is 30,000 for the idle timeout and 60,000 for the read timeout, i.e. 30 seconds for idle and 60 seconds for read. If more time is needed, increase the values to 300,000 for idle and 600,000 for read, i.e. 5 minutes for idle and 10 minutes for read.
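Because the PowerShell host timeouts are expressed in milliseconds while the session expiration property is expressed in seconds, it is easy to mix up the units. The following is a minimal sketch for sanity-checking the values mentioned above. The helper names ms_to_seconds and seconds_to_hours are hypothetical and are not part of vRO or the PowerShell plug-in.

```shell
#!/bin/sh
# Hypothetical helpers for checking timeout units; not part of vRO.

# Convert milliseconds (PowerShell host read/idle timeouts) to seconds.
ms_to_seconds() {
    echo $(( $1 / 1000 ))
}

# Convert seconds (session expiration system property) to hours.
seconds_to_hours() {
    echo $(( $1 / 3600 ))
}

echo "idle 30000 ms = $(ms_to_seconds 30000) s"
echo "read 60000 ms = $(ms_to_seconds 60000) s"
echo "idle 300000 ms = $(ms_to_seconds 300000) s"
echo "read 600000 ms = $(ms_to_seconds 600000) s"
echo "session expiration 43200 s = $(seconds_to_hours 43200) h"
```

The larger values convert to 300 and 600 seconds, which match the 5 minute and 10 minute figures above.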
Another workaround is to extract the openSession action into a separate workflow that is started asynchronously and waited for in the parent workflow. If it does not complete within a certain timeout, cancel it and try again.
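The "run with a timeout, cancel, and retry" pattern described above can be illustrated outside of vRO with a short shell sketch. This is only an analogy: in vRO the same logic would be built from workflow elements, not a shell script. The function name run_with_retry is hypothetical, and the sketch assumes the timeout command from GNU coreutils is available.

```shell
#!/bin/sh
# Sketch of the "run with a timeout, cancel, retry" pattern.
# run_with_retry is a hypothetical helper, not a vRO or plug-in command.

# Usage: run_with_retry <max_attempts> <timeout_seconds> <command> [args...]
run_with_retry() {
    max_attempts=$1
    timeout_seconds=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # timeout terminates the command if it runs longer than the limit
        # and then returns a non-zero exit status (124).
        if timeout "$timeout_seconds" "$@"; then
            return 0
        fi
        echo "attempt $attempt failed or timed out" >&2
        attempt=$(( attempt + 1 ))
    done
    # Every attempt failed or timed out.
    return 1
}
```

For example, run_with_retry 3 60 some_command would give some_command up to three attempts of 60 seconds each, where some_command is a placeholder for the operation that may hang. Bounding both the per-attempt time and the number of attempts keeps a single hung session from blocking the parent indefinitely.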