The Telegraf agent fails to upgrade with the error message:
Agent Operation Failed: Please check the health of the Cloud Proxy and the Salt service. Retry the action if components are healthy.
/storage/log/vcops/log/vcops-bridge.log reports:
2024-10-09T08:08:01,621+0000 ERROR [ServerConnection on port 10000 Thread 17141] com.vmware.vcops.bridge.server.UCPManager.bootstrapUcpAgent_aroundBody10 - Unable to perform operation: contentupgrade, Exception Detail: vcId=_ vcIp=- vmMor=<ResourceID for target Agent>
java.lang.NullPointerException: null
at com.vmware.vcops.bridge.server.UCPManager.bootstrapUcpAgent_aroundBody10(UCPManager.java:1195) ~[vcops-bridge-server-1.0-SNAPSHOT.jar:?]
at com.vmware.vcops.bridge.server.UCPManager.bootstrapUcpAgent_aroundBody11$advice(UCPManager.java:96) ~[vcops-bridge-server-1.0-SNAPSHOT.jar:?]
at com.vmware.vcops.bridge.server.UCPManager.bootstrapUcpAgent(UCPManager.java:1) ~[vcops-bridge-server-1.0-SNAPSHOT.jar:?]
at com.vmware.vcops.bridge.server.DataRetrieverServer.bootstrapUcpAgent(DataRetrieverServer.java:10869) ~[vcops-bridge-server-1.0-SNAPSHOT.jar:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
at com.vmware.vcops.platform.gemfire.GemfireFunction.invokeHandlerMethod(GemfireFunction.java:112) ~[alive_platform.jar:?]
at com.vmware.vcops.platform.gemfire.GemfireFunction.execute(GemfireFunction.java:60) ~[alive_platform.jar:?]
at com.vmware.vcops.platform.gemfire.GemfireFunctionHandler$FunctionHandler.execute(GemfireFunctionHandler.java:368) ~[alive_platform.jar:?]
at com.vmware.vcops.platform.gemfire.GemfireFunctionHandler$TopGemfireFunction.execute(GemfireFunctionHandler.java:165) ~[alive_platform.jar:?]
at org.apache.geode.internal.cache.tier.sockets.command.ExecuteFunction70.executeFunctionLocally(ExecuteFunction70.java:401) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.internal.cache.tier.sockets.command.ExecuteFunction70.cmdExecute(ExecuteFunction70.java:262) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:191) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:895) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doOneMessage(ServerConnection.java:1109) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1391) ~[gemfire-core-10.0.1.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:707) ~[gemfire-core-10.0.1.jar:?]
at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:124) ~[gemfire-logging-10.0.1.jar:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
2024-10-09T08:08:01,623+0000 ERROR [ServerConnection on port 10000 Thread 17141] com.vmware.vcops.bridge.server.BridgeTracerAspect.processBridgeResult - Agent Operation Failed: Please check the health of the Cloud Proxy and the Salt service. Retry the action if components are healthy. null
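To confirm that a bridge log shows this signature, a short script can look for the contentupgrade error followed directly by the NullPointerException. The following is a minimal sketch, not a supported tool: the function name `find_upgrade_failures` is hypothetical, and the two marker strings are copied from the excerpt above on the assumption that other occurrences are logged in the same form.

```python
# Marker strings copied from the vcops-bridge.log excerpt in this article.
OPERATION_ERROR = "Unable to perform operation: contentupgrade"
NPE_MARKER = "java.lang.NullPointerException"


def find_upgrade_failures(log_lines):
    """Return each contentupgrade ERROR line that is immediately
    followed by a NullPointerException line."""
    lines = list(log_lines)
    hits = []
    for index, line in enumerate(lines):
        if OPERATION_ERROR not in line:
            continue
        # The exception type is expected on the line right after the ERROR entry.
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        if next_line.lstrip().startswith(NPE_MARKER):
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open /storage/log/vcops/log/vcops-bridge.log on the node that reported the failure and pass the file object to `find_upgrade_failures`. Each returned line contains the `vmMor` value of the agent whose upgrade failed.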
/storage/log/vcops/log/web.log also reports:
2024-10-09T08:08:01,502+0000 INFO [ajp-nio-127.0.0.1-8009-exec-1379] com.vmware.vcops.ui.action.ucp.UcpAgentManagementAction.startOrStopAgent - startOrStopAgent action started
2024-10-09T08:08:01,624+0000 INFO [ajp-nio-127.0.0.1-8009-exec-1379] com.vmware.vcops.ui.action.ucp.UcpAgentManagementAction.startOrStopAgent - startOrStopAgent action ended
2024-10-09T08:08:01,624+0000 ERROR [ajp-nio-127.0.0.1-8009-exec-1379] com.vmware.vcops.ui.util.PreResultInterceptor.processErrors - functionName = bootstrapUcpAgent, succeededPartially = false, errorMessage = Agent Operation Failed: Please check the health of the Cloud Proxy and the Salt service. Retry the action if components are healthy. null
2024-10-09T08:08:01,624+0000 INFO [ajp-nio-127.0.0.1-8009-exec-1379] com.vmware.vcops.ui.util.PreResultInterceptor.processErrors - Component: TODO
Url: /ui/ucpAgentManagement.action
Params: mainAction=startOrStopAgent (
Bridge Client function 'bootstrapUcpAgent' - Oct 09 08:08:01:502 - 121ms (
Bridge Server function 'bootstrapUcpAgent [node: ops05]' - Oct 09 08:08:01:504 - 119ms
checkIfUserHasPrivileges - Oct 09 08:08:01:504 - 1ms.
Aria Operations 8.18.x
An incorrect startOrStopAgent action is initiated when an upgrade of the Telegraf agent is started, which causes the upgrade to fail almost instantly.
This issue is under investigation. It has mainly been observed on agents running on physical servers.
Workaround: to upgrade the affected agents, use the API instead of the UI.
Locate the Resource IDs of the affected agents:
Upgrade using the API:
{
"contextResourceIDs" : [ "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX", "YYYYYYYY-YYYY-YYYY-YYYY-YYYYYYYYYYYY" ]
}
Note that the example Resource IDs have been replaced with X's and Y's in the example above. You must enter the IDs located earlier. Do not remove anything outside the square brackets. If there is only one Resource ID, remove the comma and the second Resource ID.
Based on the example from the URL above, for a single agent:
{
"contextResourceIDs" : [ "ZZZZZZZZ-ZZZZ-ZZZZ-ZZZZ-ZZZZZZZZZZZZ" ]
}
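Editing the request body by hand makes it easy to leave a placeholder or a stray comma in place. The following is a minimal sketch that generates the body from a list of Resource IDs. The function name `build_upgrade_body` is hypothetical, and the validation assumes that Resource IDs are UUIDs, which is inferred from the 8-4-4-4-12 placeholder pattern above rather than stated in this article.

```python
import json
import uuid


def build_upgrade_body(resource_ids):
    """Return the JSON request body for one or more Resource IDs."""
    if not resource_ids:
        raise ValueError("at least one Resource ID is required")
    checked = []
    for resource_id in resource_ids:
        candidate = resource_id.strip()
        # Assumption: Resource IDs are UUIDs. uuid.UUID raises ValueError
        # for malformed values, including leftover X/Y/Z placeholders.
        uuid.UUID(candidate)
        checked.append(candidate)
    # json.dumps handles the brackets, quotes, and commas for any list length.
    return json.dumps({"contextResourceIDs": checked}, indent=2)
```

Calling `build_upgrade_body` with a single ID produces the single-element form shown above, and calling it with several IDs produces the comma-separated form, so the body never needs to be trimmed manually.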