VCD tasks slow after failover was triggered

Article ID: 315583

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

A cluster failover of the VCD cells was triggered by maintenance.
Tasks become slow because some subtasks take longer than before.
Rebooting the cluster appears to resolve the issue temporarily, but the issue may recur.
The jms-debug.log shows entries similar to the following:
- yyyy-mm-dd hh:mm:ss,ms | DEBUG | Thread-2 (ActiveMQ-client-netty-threads) | ActiveMQClientProtocolManager | Notifying <Node ID> going down
  ...
  yyyy-mm-dd hh:mm:ss,ms | DEBUG | e=<Node ID>-<trace>) | RemotingServiceImpl | RemotingServiceImpl::removing succeeded connection ID <connection ID>, we now have N connections
  yyyy-mm-dd hh:mm:ss,ms | DEBUG | e=<Node ID>-<trace>) | ServerSessionImpl | deleting temporary queue notif.<id>.ActiveMQServerImpl_name=<MQServer_Implementation_id>
  yyyy-mm-dd hh:mm:ss,ms | DEBUG | e=<Node ID>-<trace>) | PageSubscriptionCounterImpl | Subscription N delete, keepZero=false
The cell-runtime.log or vcloud-container-debug.log may show JDBC connectivity errors similar to the following:
- yyyy-mm-dd hh:mm:ss,ms | ERROR | Thread-0 (ActiveMQ-scheduled-threads) | VCDBroadcastEndpoint | Error during broadcast for local cell: <cell-id> |
  org.postgresql.util.PSQLException: The connection attempt failed.
- Caused by: java.net.SocketTimeoutException: connect timed out
- Caused by: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Cannot open connection
- Caused by: org.hibernate.exception.JDBCConnectionException: Cannot open connection
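The log signatures above can be checked for with a short script. This is a minimal sketch, not an official tool: the function names `count_matches` and `scan_vcd_logs` are hypothetical, and the default log directory `/opt/vmware/vcloud-director/logs` is an assumption that should be adjusted to wherever the cell writes its logs. The patterns are taken from the sample lines quoted above.

```shell
#!/usr/bin/env bash
# Sketch: count log lines that match the signatures quoted in this article.
# ASSUMPTION: default log directory; pass another path as the first argument.
DEFAULT_LOG_DIR=/opt/vmware/vcloud-director/logs

# count_matches PATTERN FILE...
# Prints the total number of lines matching PATTERN (extended regex)
# across all readable files; unreadable or missing files are skipped.
count_matches() {
  local pattern=$1; shift
  local total=0 n f
  for f in "$@"; do
    [ -r "$f" ] || continue
    n=$(grep -cE -- "$pattern" "$f")
    total=$((total + n))
  done
  echo "$total"
}

# scan_vcd_logs [LOG_DIR]
# Prints one key=value line per signature.
scan_vcd_logs() {
  local dir=${1:-$DEFAULT_LOG_DIR}
  local runtime_logs=("$dir/cell-runtime.log" "$dir/vcloud-container-debug.log")

  echo "jms_node_going_down=$(count_matches \
    'ActiveMQClientProtocolManager \| Notifying .* going down' \
    "$dir/jms-debug.log")"
  echo "broadcast_errors=$(count_matches \
    'VCDBroadcastEndpoint \| Error during broadcast' \
    "${runtime_logs[@]}")"
  echo "jdbc_connection_attempt_failed=$(count_matches \
    'PSQLException: The connection attempt failed' \
    "${runtime_logs[@]}")"
  echo "jdbc_cannot_open_connection=$(count_matches \
    'JDBCConnectionException: Cannot open connection' \
    "${runtime_logs[@]}")"
}
```

After sourcing the script, run `scan_vcd_logs` on a cell, or `scan_vcd_logs /path/to/logs` against a copied log bundle. Non-zero counts only show that matching lines exist; they do not confirm this issue by themselves.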

Environment

VMware Cloud Director 10.x

Cause

When a node that was in maintenance, or one that was recently failed over, is brought back, the Artemis cluster cannot reconnect it because the earlier cleanup was incomplete or incorrect.

Resolution

This issue is fixed in VMware Cloud Director 10.5.1.
To work around this issue:
1. Set mq.discovery.generationDrift to 3600:
  /opt/vmware/vcloud-director/bin/cell-management-tool manage-config -n mq.discovery.generationDrift -v 3600
2. Shut down and then start the vmware-vcd service on ALL cells in the environment so that each cell picks up the change:
  1. /opt/vmware/vcloud-director/bin/cell-management-tool cell -i $(service vmware-vcd pid cell) -s
  2. service vmware-vcd start
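
The workaround steps can be wrapped in a small script that prints each command before running it. This is a sketch only: the helper names `run`, `set_generation_drift`, and `restart_cell` and the `DRY_RUN` variable are hypothetical, and the commands themselves are the ones listed in the steps above. By default nothing is executed; set `DRY_RUN=0` to run the commands for real.

```shell
#!/usr/bin/env bash
# Sketch: the workaround commands from this article with a dry-run default.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
CMT=/opt/vmware/vcloud-director/bin/cell-management-tool

# run CMD ARGS...
# Prints the command, then executes it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# set_generation_drift VALUE
# Step 1: set mq.discovery.generationDrift to the value given in the article.
set_generation_drift() {
  if [ -z "${1:-}" ]; then
    echo "usage: set_generation_drift VALUE" >&2
    return 2
  fi
  run "$CMT" manage-config -n mq.discovery.generationDrift -v "$1"
}

# restart_cell
# Step 2: shut down the local cell, then start the vmware-vcd service.
# Repeat on every cell in the environment.
restart_cell() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    # The PID lookup is only meaningful on a real cell, so it is
    # evaluated only when commands are actually executed.
    run "$CMT" cell -i "$(service vmware-vcd pid cell)" -s || return 1
  else
    printf '+ %s\n' "$CMT cell -i \$(service vmware-vcd pid cell) -s"
  fi
  run service vmware-vcd start
}
```

A typical sequence would be to source the script, review the output of `set_generation_drift 3600` and `restart_cell`, and then repeat with `DRY_RUN=0`. The dry-run default is a design choice intended to avoid an accidental service restart while reviewing the commands.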
