A vCloud Director infrastructure can contain one or more cells. Multi-cell communication is achieved by using a Message Bus.
A multi-cell environment requires a session-aware load balancer in front of the cells. Although all cells continue to run all services, a cell can be given a special role and made responsible for certain services. Cells learn about one another when they register with the same Oracle database.
Notes:
- All cells in a multi-cell environment must be configured to use a centralized NTP server.
- NTP synchronization is also required between all ESX hosts.
VMware Cloud Director critical components (services) provide high availability and survive hardware failures.
Some of the critical clustered components are:
- Monitoring Service
- Heartbeat Service
- Console Proxy
- VC Proxy
- Image Transfer Service
- Activity Log Cleaner
- LDAP Synchronizing Service
These services do not need to be clustered in VMware Cloud Director:
- Console Proxy – This component runs on every cell and is stateless. Every instance can do the work, so the failure of any one instance does not affect user requests: the load balancer redirects them to another instance.
- Image Transfer Service – This component also runs on every cell and is stateless. Every instance can do the work, so the failure of any one instance does not affect user requests.
Note: VMware vCloud Director cells are stateless. They can be restarted at any time without the risk of data loss. The only caveat is that current requests are interrupted.
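The redirection behaviour described above can be illustrated with a small sketch. This is not how any particular load balancer product works; `SessionAwareBalancer`, the cell names, and the round-robin fallback are all hypothetical choices made for illustration. It keeps each session on the same cell and moves it to another healthy cell only when that cell fails.

```python
class SessionAwareBalancer:
    """Hypothetical balancer: sticky sessions with failover to a healthy cell."""

    def __init__(self, cells):
        self.cells = list(cells)        # all known cells, in rotation order
        self.healthy = set(self.cells)  # cells currently accepting requests
        self.sessions = {}              # session id -> assigned cell
        self._next = 0                  # round-robin cursor

    def mark_down(self, cell):
        """Record that a cell has failed and must not receive requests."""
        self.healthy.discard(cell)

    def _pick_healthy(self):
        # Try each cell at most once, starting at the round-robin cursor.
        for _ in range(len(self.cells)):
            cell = self.cells[self._next]
            self._next = (self._next + 1) % len(self.cells)
            if cell in self.healthy:
                return cell
        raise RuntimeError("no healthy cells available")

    def route(self, session_id):
        """Return the cell for this session, reassigning it if its cell is down."""
        cell = self.sessions.get(session_id)
        if cell not in self.healthy:
            cell = self._pick_healthy()
            self.sessions[session_id] = cell
        return cell
```

Because the cells are stateless, moving a session to another cell loses no data; only the request that was in flight on the failed cell is interrupted.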
VMware vCloud Director has two types of cells:
- Coordinator (or primary) cells
- Secondary cells
  - All cells other than the coordinator cell are secondary cells.
  - The secondary cell has these responsibilities:
    - Report a heartbeat by updating a table entry in the database, so that the coordinator cell can determine whether the secondary cell is alive
    - Periodically check whether the coordinator cell is alive
    - Listen for messages submitted by the coordinator cell and act on them (for example, which services to start)
If a process fails on a secondary cell, the watchdog on that machine restarts it. Before declaring the cell dead, the coordinator cell allows for the time the watchdog needs to restart services on the failed secondary cell. If the secondary cell is still deemed to be dead, the coordinator chooses another cell on which to start the service.
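The failure handling described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the timeout values, the function names, and the "least-loaded live cell" placement rule are all hypothetical. Heartbeats are modelled as a dictionary standing in for the database table.

```python
HEARTBEAT_TIMEOUT = 30.0  # assumed: seconds without a heartbeat before suspicion
WATCHDOG_GRACE = 60.0     # assumed: extra time allowed for the watchdog restart


def find_dead_cells(heartbeats, now):
    """Return cells whose last heartbeat is older than timeout plus grace.

    heartbeats maps cell name -> timestamp of its last heartbeat.
    """
    limit = HEARTBEAT_TIMEOUT + WATCHDOG_GRACE
    return {cell for cell, ts in heartbeats.items() if now - ts > limit}


def reassign_services(assignments, heartbeats, now):
    """Move services off dead cells onto the least-loaded live cell.

    assignments maps service name -> cell currently running it.
    """
    dead = find_dead_cells(heartbeats, now)
    alive = sorted(set(heartbeats) - dead)
    if not alive:
        raise RuntimeError("no live cells available")

    # Count how many services each live cell already runs.
    load = {cell: 0 for cell in alive}
    for cell in assignments.values():
        if cell in load:
            load[cell] += 1

    result = {}
    for service, cell in assignments.items():
        if cell in dead:
            cell = min(alive, key=lambda c: load[c])
            load[cell] += 1
        result[service] = cell
    return result
```

A cell that misses heartbeats for less than the combined timeout and grace period is left alone, which gives the watchdog a chance to bring it back before its services are moved.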
Election of a new coordinator cell happens when secondary cells detect that the coordinator is dead. All secondary cells monitor the heartbeat of the coordinator. When the coordinator is detected as dead, the secondary cells try to acquire the coordinator lock, and the cell that obtains it becomes the new coordinator. The newly elected coordinator determines whether all required services are running and starts any that are not.
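Lock-based election of this kind can be sketched in a few lines. The example below is an illustration only: `CoordinatorLock`, `elect`, `missing_services`, and the 30-second timeout are hypothetical, and an in-process `threading.Lock` stands in for whatever atomic operation the shared database provides.

```python
import threading

COORDINATOR_TIMEOUT = 30.0  # assumed: seconds before the coordinator is deemed dead


class CoordinatorLock:
    """Hypothetical stand-in for a coordinator lock held in the shared database."""

    def __init__(self):
        self._guard = threading.Lock()  # makes check-and-set atomic
        self.holder = None              # cell currently holding the lock
        self.last_heartbeat = None      # time of the holder's last heartbeat

    def heartbeat(self, cell, now):
        """Refresh the lock; only the current holder may do so."""
        with self._guard:
            if self.holder != cell:
                return False
            self.last_heartbeat = now
            return True

    def try_acquire(self, cell, now):
        """Take the lock if it is free or the holder's heartbeat has expired."""
        with self._guard:
            expired = (
                self.holder is None
                or now - self.last_heartbeat > COORDINATOR_TIMEOUT
            )
            if expired:
                self.holder = cell
                self.last_heartbeat = now
                return True
            return self.holder == cell


def elect(lock, candidates, now):
    """Each candidate tries the lock in turn; return the resulting coordinator."""
    for cell in candidates:
        if lock.try_acquire(cell, now):
            return cell
    return lock.holder


def missing_services(required, running):
    """Services the new coordinator still has to start."""
    return sorted(set(required) - set(running))
```

Only one cell can win because the check and the update happen under the same guard; every other candidate sees a fresh heartbeat and backs off.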