Decommissioning a degraded VIO Controller

Article ID: 321684

Updated On:

Products

VMware Integrated OpenStack

Issue/Introduction

  • Some of the OpenStack services are down, and all of the impacted services run on the same controller node.
  • In a VMware Integrated OpenStack deployment, one of the controller nodes is in an unknown or degraded state and is deemed unrecoverable.
  • This article outlines the procedure to safely decommission an OpenStack controller and get its services running on the other controllers in the cluster.

 

Environment

VMware Integrated OpenStack 7.x

Cause

A controller can become degraded or enter an unknown state for many reasons. In most cases the controller can be recovered; this article focuses specifically on the scenario where the controller is deemed unrecoverable.

Resolution

  1. Identify the degraded controller.
kubectl get po
kubectl get po -A -owide | grep <controller-name>
  2. Scale out a new controller.
    • Log in to the Integrated OpenStack Manager web interface as the admin user.
    • In OpenStack Deployment, click the name of your deployment and open the Nodes tab.
    • Click Scale Out Controller Node.
  3. Cordon the degraded controller so that no new pods are scheduled on it.
kubectl cordon <controller-name>
  4. Safely evict the pods and container services running on the degraded node to the other nodes.
kubectl drain <controller-name>
  5. Check whether any pods are still running or in an error state on the degraded node, and delete them so that they are recreated on the other controllers.
kubectl get po -A -owide | grep <controller-name>
  6. Delete the degraded controller. Begin by force-deleting any pods that remain on it.
osctl delete --force --grace-period=0 po <pod-name>
  7. Change the count in the viomachineset from 4 to 3.
osctl get viomachineset
osctl edit viomachineset controller1
  8. Delete the Kubernetes node.
kubectl get node
kubectl delete node <controller-name>
  9. Delete the machine CR for the disconnected controller.
osctl get machine
osctl delete machine <controller-name>
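
The cordon, drain, and verification steps above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a supported VMware tool: the `run` and `evacuate_node` function names and the `DRY_RUN` variable are hypothetical, and the `--field-selector spec.nodeName=...` filter is swapped in for the `grep` used in the procedure. With `DRY_RUN` left at its default of 1, the functions only print the commands they would run.

```shell
#!/bin/sh
# Hypothetical helper: cordon and drain one node, then list pods left on it.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

evacuate_node() {
    node="${1:-}"
    if [ -z "$node" ]; then
        echo "usage: evacuate_node <controller-name>" >&2
        return 1
    fi
    # Stop new pods from being scheduled on the node.
    run kubectl cordon "$node" || return 1
    # Evict the pods currently running on the node.
    run kubectl drain "$node" || return 1
    # List whatever is still bound to the node
    # (assumption: field selector used here instead of grep).
    run kubectl get po -A -owide --field-selector "spec.nodeName=$node"
}
```

Calling `evacuate_node <controller-name>` prints the three commands for review. Setting `DRY_RUN=0` would execute them against whichever cluster `kubectl` currently points at, so the printed output is worth checking first.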