Note: The drift detector is an experimental feature. Because drift can take many forms, the detector works on a best-effort basis. It may not catch every case, and its results should be used only as a reference.
Mismatch between the resources recorded in the backup and the actual state of the infrastructure of the Tanzu Kubernetes Grid management cluster.
VMware Tanzu Kubernetes Grid (TKG) is a product for managing the lifecycle of Kubernetes clusters.
Since v2.1.0, TKG has provided a solution for backing up and restoring the cluster objects on a management cluster. If a disaster makes the management cluster unavailable while the workload clusters remain accessible, you can provision a new management cluster instance, restore the cluster objects to it, and continue managing the existing workload clusters from the new instance. For more details, see Back Up and Restore Management and Workload Cluster Infrastructure on vSphere.
In the context of this solution, "drift" is a mismatch between the resources recorded in the backup and the actual state of the infrastructure. Such a mismatch can cause problems during restoration. For more details on handling drift, see the "Handling Drift" section of the documentation: Handling Drift
Download drift-detector-v0.2.0.zip from the "Attachments" section of this KB and unzip it. The archive contains binaries for Linux, macOS, and Windows that can be run directly from a shell; no installation is required.
Run the drift detector before performing the restoration, as follows:
Download the backup tarball, either directly from the backup store portal or by using the Velero CLI:
velero backup download <backup-name>
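To make the tarball path predictable for the detector's "--backup" option, you can pass an explicit "--output" path to "velero backup download". The following is a minimal sketch, assuming the Velero CLI is installed and already configured for the backup storage location. "download_backup" is a hypothetical helper name. When "--output" is omitted, Velero writes "<backup-name>-data.tar.gz" to the current directory, which matches the file name used in the example command later in this article.

```shell
#!/bin/sh
# Sketch: download a Velero backup to an explicit local path so the same
# path can be passed to `drift-detector detect --backup`.
set -eu

# download_backup BACKUP_NAME [OUTPUT_PATH]   (hypothetical helper)
# OUTPUT_PATH defaults to <BACKUP_NAME>-data.tar.gz, the same name Velero
# uses when --output is omitted.
download_backup() {
  name="$1"
  output="${2:-${name}-data.tar.gz}"
  velero backup download "$name" --output "$output"
}
```

For example, "download_backup my-backup" writes my-backup-data.tar.gz to the current directory.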
The available options of the drift-detector detect command are:
drift-detector detect -h
Detect the drifts between the backup and infrastructure

Usage:
  drift-detector detect [flags]

Flags:
      --backup string              The local path of the backup tarball file. Required
      --format string              The report format. One of: (json) (default "json")
  -h, --help                       help for detect
      --ignore-healthy-resources   Ignore the healthy resources in the report
      --insecure-skip-verify       Skip the verification of an infra server’s certificate during a connection.
  -o, --output string              Report output file. Required
      --skip-access-apiserver      Specify whether skip accessing the API servers of workload clusters during the detection

Global Flags:
  -D, --debug   Enable debug mode
The "--backup" option is required; it sets the local file system path of the backup tarball. The "-o" option, which sets the report output file, is also required.
If the management cluster manages many workload clusters, the report can become large, and the drifted resources are hard to find in it. Set the "--ignore-healthy-resources" option so that the report contains only the drifted resources.
Connecting to the API servers of the workload clusters is helpful but not required. If they are not accessible, set the "--skip-access-apiserver" option to skip this step.
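The options described above can be combined in a small wrapper. The following is a minimal POSIX shell sketch that uses only flags listed in the help output. "detect_drift" and the "DRIFT_DETECTOR" variable are hypothetical names introduced here; the variable lets you point at the unzipped binary for your platform if it is not on your PATH.

```shell
#!/bin/sh
# Hypothetical wrapper around `drift-detector detect`; it only uses flags
# documented in `drift-detector detect -h`.
set -eu

# detect_drift BACKUP_TARBALL REPORT_FILE [SKIP_APISERVER]
#   BACKUP_TARBALL  local path of the backup tarball (--backup, required)
#   REPORT_FILE     JSON report to write (-o, required)
#   SKIP_APISERVER  "yes" adds --skip-access-apiserver for when the workload
#                   cluster API servers are unreachable (default: "no")
# DRIFT_DETECTOR (hypothetical) overrides the binary to run; it defaults to
# `drift-detector` on the PATH.
detect_drift() {
  backup="$1"
  report="$2"
  skip_apiserver="${3:-no}"
  # Report only drifted resources so that large reports stay readable.
  set -- detect --backup "$backup" -o "$report" --ignore-healthy-resources
  if [ "$skip_apiserver" = "yes" ]; then
    set -- "$@" --skip-access-apiserver
  fi
  "${DRIFT_DETECTOR:-drift-detector}" "$@"
}
```

For example, "detect_drift my-backup-data.tar.gz report.json yes" writes a report of drifted resources only, without contacting the workload cluster API servers. Add "--insecure-skip-verify" only if the infrastructure server's certificate cannot be verified.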
Run the drift detector command:
drift-detector detect --backup my-backup-data.tar.gz --insecure-skip-verify -o report.json
The output is as follows:
The command output has three main parts that describe how the Kubernetes objects in the backup match the VMs and other infrastructure resources:
The Machine listings have four possible statuses:
The ControlPlane, Workers, Cluster, and overall Summary listings have four possible statuses:
After performing the restoration, follow the guide to remediate all ghost machines.