Gathering Logs for vSphere with Tanzu

Article ID: 345464


Products

VMware vSphere ESXi VMware vSphere Kubernetes Service

Issue/Introduction

There are two components to gathering logs in vSphere with Tanzu:

1. Workload Management Logs: Contains logs for the Supervisor Control Plane VMs/Supervisor Cluster.

2. Guest Cluster (TKG) Logs: Contains all of the logs for a specific guest cluster.

Environment

VMware vSphere 8.0 with Tanzu
VMware vSphere 7.0 with Tanzu

Resolution

Gather Workload Management Support Bundle
Workload Management support bundles can be retrieved by logging into the VC UI and selecting Menu -> Workload Management -> Clusters -> Export Logs, with the appropriate cluster selected.
- This works even if the cluster is stuck in a removing, configuring, or updating state.
- This includes a vCenter log bundle.
- This does not include ESXi logs. If the issue pertains to vSphere Pods or to the Guest Cluster VMs themselves, customers should additionally gather ESXi logs and upload them to their support ticket.

 

- If the log bundle from the GUI does not work, you can gather the logs manually from the command line. Follow this kb to SSH into each Supervisor Control Plane VM, then run the following command to gather logs for ONLY the Supervisor Control Plane VM the command was run on. You will then need to manually scp the files off of the machine.

root@42184a1e6d3c54eff2384b2736cf2079 [ /usr/bin ]# wcp-agent-support.sh
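The generated bundle can then be copied off with scp along these lines (a sketch only; the VM IP and the bundle path/filename reported by wcp-agent-support.sh are placeholders to substitute):

```shell
# Run from your jumpbox; substitute the Supervisor Control Plane VM IP and
# the bundle path printed by wcp-agent-support.sh.
scp root@<supervisor-cp-vm-ip>:<path-to-bundle>.tar.gz .
```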



Gather Guest Cluster (VKS) Support Bundle

This bundle is gathered via a CLI tool attached to this kb. It is supported only on macOS and Linux jumpboxes.
 

Prerequisites:

1. A Linux or macOS jumpbox to run the tool from. If you are a Windows-only shop, you can use the vCenter or a Supervisor Control Plane VM as your jumpbox to run the bundler from.

Note: To run kubectl commands on vCenter, you can pull kubectl from the Supervisor Cluster by running this command on vCenter as root:
# curl -k https://$(/usr/lib/vmware-wcp/decryptK8Pwd.py | grep IP -m 1 | awk '{print $2}')/wcp/plugin/linux-amd64/vsphere-plugin.zip -o /tmp/vsphere-plugin.zip && unzip -d /usr /tmp/vsphere-plugin.zip


2. The Supervisor Cluster kubeconfig file must be present on the system from which the vks-support-bundler command will be run. It can either be copied from another system or generated by running the kubectl vsphere login command.


3. Your current Kubernetes context must be set to the supervisor cluster.
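The current context can be confirmed with standard kubectl commands (a sketch; the context name is environment-specific, typically the Supervisor Cluster IP or FQDN used at login):

```shell
# Show which context kubectl is currently using
kubectl config current-context

# If it is not the supervisor cluster, switch to it
kubectl config use-context <supervisor-context-name>
```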

4. When the user chooses the guestops channel to gather logs, the user must be a member of the Administrator group, as generating the support bundle requires permissions to create users and roles, and to add users to the ServiceProvider group.

5. For clusters with Windows nodes: if logs are collected via the ssh channel, there is no need to prepare an admin username and password beforehand. If logs are collected via the guestops channel, an admin username and password must be prepared before log collection. There are two ways to prepare them:

    Added before BYOI (Bring Your Own Image): This is the only feasible approach if the customer wants to collect logs when the node network does not work.
    Added through ssh:
        Use the script named set_windows_adminuser.sh from the attached files. (e.g.: ./set_windows_adminuser.sh {cluster-name} {cluster-namespace} {windows-admin-user})
            The environment must have a kubeconfig file with admin permissions that allows access to the Supervisor Cluster.
            This script should be executed on a machine that has access to the same subnet as the guest cluster.
            This script will SSH into the VMs to add a new admin username and password.

Gather Windows logs via the guestops channel -
The section below is specific to the case of guestops + Windows cluster + Windows user created in advance + VPC environment.

  • Deploy a PodVM in the cluster namespace
    Note:
    Replace <cluster-namespace> and <clustername> with the actual cluster namespace and cluster name.
    If the current PodVM image is not pullable, update it to a customer-pullable photon:3.0 image.

    podvm.yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: vks-support-bundler-sa
      namespace: <cluster-namespace>
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: vks-support-bundler-rolebinding
      namespace: <cluster-namespace>
    subjects:
    - kind: ServiceAccount
      name: vks-support-bundler-sa
      namespace: <cluster-namespace>
    roleRef:
      kind: ClusterRole
      name: view
      apiGroup: rbac.authorization.k8s.io
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: vks-support-bundler-secret-role
      namespace: <cluster-namespace>
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      resourceNames: ["<clustername>-ssh"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: vks-support-bundler-secret-rolebinding
      namespace: <cluster-namespace>
    subjects:
    - kind: ServiceAccount
      name: vks-support-bundler-sa
      namespace: <cluster-namespace>
    roleRef:
      kind: Role
      name: vks-support-bundler-secret-role
      apiGroup: rbac.authorization.k8s.io
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: podvm-1
      namespace: <cluster-namespace>
    spec:
      serviceAccountName: vks-support-bundler-sa
      containers:
      - image: photon:3.0
        name: vpc-traffic-podvm-1
        securityContext:
          runAsUser: 0
        command: [ "/bin/bash", "-c", "--" ]
        args:
          - |
            rm -f /etc/yum.repos.d/photon-updates.repo /etc/yum.repos.d/photon-extras.repo
            yum install -y jq openssh-server
            curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
            chmod +x kubectl
            mv kubectl /usr/bin/kubectl
            while true; do sleep 30; done
    • then, apply the yaml file above 
      kubectl apply -f podvm.yaml
  • Run set_windows_adminuser.sh in PodVM
    • Copy set_windows_adminuser.sh script into podvm
      kubectl cp set_windows_adminuser.sh -n <cluster-namespace> podvm-1:/tmp/set_windows_adminuser.sh
    • kubectl exec into podvm:
      kubectl exec -it podvm-1 -n <cluster-namespace> -- /bin/bash
    • Run set_windows_adminuser.sh in podVM
      chmod +x /tmp/set_windows_adminuser.sh
      /tmp/set_windows_adminuser.sh <cluster-name> <cluster-namespace> <windows-admin-user>
  • Clean up resources
    kubectl delete -f podvm.yaml
  • Run vks-support-bundler cli
    After the Windows username and password are set successfully, you can run the vks-support-bundler CLI:
    vks-support-bundler create \
    -k <supervisor-kubeconfig> \
    -o <output-dir> \
    -c <cluster-name> \
    -n <cluster-namespace> \
    -v <vc-ip> \
    -w <windows-admin-username> \
    -u <vc-username> \
    -i


6. When choosing the ssh channel to collect logs, users must run the VKS Support Bundler CLI in the same subnet as the guest cluster, and the nodes must be ping-reachable. In a VPC environment, nodes cannot be SSHed into directly even from the same subnet as the cluster; users must run vks-support-bundler from a PodVM. For an example PodVM file, see the Additional Information section below.

Flags available for vks-support-bundler:

./vks-support-bundler help create
Create a Kubernetes cluster support bundle

Usage:
  vks-support-bundler create [flags]

Flags:
  -b, --batch-size int                   Number of nodes on which parallel collection is triggered (default 5)
      --ca-certificate string            Path to the endpoint public certificate file
      --channel string                   Communication protocol to use for support bundle collection (guestops, ssh) (default "guestops")
  -c, --cluster string                   Kubernetes cluster to collect support-bundle for
      --config string                    Path to the YAML config file
      --controlplane-node-only           To collect support bundle only from control plane nodes
  -h, --help                             help for create
  -i, --insecure                         Creates an insecure connection to the VC
  -k, --kubeconfig string                Absolute path to the kubeconfig (default "/home/<username>/.kube/config")
  -l, --log-ns string                    Comma separated namespaces list whose logs should be included
  -n, --namespace string                 Supervisor Cluster namespace where the Kubernetes cluster resides
  -s, --node-stats                       To include the node stats in the support bundle
  -o, --output string                    Absolute path to the directory where the support-bundle will be stored, e.g. /home/myuser/mybundle
  -p, --progress-bar                     To show a progress bar for support-bundle collection per node
  -t, --resource-types string            Comma separated list of Kubernetes resource types (e.g. pvc,pv)
      --skip-create-user                 Use the provided user to run GuestOps without creating a temporary user
  -u, --user string                      VC User name
  -v, --vc string                        VC IP or FQDN with optional port (default: 443 for HTTPS).
  -V, --verbose                          Collect additional logs to help debug support-bundle collection failures and print to stderr
  -w, --windows-admin-username string    Windows vm admin username
  -e, --windows-event-hours-ago string   Specify the collection of windows events within a few hours (default "12")


Guestops channel - 

Required flags:

-c, --cluster string Kubernetes cluster to collect support-bundle for
-n, --namespace string Supervisor Cluster namespace where the Kubernetes cluster resides
-o, --output string Absolute path to the directory where the support-bundle will be stored, e.g. /home/myuser/mybundle
-u, --user string VC User name
-v, --vc string VC IP or FQDN with optional port (default: 443 for HTTPS)
-w, --windows-admin-username string Windows vm admin username  ## This should only be used in a cluster that has Windows nodes.  


Example for default (guestops) support bundle where
- .kube/config file lives under ~/.kube/config and has its context set to the supervisor cluster
- 192.0.2.15 is the vCenter ip address
- Admin user is Administrator and the VMware SSO domain is vsphere.local 
- Guest cluster name is guestcluster01
- Supervisor Cluster Namespace where the Guest Cluster lives is supcluster01
- Output of the log bundle would be the user's home directory which is ~/

./vks-support-bundler create -k ~/.kube/config -v 192.0.2.15 -u administrator@vsphere.local -c guestcluster01 -n supcluster01 -o ~/ -i -p

 

SSH channel -

Required flags:

-c, --cluster string Kubernetes cluster to collect support-bundle for
-n, --namespace string Supervisor Cluster namespace where the Kubernetes cluster resides
-o, --output string Absolute path to the directory where the support-bundle will be stored, e.g. /home/myuser/mybundle

Example for support bundle collection via ssh channel
- .kube/config file lives under ~/.kube/config and has its context set to the supervisor cluster
- ssh represents support bundle collection via ssh channel 
- Guest cluster name is guestcluster01
- Supervisor Cluster Namespace where the Guest Cluster lives is supcluster01
- Output of the log bundle would be the user's home directory which is ~/

./vks-support-bundler create -k ~/.kube/config -c guestcluster01 -n supcluster01 --channel ssh -o ~/


If a service account named "vks-support-bundler-user-{cluster-name}-{cluster-namespace}" already exists, or permissions associated with it do (role name: "vks-support-bundler-guestops-role-{cluster-name}-{cluster-namespace}"), the log bundle will fail. Users must therefore ensure that this service account and the related role are cleaned up to enable log collection.
There are two methods to delete them:
1. Automatic Deletion: After the binary runs, the system will prompt for automatic deletion.
2. Manual Deletion:

To delete a role, navigate through the VC UI to Administration -> Roles, find the specific role, and then click the delete button.
To delete a user account, navigate through the VC UI to Administration -> Single Sign-On -> Users and Groups, find the specific user account, and then click the delete button.


Note:

  • When the log bundler fails, it will generate a .log file in the output directory with more details on why it failed.
  • If the bundler finishes very quickly and only produces a small log tar file, it's likely that the vmware-system-user account is expired.
    Follow this kb to resolve this issue: https://knowledge.broadcom.com/external/article?legacyId=90469  
  • If automatic deletion of service accounts fails, users should manually clean up both the role and the user account from vCenter to ensure proper cleanup.
  • If users want to collect additional namespaces and resources types, they need to specify the following two flags:
    • --resource-types:
      Allows users to specify additional Kubernetes resource types (e.g., secret, volume attachment) to collect, which are not gathered by default.
      Example: --resource-types=secret,volumeattachment
    • --log-ns:
      Specifies additional Kubernetes namespaces from which detailed information should be collected.
      Useful when combined with --kubectl-commands.
      Example: --log-ns=local-path-storage,default
  • The domain of the -u parameter is required to create the GuestOps user. It must therefore be provided in the format <username>@<domain>. If it is not in this format, an error will be reported.
    • The domain part should match the vCenter admin domain, which can be found in VC → Administration → Single Sign-On → Users.
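The <username>@<domain> requirement can be checked up front with plain shell parameter expansion (a minimal sketch; the vcadmin variable and sample value are illustrative, not part of the tool):

```shell
# Illustrative only: validate the value you plan to pass to -u and show the
# domain the bundler would derive from it.
vcadmin="administrator@vsphere.local"

if [ "${vcadmin#*@}" = "$vcadmin" ]; then
  # No '@' present: vks-support-bundler would reject this value.
  echo "ERROR: -u must be in <username>@<domain> format" >&2
  exit 1
fi

echo "username=${vcadmin%%@*}"   # part before the first '@'
echo "domain=${vcadmin#*@}"      # part after the first '@'
```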

Additional Information

Version 3.6.0 of vks-support-bundler has the following changes:

Flag change:

  • Added --channel flag to allow collecting logs via guestops or ssh channel

Some improvement:

  • Support for log collection via the ssh channel, which doesn't require the VC password when running vks-support-bundler

Added extra data:

Node status:

pvc info

sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get pvc --chunk-size=10 -A -o wide -v 9 &> kubectl-pvc.out

pv info

sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get pv --chunk-size=10 -A -o wide -v 9 &> kubectl-pv.out

NFS network statistics

sar -n NFS 1 5 > sarnfs1-5.out

system mount information

cat /etc/fstab > etc-fstab.out

all mounted file systems

findmnt >/dev/null 2>&1 && findmnt > findmnt.out 

mount statistics

cat /proc/self/mountstats > mountstats.out 2>&1

IO stats per device

iostat -N 1 5 > iostat-N.out

NFS statistics

nfsstat 1 5 > nfsstat.out 

 


Tuned log

sudo journalctl -xeu tuned &> journalctl-tuned.out

Tuned log

sudo cat /var/log/tuned/tuned.log &> tuned-log.out

Guestinfo metadata

vmtoolsd --cmd "info-get guestinfo.metadata" | base64 -d | gunzip > guestinfo-metadata.out

K8s objects 

ippools 

sudo kubectl get ippools --chunk-size=500 -o yaml --kubeconfig ${KUBECONFIG} >> ippools.yaml 

network-attachment-definition

sudo kubectl get network-attachment-definitions -A --chunk-size=500 -o yaml --kubeconfig ${KUBECONFIG} >> network-attachment-definitions-all-namespaces.yaml

 

Example VPC PodVM -

Note:

If the current PodVM image is not pullable, please update it to a customer-pullable photon:3.0 image.
The memory/cpu/storage values under resources are examples; they work for collecting a support bundle from a cluster with 3 control plane nodes and 150 worker nodes.

apiVersion: v1
kind: Pod
metadata:
  name: vks-support-bundler-podvm
  namespace: <cluster-namespace>
spec:
  containers:
    - image: "photon:3.0"
      name: vpc-traffic-podvm-1
      securityContext:
        runAsUser: 0
      resources:
        requests:
          memory: 2Gi
          cpu: 500m
        limits:
          memory: 4Gi
          cpu: 2
      volumeMounts:
        - name: support-bundler-storage
          mountPath: /data
      command: ["/bin/bash", "-c", "--"]
      args:
        - |
          while true; do sleep 30; done

  volumes:
    - name: support-bundler-storage
      persistentVolumeClaim:
        claimName: vks-support-bundler-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vks-support-bundler-pvc
  namespace: <cluster-namespace>
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: <storageclass-name>

 

Run vks-support-bundler in VPC PodVM -
- Copy vks-support-bundler and supervisor kubeconfig into podvm

kubectl cp vks-support-bundler -n <cluster-namespace> vks-support-bundler-podvm:/tmp
kubectl cp <sv-kubeconfig> -n <cluster-namespace> vks-support-bundler-podvm:/tmp

- kubectl exec into podvm:

kubectl exec -it -n <cluster-namespace> vks-support-bundler-podvm -- /bin/sh
chmod +x /tmp/vks-support-bundler
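Once the bundler has been run inside the PodVM (e.g. with -o set to the PVC-backed /data mount from the example Pod spec above), the bundle can be copied back out with kubectl cp (a sketch; the bundle file name is a placeholder):

```shell
# Copy the finished bundle from the PodVM's /data volume to the local machine
kubectl cp <cluster-namespace>/vks-support-bundler-podvm:/data/<bundle-file> ./<bundle-file>
```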

------------------------------------------------------------------------------------------------------------------------------

Version 3.5.0 of vks-support-bundler has the following changes:

Flag change:

  • Changed --kubectl-commands flag to --resource-types
  • Added --ca-certificate flag 
  • Added --skip-create-user flag to optionally skip the user creation step
  • Added --controlplane-node-only flag to allow collecting logs only from control plane nodes

Some improvement:

  • Support bundler output file is now gzip-compressed, reducing file size by 70% to 90%

Added extra data:

  • Added collection of additional system data to help diagnose slow boot issues
  • Added kubeadm-certs-check-expiration output to check certificate expiration times
  • Added collection of system logs related to systemd-networkd for improved network diagnostics

------------------------------------------------------------------------------------------------------------------------------

Version 3.4.0 of vks-support-bundler has the following changes:

New Features - 

  • Added --verbose flag to show more detailed information during the collection process.
  • All requests and responses to/from VC are now dumped under the ~/.vsb directory for debugging purposes.
  • Etcd metrics collection
  • Collection of the number of resources per APIResource
  • Collection of PodDisruptionBudget (PDB) resources in the default namespaces

Bug Fixes - 

  • Fixed the issue where the domain name was incorrectly assumed as vsphere.local when VC FQDN was missing from DNS.
  • Fixed collection failure caused by mismatch between guestops domain and VC domain.
  • Fixed cases where existing guestops users could not be deleted.
  • Fixed failure in collecting etcd status.
  • Fixed issue with collecting dmesg logs.

CLI Flag Changes -

  • Removed:
    • -d, --domain-name string VC Domain name (default "vsphere.local")
      We noticed that the guestops domain must be the same as the VC admin domain (which can be found in VC -> Administration -> Single Sign-On -> Users).
      If they don't match, two problems occur:

      • The guestops user must be created in the same domain as the VC admin user. If the -d domain and the VC admin domain don't match, the guestops user won't be able to log in and perform any actions.
      • When the guestops user is created, permissions are granted to the domain specified by -d. If the user specifies the wrong domain once and then specifies the correct domain again, the guestops user will never be automatically deleted, because the permission domain is incorrect.
    • To simplify this, we expect the -u parameter to include the VC admin username with domain (like username@domain). We then extract the domain from -u and use it to create the guestops user, ensuring both users share the same domain.
      Therefore, we require -u to be followed by <username>@<domain>. If it is not in this format, an error will be reported.
  • Added:
    • -V, --verbose  Collect additional logs to help debug support-bundle collection failures and print detailed info to stderr. When enabled, it shows more detailed information during the collection process.

------------------------------------------------------------------------------------------------------------------------------

Attachments

set_windows_administrator.sh
vks-support-bundler-linux-amd64-3.6.0.tar.gz
vks-support-bundler-windows-amd64-3.6.0.tar.gz
vks-support-bundler-darwin-arm64-3.6.0.tar.gz