How to detach a disk from an unresponsive VM to collect logs



Article ID: 401456



Products

VMware Tanzu Application Service, Operations Manager

Issue/Introduction

When a VM is in the "unresponsive agent" state, the BOSH Director cannot communicate with the BOSH agent running on it. As a result, BOSH commands that target the VM, such as bosh ssh and bosh logs, do not work, and you cannot use them to reach the VM or collect its logs.

If the BOSH Resurrector is enabled, it usually recreates such VMs, and all logs on the original VM are lost when it is deleted.
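To find the affected VMs and the vSphere VM name to look for in step 1, you can filter the output of bosh -d <deployment> vms. This is a sketch under assumptions: it expects the default table output, in which the Process State column reads "unresponsive agent", and it assumes that VM CIDs on vSphere start with vm-. The function name list_unresponsive is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<instance> <vm_cid>" for every VM that BOSH
# reports as "unresponsive agent". Reads `bosh vms` table output on stdin.
# Assumption: vSphere VM CIDs start with "vm-"; cid stays empty otherwise.
list_unresponsive() {
  awk '/unresponsive agent/ {
    cid = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^vm-/) { cid = $i; break }
    }
    print $1, cid
  }'
}

# Usage (replace <deployment> with your deployment name):
#   bosh -d <deployment> vms | list_unresponsive
```

The VM CID printed in the second column is the name to search for in the vSphere inventory.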

Environment

BOSH VMs in the unresponsive agent state, running on vSphere.

Cause

VMs typically go into the unresponsive agent state when they lose network connectivity, run out of resources such as CPU or memory, or have a duplicate IP address conflict.

Resolution

Use the following procedure to capture logs from a VM in the unresponsive agent state.

Note: If the BOSH Resurrector is enabled, the VM is recreated very quickly. To prevent this, rename the VM in vSphere; BOSH can then no longer find the VM to delete it.

Alternatively, turn off the Resurrector with the bosh update-resurrection off command, and turn it back on with bosh update-resurrection on after the logs have been collected.
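Both options in the note above can be scripted so that they run as fast as possible. This is a sketch under assumptions: the rename option uses govc, the vSphere CLI from the govmomi project, which is not part of the original procedure and must already be installed and configured (for example through GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD); check the syntax with govc object.rename -h before relying on it. The function names and the DRY_RUN switch are hypothetical, and by default the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Commands are only printed unless DRY_RUN=0 is set, so they can be reviewed first.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Option 1 (assumes govc is installed and configured): rename the VM so the
# BOSH Director can no longer find it by its CID.
hide_vm_from_bosh() {
  local vm_path="$1"   # inventory path of the VM, e.g. /<datacenter>/vm/<folder>/vm-<guid>
  local new_name="$2"  # any name that BOSH does not know about
  run govc object.rename "$vm_path" "$new_name"
}

# Option 2: turn resurrection off with the BOSH CLI, and back on afterwards.
disable_resurrection() { run bosh update-resurrection off; }
enable_resurrection()  { run bosh update-resurrection on; }

# Usage:
#   DRY_RUN=0 hide_vm_from_bosh "/<datacenter>/vm/<folder>/vm-<guid>" "vm-<guid>-logs"
#   DRY_RUN=0 disable_resurrection
```

Printing by default trades a little speed for a chance to check the inventory path before renaming the wrong VM.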


How to detach the disk and reattach it to another VM


1. In vSphere, find the VM whose disk you need. If BOSH reports it in the unresponsive agent state, rename it immediately (Actions → Rename), before the BOSH Resurrector starts recreating it; this has to be done very quickly. After the rename, the BOSH Director can no longer find the VM by its GUID-based name, so it cannot delete and recreate it.

2. Power off the machine in vCenter. 

3. On the powered-off VM, click Actions → Clone → Clone to Virtual Machine, and follow the wizard to create a clone.

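4. When the clone has been created, detach the disk from the clone VM: click Actions → Edit Settings and find the Hard disk entries. The logs are on the VM's ephemeral disk, which on BOSH VMs holds /var/vcap/data. Choose the disk you want to detach, click the three-dot menu on its right, and select Remove Device. Leave the "Delete files from datastore" option unchecked so that the disk file is kept, and record the disk file name for the next step.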

5. Choose a Linux-based test VM and attach the disk to it (Actions → Edit Settings → Add New Device → Existing Hard Disk, then select the disk file you recorded). Only one powered-on VM can use this disk at a time, so make sure the original and cloned VMs remain powered off.

6. SSH to the test VM and run sudo fdisk -l to confirm that the disk is recognized. In the example output below, the newly attached disk is /dev/sdd:

testvm/e79dfff2-5890-48f0-a1f3-3d39db75dc41:~# sudo fdisk -l

Disk /dev/sda: 5 GiB, 5368709120 bytes, 10485760 sectors
Disk model: Virtual disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xec6be399

Device     Boot  Start      End  Sectors Size Id Type
/dev/sda1         2048   100351    98304  48M ef EFI (FAT-12/16/32)
/dev/sda2       102400 10485759 10383360   5G 83 Linux


Disk /dev/sdb: 8 GiB, 8589934592 bytes, 16777216 sectors
Disk model: Virtual disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 6160CF93-6375-4FD4-9E99-CBACF4D634EA

Device       Start      End  Sectors  Size Type
/dev/sdb1     2048  1959935  1957888  956M Linux filesystem
/dev/sdb2  1959936 16775167 14815232  7.1G Linux filesystem


Disk /dev/sdd: 128 GiB, 137438953472 bytes, 268435456 sectors → this is the new disk
Disk model: Virtual disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: AA65390C-6E7D-4879-93C9-1DC1936C08CD

Device        Start       End   Sectors   Size Type
/dev/sdd1      2048  32745471  32743424  15.6G Linux filesystem
/dev/sdd2  32745472 268433407 235687936 112.4G Linux filesystem


Disk /dev/sdc: 102 MiB, 106954752 bytes, 208896 sectors
Disk model: Virtual disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 02A81C25-9176-4167-88E0-3CA00CCE965B

Device     Start    End Sectors  Size Type
/dev/sdc1   2048 206847  204800  100M Linux filesystem

7. Run lsblk -f /dev/sdd to list the partitions and file system types on the new disk, then mount the ext4 partition on the test VM with sudo mount /dev/<partition_name> <mount_directory> to access its file system. Replace <partition_name> with the partition name from the lsblk output (sdd2 in the example below, where sdd1 is swap) and <mount_directory> with the directory to mount into; create it first with sudo mkdir -p <mount_directory> if it does not exist:

testvm/e79dfff2-5890-48f0-a1f3-3d39db75dc41:~# lsblk -f /dev/sdd
NAME   FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdd                                                                           
├─sdd1 swap   1           a5904f89-3a9f-4b3c-880e-2ccf9128628b                
└─sdd2 ext4   1.0         67a7b81b-8f10-4cac-be7f-af8c6f6f033e                

testvm/e79dfff2-5890-48f0-a1f3-3d39db75dc41:~# sudo mount /dev/sdd2 /test

8. Change to the mount directory and confirm that the files can be accessed:

cd /test 
ls -l 

9. Archive the directories you need so that you can copy them off the test VM. The following two directories, located under the mount directory, are of interest:

  • sys (/test/sys in this example): logs of the TAS VM processes
  • root_log (/test/root_log in this example): VM kernel and syslog logs

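Run the following command to archive them. The archive is written to your home directory rather than onto the mounted disk, and -C stores the paths relative to the mount directory: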

sudo tar -czvf ~/my-archive.tar.gz -C /test sys root_log


10. Copy the created archive (my-archive.tar.gz) from the test VM to your workstation, for example with scp.
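The commands run on the test VM in the steps above can be collected into one small script. This is a sketch under assumptions: the helper names and the DRY_RUN switch are hypothetical, the partition is picked as the first ext4 partition reported by lsblk, and the file system is mounted read-only (-o ro), which the original procedure does not require. By default the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Print the first ext4 partition of a disk. Reads the raw output of
# `lsblk -rno NAME,FSTYPE <disk>` on stdin (one "NAME FSTYPE" pair per line).
find_ext4_partition() {
  awk '$2 == "ext4" { print "/dev/" $1; exit }'
}

# Create the mount point and mount the partition read-only, so that no new
# files are written to it (ext4 may still replay its journal on mount).
mount_readonly() {
  local part="$1" dir="$2"
  run sudo mkdir -p "$dir"
  run sudo mount -o ro "$part" "$dir"
}

# Archive the two log directories; -C keeps the paths relative (sys/, root_log/).
archive_logs() {
  local dir="$1" archive="$2"
  run sudo tar -czvf "$archive" -C "$dir" sys root_log
}

# Example on the test VM (the new disk is /dev/sdd in the fdisk output above):
#   part=$(lsblk -rno NAME,FSTYPE /dev/sdd | find_ext4_partition)
#   DRY_RUN=0 mount_readonly "$part" /test
#   DRY_RUN=0 archive_logs /test ~/my-archive.tar.gz
# Then, from your workstation (user and address are placeholders):
#   scp <user>@<test-vm-ip>:my-archive.tar.gz .
```

Writing the archive to the home directory keeps it off the attached disk, which is mounted read-only in this sketch.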