Datastore usage and total size of PVs on the datastore showing different utilisation


Article ID: 379915


Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

The datastore space utilisation in vSphere shows that X amount of space is consumed on the datastore.

However, from a Kubernetes perspective, the total size of the persistent volumes on the datastore is Y.

This KB will help identify the source of the discrepancy.

Environment

Tanzu Kubernetes cluster with CSI PVs backed by NFS datastore

Cause

Check backup solution

If there are any backup solutions, confirm they are working as expected and that there are no stale backups.

For Velero backups, check the backup status and logs:

velero backup get

velero backup describe <backup name>

velero backup logs <backup name>
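To spot problem backups quickly in a long list, the STATUS column of `velero backup get` can be filtered with awk. This is a sketch; the sample text below stands in for real `velero backup get` output and the backup names are hypothetical.

```shell
# Flag any Velero backups that are not in Completed state.
# The sample variable below is illustrative output, not from a real cluster.
sample='NAME       STATUS            ERRORS   WARNINGS   CREATED
daily-01   Completed         0        0          2024-01-01
daily-02   PartiallyFailed   3        1          2024-01-02'

# Skip the header row, print name and status of non-Completed backups
echo "$sample" | awk 'NR > 1 && $2 != "Completed" { print $1 ": " $2 }'
```

Against a live cluster, pipe `velero backup get` directly into the same awk filter.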

 

Check for any stale CSI snapshots

kubectl get volumesnapshot -A

kubectl get volumesnapshotcontent -A

 

 

Check disk usage on ESX Host

Identify the datastore usage and UUID. These can be retrieved from the vSphere UI, or with esxcli or esxcfg-info:

esxcli storage filesystem list

Mount Point                      Volume Name   UUID               Mounted  Type  Size            Free
-------------------------------  ------------  -----------------  -------  ----  --------------  --------------
/vmfs/volumes/abcd1234-abcd1234  datastore_1   abcd1234-abcd1234  true     NFS   38482906972160  31134093824000

 

For this datastore, Usage = Size - Free = 38482906972160 - 31134093824000 = 7348813148160 bytes.

Converted to TB: 7348813148160 / (1024 * 1024 * 1024 * 1024) ≈ 6.68 TB
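The arithmetic above can be reproduced on the ESX host shell (or any POSIX shell); the byte values are the ones from the esxcli output in this example.

```shell
# Values from the esxcli storage filesystem list output above
size=38482906972160
free=31134093824000

# Usage in bytes
usage=$((size - free))
echo "Usage: $usage bytes"

# Convert to TB (awk handles the fractional division)
awk -v b="$usage" 'BEGIN { printf "Usage: %.2f TB\n", b / (1024 * 1024 * 1024 * 1024) }'
```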

 

esxcfg-info -a | grep -A15 <UUID>

               |----Volume UUID.....................................abcd1234-abcd1234
               |----Volume Name.....................................datastore_1
               |----LVM Name........................................10.##.##.## /datastore_1
               |----Type............................................NFS
               |----Head Extent.....................................nfs:abcd1234-abcd1234
               |----Console Path..................................../vmfs/volumes/abcd1234-abcd1234
               |----Block Size......................................4096
               |----Total Blocks....................................9395240960
               |----Logical Disk Block Size.........................512
               |----Physical Disk Block Size........................512
               |----isSw512e........................................false
               |----Blocks Used.....................................1793955619
               |----Size............................................38482906972160
               |----Usage...........................................7348042215424

 

 

Check disk usage for datastore directory on ESX Host 

Check disk usage

du -sh /vmfs/volumes/<UUID>/

du -sh /vmfs/volumes/abcd1234-abcd1234/

3.6T    /vmfs/volumes/abcd1234-abcd1234/

 

Check size of all files on datastore

ls -Risla  /vmfs/volumes/abcd1234-abcd1234/ > ls-datastore.txt

grep total ls-datastore.txt | awk '{ sum += $2 } END { print sum }'

This returns 3908174964 KB in this example, which is approximately 3.6 TB.
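The summing pipeline can be sanity-checked on a small sample. The two "total" lines below are illustrative stand-ins for the per-directory totals that `ls -Risla` emits, not output from a real datastore.

```shell
# Minimal demonstration of the summing pipeline on sample `ls -Risla`-style output
cat > /tmp/ls-sample.txt <<'EOF'
/vmfs/volumes/abcd1234-abcd1234/dirA:
total 1024
/vmfs/volumes/abcd1234-abcd1234/dirB:
total 2048
EOF

# Sum the second field of every "total" line
grep total /tmp/ls-sample.txt | awk '{ sum += $2 } END { print sum " KB" }'
```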

 

Check free space 

df -h

Filesystem   Size    Used Available Use% Mounted on

NFS         35.0T    6.7T     28.3T  19% /vmfs/volumes/datastore_1

 

In this example, the actual disk usage on the datastore (3.6 TB from du) and the usage reported by the NFS server (6.7 TB from df) are not consistent.

 

 

Check datastore usage reported by NFS server

On the ESX host, capture a packet trace with tcpdump-uw on the NFS port and on the vmknic through which the NFS server is connected, then run df -h while the capture is running:

tcpdump-uw -i <vmknic> port 2049 -w /vmfs/volumes/df.pcap

 

Analyse the packet capture using the steps below.

Select any directory from the datastore (see ls-datastore.txt above for the full list).

/vmfs/volumes/abcd1234-abcd1234/vm-0079a88c-e317-440a-bf4e-01c28b01b152 is selected in this example

 

Using the directory name from above, identify the file handle hash for the datastore:

tshark -2 -Tfields -e frame.number -e frame.time_relative -e rpc.xid -e nfs.name -e nfs.fh.hash -Y 'rpc.procedure == 3 && rpc.msgtyp == 0' -r df.pcap | grep vm-0079a88c-e317-440a-bf4e-01c28b01b152

45460    22.057884000    0x35164b64    vm-0079a88c-e317-440a-bf4e-01c28b01b152    0xc4045aca

 

The file handle hash for the datastore is 0xc4045aca. Use this file handle hash to find the xid of the FSSTAT call:

tshark -2 -Tfields -e frame.number -e frame.time_relative -e rpc.xid -Y 'rpc.procedure == 18 && rpc.msgtyp == 0 && nfs.fh.hash == 0xc4045aca' -r df.pcap

57824    28.540564000    0x35165c96

The xid is 0x35165c96. Use this to filter the FSSTAT reply:

tshark -2 -Tfields -e frame.number -e frame.time_relative -e rpc.xid -e nfs.fsstat3_resok.tbytes -e nfs.fsstat3_resok.fbytes -Y 'rpc.procedure == 18 && rpc.msgtyp == 1 && rpc.xid == 0x35165c96' -r df.pcap

57825    28.540948000    0x35165c96    38482906972160    31118049619968

Total bytes: 38482906972160 (35 TB)
Free bytes:  31118049619968 (~28.30 TB)
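The TB conversions for the FSSTAT values can be checked the same way as earlier; the byte counts are the tbytes/fbytes fields from the capture in this example.

```shell
# FSSTAT reply values from the packet capture above
tbytes=38482906972160
fbytes=31118049619968

# Convert both to TB for comparison with df -h output
awk -v t="$tbytes" -v f="$fbytes" \
    'BEGIN { printf "Total: %.1f TB, Free: %.2f TB\n", t / 1024^4, f / 1024^4 }'
```

These match the Size and Available columns that df -h reported, confirming that df is simply relaying the NFS server's FSSTAT numbers.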

 

In this example, the NFS server is reporting inaccurate free space; the discrepancy therefore originates on the NFS server side.

Resolution

Once the source of the discrepancy is identified, engage the appropriate team for further assistance.