"opening storage failed: open <device> no space left on device" error when starting a pod on a cluster deployed by Container Service Extension.

Article ID: 325562

Updated On:

Products

VMware Cloud Director

Issue/Introduction

Symptoms:
  • You cannot start a Pod on a Kubernetes cluster deployed by Container Service Extension (CSE).
  • The Pod logs show errors relating to disk usage.
$ kubectl logs <POD_NAME>

ts=2024-01-24T18:51:43.125Z caller=main.go:1166 level=error err="opening storage failed: open <device>: no space left on device"
  • Inspecting the Pod's filesystems shows a full volume (in this example, /dev/sdb mounted on /mnt).
$ kubectl exec <POD_NAME> -- df -h
 
Filesystem      Size  Used Avail Use% Mounted on
overlay          19G   13G  5.3G  71% /
tmpfs            64M     0   64M   0% /dev
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/sda4        19G   13G  5.3G  71% /tmp
/dev/sdb         10G   10G    0M 100% /mnt
shm              64M     0   64M   0% /dev/shm
tmpfs            32G   12K   32G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            16G     0   16G   0% /proc/acpi
tmpfs            16G     0   16G   0% /proc/scsi
tmpfs            16G     0   16G   0% /sys/firmware
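
To pick out the full volume without reading the table by eye, the df output can be filtered. The following is a minimal sketch, assuming the column layout shown above (Use% in column 5, mount point in column 6). The function name full_mounts and the default threshold of 95 are illustrative, not part of the product.

```shell
# Print the mount point of every filesystem at or above a usage threshold.
# Reads `df -h` style output on stdin. Assumes Use% is column 5 and the
# mount point is column 6, as in the sample output above.
full_mounts() {
  threshold="${1:-95}"   # hypothetical default, adjust as needed
  awk -v limit="$threshold" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)             # strip the trailing percent sign
    if (use + 0 >= limit) print $6
  }'
}
```

Usage: kubectl exec <POD_NAME> -- df -h | full_mounts 95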



Environment

VMware Cloud Director 10.x

Cause

This issue occurs because the default size with which PVCs are created may not be adequate for the long-term storage needs of the service.

Resolution

This is a known issue affecting clusters created using Container Service Extension (CSE) up to and including version 4.2, and the Container Storage Interface (CSI) driver up to and including version 1.5.

Because these versions of the CSI driver do not support resizing a PVC while it is attached to a Pod, use the workaround below.

Workaround:
NOTE:
Before proceeding, you will need a KUBECONFIG file to access the affected cluster and VCD credentials for the cluster author.
See Manage Clusters for more information on obtaining a copy of the current kubeconfig.

The high-level process includes four phases:

  1. Shut down the affected Pod.
  2. Increase the size of the volume in VCD.
  3. Use a temporary Pod to resize the filesystem on the volume.
  4. Restart the affected Pod.


Identify which Deployment or StatefulSet controls the affected Pod. You will use this resource to control the affected Pod.
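
One way to find the controlling resource is to follow the Pod's owner references. This is a sketch, assuming the Pod has at least one owner. The helper name owner_of is hypothetical. A Pod created by a Deployment is owned by a ReplicaSet, so the lookup may need to be run a second time against that ReplicaSet.

```shell
# Print "<Kind>/<name>" for the first owner reference of an object.
# $1 is "<type>/<name>", for example pod/<POD_NAME>.
owner_of() {
  kubectl get "$1" \
    -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
}
```

Usage: owner_of pod/<POD_NAME> prints either the StatefulSet or a ReplicaSet. If it prints a ReplicaSet, run owner_of replicaset/<REPLICASET_NAME> to find the Deployment.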
 

Shut down the affected workloads

The Pod must be shut down before making any changes.

If the controlling resource is managed by kapp-controller, the PackageInstall object must be paused or changes to the resource will be automatically overwritten.

$ kctrl package installed list
$ kctrl package installed pause -i <PACKAGE_INSTALL_NAME>
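
Before scaling anything, it may be worth confirming that the pause took effect. The sketch below assumes the PackageInstall custom resource records the pause in .spec.paused. The helper name packageinstall_paused and the namespace argument are illustrative.

```shell
# Succeed only if the PackageInstall reports spec.paused=true.
# $1 = PackageInstall name, $2 = namespace (assumed to be "default" if omitted).
packageinstall_paused() {
  name="$1"
  namespace="${2:-default}"
  paused="$(kubectl get packageinstall "$name" -n "$namespace" \
    -o jsonpath='{.spec.paused}')"
  [ "$paused" = "true" ]
}
```

Usage: packageinstall_paused <PACKAGE_INSTALL_NAME> <NAMESPACE> && echo "paused"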


The Pods for the resource may now be terminated by scaling it down to zero replicas.

$ kubectl get <RESOURCE_TYPE>/<RESOURCE_NAME>
# Record the desired number of replicas for the resource
$ kubectl scale <RESOURCE_TYPE>/<RESOURCE_NAME> --replicas=0
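
Recording the replica count by hand is easy to forget, so the two commands above can be wrapped together. This is a minimal sketch, not a supported tool. The function name scale_down and the state file path are hypothetical.

```shell
# Save the current replica count to a file, then scale the resource to zero.
# $1 = "<RESOURCE_TYPE>/<RESOURCE_NAME>", $2 = file that receives the count.
scale_down() {
  resource="$1"
  state_file="$2"
  # Read the desired replica count before changing it.
  replicas="$(kubectl get "$resource" -o jsonpath='{.spec.replicas}')" || return 1
  printf '%s\n' "$replicas" > "$state_file"
  kubectl scale "$resource" --replicas=0
}
```

Usage: scale_down <RESOURCE_TYPE>/<RESOURCE_NAME> ./replicas.txt, then read ./replicas.txt when scaling back up.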


Use kubectl to retrieve information about the affected PVC.

$ kubectl get pvc -o=custom-columns=NAME:.metadata.name,VOLUME:.spec.volumeName
 
NAME                           VOLUME
data-kafka-controller-0        pvc-4556b190-a4f7-####-####-########47b
data-postgres-postgresql-0     pvc-ab50999f-06ac-####-####-########630
grafana-pvc                    pvc-9736b721-c2c7-####-####-########33a
minio                          pvc-cd616d40-25d7-####-####-########762


Record the NAME and VOLUME of the affected PVC for later steps.
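
When the cluster has many PVCs, the volume for a single claim can be pulled out of the table. The sketch below assumes the two-column NAME/VOLUME layout produced by the command above. The helper name volume_for_pvc is hypothetical.

```shell
# Print the VOLUME for one PVC from the NAME/VOLUME table on stdin.
# Exits non-zero if the PVC name is not present in the table.
volume_for_pvc() {
  awk -v pvc="$1" '
    NR > 1 && $1 == pvc { print $2; found = 1 }
    END { exit !found }
  '
}
```

Usage: kubectl get pvc -o=custom-columns=NAME:.metadata.name,VOLUME:.spec.volumeName | volume_for_pvc <PVC_NAME>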
 

Increase the volume size

Note:
This process may need to be repeated multiple times if there are multiple replicas in the StatefulSet.
Repeat this process for each Named Disk before continuing to the next step.

This process will use the VCD UI to resize the Named Disk associated with the PVC.

  1. Log in to the VCD Tenant UI as the cluster author.
  2. Browse to the Organization VDC hosting the CSE cluster.
  3. Click on Storage -> Named Disks.
  4. Filter the list of the Named Disks using the PVC Volume identified earlier.
  5. Select the disk and click Edit.
  6. Enter a new size for the Named Disk that will satisfy your requirements.
  7. Click Save.
  8. Wait for the associated resize task to finish.
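
The article does not prescribe a new size. As a rough rule of thumb, which is an assumption here and not vendor guidance, you might choose the smallest whole-GiB size that leaves current usage below a target utilization. The helper name suggest_size_gib and the default target of 70 percent are hypothetical.

```shell
# Smallest whole-GiB disk size that keeps current usage at or below
# target_pct percent. $1 = used GiB (integer), $2 = target percent.
suggest_size_gib() {
  used_gib="$1"
  target_pct="${2:-70}"   # assumed default, not from the article
  # Integer ceiling of used_gib * 100 / target_pct.
  echo $(( (used_gib * 100 + target_pct - 1) / target_pct ))
}
```

For the 10G volume shown in the symptoms, suggest_size_gib 10 70 prints 15. Expected data growth should be added on top of this figure.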


The underlying volume for the PVC has now been increased in size, but the filesystem has not been expanded.
 

Resize the filesystem

Note:
This process may need to be repeated multiple times if there are multiple replicas in the StatefulSet.
Repeat this process for each PVC before continuing to the next step.

This process will use a temporary Pod to mount the PVC so you may resize the filesystem to consume the expanded capacity.

$ kubectl run -it --attach --rm reformat --overrides='
{
  "spec": {
    "containers": [
      {
        "name": "reformat",
        "image": "ubuntu:14.04",
        "args": [
          "bash"
        ],
        "stdin": true,
        "stdinOnce": true,
        "tty": true,
        "securityContext": {
          "privileged": true
        },
        "volumeMounts": [{
          "mountPath": "/mnt",
          "name": "data"
        }]
      }
    ],
    "volumes": [{
      "name": "data",
      "persistentVolumeClaim": {
        "claimName": "<PVC_NAME>"
      }
    }]
  }
}
'  --image=ubuntu:14.04


The prompt will pause while the Pod is scheduled and started.
Press Enter a couple of times if you think it is ready but don’t see a command prompt.

Run df /mnt to identify the device associated with the mounted PVC. Record the value of Filesystem.

$ df /mnt
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sdc         5074592 4796064         0 100% /mnt


Run resize2fs against the device recorded above to resize the filesystem so that it consumes the expanded capacity.

$ resize2fs <FILESYSTEM>
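
resize2fs only handles ext2, ext3, and ext4 filesystems, and the article does not state which filesystem the volume uses. A quick check inside the temporary Pod can guard against running it on something else. This is a sketch that assumes df in the container supports the -T flag and prints the type in column 2. The helper name is_resizable_ext is hypothetical.

```shell
# Succeed if the filesystem type in `df -T` output (stdin) is ext2/3/4.
# Assumes a single data line with the type in column 2.
is_resizable_ext() {
  fs_type="$(awk 'NR == 2 { print $2 }')"
  case "$fs_type" in
    ext2|ext3|ext4) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: df -T /mnt | is_resizable_ext && resize2fs <FILESYSTEM>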

Exit the shell. The Pod will be removed.

$ exit


The filesystem on the PVC has now been updated to consume the expanded capacity of the underlying volume.
 

Restart the affected workloads

If the controlling resource is managed by kapp-controller, then you can unpause the PackageInstall.
The package will reconcile and update the resource to the desired number of replicas.

$ kctrl package installed kick -i <PACKAGE_INSTALL_NAME>


Otherwise, use kubectl to scale the resource back to the initial number of replicas.

$ kubectl scale <RESOURCE_TYPE>/<RESOURCE_NAME> --replicas=<COUNT>


Monitor the Pods to confirm that they start.
Restart the troubleshooting process if they continue to fail.
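
For resources scaled by hand, scaling up and waiting for the Pods can be combined. The following is a minimal sketch using kubectl rollout status, which works for Deployments and StatefulSets. The function name restore_replicas and the default timeout of 300s are assumptions.

```shell
# Scale a resource back up and wait for the rollout to complete.
# $1 = "<RESOURCE_TYPE>/<RESOURCE_NAME>", $2 = replica count,
# $3 = timeout (assumed default of 300s).
restore_replicas() {
  resource="$1"
  count="$2"
  timeout="${3:-300s}"
  kubectl scale "$resource" --replicas="$count" || return 1
  # Blocks until the Pods are ready or the timeout expires.
  kubectl rollout status "$resource" --timeout="$timeout"
}
```

Usage: restore_replicas <RESOURCE_TYPE>/<RESOURCE_NAME> <COUNT>. A non-zero exit status means the Pods did not become ready within the timeout.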

Additional Information

Impact/Risks:
Any operation impacting persistent storage should be tested before it is used in production. Improper steps may lead to the loss of production data.