Patching VCSA from 8.x is failing with "Installation of one or more containers failed"

Article ID: 344748

Products

VMware vCenter Server 8.0

Issue/Introduction

Patching vCenter Server from 8.x fails with the error "Installation of one or more containers failed", and the patch run records the failure as "Patching containers failed." The following log snippets show the failure signature.

vmon.log:

YYYY-MM-DDT03:55:42.227Z In(05) host-2065 <vc-ws1a-broker-prestart> Constructed command: /storage/containers/vc-ws1a-broker/f6b616b765efe15183f0e49383402ac4cddf90c9cb649a17192637ba38825517/prestart.sh
YYYY-MM-DDT03:55:42.295Z Wa(03) host-2065 <vc-ws1a-broker> Service pre-start command's stderr: umount: /storage/containers/vc-ws1a-broker/f6b616b765efe15183f0e49383402ac4cddf90c9cb649a17192637ba38825517/rootfs: no mount point specified.
YYYY-MM-DDT03:55:42.295Z Wa(03)+ host-2065
YYYY-MM-DDT03:55:42.296Z Wa(03) host-2065 <vc-ws1a-broker> Service pre-start command's stderr: /storage/containers/vc-ws1a-broker/f6b616b765efe15183f0e49383402ac4cddf90c9cb649a17192637ba38825517/prestart.sh: line 4: /storage/containers/vc-ws1a-broker/f6b616b765efe15183f0e49383402ac4cddf90c9cb649a17192637ba38825517/mount.sh: No such file or directory
YYYY-MM-DDT03:55:42.296Z Wa(03)+ host-2065
YYYY-MM-DDT03:55:42.758Z Wa(03) host-2065 <vc-ws1a-broker> Service pre-start command's stderr: YYYY-MM-DD 03:55:42 MainThread INFO Executing service-wrapper for vc-ws1a-broker

update_microservice.log:

time="YYYY-MM-DDT02:47:32.929735984Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="YYYY-MM-DDT02:47:32.929854096Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="YYYY-MM-DDT02:47:32.933280348Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="YYYY-MM-DDT02:47:32.933325767Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.933351504Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="YYYY-MM-DDT02:47:32.936097144Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.936128882Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.936151471Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.936173185Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.936197628Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.936219538Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.936240179Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.936261422Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.936285949Z" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
time="YYYY-MM-DDT02:47:32.937109183Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.937137640Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.937162762Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.937245686Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="YYYY-MM-DDT02:47:32.937278297Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
time="YYYY-MM-DDT02:47:32.937297922Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="YYYY-MM-DDT02:47:32.937328307Z" level=error msg="failed to initialize a tracing processor \"otlp\"" error="no OpenTelemetry endpoint: skip plugin"
time="YYYY-MM-DDT02:47:32.937403788Z" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
time="YYYY-MM-DDT02:47:32.937678427Z" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0} UntrustedWorkloadRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0} Runtimes:map[runc:{Type:io.containerd.runc.v2 Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false IgnoreRdtNotEnabledErrors:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate: IPPreference:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:true SelinuxCategoryRange:1024 SandboxImage:k8s.gcr.io/pause:3.6 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true DeviceOwnershipFromSecurityContext:false IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false EnableUnprivilegedPorts:false EnableUnprivilegedICMP:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
time="YYYY-MM-DDT02:47:32.937786665Z" level=info msg="Connect containerd service"
time="YYYY-MM-DDT02:47:32.937956147Z" level=info msg="Get image filesystem path \"/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs\""
time="YYYY-MM-DDT02:47:32.938085460Z" level=warning msg="Selinux is not supported"
time="YYYY-MM-DDT02:47:32.939966845Z" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
time="YYYY-MM-DDT02:47:32.942669186Z" level=info msg="Start subscribing containerd event"
time="YYYY-MM-DDT02:47:32.942822253Z" level=info msg="Start recovering state"
time="YYYY-MM-DDT02:47:32.942917315Z" level=info msg="Start event monitor"
time="YYYY-MM-DDT02:47:32.942945852Z" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
time="YYYY-MM-DDT02:47:32.942945541Z" level=info msg="Start snapshots syncer"
time="YYYY-MM-DDT02:47:32.943033144Z" level=info msg="Start cni network conf syncer for default"
time="YYYY-MM-DDT02:47:32.943088058Z" level=info msg="Start streaming server"
time="YYYY-MM-DDT02:47:32.943532276Z" level=info msg=serving... address=/run/containerd/containerd.sock
time="YYYY-MM-DDT02:47:32.943564672Z" level=info msg="containerd successfully booted in 0.113195s"
Traceback (most recent call last):
  File "/usr/lib/containerfw/patch_containers.py", line 122, in <module>
    patch(args.stage_dir)
  File "/usr/lib/containerfw/patch_containers.py", line 82, in patch
    commit(container_list['all_patch_containers'])
  File "/usr/lib/containerfw/patch_containers.py", line 53, in commit
    remove_container(key)
  File "/usr/lib/containerfw/container_util/helper.py", line 134, in remove_container
    run_command(['rm', '-R', container_dir])
  File "/usr/lib/containerfw/container_util/helper.py", line 38, in run_command
    raise Exception('Command failed: stdout %r stderr %r' % (text, err))
Exception: Command failed: stdout b'' stderr b"rm: cannot remove '/storage/containers/ws1-init-container/023cbfc77a34f45a2fc497972d4ed20c97d8044333d34ba56b0164cbd5f92923/rootfs': Device or resource busy\n"
time="YYYY-MM-DDT02:50:34.535516263Z" level=info msg="Stop CRI service"
time="YYYY-MM-DDT02:50:34.552831577Z" level=info msg="Stop CRI service"
time="YYYY-MM-DDT02:50:34.552963745Z" level=info msg="Event monitor stopped"
time="YYYY-MM-DDT02:50:34.553017816Z" level=info msg="Stream server stopped"

YYYY-MM-DD 02:50:34,626 - 22409 - dbfunctions_target:: executeDML: 56 - DEBUG - Executing Query {UPDATE install_progress SET Subphase = ?, end_time = DATETIME('now'), status = ? WHERE Phase = ? } with parameters ('Patching containers failed.', 'failed', 'Installing Containers')

 

Environment

VMware vSphere 8.x

Cause

During the commit phase of patch_containers.py, containers that are no longer required are cleaned up, and removal of ws1-init-container fails because its rootfs is reported as busy (in use). ws1-init-container is a short-lived container used during pre-start and should exit once pre-start completes. The vmon log suggests the container filesystem was already unmounted; the patching code expects the mount point to be free so it can unmount and remove the directory safely, but some process is still actively using it.
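To confirm this state on an affected appliance, it can help to check whether the container rootfs is still mounted and which processes hold files open beneath it. The sketch below is illustrative only (rootfs_busy is not a VMware tool) and assumes a Linux shell on the VCSA:

```shell
# Illustrative helper (not part of the product): report whether a directory
# is still a mount point and which local processes have open file
# descriptors that resolve underneath it.
rootfs_busy() {
    dir="$1"
    # mountpoint(1) is part of util-linux.
    if mountpoint -q "$dir"; then
        echo "still mounted: $dir"
    fi
    # Scan /proc for file descriptors pointing under the directory.
    for pid in /proc/[0-9]*; do
        if ls -l "$pid/fd" 2>/dev/null | grep -q "$dir"; then
            echo "in use by PID ${pid#/proc/}"
        fi
    done
}

# Example, using the path from the traceback above:
# rootfs_busy /storage/containers/ws1-init-container/023cbfc77a34f45a2fc497972d4ed20c97d8044333d34ba56b0164cbd5f92923/rootfs
```

Any process reported here is what prevents the `rm -R` in remove_container from succeeding.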

Resolution

This issue is fixed in vCenter Server 8.0 U3g.


Workaround:

1. Revert to the snapshot taken before patching.
2. Delete the container folders below, as referenced in the logs above, and re-run the patching.

/storage/containers/vc-ws1a-broker/f6b616b765efe15183f0e49383402ac4cddf90c9cb649a17192637ba38825517
/storage/containers/ws1-init-container/65e6059dac4e8fa14fa066a0cb7bf5e26839c6b0f85cd7e4a87243bece54dbc8
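The deletion step can be sketched as a small shell helper. This is an illustrative sketch only (cleanup_container_dir is not a product command) and assumes root shell access to the VCSA after reverting to the pre-patch snapshot; substitute the container-hash directories seen in your own logs:

```shell
# Illustrative cleanup (not a VMware tool): unmount a container's rootfs if
# it is still mounted, then remove the container directory so patching can
# recreate it cleanly.
cleanup_container_dir() {
    dir="$1"
    # rm fails with "Device or resource busy" while rootfs is mounted,
    # so unmount it first.
    if mountpoint -q "$dir/rootfs" 2>/dev/null; then
        umount "$dir/rootfs"
    fi
    rm -rf "$dir"
}

# Paths from the logs above (hashes will differ in your environment):
# cleanup_container_dir /storage/containers/vc-ws1a-broker/f6b616b765efe15183f0e49383402ac4cddf90c9cb649a17192637ba38825517
# cleanup_container_dir /storage/containers/ws1-init-container/65e6059dac4e8fa14fa066a0cb7bf5e26839c6b0f85cd7e4a87243bece54dbc8
```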