Antrea OVS Container crashes on vSphere with Tanzu TKR versions 1.24.11 through 1.26.12

Article ID: 323460

Products

VMware vCenter Server
vSphere with Tanzu

Issue/Introduction

Symptoms:
Workload clusters that use the Antrea CNI with TKR versions 1.24.11 through 1.26.12 experience a crash of the antrea-ovs container approximately once every 24 hours.

The antrea-ovs container logs look similar to the following:
 
[YYYY-MM-DDTHH:MM:SS] stdout F [YYYY-MM-DDTHH:MM:SS INFO antrea-ovs]: Starting ovsdb-server
[YYYY-MM-DDTHH:MM:SS] stdout F Starting ovsdb-server.
[YYYY-MM-DDTHH:MM:SS] stdout F Configuring Open vSwitch system IDs.
[YYYY-MM-DDTHH:MM:SS] stderr F /usr/share/openvswitch/scripts/ovs-ctl: line 46: hostname: command not found
[YYYY-MM-DDTHH:MM:SS] stdout F Enabling remote OVSDB managers.
[YYYY-MM-DDTHH:MM:SS] stdout F [YYYY-MM-DDTHH:MM:SS INFO antrea-ovs]: Started ovsdb-server
[YYYY-MM-DDTHH:MM:SS] stdout F [YYYY-MM-DDTHH:MM:SS INFO antrea-ovs]: Starting ovs-vswitchd
[YYYY-MM-DDTHH:MM:SS] stdout F [YYYY-MM-DDTHH:MM:SS INFO antrea-ovs]: ovs-vswitchd set hw-offload to false
[YYYY-MM-DDTHH:MM:SS] stdout F Starting ovs-vswitchd.
[YYYY-MM-DDTHH:MM:SS] stderr F /usr/share/openvswitch/scripts/ovs-ctl: line 46: hostname: command not found
[YYYY-MM-DDTHH:MM:SS] stdout F Enabling remote OVSDB managers.
[YYYY-MM-DDTHH:MM:SS] stdout F [YYYY-MM-DDTHH:MM:SS INFO antrea-ovs]: Started ovs-vswitchd
[YYYY-MM-DDTHH:MM:SS] stdout F [YYYY-MM-DDTHH:MM:SS INFO antrea-ovs]: Started the loop that checks OVS status every 30 seconds
[YYYY-MM-DDTHH:MM:SS] stderr F error: failed to compress log /var/log/openvswitch/ovs-vswitchd.log.1
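
To confirm that a saved antrea-ovs log shows this symptom, the final stderr line of the excerpt above can be used as a signature. The following is a minimal sketch, not a supported tool: the function name find_compress_failures is hypothetical, and how the log text is collected is left open.

```python
import re
from typing import List

# Signature taken from the last stderr line of the log excerpt above.
COMPRESS_FAILURE = re.compile(r"error: failed to compress log (\S+)")


def find_compress_failures(log_text: str) -> List[str]:
    """Return the paths of log files whose compression failed during rotation."""
    return COMPRESS_FAILURE.findall(log_text)


if __name__ == "__main__":
    sample = (
        "[YYYY-MM-DDTHH:MM:SS] stdout F Starting ovs-vswitchd.\n"
        "[YYYY-MM-DDTHH:MM:SS] stderr F error: failed to compress log "
        "/var/log/openvswitch/ovs-vswitchd.log.1\n"
    )
    print(find_compress_failures(sample))
```

An empty result means the signature was not found in the supplied text. It does not rule out the issue if the log of the crashed container instance was not captured.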
 

Environment

VMware vCenter Server 7.x
VMware vCenter Server 8.x

Cause

This is a known issue in the Antrea release included in the affected TKR: antrea:v1.7.2_vmware.3. Log rotation in the antrea-ovs container is configured to rotate and compress the logs daily, but the image does not ship with gzip. The compression step therefore fails with the "failed to compress log" error, and the container crashes roughly once every 24 hours.
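
Because the crash is tied to daily log rotation, the restarts should be spaced about 24 hours apart. The sketch below checks a list of restart timestamps for that cadence. The function name restarts_look_daily, the one-hour tolerance, and the sample timestamps are all assumptions made for illustration. They do not come from Antrea or from this article.

```python
from datetime import datetime, timedelta
from typing import List


def restarts_look_daily(
    restart_times: List[datetime],
    tolerance: timedelta = timedelta(hours=1),  # assumed tolerance, adjust as needed
) -> bool:
    """Return True if every gap between consecutive restarts is within
    `tolerance` of 24 hours."""
    if len(restart_times) < 2:
        return False  # not enough data points to judge a cadence
    ordered = sorted(restart_times)
    day = timedelta(hours=24)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return all(abs(gap - day) <= tolerance for gap in gaps)


if __name__ == "__main__":
    # Hypothetical restart times, for illustration only.
    observed = [
        datetime(2024, 1, 1, 3, 0),
        datetime(2024, 1, 2, 3, 5),
        datetime(2024, 1, 3, 2, 58),
    ]
    print(restarts_look_daily(observed))
```

Restarts that do not follow a daily pattern point to a different cause and should be investigated separately.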

Resolution


This issue is resolved in TKR version 1.27.6, which is included in vCenter Server 7.0 Update 3P and later.

If vCenter Server 8.x is in use, upgrade to vCenter Server 8.0 Update 2c or later, which supports TKR 1.27.6.
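
The version boundaries stated in this article (affected from 1.24.11 through 1.26.12, resolved in 1.27.6) can be encoded in a small check. This is a minimal sketch: the names parse_version and classify_tkr are hypothetical, and it assumes the TKR version string starts with a major.minor.patch number, optionally prefixed with "v" and followed by a "+" or "-" suffix. Versions the article does not mention are reported as "unknown".

```python
from typing import Tuple

# Boundaries as stated in this article.
AFFECTED_MIN = (1, 24, 11)
AFFECTED_MAX = (1, 26, 12)
FIXED_IN = (1, 27, 6)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse strings such as 'v1.26.5' or '1.26.5+vmware.2' into (1, 26, 5)."""
    core = text.lstrip("v").split("+")[0].split("-")[0]
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    return major, minor, patch


def classify_tkr(text: str) -> str:
    """Return 'affected', 'fixed', or 'unknown' for a TKR version string."""
    version = parse_version(text)
    if AFFECTED_MIN <= version <= AFFECTED_MAX:
        return "affected"
    if version >= FIXED_IN:
        return "fixed"
    return "unknown"  # the article makes no statement about this version


if __name__ == "__main__":
    for candidate in ("1.24.11", "v1.26.12", "1.27.6", "1.23.8"):
        print(candidate, classify_tkr(candidate))
```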

For the upgrade path, see the related article: TKr 1.27.6 for vSphere 7.x - Upgrade path to 8.x

 

Additional Information

Impact/Risks:
The container shows a restart, but no noticeable effect on production systems experiencing this issue has been observed.