Cluster upgrade returns success but requires a manual reboot of the VM post-upgrade

Article ID: 369955

Products

VMware Telco Cloud Automation

Issue/Introduction

If a syslog daemon is configured in the cluster outside of the systemSettings addon within TCA, a cluster upgrade might report success even though the VM requires a manual reboot to function properly.

The nodeconfig-daemon pod logs show the following errors:

2024-05-09T03:42:44.134659827Z stdout F 2024-05-09T03:42:44.135Z [Err-syslog] : Start syslog-ng.service failed, exit status 1,Job for syslog-ng.service canceled.
2024-05-09T03:42:44.134691844Z stdout F 2024-05-09T03:42:44.135Z [Debug-profile_service] : receive plugin status update: {PluginName:syslog Status:Failed Reason:Start Syslog-ng.service failed LastErr:exit status 1 ErrTime:2024-05-09 03:42:44.13455614 +0000 UTC m=+543.021834294}
2024-05-09T03:42:44.172103874Z stdout F 2024-05-09T03:42:44.172Z [Info-profile_service] : update nodeprofilestatus to {Failed [{nodeFeatureDiscovery Normal <nil>} {kernelArgs Normal <nil>} {packages Normal <nil>} {fileInjection Normal <nil>} {nicNaming Normal <nil>} {kernelType Normal <nil>} {dpdkBind Normal <nil>} {systemdsrv Normal <nil>} {sriovdp Normal <nil>} {registries Normal <nil>} {syslog Failed Start Syslog-ng.service failed exit status 1 2024-05-09 03:42:44.13455614 +0000 UTC m=+543.021834294} {tuned Normal <nil>} {passwords Normal <nil>} {kernelMods Normal <nil>} {postCheck Normal <nil>} {profileMonitor Normal <nil>}] 1715225482} succeed
2024-05-09T03:42:54.108188163Z stdout F 2024-05-09T03:42:54.108Z [Debug-profile_service] : Failed plugin syslog retry left count 9
2024-05-09T03:43:04.126820209Z stdout F 2024-05-09T03:43:04.127Z [Debug-profile_service] : Failed plugin syslog retry left count 8
2024-05-09T03:43:14.1459792Z stdout F 2024-05-09T03:43:14.146Z [Debug-profile_service] : Failed plugin syslog retry left count 7
2024-05-09T03:43:24.165271521Z stdout F 2024-05-09T03:43:24.165Z [Debug-profile_service] : Failed plugin syslog retry left count 6
2024-05-09T03:43:34.184674993Z stdout F 2024-05-09T03:43:34.185Z [Debug-profile_service] : Failed plugin syslog retry left count 5
2024-05-09T03:43:44.205310554Z stdout F 2024-05-09T03:43:44.205Z [Debug-profile_service] : Failed plugin syslog retry left count 4
2024-05-09T03:43:54.22553811Z stdout F 2024-05-09T03:43:54.225Z [Debug-profile_service] : Failed plugin syslog retry left count 3
2024-05-09T03:44:04.24435868Z stdout F 2024-05-09T03:44:04.244Z [Debug-profile_service] : Failed plugin syslog retry left count 2
2024-05-09T03:44:14.262927465Z stdout F 2024-05-09T03:44:14.263Z [Debug-profile_service] : Failed plugin syslog retry left count 1
2024-05-09T03:44:24.282989999Z stdout F 2024-05-09T03:44:24.283Z [Debug-profile_service] : Failed plugin syslog retry left count 0
2024-05-09T03:44:34.302051693Z stdout F 2024-05-09T03:44:34.302Z [Info-profile_service] : Retry failed plugins: [syslog]
2024-05-09T03:44:34.302070321Z stdout F 2024-05-09T03:44:34.302Z [Debug-profile_service] : Trigger update to failed plugin syslog
2024-05-09T03:44:34.302489062Z stdout F 2024-05-09T03:44:34.302Z [Debug-profile_service] : receive plugin status update: {PluginName:syslog Status:Running Reason: LastErr:<nil> ErrTime:0001-01-01 00:00:00 +0000 UTC}
2024-05-09T03:44:34.322612338Z stdout F 2024-05-09T03:44:34.322Z [Err-syslog] : Check syslog-ng.service status failed,exit status 3
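
To check a saved copy of the nodeconfig-daemon pod logs for this signature, a small script along the following lines may help. This is a minimal sketch, not a VMware-provided tool: the function names `summarize_syslog_failures` and `matches_known_issue` are hypothetical, and the patterns are taken only from the log excerpt above, so they may need adjusting if the message text differs in other TCA releases.

```python
import re

# Patterns copied from the log excerpt in this article (assumption: the
# message text is the same in your release).
START_FAILED = re.compile(r"\[Err-syslog\] : Start syslog-ng\.service failed")
CHECK_FAILED = re.compile(r"\[Err-syslog\] : Check syslog-ng\.service status failed")
RETRY_LEFT = re.compile(r"Failed plugin syslog retry left count (\d+)")


def summarize_syslog_failures(log_text):
    """Count syslog plugin failures in nodeconfig-daemon log text.

    Hypothetical helper: returns how often the start and status-check
    errors appear, plus the lowest "retry left count" value seen.
    """
    summary = {"start_failed": 0, "check_failed": 0, "lowest_retry_left": None}
    for line in log_text.splitlines():
        if START_FAILED.search(line):
            summary["start_failed"] += 1
        if CHECK_FAILED.search(line):
            summary["check_failed"] += 1
        match = RETRY_LEFT.search(line)
        if match:
            left = int(match.group(1))
            current = summary["lowest_retry_left"]
            summary["lowest_retry_left"] = left if current is None else min(current, left)
    return summary


def matches_known_issue(summary):
    """Return True when either syslog-ng error from this article is present."""
    return summary["start_failed"] > 0 or summary["check_failed"] > 0
```

Pass the log contents as a single string, for example the text of a file to which the pod logs were saved. A `True` result from `matches_known_issue` only indicates that the log lines resemble the ones shown here; it does not confirm the cause.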

Environment

Telco Cloud Automation

Cause

If a syslog daemon is configured on a TCA-deployed cluster by any means other than the systemSettings addon within TCA, a race condition can arise that causes the upgrade to report a false-positive success.

Resolution

Currently the only supported syslog service in TCA is syslog-ng. This can be configured via the systemSettings addon within TCA. Configuring syslog in this way will prevent this issue.

As a workaround, reboot the affected VM to resolve the issue. The reboot is needed only immediately after the upgrade.