[Internal] Salt minion cache cannot be saved properly after upgrading Aria Automation Config to 8.17

Article ID: 373037

Products

VMware Aria Suite

Issue/Introduction

  • The environment includes:
    • One salt master node installed on RHEL 8.9
    • One raas node installed on RHEL 8.9
    • One postgres node installed on RHEL 8.9
    • One redis node installed on RHEL 8.9
    • More than 400 Linux/Windows virtual machines with salt minion
  • The issue occurs after upgrading Aria Automation Config from 8.16.2 to 8.17.
  • Salt minion grains are saved to the PostgreSQL database whenever a salt-minion restarts, and this save triggers the problem.
  • Failure to save the minion grains causes other salt features to malfunction. For example, deploying a virtual machine from Aria Automation fails because salt does not respond to the request. This happens because raas is stuck repeatedly trying to save the minion grain cache, so it cannot respond to other requests.
  • When the issue occurs, the /var/log/raas/raas log on the raas node contains a "Failed to save updated minion cache from master" error, as shown below:

2024-06-18 20:55:05,749 [raas.utils.rpc                                                    ][ERROR   :216 ][Webserver:150466] Failed to save updated minion cache from master
<master node FQDN>
Traceback (most recent call last):
  File "sqlalchemy/engine/base.py", line 1256, in _execute_context
    self.dialect.do_executemany(
  File "sqlalchemy/dialects/postgresql/psycopg2.py", line 912, in do_executemany
    cursor.executemany(statement, parameters)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type bytea
CONTEXT:  PL/pgSQL function minion_cache_grains_sign() line 3 at assignment
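
To confirm that a raas node is hitting this error, the log can be searched for the message quoted above. The following is a minimal sketch, not an official diagnostic; the function name count_cache_save_errors is hypothetical, and the default path is the log file named in this article.

```shell
#!/bin/sh
# Sketch: count log lines containing the grain-cache save failure.
# count_cache_save_errors is a hypothetical helper name, not a product command.
count_cache_save_errors() {
    # Default to the raas log path given in this article.
    log="${1:-/var/log/raas/raas}"
    # grep -c prints the number of matching lines.
    # Note: grep exits with status 1 when the count is 0.
    grep -c "Failed to save updated minion cache from master" "$log"
}
```

Running count_cache_save_errors with no argument on the raas node checks the default log. A count that keeps growing after salt-minion restarts is consistent with the symptom described here.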

Environment

Aria Automation Config 8.17.x

Cause

  • This issue occurs when the environment contains a large number of salt minions.
  • After upgrading to Aria Automation Config 8.17.x, saving the grain cache of a large number of salt minions to PostgreSQL can keep the raas service busy, so that it fails to respond to other requests.
  • The exact number of salt minions needed to trigger the problem is currently unknown.

Resolution

Adjusting "sseapi_max_minion_grains_payload" in /etc/salt/master.d/raas.conf on the salt master node resolves the problem.

The procedure is:

  • Log in to the salt master node over SSH.
  • Back up the config file with the command:

cp /etc/salt/master.d/raas.conf /tmp/raas.conf

  • Edit the file /etc/salt/master.d/raas.conf:
    • Remove the "#" before "sseapi_max_minion_grains_payload", if present, so that the setting is not commented out.
    • Change "sseapi_max_minion_grains_payload" to 100.
    • Save and quit.
  • On the raas node, run the command below to restart the raas service:

systemctl restart raas

  • On the salt master node, run the command below to restart the salt-master service:

systemctl restart salt-master
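
The manual edit of raas.conf in the procedure above could also be scripted, which may help when several salt master nodes need the same change. The sketch below assumes the setting appears as a single "key: value" line, optionally prefixed by "#"; that layout is an assumption about the file, and set_grains_payload is a hypothetical helper name. It does not take a backup or restart any service, so the backup and restart steps of the procedure still apply.

```shell
#!/bin/sh
# Sketch: uncomment sseapi_max_minion_grains_payload and set a new value.
# Assumption: the setting is one "key: value" line, optionally prefixed by "#".
# set_grains_payload is a hypothetical helper name, not a product command.
set_grains_payload() {
    conf="$1"     # path to the config file, e.g. /etc/salt/master.d/raas.conf
    value="$2"    # new value, e.g. 100
    key="sseapi_max_minion_grains_payload"

    # Refuse to continue if the setting is not in the file at all.
    if ! grep -Eq "^[[:space:]]*#?[[:space:]]*${key}[[:space:]]*:" "$conf"; then
        echo "${key} not found in ${conf}" >&2
        return 1
    fi

    # Rewrite the matching line into a temporary file, then copy the result
    # back in place so the original file's ownership and mode are kept.
    tmp=$(mktemp) || return 1
    sed -E "s/^[[:space:]]*#?[[:space:]]*${key}[[:space:]]*:.*/${key}: ${value}/" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, after taking the backup, set_grains_payload /etc/salt/master.d/raas.conf 100 applies the value used in this article. The function exits with an error rather than appending the setting when it is missing, because this article only describes changing an existing line.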