- The HTTPS service goes down intermittently, and users or automation may fail to connect to the manager.
- The alarm "cluster being degraded due to HTTPS service going down" may be triggered during that time.
/var/log/cbm/cbm.log
2024-12-16T04:51:43.895Z WARN pool-105-thread-1 EventReportSyslogSender 68903 MONITORING [nsx@6876 comp="nsx-manager" entId="0e####f9-###4-3###-###0-1c###26###" eventFeatureName="clustering" eventSev="warning" eventState="On" eventType="cluster_degraded" level="WARNING" subcomp="cbm"] Group member 0e####f9-###4-3###-###0-1c###26### of service HTTPS is down.
/var/log/proxy/reverse-proxy.log
INFO localhost-startStop-2 HeartbeatServiceClientImpl 2210640 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="http"] Stopping heartbeat for entity: HTTP
INFO localhost-startStop-2 HeartbeatServiceClientImpl 2210640 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="http"] Stopped heartbeat for entity: HTTP
/var/log/syslog
2024-12-16T04:50:57.359Z nsxmgr-02 systemd 1 - - Stopping proxy: VMware NSX reverse-proxy API server...
2024-12-16T04:51:25.240Z nsxmgr-02 systemd 1 - - Stopped proxy: VMware NSX reverse-proxy API server.
2024-12-16T04:51:25.475Z nsxmgr-02 systemd 1 - - Starting proxy: VMware NSX reverse-proxy API server...
2024-12-16T04:51:37.237Z nsxmgr-02 systemd 1 - - Started proxy: VMware NSX reverse-proxy API server.
2024-12-16T04:54:33.838Z nsxmgr-02 systemd 1 - - Stopping proxy: VMware NSX reverse-proxy API server...
2024-12-16T04:55:01.581Z nsxmgr-02 systemd 1 - - Stopped proxy: VMware NSX reverse-proxy API server.
2024-12-16T04:55:01.187Z nsxmgr-02 systemd 1 - - Starting proxy: VMware NSX reverse-proxy API server...
2024-12-16T04:55:14.453Z nsxmgr-02 systemd 1 - - Started proxy: VMware NSX reverse-proxy API server.
/var/log/proxy/proxy/proxy-tomcat-wrapper.log
2024-12-16T04:50:58.063Z INFO org.apache.coyote.AbstractProtocol pause Pausing ProtocolHandler ["https-jsse-nio-127.0.0.1-443"]
2024-12-16T04:50:58.065Z INFO org.apache.coyote.AbstractProtocol pause Pausing ProtocolHandler ["https-jsse-nio-##.##.##.##-443"]
2024-12-16T04:50:58.065Z INFO org.apache.coyote.AbstractProtocol pause Pausing ProtocolHandler ["https-jsse-nio-##.##.##.#-443"]
2024-12-16T04:50:59.066Z INFO org.apache.catalina.core.StandardService stopInternal Stopping service [Catalina]
STATUS | wrapper | 2024/12/16 04:50:57 | TERM trapped. Shutting down.
STATUS | wrapper | 2024/12/16 04:51:25 | <-- Wrapper Stopped
STATUS | wrapper | 2024/12/16 04:51:33 | --> Wrapper Started as Daemon
STATUS | wrapper | 2024/12/16 04:51:33 | Java Service Wrapper Professional Edition 64-bit 3.5.41
STATUS | wrapper | 2024/12/16 04:51:33 | Copyright (C) 1999-2019 Tanuki Software, Ltd. All Rights Reserved.
STATUS | wrapper | 2024/12/16 04:51:33 | http://wrapper.tanukisoftware.com
STATUS | wrapper | 2024/12/16 04:51:33 | Licensed to VMware Global, Inc. for VMware NSX Manager
STATUS | wrapper | 2024/12/16 04:51:33 |
STATUS | wrapper | 2024/12/16 04:51:34 | Launching a JVM...
INFO | jvm 1 | 2024/12/16 04:51:34 | WrapperManager: Initializing...
INFO | jvm 1 | 2024/12/16 04:51:37 | 2024-12-16T04:51:37.714Z INFO org.apache.catalina.startup.Catalina load Initialization processed in 3083 ms
INFO | jvm 1 | 2024/12/16 04:51:49 | 2024-12-16T04:51:49.728Z INFO org.apache.catalina.startup.Catalina start Server startup in 12013 ms
STATUS | wrapper | 2024/12/16 04:54:33 | TERM trapped. Shutting down.
STATUS | wrapper | 2024/12/16 04:55:00 | <-- Wrapper Stopped
STATUS | wrapper | 2024/12/16 04:55:10 | --> Wrapper Started as Daemon
STATUS | wrapper | 2024/12/16 04:55:10 | Java Service Wrapper Professional Edition 64-bit 3.5.41
STATUS | wrapper | 2024/12/16 04:55:10 | Copyright (C) 1999-2019 Tanuki Software, Ltd. All Rights Reserved.
STATUS | wrapper | 2024/12/16 04:55:10 | http://wrapper.tanukisoftware.com
STATUS | wrapper | 2024/12/16 04:55:10 | Licensed to VMware Global, Inc. for VMware NSX Manager
STATUS | wrapper | 2024/12/16 04:55:10 |
STATUS | wrapper | 2024/12/16 04:55:11 | Launching a JVM...
INFO | jvm 1 | 2024/12/16 04:55:11 | WrapperManager: Initializing...
INFO | jvm 1 | 2024/12/16 04:55:14 | 2024-12-16T04:55:14.907Z INFO org.apache.catalina.startup.Catalina load Initialization processed in 3249 ms
INFO | jvm 1 | 2024/12/16 04:55:27 | 2024-12-16T04:55:27.470Z INFO org.apache.catalina.startup.Catalina start Server startup in 12562 ms
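The restart windows in the /var/log/syslog excerpt above can be measured with a short script. The following is a minimal sketch, not a supported tool: the function names (parse_ts, proxy_restart_windows) are illustrative, and it assumes the syslog line layout shown in this article. It pairs each "Stopping proxy" entry with the next "Started proxy" entry. Note that the wrapper log shows Tomcat reporting "Server startup" several seconds after systemd logs "Started", so the real HTTPS outage is somewhat longer than the interval computed here.

```python
from datetime import datetime

# Lines copied from the /var/log/syslog excerpt in this article.
SYSLOG_LINES = [
    "2024-12-16T04:50:57.359Z nsxmgr-02 systemd 1 - - Stopping proxy: VMware NSX reverse-proxy API server...",
    "2024-12-16T04:51:25.240Z nsxmgr-02 systemd 1 - - Stopped proxy: VMware NSX reverse-proxy API server.",
    "2024-12-16T04:51:25.475Z nsxmgr-02 systemd 1 - - Starting proxy: VMware NSX reverse-proxy API server...",
    "2024-12-16T04:51:37.237Z nsxmgr-02 systemd 1 - - Started proxy: VMware NSX reverse-proxy API server.",
    "2024-12-16T04:54:33.838Z nsxmgr-02 systemd 1 - - Stopping proxy: VMware NSX reverse-proxy API server...",
    "2024-12-16T04:55:01.581Z nsxmgr-02 systemd 1 - - Stopped proxy: VMware NSX reverse-proxy API server.",
    "2024-12-16T04:55:01.187Z nsxmgr-02 systemd 1 - - Starting proxy: VMware NSX reverse-proxy API server...",
    "2024-12-16T04:55:14.453Z nsxmgr-02 systemd 1 - - Started proxy: VMware NSX reverse-proxy API server.",
]


def parse_ts(line):
    """Parse the leading UTC timestamp of a syslog line (assumed format)."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def proxy_restart_windows(lines):
    """Return (stop_time, start_time, seconds) for each proxy restart."""
    windows = []
    stop_ts = None
    for line in lines:
        if " systemd " not in line:
            continue
        if "Stopping proxy:" in line and stop_ts is None:
            stop_ts = parse_ts(line)
        elif "Started proxy:" in line and stop_ts is not None:
            start_ts = parse_ts(line)
            windows.append((stop_ts, start_ts, (start_ts - stop_ts).total_seconds()))
            stop_ts = None
    return windows


WINDOWS = proxy_restart_windows(SYSLOG_LINES)
for stop, start, seconds in WINDOWS:
    print(f"{stop.isoformat()} -> {start.isoformat()}: {seconds:.3f} s")
```

To run it against a live system, the sample list could be replaced with the lines read from /var/log/syslog on the affected manager node.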
VMware NSX
Each time the CLI is invoked to change the "max-auth-failures" property, it restarts the reverse-proxy service. During each restart, the reverse proxy is down for 1-2 minutes, and the HTTPS service status is reported as down for that period.
/var/log/syslog
2024-12-16T00:49:40.300Z nsxmgr-02 ansible-command - - - Invoked with _raw_params=/bin/nsxcli -c "set auth-policy api max-auth-failures 99999" _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
2024-12-16T00:53:13.129Z nsxmgr-02 ansible-command - - - Invoked with _raw_params=/bin/nsxcli -c "set auth-policy api max-auth-failures 5" _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
2024-12-16T00:49:40.300Z (67604 www-data /usr/bin/python3 /opt/vmware/nsx-node-api/bin/python/management_api/webserver/nvpapi.py --addr=127.0.0.1:7441 --ssl_server_addr_conf_file=DISABLED --ssl_addr_default=DISABLED --log_level=20 --error_log=/var/log/nvpapi/api_server.log --access_log=/var/log/nvpapi/api_access.log --ssl_access_log=/var/log/nvpapi/api_ssl_access.log --server_cert=/opt/vmware/nsx-node-api/etc/cert.pem --server_key=/opt/vmware/nsx-node-api/etc/privkey.pem --session_cookie=nvp_sessionid --inactivity_timeout=900 --process_user_name=www-data --process_user_group=www-data --ca_certs=CONTROL_API --persistent_task_location=/config/vmware/nsx-node-api/tasks --persistent_task_in_store=7200 --non_persistent_task_in_mem=3600 --descriptor_settings=)(85338 root sudo /opt/vmware/nsx-node-api/bin/api_roothelper.sh)(85339 root /bin/bash /opt/vmware/nsx-node-api/bin/api_roothelper.sh)(85340 root /usr/bin/python3 -OO /opt/vmware/nsx-node-api/bin/python/management_api/napi/rest_routine_roothelper.py)(85344 root systemctl restart proxy.service)
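To confirm that automation is repeatedly changing this property, the syslog can be searched for the matching ansible-command entries. The following is a minimal sketch under stated assumptions: the regular expression and the function name find_invocations are illustrative, and the sample lines are shortened copies of the entries above.

```python
import re

# Matches ansible-command entries that run the max-auth-failures CLI command.
# The pattern is an assumption based on the log lines shown in this article.
PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+ansible-command\b.*"
    r"set auth-policy api max-auth-failures (?P<value>\d+)"
)

# Shortened copies of the /var/log/syslog entries above.
SAMPLE_LINES = [
    '2024-12-16T00:49:40.300Z nsxmgr-02 ansible-command - - - Invoked with '
    '_raw_params=/bin/nsxcli -c "set auth-policy api max-auth-failures 99999" _uses_shell=True',
    '2024-12-16T00:53:13.129Z nsxmgr-02 ansible-command - - - Invoked with '
    '_raw_params=/bin/nsxcli -c "set auth-policy api max-auth-failures 5" _uses_shell=True',
    "2024-12-16T04:50:57.359Z nsxmgr-02 systemd 1 - - Stopping proxy: VMware NSX reverse-proxy API server...",
]


def find_invocations(lines):
    """Return (timestamp, host, value) for each max-auth-failures change."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group("ts"), match.group("host"), int(match.group("value"))))
    return hits


HITS = find_invocations(SAMPLE_LINES)
for ts, host, value in HITS:
    print(f"{ts} {host}: max-auth-failures set to {value}")
```

Each hit found this way corresponds to one CLI invocation and therefore, per the cause described above, to one reverse-proxy restart.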
Alternatively, if specific IP addresses must never be locked out, use the following APIs to exclude them from lockout.
Retrieve the current configuration with the GET API, add "lockout_immune_addresses": [ "#.#.#.#", "#.#.#.#" ] to the returned payload, and submit the result with the PUT API.
GET /api/v1/cluster/api-service
PUT /api/v1/cluster/api-service
#.#.#.# ==> The IP addresses you want to exclude from the lockout.
Sample PUT payload
{
  "_revision": 0,
  "global_api_concurrency_limit": 199,
  "basic_authentication_enabled": true,
  "cipher_suites": [
    { "enabled": true, "name": "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384" },
    { "enabled": true, "name": "TLS_RSA_WITH_AES_256_GCM_SHA384" },
    { "enabled": true, "name": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256" },
    { "enabled": true, "name": "TLS_RSA_WITH_AES_128_GCM_SHA256" },
    { "enabled": true, "name": "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384" },
    { "enabled": true, "name": "TLS_RSA_WITH_AES_256_CBC_SHA256" },
    { "enabled": true, "name": "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA" },
    { "enabled": true, "name": "TLS_RSA_WITH_AES_256_CBC_SHA" },
    { "enabled": true, "name": "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256" },
    { "enabled": true, "name": "TLS_RSA_WITH_AES_128_CBC_SHA256" },
    { "enabled": true, "name": "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA" },
    { "enabled": true, "name": "TLS_RSA_WITH_AES_128_CBC_SHA" }
  ],
  "protocol_versions": [
    { "enabled": true, "name": "TLSv1.2" },
    { "enabled": false, "name": "TLSv1.3" }
  ],
  "lockout_immune_addresses": [ "#.#.#.#", "#.#.#.#" ]
}
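The GET, modify, PUT sequence can be scripted so that the rest of the configuration is carried over unchanged. The following is a minimal sketch of the modify step only: the function name with_lockout_immune_addresses is illustrative, the 192.0.2.x addresses are documentation placeholders, and the input dictionary is a shortened stand-in for the body returned by GET /api/v1/cluster/api-service. Sending the result with PUT is left to whichever HTTP client is in use.

```python
import copy
import json


def with_lockout_immune_addresses(current_config, addresses):
    """Return a copy of the GET payload with the given IPs added.

    All other fields, including "_revision", are kept exactly as
    returned by the GET call. Existing entries are not duplicated.
    """
    updated = copy.deepcopy(current_config)
    merged = list(updated.get("lockout_immune_addresses", []))
    for address in addresses:
        if address not in merged:
            merged.append(address)
    updated["lockout_immune_addresses"] = merged
    return updated


# Shortened stand-in for the GET response (assumption for illustration).
current = {
    "_revision": 0,
    "global_api_concurrency_limit": 199,
    "basic_authentication_enabled": True,
}

# Placeholder addresses; replace with the IPs to exclude from lockout.
put_body = with_lockout_immune_addresses(current, ["192.0.2.10", "192.0.2.11"])
print(json.dumps(put_body, indent=2))
```

Starting from the live GET response rather than a hand-written payload avoids overwriting other settings with the sample values.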