NSX Edge Load Balancer temporarily returns 503 after pool member change

Article ID: 303188


Products

VMware NSX

Issue/Introduction

Symptoms:

  • Connecting to the NSX Edge Loadbalancer fails with HTTP Error 503 after a number of pool member changes.
  • The NSX Edge Loadbalancer was previously configured to use Monitor Service for health monitoring and currently uses BUILT-IN.
  • To identify which health monitor is used, run "show service loadbalancer virtual" in the NSX Edge CLI.

    Example for Monitor Service:

    +->POOL MEMBER: poolX/memberX, STATUS: DOWN
    | | HEALTH MONITOR = MONITOR SERVICE, monitorX:CRITICAL
    | | | LAST STATE CHANGE: <DATE> <time>
    | | | LAST CHECK: <DATE> <time>
    | | | FAILURE DETAIL: PING CRITICAL - Packet loss = 100%
    | | SESSION (cur, cps, total) = (0, 0, 0)
    | | BYTES in = (0), out = (0)

    Example for BUILT-IN:

    +->POOL MEMBER: poolX/memberX, STATUS: UP
    | | HEALTH MONITOR = BUILT-IN, monitorX:L7OK
    | | | LAST STATE CHANGE: <DATE> <time>
    | | SESSION (cur, max, total) = (0, 0, 0)
    | | BYTES in = (0), out = (0)
  • You see entries similar to the following in the NSX Edge logs when you cannot access backend servers through the Loadbalancer:

loadbalancer[<PID>]: [LB]: [local0.info] XXX.XXX.XXX.XXX - - [<DATE>:<time>] "GET / HTTP/1.1" 503 XXX "" "" XXXXX XXX "X" "X" "<NOSRV>" 0 -1 -1 -1 0 SC-- 0 0 0 0 0 0 0 ""

  • You see entries similar to the following in the NSX Edge logs when you configure loadbalancer pool members:

    config: [daemon.warning] WARN :: C_UTILS :: File /var/db/networkmonitor/monitor_retention.dat not exist
    loadbalancer[<PID>]: [LB]: [local0.alert] Server poolXX/member1 is DOWN, changed from CLI. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
    loadbalancer[<PID>]: [LB]: [local0.alert] Server poolXX/member2 is DOWN, changed from CLI. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
    loadbalancer[<PID>]: [LB]: [local0.alert] Server poolXX/member3 is DOWN, changed from CLI. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
    loadbalancer[<PID>]: [LB]: [local0.emerg] backend poolXX has no server available!
    config: [daemon.info] INFO :: CONFIG_MGR :: update ipvs...
    config: [daemon.info] INFO :: C_IPVS :: IPVS: stop connection sync-up daemon
    config: [daemon.info] INFO :: CONFIG_MGR :: update nagios...
    config: [daemon.info] INFO :: C_ServiceControl :: update nagios to down
    config: [daemon.info] INFO :: CONFIG_MGR :: --------------- Collecting the configurator output ---------------
    config: [daemon.info] INFO :: Utils :: saved data to /var/db/vmware/vshield/vse_one/resource_save.psf
    config: [daemon.info] INFO :: Utils :: saved data to /var/db/vmware/vshield/vse_one/config_save.psf
    config: [daemon.info] INFO :: vse_configure :: update success
    config: [daemon.info] INFO :: Utils :: ha: UpdateHaResourceFlags:

Environment

NSX for vSphere 6.3.x

NSX for vSphere 6.4.x

Cause

This issue occurs because old health status reported by the Monitor Service is incorrectly loaded after the pool member change.

Resolution

To work around this issue, configure BUILT-IN health monitoring with id and name pairs for the pool, members, and monitor that are all different from those previously used with the Monitor Service.

Using the Web Client, to create the NSX Edge Loadbalancer objects with new id and name pairs, navigate to Networking & Security > NSX Edges > Manage > Load Balancer.

  • For the monitor, navigate to Service Monitoring.
  • For the pool and its members, navigate to Pools.

Using the REST API, change the (poolId, name), (memberId, name), and (monitorId, name) pairs. A sketch of this approach follows below.
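
As an illustration only, the following Python sketch recreates the monitor and pool under new names through the NSX for vSphere REST API so that new ids are generated. The NSX Manager address, credentials, edge id, and the XML element values are assumptions, not values from this article; verify the endpoint paths and payload schema against the NSX for vSphere API Guide for your version.

    import requests

    NSX_MANAGER = "https://nsx-manager.example.com"   # assumption: NSX Manager address
    EDGE_ID = "edge-1"                                 # assumption: Edge identifier
    AUTH = ("admin", "password")                       # assumption: API credentials
    HEADERS = {"Content-Type": "application/xml"}

    # Create a BUILT-IN monitor under a new name so a new monitorId is generated.
    monitor_xml = """
    <monitor>
      <name>http-monitor-new</name>
      <type>http</type>
      <interval>5</interval>
      <timeout>15</timeout>
      <maxRetries>3</maxRetries>
      <method>GET</method>
      <url>/</url>
    </monitor>
    """
    r = requests.post(
        f"{NSX_MANAGER}/api/4.0/edges/{EDGE_ID}/loadbalancer/config/monitors",
        data=monitor_xml, headers=HEADERS, auth=AUTH, verify=False)
    r.raise_for_status()

    # Create the pool and its members under new names, referencing the new monitor,
    # so that none of the (poolId, name), (memberId, name), (monitorId, name) pairs
    # match the ones previously used with the Monitor Service.
    pool_xml = """
    <pool>
      <name>web-pool-new</name>
      <algorithm>round-robin</algorithm>
      <monitorId>monitor-2</monitorId>  <!-- assumption: id returned for the new monitor -->
      <member>
        <name>web-01-new</name>
        <ipAddress>10.0.0.11</ipAddress>
        <port>80</port>
      </member>
    </pool>
    """
    r = requests.post(
        f"{NSX_MANAGER}/api/4.0/edges/{EDGE_ID}/loadbalancer/config/pools",
        data=pool_xml, headers=HEADERS, auth=AUTH, verify=False)
    r.raise_for_status()

The ids that NSX Manager assigns to the new objects can then be confirmed with a GET on the same endpoints before updating the virtual server to reference the new pool.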