Catalog service pod stuck in a crash loop when external IPAM is used, with errors about GET /config/toggles

Article ID: 372894

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

The catalog service can crash when many IPAM endpoint enumeration requests run concurrently.

Here are example error messages from the crash loop of the catalog-service-app pod:

WARN The web application [ROOT] appears to have started a thread named [OkHttp TaskRunner] but has failed to stop it. This is very likely to create a memory leak.
ERROR catalog-service-app [...] - Error while starting the application: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'catalogPolicyActuatorController': Invocation of init method failed;
Caused by: org.springframework.web.client.ResourceAccessException: I/O error on GET request for "http://provisioning-service:8282/provisioning/config/toggles": Read timed out; nested exception is java.net.SocketTimeoutException:

Environment

Aria Automation 8.18.0 or lower

Cause

SubnetRangeService makes a blocking call to the DB when it updates IP ranges from an external IPAM provider (for example, Infoblox) in Aria Automation.
IPAM endpoint enumeration invokes this call when there is a change in Infoblox, causing the provisioning service to execute the blocking code for multiple SubnetRangeStates.

When enough of these blocking invocations are in progress, the index pool of the provisioning service is depleted, leaving it unable to serve other database requests (which most APIs require). This is why the catalog service times out on GET /provisioning/config/toggles during startup and then restarts.
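
The depletion mechanism can be illustrated with a small, generic sketch. This is not the provisioning service's code: it assumes an ordinary fixed-size thread pool as a stand-in for the index pool, and the function name, pool size, and timeout are hypothetical values chosen for the illustration. It shows how a quick request times out when every worker is held by a blocking task.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def demo_pool_depletion(pool_size=2, timeout_s=0.2):
    """Show a quick task timing out while all pool workers are blocked.

    Hypothetical illustration only; a generic thread pool stands in for
    the provisioning service's pool.
    """
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Stand-ins for blocking range updates: each one holds a worker
        # until the event is set.
        for _ in range(pool_size):
            pool.submit(release.wait)

        # Stand-in for an unrelated request such as GET /config/toggles.
        quick = pool.submit(lambda: "toggles")
        try:
            quick.result(timeout=timeout_s)
            starved = False
        except FutureTimeout:
            # No free worker, so the caller sees a read timeout.
            starved = True

        # Once the blocking work finishes, the queued request is served.
        release.set()
        recovered = quick.result(timeout=5)
    return starved, recovered


if __name__ == "__main__":
    print(demo_pool_depletion())
```

In this sketch the quick request itself is cheap; it fails only because it cannot obtain a worker, which mirrors the "Read timed out" error seen by the catalog service.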

Resolution

This is fixed in Aria Automation 8.18.1, where these DB calls are non-blocking.

Workaround

To bring the vRA system back up, configure the IPAM integration to filter out all network objects, so that the problematic code is not executed.
However, this also means the IPAM integration causing the issue is not usable while the filter is in place.

Another approach is to compare all IP ranges in Infoblox with those in Automation, to see which differ in their start or end IP.
The differing ranges can then either be patched manually or excluded from collection with a filter.
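
The comparison can be scripted once both sets of ranges have been exported. The following is a minimal sketch, not a product tool: the `IpRange` record, its field names, and matching ranges by name are all assumptions made for the illustration, and how the ranges are exported from Infoblox and Automation is left out.

```python
from ipaddress import ip_address
from typing import NamedTuple


class IpRange(NamedTuple):
    """One IP range. Field names are hypothetical, not a product schema."""
    name: str
    start: str
    end: str


def find_mismatched_ranges(infoblox_ranges, automation_ranges):
    """Return (name, infoblox_range, automation_range) tuples for ranges
    whose start or end IP differs between the two exports.

    Assumption: a range has the same name on both sides.
    """
    automation_by_name = {r.name: r for r in automation_ranges}
    mismatches = []
    for ib in infoblox_ranges:
        auto = automation_by_name.get(ib.name)
        if auto is None:
            continue  # not present in Automation, nothing to compare
        # Compare parsed addresses rather than raw strings.
        same_start = ip_address(ib.start) == ip_address(auto.start)
        same_end = ip_address(ib.end) == ip_address(auto.end)
        if not (same_start and same_end):
            mismatches.append((ib.name, ib, auto))
    return mismatches


if __name__ == "__main__":
    infoblox = [
        IpRange("range-a", "10.0.0.10", "10.0.0.20"),
        IpRange("range-b", "10.0.1.10", "10.0.1.50"),
    ]
    automation = [
        IpRange("range-a", "10.0.0.10", "10.0.0.20"),
        IpRange("range-b", "10.0.1.10", "10.0.1.40"),
    ]
    for name, ib, auto in find_mismatched_ranges(infoblox, automation):
        print(name, ib.start, ib.end, "vs", auto.start, auto.end)
```

The names returned are the candidates for manual patching or for a collection filter.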
