If your environment is configured so that a compute can belong to many groups, the flow feature objects can become corrupted. Once this occurs, the clusters presented on the visualization canvas are not updated, the number of computes in the "Unclustered" bubble can grow over time, and the flow clustering job fails to complete successfully.
Symptoms:
1. The flow clustering job enters a CrashLoop state (the feature-service-flow-feature-creator pod repeatedly fails). To check, log in to SSPI via the CLI using root credentials and run the following:
k -n nsxi-platform get pods | grep feature-service-flow-feature-creator
Look for an unsuccessful pod status, such as CrashLoopBackOff.
2. Flow clusters are not updated on the Visualization Canvas.
3. "Unclustered" bubble grows over time, showing an increasing number of computes.
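The pod status check in symptom 1 can be scripted. The following is a minimal sketch, not part of the product: the function name unhealthy_flow_feature_pods is hypothetical, it assumes the default column layout of the pod listing (NAME, READY, STATUS, ...), it assumes that k in this article is an alias for kubectl, and it treats any status other than Running or Completed as unhealthy.

```shell
# Hypothetical helper: reads "get pods" output on stdin and prints the name
# and status of each flow feature creator pod that is not Running/Completed.
# Assumption: STATUS is the third whitespace-separated column.
unhealthy_flow_feature_pods() {
  awk '$1 ~ /^feature-service-flow-feature-creator/ &&
       $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Example usage on SSPI (not executed here):
#   kubectl -n nsxi-platform get pods | unhealthy_flow_feature_pods
```

No output means no flow feature creator pod is in an unexpected state under these assumptions.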
SSP 5.0
If the environment has flows where the source or destination is a member of many groups (typically more than 6), the flow clustering framework does not handle the resulting flow properly, and an error occurs when it tries to write the resulting clustering assignment to PostgreSQL. The error is not handled and causes the job to crash with a stack trace similar to the following:
2025-05-12T07:49:35,332 ERROR [main] o.s.b.SpringApplication: Application run failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'visualizationServerApplication': Invocation of init method failed; nested exception is org.springframework.dao.DataIntegrityViolationException: PreparedStatementCallback; SQL [INSERT INTO intelligence.ClusterAssignment (clusterId, clusterType, entityId, siteId) VALUES (?, ?, ?, ?)]; Batch entry 20 INSERT INTO intelligence.ClusterAssignment (clusterId, clusterType, entityId, siteId) VALUES (('2c23f9e4-dc60-4e95-9ae2-f06096de1d26'), ('FLOW'), ('025cfae6-428c-4f2d-b36c-b7d1da238ddd12c4a4bc-93fe-4d03-8c18-0e4ef1b75829161c46e6-d717-457c-9cc0-c0a4a54f07ce21c9675e-07c4-41fd-aa55-49d42f659a9c25adbc58-37ec-4c18-9069-f5961532a1f025e6ee67-7e52-49d9-a69f-d48a8da1b1df29aead0c-113b-4694-bf3c-ed0795830e142aa911af-2895-478c-963e-6d74fd4e9b1b2ab8ae74-72bd-4fd6-8ecc-4bb1c452b5062c7cea1f-4347-4566-ab3f-6abf1c5c27492f117ec8-f369-4483-9ec4-b14a4c80b4442ff52a93-0f6c-4abd-9926-0c1e23fed4373681bea4-bb83-44c6-9728-8420a0b9d052399450f3-76a6-4595-bbd3-7d43ab2e38283c2fdfb6-6a17-460c-8a8a-e8c31b81612a3c68814f-8608-4b95-8c47-7aa582dfcf804015b42b-1853-48dd-a719-d61f884d1d1251a0025b-1ca6-4af4-b7c2-e1e79d69a7e85415504b-146e-416c-b01c-0657b9639b0556dd85ed-6b0e-424a-9362-562511388b1b5823d78b-86b7-4ffa-b999-ff536ec58008598b782a-85fe-4c6f-bff1-3abdb9febdc45be351ce-7dd0-4b4d-9777-181d0764b68c5dc3d507-91e2-46fe-954b-3bdb5272dfb55e036f84-493c-4a44-9946-e88ae82b2cb760c6cb86-5487-43b5-a494-75ae13bbd3d661141c33-ef93-4ce9-9865-e0d2df47c841619f1b70-f82c-4fef-8e5e-'), ('33966d05-257b-4988-9115-f17dbc17bbc5')) was aborted: ERROR: value too long for type character varying(255) Call getNextException to see other errors in the batch.; nested exception is java.sql.BatchUpdateException: Batch entry 20 INSERT INTO intelligence.ClusterAssignment (clusterId, clusterType, entityId, siteId) VALUES (('2c23f9e4-dc60-4e95-9ae2-f06096de1d26'), ('FLOW'), 
('025cfae6-428c-4f2d-b36c-b7d1da238ddd12c4a4bc-93fe-4d03-8c18-0e4ef1b75829161c46e6-d717-457c-9cc0-c0a4a54f07ce21c9675e-07c4-41fd-aa55-49d42f659a9c25adbc58-37ec-4c18-9069-f5961532a1f025e6ee67-7e52-49d9-a69f-d48a8da1b1df29aead0c-113b-4694-bf3c-ed0795830e142aa911af-2895-478c-963e-6d74fd4e9b1b2ab8ae74-72bd-4fd6-8ecc-4bb1c452b5062c7cea1f-4347-4566-ab3f-6abf1c5c27492f117ec8-f369-4483-9ec4-b14a4c80b4442ff52a93-0f6c-4abd-9926-0c1e23fed4373681bea4-bb83-44c6-9728-8420a0b9d052399450f3-76a6-4595-bbd3-7d43ab2e38283c2fdfb6-6a17-460c-8a8a-e8c31b81612a3c68814f-8608-4b95-8c47-7aa582dfcf804015b42b-1853-48dd-a719-d61f884d1d1251a0025b-1ca6-4af4-b7c2-e1e79d69a7e85415504b-146e-416c-b01c-0657b9639b0556dd85ed-6b0e-424a-9362-562511388b1b5823d78b-86b7-4ffa-b999-ff536ec58008598b782a-85fe-4c6f-bff1-3abdb9febdc45be351ce-7dd0-4b4d-9777-181d0764b68c5dc3d507-91e2-46fe-954b-3bdb5272dfb55e036f84-493c-4a44-9946-e88ae82b2cb760c6cb86-5487-43b5-a494-75ae13bbd3d661141c33-ef93-4ce9-9865-e0d2df47c841619f1b70-f82c-4fef-8e5e-'), ('33966d05-257b-4988-9115-f17dbc17bbc5')) was aborted: ERROR: value too long for type character varying(255) Call getNextException to see other errors in the batch.
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:160)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:440)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1796)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:336)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:334)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:209)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:955)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:591)
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:145)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:780)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:453)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:343)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1370)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1359)
at com.vmware.nsx.pace.visualization.server.VisualizationServerApplication.main(VisualizationServerApplication.java:40)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)
To check for the stack trace, log in to SSPI via the CLI using root credentials and get the pod name for the flow feature job:
k -n nsxi-platform get pods | grep feature-service-flow-feature-creator
feature-service-flow-feature-creator-xxxxxxx
Then check the pod logs:
k -n nsxi-platform logs feature-service-flow-feature-creator-xxxxxxx
If a stack trace similar to the one above is present, specifically with the error "ERROR: value too long for type character varying(255)", this is likely the issue.
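Searching the pod logs for the error signature can also be scripted. This is a minimal sketch under stated assumptions: the function name flow_feature_log_has_overflow is hypothetical, and it only looks for the literal error text quoted in this article, so a match suggests the issue rather than proving it.

```shell
# Hypothetical helper: reads pod log text on stdin and exits 0 if the
# varchar(255) overflow error from the stack trace above is present.
# -F matches the string literally (no regex), -q suppresses output.
flow_feature_log_has_overflow() {
  grep -qF 'value too long for type character varying(255)'
}

# Example usage on SSPI (not executed here; the pod name is a placeholder):
#   kubectl -n nsxi-platform logs feature-service-flow-feature-creator-xxxxxxx \
#     | flow_feature_log_has_overflow && echo "overflow error found"
```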
The fix for this issue is present in SSP 5.1 and later. However, the issue may persist after upgrade, because the corrupted objects from the earlier release will still be present.
The corrupted objects are evicted over time and replaced with fixed objects. However, depending on the setup, this can take up to 30 days to occur naturally.
To expedite the process, perform the following steps:
1. List the minio pods and note down the pods with name minio-<number>:
k -n nsxi-platform get pods | grep minio
2. Open a shell in a noted pod:
k -n nsxi-platform exec -it <pod name> -- bash
3. Inside the pod, run:
rm -rf /data/minio/data-service/FLOW*
rm -rf /data/minio/feature-service/FLOW*
4. List the flow feature jobs:
k -n nsxi-platform get jobs | grep flow-feature
5. Delete the job:
k -n nsxi-platform delete job <job name>
6. Create a new job from the cron job:
k -n nsxi-platform create job manual-flow-feature-creator --from cronjob/feature-service-flow-feature-creator
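The job deletion and re-creation at the end of this procedure can be previewed before anything is run. The following is a dry-run sketch only: the function name plan_flow_feature_rerun is hypothetical, it assumes that k is an alias for kubectl and that every job whose name contains flow-feature should be deleted, and it prints the commands for review instead of executing them.

```shell
# Hypothetical dry-run helper: reads "get jobs" output on stdin and prints
# (does not run) one delete command per job whose name contains
# "flow-feature", followed by the create command from this article.
plan_flow_feature_rerun() {
  awk -v ns="nsxi-platform" '
    $1 ~ /flow-feature/ { printf "kubectl -n %s delete job %s\n", ns, $1 }
    END { printf "kubectl -n %s create job manual-flow-feature-creator --from cronjob/feature-service-flow-feature-creator\n", ns }
  '
}

# Example usage on SSPI (not executed here):
#   kubectl -n nsxi-platform get jobs | plan_flow_feature_rerun
```

Review the printed commands and run them manually once they match the intended job names.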