To cluster the Visualization Canvas by flows, we build a graph of network activity in the datacenter and then run an ML algorithm on that graph. In high-scale environments we limit the number of edges in the graph so that it does not grow beyond what the system can handle. In some edge cases, such as environments where computes are members of many groups, the number of nodes in the graph can grow very large without the edge limit ever activating. The graph then grows in memory until the pod is OOM-killed or runs out of ephemeral storage.
SSP 5.0
When we build a graph of the network communication in a customer environment, it usually has many more edges than nodes, because each compute typically talks to many other computes over a thirty-day period. In some edge cases, however, the number of nodes can grow without a matching increase in the number of edges. In these cases the graph can grow large without tripping the guardrails that limit the number of edges.
To check the status of the pod:
Log in to the SSPI CLI with root credentials and get the name of the pod stuck in ContainerStatusUnknown with the following command:
k -n nsxi-platform get pods -o wide | grep feature-service-flow-feature-creator
feature-service-flow-feature-creator-xxxxxxx
Then check the pod events with the following command:
k -n nsxi-platform describe pod feature-service-flow-feature-creator-xxxxxxx
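When several runs have left feature-creator pods behind, the pod lookup can be narrowed to only the unhealthy ones. This is a minimal sketch that assumes k is the kubectl alias available on SSPI and that STATUS is the third field of k get pods output. The helper name find_stuck_pods and the set of statuses it matches (ContainerStatusUnknown, Evicted, OOMKilled) are illustrative choices, not part of the product.

```shell
# find_stuck_pods (hypothetical helper): reads `k get pods --no-headers` output
# on stdin and prints the names of feature-service-flow-feature-creator pods
# whose STATUS (third column) is one of the failure states listed below.
find_stuck_pods() {
  awk '$1 ~ /^feature-service-flow-feature-creator-/ &&
       ($3 == "ContainerStatusUnknown" || $3 == "Evicted" || $3 == "OOMKilled") { print $1 }'
}
```

Usage: k -n nsxi-platform get pods -o wide --no-headers | find_stuck_pods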
In the ephemeral storage case, the pod events look similar to the following:
Events:
  Type     Reason     Age    From               Message
  ----     ------     ----   ----               -------
  Normal   Pulled     14m    kubelet            Container image "sspi101.ans.local/clustering/third-party/wait-for@sha256:feacef52ef1a9b1654d680c53af00b8461be50b1db29c3e4115b439a0ec03008" already present on machine
  Normal   Created    14m    kubelet            Created container wait-for-postgresql-ha-pgpool
  Normal   Started    14m    kubelet            Started container wait-for-postgresql-ha-pgpool
  Normal   Scheduled  14m    default-scheduler  Successfully assigned nsxi-platform/feature-service-flow-feature-creator-29052280-n2tzs to longevity-test-md-0-v5cdq-649zv
  Normal   Started    14m    kubelet            Started container wait-for-postgresql-ha-postgresql
  Normal   Created    14m    kubelet            Created container wait-for-postgresql-ha-postgresql
  Normal   Pulled     14m    kubelet            Container image "sspi101.ans.local/clustering/third-party/wait-for@sha256:feacef52ef1a9b1654d680c53af00b8461be50b1db29c3e4115b439a0ec03008" already present on machine
  Normal   Pulled     14m    kubelet            Container image "sspi101.ans.local/clustering/third-party/wait-for@sha256:feacef52ef1a9b1654d680c53af00b8461be50b1db29c3e4115b439a0ec03008" already present on machine
  Normal   Created    14m    kubelet            Created container wait-for-feature-service-s3-provisioning
  Normal   Started    14m    kubelet            Started container wait-for-feature-service-s3-provisioning
  Normal   Started    14m    kubelet            Started container feature-service-data-service
  Normal   Created    14m    kubelet            Created container feature-service-data-service
  Normal   Pulled     14m    kubelet            Container image "sspi101.ans.local/clustering/feature-service@sha256:4eb6535d28d22f2019989148ccaaed8429e91dc0a6b0a4259d0acab8c0066aea" already present on machine
  Normal   Pulled     12m    kubelet            Container image "sspi101.ans.local/clustering/feature-service@sha256:4eb6535d28d22f2019989148ccaaed8429e91dc0a6b0a4259d0acab8c0066aea" already present on machine
  Normal   Created    12m    kubelet            Created container feature-service
  Normal   Started    12m    kubelet            Started container feature-service
  Normal   Pulled     11m    kubelet            Container image "sspi101.ans.local/clustering/visualization@sha256:65df6746a941093c84a552dcec7e62834764a154f8262f2c331222ea7e1ef3fa" already present on machine
  Normal   Created    11m    kubelet            Created container feature-service-clustering-service
  Normal   Started    11m    kubelet            Started container feature-service-clustering-service
  Warning  Evicted    4m23s  kubelet            Pod ephemeral local storage usage exceeds the total limit of containers 1Gi.
  Normal   Killing    4m23s  kubelet            Stopping container feature-service-clustering-service
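To confirm the ephemeral storage case without reading the full event list, the describe output can be scanned for the eviction message shown in the example events. This is a minimal sketch: the helper name ephemeral_eviction_limit is hypothetical, and it assumes the kubelet message text matches the example exactly. It prints the container storage limit (1Gi in the example) when the pod was evicted for ephemeral storage, and prints nothing otherwise.

```shell
# ephemeral_eviction_limit (hypothetical helper): reads `k describe pod` output
# on stdin. If an Evicted event reports that ephemeral local storage exceeded
# the container limit, print that limit (for example "1Gi"); otherwise print nothing.
ephemeral_eviction_limit() {
  sed -n 's/.*Evicted.*ephemeral local storage usage exceeds the total limit of containers \([0-9]*[A-Za-z]*\).*/\1/p' |
    head -n 1   # report only the first matching eviction event
}
```

Usage: k -n nsxi-platform describe pod feature-service-flow-feature-creator-xxxxxxx | ephemeral_eviction_limit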