There are multiple scenarios in which stale service instances are left behind after a service deployment is removed. One such error is shown below.
If you are using Malware Prevention Service (MPS)
After deploying MPS on a cluster, navigate to the deployment status page (IDS/IPS & Malware Prevention → Settings → Shared → Activate Hosts & Clusters for East-West Traffic).
Here, after clicking Deployment Status to view the overall status as well as the status of each transport node, an error similar to the following appears: "Error: The requested object : DeploymentUnitInstance/#### could not be found. Object identifiers are case sensitive. (Error code: 600)"
API calls that query the deployment status return similar errors.
SSP 5.0
Looking at the InstanceRuntime entries (output of corfu_tool_runner.py --tool corfu-browser -o showTable -n nsx -t InstanceRuntime), we see an InstanceRuntime with a different service ID or a different product version from the other instances.
Example:
Key:
{
  "uuid": {  <------- Note this key for deleting the stale InstanceRuntime.
    "left": "$$$$",
    "right": "$$$$"
  }
}
Payload:
{
  "managedResource": {
    "displayName": "some-svm"
  },
  "serviceInstanceId": {  <----- Note this key for deleting the stale ServiceInstance.
    "left": "&&&&",
    "right": "&&&&"
  },
  "deploymentUnitId": {
    "left": "###",
    "right": "###"
  },
  "deploymentInstanceId": {
    "left": "###",
    "right": "###"
  },
  "hostId": "###",
  "svmId": "###:vm-###",
  "vmExternalId": "###",
  "deploymentState": "VM_DEPLOYMENT_STATE_DEPLOYMENT_SUCCESSFUL",
  "runtimeState": "VM_RUNTIME_STATE_IN_SERVICE",
  "vmNicInfo": {
    "nicInfo": [{
      "nicMetadata": {
        "interfaceLabel": "eth",
        "interfaceType": "INTERFACE_TYPE_MGMT",
        "userConfigurable": true
      },
      "networkId": "dvportgroup-##",
      "ipAddress": {
        "ipv4": ###
      },
      "subnetMask": {
        "ipv4": ###
      },
      "gatewayAddress": {
        "ipv4": ###
      },
      "macAddress": {
        "mac": "###"
      },
      "vif": "###",
      "ipPoolId": {
        "left": "###",
        "right": "###"
      },
      "dnsServer": ["###", "###"],
      "dnsSuffix": "###",
      "ipAllocationType": "IP_ALLOCATION_TYPE_STATIC"
    }, {
      "nicMetadata": {
        "interfaceLabel": "eth",
        "interfaceIndex": 1,
        "interfaceType": "INTERFACE_TYPE_CONTROL",
        "userConfigurable": false
      },
      "macAddress": {
        "mac": "###"
      },
      "vif": "###"
    }]
  },
  "markedAsSvm": true,
  "serviceId": {
    "left": "123456789",
    "right": "987654321"  <------------ This service ID is stale; it does not match the other deployments.
  },
  "isMacAvailableForAllNic": true
}
Metadata:
{
  "revision": "###",
  "createTime": "###",
  "createUser": "system",
  "lastModifiedTime": "###",
  "lastModifiedUser": "system",
  "productVersion": "3.2.3.1.0"  <----- This shows it was deployed on an older NSX version.
}
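A stale runtime can often be spotted mechanically: in a dump like the one above, its productVersion (or serviceId) is the odd one out. A minimal sketch, assuming the text format shown above; the function name and parsing approach are illustrative, not part of the product tooling:

```python
import re
from collections import Counter

def find_stale_versions(dump_text):
    """Return productVersion values that are in the minority in a
    corfu-browser InstanceRuntime dump -- likely stale entries.
    Assumes the dump format shown above; a tie is resolved arbitrarily."""
    versions = re.findall(r'"productVersion":\s*"([^"]+)"', dump_text)
    counts = Counter(versions)
    if not counts:
        return []
    majority = counts.most_common(1)[0][0]
    return [v for v in counts if v != majority]
```

Run it against the saved showTable output; any version it returns warrants a closer look at the surrounding Key block, whose uuid left/right values are needed for deletion.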
a. Un-deploy the service from the problematic cluster
b. Use the API below to clean up the identified stale ServiceInstances / InstanceRuntimes
POST https://<Manager-IP>/api/v1/serviceinsertion/services/<Service-ID>/service-instances/<Instance-ID>/instance-runtimes?action=delete
Note: you may use the script below to derive the service-id and instance-id from the 'left' and 'right' values of the corfu key.
|
#!/usr/bin/env python3
# Usage: thistool.py <left> <right>
# Example:
#   user@ubuntu2204:~/tools$ python thistool.py 5309577210414842440 13828241991281864423
#   49af6857-6e49-4248-bfe7-c8d36e7eeee7
import sys
import uuid

def main():
    # The corfu key stores the UUID as two 64-bit halves;
    # recombine them into a single 128-bit integer.
    left = int(sys.argv[1])
    right = int(sys.argv[2])
    print(uuid.UUID(int=(left << 64) + right))

if __name__ == "__main__":
    main()
|
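The conversion also works in reverse: the UUID reported in the error message (DeploymentUnitInstance/####) can be split back into the left/right halves used in the corfu key, and an ID pair can be turned into the cleanup URL from step (b). A sketch; the URL format is the one shown above, while the host and IDs passed in are placeholders:

```python
import uuid

def uuid_to_corfu_key(u):
    """Split a UUID string into the (left, right) 64-bit halves stored in corfu."""
    n = uuid.UUID(u).int
    return n >> 64, n & 0xFFFFFFFFFFFFFFFF

def cleanup_url(manager_ip, service_id, instance_id):
    """Build the instance-runtime cleanup URL shown in step (b)."""
    return (f"https://{manager_ip}/api/v1/serviceinsertion/services/"
            f"{service_id}/service-instances/{instance_id}"
            "/instance-runtimes?action=delete")
```

For example, the UUID from the script's usage example splits back into the same left/right pair it was built from.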
c. Re-deploy the service in the cluster
If the above steps do not resolve the issue, run the following commands as root on the NSX Manager:
corfu_tool_runner.py -n nsx -o showTable -t ServiceInstance > /somelocation/ServiceInstance.txt
corfu_tool_runner.py -n nsx -o showTable -t InstanceEndpoint > /somelocation/InstanceEndpoint.txt
corfu_tool_runner.py -n nsx -o showTable -t InstanceRuntime > /somelocation/InstanceRuntime.txt
corfu_tool_runner.py -n nsx -o showTable -t ServiceDeployment > /somelocation/ServiceDeployment.txt
corfu_tool_runner.py -n nsx -o showTable -t GiNodeSolutionInfo > /somelocation/GiNodeSolutionInfo.txt
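The five exports above can be collected in one pass. A small sketch, assuming corfu_tool_runner.py is on the PATH of the NSX Manager root shell; the table list is taken from the commands above and the output directory is a placeholder:

```python
import subprocess

# Table names from the commands above.
TABLES = ["ServiceInstance", "InstanceEndpoint", "InstanceRuntime",
          "ServiceDeployment", "GiNodeSolutionInfo"]

def export_commands(out_dir):
    """Build one corfu_tool_runner.py export command per table."""
    return [f"corfu_tool_runner.py -n nsx -o showTable -t {t} > {out_dir}/{t}.txt"
            for t in TABLES]

def run_exports(out_dir):
    # Each command uses shell redirection, so run through the shell.
    for cmd in export_commands(out_dir):
        subprocess.run(cmd, shell=True, check=True)
```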
Collect the output, along with a support bundle, and submit a support request. Since the cleanup process involves database modifications, ensure that you have up-to-date backups.