This is a known issue affecting VMware NSX Intelligence 1.1.x and 1.2.0.
Currently, there is no resolution.
Workaround:
To work around this issue, complete the following steps:
- Run the following commands to confirm that you are affected by this issue:
/opt/apache-hadoop/bin/hdfs dfsadmin -report
/opt/apache-hadoop/bin/hdfs dfs -du /druid
Note: If "DFS Used" from the first command is equal to the present capacity (For example: 300GB) then HDFS is full. You should expand the disk or delete some logs.
If the second command shows that indexing logs are consuming a large amount of space:
- Use the "/opt/apache-hadoop/bin/hdfs dfs -rm -r /druid/indexing-logs/*" command to delete all indexing logs.
Note: Wait a while, then re-run the checks above to verify that DFS usage has gone down.
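For example, you can re-check usage with the same commands (the -h flag of "hdfs dfs -du" prints human-readable sizes):

/opt/apache-hadoop/bin/hdfs dfsadmin -report | grep "DFS Used"
/opt/apache-hadoop/bin/hdfs dfs -du -h /druid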
- Druid supervisors should automatically recover in a few hours. You can use the "systemctl restart configure-druid" command to speed up the recovery.
a. Wait a while and use "curl -X GET http://localhost:8090/druid/indexer/v1/supervisor?state=true" to check whether all supervisors are in the "RUNNING" state.
b. If any supervisor is not, use "curl -X POST http://localhost:8090/druid/indexer/v1/supervisor/<supervisorId>/reset" to reset it, as in the sketch below.
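Steps a and b can also be scripted. This is only a convenience sketch; it assumes the jq utility is available on the appliance, and if jq is missing you can inspect the JSON from step a manually. If your setup was upgraded from 1.0.x, first read the note below, because the leftover supervisors it lists should be terminated rather than reset:

# Reset every supervisor that is not in the RUNNING state.
for id in $(curl -s -X GET "http://localhost:8090/druid/indexer/v1/supervisor?state=true" \
    | jq -r '.[] | select(.state != "RUNNING") | .id'); do
    echo "Resetting supervisor ${id}"
    curl -X POST "http://localhost:8090/druid/indexer/v1/supervisor/${id}/reset"
done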
Note: If the setup was upgraded from 1.0.x, the following supervisors may exist. These do not need to be reset; instead, terminate them with:
"curl -X POST http://localhost:8090/druid/indexer/v1/supervisor/<supervisorId>/terminate"
where <supervisorId> is one of: pace2druid_manager_dfw_rule_config, pace2druid_manager_nsgroup_config, pace2druid_manager_vm_config, pace2druid_policy_dfw_rule_config, pace2druid_policy_group_config, pace2druid_policy_service_config, or pace2druid_policy_service_entry_config.
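The leftover supervisors can be terminated in one pass with a loop over the list above; any supervisor in the list that does not exist on your setup will simply return an error, which can be ignored:

for id in pace2druid_manager_dfw_rule_config pace2druid_manager_nsgroup_config \
    pace2druid_manager_vm_config pace2druid_policy_dfw_rule_config \
    pace2druid_policy_group_config pace2druid_policy_service_config \
    pace2druid_policy_service_entry_config; do
    curl -X POST "http://localhost:8090/druid/indexer/v1/supervisor/${id}/terminate"
done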
- Use the "systemctl restart nsx-config" command to get the latest config objects from NSX.