This is a known issue affecting upgrades from NSX Intelligence 1.0.x to a later version, including NSX Intelligence 1.1.x and 1.2.0.
This issue is resolved in VMware NSX Intelligence 1.2.1, available at VMware Downloads.
Workaround:
To work around this issue:
If you upgraded from NSX Intelligence 1.0.x and have no symptoms:
- Run the attached remove_backlogged_cleanup_task.py Python script. This cleans up the existing backlogged cleanup tasks.
- Run the attached unblock_cleanup_task.py Python script. This enables the cleanup tasks to find the old tables and should prevent the backlog from building up again.
Note: The unblock_cleanup_task.py script requires the NSX Intelligence version number as an argument. The version should start with either 1.1 or 1.2.
For example, "python3 unblock_cleanup_task.py -v 1.1.001"
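The version rule in the note above can be checked before invoking the script. The following is a minimal sketch only: is_supported_version is a hypothetical helper that is not part of the attached scripts, and it assumes that "starts with 1.1 or 1.2" means the major.minor prefix is exactly 1.1 or 1.2.

```python
import re


def is_supported_version(version: str) -> bool:
    """Hypothetical pre-check: True if the version begins with 1.1 or 1.2.

    Assumption: the prefix must be followed by a dot or the end of the
    string, so "1.1.001" passes while "1.10.0" does not.
    """
    return re.match(r"1\.[12](\.|$)", version) is not None


if __name__ == "__main__":
    for candidate in ("1.1.001", "1.2.0", "1.0.1"):
        print(candidate, is_supported_version(candidate))
```

A check like this only guards against typing mistakes; the attached script remains the authority on which values it accepts.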
If you upgraded from NSX Intelligence 1.0.x and the Druid service fails with an Out of Memory error, first restore the service by giving it more memory, and then run the remove_backlogged_cleanup_task.py and unblock_cleanup_task.py scripts:
- Open /opt/druid/conf/druid/overlord/jvm.config in an editor.
- Search for -Xms and -Xmx. You should see something similar to:
-Xms512M
-Xmx512M
- Change the JVM memory to 2GB to start with. For example:
-Xms2G
-Xmx2G
- Save the file and restart the Druid service by running this command:
service druid restart
Note: After the restart, the overlord service should come up successfully. If it still fails, try higher memory values (3G, 4G, etc.) to bring the service back up.
- Run the remove_backlogged_cleanup_task.py and unblock_cleanup_task.py scripts.
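The heap-size edit in the steps above can also be expressed as a small text transformation. This is an illustrative sketch, not a VMware-provided tool: set_heap_size is a hypothetical function, and it assumes each -Xms or -Xmx option sits on its own line, as in the sample shown in the steps. It works on a string only, so it does not touch the real jvm.config file or restart the service.

```python
import re

# Matches a whole line such as "-Xms512M" or "-Xmx512M".
# Assumption: one JVM option per line, optional trailing spaces or tabs.
HEAP_OPTION = re.compile(r"^-(Xms|Xmx)\d+[kKmMgG]?[ \t]*$", re.MULTILINE)


def set_heap_size(config_text: str, size: str = "2G") -> str:
    """Return config_text with every -Xms/-Xmx line set to the given size."""
    return HEAP_OPTION.sub(lambda m: f"-{m.group(1)}{size}", config_text)


if __name__ == "__main__":
    sample = "-Xms512M\n-Xmx512M\n"  # the values shown in the steps above
    print(set_heap_size(sample), end="")
    print(set_heap_size(sample, "4G"), end="")
```

Keeping the transformation separate from file access makes it easy to review the result before saving it to the configuration file and running the service restart command.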