A Data Repository (DR) node was found down this morning. An attempt to restart Vertica on the host failed:
admintools -t restart_node -F -s <IP_Address> -d drdata
Startup not successful. Printing errors collected from startup.log.
(Printing exhaustive errors, some errors may have been printed previously)
Host <IP_Address> error ''
--- End of startup.log errors ---
The server had rebooted unexpectedly. When the node tries to recover, these errors appear in the vertica.log file:
2025-06-29 00:42:57.574 nameless:7f72cc05f600 [Init] <INFO> Startup [Reading Catalog] Applying transaction log (bytes) - 104874246 / 112152838
2025-06-29 00:42:57.574 nameless:7f72cc05f600 [Catalog] <INFO> Opening TxnLog file txn_237660683_g211520965.cat
2025-06-29 00:42:57.778 nameless:7f72cc05f600 [Catalog] <INFO> Catalog is corrupt. Vertica is renaming TxnLogs to corrupt_
2025-06-29 00:42:57.778 nameless:7f72cc05f600 [Catalog] <INFO> Fixing... corrupted_txn_237660683_g211520965.cat --> txn_237660683_g211520965.cat, size 7277028
2025-06-29 00:42:57.872 nameless:7f72cc05f600 [Catalog] <WARNING> Partial or corrupted transaction read from /catalog/drdata/v_drdata_node0003_catalog/Catalog/Txnlogs/txn_237660683_g211520965.cat (total read: 307)
After that, these additional messages lead up to the node failure:
2025-06-29 00:43:27.360 nameless:7f71327fc700 [SAL] <WARNING> Error getting size of bundle file of [02088b24417bcfcc7bdbda8ad3a031f600c00004e811e8c9]: Unable to stat file [/data/drdata/v_drdata_node0003_data/953/02088b24417bcfcc7bdbda8ad3a031f600c00004e811e8c9_0.gt]: No such file or directory
2025-06-29 00:43:27.361 nameless:7f71327fc700 <WARNING> @v_drdata_node0003: 01000/3938: MiniRos 54043216601852351 does not have proper SAL files
Those messages end here, where the node shuts down in a failed state:
2025-06-29 00:43:27.595 nameless:7f71327fc700 [SAL] <WARNING> Error getting size of bundle file of [02088b24417bcfcc7bdbda8ad3a031f600c00004e80f06b5]: Unable to stat file [/data/drdata/v_drdata_node0003_data/005/02088b24417bcfcc7bdbda8ad3a031f600c00004e80f06b5_0.gt]: No such file or directory
2025-06-29 00:43:27.595 nameless:7f71327fc700 <WARNING> @v_drdata_node0003: 01000/3938: MiniRos 54043216601614093 does not have proper SAL files
...
2025-06-29 00:43:30.782 Main:0x7f72cc05f600-fff0000000000cc1 [Catalog] <INFO> CorruptPartition: table dauser.stg_etl_device has corrupt partition 1
2025-06-29 00:43:30.782 Main:0x7f72cc05f600-fff0000000000cc1 [Catalog] <INFO> CorruptPartition: table dauser.stg_etl_poll_item has corrupt partition 1
2025-06-29 00:43:30.782 Main:0x7f72cc05f600-fff0000000000cc1 [Catalog] <INFO> CorruptPartition: table dauser.stg_etl_item_type has corrupt partition 1
2025-06-29 00:43:30.782 Main:0x7f72cc05f600-fff0000000000cc1 [Catalog] <INFO> CorruptPartition: table dauser.item is not partitioned, projection dauser.item_by_item_id_seg_b0 has corrupt file
...
2025-06-29 00:43:30.796 Main:0x7f72cc05f600-fff0000000000cc1 [Recover] <INFO> Loading UDx libraries
2025-06-29 00:43:30.796 Main:0x7f72cc05f600-fff0000000000cc1 [Recover] <WARNING> Cannot load library file [/catalog/drdata/v_drdata_node0003_catalog/Libraries/0261a8064d1b8a5354ce406dbd72ea6600a000000000049a/IdolLib_0261a8064d1b8a5354ce406dbd72ea6600a000000000049a.so], It was built with incompatible SDK Version: 9.0.1, expected version: 10.1.0
...
2025-06-29 00:43:32.776 Main:0x7f72cc05f600-fff0000000000cc1 [Recover] <WARNING> Error during setting up function, message: The library was built with an incompatible SDK
The final failure takes place here:
2025-06-29 00:43:32.863 Main:0x7f72cc05f600-fff0000000000cc1 [Catalog] <WARNING> Couldn't load libraries
2025-06-29 00:43:32.863 Main:0x7f72cc05f600-fff0000000000cc1 <PANIC> @v_drdata_node0003: VX001/2973: Data consistency problems found; startup aborted
HINT: Check that all file systems are properly mounted. Also, the --force option can be used to delete corrupted data and recover from the cluster
LOCATION: mainEntryPoint, /data/qb_workspaces/jenkins2/ReleaseBuilds/Hammermill/REL-10_1_1-x_hammermill/build/vertica/Basics/vertica.cpp:1810
2025-06-29 00:43:32.967 Main:0x7f72cc05f600-fff0000000000cc1 [Main] <PANIC> Wrote backtrace to ErrorReport.txt
2025-06-29 00:43:32.967 Main:0x7f72cc05f600-fff0000000000cc1 [Main] <ALL> Core dumped to /catalog/drdata/v_drdata_node0003_catalog
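When the startup log is long, it can help to condense it before reading it line by line. The following is a minimal sketch, not a Vertica tool: it assumes the line layout seen in the excerpts above (timestamp, thread, optional bracketed component, level in angle brackets, message). The function name `summarize_startup_log` and the returned keys are hypothetical and chosen only for illustration. It counts messages per level, collects the paths from "Unable to stat file" warnings, and lists PANIC messages.

```python
import re
import sys
from collections import Counter

# Assumed layout, based only on the excerpts above:
# <timestamp> <thread> [optional component] <LEVEL> message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>\S+) "
    r"(?:\[(?P<component>[^\]]+)\] )?"
    r"<(?P<level>[A-Z]+)> "
    r"(?P<message>.*)$"
)
MISSING_FILE_RE = re.compile(r"Unable to stat file \[([^\]]+)\]")


def summarize_startup_log(lines):
    """Hypothetical helper: condense vertica.log lines into a small summary."""
    levels = Counter()
    missing_files = []
    panics = []
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\n"))
        if match is None:
            # Continuation lines (HINT:, LOCATION:, "...") carry no level.
            continue
        levels[match["level"]] += 1
        missing = MISSING_FILE_RE.search(match["message"])
        if missing:
            missing_files.append(missing.group(1))
        if match["level"] == "PANIC":
            panics.append((match["ts"], match["message"]))
    return {"levels": levels, "missing_files": missing_files, "panics": panics}


if __name__ == "__main__":
    # Usage: python summarize_log.py /path/to/vertica.log
    with open(sys.argv[1], errors="replace") as handle:
        summary = summarize_startup_log(handle)
    print("Messages per level:", dict(summary["levels"]))
    print("Missing data files:", len(summary["missing_files"]))
    for ts, message in summary["panics"]:
        print("PANIC at", ts, "-", message)
```

A large count of missing data files together with a "Data consistency problems found" PANIC matches the pattern shown in this article.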
All supported Network Observability DX NetOps Performance Management Data Repository Vertica database releases
The node server rebooted unexpectedly because of problems in the virtual environment that hosts it.
Ensure the virtual environment is stable, or resolve the issues causing the instability, so that the nodes remain up.
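To confirm that a node failure lines up with a host reboot, the host boot time can be compared with the first startup timestamp in vertica.log. The sketch below assumes a Linux host where `/proc/uptime` is readable. The helper names and the 60-minute window are hypothetical choices for illustration, not part of any product tooling. Note that uptime reflects only the most recent boot, so the check is meaningful only if the host has not rebooted again since the failure.

```python
from datetime import datetime, timedelta


def read_uptime_seconds(path="/proc/uptime"):
    """Read seconds since boot from the Linux /proc/uptime file."""
    with open(path) as handle:
        return float(handle.read().split()[0])


def boot_time(uptime_seconds, now):
    """Derive the boot time from the current time and the uptime."""
    return now - timedelta(seconds=uptime_seconds)


def rebooted_before(first_log_ts, uptime_seconds, now, window_minutes=60):
    """Return True if the host booted within window_minutes before the
    first startup line in vertica.log (hypothetical threshold)."""
    booted = boot_time(uptime_seconds, now)
    logged = datetime.strptime(first_log_ts, "%Y-%m-%d %H:%M:%S.%f")
    return timedelta(0) <= logged - booted <= timedelta(minutes=window_minutes)


if __name__ == "__main__":
    # Timestamp copied from the first startup line in the log excerpt.
    print(rebooted_before("2025-06-29 00:42:57.574",
                          read_uptime_seconds(), datetime.now()))
```

Both timestamps are treated as local time on the same host; if the log and the system clock use different time zones, the comparison needs an adjustment.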