VMware Smart Assurance NCM: After NCM upgrade on distributed setup, openssl connectivity from AS to all DS work fine, jobs fail on certain DS

Article ID: 345323

Products

VMware

Issue/Introduction

Symptoms:
An NCM distributed setup was recently upgraded from version 9.3 to 9.6, and the OS on the Linux servers was upgraded to RHEL 6.8. Of the 10 device servers (DS) associated with the application server (AS), jobs on devices belonging to 3 DS stopped working after the upgrade. Following the instructions in KB https://kb.vmware.com/s/article/447603, OpenSSL connectivity from the AS to the 3 non-working DS was verified to be working.

Environment

VMware Smart Assurance - NCM

Cause

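1) The upgrade/install log file on a non-working DS reports the following non-fatal errors. The "Text file busy" error means the installer could not overwrite that file because it was in use (being executed) at the time.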

Installation: Successful with errors.

1924 Successes
0 Warnings
4 NonFatalErrors
0 FatalErrors

Install File:             /apps/smarts-ncm/bin/sm_logerror
                          Status: ERROR
                          Additional Notes: ERROR - ZeroGpl: /apps/smarts-ncm/bin/sm_logerror (Text file busy)   
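To pull every failed entry out of a long install log, a small awk filter is enough. This is a minimal sketch: the function name failed_install_files is hypothetical, and it assumes each entry uses the "Install File:" / "Status:" layout shown above, with the log path passed as the first argument.

```shell
# failed_install_files LOG
# Hypothetical helper: print the path of every "Install File:" entry whose
# following "Status:" line reports ERROR.
failed_install_files() {
    awk '
        /Install File:/ { file = $NF; next }   # remember the most recent file path
        /Status:/ {
            if ($NF == "ERROR" && file != "") print file
            file = ""                          # each status closes one entry
        }
    ' "$1"
}
```

Run against the DS install log, it should list /apps/smarts-ncm/bin/sm_logerror for the entry above, plus any other files that failed to install.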
      

2) On the non-working DS, the process list shows the following. The syssyncd, commmgrd, and zebedee processes are <defunct> (zombie processes), which means these services are not operational.
root      7566  7562  0 12:58 pts/0    00:00:00 evdispatchd none
root      7568  7562 13 12:58 pts/0    00:00:26 autodiscd -r
root      8009  7562  0 13:01 pts/0    00:00:00 [syssyncd] <defunct>
root      8024  7562  0 13:01 pts/0    00:00:00 [commmgrd] <defunct>
root      8029  7562  0 13:01 pts/0    00:00:00 [zebedee] <defunct>
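The defunct daemons can be picked out of a full process list with a short filter. A minimal sketch, assuming `ps -ef` output in the format above, where a zombie's command column reads "[name] <defunct>"; find_defunct is a hypothetical helper name.

```shell
# find_defunct: read "ps -ef" output on stdin and print the name of every
# defunct (zombie) process, with the surrounding brackets removed.
find_defunct() {
    awk '$NF == "<defunct>" {
        name = $(NF - 1)
        # strip the leading "[" and trailing "]" around the command name
        if (substr(name, 1, 1) == "[") name = substr(name, 2)
        if (substr(name, length(name)) == "]") name = substr(name, 1, length(name) - 1)
        print name
    }'
}
```

Usage: `ps -ef | find_defunct`. On the non-working DS above, this would list syssyncd, commmgrd, and zebedee.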


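3) Starting the commmgrd service manually from $VOYENCE_HOME/bin fails with the following error. The binary depends on libcrypto.so.6, which cannot be found, indicating that the commmgrd binary in place is still linked against old library files rather than those shipped with NCM 9.6.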

[root@ds1 bin]# ./commmgrd -r
./commmgrd: error while loading shared libraries: libcrypto.so.6: cannot open shared object file: No such file or directory
[root@ds1 bin]# 
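Instead of starting each daemon by hand, ldd can report unresolved libraries directly, since it marks them as `<soname> => not found`. The sketch below (the helper name missing_libs is hypothetical) extracts those entries from ldd output read on stdin.

```shell
# missing_libs: read "ldd <binary>" output on stdin and print each shared
# library that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Usage: `ldd "$VOYENCE_HOME/bin/commmgrd" | missing_libs`. Given the loader error above, this would be expected to list libcrypto.so.6 on the non-working DS.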



By contrast, autodiscd starts as expected:

[root@ds1 bin]# ./autodiscd -r
20200219.13:24:14: 13243: (3)DEBUG  : getBootCounter: found entry (file) (engine id) (boot counter): (/apps/smarts-ncm/logs/voyence_devserver_autodiscd_boot_counter), (VoyenceDevServer), (51)
20200219.13:24:14: 13243: (5)INFO   : saveBootCounter: Saved counter (file) (engine id) (boot): (/apps/smarts-ncm/logs/voyence_devserver_autodiscd_boot_counter), (VoyenceDevServer), (52)
20200219.13:24:14: 13243: (6)INFO   : AuthPriv: Added auth protocol (id): (3)
20200219.13:24:14: 13243: (6)INFO   : AuthPriv: Added auth protocol (id): (2)
20200219.13:24:14: 13243: (6)INFO   : AuthPriv: Added priv protocol (id): (2)
20200219.13:24:14: 13243: (6)INFO   : AuthPriv: Added priv protocol (id): (4)
20200219.13:24:14: 13243: (6)INFO   : AuthPriv: Added priv protocol (id): (20)
20200219.13:24:14: 13243: (6)INFO   : AuthPriv: Added priv protocol (id): (21)


4) Running ldd on commmgrd shows that it still links against the NCM 9.3 version of the library (libvcdasl.so.19.0), whereas autodiscd correctly links against the NCM 9.6 version (libvcdasl.so.25.0):

[root@ds1 bin]# ldd commmgrd
        linux-vdso.so.1 =>  (0x00007fff59342000)
        libvcdasl.so.19.0 => /apps/smarts-ncm/lib/libvcdasl.so.19.0 (0x00007efdea974000)

[root@ds1 bin]# ldd autodiscd
        linux-vdso.so.1 =>  (0x00007ffe587e0000)
        libvcdasl.so.25.0 => /apps/smarts-ncm/lib/libvcdasl.so.25.0 (0x00007f48b3de2000)
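To check every binary in one pass rather than one ldd call at a time, a loop can report the libvcdasl version each executable links against. This is a sketch that assumes the version is encoded in the soname as libvcdasl.so.<version>, as in the output above; both function names are hypothetical.

```shell
# vcdasl_version: read "ldd <binary>" output on stdin and print the
# libvcdasl version it links against (e.g. 19.0 or 25.0).
vcdasl_version() {
    awk '$1 ~ /^libvcdasl\.so\./ { v = $1; sub(/^libvcdasl\.so\./, "", v); print v }'
}

# report_vcdasl_versions DIR: print "<binary> <version>" for every
# executable in DIR that links against libvcdasl.
report_vcdasl_versions() {
    for bin in "$1"/*; do
        if [ -f "$bin" ] && [ -x "$bin" ]; then
            # scripts are "not a dynamic executable"; ldd's error is discarded
            ver=$(ldd "$bin" 2>/dev/null | vcdasl_version)
            if [ -n "$ver" ]; then
                printf '%s %s\n' "${bin##*/}" "$ver"
            fi
        fi
    done
}
```

Usage: `report_vcdasl_versions "$VOYENCE_HOME/bin"`. On an affected DS, binaries reporting 19.0 instead of 25.0 are the ones left over from NCM 9.3.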




5) On a working DS, ldd shows that both binaries correctly link against the NCM 9.6 version of the library (libvcdasl.so.25.0):

[root@ds2 bin]# ldd commmgrd
        linux-vdso.so.1 =>  (0x00007ffda4b87000)
        libvcdasl.so.25.0 => /apps/smarts-ncm/lib/libvcdasl.so.25.0 (0x00007f4331090000)
[root@ds2 bin]# ldd autodiscd
        linux-vdso.so.1 =>  (0x00007ffcf61e1000)
        libvcdasl.so.25.0 => /apps/smarts-ncm/lib/libvcdasl.so.25.0 (0x00007f85cf9f8000)
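When ldd output has been saved on a working and a non-working DS, the two can be compared by soname alone, ignoring load addresses, which vary between runs and hosts. A minimal sketch; the helper names and the saved file names are hypothetical.

```shell
# sonames FILE: print the sorted, de-duplicated first field of saved ldd output.
sonames() {
    awk 'NF { print $1 }' "$1" | sort -u
}

# diff_sonames FILE_A FILE_B: "< name" appears only in FILE_A,
# "> name" appears only in FILE_B; shared libraries are not printed.
diff_sonames() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    sonames "$1" > "$tmp_a"
    sonames "$2" > "$tmp_b"
    # comm -3 prefixes lines unique to the second file with a tab
    comm -3 "$tmp_a" "$tmp_b" |
        awk -F '\t' '{ if ($1 == "") print "> " $2; else print "< " $1 }'
    rm -f "$tmp_a" "$tmp_b"
}
```

For example, save `ldd commmgrd` on ds2 and ds1 as commmgrd-ds2.ldd and commmgrd-ds1.ldd, then run `diff_sonames commmgrd-ds2.ldd commmgrd-ds1.ldd`. For the two excerpts shown above, this prints `> libvcdasl.so.19.0` and `< libvcdasl.so.25.0`.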



6) Listing $VOYENCE_HOME/bin on the non-working DS by date shows that 4 binaries (syssyncd, zebedee, commmgrd, and sm_logerror) were not updated. Files updated by NCM 9.6 are dated Jan 2019, while NCM 9.3 files are dated June 2014.

[root@ds1 bin]# ls -lrth
total 63M
-rwxr-x--- 1 root voyence 2.7M Jun  1  2014 syssyncd
-rwxr-x--- 1 root voyence 354K Jun  1  2014 zebedee
-rwxr-x--- 1 root voyence  15M Jun  1  2014 commmgrd
-rwxr-x--- 1 root voyence  16K Jun  1  2014 sm_logerror
drwxrwx--- 6 root root    4.0K Sep  2  2015 demoCA
-rwxr-x--- 1 root root    2.0K Jan 15  2019 applyPerms.pl
-rwxrwx--- 1 root root    9.5K Jan 15  2019 versiondb.pl
-rwxr-x--- 1 root voyence 1.2M Jan 15  2019 voyenced
-rwxr-x--- 1 root voyence 2.7K Jan 15  2019 voyence
-rwxr-x--- 1 root voyence 1.1M Jan 15  2019 vccheck
-rwxr-x--- 1 root voyence 1.1M Jan 15  2019 setupcron
-rwxr-x--- 1 root voyence 1.1M Jan 15  2019 sendsig
-rwxr-x--- 1 root voyence 9.4K Jan 15  2019 openssl.cnf
-rwxr-x--- 1 root voyence 4.8K Jan 15  2019 makekeys.sh
-rwxr-x--- 1 root voyence 6.2K Jan 15  2019 importcertsintods.pl
-rwxr-x--- 1 root voyence 3.6K Jan 15  2019 exportcertsintopkcs.pl
-rwxr-x--- 1 root voyence 2.8M Jan 15  2019 evdispatchd
-rwxr-x--- 1 root voyence 3.2M Jan 15  2019 cfgmgrd
-rwxr-x--- 1 root voyence 4.2K Jan 15  2019 CA.sh
-rwxr-x--- 1 root voyence  519 Jan 15  2019 vcmaster.service
-rwxr-x--- 1 root voyence 9.2K Jan 15  2019 vcmaster
-rwxr-x--- 1 root voyence  478 Jan 15  2019 tomcat.service
-rwxr-x--- 1 root voyence 7.4M Jan 15  2019 packageorder
-rwxr-x--- 1 root voyence 2.1M Jan 15  2019 libsshtools-daemon-unix.so
-rwxr-x--- 1 root voyence 9.2M Jan 15  2019 libproxy.so
-rwxr-x--- 1 root voyence 1.3M Jan 15  2019 cstdriver
-rwxr-x--- 1 root voyence  540 Jan 15  2019 controldb.service
-rwxr-x--- 1 root voyence  12M Jan 15  2019 autodiscd
-rwsr-x--- 1 root voyence 3.5M Jan 15  2019 sysmon
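Rather than scanning the listing by eye, files older than a cutoff date can be listed directly. A minimal sketch that relies on GNU find's `-newermt`; the function name stale_files is hypothetical, and the cutoff of 2019-01-01 is an assumption based on the Jan 2019 timestamps of the 9.6 files above. Files the installer never replaces (for example local configuration) would also show up, so the result needs a manual check.

```shell
# stale_files DIR CUTOFF: print regular files directly under DIR whose
# modification time is not newer than CUTOFF (any date GNU find accepts).
stale_files() {
    dir=$1 cutoff=$2
    find "$dir" -maxdepth 1 -type f ! -newermt "$cutoff" -exec basename {} \; | sort
}
```

Usage: `stale_files "$VOYENCE_HOME/bin" 2019-01-01`. The demoCA directory is skipped because only regular files are listed.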

 

Resolution

Follow these steps to resolve the issue:
  • Back up the files syssyncd, zebedee, commmgrd, and sm_logerror in the $VOYENCE_HOME/bin directory on the non-working 9.6 DS. Then copy the same files from a working 9.6 DS to the non-working DS.
  • Change the ownership of the copied files to root:voyence by running the following command in $VOYENCE_HOME/bin:

chown root:voyence syssyncd zebedee commmgrd sm_logerror
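The backup and copy steps can be scripted as follows. This is a sketch under stated assumptions: the files from the working DS have already been transferred (for example with scp) into a local staging directory, and the function and variable names are hypothetical. Ownership still has to be set with the chown command above, run as root.

```shell
# Files found not updated on the non-working DS (variable name is illustrative).
NCM_STALE_FILES="syssyncd zebedee commmgrd sm_logerror"

# replace_stale_binaries BIN_DIR STAGING_DIR BACKUP_DIR
# Back up each stale file from BIN_DIR into BACKUP_DIR, then install the
# copy taken from a working 9.6 DS that was placed in STAGING_DIR.
replace_stale_binaries() {
    bin_dir=$1 staging_dir=$2 backup_dir=$3
    # Make sure every replacement is present before touching anything.
    for f in $NCM_STALE_FILES; do
        if [ ! -f "$staging_dir/$f" ]; then
            echo "missing replacement: $staging_dir/$f" >&2
            return 1
        fi
    done
    mkdir -p "$backup_dir" || return 1
    for f in $NCM_STALE_FILES; do
        cp -p "$bin_dir/$f" "$backup_dir/$f" || return 1
        # Copy under a temporary name, then rename over the old file: a rename
        # does not fail with "Text file busy" even if the old binary is running.
        cp -p "$staging_dir/$f" "$bin_dir/$f.new" || return 1
        mv -f "$bin_dir/$f.new" "$bin_dir/$f" || return 1
    done
}
```

Usage, with hypothetical staging and backup paths: `replace_stale_binaries "$VOYENCE_HOME/bin" /tmp/ncm96-bin /tmp/ncm93-bin-backup`.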