sm_monitor utility usage

Article ID: 315858

Updated On:

Products

VMware Smart Assurance

Issue/Introduction

This article provides steps for running sm_monitor against a Smarts server.

Definition:
sm_monitor is a support tool/utility that is used to troubleshoot performance issues in Smarts domains that attach to the broker.

Environment

Smarts 10.1.x

Resolution

What is the Smarts sm_monitor support tool?
How do I gather sm_monitor data as requested by Support for Smarts?
How do I implement Smarts sm_monitor support tool?




sm_monitor is a VMware Smarts support utility that gathers performance-related data when run against a particular server. This data can help identify performance bottlenecks and similar issues.

The sm_monitor utility is provided by default in all Smarts products.

The sm_monitor files will be in the /bin, bin/system and /rules/utils directories.

Running sm_monitor for troubleshooting

You can run sm_monitor against the relevant Smarts server from a cron job as follows:

sm_monitor -s <SAM_SERVER_NAME>

Support will recommend an interval at which to run the above command in cron, typically every 10-15 minutes when troubleshooting performance-related issues. Keep sm_monitor running on this schedule until the issue being investigated recurs. Once the issue is seen again, collect a zip file of the sm_monitor log file directory. For instructions on how to collect this file, see the readme included with the utility referenced in the preceding section of this Fix statement.
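As a rough sketch of the cron setup described above, the helper below prints a crontab line that runs sm_monitor at a fixed minute interval. The install path /opt/InCharge/SAM/smarts/bin, the function name build_cron_line and the 15-minute interval are assumptions for illustration only; substitute the path of your own installation and the interval Support recommends.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line that runs sm_monitor every
# N minutes against one server. Nothing is installed; the line is only printed.
# Assumption: sm_monitor lives in $SM_BIN (adjust to your installation).
SM_BIN="${SM_BIN:-/opt/InCharge/SAM/smarts/bin}"

build_cron_line() {
    interval="$1"   # minutes between runs, for example 15
    server="$2"     # Smarts server name, as passed to sm_monitor -s
    printf '*/%s * * * * cd %s && ./sm_monitor -s %s\n' "$interval" "$SM_BIN" "$server"
}

build_cron_line 15 SAM_DOMAIN
```

The printed line can be reviewed and then added to root's crontab with crontab -e.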

Locating files and Logs for sm_monitor: 

After sm_monitor runs, it creates an SM_Monitor-<Domain Name> folder in the ../local/logs/ directory that contains a number of files. Check that the files have a modification date matching the last run of sm_monitor. Tar this SM_Monitor-<Domain Name> folder and attach it to the SR if the information is needed for troubleshooting purposes.
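The check-and-tar step can be sketched as follows. The function name archive_monitor_logs, the 30-minute modification window and the .tar.gz output name are assumptions chosen for illustration; the logs directory argument should be the local/logs directory of your own installation.

```shell
#!/bin/sh
# Hypothetical helper: archive the SM_Monitor-<Domain Name> folder for an SR.
# Assumption: logs_dir is the local/logs directory of the Smarts installation.
archive_monitor_logs() {
    logs_dir="$1"   # path to local/logs
    domain="$2"     # domain name that was passed to sm_monitor -s
    folder="SM_Monitor-$domain"

    if [ ! -d "$logs_dir/$folder" ]; then
        echo "No $folder directory under $logs_dir" >&2
        return 1
    fi

    # List files changed in the last 30 minutes to confirm the last run wrote data.
    find "$logs_dir/$folder" -type f -mmin -30

    # -C keeps the paths inside the archive relative to the logs directory.
    tar -czf "$logs_dir/$folder.tar.gz" -C "$logs_dir" "$folder"
}
```

For example, archive_monitor_logs <path to local/logs> SAM_DOMAIN would write SM_Monitor-SAM_DOMAIN.tar.gz next to the folder.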

Full sm_monitor options:
 

Usage: sm_monitor -s <server> [options... ]
      * -s - Name of InCharge Server to Monitor

Options:
                -b - Define new broker
                -m -  mem|correlation|run-all
                        mem - memory growth or crash data collection ( default )
                        correlation - Network device or interface correlated incorrect status
                        run-all* - run both mem and correlation data collection <not recommended>
                -S - Process Type ( adapter/trapd, default is server )
                -t - Turn on truss monitoring
                -z - Turn on stacktrace ouput
                -k - Turn on lock table output
                -w - Define sleeptime between loops (Default 25 for Memory only)
                -l - Define number of loops (Default 1)
                -h - Displays this usage screen
                -v - Version

* The run-all mode should be avoided if top shows any swap usage or if free memory is low.
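One way to check for swap usage before choosing run-all is sketched below. It assumes a Linux host that exposes /proc/meminfo with SwapTotal and SwapFree lines; the function names swap_used_kb and run_all_allowed are hypothetical and not part of sm_monitor.

```shell
#!/bin/sh
# Hypothetical pre-check: report swap in use (kB) from a Linux meminfo file.
# Assumption: the file has SwapTotal and SwapFree lines, as /proc/meminfo does.
swap_used_kb() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END {print total - free}' "$meminfo"
}

# Succeeds only when no swap is in use, i.e. when the swap condition
# in the note above does not apply.
run_all_allowed() {
    [ "$(swap_used_kb "$1")" -eq 0 ]
}
```

This covers only the swap condition; free memory should still be reviewed separately with top.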

Example below:

./sm_monitor -s <server name> -p <pid> -m <memory> -z <stacktrace> -l <loops> -w <sleeptime>

./sm_monitor -s SAM_DOMAIN -p 4322 -m mem -z -l 2 -w 120

If using a cron job, remove -l 2 -w 120 and control the timing through the cron schedule instead.
Run the command as root to generate the sm_monitor logs in the local/logs folder.
 


Workaround:
If there are resource concerns, you can use the commands below along with the Linux top command to evaluate resources.

If the host or host VM is using swap memory, increasing memory is imperative. If that is not possible, the number of domains on this host or host VM needs to be reduced.

dmctl -s <Domain-name> exec dmdebug --threads --output=threads.txt
dmctl -s <Domain-name> exec dmdebug --queues --output=queues.txt
dmctl -s <Domain-name> exec dmdebug --clients --output=clients.txt
dmctl -s <Domain-name> exec dmdebug --stacktrace --output=stacktrace.txt
dmctl -s <Domain-name> exec dmdebug --locktable --output=locktable.txt
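To avoid typing the five commands by hand, a small loop can generate them for a given domain. This is only a sketch: the function name dmdebug_commands is hypothetical, and the commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the five dmdebug collection commands for one domain.
# The commands are only printed; pipe the output to sh to run them.
dmdebug_commands() {
    domain="$1"
    for item in threads queues clients stacktrace locktable; do
        printf 'dmctl -s %s exec dmdebug --%s --output=%s.txt\n' "$domain" "$item" "$item"
    done
}

dmdebug_commands SAM_DOMAIN
```

Running dmdebug_commands SAM_DOMAIN | sh would execute the commands in order, assuming dmctl is on the PATH.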


Additional Information

Impact/Risks:
If the system is already facing resource issues with CPU, memory or disk space, as determined by using top or disk space commands, running sm_monitor may cause the host to run out of resources and the Smarts domains may crash.
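A disk space check before starting sm_monitor could look like the sketch below. The function names and the idea of a minimum free-space threshold are assumptions for illustration; this article does not state a required amount of free space, so the threshold is left to the caller.

```shell
#!/bin/sh
# Hypothetical pre-check: free space (kB) on the filesystem holding a directory.
# df -P gives the POSIX output format; column 4 is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# Succeeds only if at least min_kb kilobytes are available under the directory.
has_free_space() {
    dir="$1"      # for example the local/logs directory
    min_kb="$2"   # caller-chosen threshold in kB (not specified by this article)
    [ "$(free_kb "$dir")" -ge "$min_kb" ]
}
```

For example, has_free_space <path to local/logs> 1048576 || echo "Low disk space" would warn when less than 1 GB is available.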