In this example we will use the logmon probe to run a command, then use a regex filter to pull out the bi and bo values for disk I/O and store them in group variables.
The two variables we are going to capture in this example are the bi and bo columns of the vmstat output.
This is an alternative way to monitor disk I/O QoS, using the logmon probe.
This guide is intended to provide a general overview. Bear in mind that some of the Linux commands may not be available, or may produce different output, depending on your system.
First, let's take a look at the Linux command for disk I/O stats:
# vmstat --help
usage: vmstat [-V] [-n] [delay [count]]
-V prints version.
-n causes the headers not to be reprinted regularly.
-a print inactive/active page stats.
-d prints disk statistics
-D prints disk table
-p prints disk partition statistics
-s prints vm table
-m prints slabinfo
-S unit size
delay is the delay between updates in seconds.
unit size k:1000 K:1024 m:1000000 M:1048576 (default is K)
count is the number of updates.
# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 253668  21240 104600 116712    0    0     3    46   11   11  1  1 98  0  0
The simple command vmstat provides various information including IO usage:
Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
Memory
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)
Swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).
IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.
CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
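To make the column layout concrete, here is a minimal Python sketch that maps each column name to its value, using the sample output shown above. The function name parse_vmstat is hypothetical, and the sketch assumes the three-line layout of a plain vmstat run (group header, column names, one data row); other vmstat versions or options may print something different.

```python
# Sample output copied from the vmstat run shown above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 253668  21240 104600 116712    0    0     3    46   11   11  1  1 98  0  0
"""


def parse_vmstat(text):
    """Map each column name in the header row to its value in the first data row."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[1].split()    # second line holds the column names (r, b, swpd, ...)
    values = lines[2].split()   # third line holds the data row
    return dict(zip(names, (int(v) for v in values)))


stats = parse_vmstat(SAMPLE)
print("bi =", stats["bi"], "bo =", stats["bo"])  # bi = 3 bo = 46
```

Counting from the left, bi and bo are the 9th and 10th columns, which is the fact the regex later in this guide relies on.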
Tick the checkbox "Generate Quality of Service"
Variables
Next select the Variables tab:
QOS Tab
Unfortunately things are rarely that easy, and the first time round you will probably come across a problem or typo somewhere along the line. Open the "settings" section of logmon (the grey cogwheel to the left of the window).
Set the log level to 3:
Right-clicking the probe and selecting "View Log", or simply pressing CTRL+V on the probe, will show the log window.
TIP: When diagnosing logmon problems it is a good idea to change the "Check interval" to just a few seconds to speed up diagnostics. Remember to set it back to something practical once you are done.
Let's take a look at a successful pattern match:
logmon: [Linux Disk IO] NO MATCH [BI and BO Values] offset now 0
logmon: [Linux Disk IO] FORMAT START [default] - ' 1 0 227820 23316 94108 95284 0 0 3 43 1 2 1 1 98 0 0'
logmon: [Linux Disk IO] FORMAT LINES [default] - ' 1 0 227820 23316 94108 95284 0 0 3 43 1 2 1 1 98 0 0'
logmon: (scan) BI and BO Values offset 0
logmon: [Linux Disk IO] MATCH [BI and BO Values] on line 0
logmon: SREQUEST: post ->192.168.1.50/48001
logmon: RREPLY: status=OK(0) <-192.168.1.50/48001 h=37 d=28
logmon: SREQUEST: _close ->192.168.1.50/48001
logmon: (LogMon) CiOpen Device Success.
logmon: (LogMon) CiOpen Device Success.
logmon: RREPLY: status=OK(0) <-192.168.1.50/48001 h=37 d=28
logmon: SREQUEST: _close ->192.168.1.50/48001
logmon: RREPLY: status=OK(0) <-192.168.1.50/48001 h=37 d=28
logmon: SREQUEST: _close ->192.168.1.50/48001
logmon: [Linux Disk IO] used 14 ms scanning 241 bytes
logmon: (scan) - before ptScanClose...
logmon: (ptScanClose) - closing Linux Disk IO modified
logmon: (ptScanClose) - before storeInDB
logmon: (ptScanClose) - after storeInDB
logmon: (ptScanClose) - leaving
logmon: (scan) - after ptScanClose...
Let's not go into too much detail over what is happening here, but it's useful to know what you're looking for when things are working!
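If the log is long, a small script can pick out the lines that matter. The following is a rough Python sketch that assumes the log line layout seen in the excerpt above ("logmon: [profile] MATCH [watcher] ..."). The function name watcher_results and the shortened sample text are illustrative only, and the log format may differ between logmon versions.

```python
import re

# A few lines taken from the log excerpt above (the FORMAT START line is shortened).
LOG = """\
logmon: [Linux Disk IO] NO MATCH [BI and BO Values] offset now 0
logmon: [Linux Disk IO] FORMAT START [default] - ' 1 0 227820 23316'
logmon: (scan) BI and BO Values offset 0
logmon: [Linux Disk IO] MATCH [BI and BO Values] on line 0
logmon: [Linux Disk IO] used 14 ms scanning 241 bytes
"""

# Assumed layout: "logmon: [profile] MATCH|NO MATCH [watcher] ..."
RESULT = re.compile(
    r"^logmon: \[(?P<profile>[^\]]+)\] (?P<result>NO MATCH|MATCH) \[(?P<watcher>[^\]]+)\]"
)


def watcher_results(text):
    """Return (profile, watcher, result) for every MATCH / NO MATCH line."""
    found = []
    for line in text.splitlines():
        m = RESULT.match(line)
        if m:
            found.append((m.group("profile"), m.group("watcher"), m.group("result")))
    return found


for profile, watcher, result in watcher_results(LOG):
    print(f"{profile} / {watcher}: {result}")
```

The line to look for is the one reporting MATCH for the "BI and BO Values" watcher.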
The full config for this exercise can be found at the bottom of this document; just copy it into the logmon.cfg file.
As promised, let's take a look at the actual syntax for the regex "Pattern Match".
NOTE: I seriously recommend a copy of RegexBuddy or a similar regex highlighting tool if you're going to work through this example.
Using the Unix command
# vmstat
we can return the current disk I/O usage for a given machine:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0 2257564 232040   2880 206620    0    1    14    11    4    9  2 14 84  0
In this example we will use regex to pull out the bi and bo values for disk I/O and store them in group variables. The two values we are going to capture are in the 9th and 10th columns of the output above (14 and 11 in this sample).
We use the following syntax to capture these values and place them in individual groups to use as QoS variables.
/[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+([0-9]+)\s+([0-9]+)/
This looks a little confusing and overwhelming for new users of regex, but fear not: it's actually quite simple when broken down.
Logmon regex is always "wrapped" in an opening and closing "/". This is how we tell Nimsoft NMS that we are about to use regex and not standard pattern matching.
If you look closely you will see a repeating pattern in the above syntax:
[0-9]+\s+
This breaks our row of numbers into columns:
2 0 2257564 232040 2880 206620 0 1 14 11 4 9 2 14 84 0
Let's look a little closer:
[0-9] - Match any single digit from 0 to 9
+ - One or more times
\s - Match a whitespace character
+ - One or more times
Repeat that syntax for each "column". In this case there are 8 columns before the first value we want to capture for our QoS data:
[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+
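As a quick check of the column counting, the sketch below builds the eight-column prefix in Python and shows what it consumes from the sample row. Python's re module is used here only as a stand-in; logmon's own regex engine is assumed to treat these basic constructs the same way.

```python
import re

# The data row from the vmstat sample above.
ROW = "2 0 2257564 232040 2880 206620 0 1 14 11 4 9 2 14 84 0"

COLUMN = r"[0-9]+\s+"   # one numeric column plus the whitespace after it
prefix = COLUMN * 8     # the eight columns that come before bi

m = re.match(prefix, ROW)
skipped = m.group(0)          # text consumed by the eight-column prefix
remainder = ROW[m.end():]     # what is left for the capture groups

print(repr(skipped))     # '2 0 2257564 232040 2880 206620 0 1 '
print(repr(remainder))   # '14 11 4 9 2 14 84 0'
```

The remainder starts at the bi value, so the next numeric pattern in the regex lands on the 9th column.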
We then simply place parentheses around the next numeric pattern match:
([0-9]+)
This gives us our first group for the BI value in the 9th column. Adding:
\s+ - Match white space one or more times
matches the white space between the BI and BO columns. We then use a second numeric group match to capture our second QoS value, the BO value.
([0-9]+)\s+([0-9]+)
^Group 1   ^Group 2
Combining the above gives us a full syntax of:
/[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+([0-9]+)\s+([0-9]+)/
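Before pasting the pattern into logmon, it can be tried outside the probe. The following Python sketch runs the same expression (without the surrounding "/") against the sample vmstat lines from this guide. It assumes logmon searches for the pattern anywhere in the line, which is why re.search is used rather than re.match, and Python's regex engine is not guaranteed to behave identically to logmon's.

```python
import re

# The pattern from above, without the "/" wrappers that logmon requires.
PATTERN = re.compile(
    r"[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+"
    r"([0-9]+)\s+([0-9]+)"
)

HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa"
ROW = " 2 0 2257564 232040 2880 206620 0 1 14 11 4 9 2 14 84 0"

# The header lines contain no digits, so they are never matched.
print(PATTERN.search(HEADER))   # None

m = PATTERN.search(ROW)
bi, bo = m.group(1), m.group(2)  # group 1 -> $1 (bi), group 2 -> $2 (bo)
print("bi =", bi, "bo =", bo)    # bi = 14 bo = 11
```

The two groups correspond to the $1 and $2 definitions used for the variables in the config below.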
Below is a snippet from the logmon CFG file. This should be quite portable so feel free to insert into your existing logmon.cfg file.
<Linux Disk IO>
active = yes
interval = 5 min
scanfile = vmstat
scanmode = command
alarm = no
qos = yes
message =
subject =
max_alarms =
max_alarm_msg =
password =
<watchers>
<BI and BO Values>
active = yes
match = /[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+([0-9]+)\s+([0-9]+)/
level = information
subsystemid =
message =
i18n_token =
restrict =
expect = no
abort = no
sendclear = no
count = no
separator =
suppid =
source =
qos =
runcommandonmatch = no
commandexecutable =
commandarguments =
expect_message =
expect_level =
<variables>
<bi>
definition = $1
operator =
threshold =
qosactive = yes
qosname = <Default>
qostarget =
</bi>
<bo>
definition = $2
operator =
threshold =
qosactive = yes
qosname = <Default>
qostarget =
</bo>
</variables>
</BI and BO Values>
</watchers>
</Linux Disk IO>
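When editing logmon.cfg by hand it is easy to leave a section unclosed. The sketch below is a small Python checker, not logmon's own parser: it assumes the file consists only of <section>, </section> and "key = value" lines as in the snippet above, and the function name parse_cfg and the trimmed-down sample are made up for illustration.

```python
# A trimmed copy of the profile above, just enough to show the nesting.
CFG = r"""
<Linux Disk IO>
active = yes
scanfile = vmstat
scanmode = command
qos = yes
<watchers>
<BI and BO Values>
active = yes
<variables>
<bi>
definition = $1
qosactive = yes
qosname = <Default>
</bi>
<bo>
definition = $2
qosactive = yes
qosname = <Default>
</bo>
</variables>
</BI and BO Values>
</watchers>
</Linux Disk IO>
"""


def parse_cfg(text):
    """Parse nested <section> blocks into dicts; raise ValueError if unbalanced."""
    root = {}
    stack = [("", root)]              # (section name, dict) pairs, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            name = line[2:-1]
            open_name, _ = stack.pop()
            if open_name != name:
                raise ValueError(f"expected </{open_name}>, got </{name}>")
        elif line.startswith("<") and line.endswith(">"):
            section = {}
            stack[-1][1][line[1:-1]] = section
            stack.append((line[1:-1], section))
        else:
            key, _, value = line.partition("=")
            stack[-1][1][key.strip()] = value.strip()
    if len(stack) != 1:
        raise ValueError(f"section <{stack[-1][0]}> is never closed")
    return root


cfg = parse_cfg(CFG)
variables = cfg["Linux Disk IO"]["watchers"]["BI and BO Values"]["variables"]
for name, var in variables.items():
    print(name, "->", var["definition"])
```

For the sample above this lists bi with $1 and bo with $2; a missing closing tag raises a ValueError instead.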