IO Count from vmstat - logmon probe

Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

In this example we use the logmon probe to run a command, then apply a regex filter to extract the bi and bo disk I/O values from its output and store them in regex group variables.

The two values we are going to capture are the bi and bo columns of the vmstat output.
This is an alternative way to monitor disk I/O QoS, using the logmon probe.

This guide is intended as a general overview. Bear in mind that some of the Linux commands shown may not be available, or may produce different output, depending on your system.

Environment

Any logmon version

Resolution

Getting the Raw Data

First, let's look at the Linux command for disk I/O statistics:

# vmstat --help

usage: vmstat [-V] [-n] [delay [count]]
-V prints version.
-n causes the headers not to be reprinted regularly.
-a print inactive/active page stats.
-d prints disk statistics
-D prints disk table
-p prints disk partition statistics
-s prints vm table
-m prints slabinfo
-S unit size

delay is the delay between updates in seconds.
unit size k:1000 K:1024 m:1000000 M:1048576 (default is K)
count is the number of updates.

# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 253668 21240 104600 116712 0 0 3 46 11 11 1 1 98 0 0

Run without options, vmstat reports a range of information, including I/O usage:

Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.

Memory
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)

Swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).

IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).

System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.

CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
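To make the column layout concrete, here is a minimal Python sketch that pairs each header in the second line of vmstat output with the value beneath it. The sample text is copied from the vmstat output shown earlier; the function name parse_vmstat is hypothetical, and the sketch assumes the default three-line layout (group banner, column headers, one data row).

```python
def parse_vmstat(output: str) -> dict[str, int]:
    """Map each vmstat column header to its value (assumes the default 3-line output)."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    headers = lines[1].split()  # r b swpd free buff cache si so bi bo ...
    values = lines[2].split()   # the numbers, in the same column order
    return {name: int(value) for name, value in zip(headers, values)}


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 253668 21240 104600 116712 0 0 3 46 11 11 1 1 98 0 0
"""

stats = parse_vmstat(SAMPLE)
print(stats["bi"], stats["bo"])  # 3 46
```

The bi and bo entries are the two values the rest of this article extracts with logmon.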

Logmon Configuration

Deploy logmon from the archive, then right-click the probe and select Configure.
Create a new profile by right-clicking in the left-hand pane and selecting "New". Name this profile "Linux Disk IO", or whatever you choose.
Change the Mode drop-down box to "command" so that the probe issues a command on the Linux robot, and type "vmstat" in the Command text box. Select an appropriate "Check Interval": this is the time between runs of the vmstat command, and each run generates a new QoS entry.

Tick the "Generate Quality of Service" checkbox.

"Generate Alarm" is outside the scope of this document, but it could be used by creating another profile that watches for I/O values over a specific threshold. For this tutorial we are only interested in QoS.

Format Rules Tab:

Leave this tab blank and ensure that no format rules are active.

Watcher Tab:

This is the core of the configuration: it parses the output of the vmstat command and selects the required values.
Create a new watcher profile by right-clicking in the vertical pane and selecting New. For this example, the profile is called "BI and BO Values".
The Match Expression text box is where we place the regex that pattern-matches our values, and this is where things get a little more complex. The regex syntax is explained in detail in a later section.

Variables

Next select the Variables tab:

Right-click in the left-hand column and select New to create a new variable profile. In this example we call it "BI and BO Values".
Ensuring this new profile is highlighted, right-click in the Variables window and select New to create a new variable.
In the Name text box, enter "bi". Under "Source FROM position", select Match and ensure the character position is 1. This section is not the most intuitive for a new logmon user, so it is worth explaining what is happening here.
Selecting "Match Expression" uses the regex from the watcher as the variable input. Regex is a very powerful tool for pattern matching, and any regex may include "groups". Let's look at a basic example:
The\squick\sbrown\s(fox)\sjumps\sover\sthe\slazy\s(dog)
The above pattern matches the whole sentence "The quick brown fox jumps over the lazy dog"; however, the words "fox" and "dog" are enclosed in parentheses, which create "groups". Group 1 is the first parenthesized expression, in this case fox, and group 2 is the second, dog. A group can contain a complex regex expression if required.
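Python's re module numbers groups the same way, which makes it a convenient place to try a pattern out. This is only an illustration: logmon does not use Python's regex engine, and we assume that simple constructs such as \s and (...) behave the same in both.

```python
import re

pattern = re.compile(r"The\squick\sbrown\s(fox)\sjumps\sover\sthe\slazy\s(dog)")
match = pattern.search("The quick brown fox jumps over the lazy dog")

print(match.group(0))  # group 0 is the whole match: The quick brown fox jumps over the lazy dog
print(match.group(1))  # first parenthesized group: fox
print(match.group(2))  # second parenthesized group: dog
```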
Applying this understanding of regex groups to our scenario, we can use the "Match Expression" and "Character Position" settings to select one of the regex groups as a variable.
Looking at the regex used for this tutorial more closely, regex group 1 is the disk I/O bi value, which will be used as our QoS data.
Repeat the above steps to create another variable, "bo", using character position 2 so that it takes the second regex group as its input.
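The variable setup can be modelled in a few lines: each variable name is bound to a group number (the character position), and its value is that group's captured text. VARIABLES and capture_variables are hypothetical names used only for illustration, and the sample line is the data row from the first vmstat output above.

```python
import re

# Watcher match expression from this article, without logmon's enclosing "/" delimiters.
WATCHER = re.compile(
    r"[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+([0-9]+)\s+([0-9]+)"
)

# Hypothetical model of the Variables tab: variable name -> regex group ("character position").
VARIABLES = {"bi": 1, "bo": 2}


def capture_variables(line: str) -> dict[str, str]:
    """Return each variable's captured text, or an empty dict if the line does not match."""
    match = WATCHER.search(line)
    if match is None:
        return {}
    return {name: match.group(position) for name, position in VARIABLES.items()}


print(capture_variables("0 0 253668 21240 104600 116712 0 0 3 46 11 11 1 1 98 0 0"))
# {'bi': '3', 'bo': '46'}
```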

QOS Tab

In the bottom right-hand corner of the window is the "QoS on Variables" section. Ensure each of the newly created variables is checked; this generates QoS data based on these variables.
NOTE: Only numeric values can be used as QoS data. Logmon variables can also capture words and phrases, which is useful for inserting them into alarm messages, but no QoS can be generated from non-numeric (alphanumeric) values.
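The numeric-only rule can be pictured as a simple check: a captured value becomes a QoS sample only if it parses as a number. This is a hypothetical sketch of the rule as described above, not logmon's actual implementation; to_qos_value is an illustrative name.

```python
from typing import Optional


def to_qos_value(captured: str) -> Optional[float]:
    """Return the captured text as a number, or None if it cannot be used as QoS."""
    try:
        return float(captured)
    except ValueError:
        return None  # non-numeric text: usable in an alarm message, but not as QoS


print(to_qos_value("46"))    # 46.0
print(to_qos_value("sda1"))  # None
```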
All done! Click Apply and reload the probe. If all goes well, the logmon probe will start to capture QoS data.

Unfortunately, things are rarely that easy, and the first time round you will probably come across a problem or typo somewhere along the line. To troubleshoot, open the Settings section of logmon (the grey cogwheel to the left of the window) and set the log level to 3.

Right-click the probe and select "View Log" (or press CTRL+V with the probe selected) to open the log window.

TIP: When diagnosing logmon problems, it is a good idea to reduce the "Check Interval" to just a few seconds to speed up diagnostics. Remember to set it back to something practical once complete.

Let's take a look at a successful pattern match:

logmon: [Linux Disk IO] NO MATCH [BI and BO Values] offset now 0
logmon: [Linux Disk IO] FORMAT START [default] - ' 1 0 227820 23316 94108 95284 0 0 3 43 1 2 1 1 98 0 0'
logmon: [Linux Disk IO] FORMAT LINES [default] - ' 1 0 227820 23316 94108 95284 0 0 3 43 1 2 1 1 98 0 0'
logmon: (scan) BI and BO Values offset 0
logmon: [Linux Disk IO] MATCH [BI and BO Values] on line 0
logmon: SREQUEST: post ->192.168.1.50/48001
logmon: RREPLY: status=OK(0) <-192.168.1.50/48001 h=37 d=28
logmon: SREQUEST: _close ->192.168.1.50/48001
logmon: (LogMon) CiOpen Device Success.
logmon: (LogMon) CiOpen Device Success.
logmon: RREPLY: status=OK(0) <-192.168.1.50/48001 h=37 d=28
logmon: SREQUEST: _close ->192.168.1.50/48001
logmon: RREPLY: status=OK(0) <-192.168.1.50/48001 h=37 d=28
logmon: SREQUEST: _close ->192.168.1.50/48001
logmon: [Linux Disk IO] used 14 ms scanning 241 bytes
logmon: (scan) - before ptScanClose...
logmon: (ptScanClose) - closing Linux Disk IO modified
logmon: (ptScanClose) - before storeInDB
logmon: (ptScanClose) - after storeInDB
logmon: (ptScanClose) - leaving
logmon: (scan) - after ptScanClose...

We won't go into detail about everything happening here, but it's useful to know what to look for when things are working!

The full configuration for this exercise can be found at the bottom of this document; just copy it into the logmon.cfg file.

As promised, let's take a look at the actual syntax of the regex match expression.

NOTE: A regex tool such as RegexBuddy, or a similar regex highlighting tool, is strongly recommended if you are going to work through this example.
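When the log shows NO MATCH, a quick sanity check is whether any line of the command output actually carries the expected numbers. The watcher pattern needs at least ten whitespace-separated numbers on a single line, so the vmstat header lines can never match and only the data row can. The Python check below applies the same pattern to each line of the first vmstat sample in this article; the MATCH / NO MATCH labels imitate the log wording, but the script is only an illustration and is not logmon.

```python
import re

WATCHER = re.compile(
    r"[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+([0-9]+)\s+([0-9]+)"
)

VMSTAT_OUTPUT = [
    "procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------",
    "r b swpd free buff cache si so bi bo in cs us sy id wa st",
    "0 0 253668 21240 104600 116712 0 0 3 46 11 11 1 1 98 0 0",
]

results = []
for number, line in enumerate(VMSTAT_OUTPUT):
    match = WATCHER.search(line)  # header lines contain no digits, so they cannot match
    label = f"MATCH bi={match.group(1)} bo={match.group(2)}" if match else "NO MATCH"
    results.append(label)
    print(f"line {number}: {label}")
```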

Running the Unix command

# vmstat

returns the current disk I/O usage for a given machine:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 2257564 232040 2880 206620 0 1 14 11 4 9 2 14 84 0

We want to capture the bi and bo values (14 and 11 in this sample output) and store each one in its own regex group.

We use the following syntax to capture these values and place them in individual groups to use as QoS variables.

/[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+([0-9]+)\s+([0-9]+)/

It looks a little confusing and overwhelming for users new to regex, but fear not: it's actually quite simple when broken down.

Logmon regex is always wrapped in opening and closing "/" characters; this is how we tell Nimsoft NMS that we are using regex rather than standard pattern matching.
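As a rough illustration of that convention, the sketch below treats a match expression as a regex only when it is wrapped in "/", and strips the delimiters before compiling. compile_logmon_regex is a hypothetical helper; only the regex branch is modelled, and logmon's standard (non-regex) pattern matching is deliberately not reproduced.

```python
import re
from typing import Optional


def compile_logmon_regex(match_text: str) -> Optional[re.Pattern]:
    """Compile a /.../-wrapped match expression; return None for non-regex patterns."""
    if len(match_text) >= 2 and match_text.startswith("/") and match_text.endswith("/"):
        return re.compile(match_text[1:-1])  # drop the enclosing "/" delimiters
    return None  # standard pattern matching: not modelled in this sketch


regex = compile_logmon_regex(r"/([0-9]+)\s+([0-9]+)/")
print(regex.pattern)  # ([0-9]+)\s+([0-9]+)
```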

If you look closely you will see a repeating pattern in the above syntax:

[0-9]+\s+

This breaks our row of numbers into columns:

2 0 2257564 232040 2880 206620 0 1 14 11 4 9 2 14 84 0

Let's look a little closer:

[0-9] - Match any single numeric character, 0 to 9

+ - One or more times

\s - Match a whitespace character

+ - One or more times
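To see what these pieces match, Python's re.findall can list every run of digits in the sample row above (each run is one vmstat column) and every "[0-9]+\s+" unit. A quick illustration using that row:

```python
import re

row = "2 0 2257564 232040 2880 206620 0 1 14 11 4 9 2 14 84 0"

columns = re.findall(r"[0-9]+", row)   # each run of one or more digits is one column
units = re.findall(r"[0-9]+\s+", row)  # a column followed by its trailing whitespace

print(len(columns))   # 16
print(len(units))     # 15 (the last column has no trailing whitespace)
print(columns[8:10])  # ['14', '11'] -> the 9th and 10th columns, bi and bo
```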

Repeat that pattern for each column; in this case there are 8 columns before the first value we want to capture for our QoS data:

[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+
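Because the eight units are identical, the prefix can be built by repetition. Matching only that prefix against the sample row shows that it consumes the r through so columns and stops exactly where bi begins:

```python
import re

row = "2 0 2257564 232040 2880 206620 0 1 14 11 4 9 2 14 84 0"

prefix = r"[0-9]+\s+" * 8        # the eight repeated column units
consumed = re.match(prefix, row)  # match from the start of the row
print(repr(row[consumed.end():]))  # '14 11 4 9 2 14 84 0'
```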

We then simply place parentheses around the next numeric pattern match:

([0-9]+)

This gives us our first group, for the bi value in the 9th column. Adding:

\s+ - Match one or more whitespace characters

captures the whitespace between the bi and bo columns. We then use a second numeric group to capture our second QoS value, the bo value.

([0-9]+)\s+([0-9]+)

Here the first ([0-9]+) is Group 1 (bi) and the second ([0-9]+) is Group 2 (bo).

Combining the above gives us a full syntax of:

/[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+([0-9]+)\s+([0-9]+)/
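Before pasting the expression into logmon, it can be checked against sample output. The Python sketch below strips the enclosing "/" delimiters (Python's re does not use them) and applies the pattern to the three data lines that appear in this article, including the line from the logmon log above. It assumes logmon's regex engine treats these simple constructs the same way Python does.

```python
import re

LOGMON_MATCH = r"/[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+([0-9]+)\s+([0-9]+)/"
pattern = re.compile(LOGMON_MATCH.strip("/"))  # drop the logmon "/" delimiters

samples = [
    "2 0 2257564 232040 2880 206620 0 1 14 11 4 9 2 14 84 0",
    "0 0 253668 21240 104600 116712 0 0 3 46 11 11 1 1 98 0 0",
    " 1 0 227820 23316 94108 95284 0 0 3 43 1 2 1 1 98 0 0",  # line from the logmon log
]

results = [pattern.search(line).groups() for line in samples]  # (group 1, group 2) per line
for bi, bo in results:
    print(f"bi={bi} bo={bo}")
```

The leading space in the last sample is harmless: search() simply starts the match at the first digit.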

Logmon CFG

Below is a snippet from the logmon.cfg file. It should be quite portable, so feel free to insert it into your existing logmon.cfg file.

<Linux Disk IO>
active = yes
interval = 5 min
scanfile = vmstat
scanmode = command
alarm = no
qos = yes
message =
subject =
max_alarms =
max_alarm_msg =
password =
<watchers>
<BI and BO Values>
active = yes
match = /[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+([0-9]+)\s+([0-9]+)/
level = information
subsystemid =
message =
i18n_token =
restrict =
expect = no
abort = no
sendclear = no
count = no
separator =
suppid =
source =
qos =
runcommandonmatch = no
commandexecutable =
commandarguments =
expect_message =
expect_level =
<variables>
<bi>
definition = $1
operator =
threshold =
qosactive = yes
qosname = <Default>
qostarget =
</bi>
<bo>
definition = $2
operator =
threshold =
qosactive = yes
qosname = <Default>
qostarget =
</bo>
</variables>
</BI and BO Values>
</watchers>
</Linux Disk IO>