Monitoring count of files with same year and day in the name

Article ID: 225583

Products

DX Infrastructure Management
CA Unified Infrastructure Management for z Systems
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)
NIMSOFT PROBES

Issue/Introduction

We have a need to monitor for duplicate files in a directory that have a date in the name.

For instance,

ABCPXXXXX-INV-2021-09-24-01-21.pgp

ABCPXXXXX-INV-2021-09-24-01-48.pgp

I've tried to use regular expressions in the file name, but these appear to work only in directory names; when I try to use the fetch value, it returns 0 files.

I've tried several different matching patterns such as

ABCPXXXXX-INV-%Y-%m-%d*.pgp

with no luck. The documentation only mentions date-related regex for directory names, but can you confirm that it will not work for file names?

Environment

Release : 20.3

Component : UIM - DIRSCAN

Resolution

Create a logmon profile that runs a command or script to determine whether there are duplicate files in a folder, then sends a custom alarm message. The logmon Watcher supports more robust and useful regular expressions than the dirscan probe, and you can use logmon's time-formatting primitives.

Here is one example of a Watcher regex for a changing file name that uses year, month, day, hour, minute, and second:

TextLog_%y.%m.%d_%H.%M.%S.log

But in this case you may be able to simply run the script, parse the resulting output, and send a custom alarm message.
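As a sketch of such a script (the directory path and filename prefix are illustrative assumptions based on the example names above, and the "DUPLICATE:" message text is a placeholder), the following counts files that share the same year-month-day in the name and prints one line per duplicated date, which a logmon Watcher could then match to raise the alarm:

```shell
#!/bin/sh
# Sketch: extract the YYYY-MM-DD portion of each matching filename,
# count how many files share each date, and report any date with
# more than one file. DIR and the message text are placeholders.
DIR=/path/to/dir
ls "$DIR" \
  | sed -n 's/^ABCPXXXXX-INV-\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)-.*\.pgp$/\1/p' \
  | sort | uniq -c \
  | awk '$1 > 1 { print "DUPLICATE: " $1 " files dated " $2 }'
```

With the two example files from the Issue section present, this prints one line reporting two files dated 2021-09-24, and prints nothing when every date is unique.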

Reference:

logmon Advanced IM Configuration

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/ca-unified-infrastructure-management-probes/GA/alphabetical-probe-articles/logmon-log-monitoring/logmon-im-configuration/logmon-advanced-im-configuration.html

So, you could check for dupe files based on size, content, or name. Here is an example of checking for dupe files on Linux by MD5 checksum (content). You can find more scripts via a web search.

awk '{ md5=$1; a[md5]=md5 in a ? a[md5] RS $2 : $2; b[md5]++ } END{ for(x in b) if(b[x]>1) printf "Duplicate Files (MD5:%s):\n%s\n",x,a[x] }' <(find . -type f -exec md5sum {} +)

Sample results:

# awk '{
  md5=$1
  a[md5]=md5 in a ? a[md5] RS $2 : $2
  b[md5]++ }
  END{for(x in b)
        if(b[x]>1)
          printf "Duplicate Files (MD5:%s):\n%s\n",x,a[x] }' <(find . -type f -exec md5sum {} +)

Duplicate Files (MD5:d41d8cd98f00b204e9800998ecf8427e):
./ABCPXXXXX-INV-2021-09-24-01-21.pgp
./ABCPXXXXX-INV-2021-09-24-01-48.pgp

In this case (my scenario above), both files are empty and therefore have the same checksum, so they are flagged as dupes.
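Since size is also listed above as a dupe criterion, a cheaper first pass is to group files by byte size before checksumming anything. This is only a sketch (it uses GNU find's -printf and does not handle paths containing spaces):

```shell
# Sketch: list groups of files that share the same byte size.
# Same-size files are only *candidate* dupes; confirm with a
# checksum pass such as the md5sum example above.
find . -type f -printf '%s %p\n' | sort -n \
  | awk '{ n[$1]++; f[$1]=f[$1] RS $2 }
         END{ for(s in n) if(n[s]>1) printf "Same size (%s bytes):%s\n",s,f[s] }'
```

Two empty .pgp files like those above would be grouped under 0 bytes; running the MD5 example on just those candidates then confirms which are true duplicates.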