We need to monitor a directory for duplicate files whose names contain a date.
For instance,
ABCPXXXXX-INV-2021-09-24-01-21.pgp
ABCPXXXXX-INV-2021-09-24-01-48.pgp
I've tried using regular expressions in the file name, but these appear to work only in directory names: when I use the fetch value, it returns 0 files.
I've tried several different matching patterns such as
ABCPXXXXX-INV-%Y-%m-%d*.pgp
with no luck. The documentation only mentions date-related regex for directory names. Can you confirm that it will not work for file names?
Release : 20.3
Component : UIM - DIRSCAN
Create a logmon profile that runs a command or script to determine whether there are duplicate files in a folder, and send a custom alarm message. The logmon Watcher supports more robust regex than the dirscan probe, and you can also use the 'time-formatting' primitives in logmon.
Here is one example of a Watcher regex for a changing file name using year, month, day, hour, minute and second:
TextLog_%y.%m.%d_%H.%M.%S.log
In this case, though, it may be enough to run the script, parse its output, and send a custom alarm message.
You could check for duplicate files based on size, content, or name. Here is an example of checking for duplicate files on Linux by MD5 checksum (content). You can find more scripts via web search.
awk '{ md5=$1; a[md5]=(md5 in a) ? a[md5] RS $2 : $2; b[md5]++ } END{ for(x in b) if(b[x]>1) printf "Duplicate Files (MD5:%s):\n%s\n",x,a[x] }' <(find . -type f -exec md5sum {} +)
Sample results:
# awk '{
md5=$1
a[md5]=md5 in a ? a[md5] RS $2 : $2
b[md5]++ }
END{for(x in b)
if(b[x]>1)
printf "Duplicate Files (MD5:%s):\n%s\n",x,a[x] }' <(find . -type f -exec md5sum {} +)
Duplicate Files (MD5:d41d8cd98f00b204e9800998ecf8427e):
./ABCPXXXXX-INV-2021-09-24-01-21.pgp
./ABCPXXXXX-INV-2021-09-24-01-48.pgp
In my test scenario above, both files are empty, so they have the same MD5 checksum and are reported as duplicates.
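If the duplicates should be detected by name rather than by content (for example, more than one file carrying the same date), a similar script can group files by the date portion of the name. The following is only a minimal sketch, assuming file names shaped like the examples above (PREFIX-YYYY-MM-DD-HH-MM.pgp). The function name find_same_date_files and the DUPLICATE_DATE output tag are hypothetical names chosen for illustration and are not part of UIM, logmon or dirscan.

```shell
#!/usr/bin/env bash
# Sketch: report files in one directory that share the same prefix and date.
# Assumption: names look like ABCPXXXXX-INV-2021-09-24-01-21.pgp.
# find_same_date_files and the DUPLICATE_DATE tag are illustrative names only.

find_same_date_files() {
    local dir="${1:-.}"
    local f
    for f in "$dir"/*.pgp; do
        # Skip anything that is not a regular file (including an unmatched glob).
        if [ -f "$f" ]; then
            printf '%s\n' "${f##*/}"    # file name without the directory part
        fi
    done | awk '
        # Locate the first YYYY-MM-DD in the name; lines without a date are ignored.
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
            key = substr($0, 1, RSTART + RLENGTH - 1)    # prefix plus date
            names[key] = (key in names) ? names[key] "\n" $0 : $0
            count[key]++
        }
        END {
            # Print one block per prefix+date that occurs on more than one file.
            for (k in count)
                if (count[k] > 1)
                    printf "DUPLICATE_DATE %s (%d files):\n%s\n", k, count[k], names[k]
        }'
}
```

Calling find_same_date_files /path/to/folder prints nothing when every date is unique, so a logmon Watcher could, as an assumption about how you set up the profile, match on the DUPLICATE_DATE text to raise the custom alarm.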