Before rolling out a patch in a large SSIM environment, it is advisable to follow these guidelines:
1- Bring down the number of incidents within the system:
- Run an advanced SQL query such as: "SELECT COUNT(INCIDENT_ID) FROM SYMCMGMT.SYMC_IMR_INCIDENT_LIST_VIEW". The value returned is the number of items in the DB. Keep this number as low as possible, since doing so reduces the size of the DB. If the value is more than 20000 incidents/alerts, they need to be purged.
- Verify "Incident Archiving" settings in your web interface for the correlation/archive appliance(s):
- Engineering recommends setting Incidents to 20 days (removing all open and closed incidents). Short term should be set to 14 days and long term to 30 days.
- Reducing the number of incidents/alerts also helps improve correlation performance.
2- Bring down the rate of incidents entering the system:
Review default rules that can trigger a large number of entries (especially if some Agents are out of time sync or turned off):
- Alerts created by the system sensor malfunction rule.
- Alerts created by invalid event time
Make sure no rules trigger false-positive incidents:
- Incidents created by the IP watch list rule.
- Custom made rules
3- If you have changed the purge settings, wait for the next purge job and run your advanced SQL query again to see how many incidents/alerts are left in the DB.
4- It would be advisable to reduce the number of events coming in during the update as it might take a while (in a multi SSIM environment with high EPS):
Review your top 5 sensors and, if possible, disable them all for the duration:
- Only disable the sensors that have a pointer file so they will resume their reading where they stopped
- Busy sensors such as the Checkpoint Collector or database sensors will resume reading events where they stopped, so you won't lose any information
This reduces the load on the system during the reboot period and avoids too much queuing on the agents and collector appliances.
5- Make sure all the appliances' configurations are "In Sync"
- Go to System Tab -> Administration
- Right click on appliance and select properties then go to the Services tab
- Review all the configurations and make sure the "In Sync" flag is set to Yes
- If this is not the case, try to distribute them and restart the corresponding service if needed.
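The incident-count check from steps 1 and 3 can be scripted if you have database access from a workstation. The following is only a minimal sketch: it assumes you can obtain a Python DB-API 2.0 cursor connected to the SSIM database (how you connect is not covered here), and the helper names incident_count and needs_purge are made up for illustration. The query and the 20000 threshold come from step 1.

```python
# Sketch only: the cursor is assumed to be any DB-API 2.0 cursor
# already connected to the SSIM database (connection not shown).

PURGE_THRESHOLD = 20000  # figure quoted in step 1

INCIDENT_COUNT_SQL = (
    "SELECT COUNT(INCIDENT_ID) FROM SYMCMGMT.SYMC_IMR_INCIDENT_LIST_VIEW"
)


def incident_count(cursor):
    """Run the count query and return the number of incidents/alerts."""
    cursor.execute(INCIDENT_COUNT_SQL)
    return int(cursor.fetchone()[0])


def needs_purge(count, threshold=PURGE_THRESHOLD):
    """Return True when the count is above the recommended threshold."""
    return count > threshold
```

Run it once before changing the purge settings and again after the next purge job to confirm the count has dropped.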
Once the steps above have been done you can start the update process:
#Before starting to back up your system, verify the following:
- User name -> Make sure all the user names (db2admin, cn=root, Administrator) are the same on all appliances. It is critical that you know all the user names and passwords before starting your migration. To change a password, see: www.symantec.com/docs/TECH89260
- LiveUpdate -> All appliances need to be up to date, with LiveUpdate completed.
- Disk space -> Make sure there is disk space available on each appliance on all volumes. Read the patch documentation, as it always specifies the disk requirement.
- When updating with Maintenance Patch 3 (MP3), be aware that the tar file is already 600 MB and takes 1.3 GB of disk once extracted. You will need at least 2 GB free on top of this.
- Pre Patch requirement -> Verify your /etc/ssim-history on each appliance and make sure you have the required patch installed.
- Save all work and close any Information Manager console sessions. Make sure all console users are aware of the downtime and do not run jobs or change settings during the migration. Before running the upgrade, you can log in to the web interface and open the "User Sessions" page to see who is logged in.
#Backup -> Make a full DB2 and LDAP backup (in general there is no rollback option once you start the update).
Also follow this article if relevant: http://www.symantec.com/docs/TECH89265
Order in which to apply patches:
This is the most important step: you need to understand your architecture and what job each appliance is fulfilling.
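The disk space item in the checklist above can be verified with a few lines of Python on any machine where the volumes are mounted. This is a sketch under assumptions: it adds up the MP3 figures quoted above (600 MB tar file, 1.3 GB extracted, 2 GB extra) on the assumption that the tar file stays on the same volume as the extracted files, and the function names are illustrative. Always use the figures from the readme of the patch you are actually applying.

```python
import shutil

MB = 1024 ** 2
GB = 1024 ** 3

# MP3 figures from the checklist: tar file + extracted files + free headroom.
# Assumption: the tar file is kept on the same volume while extracting.
MP3_REQUIRED_BYTES = 600 * MB + int(1.3 * GB) + 2 * GB


def has_enough_space(free_bytes, required_bytes=MP3_REQUIRED_BYTES):
    """Return True when the free space covers the requirement."""
    return free_bytes >= required_bytes


def check_volumes(paths, required_bytes=MP3_REQUIRED_BYTES):
    """Map each mount point to True/False depending on its free space."""
    return {
        path: has_enough_space(shutil.disk_usage(path).free, required_bytes)
        for path in paths
    }
```

For example, check_volumes(["/", "/opt"]) would report both volumes, where the paths are only placeholders for the volumes present on your appliance.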
#Apply any patch or update in this order:
- Master LDAP -> Open the System tab -> Administration and browse to directories. It should display the Primary
- Replica LDAP -> Open the System tab -> Administration and browse to directories. It should display the Replica
- Correlation -> Open the System tab -> Appliance tab (check if correlation box is enabled)
- Archive -> Open the System tab -> Appliance tab -> Event Archive (check what the machine is archiving)
- Collector appliance 1
- Collector appliance 2
- Collector appliance n+1
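If you keep an inventory of your appliances, the order above can be derived mechanically. The sketch below assumes each appliance is recorded with exactly one role; the role labels and the patch_order function are invented for this example and are not SSIM identifiers. An appliance that holds several roles should be placed by its earliest role in the list.

```python
# Role labels are hypothetical; they mirror the order listed above.
ROLE_ORDER = ["master_ldap", "replica_ldap", "correlation", "archive", "collector"]


def patch_order(appliances):
    """Return appliance names sorted into the recommended patch order.

    appliances is a list of (name, role) tuples. The sort is stable, so
    appliances sharing a role keep the order in which they were listed.
    """
    rank = {role: position for position, role in enumerate(ROLE_ORDER)}
    ordered = sorted(appliances, key=lambda item: rank[item[1]])
    return [name for name, _role in ordered]
```

An unknown role raises a KeyError, which is intentional here: it is better to stop than to guess where an unclassified appliance belongs.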
To apply an update or upgrade:
- Make sure you review and study the readme and release notes.
- Before running the script, check the integrity of the files (md5sum).
Update only one appliance at a time. For each appliance, after the post-update reboot:
- Wait around 5 minutes before logging in (run the status command and check that each service has been up for 5 minutes).
- Verify that event flow, correlation, etc. are working.
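The integrity check is normally done with the md5sum command on the appliance. Where a script is more convenient, the same comparison can be sketched in Python with the standard hashlib module. The function names are illustrative, and the expected checksum is whatever value is published alongside the patch you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected_hex):
    """Compare a file's MD5 with the published value, ignoring case and spaces."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat even for a patch archive of several hundred megabytes.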
- Make sure that all the Event queues are flushed and empty -> Web configuration -> Event Service option
- Once you are happy with the update, make a backup of LDAP and DB2
- Go to the next appliance (following order above)
Important: In the case of a 4.7 Maintenance Pack 4 upgrade, be sure to update all the machines in your environment to version 4.7.4.54 before updating the master LDAP to a later version. If the master is updated beyond 4.7.4.54 first, you will get the error message: "this is a replica sim, first upgrade the directory master". Once all your machines are on version 4.7.4.54, you can start applying patches on the master LDAP. This is because the install script expects an exactly matching version.
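The version rule above can be checked before touching the master LDAP. The following sketch assumes you have already collected the version string of each appliance by hand into a dictionary; how the version is read from an appliance is not shown, and the function names are made up for this example.

```python
REQUIRED_BASELINE = "4.7.4.54"  # version named in the note above


def parse_version(text):
    """Turn a dotted version string such as '4.7.4.54' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def appliances_not_on_baseline(versions, baseline=REQUIRED_BASELINE):
    """Return the sorted names of appliances whose version differs from the baseline.

    versions maps an appliance name to its version string.
    """
    target = parse_version(baseline)
    return sorted(
        name for name, version in versions.items()
        if parse_version(version) != target
    )


def safe_to_patch_master(versions, baseline=REQUIRED_BASELINE):
    """True only when every appliance reports exactly the baseline version."""
    return not appliances_not_on_baseline(versions, baseline)
```

The comparison is an exact match rather than "at least", mirroring the note that the install script expects the exact matching version.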