This knowledge document gives an overview of basic Fault Isolation and Fault Suppression in Spectrum.
Please reference the "Fault Isolation" section of the documentation for additional information.
Release : All Supported Releases
Component : SPCCSS - Spectrum Core / SpectroSERVER
Spectrum relies on the modeling of the devices in the database and on how those models are connected. When two device models are connected, Spectrum populates what is referred to as the "Neighbor Table" on each model. Accurate modeling of the network is therefore vital for accurate Fault Isolation and Fault Suppression.
The examples below use the following basic modeling:
Sim15090 has Sim15089 and Sim15098 as neighbors.
Sim15089 has only Sim15090 as a neighbor.
Sim15098 has Sim15090 and Sim15095 as neighbors.
Sim15095 has only Sim15098 as a neighbor.
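As a rough illustration, the Neighbor Tables above can be written as a Python dictionary. This is not a Spectrum API or export format. The `NEIGHBORS` name and the `is_symmetric` helper are hypothetical. The check only confirms that each connection is recorded on both models, which is what happens when two models are connected.

```python
# Hypothetical dict form of the example Neighbor Tables:
# model name -> set of neighbor model names (not a Spectrum data format).
NEIGHBORS = {
    "Sim15090": {"Sim15089", "Sim15098"},
    "Sim15089": {"Sim15090"},
    "Sim15098": {"Sim15090", "Sim15095"},
    "Sim15095": {"Sim15098"},
}


def is_symmetric(table):
    """Return True if every connection appears in both models' Neighbor Tables."""
    return all(
        model in table[neighbor]
        for model, neighbors in table.items()
        for neighbor in neighbors
    )


print(is_symmetric(NEIGHBORS))  # True
```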
When Spectrum loses contact with a model, Fault Isolation checks the status of the model's neighbors. If ALL of the model's neighbors are unreachable, Spectrum has no way to know the state of the model: the device associated with the model could still be up, but because Spectrum cannot communicate with any of its neighbors, Spectrum cannot tell either way. So Spectrum suppresses the model (the model turns gray) and asserts the 0x10d36 event, stating that the VNM is unable to contact this model because all of this model's neighbors are unreachable.
If Spectrum can communicate with at least one neighbor, then Spectrum logically assumes this model is the root cause of the lost contact. Spectrum asserts a Critical lost contact alarm on the model.
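The decision rule above can be sketched in Python. This is a simplified model of the behavior described in this article, not Spectrum's implementation: the `isolate` function, the `reachable` set, and the "CRITICAL" and "SUPPRESSED" labels are hypothetical, and models with no neighbors are deliberately left out of scope.

```python
# Hypothetical dict form of the example Neighbor Tables.
NEIGHBORS = {
    "Sim15090": {"Sim15089", "Sim15098"},
    "Sim15089": {"Sim15090"},
    "Sim15098": {"Sim15090", "Sim15095"},
    "Sim15095": {"Sim15098"},
}


def isolate(model, neighbors, reachable):
    """Apply the basic Fault Isolation rule to a model Spectrum lost contact with.

    reachable is the set of models Spectrum can currently communicate with.
    """
    if not neighbors[model]:
        # Models with no neighbors are outside the scope of this sketch.
        raise ValueError(f"{model} has no neighbors to check")
    if neighbors[model] & reachable:
        # At least one neighbor answers: treat this model as the root cause.
        return "CRITICAL"
    # No neighbor answers: the model's state is unknown, so it is
    # suppressed (gray) and event 0x10d36 is asserted.
    return "SUPPRESSED"


print(isolate("Sim15089", NEIGHBORS, {"Sim15090"}))  # CRITICAL
```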
The following is an example of each.
Spectrum lost contact with model Sim15089. Spectrum checks the status of its only neighbor, Sim15090. Spectrum can communicate with Sim15090, so Spectrum asserts a Critical lost contact alarm on Sim15089.
Spectrum lost contact with Sim15095. Spectrum checks the status of its only neighbor, Sim15098. Spectrum cannot communicate with Sim15098, so Spectrum asserts a Suppressed condition on Sim15095.
Since Spectrum is not able to communicate with Sim15098, Spectrum checks the status of its neighbors. Spectrum cannot communicate with Sim15095, but it can communicate with Sim15090, so Spectrum asserts a Critical lost contact alarm on Sim15098.
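The three examples can be replayed with a simplified version of the same rule. As an assumption of this sketch, every model that has not lost contact is treated as reachable, which ignores network paths. `run_example`, `isolate`, and the status labels are hypothetical names.

```python
# Hypothetical dict form of the example Neighbor Tables.
NEIGHBORS = {
    "Sim15090": {"Sim15089", "Sim15098"},
    "Sim15089": {"Sim15090"},
    "Sim15098": {"Sim15090", "Sim15095"},
    "Sim15095": {"Sim15098"},
}
ALL_MODELS = set(NEIGHBORS)


def isolate(model, neighbors, reachable):
    """CRITICAL if any neighbor answers, otherwise SUPPRESSED."""
    return "CRITICAL" if neighbors[model] & reachable else "SUPPRESSED"


def run_example(lost):
    """Evaluate every model Spectrum lost contact with, in a stable order."""
    reachable = ALL_MODELS - lost
    return {model: isolate(model, NEIGHBORS, reachable) for model in sorted(lost)}


print(run_example({"Sim15089"}))
# {'Sim15089': 'CRITICAL'}
print(run_example({"Sim15095", "Sim15098"}))
# {'Sim15095': 'SUPPRESSED', 'Sim15098': 'CRITICAL'}
```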
A group of two or more interconnected models is referred to as a "Fault Domain". The example above is a Fault Domain of four models.
Out of the box, when contact is lost to ALL of the models in a Fault Domain, Spectrum will suppress ALL of the models in the Fault Domain and assert a Critical "Unresolved Fault" on the Fault Isolation model.
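A Fault Domain can be thought of as a connected group of models in the neighbor graph. The hypothetical sketch below finds that group with a breadth-first walk and, when every model in it is unreachable, suppresses them all and flags an Unresolved Fault, matching the out-of-the-box behavior described above. The function names and return values are illustrative only and do not cover other Spectrum configurations.

```python
from collections import deque

# Hypothetical dict form of the example Neighbor Tables.
NEIGHBORS = {
    "Sim15090": {"Sim15089", "Sim15098"},
    "Sim15089": {"Sim15090"},
    "Sim15098": {"Sim15090", "Sim15095"},
    "Sim15095": {"Sim15098"},
}


def fault_domain(start, neighbors):
    """Return every model connected to start (breadth-first walk of Neighbor Tables)."""
    seen, queue = {start}, deque([start])
    while queue:
        model = queue.popleft()
        for neighbor in neighbors[model]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def evaluate_domain(start, neighbors, reachable):
    """Return (statuses for unreachable models, unresolved_fault flag)."""
    domain = fault_domain(start, neighbors)
    down = domain - reachable
    if down == domain:
        # Contact lost to ALL models in the domain: suppress them all and
        # flag an Unresolved Fault (on the Fault Isolation model by default).
        return {m: "SUPPRESSED" for m in domain}, True
    # Otherwise apply the per-model neighbor check.
    statuses = {
        m: "CRITICAL" if neighbors[m] & reachable else "SUPPRESSED" for m in down
    }
    return statuses, False


statuses, unresolved = evaluate_domain("Sim15089", NEIGHBORS, reachable=set())
print(len(statuses), unresolved)  # 4 True
```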
The following is an example.
The Unresolved Fault functionality can be configured to assert the Unresolved Fault alarm on a model in the Fault Domain. Please reference the "How to configure Spectrum so that 'Unresolved Faults' generate on a device rather than the Fault Isolation Application Model" knowledge article.
If you know there is going to be a network outage for maintenance, you can use Spectrum's "Maintenance Mode" (MM) to suppress alarms on the model placed into MM, as well as on neighbor models when contact is lost to a neighbor of a model that is in MM. Keep in mind that it does not matter whether the model in MM is up and available. Because it is in MM, Spectrum does not send SNMP or ICMP to it, so, as a neighbor, it will not respond to a request from Fault Isolation.
Also remember that if the model in MM is still up and running, Spectrum will continue to be able to communicate with its downstream neighbors, because network traffic still passes through the model in MM to the models downstream. So if Spectrum loses contact with a model further downstream (for example, a model downstream of a downstream neighbor of the model in MM) and that model still has a reachable neighbor, Spectrum will still assert a Critical lost contact alarm on that model.
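The two Maintenance Mode points can be separated in a sketch: whether traffic can reach a model depends on which devices are actually up, while whether a model answers Fault Isolation also depends on MM. The sketch assumes, as the examples in this article suggest, that the SpectroSERVER reaches every model through Sim15090 (`ENTRY`). That assumption and all function names are hypothetical.

```python
from collections import deque

# Hypothetical dict form of the example Neighbor Tables.
NEIGHBORS = {
    "Sim15090": {"Sim15089", "Sim15098"},
    "Sim15089": {"Sim15090"},
    "Sim15098": {"Sim15090", "Sim15095"},
    "Sim15095": {"Sim15098"},
}
# Assumption: the path from the SpectroSERVER enters the topology at Sim15090.
ENTRY = "Sim15090"


def network_reachable(neighbors, powered_up, entry=ENTRY):
    """Models the SpectroSERVER can reach: walk outward from the entry model,
    passing only through devices that are up. MM does not stop forwarding."""
    if entry not in powered_up:
        return set()
    seen, queue = {entry}, deque([entry])
    while queue:
        model = queue.popleft()
        for neighbor in neighbors[model]:
            if neighbor in powered_up and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def answers_fault_isolation(neighbors, powered_up, in_mm):
    """Models that respond to Fault Isolation: reachable and NOT in MM,
    because Spectrum sends no SNMP or ICMP to a model in MM."""
    return network_reachable(neighbors, powered_up) - in_mm


all_up = set(NEIGHBORS)
print(sorted(network_reachable(NEIGHBORS, all_up)))
# ['Sim15089', 'Sim15090', 'Sim15095', 'Sim15098']
print(sorted(answers_fault_isolation(NEIGHBORS, all_up, in_mm={"Sim15090"})))
# ['Sim15089', 'Sim15095', 'Sim15098']
```

With every device up, all four models are still reachable through Sim15090, but Sim15090 itself no longer answers Fault Isolation while it is in MM.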
The following is an example of each:
Model Sim15090 is placed into MM. Sim15090 may or may not be up and running on the network. Spectrum loses contact with Sim15089. Since the only neighbor to Sim15089 is in MM, Spectrum will suppress Sim15089.
Model Sim15090 is placed into MM. Sim15090 is up and running on the network. Spectrum loses contact with Sim15095. Spectrum checks the status of its neighbor Sim15098. Spectrum CAN communicate with Sim15098, so Spectrum asserts a Critical lost contact alarm on Sim15095.
If Sim15090 were shut down, Spectrum would lose contact with all the models downstream. In this instance, Spectrum would suppress all of the downstream neighbors as seen below.
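Combining the reachability model with the neighbor check reproduces the three Maintenance Mode examples. As before, `ENTRY`, `network_reachable`, and `evaluate` are hypothetical names, the entry point at Sim15090 is an assumption, and the sketch does not model Unresolved Fault alarms or any other Spectrum internals.

```python
from collections import deque

# Hypothetical dict form of the example Neighbor Tables.
NEIGHBORS = {
    "Sim15090": {"Sim15089", "Sim15098"},
    "Sim15089": {"Sim15090"},
    "Sim15098": {"Sim15090", "Sim15095"},
    "Sim15095": {"Sim15098"},
}
ENTRY = "Sim15090"  # assumption: the path from the SpectroSERVER enters here


def network_reachable(neighbors, powered_up, entry=ENTRY):
    """Breadth-first walk from the entry model through devices that are up."""
    if entry not in powered_up:
        return set()
    seen, queue = {entry}, deque([entry])
    while queue:
        model = queue.popleft()
        for neighbor in neighbors[model]:
            if neighbor in powered_up and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def evaluate(neighbors, powered_up, in_mm):
    """Return the Fault Isolation result for each model Spectrum lost contact with."""
    answering = network_reachable(neighbors, powered_up) - in_mm
    results = {}
    for model in neighbors:
        if model in in_mm or model in answering:
            continue  # MM models are not polled; answering models need no action
        results[model] = "CRITICAL" if neighbors[model] & answering else "SUPPRESSED"
    return results


everything = set(NEIGHBORS)
mm = {"Sim15090"}
# Sim15090 in MM and up; Sim15089 goes down.
print(evaluate(NEIGHBORS, everything - {"Sim15089"}, mm))
# {'Sim15089': 'SUPPRESSED'}
# Sim15090 in MM and up; Sim15095 goes down.
print(evaluate(NEIGHBORS, everything - {"Sim15095"}, mm))
# {'Sim15095': 'CRITICAL'}
# Sim15090 in MM and shut down: every downstream model is suppressed.
print(evaluate(NEIGHBORS, everything - {"Sim15090"}, mm))
# {'Sim15089': 'SUPPRESSED', 'Sim15098': 'SUPPRESSED', 'Sim15095': 'SUPPRESSED'}
```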