NerveCenter: Downstream Alarm Suppression

Understanding How the Model Works

The first downstream alarm suppression model, which included DSCollectRoutes, DSIcmpStatus, and DSSnmpStatus, used information about local routers to determine the status of an unreachable node. If a route existed for the node, the node was assumed to be down; otherwise, it was marked as unreachable. In either case, the node was suppressed. For simple networks that consisted of nodes behind routers, this model was adequate. However, for more complex networks with multiple routers, switches, and hubs, and for certain routing protocols, the new model provides a more accurate determination of a node's status.

What is a complex network, as opposed to a simple network? A simple network might include single parent-child relationships. Nodes that are dependent on other nodes for a route to the NerveCenter server are child nodes. Nodes on which other nodes are dependent are parent nodes.

A Simple Network


A more complex network might include nodes with multiple parents and nodes that are themselves parents to other nodes.

A More Complex Network

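The parent-child relationships described above can be pictured as a small mapping from each node to its parents. The following Python sketch is illustrative only: the node names, the `parents` table, and the helper functions are hypothetical and are not part of NerveCenter.

```python
# Hypothetical topology: each node maps to the parents it depends on
# for a route to the NerveCenter server. All names are illustrative.
parents = {
    "router-a": [],
    "router-b": [],
    "switch-1": ["router-a", "router-b"],  # a node with multiple parents
    "host-1": ["switch-1"],                # child of a node that is itself a child
    "host-2": ["switch-1"],
}

def children_of(node, parents):
    """Return the nodes that list `node` as a direct parent."""
    return sorted(child for child, ps in parents.items() if node in ps)

def is_simple(parents):
    """True when every node has at most one parent (assumed 'simple' case)."""
    return all(len(ps) <= 1 for ps in parents.values())
```

Here `switch-1` is both a child (of the two routers) and a parent (of the two hosts), which is the kind of arrangement that makes a network complex in the sense used above.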

The new model uses the status of devices between NerveCenter and managed nodes in the network to make real-time determinations about whether nodes are up, down, or unreachable. NerveCenter can then take appropriate actions based on the statuses of those nodes. For example, suppose NerveCenter is monitoring 1000 nodes, and 300 nodes behind a router stop responding to polls. NerveCenter can use the status of the router and any intermediate devices to determine whether the nodes are down or unreachable. If the nodes are actually down, NerveCenter forwards the appropriate alarms to the network management platform; however, if they are unreachable, NerveCenter just forwards one critical alarm for the router and uses built-in Open NerveCenter Smart Polling technology to stop suppressible polls for those nodes until they are available again.
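The router example can be sketched as follows. This is a simplified Python illustration, assuming a single router with no intermediate devices; the function name `plan_actions` and the shape of its return value are hypothetical and do not reflect NerveCenter's internals.

```python
def plan_actions(silent_nodes, router, router_status):
    """Decide what to forward when nodes behind one router stop responding.

    Assumption: a router that is still up means the silent nodes are
    really down; otherwise they are treated as unreachable.
    """
    if router_status == "up":
        # Nodes are actually down: forward an alarm for each one.
        return {
            "alarms": [(node, "down") for node in silent_nodes],
            "suppress_polls": [],
        }
    # Nodes are unreachable: one critical alarm for the router,
    # and suppressible polls to the nodes behind it are stopped.
    return {
        "alarms": [(router, "critical")],
        "suppress_polls": list(silent_nodes),
    }

silent = [f"node-{i}" for i in range(300)]
when_router_down = plan_actions(silent, "router-1", "down")
when_router_up = plan_actions(silent, "router-1", "up")
```

With the router down, the platform receives a single alarm instead of 300, and polling of the 300 nodes is paused until they are available again.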

NerveCenter can get information about the nodes in the following ways:

Once NerveCenter has that relationship information, the DwnStrmSnmpStatus and DwnStrmIcmpStatus alarms monitor nodes and maintain their statuses in the NerveCenter database.

NerveCenter Maintains Parent-Child Relationship and Status Information


The following exceptions apply to HP OpenView:

Nodes can have the following statuses: up, testing, down, and unreachable. Any node that responds has a status of up. The first time a node does not respond, its status is set to testing. While a node is in testing, its status is not updated again until NerveCenter determines that the node is up, down, or unreachable.

NerveCenter decides whether the node is down or unreachable based on whether the node has parents, whether the parents' statuses are more current than the node's last status update, and what those statuses are.

The model uses the following logic:

The key is to only update a node's status when NerveCenter can make a definitive decision about the status based on real-time data, which can only happen when the parents' status is more current than the node's status.
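One way to picture this rule is the Python sketch below. The exact rules the model applies are not reproduced in this section, so the branches here are assumptions made for illustration: a node with no parents is treated as down, all parents must be more current than the node before any decision is made, a parent that is up implies the node is down, and parents that are all down or unreachable imply the node is unreachable. The `NodeState` type and `resolve` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class NodeState:
    status: str        # "up", "testing", "down", or "unreachable"
    updated_at: float  # time of the last status update

def resolve(node, parent_states):
    """Return a new status for a node in testing, or None if undecided."""
    if not parent_states:
        # Assumption: with no parents, nothing upstream explains the silence.
        return "down"
    if any(p.updated_at <= node.updated_at for p in parent_states):
        # Parent data is not more current than the node's: no decision yet.
        return None
    if any(p.status == "up" for p in parent_states):
        # Assumption: a live path exists, so the node itself is down.
        return "down"
    if all(p.status in ("down", "unreachable") for p in parent_states):
        return "unreachable"
    return None  # some parent is still in testing

node = NodeState("testing", updated_at=10)
```

Returning `None` models the "do not update" case: the node stays in testing until the parents' data is fresh enough to support a definitive decision.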

If the node does go to down or unreachable, NerveCenter continues to monitor the node and its parents to determine if the node is available again, if the parents' statuses have affected the status of the node, or if there has been no change.

For example, the figure Updating a Node's Status Depends on When the Parent's Status Was Last Updated shows a node that has one parent. At T0, the node does not respond to an SNMP poll, so the alarm transitions from ground to error and the node's status is updated to testing. If the node does not respond to a second poll at T1, the alarm transitions from error to testing, but the node's status is not updated. On a circular transition that loops back to the testing state, a Perl subroutine checks, and continues to check, the parent's status. At T2, the parent's status has been updated. Since the parent's status is now more current than the node's status, the alarm transitions to unreachable and the node's status is set to unreachable. At T3, the parent's status has not changed, so the node's status is not updated.

Updating a Node's Status Depends on When the Parent's Status Was Last Updated


As long as the parent's status remains down and is more current than the node's status every time the Perl subroutine checks it, the node's status is refreshed.
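The T0 to T3 timeline can be replayed with a short Python sketch. It is written in Python purely for illustration and is not the Perl subroutine the alarm uses; the `check` function and the `parent_seen` table are hypothetical, and the parent's status before T2 is assumed to be an older "up" reading.

```python
def check(node_status, node_updated, parent_status, parent_updated, now):
    """One pass of the parent check; returns (status, updated_at)."""
    if parent_updated > node_updated and parent_status == "down":
        # Parent data is more current and the parent is down:
        # set (or refresh) the node's status as unreachable.
        return "unreachable", now
    return node_status, node_updated  # no definitive decision: leave as-is

# Parent as seen at each check: (status, time of the parent's last update).
parent_seen = {
    1: ("up", -1),   # T1: parent data predates the node's T0 update
    2: ("down", 2),  # T2: parent updated, now more current than the node
    3: ("down", 2),  # T3: parent unchanged since T2
}

status, updated = "testing", 0  # T0: first missed poll sets testing
history = [(0, status, updated)]
for t in (1, 2, 3):
    p_status, p_updated = parent_seen[t]
    status, updated = check(status, updated, p_status, p_updated, t)
    history.append((t, status, updated))
```

In `history`, the node's update time changes only at T2. At T3 the parent's data is no longer more current than the node's, so nothing is written; a later parent update that still reports down would refresh the node's status again.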

If you are running the NerveCenter Server on Windows and you are running DwnStrmIcmpStatus_LogToDB and DwnStrmSnmpStatus_LogToDB, you can run reports on the availability of managed nodes. Three reports included with this version of NerveCenter provide a summary of availability (availsum.rpt), the status of each node by property group (availstat.rpt), and a list of all transitions for each node (availtrans.rpt). The figure Summary of Node Availability shows an example of the summary of availability report.

Summary of Node Availability


For more details about the new downstream alarm suppression model, see Understanding the Technical Details.


29 July 2003