NerveCenter™: Downstream Alarm Suppression

Alarms

The downstream alarm suppression behavior model monitors node status using both SNMP and ICMP. This section includes descriptions of the following alarms:

DwnStrmSnmpStatus Alarm

This alarm accurately monitors the status of nodes and their SNMP agents by taking into consideration the status of the nodes' parents. This alarm is the same as the DwnStrmSnmpStatus_LogToDB version, except that the DwnStrmSnmpStatus_LogToDB version also logs data on most transitions.

[Figure: DwnStrmSnmpStatus/DwnStrmSnmpStatus_LogToDB Alarm State Diagram]

The following table lists the severity of each state:

Severities of Each State in DwnStrmSnmpStatus

State         Severity   Color
Ground        Normal     Green
Error         Normal     Green
Testing       Normal     Green
AgentDown     Minor      Yellow
DeviceDown    Critical   Red
Unreachable   Inform     Purple


When this alarm is turned on, the following polls and masks cause state transitions:

This alarm uses the following Perl subroutines:

The following sections describe the states in the DwnStrmSnmpStatus alarm and the transitions and actions that can happen from those states:

Ground State

In Ground state, the node is reachable and the SNMP agent is up.

As long as the node and agent respond to the SnmpPoll and SnmpFastPoll requests, the agentUp circular transition is triggered. The agentUp transition calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to refresh the update time.

If the node does not respond to the polls, the following triggers can transition the alarm from Ground to Error:

Transitions to the Error state call the SetNodeStatusTesting Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status to Testing.

ICMP_ERROR also calls the SS_IcmpError Perl subroutine (see SS_IcmpError Perl Subroutine). If the SS_IcmpError Perl subroutine determines that the port is unreachable, it fires SS_PortUnreach. The SS_PortUnreach trigger does the following:

Error State

The alarm suppression behavior model uses the Error state to confirm that there is actually a problem (as opposed to a dropped packet, for example). From the Error state, a node can transition back to Ground, to Testing, or to AgentDown.

If the node and agent respond to the SnmpFastPoll request, the agentUpFast transition is triggered. The agentUpFast transition does the following:

If the node still does not respond to the poll, the following triggers transition the alarm from Error to Testing:

If the SS_IcmpError Perl subroutine determines that the port is unreachable, it fires SS_PortUnreach. The SS_PortUnreach trigger does the following:

Testing State

While an alarm is in the Testing state, NerveCenter identifies whether the node is:

If SS_PortUnreach is triggered by the SS_IcmpError Perl subroutine while the node transitioned from Error to Testing or if SS_nodeUpFast is triggered by SS_IcmpFastPoll, the trigger:

If SSF_IcmpError is triggered by SS_IcmpFastPoll, the trigger:

If SS_nodeUpFast results in a circular ICMP_TIMEOUT transition, the TestParentStatus Perl subroutine (see TestParentStatus Perl Subroutine) looks up the status of the parents. If TestParentStatus can determine the node's state based on the parents' status, TestParentStatus fires the appropriate trigger: UnReachable or Down.

AgentDown State

While an alarm is in the AgentDown state, NerveCenter continues to monitor the node for any changes.

As long as the node responds to the SS_IcmpPoll requests, the SSnodeUp transition is triggered. The SSnodeUp transition calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to refresh the update time.

If the node does not respond to the polls, the following triggers transition the node from AgentDown to Testing:

Each transition calls the SetNodeStatusTesting Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status to Testing.

If NerveCenter receives a warmStart trap or a coldStart trap, or agentUp is triggered in response to an SnmpPoll response, the trigger:

Unreachable State

While an alarm is in the Unreachable state, NerveCenter continues to monitor the node for any changes.

If NerveCenter receives a coldStart trap or SSnodeUp is triggered by a response to SS_IcmpPoll, the trigger:

If the poll does not get a response and an ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentSetNode Perl subroutine (see TestParentSetNode Perl Subroutine), which looks up the status of the parents. If TestParentSetNode can determine the node's state based on the parents' status, TestParentSetNode fires the Down trigger or refreshes the node's update time.

The Down trigger:

DeviceDown State

While an alarm is in the DeviceDown state, NerveCenter continues to monitor the node for any changes.

If NerveCenter receives a coldStart trap or the SSnodeUpFast transition is triggered by an SS_IcmpFastPoll, the trigger:

If SSF_IcmpError is triggered by SS_IcmpFastPoll, the trigger:

If the poll does not get a response and a circular ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentSetNode Perl subroutine (see TestParentSetNode Perl Subroutine), which looks up the status of the parents. If TestParentSetNode can determine the node's state based on the parents' status, TestParentSetNode either fires the Unreachable trigger or refreshes the node's update time.

The Unreachable trigger:

DwnStrmIcmpStatus Alarm

This alarm accurately monitors the status of nodes by taking into consideration the status of the nodes' parents. This alarm is the same as the DwnStrmIcmpStatus_LogToDB version, except that the DwnStrmIcmpStatus_LogToDB version also logs data on most transitions.

[Figure: DwnStrmIcmpStatus/DwnStrmIcmpStatus_LogToDB Alarm State Diagram]

The following table lists the severity of each state:

Severities of Each State in DwnStrmIcmpStatus

State         Severity   Color
Ground        Normal     Green
Error         Normal     Green
Testing       Normal     Green
DeviceDown    Critical   Red
Unreachable   Inform     Purple


When this alarm is turned on, the following polls cause state transitions:

This alarm uses the following Perl subroutines:

The following sections describe the states in the DwnStrmIcmpStatus alarm and the transitions and actions that can happen from those states:

Ground State

In Ground state, the node is reachable.

As long as the node responds to the IS_IcmpPoll requests, the ISnodeUp transition is triggered. The ISnodeUp transition calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to refresh the update time.

If the node does not respond to the polls, the following triggers can transition the alarm from Ground to Error:

Transitions to the Error state call the SetNodeStatusTesting Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status to Testing.

Error State

The alarm suppression behavior model uses the Error state to confirm that there is actually a problem (as opposed to a dropped packet, for example). From the Error state, an alarm can transition back to Ground or to Testing.
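
The confirmation step can be pictured as a small transition table. The sketch below models only the Ground, Error, and Testing path described above. The event names response and no_response are hypothetical stand-ins for the alarm's actual triggers, and run_polls is an illustrative helper, not something NerveCenter provides.

```perl
use strict;
use warnings;

# Transition table for the confirmation path only. The event names
# "response" and "no_response" are hypothetical stand-ins for the
# alarm's real triggers.
my %next_state = (
    Ground => { response => "Ground", no_response => "Error"   },
    Error  => { response => "Ground", no_response => "Testing" },
);

# Feed a sequence of poll outcomes through the table, starting in Ground.
sub run_polls {
    my @events = @_;
    my $state  = "Ground";
    foreach my $event (@events) {
        last unless exists $next_state{$state};   # Testing and beyond are not modeled
        $state = $next_state{$state}{$event};
    }
    return $state;
}

print run_polls("no_response", "response"), "\n";      # Ground
print run_polls("no_response", "no_response"), "\n";   # Testing
```

A single missed poll followed by a reply returns the alarm to Ground, so a dropped packet never reaches Testing. Two consecutive misses do.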

If the node responds to the IS_IcmpFastPoll request, the ISnodeUpFast transition is triggered. The trigger:

If the node still does not respond to the poll, the following triggers transition the alarm from Error to Testing:

Testing State

While an alarm is in the Testing state, NerveCenter identifies whether the node is down or unreachable.

If ISnodeUpFast is triggered in response to an IS_IcmpFastPoll poll while the node is in the Testing state, the trigger:

If ISF_IcmpError is triggered in response to an IS_IcmpFastPoll poll while the node is in the Testing state, the trigger:

If ISnodeUpFast results in a circular ICMP_TIMEOUT transition, NerveCenter calls the TestParentStatus Perl subroutine (see TestParentStatus Perl Subroutine), which looks up the status of the parents. If TestParentStatus can determine the node's state based on the parents' status, TestParentStatus fires the appropriate trigger: UnReachable or Down.

Unreachable State

While an alarm is in the Unreachable state, NerveCenter continues to monitor the node for any changes.

If ISnodeUp is triggered by a response to IS_IcmpPoll, the trigger:

If the poll does not get a response and a circular ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentSetNode Perl subroutine (see TestParentSetNode Perl Subroutine), which looks up the status of the parents. If TestParentSetNode can determine the node's state based on the parents' status, TestParentSetNode either fires the Down trigger or refreshes the node's update time.

The Down trigger:

DeviceDown State

While an alarm is in the DeviceDown state, NerveCenter continues to monitor the node for any changes.

If the ISnodeUpFast transition is triggered by an IS_IcmpFastPoll, the trigger:

If ISF_IcmpError is triggered in response to an IS_IcmpFastPoll poll while the node is in the DeviceDown state, the trigger:

If the poll does not get a response and a circular ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentSetNode Perl subroutine (see TestParentSetNode Perl Subroutine), which looks up the status of the parents. If TestParentSetNode can determine the node's state based on the parents' status, TestParentSetNode either fires the Unreachable trigger or refreshes the node's update time.

The Unreachable trigger:

Perl Subroutines

The new downstream alarm suppression behavior model uses several Perl subroutines to store parent-child relationships and maintain node statuses. This section includes descriptions of the following Perl subroutines:

SS_IcmpError Perl Subroutine

The ICMP_ERROR transition calls this Perl subroutine to evaluate the error and determine whether or not it indicates that the node is unreachable. If the ICMP error is Port Unreachable, the node is up and reachable. It is assumed that other ICMP errors indicate an unreachable node. This assumption may be incorrect depending on the behavior of your network. To include other ICMP errors that indicate that the node is unreachable, modify this Perl subroutine.


   my $Type = VbValue( 0 );
   my $Code = VbValue( 1 );
   if( $Type == 3 && $Code == 3 )
   {
    FireTrigger( "SS_PortUnreach" );
   }
   else
   {
    # Modify this else to eliminate other types of
    # ICMP errors that are not indicative of an
    # unreachable node. The assumption is that if
    # SS_IcmpError is fired, we are being told, by
    # the network, that the node is unreachable.
    FireTrigger( "SS_IcmpError" );
   }
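
The comment in the else branch invites site-specific refinement. As a minimal sketch of one such refinement: RFC 792 states that Destination Unreachable (type 3) codes 2 (protocol unreachable) and 3 (port unreachable) come from the destination host itself, so either one suggests the node is up. Treating code 2 like code 3 is an assumption made here for illustration, not part of the shipped behavior model. The helper classify_icmp_error is hypothetical. It takes plain arguments and returns the trigger name so that it runs outside NerveCenter.

```perl
use strict;
use warnings;

# Hypothetical helper: maps an ICMP type/code pair to the name of the
# trigger the subroutine would fire.
sub classify_icmp_error {
    my ($type, $code) = @_;

    # Type 3 is Destination Unreachable. Codes 2 (protocol) and 3 (port)
    # are sent by the destination host, so the node itself answered.
    if ($type == 3 && ($code == 2 || $code == 3)) {
        return "SS_PortUnreach";
    }

    # Everything else is treated as "the network says unreachable".
    return "SS_IcmpError";
}

print classify_icmp_error(3, 3), "\n";   # SS_PortUnreach (port unreachable)
print classify_icmp_error(3, 1), "\n";   # SS_IcmpError (host unreachable)
```

Inside NerveCenter, the two arguments would come from VbValue( 0 ) and VbValue( 1 ), and the returned name would be passed to FireTrigger, as in the listing above.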

SetNodeStatus Perl Subroutines

For the DwnStrmSnmpStatus and DwnStrmIcmpStatus alarms, all state transitions -- except transitions from Error to Testing -- call one of the following Perl subroutines:

These Perl subroutines update the node status so the node's children can accurately update their statuses based on the node's status.

SetNodeStatusTesting


   my $Return;
   $Return = NC::SetNodeStatus($NodeName,"Testing");
   #If $Return = 0, operation failed

SetNodeStatusDown


   my $Return;
   $Return = NC::SetNodeStatus($NodeName,"Down");
   #If $Return = 0, operation failed

SetNodeStatusUnreachable


   my $Return;
   $Return = NC::SetNodeStatus($NodeName,"Unreachable");
   #If $Return = 0, operation failed

SetNodeStatusUp


   my $Return;
   $Return = NC::SetNodeStatus($NodeName,"Up");
   #If $Return = 0, operation failed
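
Each listing stores the return value but leaves the failure case to the reader. The sketch below shows one way to act on it. Because the NC module is available only inside NerveCenter, set_node_status and %status_table are hypothetical stand-ins that mimic only the documented contract (a return value of 0 means the operation failed). The wrapper set_status_checked is likewise illustrative.

```perl
use strict;
use warnings;

# Stand-in for the node status store (hypothetical, for illustration).
my %status_table = ( routerA => { status => "Up", updated => 0 } );

# Stand-in for NC::SetNodeStatus: returns 1 on success and 0 on failure,
# matching the comment in the listings above. Unknown nodes fail here
# purely so the failure path can be demonstrated.
sub set_node_status {
    my ($node, $status) = @_;
    return 0 unless exists $status_table{$node};
    $status_table{$node} = { status => $status, updated => time() };
    return 1;
}

# Wrapper that reports a failed update instead of silently ignoring it.
sub set_status_checked {
    my ($node, $status) = @_;
    my $ok = set_node_status($node, $status);
    warn "Could not set $node to $status\n" unless $ok;
    return $ok;
}

set_status_checked("routerA", "Testing");   # succeeds
set_status_checked("routerB", "Down");      # unknown node: prints a warning
```

Surfacing the failure matters because children rely on the parent's recorded status and update time. An update that fails without notice would leave them working from stale data.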

TestParentStatus Perl Subroutine

For the DwnStrmSnmpStatus and DwnStrmIcmpStatus alarms, if a node is in a Testing state, the ERROR trigger is fired every time the node is polled and doesn't respond. Each resulting ERROR transition calls the TestParentStatus Perl subroutine.

The TestParentStatus Perl subroutine tests the parent node status and determines the status of the node by doing the following:

If TriggerFlag is set to Testing, TestParentStatus does nothing because TestParentStatus must have more information to make an accurate decision. If the alarm should be in another state, TestParentStatus fires the appropriate trigger to transition the node into that state.

The code for this subroutine follows:

   # The purpose of this subroutine is to test the parent
   # node status and fire the appropriate trigger to take the
   # alarm to either down or unreachable. You must make sure
   # that all parents are being monitored with the status
   # alarms.
   use NC;

   my $NodeUpdateTime;         # Last time node status was updated
   my $LastNodeStatus;         # Last node status
   my @Parents = ();           # Array of parents
   my $Parent;                 # Parent node
   my $ParentUpdateTime;       # Last time parent node status was updated
   my $ParentStatus;           # Last parent status
   my $TriggerFlag = "NotSet";
   my $ParentNotUpdated = 0;   # Remember if we have any parents not updated

   # Define all triggers that can be fired
   DefineTrigger('UnReachable');
   DefineTrigger('Down');
   DefineTrigger('Testing');

   # Get the last node status and update time for this node
   ($LastNodeStatus,$NodeUpdateTime) = NC::GetNodeStatus($NodeName);

   # Get the array of parents for this node
   @Parents = NC::GetParents($NodeName);

   if( defined( $Parents[0] ) )
   {
       # Test each parent, if ANY are ok, we assume the node
       # is reachable. Parents update time must be past the
       # last time the node was updated or we can't assume the
       # status is accurate.
       foreach $Parent (@Parents)
       {
           ($ParentStatus,$ParentUpdateTime) = NC::GetNodeStatus($Parent);
           if( $ParentUpdateTime >= $NodeUpdateTime )
           {
               # Using TriggerFlag to store name of trigger to be fired. If any
               # parent is found to be up, then the flag will be set to down. If
               # all parents are down or unreachable, then the flag will be set
               # to unreachable. If no parents are down and at least one parent
               # is testing, set flag to testing. Otherwise, it will remain not
               # set and we will update the node's current status and time. Testing
               # handles the case where one parent is testing and another is
               # unreachable. We need to make sure we do not mark the node as
               # unreachable until the parent node in testing goes to some final
               # state because that state could be agent down which is treated
               # as up.
               if( ($ParentStatus eq "Down" || $ParentStatus eq "UnReachable") && $TriggerFlag eq "NotSet")
               {
                   $TriggerFlag = "UnReachable";
               }
               elsif( $ParentStatus eq "Up" )
               {
                   $TriggerFlag = "Down";
               }
               elsif( $ParentStatus eq "Testing" && $TriggerFlag ne "Down" )
               {
                   $TriggerFlag = "Testing";
               }
           }
           else
           {
               # Remember that we have at least one parent that hasn't been updated.
               $ParentNotUpdated = 1;
           }
       }
   }
   else
   {
       # If no parents, assume node is down.
       $TriggerFlag = "Down";
   }

   # If I have at least one parent not updated and I do not have
   # any Up parents, Set TriggerFlag to testing.
   if( $ParentNotUpdated && $TriggerFlag ne "Down" )
   {
       $TriggerFlag = "Testing";
   }

   if( $TriggerFlag ne "Testing" )
   {
       # Fire trigger if node's status should change.
       if( $TriggerFlag ne $LastNodeStatus )
       {
           # Fire trigger
           FireTrigger( $TriggerFlag );
       }
   }
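
The flag logic in the listing above can be tried outside NerveCenter. The sketch below restates it as a pure function so that individual parent combinations can be checked by hand. The name decide_trigger_flag and the [status, update_time] pair layout are illustrative choices, not part of the NC module, and the timestamps are arbitrary numbers.

```perl
use strict;
use warnings;

# Returns the value TriggerFlag would end up with, given the node's last
# update time and a list of [status, update_time] pairs for its parents.
sub decide_trigger_flag {
    my ($node_update_time, @parents) = @_;

    return "Down" unless @parents;    # no parents: assume the node is down

    my $flag        = "NotSet";
    my $not_updated = 0;

    foreach my $parent (@parents) {
        my ($status, $updated) = @$parent;
        if ($updated < $node_update_time) {
            $not_updated = 1;         # stale parent, cannot trust its status
            next;
        }
        if (($status eq "Down" || $status eq "UnReachable") && $flag eq "NotSet") {
            $flag = "UnReachable";
        }
        elsif ($status eq "Up") {
            $flag = "Down";
        }
        elsif ($status eq "Testing" && $flag ne "Down") {
            $flag = "Testing";
        }
    }

    # A stale parent forces Testing unless some parent is known to be up.
    $flag = "Testing" if $not_updated && $flag ne "Down";
    return $flag;
}

# The node was last updated at t=100 in every call.
print decide_trigger_flag(100, ["Up", 120], ["Down", 130]), "\n";       # Down
print decide_trigger_flag(100, ["Down", 120]), "\n";                    # UnReachable
print decide_trigger_flag(100, ["Down", 120], ["Testing", 125]), "\n";  # Testing
print decide_trigger_flag(100, ["Down", 90]), "\n";                     # Testing
```

The first call shows that one reachable parent is enough to conclude that the node itself is down. The last two show the subroutine declining to decide while a parent is still being tested or has not reported since the node's own last update.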

TestParentSetNode Perl Subroutine

For the DwnStrmSnmpStatus and DwnStrmIcmpStatus alarms, if an alarm is in a DeviceDown or Unreachable state, the ERROR trigger is fired every time the node is polled and doesn't respond. Each resulting ERROR transition calls the TestParentSetNode Perl subroutine.

The TestParentSetNode Perl subroutine tests the parent node status and determines the status of the node by doing the following:

If TriggerFlag is set to Testing, TestParentSetNode does nothing because TestParentSetNode must have more information to make an accurate decision. If the alarm should be in another state, TestParentSetNode fires the appropriate trigger to transition the alarm into that state. If the alarm is already in the correct state, TestParentSetNode just refreshes the node update time so the node's children can accurately update their statuses based on the node's status.

The code for this subroutine follows:

   # The purpose of this subroutine is to test the parent
   # node status and, if the node is not in a terminal state
   # but should be, fire a trigger to make it so. If the node
   # is already in the correct state, just refresh the node
   # update time. You must make sure that all parents are
   # being monitored with the status alarms.
   use NC;

   my $NodeUpdateTime;         # Last time node status was updated
   my $LastNodeStatus;         # Last node status
   my @Parents = ();           # Array of parents
   my $Parent;                 # Parent node
   my $ParentUpdateTime;       # Last time parent node status was updated
   my $ParentStatus;           # Last parent status
   my $TriggerFlag = "NotSet";
   my $ParentNotUpdated = 0;   # Remember if we have any parents not updated

   # Define all triggers that can be fired
   DefineTrigger('UnReachable');
   DefineTrigger('Down');
   DefineTrigger('Testing');

   # Get the last node status and update time for this node
   ($LastNodeStatus,$NodeUpdateTime) = NC::GetNodeStatus($NodeName);

   # Get the array of parents for this node
   @Parents = NC::GetParents($NodeName);

   if( defined( $Parents[0] ) )
   {
       # Test each parent, if any are ok, we assume the node
       # is reachable. Parents update time must be past the
       # last time the node was updated or we can't assume the
       # status is accurate.
       foreach $Parent (@Parents)
       {
           ($ParentStatus,$ParentUpdateTime) = NC::GetNodeStatus($Parent);
           if( $ParentUpdateTime >= $NodeUpdateTime )
           {
               # Using TriggerFlag to store name of trigger to be fired. If any
               # parent is found to be up, then the flag will be set to down. If
               # all parents are down or unreachable, then the flag will be set
               # to unreachable. If no parents are down and at least one parent
               # is testing, set flag to testing. Otherwise, it will remain not
               # set and we will update the node's current status and time. Testing
               # handles the case where one parent is testing and another is
               # unreachable. We need to make sure we do not mark the node as
               # unreachable until the parent node in testing goes to some final
               # state because that state could be agent down which is treated
               # as up.
               if( ($ParentStatus eq "Down" || $ParentStatus eq "UnReachable") && $TriggerFlag eq "NotSet" )
               {
                   $TriggerFlag = "UnReachable";
               }
               elsif( $ParentStatus eq "Up" )
               {
                   $TriggerFlag = "Down";
               }
               elsif( $ParentStatus eq "Testing" && $TriggerFlag ne "Down" )
               {
                   $TriggerFlag = "Testing";
               }
           }
           else
           {
               # Remember that we have at least one parent that hasn't been updated.
               $ParentNotUpdated = 1;
           }
       }
   }
   else
   {
       # Node does not have parents so assume down
       $TriggerFlag = "Down";
   }

   # If I have at least one parent not updated and I do not have
   # any up parents, Set TriggerFlag to testing.
   if( $ParentNotUpdated && $TriggerFlag ne "Down" )
   {
       $TriggerFlag = "Testing";
   }

   if( $TriggerFlag ne "Testing" )
   {
       # Fire trigger if node's status should change. Otherwise
       # refresh the time for the node's current state.
       if( $TriggerFlag ne $LastNodeStatus )
       {
           # Fire trigger
           FireTrigger( $TriggerFlag );
       }
       else
       {
           # Refresh node status
           NC::SetNodeStatus($NodeName,$LastNodeStatus);
       }
   }
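
TestParentSetNode differs from TestParentStatus only in its final step, which can end in one of three ways. The sketch below isolates that step. The helper final_action and the strings it returns are hypothetical, used only to label the three outcomes. In the listing above, the corresponding actions are doing nothing, calling FireTrigger, and calling NC::SetNodeStatus.

```perl
use strict;
use warnings;

# Hypothetical helper: given the computed TriggerFlag and the node's last
# recorded status, describe what the end of TestParentSetNode would do.
sub final_action {
    my ($trigger_flag, $last_status) = @_;

    # Not enough information yet: leave the alarm where it is.
    return "wait" if $trigger_flag eq "Testing";

    # The alarm is in the wrong state: fire the trigger that moves it.
    return "fire $trigger_flag" if $trigger_flag ne $last_status;

    # The alarm is already in the right state: renew the update time.
    return "refresh $last_status";
}

print final_action("Testing",     "Down"), "\n";   # wait
print final_action("UnReachable", "Down"), "\n";   # fire UnReachable
print final_action("Down",        "Down"), "\n";   # refresh Down
```

The refresh branch matters because each child accepts a parent's status only when the parent's update time is at least as recent as the child's own (the `$ParentUpdateTime >= $NodeUpdateTime` test). A parent that stopped refreshing would look stale, and its children would remain in Testing.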


29 July 2003