NerveCenter™: Downstream Alarm Suppression

Alarms

The downstream alarm suppression behavior model monitors node status using both SNMP and ICMP. This section includes descriptions of the following alarms:

DwnStrmSnmpStatus Alarm
DwnStrmIcmpStatus Alarm

DwnStrmSnmpStatus Alarm

This alarm monitors the status of nodes and their SNMP agents, taking into account the status of each node's parents. It is identical to the DwnStrmSnmpStatus_LogToDB version, except that DwnStrmSnmpStatus_LogToDB also logs data on most transitions.

DwnStrmSnmpStatus/DwnStrmSnmpStatus_LogToDB Alarm State Diagram

Severities of Each State in DwnStrmSnmpStatus lists the severity of each state:

Severities of Each State in DwnStrmSnmpStatus

State	Severity	Color
Ground	Normal	Green
Error	Normal	Green
Testing	Normal	Green
AgentDown	Minor	Yellow
DeviceDown	Critical	Red
Unreachable	Inform	Purple

When this alarm is turned on, the following polls and masks cause state transitions:

ColdStart (trap mask)
SnmpFastPoll (SNMP get request)
SnmpPoll (SNMP get request)
SS_IcmpFastPoll (ICMP echo request, or ping)
SS_IcmpPoll (ICMP echo request, or ping)
WarmStart (trap mask)

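This alarm uses the following Perl subroutines:

SS_IcmpError (see SS_IcmpError Perl Subroutine)
SetNodeStatusTesting, SetNodeStatusDown, SetNodeStatusUnreachable, and SetNodeStatusUp (see SetNodeStatus Perl Subroutines)
TestParentStatus (see TestParentStatus Perl Subroutine)
TestParentSetNode (see TestParentSetNode Perl Subroutine)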

The following sections describe the states in the DwnStrmSnmpStatus alarm and the transitions and actions that can happen from those states:

Ground State

In Ground state, the node is reachable and the SNMP agent is up.

As long as the node and agent respond to the SnmpPoll and SnmpFastPoll requests, the agentUp circular transition is triggered. The agentUp transition calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to refresh the update time.

If the node does not respond to the polls, the following triggers can transition the alarm from Ground to Error:

ICMP_ERROR
SNMP_TIMEOUT

Transitions to the Error state call the SetNodeStatusTesting Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status to Testing.

ICMP_ERROR also calls the SS_IcmpError Perl subroutine (see SS_IcmpError Perl Subroutine). If the SS_IcmpError Perl subroutine determines that the port is unreachable, it fires SS_PortUnreach. The SS_PortUnreach trigger does the following:

Transitions the alarm to an AgentDown state
Uses the Set Attribute action to suppress the node so the node won't be polled by suppressible polls while the agent is down
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

Error State

The alarm suppression behavior model uses the Error state to confirm that there is actually a problem (as opposed to a dropped packet, for example). From the Error state, the alarm can transition back to Ground, to Testing, or to AgentDown.

If the node and agent respond to the SnmpFastPoll request, the agentUpFast transition is triggered. The agentUpFast transition does the following:

Returns the alarm to Ground state
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If the node still does not respond to the poll, the following triggers transition the alarm from Error to Testing:

ICMP_ERROR, which calls the SS_IcmpError Perl subroutine (see SS_IcmpError Perl Subroutine)
SNMP_TIMEOUT

If the SS_IcmpError Perl subroutine determines that the port is unreachable, it fires SS_PortUnreach. The SS_PortUnreach trigger does the following:

Transitions the alarm to an AgentDown state
Uses the Set Attribute action to suppress the node so the node won't be polled by suppressible polls while the agent is down
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

Testing State

While an alarm is in the Testing state, NerveCenter identifies whether the node is:

Down
Unreachable
Up, but its agent is down

If SS_PortUnreach is triggered by the SS_IcmpError Perl subroutine during the transition from Error to Testing, or if SS_nodeUpFast is triggered by SS_IcmpFastPoll, the trigger:

Transitions the alarm to an AgentDown state
Uses the Set Attribute action to suppress the node so the node won't be polled by suppressible polls while the agent is down
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If SSF_IcmpError is triggered by SS_IcmpFastPoll, the trigger:

Transitions the alarm to an Unreachable state
Uses the Set Attribute action to suppress the node so the node won't be polled by suppressible polls while the node is unreachable
Calls the SetNodeStatusUnreachable Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If SS_IcmpFastPoll gets no response and a circular ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentStatus Perl subroutine (see TestParentStatus Perl Subroutine), which looks up the status of the parents. If TestParentStatus can determine the node's state based on the parents' status, TestParentStatus fires the appropriate trigger: UnReachable or Down.

The UnReachable trigger:
- Transitions the alarm to an Unreachable state
- Uses the Set Attribute action to suppress the node so the node won't be polled by suppressible polls while the node is unreachable
- Calls the SetNodeStatusUnreachable Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any
The Down trigger:
- Transitions the alarm to a DeviceDown state
- Uses the Set Attribute action to suppress the node so the node won't be polled by suppressible polls while the node is down
- Calls the SetNodeStatusDown Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any
- Sends an Inform action to notify a network management platform or another NerveCenter of the status of this node

AgentDown State

While an alarm is in the AgentDown state, NerveCenter continues to monitor the node for any changes.

As long as the node responds to the SS_IcmpPoll requests, the SSnodeUp transition is triggered. The SSnodeUp transition calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to refresh the update time.

If the node does not respond to the polls, the following triggers transition the alarm from AgentDown to Testing:

ICMP_TIMEOUT
SS_IcmpError

Each transition calls the SetNodeStatusTesting Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status to Testing.

If NerveCenter receives a warmStart trap or a coldStart trap, or if agentUp is triggered by a response to SnmpPoll, the trigger:

Transitions the alarm to a Ground state
Uses the Set Attribute action to turn poll suppression off so NerveCenter can resume all normal polling
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the time of the last status change so that NerveCenter can accurately evaluate the states of this node's children, if it has any

Unreachable State

While an alarm is in the Unreachable state, NerveCenter continues to monitor the node for any changes.

If NerveCenter receives a coldStart trap or SSnodeUp is triggered by a response to SS_IcmpPoll, the trigger:

Transitions the alarm to a Ground state
Uses the Set Attribute action to turn poll suppression off so NerveCenter can resume all normal polling
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If the poll does not get a response and an ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentSetNode Perl subroutine (see TestParentSetNode Perl Subroutine), which looks up the status of the parents. If TestParentSetNode can determine the node's state based on the parents' status, TestParentSetNode fires the Down trigger or refreshes the node's update time.

The Down trigger:

Transitions the alarm to a DeviceDown state
Calls the SetNodeStatusDown Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any
Sends an Inform action to notify a network management platform or another NerveCenter of the status of this node

DeviceDown State

While an alarm is in the DeviceDown state, NerveCenter continues to monitor the node for any changes.

If NerveCenter receives a coldStart trap or the SSnodeUpFast transition is triggered by an SS_IcmpFastPoll, the trigger:

Transitions the alarm to a Ground state
Uses the Set Attribute action to turn poll suppression off so NerveCenter can resume all normal polling
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the time of the last status change so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If SSF_IcmpError is triggered by SS_IcmpFastPoll, the trigger:

Transitions the alarm to an Unreachable state
Uses the Set Attribute action to suppress the node so the node won't be polled by suppressible polls while the node is unreachable
Calls the SetNodeStatusUnreachable Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If the poll does not get a response and a circular ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentSetNode Perl subroutine (see TestParentSetNode Perl Subroutine), which looks up the status of the parents. If TestParentSetNode can determine the node's state based on the parents' status, TestParentSetNode fires the Unreachable trigger or refreshes the node's update time.

The Unreachable trigger:

Transitions the alarm to an Unreachable state
Calls the SetNodeStatusUnreachable Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any
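The DwnStrmSnmpStatus transitions described above can be collected into a single table. The following Perl sketch is a hypothetical offline model, not part of NerveCenter: the hash layout, the `run_snmp_alarm` helper, and the use of `warmStart`/`coldStart` as trigger names for the trap masks are assumptions for illustration. Trigger names are otherwise copied as they appear in each state description. It shows, for example, that one missed poll followed by a successful fast poll returns the alarm to Ground without ever reaching Testing.

```perl
use strict;
use warnings;

# Hypothetical DwnStrmSnmpStatus transition table, transcribed from the state
# descriptions. Fields: to = next state; status = value recorded by the
# SetNodeStatus* call (absent = no call); suppress = 'on'/'off' change made by
# Set Attribute (absent = unchanged); calls = subroutine invoked; inform = 1
# if an Inform action is sent.
my %SnmpTransitions = (
    Ground => {
        agentUp      => { to => 'Ground', status => 'Up' },
        ICMP_ERROR   => { to => 'Error',  status => 'Testing', calls => 'SS_IcmpError' },
        SNMP_TIMEOUT => { to => 'Error',  status => 'Testing' },
    },
    Error => {
        agentUpFast    => { to => 'Ground',    status => 'Up' },
        ICMP_ERROR     => { to => 'Testing',   calls  => 'SS_IcmpError' },
        SNMP_TIMEOUT   => { to => 'Testing' },
        SS_PortUnreach => { to => 'AgentDown', status => 'Up', suppress => 'on' },
    },
    Testing => {
        SS_PortUnreach => { to => 'AgentDown',   status => 'Up',          suppress => 'on' },
        SS_nodeUpFast  => { to => 'AgentDown',   status => 'Up',          suppress => 'on' },
        SSF_IcmpError  => { to => 'Unreachable', status => 'Unreachable', suppress => 'on' },
        ICMP_TIMEOUT   => { to => 'Testing',     calls  => 'TestParentStatus' },
        UnReachable    => { to => 'Unreachable', status => 'Unreachable', suppress => 'on' },
        Down           => { to => 'DeviceDown',  status => 'Down', suppress => 'on', inform => 1 },
    },
    AgentDown => {
        SSnodeUp     => { to => 'AgentDown', status => 'Up' },
        ICMP_TIMEOUT => { to => 'Testing',   status => 'Testing' },
        SS_IcmpError => { to => 'Testing',   status => 'Testing' },
        warmStart    => { to => 'Ground', status => 'Up', suppress => 'off' },
        coldStart    => { to => 'Ground', status => 'Up', suppress => 'off' },
        agentUp      => { to => 'Ground', status => 'Up', suppress => 'off' },
    },
    Unreachable => {
        coldStart    => { to => 'Ground', status => 'Up', suppress => 'off' },
        SSnodeUp     => { to => 'Ground', status => 'Up', suppress => 'off' },
        ICMP_TIMEOUT => { to => 'Unreachable', calls => 'TestParentSetNode' },
        Down         => { to => 'DeviceDown', status => 'Down', inform => 1 },
    },
    DeviceDown => {
        coldStart     => { to => 'Ground', status => 'Up', suppress => 'off' },
        SSnodeUpFast  => { to => 'Ground', status => 'Up', suppress => 'off' },
        SSF_IcmpError => { to => 'Unreachable', status => 'Unreachable', suppress => 'on' },
        ICMP_TIMEOUT  => { to => 'DeviceDown', calls => 'TestParentSetNode' },
        UnReachable   => { to => 'Unreachable', status => 'Unreachable' },
    },
);

# Apply a sequence of triggers starting from Ground and return the final
# state. The sketch assumes a trigger with no transition from the current
# state is simply ignored.
sub run_snmp_alarm {
    my @triggers = @_;
    my $state    = 'Ground';
    for my $trigger (@triggers) {
        my $t = $SnmpTransitions{$state}{$trigger} or next;
        $state = $t->{to};
    }
    return $state;
}
```

For instance, `run_snmp_alarm('ICMP_ERROR', 'SS_PortUnreach')` models an ICMP Port Unreachable reply: the alarm passes through Error and ends in AgentDown.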

DwnStrmIcmpStatus Alarm

This alarm monitors the status of nodes, taking into account the status of each node's parents. It is identical to the DwnStrmIcmpStatus_LogToDB version, except that DwnStrmIcmpStatus_LogToDB also logs data on most transitions.

DwnStrmIcmpStatus/DwnStrmIcmpStatus_LogToDB Alarm State Diagram

Severities of Each State in DwnStrmIcmpStatus lists the severity of each state:

Severities of Each State in DwnStrmIcmpStatus

State	Severity	Color
Ground	Normal	Green
Error	Normal	Green
Testing	Normal	Green
DeviceDown	Critical	Red
Unreachable	Inform	Purple

When this alarm is turned on, the following polls cause state transitions:

IS_IcmpFastPoll (ICMP echo request, or ping)
IS_IcmpPoll (ICMP echo request, or ping)

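This alarm uses the following Perl subroutines:

SetNodeStatusTesting, SetNodeStatusDown, SetNodeStatusUnreachable, and SetNodeStatusUp (see SetNodeStatus Perl Subroutines)
TestParentStatus (see TestParentStatus Perl Subroutine)
TestParentSetNode (see TestParentSetNode Perl Subroutine)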

The following sections describe the states in the DwnStrmIcmpStatus alarm and the transitions and actions that can happen from those states:

Ground State

In Ground state, the node is reachable.

As long as the node responds to the IS_IcmpPoll requests, the ISnodeUp transition is triggered. The ISnodeUp transition calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to refresh the update time.

If the node does not respond to the polls, the following triggers can transition the alarm from Ground to Error:

ICMP_TIMEOUT
IS_IcmpError

Transitions to the Error state call the SetNodeStatusTesting Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status to Testing.

Error State

The alarm suppression behavior model uses the Error state to confirm that there is actually a problem (as opposed to a dropped packet, for example). From the Error state, an alarm can transition back to Ground or to Testing.

If the node responds to the IS_IcmpFastPoll request, the ISnodeUpFast transition is triggered. The trigger:

Returns the alarm to Ground state
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If the node still does not respond to the poll, the following triggers transition the alarm from Error to Testing:

ICMP_TIMEOUT
ISF_IcmpError

Testing State

While an alarm is in the Testing state, NerveCenter identifies whether the node is down or unreachable.

If ISnodeUpFast is triggered in response to an IS_IcmpFastPoll while the alarm is in the Testing state, the trigger:

Transitions the alarm to Ground
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If ISF_IcmpError is triggered in response to an IS_IcmpFastPoll while the alarm is in the Testing state, the trigger:

Transitions the alarm to Unreachable
Calls the SetNodeStatusUnreachable Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If IS_IcmpFastPoll gets no response and a circular ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentStatus Perl subroutine (see TestParentStatus Perl Subroutine), which looks up the status of the parents. If TestParentStatus can determine the node's state based on the parents' status, TestParentStatus fires the appropriate trigger: UnReachable or Down.

The UnReachable trigger:
- Transitions the alarm to an Unreachable state
- Uses the Set Attribute action to suppress the node so the node won't be polled by suppressible polls while the node is unreachable
- Calls the SetNodeStatusUnreachable Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any
The Down trigger:
- Transitions the alarm to a DeviceDown state
- Uses the Set Attribute action to suppress the node so the node won't be polled by suppressible polls while the node is down
- Calls the SetNodeStatusDown Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any
- Sends an Inform action to notify a network management platform or another NerveCenter of the status of this node

Unreachable State

While an alarm is in the Unreachable state, NerveCenter continues to monitor the node for any changes.

If ISnodeUp is triggered by a response to IS_IcmpPoll, the trigger:

Transitions the alarm to a Ground state
Uses the Set Attribute action to turn poll suppression off so NerveCenter can resume all normal polling
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If the poll does not get a response and a circular ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentSetNode Perl subroutine (see TestParentSetNode Perl Subroutine), which looks up the status of the parents. If TestParentSetNode can determine the node's state based on the parents' status, TestParentSetNode either fires the Down trigger or refreshes the node's update time.

The Down trigger:

Transitions the alarm to a DeviceDown state
Calls the SetNodeStatusDown Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any
Sends an Inform action to notify a network management platform or another NerveCenter of the status of this node

DeviceDown State

While an alarm is in the DeviceDown state, NerveCenter continues to monitor the node for any changes.

If the ISnodeUpFast transition is triggered by an IS_IcmpFastPoll, the trigger:

Transitions the alarm to a Ground state
Uses the Set Attribute action to turn poll suppression off so NerveCenter can resume all normal polling
Calls the SetNodeStatusUp Perl subroutine (see SetNodeStatus Perl Subroutines) to update the time of the last status change so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If ISF_IcmpError is triggered in response to an IS_IcmpFastPoll while the alarm is in the DeviceDown state, the trigger:

Transitions the alarm to Unreachable
Calls the SetNodeStatusUnreachable Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any

If the poll does not get a response and a circular ICMP_TIMEOUT transition is triggered, NerveCenter calls the TestParentSetNode Perl subroutine (see TestParentSetNode Perl Subroutine), which looks up the status of the parents. If TestParentSetNode can determine the node's state based on the parents' status, TestParentSetNode fires the Unreachable trigger or refreshes the node's update time.

The Unreachable trigger:

Transitions the alarm to an Unreachable state
Calls the SetNodeStatusUnreachable Perl subroutine (see SetNodeStatus Perl Subroutines) to update the status so that NerveCenter can accurately evaluate the states of this node's children, if it has any
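Because DwnStrmIcmpStatus has no AgentDown state, its transitions fit in a flat lookup table. The following Perl sketch is a hypothetical offline model, not NerveCenter code; the `"State:Trigger"` key format and the `icmp_next` and `recovery_triggers` helpers are illustrative assumptions. It lists, for any state, which triggers return the alarm to Ground, which makes it easy to see that recovery from Unreachable uses the normal IS_IcmpPoll (ISnodeUp) while recovery from DeviceDown uses the fast poll (ISnodeUpFast).

```perl
use strict;
use warnings;

# Hypothetical flattened DwnStrmIcmpStatus table keyed by "State:Trigger",
# transcribed from the state descriptions.
my %IcmpNext = (
    'Ground:ISnodeUp'          => 'Ground',
    'Ground:ICMP_TIMEOUT'      => 'Error',
    'Ground:IS_IcmpError'      => 'Error',
    'Error:ISnodeUpFast'       => 'Ground',
    'Error:ICMP_TIMEOUT'       => 'Testing',
    'Error:ISF_IcmpError'      => 'Testing',
    'Testing:ISnodeUpFast'     => 'Ground',
    'Testing:ISF_IcmpError'    => 'Unreachable',
    'Testing:ICMP_TIMEOUT'     => 'Testing',      # calls TestParentStatus
    'Testing:UnReachable'      => 'Unreachable',
    'Testing:Down'             => 'DeviceDown',
    'Unreachable:ISnodeUp'     => 'Ground',
    'Unreachable:ICMP_TIMEOUT' => 'Unreachable',  # calls TestParentSetNode
    'Unreachable:Down'         => 'DeviceDown',
    'DeviceDown:ISnodeUpFast'  => 'Ground',
    'DeviceDown:ISF_IcmpError' => 'Unreachable',
    'DeviceDown:ICMP_TIMEOUT'  => 'DeviceDown',   # calls TestParentSetNode
    'DeviceDown:UnReachable'   => 'Unreachable',
);

# Next state for a trigger; the sketch assumes a trigger with no transition
# from the current state leaves the alarm where it is.
sub icmp_next {
    my ($state, $trigger) = @_;
    return $IcmpNext{"$state:$trigger"} // $state;
}

# Triggers that take the alarm from $state to Ground, in sorted order.
sub recovery_triggers {
    my ($state) = @_;
    my @found;
    for my $key (sort keys %IcmpNext) {
        my ($from, $trigger) = split /:/, $key, 2;
        push @found, $trigger if $from eq $state && $IcmpNext{$key} eq 'Ground';
    }
    return @found;
}
```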

Perl Subroutines

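The new downstream alarm suppression behavior model uses several Perl subroutines to store parent-child relationships and maintain node statuses. This section includes descriptions of the following Perl subroutines:

SS_IcmpError Perl Subroutine
SetNodeStatus Perl Subroutines
TestParentStatus Perl Subroutine
TestParentSetNode Perl Subroutine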

SS_IcmpError Perl Subroutine

The ICMP_ERROR transition calls this Perl subroutine to evaluate the error and determine whether it indicates that the node is unreachable. If the ICMP error is Port Unreachable (type 3, code 3), the node itself sent the error, so the node is up and reachable even though no process is listening on the destination port. The subroutine assumes that all other ICMP errors indicate an unreachable node. This assumption may be incorrect depending on the behavior of your network. To exclude other ICMP errors that do not indicate an unreachable node, modify this Perl subroutine.

my $Type = VbValue( 0 );   # ICMP type
my $Code = VbValue( 1 );   # ICMP code
# Type 3 (Destination Unreachable), code 3 (Port Unreachable): the node
# itself answered, so it is up but nothing is listening on the port.
if( $Type == 3 && $Code == 3 )
{
  FireTrigger( "SS_PortUnreach" );
}
else
{
  # Modify this else to eliminate other types of
  # ICMP errors that are not indicative of an
  # unreachable node. The assumption is that if
  # SS_IcmpError is fired, we are being told, by
  # the network, that the node is unreachable.
  FireTrigger( "SS_IcmpError" );
}
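As one example of the modification described above: ICMP Protocol Unreachable (type 3, code 2) is, like Port Unreachable, sent by the destination host itself (RFC 792), so it also shows that the node is up. The following Perl sketch restates the SS_IcmpError decision as a plain function so it can be tested outside NerveCenter. The function name `classify_icmp_error`, the `%HostGeneratedUnreach` table, and the decision to treat code 2 like code 3 are hypothetical; inside NerveCenter the values would come from `VbValue` and the result would go to `FireTrigger`, as in the subroutine above.

```perl
use strict;
use warnings;

# Destination Unreachable (type 3) codes generated by the destination host.
# Code 3 matches the original subroutine; code 2 is a hypothetical addition.
my %HostGeneratedUnreach = (
    2 => 'Protocol Unreachable',
    3 => 'Port Unreachable',
);

# Return the name of the trigger the subroutine would fire.
sub classify_icmp_error {
    my ($type, $code) = @_;
    # The host answered with its own error, so it is up and reachable.
    return 'SS_PortUnreach'
        if $type == 3 && exists $HostGeneratedUnreach{$code};
    # Anything else is treated as the network reporting an unreachable node.
    return 'SS_IcmpError';
}
```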

SetNodeStatus Perl Subroutines

For the DwnStrmSnmpStatus and DwnStrmIcmpStatus alarms, all state transitions (except transitions from Error to Testing) call one of the following Perl subroutines:

SetNodeStatusTesting
SetNodeStatusDown
SetNodeStatusUnreachable
SetNodeStatusUp

These Perl subroutines update the node status so the node's children can accurately update their statuses based on the node's status.
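To exercise these subroutines outside NerveCenter, a stand-in for the status calls can help. The following Perl sketch is a hypothetical in-memory mock: only the call names and the "0 means failure" convention come from the code in this section. The `MockNC` package, the optional clock argument to `SetNodeStatus`, the `SetParents` helper, the list of accepted status names, and the `(undef, 0)` result for an unknown node are assumptions, not the real `NC` module.

```perl
use strict;
use warnings;

package MockNC;

my %Status;    # node name => [ status, update time ]
my %Parents;   # node name => [ parent names ]
my %Valid = map { $_ => 1 } qw(Testing Down Unreachable Up);

# Record a status and its update time (defaults to the current time).
# Returns 1 on success and 0 on failure, like the "$Return" checks above.
sub SetNodeStatus {
    my ($node, $status, $now) = @_;
    return 0 unless defined $node && $Valid{ $status // '' };
    $Status{$node} = [ $status, $now // time ];
    return 1;
}

# Return (status, update time); an unknown node reads as (undef, 0).
sub GetNodeStatus {
    my ($node) = @_;
    return @{ $Status{$node} // [ undef, 0 ] };
}

# Store and return the parent list for a node.
sub SetParents { my ($node, @p) = @_; $Parents{$node} = [@p]; return 1; }
sub GetParents { my ($node) = @_; return @{ $Parents{$node} // [] }; }

package main;
```

Because a rejected call leaves the stored status and time untouched, a failed update does not make a node look fresher to its children.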

SetNodeStatusTesting

my $Return;
$Return = NC::SetNodeStatus($NodeName,"Testing");
# If $Return is 0, the operation failed

SetNodeStatusDown

my $Return;
$Return = NC::SetNodeStatus($NodeName,"Down");
# If $Return is 0, the operation failed

SetNodeStatusUnreachable

my $Return;
$Return = NC::SetNodeStatus($NodeName,"Unreachable");
# If $Return is 0, the operation failed

SetNodeStatusUp

my $Return;
$Return = NC::SetNodeStatus($NodeName,"Up");
# If $Return is 0, the operation failed

TestParentStatus Perl Subroutine

For the DwnStrmSnmpStatus and DwnStrmIcmpStatus alarms, while an alarm is in the Testing state, the ICMP_TIMEOUT trigger is fired every time the node is polled and doesn't respond. Each resulting circular ICMP_TIMEOUT transition calls the TestParentStatus Perl subroutine.

The TestParentStatus Perl subroutine tests the parent node status and determines the status of the node by doing the following:

If the node has parents, TestParentStatus evaluates each parent's last update time. Based on the following rules, TestParentStatus sets a flag (TriggerFlag) that determines what trigger, if any, should be fired.
- If no parents have an update time more recent than the node's update time, then TriggerFlag is set to Testing.
- If at least one parent has a more recent update time and is in Testing, and no parent with a more recent update time is up, the flag is set to Testing.
- If at least one parent has a more recent update time and is up, the flag is set to Down, regardless of the status or time of last update of any other parent.
- If all parents have more recent update times and no parent is up or in testing, the flag is set to Unreachable.
If the node has no parents, TriggerFlag is set to Down.

If TriggerFlag is set to Testing, TestParentStatus does nothing, because it needs more information to make an accurate decision. If the alarm should be in another state, TestParentStatus fires the appropriate trigger to transition the alarm into that state.

The code for this subroutine follows:

# The purpose of this subroutine is to test the parent
# node status and fire the appropriate trigger to take the
# alarm to either down or unreachable. You must make sure
# that all parents are being monitored with the status
# alarms.
use NC;
my $NodeUpdateTime;       # Last time node status was updated
my $LastNodeStatus;       # Last node status
my @Parents = ();         # Array of parents
my $Parent;               # Parent node
my $ParentUpdateTime;     # Last time parent node status was updated
my $ParentStatus;         # Last parent status
my $TriggerFlag = "NotSet";
my $ParentNotUpdated = 0; # Remember if we have any parents not updated
# Define all triggers that can be fired
DefineTrigger('UnReachable');
DefineTrigger('Down');
DefineTrigger('Testing');
# Get the last node status and update time for this node
($LastNodeStatus,$NodeUpdateTime) = NC::GetNodeStatus($NodeName);
# Get the array of parents for this node
@Parents = NC::GetParents($NodeName);
if( defined( $Parents[0] ) )
{
  # Test each parent. If ANY are up, we assume the node is
  # reachable. A parent's update time must be at or past the
  # last time the node was updated, or we can't assume the
  # parent's status is accurate.
  foreach $Parent (@Parents)
  {
    ($ParentStatus,$ParentUpdateTime) = NC::GetNodeStatus($Parent);
    if( $ParentUpdateTime >= $NodeUpdateTime )
    {
      # Use TriggerFlag to store the name of the trigger to be fired.
      # If any parent is found to be up, the flag is set to Down. If
      # all parents are down or unreachable, the flag is set to
      # UnReachable. If no parents are up and at least one parent is
      # testing, the flag is set to Testing. Otherwise, it remains
      # NotSet and no trigger is fired. Testing handles the case where
      # one parent is testing and another is unreachable. We must not
      # mark the node as unreachable until the parent in testing goes
      # to some final state, because that state could be agent down,
      # which is treated as up.
      # SetNodeStatusUnreachable stores "Unreachable", so compare
      # case-insensitively with the "UnReachable" trigger name.
      if( ($ParentStatus eq "Down" || lc($ParentStatus) eq "unreachable") && $TriggerFlag eq "NotSet" )
      {
        $TriggerFlag = "UnReachable";
      }
      elsif( $ParentStatus eq "Up" )
      {
        $TriggerFlag = "Down";
      }
      elsif( $ParentStatus eq "Testing" && $TriggerFlag ne "Down" )
      {
        $TriggerFlag = "Testing";
      }
    }
    else
    {
      # Remember that we have at least one parent that hasn't been updated.
      $ParentNotUpdated = 1;
    }
  }
}
else
{
  # If no parents, assume node is down.
  $TriggerFlag = "Down";
}
# If I have at least one parent not updated and I do not have
# any Up parents, set TriggerFlag to Testing.
if( $ParentNotUpdated && $TriggerFlag ne "Down" )
{
  $TriggerFlag = "Testing";
}
# Fire a trigger only if a decision was made ("NotSet" means no
# parent status was recognized) and the node's status should change.
if( $TriggerFlag ne "Testing" && $TriggerFlag ne "NotSet" )
{
  if( lc($TriggerFlag) ne lc($LastNodeStatus) )
  {
    FireTrigger( $TriggerFlag );
  }
}
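The TriggerFlag rules can be checked without NerveCenter by restating them as a pure function. The following Perl sketch is hypothetical: the name `decide_trigger`, its argument layout (the node's update time followed by one `[status, update time]` pair per parent), and the choice to return `Testing` when no parent status is recognized are assumptions. As in the subroutine, a parent whose update time equals the node's counts as updated.

```perl
use strict;
use warnings;

# Returns 'Down', 'UnReachable', or 'Testing' (meaning "wait for more
# information") for a node with the given update time and parent list.
sub decide_trigger {
    my ($node_time, @parents) = @_;
    return 'Down' unless @parents;    # no parents: assume the node is down

    my ($flag, $stale) = ('NotSet', 0);
    for my $p (@parents) {
        my ($status, $time) = @$p;
        if ($time < $node_time) { $stale = 1; next; }   # parent not updated
        if ($status eq 'Up') {
            $flag = 'Down';                             # a live parent wins
        }
        elsif ($status eq 'Testing') {
            $flag = 'Testing' if $flag ne 'Down';
        }
        elsif ($status eq 'Down' || lc($status) eq 'unreachable') {
            $flag = 'UnReachable' if $flag eq 'NotSet';
        }
    }
    # A parent that has not reported since the node did blocks any
    # decision other than Down.
    $flag = 'Testing' if $stale && $flag ne 'Down';
    # Unrecognized parent statuses: be conservative and keep waiting.
    return $flag eq 'NotSet' ? 'Testing' : $flag;
}
```

For example, `decide_trigger(100, ['Down', 150], ['Testing', 150])` returns `Testing`: one parent is still being evaluated and could end up AgentDown, which counts as up.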

TestParentSetNode Perl Subroutine

For the DwnStrmSnmpStatus and DwnStrmIcmpStatus alarms, while an alarm is in the DeviceDown or Unreachable state, the ICMP_TIMEOUT trigger is fired every time the node is polled and doesn't respond. Each resulting circular ICMP_TIMEOUT transition calls the TestParentSetNode Perl subroutine.

The TestParentSetNode Perl subroutine tests the parent node status and determines the status of the node by doing the following:

If the node has parents, TestParentSetNode evaluates each parent's last update time. Based on the following rules, TestParentSetNode sets a flag (TriggerFlag) that determines what trigger, if any, should be fired.
- If no parents have an update time more recent than the node's update time, then TriggerFlag is set to Testing.
- If at least one parent has a more recent update time and is in Testing, and no parent with a more recent update time is up, the flag is set to Testing.
- If at least one parent has a more recent update time and is up, the flag is set to Down, regardless of the status or time of last update of any other parent.
- If all parents have more recent update times and no parent is up or in testing, the flag is set to Unreachable.
If the node has no parents, TriggerFlag is set to Down.

If TriggerFlag is set to Testing, TestParentSetNode does nothing, because it needs more information to make an accurate decision. If the alarm should be in another state, TestParentSetNode fires the appropriate trigger to transition the alarm into that state. If the alarm is already in the correct state, TestParentSetNode just refreshes the node's update time so the node's children can accurately update their statuses based on the node's status.
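The final decision TestParentSetNode makes can be sketched in a few lines. This Perl sketch is hypothetical: the function name, the flag-to-state mapping table, and the action strings are illustrative assumptions, and refreshing the update time "for example by calling NC::SetNodeStatus with the unchanged status" is an assumption based on how the SetNodeStatus subroutines refresh the update time elsewhere in this section.

```perl
use strict;
use warnings;

# Alarm state that each TriggerFlag value should lead to.
my %StateForFlag = ( Down => 'DeviceDown', UnReachable => 'Unreachable' );

# $flag: TriggerFlag from the rules above ('Down', 'UnReachable', 'Testing').
# $alarm_state: current alarm state ('DeviceDown' or 'Unreachable').
sub test_parent_set_node_action {
    my ($flag, $alarm_state) = @_;
    # Not enough information yet: leave the alarm and the status alone.
    return 'none' if $flag eq 'Testing';
    my $target = $StateForFlag{$flag};
    return 'none' unless defined $target;
    # Wrong state: fire the trigger that moves the alarm to the right one.
    return "fire $flag" if $target ne $alarm_state;
    # Right state: refresh the update time so children see current data
    # (for example by calling NC::SetNodeStatus with the unchanged status).
    return 'refresh';
}
```

Refreshing instead of doing nothing matters because children compare their parents' update times with their own; a parent that stays Down without refreshing would eventually look stale and push its children into Testing.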