How much broadcast traffic (e.g., ARP) can Continuum controllers handle before they turn off their transceiver and go offline?

Issue

Continuum controllers raise offline alarms every few minutes and log error 0x00006c0e (Broadcast Storm, also reported as "Error transmitting") and/or error 0x00006c03 (Driver fatal error during operation).

Product Line

Andover Continuum

Environment

Continuum Net Controller II

Continuum ACX 4 Controller

Continuum CX Gen 1

Continuum bCX4040

Continuum bCX9640

Continuum bCX4000

Cause

Excessive ARPs and/or other broadcast traffic on the network.

Resolution

When troubleshooting controllers that go offline and excessive broadcast traffic is the suspected root cause, it is important to know how to interpret a Wireshark capture to determine whether too much broadcast traffic on the VLAN is actually the problem.

Note that the percentage of broadcast traffic in a Wireshark capture is a good starting point, but it cannot on its own prove that excessive broadcast traffic is the root cause of the controllers going offline. When there is little unicast traffic on the network, broadcast frames make up most of the packets, which produces a misleadingly high percentage.
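To illustrate why the percentage alone can mislead, the short Python sketch below computes the broadcast share for the same broadcast volume (28,800 frames, i.e. 24 packets per second for 20 minutes) on two hypothetical VLANs that carry different amounts of unicast traffic. The function name `broadcast_share` and the unicast counts are made up for illustration.

```python
def broadcast_share(broadcast_pkts: int, other_pkts: int) -> float:
    """Return broadcast frames as a percentage of all captured frames."""
    total = broadcast_pkts + other_pkts
    return 100.0 * broadcast_pkts / total if total else 0.0


# Identical broadcast load, two hypothetical amounts of unicast traffic.
quiet_vlan = broadcast_share(28_800, 2_000)
busy_vlan = broadcast_share(28_800, 500_000)
print(f"quiet VLAN: {quiet_vlan:.1f}% broadcast")
print(f"busy VLAN:  {busy_vlan:.1f}% broadcast")
```

The quiet VLAN reports roughly 93.5% broadcast and the busy one roughly 5.4%, even though controllers on both VLANs receive exactly the same number of broadcast frames.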

A much better way to determine whether a controller may be going offline due to excessive broadcasts is to look at the average number of broadcast packets per second over a 15 to 20 minute capture.
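Wireshark's own statistics are the normal way to read these numbers. As an alternative, a minimal Python sketch such as the one below can compute the broadcast count, broadcast percentage, and average broadcast packets per second directly from a capture file. It assumes the capture was saved in the classic pcap format (Wireshark's default is pcapng, which this sketch does not parse) from an Ethernet interface, and it matches broadcast frames the same way as the display filter eth.dst == ff:ff:ff:ff:ff:ff. The function `broadcast_stats` and the 15-minute `short_capture` flag are illustrative, not part of any Wireshark or Continuum tool.

```python
import struct
import sys

BROADCAST_MAC = b"\xff" * 6  # same match as the display filter eth.dst == ff:ff:ff:ff:ff:ff
MIN_CAPTURE_S = 15 * 60      # the article recommends a 15 to 20 minute capture

# Classic libpcap magic numbers as stored on disk -> (struct byte order, nanosecond timestamps?)
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", False),
    b"\x4d\x3c\xb2\xa1": ("<", True),
    b"\xa1\xb2\xc3\xd4": (">", False),
    b"\xa1\xb2\x3c\x4d": (">", True),
}


def broadcast_stats(data: bytes) -> dict:
    """Summarize broadcast load in a classic pcap capture of Ethernet frames."""
    if data[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file (re-save pcapng captures as pcap)")
    endian, nanos = PCAP_MAGICS[data[:4]]
    # The link type is the last field of the 24-byte global header; 1 = Ethernet.
    linktype = struct.unpack_from(endian + "I", data, 20)[0] & 0xFFFF
    if linktype != 1:
        raise ValueError(f"expected Ethernet link type 1, got {linktype}")

    total = broadcast = 0
    timestamps = []
    offset = 24
    while offset + 16 <= len(data):
        # Per-packet record header: seconds, sub-second part, captured length, original length.
        ts_sec, ts_frac, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        frame = data[offset + 16 : offset + 16 + incl_len]
        offset += 16 + incl_len
        timestamps.append(ts_sec + ts_frac / (1e9 if nanos else 1e6))
        total += 1
        if frame[:6] == BROADCAST_MAC:  # destination MAC is the first 6 bytes
            broadcast += 1

    duration = max(timestamps) - min(timestamps) if timestamps else 0.0
    return {
        "frames": total,
        "broadcast": broadcast,
        "broadcast_pct": 100.0 * broadcast / total if total else 0.0,
        "duration_s": duration,
        "broadcast_pps": broadcast / duration if duration > 0 else 0.0,
        "short_capture": duration < MIN_CAPTURE_S,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(broadcast_stats(f.read()))
```

The `short_capture` flag is only a reminder that a capture of a few minutes gives a less reliable average than the recommended 15 to 20 minutes.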

The screen capture below shows the statistics from a 20-minute capture filtered to broadcast traffic (display filter eth.dst == ff:ff:ff:ff:ff:ff).

The capture shows that broadcast frames made up about 63% of the traffic, at an average of 24 packets per second.

The Continuum controllers on the network were going offline multiple times an hour.
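As a rough check of what the capture figures imply (both are rounded, so the results are approximate), the Python arithmetic below derives the total frame count and the non-broadcast rate for the 20-minute capture.

```python
CAPTURE_S = 20 * 60   # 20-minute capture
BCAST_PPS = 24        # average broadcast packets per second reported by the capture
BCAST_SHARE = 0.63    # broadcast fraction of all packets reported by the capture

broadcast_frames = BCAST_PPS * CAPTURE_S                  # broadcast frames captured
total_frames = broadcast_frames / BCAST_SHARE             # implied total frames
other_pps = (total_frames - broadcast_frames) / CAPTURE_S # implied non-broadcast rate

print(f"broadcast frames: {broadcast_frames}")
print(f"implied total frames: {total_frames:,.0f}")
print(f"implied non-broadcast rate: {other_pps:.1f} packets/s")
```

The result, only about 14 non-broadcast packets per second, fits the earlier point that a quiet network inflates the broadcast percentage.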

Note that Continuum controllers can initially tolerate a much higher rate of broadcast traffic: tests performed in the PSS lab showed that a CX9680 handled about 10 times the rate shown here when sustained for 5 to 10 minutes. Over a much longer period, such as several days or weeks, however, the rate shown here was enough to cause the controllers to go offline.
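Because the controllers tolerated short bursts at roughly 240 packets per second (10 times 24) in the lab but went offline under a much lower rate sustained for days or weeks, it can help to compare a capture's long-run average with its worst sustained window. The Python sketch below does that from a list of broadcast-frame timestamps in seconds. The function name `rate_profile`, the one-minute buckets, and the default 10-minute window are assumptions chosen for illustration; the article does not define a safe threshold.

```python
from collections import Counter


def rate_profile(timestamps: list[float], window_min: int = 10) -> dict:
    """Long-run average and worst sustained broadcast rate (packets/s).

    timestamps: arrival times in seconds of the broadcast frames only.
    """
    if not timestamps:
        return {"avg_pps": 0.0, "peak_window_pps": 0.0}
    start = min(timestamps)
    # Bucket broadcast frames into one-minute bins counted from the first frame.
    per_minute = Counter(int((t - start) // 60) for t in timestamps)
    minutes = max(per_minute) + 1
    counts = [per_minute[m] for m in range(minutes)]

    # Highest total over any run of `window` consecutive minutes.
    window = max(1, min(window_min, minutes))
    peak = max(sum(counts[i : i + window]) for i in range(minutes - window + 1))
    return {
        # Duration is rounded up to whole minutes, so this can slightly understate the rate.
        "avg_pps": len(timestamps) / (minutes * 60),
        "peak_window_pps": peak / (window * 60),
    }
```

A high `peak_window_pps` with a low `avg_pps` points to bursts, while an `avg_pps` that stays elevated across repeated long captures is closer to the condition described above.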

The screen capture below shows the statistics after the root cause of the excessive broadcast was identified and fixed.

At the site in question, the problem was a defect in the network switch that caused it to duplicate ARP traffic.

Note that the Cisco Catalyst 4500 switch is also affected by this firmware bug.

Version that fixed the issue:

Catalyst 4500 L3 Switch  Software (cat4500e-UNIVERSALK9-M), Version 03.08.05.E RELEASE SOFTWARE (fc2)

Previous version that caused the broadcast problems:

Catalyst 4500 L3 Switch  Software (cat4500e-UNIVERSALK9-M), Version 03.08.02.E RELEASE SOFTWARE (fc2)
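At this site the trigger was a switch duplicating ARP traffic. One heuristic way to look for that symptom in a capture is to count ARP frames that are byte-for-byte identical to an ARP frame seen a few milliseconds earlier, since hosts typically space ARP retries much further apart. The Python sketch below assumes untagged Ethernet frames supplied as (timestamp, raw frame) pairs; the function name `duplicate_arp_count` and the 10 ms window are hypothetical choices. A SPAN session that mirrors both ingress and egress traffic can also produce duplicate copies, so rule out the capture setup before blaming the switch.

```python
import struct

ETH_TYPE_ARP = 0x0806


def duplicate_arp_count(frames: list[tuple[float, bytes]], window_s: float = 0.01) -> int:
    """Count ARP frames that repeat an identical ARP frame seen within window_s seconds.

    frames: (timestamp_seconds, raw_ethernet_frame) pairs in capture order,
    assumed to be untagged (no 802.1Q header).
    """
    last_seen: dict[bytes, float] = {}
    duplicates = 0
    for ts, frame in frames:
        if len(frame) < 42:  # 14-byte Ethernet header + 28-byte IPv4 ARP payload
            continue
        (eth_type,) = struct.unpack_from("!H", frame, 12)
        if eth_type != ETH_TYPE_ARP:
            continue
        key = frame[:42]  # MACs, EtherType and ARP payload; trailing padding is ignored
        prev = last_seen.get(key)
        if prev is not None and ts - prev <= window_s:
            duplicates += 1
        last_seen[key] = ts
    return duplicates
```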
