SDSC Event Resolved: Networking Outage Sept 20

Event: Loss of connectivity to some SDSC devices

Event status: The event has been cleared at this time. If you notice any issues, please report them to operations right away.

Start: Approx. Sept 20th, 2012 9:15pm   End: Sept 21st, 2012 12:18am

Event Notice: SDSC networking experienced a loss of connectivity across some of its networks. The ENS team responded and cleared the issue.

Action Required:  None

Event Summary:
The outage last night was triggered by an event on the network that we are still working to isolate and identify. This event caused a major spanning tree loop on our distribution and floor switches. This, in turn, caused spanning tree to error-disable the link between the distribution switches and Lightning. This is expected behavior and normally serves to protect the network by isolating the issue. In this case, it unexpectedly left Lightning and Thor unable to see each other, and each router then believed it was the VRRP master (we affectionately call this a "split brain" condition). Clients then randomly assigned their gateway to one device or the other, which, combined with the split brain condition, resulted in random “black hole” events affecting parts of seemingly random subnets. Rebooting Lightning and restoring connectivity between Thor and Lightning resolved the split brain condition, allowed VRRP to converge, and made the networks routable again.

We are continuing to review the logs to see if we can determine the precipitating event so that we can report the issue more accurately to the vendors. ENS is on call this weekend to resolve any issues that may pop up. If you see further issues, please report them to operations so they can be looked into.



