A failover fail

What happened with the campus-wide network outage?

Ferris’ entire main campus lost internet and server access for four hours on Feb. 2. Graphic by: Charlie Zitta | Production Manager

Ferris’ main campus experienced a network blackout from 9:30 a.m. to 1:30 p.m. on Feb. 2. To prevent future outages, the IT department is working with technology vendor Merit Network on additional fail-safes.

Merit Network is an Ann Arbor-based nonprofit organization that provides Ferris with fiber pipelines. These pipelines carry high-speed network connections, known as broadband, all over the university.

Bhavani Koneru, chief technology officer of Ferris’ IT department, attributed the incident to a router error.

“There was a problem with one of the pipelines that the vendor was fixing. But, at that critical point, the main campus was supposed to be failed over to a different pipeline,” Koneru said. 

In computing, a router can be used to fail over a network connection to a secondary one if the primary connection fails. When the primary connection goes down for any reason, the network is supposed to switch automatically, or toggle, to a backup connection.
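The failover behavior described above can be sketched in a few lines of Python. This is only an illustration, not a description of how Ferris’ or Merit Network’s equipment works: the `Link` class, the `select_active_link` function and the pipeline names are hypothetical, and a real router would decide link health from live probes rather than a stored flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """A hypothetical network link; `healthy` stands in for a live health probe."""
    name: str
    healthy: bool


def select_active_link(primary: Link, backup: Link) -> Optional[Link]:
    """Prefer the primary link, fall back to the backup, or return None on a full outage."""
    if primary.healthy:
        return primary
    if backup.healthy:
        return backup
    return None


# The primary pipeline is down for maintenance, so traffic should move to the backup.
primary = Link("primary-pipeline", healthy=False)
backup = Link("backup-pipeline", healthy=True)
active = select_active_link(primary, backup)
print(active.name if active else "no link available")
```

If the selection step itself breaks, as it would with a faulty router, neither link is used even though the backup is healthy, which is why the switch had to be made by hand.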

Despite thorough testing, the failover did not take place automatically on Feb. 2. Engineers from Merit Network and IT had to work together to identify and solve the issue while they manually routed the connection. 

“The router was corrupted during the outage, so [IT worked with] the vendor in solving and fixing the issue with the router,” Koneru said. Because the outage affected the network across the entire campus, it was difficult for those inside the network to send updates and for those outside to receive them.

“We monitor our networks and systems. We received text messages that the main campus network was down, and this helped us start work on the issue immediately,” Koneru said. “Students, faculty and staff working from home didn’t see that we were down immediately.” 

Even in intense situations such as this one, Koneru ensures that there is no “panic mode.” IT staff remain calm and follow a specific protocol.

“When IT gets this kind of critical stuff, we know exactly who should be getting it,” Koneru said. 

She received reports from Merit every 10 minutes throughout the four-hour outage and provided regular updates.

To Koneru, the most important thing to maintain in an emergency is communication with the university. IT sent recurring messages to the Ferris community through Ferris IT Alerts during the outage. One of the IT employees responsible for the network sent hourly university updates from his cell phone hotspot.

By 1:30 p.m., the system had successfully failed over from the primary to the secondary line.

After the system had been fixed, engineers performed several more tests. Koneru stated that plans for further testing are in the works, as well as a joint effort with Merit to ensure a stable connection. 

“As a contingency plan, I have asked my engineers to put in periodic testing and toggling of the signal,” Koneru said.
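
The kind of periodic toggling Koneru describes can be pictured as a scheduled drill: switch to the backup on purpose, confirm it carries traffic, then switch back. The Python sketch below is an assumption-laden illustration and not Ferris IT’s actual procedure; the `FailoverPair` class, its `drill` method and the `backup_healthy` flag are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailoverPair:
    """Hypothetical model of a primary/backup pair used for failover drills."""
    backup_healthy: bool            # stands in for a real traffic check on the backup
    active: str = "primary"
    log: List[str] = field(default_factory=list)

    def drill(self) -> bool:
        """Toggle to the backup on purpose, record the result, then toggle back."""
        self.active = "backup"
        passed = self.backup_healthy
        self.log.append("pass" if passed else "FAIL: backup did not carry traffic")
        self.active = "primary"     # always restore the primary after the drill
        return passed


pair = FailoverPair(backup_healthy=True)
for _ in range(3):                  # three scheduled drills
    pair.drill()
print(pair.log)
```

Running drills on a schedule means a broken failover path shows up as a logged failure during a test rather than during a real outage.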