At one of our sites, our network gear is getting fairly old – and although that shouldn’t really be too much of an issue, one of our switches has started causing us some networking problems.
It’s a Netgear GS748T, and the symptom was extended response times to the devices attached to it. None of us knew where the problem was initially, so diagnosis took a while.
Networking issue diagnosis
Our first idea was actually that it was the switch directly attached to the servers (a 24-port TP-Link), or one of the devices attached to it, as the other switches between the buildings (there are two) supported desktop devices.
First port of call was to reboot the TP-Link switch – that didn’t work. Next thing to try was to start removing cables from it one by one to see where the issue lay. As we did that, we pinged a server attached to that switch and monitored response times. Once we had tried all of the ports, we realised that the only port we hadn’t pulled was the one we were pinging. So, time to ping another machine and pull the last one left.
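For anyone wanting to repeat the "pull a cable, watch the ping" routine without staring at a terminal, here is a minimal Python sketch. It is not what we ran on the day (we just used plain `ping`); the function names, the 100 ms threshold and the host address in the note below are all made up for illustration, and the parser assumes English-language `ping` output.

```python
import platform
import re
import subprocess
import time

# Matches "time=0.45 ms" (Linux/macOS) and "time=3000ms" or "time<1ms" (Windows).
# Assumption: English-language ping output.
_RTT_PATTERN = re.compile(r"time[=<]\s*([\d.]+)\s*ms", re.IGNORECASE)


def parse_rtt_ms(ping_output):
    """Return the first round-trip time (in ms) found in ping output, or None."""
    match = _RTT_PATTERN.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host, timeout_s=5):
    """Send one echo request with the system ping; return the RTT in ms or None."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_rtt_ms(result.stdout)


def monitor(host, threshold_ms=100.0, interval_s=1.0, count=None):
    """Ping host repeatedly and flag slow or lost replies.

    threshold_ms is an arbitrary illustrative value; count=None loops until
    interrupted with Ctrl+C.
    """
    sent = 0
    while count is None or sent < count:
        rtt = ping_once(host)
        stamp = time.strftime("%H:%M:%S")
        if rtt is None:
            print(f"{stamp} {host}: no reply")
        elif rtt > threshold_ms:
            print(f"{stamp} {host}: {rtt:.0f} ms  <-- slow")
        else:
            print(f"{stamp} {host}: {rtt:.1f} ms")
        sent += 1
        time.sleep(interval_s)
```

Running something like `monitor("192.168.1.10")` (a hypothetical address) in one window while unplugging cables gives a timestamped log, which makes it easier to match a drop in ping times to the cable you just pulled.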
Ping times dropped.
The problem? That particular server handled Active Directory, DNS, DHCP and DFS. Either we had to rectify the issue with it, or reconfigure something else to take over DHCP and DFS (AD and DNS would be looked after by our other site).
Off we go to reconfigure a different network card with the same IP address (ignoring all the Windows warnings of multiple gateways / duplicate IP addresses etc.).
Then – disaster. As we’re doing this, ping times start ramping up again – up to around 3000ms! What we then realise is that we’re pinging through another switch – the Netgear GS748T. Hmmm, I wonder. The guy on site wanders over to the other building, reboots the switch and boom – ping times drop back to normal.
Job done – and a mental note that the problem isn’t always the device you think it is!
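With hindsight, comparing ping times to hosts behind each switch would have pointed at the right box sooner. Below is a rough Python sketch of that idea. It assumes you have already collected a few round-trip times per switch (by pinging at least one host that sits behind each one); the function name, the 100 ms threshold and the sample figures are invented for illustration, not measurements from our network.

```python
from statistics import median


def suspect_switches(samples_by_switch, threshold_ms=100.0):
    """Return switch names whose median RTT exceeds threshold_ms, worst first.

    samples_by_switch maps a switch label to a list of RTTs (ms) measured to
    hosts reached through that switch. Switches with no samples are skipped.
    """
    medians = {
        name: median(rtts)
        for name, rtts in samples_by_switch.items()
        if rtts
    }
    slow = [name for name, value in medians.items() if value > threshold_ms]
    return sorted(slow, key=lambda name: medians[name], reverse=True)


# Illustrative numbers only, not real measurements.
example = {
    "TP-Link": [1.0, 2.0, 1.5],
    "GS748T": [2900.0, 3000.0, 3100.0],
}
print(suspect_switches(example))
```

Using the median rather than the average means a single stray slow reply doesn’t get a healthy switch flagged.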
Read here to see how we’re going to resolve the issues.