When Experience Becomes a Liability
Over the course of my career, I've spent an extraordinary amount of time troubleshooting other people's networks. All that experience made me good at solving problems, but it also got me used to solving the same problems over and over.
It sort of became a cycle of being really smart and feeling really good. You design something. Implement it. The thing works for the most part. There are problems. You identify them. The thing works better. You learn more about the thing. You have a Diet Coke. It's a decent little cycle, whether you're a network manager or product manager. Unfortunately, it tends to get you searching in the same places for problems. And that can eventually end up biting you in the rear end. At least it did for me.
Back when I worked in product for networking companies, I was always on the hook to travel for remote troubleshooting projects. When big clients called, I found myself on a plane. Plans were canceled. All the important people in my life knew the drill.
On this particular occasion, I landed in the Deep South, where some of our most valued customers were having fits. They had a large chemical plant that connected to their HQ, so they could run supply chain software. Two weeks before, they'd finally found a decent time to plan an outage and upgrade my box with our most recent software version. They didn't want to do it, but we pushed, because it had key performance fixes that would help them (and finally silence the barrage of support calls).
To their dismay, however, the upgrade seemed to create a new problem. Now, every day, at 2:25 p.m., the network went down. Their workforce lost connectivity to mail, to the Internet, to HQ. Everything gone, in an instant, for no apparent reason.
Now, my box was inline and I knew it. At the time, I thought the problem could very well be my fault. I also knew the client's network had a simple router. But the L3 switch inside was a beast. I still don't really understand why they needed such a complex network. All I know is they had a "server" network, an "office" network, and a network for the plant's "machines." The routing looked to have been built one static route at a time: machine to desktop, server to machine, desktop to HQ. I immediately hoped it was my fault. Anything to keep from having to look into that mess they called a network.
But when I started poking around, it became quite clear that everything was practically flawless. Things were simple, smooth and even set up right. And then the clock struck 2:25 p.m. The network broke. Not for long, just two minutes. I'd convinced myself the problem would be log related, but to my surprise the logs were fine. CPU use was low. Memory was OK. What was going on? It wasn't my fault. It wasn't my box. I was going to have to come back the next day.
As I drove in the next morning, I let my mind take a break and enjoy the rolling green countryside, while taking notice of the incredible number of train tracks that criss-crossed the landscape. When I got to the plant, I asked about them, immediately regretting my decision as one of their IT guys began explaining the inner workings of the coal industry. Then, in mid-sentence, the guy just started laughing.
"2:25!" he said, his head shaking from side to side. "2:25."
Apparently, an old mine had begun producing again, and now a new train passed by the plant every day at - you guessed it - 2:25 p.m. As it chugged by, it crossed straight through the line-of-sight wireless link that connected HQ to a repeater and then down to the telco.
The fact that the network went loopy immediately after the box was upgraded? A red herring. If I'd stopped focusing on my "area of expertise" for even a few seconds and abandoned my preconceived diagnosis, I might have looked for other signs or changes and found a quicker solution.
Would my fancy new cloud network have made a difference? Not in this case. And that provides a great reminder that even as we move toward virtualizing the network, we shouldn't lose sight of the fundamentals: the physical network.