This could be a long one so bear with me. I actually lost my cool over this, the wife said it wasn't pretty.
TWC's servers are in a data center on the 11th floor of a building in downtown Denver. Saturday afternoon (Christmas Eve) one of two main buss fuses between the 2nd and 3rd floors blew. This is the second time that exact same fuse has blown this year. I don't know how familiar most of you are with powering a large building, but when I say "fuse" I mean a 4000 amp fuse that costs like $3,000, and it powers a big copper bar that runs up the walls in a shaft similar to an elevator shaft. The first time it blew, early in 2016, they just replaced it and never figured out why it blew. This time (Sat night) they still had no explanation for why it blew, and instead of replacing it with an identical fuse they put in a slow-blow fuse. A slow-blow fuse is supposed to handle temporary surges way above its amp rating instead of blowing instantly.
In this instance that kicked us in the ass. What they failed to notice was that there was water in the main buss. Don't ask me where from, I have no idea and at this point I do not care. I am sure everyone involved wanted to go home because of Christmas Eve (I know I did) so they rushed it. Anyways, water got back into the buss, and instead of blowing right away the new fuse held long enough for the water to run all the way around it. At that point it didn't matter if the fuse blew or not, because the water was bridging both sides. The short melted, and by that I mean completely freaking destroyed, about 8-10 feet of that big copper buss. Everything above the 2nd floor in that building is completely dead.
The backup generators are in the basement....
Obviously you can see the problem. We have live power in the basement (both grid and generator), and no way to get it above floor #2. I found this out Tuesday at about noon. At that time I was told it would take 24 hours or so to get parts in and fix the problem. I wasn't freaking out too badly over TWC at this point, but I have 2 big clients of my own in that datacenter (VoIP, email servers, web servers, etc.) and that isn't counting the 200 other servers in there. These are companies of all sizes that suddenly have no communications. My biggest client is in that building and they have 150 or so employees and they were completely dead in the water on Tuesday. So Tuesday night a few other guys (datacenter owner, 2 guys from my client) and I humped a couple of portable generators (about 120 lbs each) up 11 flights of stairs and got the core systems up (routers and switches and stuff like that) and then powered up a few servers. I intentionally left TWC dark because while we all love TWC, it doesn't stop people from getting paid when it goes down, and we ran out of generators before we got all the businesses up anyways.
I should note here that we did not ask permission of the building superintendent to do this. He got a little bent out of shape over generators spewing fumes, even though most of them are chained on the fire escape outside. That is when he and I got into it and my wife freaked out. I think she thought I was going to throw him out the 11th floor window. This was about midnight on Tuesday. Anyways, that got my clients up and running for business on Weds and I spent the night at the datacenter on Tuesday making sure stuff stayed up and refilling gas tanks on generators. We have a rotation going for that by the way, more on that soon.
So I got home about 5 AM on Weds and took a nap, and at 8:30 my phone woke me up. The electricians in charge of fixing this mess discovered it was worse than they first thought. They had to jackhammer part of this thing out of the wall and floor, and they couldn't replace it with off the shelf parts. It seems this one was custom built for this building which isn't exactly uncommon, and the company that made it is in Indiana. They said they can build a new one, but it would take a couple of days and then it would have to be shipped in.
By this time I am really losing my mind. We (datacenter owner and I) got ahold of the building maintenance guy and pretty much demanded they run a cable from a generator up the side of the building or tie it into the buss on the 4th floor or whatever. Something has to be done. He said he wasn't going to make that decision; it was up to the guy who actually owns the building, who would be in town from Dallas in a few hours. The owner got into town around 5 PM or so on Weds, and spent a few hours talking with the electricians to try and figure something out.
Tying into the building buss isn't an option, because then they won't be able to work on it because it would be hot, and they still don't know where the water came from. Cables up the side of the building are out, the city of Denver is already all over this. The only time in my life I ever heard of a bureaucrat jumping on something this fast, but whatever. They did come up with a plan to separate the buss at the 4th or 5th floor where they have somewhat easier access, put in a couple of big lugs, and hardwire that to the live power in the basement by going down an elevator shaft. But that will not be done until sometime Friday afternoon, at best. And it will just be a band-aid anyways and it won't be power to the entire building. Mostly just the datacenter and lights in other offices. They are telling everyone to run only the basics, no microwaves or refrigerators, etc. They worked on it all day today.
So today I made a few decisions. First, I got together with my client I mentioned earlier and came up with a long-term plan to get us out of that building. Basically he is growing so much that he needs a dedicated connection for his business anyways, and then he can have his own servers in his own building so that if something like this happens again he can just fire up his own generators. We got with Comcast here in Denver and made plans for a dedicated 50 Mbps fiber connection that can be turned up at any time for speeds up to 1000 Mbps. TWC uses about 8-10 Mbps and so we are going to absorb some of his cost there and put our servers in his building.
Dedicated fiber optic from someone like Comcast is expensive, we are talking like $3500 a month here, but it's probably worth it. Power in the building becomes my problem, but it's already my problem anyways the way things are working out here. And we are going to do it right, big battery backups and a dedicated diesel generator in case we need it. He already owns a 7500 watt generator, I already own tons of rack equipment and stuff like that. As long as the connection stays up we should be good, and we have a backup plan for that in his second office. We have to build a server room in his office just to keep everything secure, but it will take 90 days for Comcast to run the fiber into the building anyways so we have time to do that. They said we can be live 90 days after we sign a 3-year contract, and they guarantee our bandwidth and uptime, and a maximum response time of 4 hours for any problems. We also get a dedicated network engineer. Each of their engineers only handles so many clients, and we have work, cell, and home numbers for 3 guys at Comcast.
But that's 90 days from now. Today we were still dead. So I got another generator and carried it up the 11 flights of stairs and brought up the main TWC server and switch, and a few more of the datacenter's clients. The Vault server and TWC mail server and others are still down. Hopefully they have their band-aid in place tomorrow and I can bring those up.
Then sometime next week when they get the stuff back for a permanent fix, I will fire the generators back up and power one side of the TWC web server (it has dual power supplies) while they take the buss down and put stuff in. Then I can plug back into the building power and unplug from the generator without having to take the site down. Hopefully at least, but even if I have to take it down it should only be for a few minutes.
So from now until then there are 3 of us rotating duties at the datacenter so we can fill gas tanks and handle whatever else comes up. I am going to bed.