hosteverything.net

Further Outages - This time a network issue

Availability of the server was a bit poor yesterday due to a network failure at our ISP's datacenter, BlueSquare... details from my ISP below:

Sorry for any inconvenience this caused. I'm speaking to my ISP about how they can improve their redundancy.

For most of the outage, connectivity to this server was lost only from the UK; network peering from the US (and thus Asia and Australia) and Europe remained OK.

--

Dear customer,

Today we experienced an issue with BlueSquare's primary fibre link - a
break in the fibre was detected at the Telehouse (East) datacentre in
London.

Traffic is currently being routed through BlueSquare's secondary fibre
link, whilst engineers work on repairing the primary fibre.

During this time, however, there was significant packet-loss due to the
third fibre connection between Telehouse (East) and Telecity (HEX). This
is what caused the majority of the outage that customers, PoundHost, and
other providers in BlueSquare experienced this afternoon.

At this time, traffic is still being routed through the secondary
connection, although extensive work is under way to bring the primary
connection back into service.

Let me reiterate that this issue is not directly related to PoundHost -
we are one of a handful of providers at BlueSquare affected by this outage.

We will send out a more detailed announcement once we are confident that
service will no longer be affected.

Many thanks for your support and patience during this time.

--
Update: 03/02/2007 10:51 and this just in... a further explanation:

Further to the two e-mails sent yesterday, I can now confirm further details of yesterday's issues.

The main cause of the outage for PoundHost, and of the outage suffered by other ISPs located in BlueSquare, was their fibre provider, TeliaSonera, *disconnecting* the wrong fibre while testing another customer's link, because they had mislabelled the fibre in their London datacentre suite (Telehouse East).

As per yesterday's email, our second diverse fibre, which was working fine, was down for general maintenance due to router and switch software upgrades which we were carrying out. When the primary fibre went down, we brought our second fibre up as quickly as possible; however, due to the maintenance we were doing, the connection caused packet loss, and routing worked from some ISPs but not others.
We have now fully completed our maintenance on this fibre overnight, so in the event of any further issues on the primary fibre we can easily route over the second diverse fibre without the issues experienced yesterday afternoon.

Some customers have mentioned that contacting us by phone was difficult. This is because our phone system is VoIP (voice over IP) based, so the bad connection meant we had trouble hearing customers and, in some cases, had no phones at all. To avoid this happening in the future, I have this morning ordered 4 BT lines to replace the VoIP system; they are to be installed within the next 14 days, direct into our support office, using the existing numbers.

Following customer feedback, we have now set up a new NOC (Network Operations Centre) status page, which we will be finishing and launching on Tuesday, to keep customers updated should there be any network issues in the future. This will give you an at-a-glance status, along with updates such as these, on the website. We will email you with more details on Tuesday once the site is completed.

Again, we apologise for any inconvenience caused yesterday. It was an unfortunate chain of events: we already have systems and links in place to deal with this kind of fibre break but, as with all things in life, the one time you need something is the one time it isn't available; in our case, the second fibre was down for our own internal maintenance. For reference, our two diverse fibres are called 'THE' and 'TCX', which you will see in any traceroutes you run, depending on the routing destination.
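
If you want to check which fibre a route is taking, here is a rough Python sketch that looks for the 'THE' and 'TCX' labels in the hop hostnames of a traceroute. The email only says the labels are visible in traceroutes; how they appear inside hostnames (for example as a label like tcx2) is my assumption, and the hostnames and function names below are made up for illustration.

```python
import re

# Fibre labels taken from the announcement. The hostname pattern (the label
# as a whole dot- or dash-separated token, optionally followed by digits)
# is an assumption for this sketch, not something the ISP has confirmed.
FIBRE_LABELS = ("the", "tcx")
LABEL_PATTERN = re.compile(r"(%s)\d*" % "|".join(FIBRE_LABELS))


def fibre_for_hop(hostname):
    """Return 'THE', 'TCX', or None for a single traceroute hop hostname."""
    # Split on dots and dashes so 'the' only matches as a whole token,
    # not inside words such as 'ethernet'.
    for token in re.split(r"[.\-]", hostname.lower()):
        match = LABEL_PATTERN.fullmatch(token)
        if match:
            return match.group(1).upper()
    return None


def fibre_for_route(hop_hostnames):
    """Return the label of the first hop that names a fibre, else None."""
    for hostname in hop_hostnames:
        label = fibre_for_hop(hostname)
        if label is not None:
            return label
    return None


# Hypothetical hop names, for illustration only.
example_route = [
    "gw.isp.example.net",
    "ge-0-1.tcx2.example.net",
    "server.example.org",
]
print(fibre_for_route(example_route))
```

Paste the hostnames from your own traceroute output into the list; if neither label shows up, the function returns None, which may simply mean the naming assumption above does not hold.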