Thursday, November 01, 2007
Rack failure
We have experienced a complete power failure in one of our racks at the IFL data centre. This occurred at the PDU breaker rather than within the rack, and we are in the process of assessing the damage and bringing services back up as it is safe to do so.
This has taken out a number of primary services. However, we are aware of the situation, so there is no need to mail in.
We will post progress as it occurs. If you are experiencing issues beyond what is reported as fixed, please mail support@support.nsnoc.com. Thank you.
[2209 Update] We have now returned domain name resolution and the primary of the three clusters to service. We are continuing to work on the situation both on site and remotely and will post more as we can. SSL server services have been returned.
[2240 Update] We have returned mail collection services and mail relay services. Inbound mail from external sources has not been affected. Clusters two and three are back in service delivering web traffic.
[2330 Update]
All core services have been returned to normal. We currently have 5 servers offline, hosting about 30 customers; we have tracked this back to a failed APC power distribution unit. It is unclear whether the unit failed, causing the trip switch to cut power to the rack, or whether the unit failed as a result of a power distribution fault. We are making arrangements for the unit to be tested and, if necessary, replaced as soon as possible.