Friday, November 30, 2007
Forwarding to BT
We have had a few cases now where the nice people at BTinternet/Yahoo are delaying mail that is being forwarded to accounts with them.
Needless to say there is little we can do about this. If you are set up with a forwarder from your domain to a BT account, and mail is not getting through or is delayed, please contact them and ask for support on the matter.
In the meantime, senders may receive messages from our mail servers saying that delivery has not yet succeeded. This is not a fault on our side; BT's servers are simply not accepting the mail.
Saturday, November 17, 2007
Load Balancer reboot
Our LVS load balancer required a reboot this evening. Sadly, our monitoring and SMS alerting, provided by T-Mobile, failed to notify the on-call engineer of the issue for over an hour. Once the alert had been issued, we were able to restore service within 10 minutes.
Affected services include hosting on service 150 and 212, as well as POP collection from pop1.nsnoc.com.
Saturday, November 10, 2007
Temporary outage remedied.
A short outage was detected and remedied this morning. It involved a load balancer, and affected users will have noticed problems with connections to clusters one and three and with mail collection. Traffic for the affected users has now been restored.
During this period inbound email was unaffected for all users.
Thursday, November 01, 2007
Rack failure
We have experienced a complete power failure in one of our racks at the IFL data centre. This occurred at the PDU breaker rather than in the rack itself, and we are in the process of assessing damage and bringing services back up as it is safe to do so.
This has taken out a number of primary services. We are aware of the situation, so there is no need to mail in.
We will post progress as it occurs. If you are experiencing issues above and beyond what is reported as fixed, then please do mail support@support.nsnoc.com. Thank you.
[2209 Update] At this point we have restored domain name resolution and the primary of the three clusters. We are continuing to work on the situation both on site and remotely and will post more as we can. SSL server services have been restored.
[2240 Update] We have restored mail collection services and mail relay services. Inbound mail from external sources has not been affected. Clusters two and three are back in service delivering web traffic.
[2330 Update] All core services have been returned to normal. We currently have 5 servers offline, hosting about 30 customers, which we have traced back to a failed APC power distribution unit. It is unclear whether the unit failed, causing the trip switch to cut power to the rack, or whether the unit failed as a result of a power distribution fault. We are making arrangements for the unit to be tested and, if necessary, replaced as soon as possible.