Tuesday, June 10, 2008

 

Web Clusters 150, 156 and 212

It has become apparent that today's work on the database server has not resolved all of the issues on the above clusters. While the service is working better than it was earlier, the process list on the web servers continues to grow until new connections are refused.

At this point we have ruled out both the LVS load balancer (replaced this morning) and the SQL server (replaced with a new server at 1pm). We have also ruled out the network, as virtual servers and the mail service are working without issue.

The only remaining element of the service in question is the NFS file server. Although it is not producing any obvious errors, we believe it is the only possible cause left. A new file server is in the rack and we have just begun transferring data from the old server to the new one. We expect this to be substantially complete within the next 4 hours.

UPDATE 8PM

The transfer of files to the new file server is underway and proceeding without issue. The web servers have been pointed at the new file server. Files are being restored alphabetically from A to Z, so sites beginning with A and B have already been migrated. Based on progress in the first hour, we expect the transfer to complete in the early hours of the morning.

We would like to thank customers for their patience during this time.

UPDATE 1AM

We are approaching the halfway point in the transfer of sites from the old file server to the new one. We expect the remainder of the process to be completed between 6 and 8am.

Webmail services have been restored and are working without issue.

FTP access to the new file server will be suspended until mid-morning on Wednesday.

UPDATE 7AM

The file transfer is still running, albeit very slowly, with about 75% of sites completed. Clients whose sites are still unavailable can email our support address with the affected sites so that we can push them through by hand. Priority will be given to business sites.

UPDATE 11AM

All of the remaining sites should be restored within the next 2 hours. Once that is complete, FTP access to the new file server will be made available. Customers will not need to change any settings in their FTP clients.

UPDATE 2PM

All transfers are complete and the service appears to be stable at last. A few site bugs cropped up during the process, but they have been ironed out. If you are aware of any site that is not working correctly, please raise the issue with our support email address and we will investigate it.

In summary,
MySQL has moved to a new server; no changes are required from customers.
The file server is on new hardware; no changes are required from customers.
The FTP service is up and working; no changes to FTP settings are needed.
Webmail is up and working.

We would again like to thank our customers for their patience during this issue.

 

Web Clusters 150, 156, 212

We are currently dealing with an issue on all web clusters whereby requests are not being serviced correctly, resulting in slow page load times.

Our investigations over the last 12+ hours have drawn a blank: we have found no apparent reason for the poor performance of the web servers.

This morning we are bringing in outside support to double-check our own investigations, and we are installing a new file server.

At this point we have tried to eliminate all of the common points of failure in the web cluster service. The file server appears to be operating correctly, but because it is such a major component in delivering web pages, and because we have no other candidate causes, we have decided to replace it as a precaution. We expect that work to be completed by lunchtime today.

UPDATE 9.40AM

Further testing indicates that the bottleneck is the SQL server, not the file server as previously thought. Work to replace the file server has been suspended for the time being. We have a standby MySQL 5 server and are investigating what is involved in upgrading from the current MySQL 4 service to MySQL 5 on the new server. If the upgrade appears to carry too high a risk, we will prepare a MySQL 4 server instead. This work is expected to take the next 3 hours.

UPDATE 1PM

The database tables and indexes on the existing SQL server have all been repaired, but this has not produced any significant performance improvement. A new MySQL 4 server is being prepared to take over the databases. Next update at 2PM.

UPDATE 2PM

The new database server should be running within the next 10 to 15 minutes.

UPDATE 3PM

The main database has now been migrated to a new multi-core host server. Initial tests show that the load on the web servers is now more stable, with pages being served within normal times.

We will continue to monitor the service for the next 24 hours to ensure stability has been returned.

We would like to thank our customers for their patience during this outage and offer our sincere apologies for the intermittent service over the last 24 hours.

Monday, June 09, 2008

 

VPS/vserver clients

VPS/vserver users, please be aware that unless you have a managed account with us, it is your responsibility to keep your server as secure as possible.

We have already had two incidents this week where clients failed to keep on top of patches and updates, resulting in machines being taken offline after they were used in attacks on other networks.

If you have any questions about this or would like some advice, drop us an email at support and we will help as much as we can. We would much rather you stay on top of the situation than have it come as unwelcome news.

 

Network congestion

We are currently experiencing an inbound UDP flood.

Once we have isolated the source, we will be able to deal with it and restore normal service. Technically the services themselves are unaffected; however, while the switches are this busy, connections will be slow or time out.

Monday, June 02, 2008

 

Phone Support

We are currently moving offices.

While we are back online and answering support requests, we are afraid our phones have not yet been re-routed.

In the meantime, please continue to use email support (support@nsnoc.com); it is our preferred method of contact. We will post here again with an update on the phones as soon as we are able.
