Monday, December 19, 2011

 

Routing Issue

We've been investigating two reports of end users being unable to reach our network. In both cases the points of failure appear to be well outside our network, but we have not yet been able to identify the network at the heart of the issue. We will post an update here if anything develops; at this point we are not seeing any changes in network usage or border routes.

Thursday, December 08, 2011

 

Rack A5 power trip

Power to rack A5 tripped after the power supply in server 64 failed.

Power has been restored and all servers in A5 are working normally.

POP/IMAP mail services from server 64 have been transferred to servers 65 and 66 by the cluster management system.

Sunday, December 04, 2011

 

POP/IMAP/Webmail Servers

The file server responsible for storing mail data has crashed three times in the last three days.

Investigation has shown that one of the drives is generating IO errors.

At midnight the failed drive will be replaced and resilvered.

UPDATE 9am Monday

The drive replacement and resilvering is 50% complete. The file server is operating in a degraded state, so performance is lower than normal.

We expect the process to be completed by 6pm.

Until then the mail servers will run with a higher than normal load and may be less responsive than usual.

UPDATE 1pm Monday

Work to restore the file server to a full disk set should now be complete by 4pm.

During this work we identified that one of the POP cluster servers has a faulty network connection. We will continue investigating this afternoon.

UPDATE 3pm Monday

Resilvering of the replacement drive is due to complete at 4.30pm.

While monitoring server loads, it has become increasingly apparent that a number of customers are using POP accounts to store mail for considerable periods of time. POP is not well suited to long-term mail storage, and this inappropriate use of POP accounts is causing the high loads on the POP servers.

We will be reviewing POP usage over the next week and will advise customers whose usage needs to change.

UPDATE 8pm Monday

The file system rebuild completed at 4.30pm. Server loads all returned to normal within a few minutes of the work finishing.

We will monitor the situation over the next 24 hours to ensure there are no further issues.
