Slight interruption of service at Union Station

By Hongli Lai on April 10th, 2011

At 3:00 PM CET a slight interruption of service occurred due to network problems at our data center. As a result some databases could not be reached, and the web interface displayed error messages. The network problems were resolved after 45 minutes. Access to the databases was restored 15 minutes after that, at 4:00 PM CET; the extra delay was due to emergency checkups.

No data was lost during this event. The Union Station server architecture was designed with failures like this in mind; during the downtime, all data sent to us was stored in a backlog. Our background workers finished processing the backlog at 4:25 PM CET, i.e. within 30 minutes of database access being restored.

In the past few weeks we’ve been fixing bugs, adding features, and tuning and optimizing our backend servers. The tuning and optimization have paid off greatly: the background workers and the databases held up under peak load (which is around this time) while processing the backlog. We will cover the changes in a new blog post soon.

Our apologies for the inconvenience caused today.