Object Storage backend unavailable

Minor incident · Region RMA (Rümlang, ZH, Switzerland) · Object Storage Service (RMA)
2025-07-01 09:10 CEST · 5 minutes

Updates

Post-mortem

After an online configuration change, the Object Storage in Rümlang (RMA) returned “503 Service Unavailable” responses on Tuesday, 2025-07-01, from 09:09:51 to 09:15:14.

We did not observe this kind of service disruption while testing the change. The same change was also rolled out in Lupfig (LPG), where it caused a backend service interruption of less than 1 s. Interruptions that short are buffered by our front-end proxy and are not noticeable to end users.

Closer investigation showed that the configuration change automatically caused all backend servers to pause for a configuration reload. Before the reload, the servers block new connections but wait for all ongoing requests to finish. Unfortunately, all backend servers were processing long-running downloads at the time, so new connections remained blocked until these downloads had finished and the configuration could be reloaded.
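The effect can be illustrated with a small Python model. It is a simplified sketch under stated assumptions, not the actual backend logic: the function names and all durations are hypothetical, the reload itself is treated as instantaneous, and every server is assumed to start draining at the same moment.

```python
def blocked_window(remaining_seconds):
    """Seconds one server refuses new connections while draining.

    remaining_seconds: remaining run time of each in-flight request
    on that server (hypothetical values, in seconds).
    """
    # The reload can only start once the slowest in-flight request is done.
    return max(remaining_seconds, default=0.0)


def cluster_outage(servers):
    """Seconds during which every server is blocked at the same time.

    servers: one list of remaining request times per backend server.
    """
    # All servers start draining together, so the cluster is unavailable
    # until the first server has finished draining and reloaded.
    return min(blocked_window(remaining) for remaining in servers)


# Only short requests in flight: the gap is short enough to be buffered.
short_only = [[0.2, 0.4], [0.3], [0.1, 0.8]]

# At least one long-running download on every server.
with_long_downloads = [[0.2, 300.0], [320.0], [400.0, 0.1]]
```

In this model, `cluster_outage(short_only)` is 0.3 seconds, while `cluster_outage(with_long_downloads)` is 300 seconds: a single long download per server is enough to keep the whole cluster blocked.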

Usually we minimize service disruption by restarting backend servers one at a time. Because this configuration change affected the storage cluster as a whole, however, the configuration reload happened automatically and was not triggered by an operator.

For future configuration changes, we will modify our deployment procedure so that long-running requests are shut down where needed. This speeds up configuration reloads and keeps downtime minimal.
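One common way to bound the drain time is to wait for in-flight requests only up to a deadline and then cancel whatever is still running. The following Python sketch shows that general pattern with `asyncio`. It is an illustration only, not the deployment procedure itself: `drain_with_deadline`, `demo`, and the chosen durations are hypothetical.

```python
import asyncio


async def drain_with_deadline(in_flight, deadline_seconds):
    """Wait up to deadline_seconds for in-flight requests, then cancel the rest.

    in_flight: a collection of asyncio.Task objects, one per running request.
    Returns a tuple (finished, cancelled) with the number of tasks in each group.
    """
    if not in_flight:
        return 0, 0

    # Give running requests a bounded amount of time to finish on their own.
    done, pending = await asyncio.wait(in_flight, timeout=deadline_seconds)

    # Shut down whatever is still running so the reload is not held up.
    for task in pending:
        task.cancel()
    if pending:
        # Let the cancelled tasks unwind before the reload proceeds.
        await asyncio.gather(*pending, return_exceptions=True)

    return len(done), len(pending)


async def demo():
    async def request(seconds):
        # Stand-in for a request that takes `seconds` to complete.
        await asyncio.sleep(seconds)

    # Two short requests and one long-running download (hypothetical durations).
    tasks = [asyncio.create_task(request(s)) for s in (0.01, 0.02, 5.0)]
    return await drain_with_deadline(tasks, deadline_seconds=0.2)
```

Calling `asyncio.run(demo())` returns `(2, 1)`: the two short requests finish within the deadline and the long one is cancelled. The trade-off is that cancelled downloads have to be retried by the client, in exchange for a bounded interruption for everyone else.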

We sincerely apologize for the inconvenience this outage has caused you and your customers.

July 1, 2025 · 17:24 CEST
Issue

A configuration change caused the object storage in our RMA region to be unavailable between 09:10 and 09:15. We are still investigating the cause of this outage.

In the meantime, the object storage status is back to normal.

July 1, 2025 · 09:14 CEST
