10th June 2021

Managed Platform NFS: partial outage in DC Berlin Kitzingstraße

We are currently facing a service failure in the central storage appliance in our datacenter Berlin Kitzingstraße. This is causing problems accessing NFS shares. We are working urgently on a solution.

[update 10/06/2021 13:44]

The storage cluster appears to have shut down due to a power outage. The power supply has been restored and the nodes are booting up.

[update 10/06/2021 14:23]

The cluster has not yet been fully recovered. We are still working on a solution.

[update 10/06/2021 15:17]

We're still working on the complete recovery of all cluster nodes.

[update 10/06/2021 17:27]

Full functionality of the Isilon storage cluster has not yet been restored. Because 2 of the 6 cluster nodes failed to start, the cluster was initially not writable. We are now observing successful write operations on the cluster and are working to stabilize it further by remounting the shares.

While we continue to work with the manufacturer's support to restore functionality, we have activated our emergency plan and are preparing to move clients to the second Isilon storage cluster at the Berlin Lützowstraße site. The shares there are synchronized daily, so their data may be up to 24 hours old. We will try to synchronize the changes from the original cluster or, alternatively, provide the old share as an additional read-only mount point.

If functionality cannot be restored in time, we will begin switching the shares over shortly, without further announcement.

[update 10/06/2021 18:40]

Systems are beginning to recover. We are still working urgently on a permanent fix.

[update 10/06/2021 19:14]

We are currently preparing the announced move to the cluster in the second data center.

[update 10/06/2021 19:50]

At 19:50 we began moving customers to the cluster in the other data center. We are doing this one customer at a time so that the load on that cluster does not increase too quickly.

[update 10/06/2021 22:30]

All systems have been restored. Follow-up work is still in progress.

[update 11/06/2021 00:03]

We restored most functionality by migrating the shares to the replacement system. Where shares were not migrated, we have contacted you directly. According to our monitoring systems, all services have been restored.

If you still experience problems, please contact us.

We regret the outage. We are continuing to investigate its exact cause and may contact you if further action is required.

[update 12/06/2021 12:00]

We monitored the situation closely overnight and observed stable operation. We will continue to analyze the cause and impact of the outage and will contact affected customers with a detailed error analysis.

If you have any questions or notice unexpected behavior in your setup, please contact our support team directly.