Restricted Internet Connectivity

Minor incident · General, Core Network Infrastructure
2021-01-27 13:20 CET · 5 hours, 55 minutes

Updates

Post-mortem

Management Summary

In the afternoon of Wednesday, 2021-01-27, cloudscale.ch was the target of a DDoS attack which was announced by a blackmail message just minutes before the attack started. Thanks to our documented and tested procedures, impact on our customers’ services could be avoided for much of the attack’s duration. However, in addition to a short period of link saturation caused directly by the attack traffic, some of our mitigation measures had unwanted side effects, limiting connectivity for a subset of virtual servers and use cases. As an immediate action, we have further extended our monitoring and will thoroughly review our DDoS mitigation strategies.

Please accept our apologies for the inconvenience this incident may have caused you and your customers.

Detailed Incident Report

13:20 - 19:15 CET: Overall incident duration

Situation

cloudscale.ch was targeted by a volume-based DDoS attack using a number of attack techniques. Inbound attack traffic started at 13:20 CET, just minutes after we received a blackmail message at 13:07 CET. Over time, the attack details changed, involving different IPv4 addresses in multiple subnets, and growing in traffic volume.

Thanks to our monitoring, including alert thresholds for utilization of individual links, we were able to immediately execute our documented mitigation procedure, responding to the attack characteristics we were observing.
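As an illustration of this kind of threshold-based alerting (not our actual monitoring stack; capacity, threshold, and poll interval are made-up values), a minimal sketch in Python could look like this:

```python
# Minimal sketch: alert when a link's average utilization crosses a
# threshold. Capacity, threshold, and poll interval are illustrative
# assumptions, not our actual monitoring configuration.
LINK_CAPACITY_BPS = 10_000_000_000   # assume a 10 Gbit/s uplink
ALERT_THRESHOLD = 0.80               # assume an alert at 80% utilization
POLL_INTERVAL_S = 30                 # assume counters polled every 30 s

def utilization(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Average utilization over the interval, as a fraction of link capacity."""
    bits_per_second = (curr_octets - prev_octets) * 8 / interval_s
    return bits_per_second / LINK_CAPACITY_BPS

def check_link(octet_samples: list[int]) -> None:
    """octet_samples: cumulative interface octet counters, one per poll."""
    for prev, curr in zip(octet_samples, octet_samples[1:]):
        if utilization(prev, curr, POLL_INTERVAL_S) >= ALERT_THRESHOLD:
            print("ALERT: link utilization above threshold")

if __name__ == "__main__":
    # Two polls 30 s apart; ~9.6 Gbit/s average triggers the alert.
    check_link([0, 36_000_000_000])
```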

At about 18:30 CET, the attack traffic faded. However, as a precautionary measure, we decided to keep and adapt our mitigation measures for a little longer.

Impact

For most of the total incident duration, thanks to our redundant, amply sized uplinks and quick mitigation efforts, there was no relevant impact on our customers’ services. During specific periods, however, the attack and our countermeasures affected parts of our customers’ external connections and/or servers (see below for details).

17:13 - 17:28 CET: Saturation of certain links

Situation

At 17:13 CET, we noticed a change in the attack pattern. It started to include a larger number of different IP addresses in our network, and grew further in traffic volume. As a consequence, certain links were fully utilized, causing congestion and connectivity failures for connections using one of those paths.

Measures taken

Building on the measures already in place, we adapted and reinforced our attack mitigation in order to move traffic away from the saturated links and to filter attack traffic more effectively, based on the new target addresses we were now observing in the attack.
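How target addresses were translated into filters is not reproduced here. As a simplified, hypothetical sketch of the idea, observed attack destination addresses could be grouped into /24 prefixes which filtering rules then reference; the sample addresses below come from documentation ranges, not from the incident:

```python
# Simplified illustration: group observed attack destination addresses
# into /24 prefixes so that filtering rules can reference prefixes
# instead of individual addresses. Sample data uses documentation
# ranges and is not taken from the incident.
import ipaddress
from collections import Counter

def target_prefixes(attack_destinations: list[str], min_hits: int = 100):
    """Return the /24 networks that received at least min_hits attack flows."""
    counts = Counter(
        ipaddress.ip_network(f"{dst}/24", strict=False)
        for dst in attack_destinations
    )
    return [net for net, hits in counts.items() if hits >= min_hits]

if __name__ == "__main__":
    observed = ["192.0.2.10"] * 150 + ["198.51.100.7"] * 20
    for net in target_prefixes(observed):
        print(f"apply attack-traffic filtering for {net}")  # 192.0.2.0/24 only
```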

Impact

Saturation of certain links caused connection failures or degraded performance for connections between systems within our cloud infrastructure and external systems. These issues potentially affected all customers, but only for connections routed through one of the saturated links. Affected connections included traffic of virtual servers, DNS lookups using our resolvers, requests to our object storage from external sources as well as access to our website, Cloud Control Panel, and API.

Not affected

Network traffic within our cloud infrastructure (both using public IP addresses and connections through private networks) was not affected by the attack.

17:39 - 17:51 CET and 18:21 - 19:15 CET: Side effects from mitigation measures

Situation

The attack mitigation measures taken so far proved to be effective in keeping attack traffic under control. However, traffic distribution across links was far from ideal. There was significant utilization on some of the links, which posed a risk to stable operation in case of a further increase in (attack) traffic, while legitimate traffic was not using available capacity on other links.

Measures taken

We tried to re-engineer traffic distribution across our uplinks through multiple changes to BGP announcements and options, aiming to move traffic to underutilized links while keeping traffic filtering in place against the ongoing DDoS attack. Unfortunately, some of these changes had unwanted side effects on some of the network segments involved in the attack.
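The exact BGP changes are not detailed in this report. One common technique for steering inbound traffic is to announce more-specific subnets of an aggregate on selected uplinks, so that remote networks prefer the longest-matching prefix; the sketch below only illustrates this general idea with made-up prefixes and uplink names, not our actual announcements or tooling:

```python
# Hedged illustration of one common inbound traffic-engineering idea:
# announce more-specific subnets of an aggregate on different uplinks,
# so that remote networks prefer the longest-matching prefix and
# traffic shifts to the chosen path. Prefixes and uplink names are
# made up; this is not our actual configuration or tooling.
import ipaddress
from itertools import cycle

def plan_announcements(aggregate: str, uplinks: list[str], new_prefixlen: int):
    """Map each more-specific subnet of `aggregate` to an uplink, round-robin."""
    network = ipaddress.ip_network(aggregate)
    return {
        subnet: uplink
        for subnet, uplink in zip(network.subnets(new_prefix=new_prefixlen), cycle(uplinks))
    }

if __name__ == "__main__":
    # Example: spread the four /26 subnets of a /24 across two uplinks.
    plan = plan_announcements("203.0.113.0/24", ["uplink-a", "uplink-b"], 26)
    for subnet, uplink in plan.items():
        print(f"announce {subnet} via {uplink}")
```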

Impact

As a consequence of the traffic redistribution efforts, external connections from and to virtual servers in the RMA region using IPv4 addresses in either 5.102.145.0/24, 5.102.146.0/24, or 5.102.147.0/24 were not possible.
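To verify whether a specific server was within this scope, its IPv4 address can simply be checked against the three prefixes listed above, for example (the sample addresses are arbitrary):

```python
# Check whether a given IPv4 address lies within one of the prefixes
# affected by the traffic-redistribution side effects described above.
# The sample addresses in the example are arbitrary.
import ipaddress

AFFECTED_PREFIXES = [
    ipaddress.ip_network("5.102.145.0/24"),
    ipaddress.ip_network("5.102.146.0/24"),
    ipaddress.ip_network("5.102.147.0/24"),
]

def was_affected(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    return any(ip in prefix for prefix in AFFECTED_PREFIXES)

if __name__ == "__main__":
    print(was_affected("5.102.146.23"))  # True: inside 5.102.146.0/24
    print(was_affected("5.102.150.9"))   # False: outside the listed ranges
```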

Not affected

Given the specific scope of the impact mentioned above, the following use cases and variations were not affected by this issue:

  • Network traffic within our cloud infrastructure (both using public IP addresses and connections through private networks)
  • External connections from and to virtual servers in the RMA region using Floating IPs; in combination with internal traffic being unaffected (see above), this means that HA/load-balancing and similar setups remained fully available as long as external connections were established through a Floating IP
January 29, 2021 · 16:47 CET
Update

Since about 19:15 CET, traffic has been back to normal levels and no further outages have been observed. A detailed report on this incident will be published at a later date. Please accept our apologies for any inconvenience this incident may have caused.

January 27, 2021 · 22:12 CET
Update

We are seeing another peak of inbound traffic, and are adapting our measures to ensure stable connectivity.

January 27, 2021 · 17:22 CET
Update

Internet connectivity is stable again. We continue to monitor the situation and will update this incident ticket further if necessary.

January 27, 2021 · 13:40 CET
Issue

We are seeing inbound traffic with unusual patterns and volumes.

Traffic to/from certain external targets might be affected by degraded performance (throughput, latency, packet loss) to varying degrees. This includes traffic of virtual servers, DNS lookups using our resolvers, requests to our object storage from external sources as well as access to our website, Cloud Control Panel, and API.

We continue to monitor the situation and will update this incident ticket if necessary.

Please accept our apologies for the inconvenience this issue may cause you and your customers.

January 27, 2021 · 13:20 CET
