Early on the morning of Monday, May 16, our Managed DNS network, along with many other network and infrastructure resources across our platform, came under a series of concerted assaults from a determined attacker. Over the rest of last week, and continuing even now, we have sustained dozens of individual attacks that used a variety of strategies and exhibited a rare degree of sophistication and scale.

First, a brief summary:

Over the course of last week, we sustained dozens of large DDoS attacks, with strategies ranging from simple volumetric floods to complex direct DNS lookup attacks and concentrated attacks against our upstream network providers and other vendors. These attacks represent a further escalation of the increase in malicious activity we have recently observed broadly targeting the DNS, CDN, and internet infrastructure industries.
Unfortunately, early in the attacks our customers experienced brief periods of partial service loss on Monday and Tuesday, primarily in Europe. As we analyzed the shifting attacks and adjusted our response, we prevented further impact, despite attacks continuing through the remainder of the week and into this week.
All indications are that NS1 itself, and not any specific customer, is the target of the attacks.
We have implemented a number of adjustments to our mitigation strategies as a result of these attacks. We have also implemented, and are continuing to implement, network, systems, and software changes based on analysis of the attack models and failure modes we observed.
We recommend that operators who depend upon DNS as a mission-critical component of their infrastructure consider introducing redundancy via multiple authoritative Managed DNS networks. Redundancy can be achieved by working with multiple Managed DNS providers or by deploying multiple delivery networks from a single provider.

In the rest of this article, I will share additional details about last week's events, how we handled them and changes we’re making, and what operators of critical web and application infrastructure that depend upon Managed DNS platforms like NS1's can do in the face of such events. Recently, many major DNS service providers have experienced service impact because of increasingly frequent and virulent attacks. It’s time to recognize that these threats are here to stay and will continue to escalate -- our industry, and operators of websites and applications that depend upon DNS, need to change the game.

What happened?

Last week's attacks mostly followed a distributed denial of service (DDoS) pattern. In a DDoS, the attacker attempts to overwhelm the resources of a target with a vast flood of malicious traffic. At NS1, we frequently mitigate large-scale DDoS attacks targeted at us or our customers, which include many well-known online properties. For example, over the several months before last week, we weathered more than a dozen attacks above 20-30 Gbps, most of them above 10-20M packets per second (pps) and some significantly larger, without any impact to our customers.

We maintain close ties with many other network service providers, including peers in the DNS, CDN, hosting, and other industries, and many have reported a marked increase in malicious activity over the last few months, ranging from volumetric DDoS attacks to elevated probing and vulnerability scanning against their networks.

The attacks of this past week differed in form and function from others we have sustained recently, combining an unusual degree of complexity, velocity, and persistence. Unfortunately, our customers were impacted. We observed several types of DDoS attack traffic. The attacks combined high-volume traffic floods with more sophisticated strategies, including malicious direct DNS queries, random label attacks, and malformed packet attacks. They included broadly sourced queries for real customer domains and variations on them, as well as precise attacks against upstream network resources, in some cases impacting the Tier 1 carriers that provide the capacity we leverage to absorb attacks.
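A random label attack queries names under a real domain that do not exist, so every query misses resolver caches and lands on the authoritative servers. The sketch below shows one simple way such traffic can stand out. It is a hypothetical heuristic, not a description of NS1's or Zenedge's actual filters. Legitimate traffic repeats a small set of names, while random label traffic is almost entirely unique names. The function names, thresholds, and the "last two labels is the zone" shortcut are all illustrative assumptions.

```python
import random
import string
from collections import defaultdict
from typing import Iterable


def zone_of(qname: str) -> str:
    # Simplifying assumption: the zone is the last two labels (e.g. example.com).
    # A real system would match against its list of hosted zones instead.
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def flag_random_label_zones(
    qnames: Iterable[str],
    min_queries: int = 1000,
    unique_ratio_threshold: float = 0.9,
) -> list[str]:
    """Return zones where nearly every query name is distinct.

    Legitimate traffic repeats a few names (www, api, ...), while a random
    label attack invents a fresh, uncacheable name for almost every query.
    """
    totals: dict[str, int] = defaultdict(int)
    distinct: dict[str, set[str]] = defaultdict(set)
    for qname in qnames:
        name = qname.rstrip(".").lower()
        zone = zone_of(name)
        totals[zone] += 1
        distinct[zone].add(name)
    return sorted(
        zone
        for zone, total in totals.items()
        if total >= min_queries and len(distinct[zone]) / total > unique_ratio_threshold
    )


# Demo with synthetic traffic against hypothetical example.* zones.
rng = random.Random(0)
attack = [
    "".join(rng.choices(string.ascii_lowercase, k=12)) + ".example.com"
    for _ in range(5000)
]
normal = ["www.example.org"] * 4000 + ["api.example.org"] * 1000
print(flag_random_label_zones(attack + normal))  # ['example.com']
```

The sketch keeps every distinct name in memory, which an attacker could exploit. A filter running at line rate would more plausibly use a bounded cardinality estimator such as HyperLogLog.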

The attack traffic was also mobile. The largest ongoing assaults were directed at our European infrastructure, but the attacks also migrated to our network in the western US and Asia. By combining multiple highly targeted techniques, the attackers created a complex and evolving situation that resulted in brief periods of packet loss at several of our nodes. This caused partial failures of DNS delivery during those periods, particularly in Europe.

Who was the target of the attacks?

In a DDoS scenario, it is often difficult to determine who is responsible or what motivates them. By their nature, DDoS attacks are widely distributed and use compromised or poorly configured systems as vectors from which to carry out attacks. In this case, however, we have little doubt that the attacker's target was NS1 itself, and not any specific customer of NS1.

Several factors make this clear. The attackers targeted not just our Managed DNS delivery network but also many other resources used by our platform and customers. These included the hosting provider of our ns1.com website, the third party DNS and hosting providers of our system status website (our apologies and thanks to StatusPage.io, who handled the situation like pros), providers of core NS1 command-and-control systems used by our customers, and more. Attacks against these auxiliary systems have no direct impact on our customers' websites or applications. In addition, patterns in the direct DNS attack traffic indicated that the attacker had advance knowledge of NS1's customers, likely obtained by controlling compromised DNS resolvers operated by one or more ISPs. The attacker was targeting the platform broadly rather than attempting to bring down any individual customer.

Unless a group ultimately steps forward and claims responsibility for a DDoS attack, it can be difficult or impossible to ever ascertain the underlying motivation or the responsible party. Attacks can be motivated by any number of things, ranging from political intentions to business motivations to outright malice. We will not speculate further. However, we have contacted the appropriate law enforcement authorities and are working with them to investigate.

What did NS1 do to mitigate the attacks and prevent further impact?

We maintain a large-scale global network that includes specific elements designed to help mitigate DDoS attacks. In addition to hardware and network configurations that filter common attack patterns, we employ a number of well-known tools for real-time network visibility and filtering, and we have developed a wide array of DNS- and NS1-specific tools for rapidly recognizing and mitigating more advanced attacks. Finally, specifically to help mitigate complex, dynamic attacks like last week's, we partner with an anti-DDoS vendor, Zenedge, to augment our own capacity and technology with their network resources and expertise.

During last week's attacks, the primary customer impact came from malicious direct DNS query traffic designed specifically to look like legitimate DNS traffic. In some cases, the unique nature and volume of this traffic placed service-impacting load on our DNS delivery systems. While we maintain massively overbuilt capacity to absorb traffic spikes, direct DNS DDoS traffic is generally thousands or tens of thousands of times larger than legitimate traffic, which is why we make every effort to filter it as far upstream as possible. Exacerbating the issue, we also encountered several "backpressure" style failures in our systems. These were caused by severely elevated levels of non-attack traffic concentrated in our absorption facilities, on top of the malicious DNS traffic. This occurs because upstream DNS resolvers aggressively retry when a lookup against our systems fails, producing a large spike in legitimate traffic in addition to the attack traffic. We have already implemented software and configuration changes, and brought online additional filtering and delivery capacity, to relieve this type of pressure more effectively.
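The backpressure effect described above can be estimated with a deliberately simplified model. In this model, each attempt a resolver makes fails independently with probability `failure_rate`, and the resolver retries up to `max_retries` times. Retry n (counting the first attempt as 0) is sent only when all earlier attempts failed. The expected number of attempts per lookup is therefore 1 + f + f² + … + f^r. The 100,000 qps baseline and the retry count are hypothetical numbers, not measurements from the incident. Real resolvers also vary in timeouts, backoff, and server selection.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per lookup, assuming independent failures and up to
    `max_retries` retries after the first attempt (a simplified model)."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    # Attempt k (counting from 0) is sent only if the k attempts before it
    # all failed, which happens with probability failure_rate ** k.
    return sum(failure_rate ** k for k in range(max_retries + 1))


def legitimate_load(base_qps: float, failure_rate: float, max_retries: int) -> float:
    """Legitimate query load actually arriving once retries are included."""
    return base_qps * expected_attempts(failure_rate, max_retries)


# Hypothetical baseline of 100,000 legitimate lookups per second.
for rate in (0.0, 0.5, 0.9):
    qps = legitimate_load(100_000, rate, max_retries=3)
    print(f"failure rate {rate:.0%}: {qps:,.0f} qps")
# failure rate 0%: 100,000 qps
# failure rate 50%: 187,500 qps
# failure rate 90%: 343,900 qps
```

Even under these assumptions, a partially failing node receives several times its normal legitimate load on top of the attack traffic. That is why relieving the failure quickly matters as much as raw capacity.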

In addition, both in our own network and in partnership with Zenedge, we implemented a number of new filtering strategies in real time as we analyzed the types of attack traffic directed at our platform. As a result, we fully mitigated all the attacks we saw in the latter half of the incident, without customer impact.

What immediate steps are we taking to protect NS1’s platform and customers?

In every platform incident, whether caused by malicious attack or any other reason, it is our policy to conduct an internal postmortem analysis of the incident immediately upon resolution. We've already conducted our postmortem for last week's events, and will be following up with additional analysis as we review the data and timelines from the attacks and our mitigation efforts.

The result of a postmortem incident analysis is a prioritized plan for changes to our platform and processes, and over the next several weeks we will execute on that plan. Our analysis of last week's incident identified a number of areas for change:

- making many of our newly implemented filtering strategies permanent or more rapidly accessible;
- adjusting our approach for routing traffic through our anti-DDoS vendor;
- modifying our real-time analysis tools to better surface some of the attack patterns we observed;
- further analyzing and adjusting our DNS delivery software and systems to eliminate sources of backpressure.

In addition to post-incident analysis and the resulting changes, we maintain a regular cadence of incident response preparation. This includes fire drills to test our response to new types of events, proactive brainstorming, and testing and probing of our network and systems for susceptibility to new types of attacks. This approach typically keeps us ahead of attackers, although we do occasionally encounter a new attack for the first time in the wild.

How can website and application operators mitigate the risk posed by DDoS against DNS infrastructure?

DDoS attacks against critical infrastructure like DNS are not going away. Over the last several months, we've seen an increasing rate of large scale attacks against our own infrastructure, and we've also communicated with many other service providers in the DNS, CDN, and related spaces who have come under similarly concerning attacks resulting in impact to customers.

When a DNS platform is impacted by an attack, the domains it services suffer. Users see much slower page loads due to DNS timeouts and retries. In the worst case, they cannot resolve a domain at all, resulting in an outage.

As I've spoken with our customers over the last few days, the same question keeps coming up: "What can I do to be better prepared for these kinds of situations?" Most of our customers are themselves experienced and sophisticated operators and understand that incidents do impact systems, no matter how well designed, managed, and protected they are.

The trend we have observed over the last several years, and one that I believe must accelerate, is toward deploying redundant authoritative DNS delivery networks to service critical websites and applications. If your DNS is serviced by two independent networks, it is unlikely that both will be impacted by an incident at the same time. Suppose one of the networks is itself the target, as we experienced last week. The other network is then less likely to also be targeted, and DNS is built so that resolvers retry across the different nameservers to which you delegate your domain. Even if your property is the target of an attack, the attacker will be forced to divide their efforts across networks with independent resources and mitigation strategies, reducing the likelihood of impact.
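A small sketch illustrates the two ideas in the paragraph above. The first is resolver fallback: when one provider's nameservers stop answering, a lookup still succeeds through another delegated nameserver. The second is the arithmetic of independence: if each network is impaired some fraction of the time, and the networks fail independently, the chance that both are impaired at once is the product of those fractions. The nameserver names and the 1% impairment figure are hypothetical. Trying servers in list order is a simplification, since many resolvers prefer whichever server has been responding fastest. Independence is also optimistic, because shared upstream carriers can correlate failures.

```python
from math import prod
from typing import Callable, Optional, Sequence


def resolve(nameservers: Sequence[str], answers: Callable[[str], bool]) -> Optional[str]:
    """Try each delegated nameserver until one answers, as a resolver does
    after a timeout; return the server that answered, or None if none did."""
    for ns in nameservers:
        if answers(ns):
            return ns
    return None


def p_total_outage(p_impaired: Sequence[float]) -> float:
    """Probability that every network is impaired at the same moment,
    assuming the networks fail independently of one another."""
    return prod(p_impaired)


# Hypothetical delegation: two nameservers at each of two independent providers.
delegation = [
    "ns1.provider-a.example", "ns2.provider-a.example",
    "ns1.provider-b.example", "ns2.provider-b.example",
]
under_attack = {"ns1.provider-a.example", "ns2.provider-a.example"}
print(resolve(delegation, lambda ns: ns not in under_attack))  # ns1.provider-b.example

print(f"{p_total_outage([0.01]):.4f}")        # 0.0100 (one network)
print(f"{p_total_outage([0.01, 0.01]):.4f}")  # 0.0001 (two independent networks)
```

Lookups still pay the timeout cost of the unresponsive servers before falling back. Redundancy therefore turns a potential outage into slower resolution rather than eliminating impact entirely.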

Unfortunately, deploying redundant DNS networks has been complicated by the fact that DNS is leveraged in more advanced ways today than in the past. At NS1, of course, we believe that intelligence in DNS lookups is a powerful way to make decisions that control internet traffic. Several other providers have also implemented technology to enable traffic management with DNS. The challenge is that every platform has different capabilities and different configuration semantics, which makes advanced configurations difficult to translate across providers.

If your domains use only simple, "static", RFC-compliant DNS records, you can rely on the long-established approach to DNS redundancy: zone transfers between providers in a primary-secondary topology. If your websites and applications use advanced features like traffic management tools, you may consider implementing automation that generates synchronized configurations for multiple providers and pushes changes to their APIs. Several of our customers take this approach. Alternatively, some providers can help you deploy multiple independent DNS delivery networks with a unified technology stack.
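The automation approach above can be sketched as a single provider-neutral source of truth. Each provider's configuration is rendered from it, and a drift check flags when the providers no longer agree. Both payload shapes (`to_provider_a`, `to_provider_b`) are hypothetical stand-ins, not any real provider's API. Pushing the payloads to real APIs is omitted. The sketch covers only static record fields. Translating traffic management rules, which is the hard part described above, would need per-provider logic that is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Provider-neutral source of truth for one record set."""
    name: str
    rtype: str
    ttl: int
    values: tuple[str, ...]


# Both payload shapes below are hypothetical stand-ins for two providers' APIs.
def to_provider_a(r: Record) -> dict:
    return {"host": r.name, "kind": r.rtype, "ttl": r.ttl,
            "rdata": [{"value": v} for v in r.values]}


def to_provider_b(r: Record) -> dict:
    return {"name": r.name, "record_type": r.rtype, "ttl_seconds": r.ttl,
            "data": list(r.values)}


def render_all(records: list[Record]) -> dict[str, list[dict]]:
    """Generate every provider's configuration from the same source records."""
    return {
        "provider_a": [to_provider_a(r) for r in records],
        "provider_b": [to_provider_b(r) for r in records],
    }


def _key_a(p: dict) -> tuple:
    return (p["host"], p["kind"], p["ttl"], tuple(sorted(d["value"] for d in p["rdata"])))


def _key_b(p: dict) -> tuple:
    return (p["name"], p["record_type"], p["ttl_seconds"], tuple(sorted(p["data"])))


def in_sync(a_payloads: list[dict], b_payloads: list[dict]) -> bool:
    """Drift check: True if both providers describe the same record sets."""
    return sorted(map(_key_a, a_payloads)) == sorted(map(_key_b, b_payloads))


records = [
    Record("example.com", "MX", 3600, ("10 mail.example.com.",)),
    Record("www.example.com", "A", 300, ("192.0.2.10", "192.0.2.11")),
]
config = render_all(records)
print(in_sync(config["provider_a"], config["provider_b"]))  # True

config["provider_b"][1]["ttl_seconds"] = 60  # simulate a manual edit at provider B
print(in_sync(config["provider_a"], config["provider_b"]))  # False
```

Generating every provider's configuration from one source, rather than editing each provider by hand, keeps the delivery networks consistent. The drift check catches out-of-band changes before they matter during an incident.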

Regardless of the approach you choose, if DNS is critical to your website or application -- and for most operators, it is -- we encourage you to consider options for introducing redundancy in your authoritative DNS setup. We are happy to discuss approaches that meet your application’s requirements for reliability, performance, and functionality.

Building a better internet

I am privileged to have spent many years in the internet infrastructure industry, supporting customers and use cases of all sizes across all kinds of technologies. We have a wonderful industry that is smaller than it seems. At NS1 we work with best-in-class vendors. These include our anti-DDoS partner Zenedge; Packet, who last week enabled us to rapidly scale and shift our mitigation infrastructure and even sent us coffee and bagels to keep our team cranking; and many more whose expertise and professionalism match that of our own world-class team. Our team has responded rapidly, effectively, and steadily in the face of a persistent attacker. We maintain close ties with our peers in the DNS, CDN, and other infrastructure spaces, and have shared detailed information about these attacks with many of them over the last week.

As an industry, DNS service providers can do better at limiting the impact of malicious attacks on our customers and the internet at large. Over the last decade, as the threat of attacks has escalated, we have entered an arms race, with DNS providers continually evolving the scale and sophistication of our networks to match those of the attacks. We have treated the issue as one for individual providers to solve independently, and even as a commercial opportunity. This has exacerbated the impact of DDoS attacks, because it has left us without a cohesive, industry-wide approach to mitigating such attacks and limiting their impact on customers.

The problem is both technical -- because of divergence from the historically interoperable nature of DNS as new, more advanced features have been brought to market -- and educational -- because vendors in the space focus on all-or-nothing wins instead of enabling customers to introduce the redundancy that would limit the impact of DDoS and other issues. As I have spoken with customers, it has become clear to me that they are looking to our industry to work together to solve these problems. They want to enable redundancy and resiliency simply and transparently, without compromising on the modern functionality that makes DNS lookup such a powerful tool for application delivery management and optimization.

I am an infrastructure engineer, and as such I know this is a problem we as an industry can solve. I hope this article advances the dialogue among our peers in the DNS space. I also hope it helps shift the discussion back toward solutions that let our collective customers use our powerful technologies interoperably, which will result in a more secure and better internet for all.