A members-only session was held during the OARC 34 workshop on 2021-02-05. Giovane Moura of SIDN presented a finding where loops in the name servers for zones can cause certain DNS resolvers to enter an infinite query loop. It was given during the members-only session because it was considered a possible security issue; all OARC members have signed non-disclosure agreements to allow this kind of responsible disclosure. Like all great vulnerabilities in recent times, it got a catchy name (tsuNAME) and a glossy web site, tsuNAME.io.
NS1 is an OARC member (indeed the current chair of the program committee, Jan Včelák, works at NS1), and learned about the tsuNAME issue during this session. We reviewed the threat and came up with mitigation options based on the ideas outlined by the researchers who identified it.
The issue was made public at the OARC 35 workshop on 2021-05-06, so we are now able to discuss it and our efforts to deal with the problem.
DNS is a deeply self-referential protocol, which is a fancy way to say that a lot of DNS uses DNS to work. For example, each DNS domain has a set of name servers that provide answers about the domain, described by NS records.
zone.example NS ns1.zone.example
zone.example NS ns2.zone.example
zone.example NS ns3.some-other-zone.example
As you can see, these name servers are identified by name, which can be either inside the domain or part of a different domain.
In the case of tsuNAME, either through misconfiguration or malicious intent, these NS records cause a loop. A simple example:
# all of Alice's name servers are in Bob's domain
alice.example NS ns1.bob.example
alice.example NS ns2.bob.example
# all of Bob's name servers are in Alice's domain
bob.example NS ns1.alice.example
bob.example NS ns2.alice.example
A resolver trying to look up anything in alice.example first needs the addresses of Alice's name servers, which are in bob.example; to get those it needs the addresses of Bob's name servers, which are in alice.example, and so on.
Some DNS resolvers will continue to follow this loop forever, or at least for a long time (most modern resolvers will detect the loop and return an error to the user). This causes a lot of load on all of the servers involved, both the recursive resolvers sending the queries and the authoritative servers answering them.
As with most loops in the DNS, tsuNAME loops can be any length, so the entire chain of NS must be checked for loops.
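To make the "any length" point concrete, here is a minimal Python sketch that follows delegations zone by zone until a zone repeats. The data and the `ns_zone` helper are hypothetical illustrations; real detection code would have to find the actual enclosing zone cut for a name server rather than guessing it from labels.

```python
# Sketch: detect NS dependency loops of any length among a set of zones.
# "delegations" below is hypothetical illustrative data, not real zones.

def ns_zone(ns_name):
    # Simplifying assumption: the zone of a name server is everything
    # after its first label, e.g. "ns1.alice.example" -> "alice.example".
    return ns_name.split(".", 1)[1]

def find_loop(zone, delegations, seen=None):
    """Follow NS records zone-by-zone; return the loop path if one exists."""
    seen = seen or []
    if zone in seen:
        return seen[seen.index(zone):] + [zone]
    for ns in delegations.get(zone, []):
        parent = ns_zone(ns)
        if parent in delegations:  # only chase zones we know about
            loop = find_loop(parent, delegations, seen + [zone])
            if loop:
                return loop
    return None

# A three-zone loop: alice -> bob -> carol -> alice
delegations = {
    "alice.example": ["ns1.bob.example"],
    "bob.example": ["ns1.carol.example"],
    "carol.example": ["ns1.alice.example"],
}
print(find_loop("alice.example", delegations))
# -> ['alice.example', 'bob.example', 'carol.example', 'alice.example']
```

The `seen` list is what makes arbitrary-length loops detectable: the walk stops as soon as any zone appears a second time, no matter how many zones sit between the two occurrences.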
Preventing tsuNAME Loops
NS1 operates authoritative name servers, which may host one, some, or all of the domains in a tsuNAME loop. We can modify our systems to prevent a new or updated zone from creating a loop.
However, domains in a name server loop may also be hosted on other systems. So even if NS1 prevents a tsuNAME loop when a zone is created or modified, there is no way to guarantee that domains on other systems are not modified to create one.
In order to create a single system which would help catch all tsuNAME loops we adopted the code of CycleHunter, provided by the researchers who identified the tsuNAME vulnerability. The entire process for domains that we host looks like this:
- Extract all NS records used by NS1-hosted domains
- Query those NS records, and output the ones that time out
- Examine each timed-out NS to see which ones are actually part of a cyclic dependency
- Report only the fully cyclic dependent ones (zones where every name server is caught in the loop)
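The steps above can be sketched in a few lines of Python. This is not CycleHunter's actual API; the function and data names are illustrative, and the DNS query step is passed in as a callable so the classification logic can be exercised without network access.

```python
# Sketch of the detection pipeline, with the DNS query step stubbed out.
# Names are illustrative assumptions, not CycleHunter's real interface.

def detect_cyclic(ns_records, resolve):
    """ns_records: zone -> list of NS names.
    resolve(name): returns an address, or None on timeout.
    Returns zones that are fully cyclic dependent."""
    # Steps 1+2: query every NS name, keep the ones that time out.
    timed_out = {ns for nss in ns_records.values() for ns in nss
                 if resolve(ns) is None}
    # Steps 3+4: a zone is fully cyclic dependent only if *every* one of
    # its name servers failed; a single working NS breaks the loop.
    return [zone for zone, nss in ns_records.items()
            if nss and all(ns in timed_out for ns in nss)]

# Simulated resolver: anything under bob.example never resolves.
records = {"alice.example": ["ns1.bob.example", "ns2.bob.example"],
           "safe.example": ["ns1.dns-host.example"]}
resolve = lambda name: None if name.endswith("bob.example") else "192.0.2.1"
print(detect_cyclic(records, resolve))  # -> ['alice.example']
```

The final filter reflects why the last step of the pipeline matters: a zone with even one reachable name server is misconfigured but still resolvable, so only zones whose name servers all sit inside the loop need to be broken.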
The first step (extracting NS records) is customized from the CycleHunter codebase, since we do not use zone files to store our zone data. We also made some enhancements to improve the performance of the code, which we pushed upstream to the researchers, and these changes are now merged.
NS1 runs this detection mechanism regularly against all of our hosted zones. No loops have been detected yet, and if we do detect them they will be broken to prevent unwanted traffic.
Everything Old is New Again
When I discussed tsuNAME with some colleagues one response that I got was, “isn’t it just an NS loop?”. Indeed as the tsuNAME.io page points out, such NS loops were identified as a problem by RFC 1536, which was published in 1993. Is this an actual vulnerability that should be a cause for concern?
During the initial presentation in OARC 34 several ccTLD operators indicated that they had experienced this problem before; apparently the expectation was that eventually any large-scale authoritative operator would see traffic bursts from this and update their systems to deal with the issue. This seems to be a somewhat inefficient way to deal with operational problems, so whether tsuNAME is considered a security issue or merely a best practice, it is good that it has been brought to the attention of the wider DNS community.
Of special interest in the case of tsuNAME may be the role of Google Public DNS (better known as 8.8.8.8). Most of the resolvers vulnerable to this issue are either old, not widely deployed, or both. The Google Public DNS resolvers were also vulnerable, and thus a potential source of a huge amount of traffic that an authoritative operator could not stop merely by blocking the particular servers involved, since Google accounts for such a large portion of the Internet. On the other hand, since Google Public DNS is a hosted service it was relatively easy to get all of its servers patched with software that avoids tsuNAME loops, compared to a traditional deployment of a fix for a DNS bug.
Overall, the researchers acted responsibly and did a nice job in both their analysis and the way they brought it to the rest of the DNS community. This helped us at NS1 keep our systems protected, which benefits both our customers and everyone who uses their services.