Recursive DNS resolvers query authoritative nameservers to answer DNS queries. Some recursive resolvers query the authoritative nameservers in a “round robin” fashion. For queries routed through these resolvers, traffic will be evenly distributed among all identified nameservers for the domain in question.
Other resolvers are more selective about which nameserver they query. These recursive resolvers employ a method known as SRTT, or “Smoothed Round Trip Time”. SRTT prefers the lowest-latency nameserver from the available pool while still sending some queries to the other nameservers in the pool.
SRTT can be implemented in different ways. For this example, we’ll rely on the algorithm used in BIND, based on this work by Yingdi Yu and Matt Larson.
Each nameserver is initially assigned a random, low SRTT value between 1 and 32 milliseconds.
The authoritative nameservers are queried in the order of their SRTT values. The lowest number gets queried first.
The recursive resolver measures how long the nameserver takes to respond and updates that nameserver’s SRTT value by taking a weighted average of the old and new measurements.
At the same time, any non-queried authoritative nameservers have their SRTT values multiplied by a decay factor of 98%.
NOTE: This decay ensures the recursive resolver periodically rechecks every nameserver in the pool, so that no nameserver is permanently starved of queries. Should a nameserver become unresponsive, its SRTT value is penalized by adding 200ms (with an upper bound of 1 second).
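The steps above can be sketched in a few lines of Python. This is a minimal illustration, not BIND's actual code: the class name SrttSelector, its method names, and the ALPHA weight of 0.7 on the old value are assumptions made for the example, since the text only says that a weighted average is used.

```python
import random

DECAY = 0.98          # factor applied to servers that were not queried
PENALTY_MS = 200.0    # added to the SRTT of a server that did not respond
MAX_SRTT_MS = 1000.0  # upper bound on a penalized SRTT (1 second)
ALPHA = 0.7           # ASSUMPTION: weight on the old SRTT in the weighted average


class SrttSelector:
    """Hypothetical helper that tracks one SRTT value per nameserver."""

    def __init__(self, servers, rng=None):
        rng = rng or random.Random()
        # Each server starts with a random low SRTT between 1 and 32 ms.
        self.srtt = {name: rng.uniform(1, 32) for name in servers}

    def pick(self):
        # The server with the lowest SRTT is queried first.
        return min(self.srtt, key=self.srtt.get)

    def record(self, queried, rtt_ms=None):
        """Update all SRTT values after a query; rtt_ms=None means no response."""
        for name in self.srtt:
            if name != queried:
                self.srtt[name] *= DECAY
        if rtt_ms is None:
            penalized = self.srtt[queried] + PENALTY_MS
            self.srtt[queried] = min(penalized, MAX_SRTT_MS)
        else:
            self.srtt[queried] = ALPHA * self.srtt[queried] + (1 - ALPHA) * rtt_ms


selector = SrttSelector(["ns-a.example", "ns-b.example", "ns-c.example"])
target = selector.pick()
selector.record(target, rtt_ms=25.0)  # pretend the server answered in 25 ms
```

Because every non-queried server decays a little on each round, a server that is currently slow will eventually drift back to the front of the queue and be measured again.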
Round Robin Style Example
Here’s an example to show you the results of both round-robin and SRTT queries to nameservers.
These eight nameservers are authoritative for example.com. Four belong to NS1 and four belong to a second, redundant DNS provider.
1. dns1.p02.nsone.net    5. ns1.secondnetwork.com
2. dns2.p02.nsone.net    6. ns2.secondnetwork.com
3. dns3.p02.nsone.net    7. ns3.secondnetwork.com
4. dns4.p02.nsone.net    8. ns4.secondnetwork.com
Round Robin nameserver assignment starts at the top of the list and goes to the end, then it repeats:
1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 . . . .
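As a small illustration, the same rotation can be produced with Python's itertools.cycle. The variable names here are chosen for the example only, and real resolvers may start the rotation at a different point.

```python
from itertools import cycle, islice

nameservers = [
    "dns1.p02.nsone.net", "dns2.p02.nsone.net",
    "dns3.p02.nsone.net", "dns4.p02.nsone.net",
    "ns1.secondnetwork.com", "ns2.secondnetwork.com",
    "ns3.secondnetwork.com", "ns4.secondnetwork.com",
]

# Walk the list from top to bottom, then start over at the top.
positions = cycle(range(1, len(nameservers) + 1))
order = list(islice(positions, 16))

print(" - ".join(str(n) for n in order))
# prints: 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8
```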
SRTT Style Example
Here are the same eight nameservers. This time each has been given a starting SRTT value. The values aren’t really random here, but they will work for our purposes.
1. dns1.p02.nsone.net  12ms    5. ns1.secondnetwork.com  20ms
2. dns2.p02.nsone.net  16ms    6. ns2.secondnetwork.com  32ms
3. dns3.p02.nsone.net   8ms    7. ns3.secondnetwork.com  28ms
4. dns4.p02.nsone.net   4ms    8. ns4.secondnetwork.com  24ms
Round 1 queries:
Server 4 has the lowest SRTT value, so the Round 1 query goes to server 4. The server responds in 8ms. To keep the arithmetic simple, this example records the measured 8ms directly as server 4’s new value. Every other server’s value is multiplied by the 98% decay factor. After round 1, here are the values for all eight servers:
1. dns1.p02.nsone.net  11.76ms    5. ns1.secondnetwork.com  19.60ms
2. dns2.p02.nsone.net  15.68ms    6. ns2.secondnetwork.com  31.36ms
3. dns3.p02.nsone.net   7.84ms    7. ns3.secondnetwork.com  27.44ms
4. dns4.p02.nsone.net   8.00ms    8. ns4.secondnetwork.com  23.52ms
After the adjustments in round 1, server 3 now has the lowest value (7.84ms) and will be selected in round 2 to answer the next query.
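The round 1 arithmetic can be checked with a short script. It is a sketch that mirrors the tables above, including the fact that the example sets server 4 straight to the measured 8ms instead of computing a weighted average. The function name run_round is hypothetical.

```python
DECAY = 0.98

# Starting SRTT values in milliseconds, keyed by server number.
srtt_ms = {1: 12.0, 2: 16.0, 3: 8.0, 4: 4.0, 5: 20.0, 6: 32.0, 7: 28.0, 8: 24.0}


def run_round(srtt, measured_ms):
    """Query the lowest-SRTT server and return (chosen server, updated values)."""
    chosen = min(srtt, key=srtt.get)
    # Every server that was not queried decays by 2%.
    updated = {server: value * DECAY for server, value in srtt.items()}
    # As in the tables above, the measurement replaces the old value directly.
    updated[chosen] = measured_ms
    return chosen, updated


chosen, after_round_1 = run_round(srtt_ms, measured_ms=8.0)
next_choice = min(after_round_1, key=after_round_1.get)

for server, value in sorted(after_round_1.items()):
    print(f"server {server}: {value:.2f}ms")
print(f"round 1 queried server {chosen}; round 2 will query server {next_choice}")
```

Running it shows server 4 chosen in round 1 and server 3 chosen next, matching the tables.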
SRTT does a good job of sending traffic to the more performant networks, but in real-world applications some small portion of your traffic will inevitably end up being served by your slowest nameservers. How large this portion is depends primarily on the performance gap among the networks serving your answers. Two sets of nameservers with similar response times will likely split the traffic roughly 50/50, whereas with nameserver sets that differ significantly in performance, the faster network will answer the majority of queries.
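A toy simulation can make this relationship concrete. The sketch below is a deliberately simplified model, not a measurement of real resolver behavior: it assumes one server per provider, constant response times, identical starting values, and a weight of 0.7 on the old SRTT value (the text does not give the real weight). The function name simulate_shares and the provider names are hypothetical.

```python
DECAY = 0.98  # decay applied to servers that were not queried
ALPHA = 0.7   # ASSUMPTION: weight on the old SRTT in the weighted average


def simulate_shares(true_rtt_ms, queries=10_000, initial_ms=1.0):
    """Return each server's share of queries under the simplified model."""
    srtt = {name: initial_ms for name in true_rtt_ms}
    counts = {name: 0 for name in true_rtt_ms}
    for _ in range(queries):
        chosen = min(srtt, key=srtt.get)
        counts[chosen] += 1
        for name in srtt:
            if name == chosen:
                # Blend the old value with the (constant) measured latency.
                srtt[name] = ALPHA * srtt[name] + (1 - ALPHA) * true_rtt_ms[name]
            else:
                srtt[name] *= DECAY
    return {name: count / queries for name, count in counts.items()}


similar = simulate_shares({"provider_a": 10.0, "provider_b": 10.0})
uneven = simulate_shares({"provider_a": 10.0, "provider_b": 50.0})
print("similar latencies:", similar)
print("uneven latencies: ", uneven)
```

Under these assumptions, the two equally fast providers alternate and split the queries about evenly, while the 10ms provider answers the large majority of queries when the other takes 50ms. The slower provider still receives a query each time its decayed value drops below the faster one's.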
NS1 has put considerable effort into tracking which resolvers use SRTT and therefore prefer more performant endpoints. Using our Pulsar product, we’ve been able to map end users to the DNS resolver IPs they use. This allows us to monitor which authoritative nameserver a resolver selects and to estimate how likely it is to select the fastest nameserver rather than simply choosing the next one in sequence. Our data suggests that while only ~49% of all resolvers employ SRTT, between 80% and 90% of Internet traffic is serviced by SRTT-enabled resolvers.