Posted by Carl Levine on February 9, 2018

Leveraging Visibility Around DNS For Auditing + Optimization

We are fortunate to live in an age where we have more data than we know what to do with. Of course, this is a double-edged sword in many respects. Discerning what data is most important to provide context around a given piece of infrastructure, or even a complete layer of the stack, remains one of the key challenges. Focusing specifically on the DNS, having visibility into the state of this essential layer of the stack is critical to optimizing and tuning how queries are answered. (If you want to learn more about how NS1 can help you do this, speak with an expert today.)

Thinking of this another way, leveraging a metric like zone-level Queries Per Second (QPS) as your sole gauge of your DNS solution's effectiveness is like relying only on the trip odometer in your car to estimate how much fuel is left in your tank. You may know roughly how many miles per tank your car is good for, but you don't have visibility into the rest of the vehicle to understand factors (worn spark plugs, engine computer calibration, fuel quality) that may make that estimate inaccurate. Returning to the DNS with this analogy in mind, using all of the data your DNS provider offers to get a clear picture of the state of your infrastructure is a key strategy for ensuring the best possible performance and reliability of your tech stack.

At NS1, we ensure that every one of our customers has access to the data, resources, and tools to augment visibility into the state of the DNS. Here are a few impactful ways that NS1 provides a level of granularity outdated DNS solutions struggle to achieve, and helps you use data you already have to make things run even smoother:

API

Having an API is table stakes in today's data-driven world. When the DNS was originally conceived, the use cases that would emerge 20-30 years later had not been envisaged, meaning that a lot of programmatic features and interfaces were simply bolted on after the fact. Most managed DNS providers do not offer the degree of flexibility needed to get meaningful, impactful data out of their DNS management offering. With NS1, the API was the first thing built when we set out to write our own name server software. We realized that the data inside our platform had immense value in the context of modern application delivery, so we made it easy to pipe that data out to other tools and augment other parts of the overall site reliability view.
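As an illustration, here is a minimal sketch of pulling query statistics out of the API so they can be fed into other tooling. It assumes an API key issued in the portal and the stats endpoints described in the NS1 API reference; "example.com" is a placeholder zone.

```python
# Minimal sketch: pull account-wide and zone-level QPS from the NS1 API
# so the numbers can be piped into other tools. Endpoint paths follow the
# NS1 API reference; "example.com" is a placeholder zone.
import requests

API_KEY = "YOUR_NS1_API_KEY"            # issued in the NS1 portal
BASE = "https://api.nsone.net/v1"
HEADERS = {"X-NSONE-Key": API_KEY}

# Account-wide queries per second
account_qps = requests.get(f"{BASE}/stats/qps", headers=HEADERS).json()

# QPS for a single zone
zone_qps = requests.get(f"{BASE}/stats/qps/example.com", headers=HEADERS).json()

print("account:", account_qps, "zone:", zone_qps)
```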

Record Level Reporting

This degree of granularity allows an NS1 user, either via the API or the portal, to understand the trends and query volume of a specific hostname. Outdated solutions rarely provide this level of detail; in fact, if you're lucky, you might just get the number of queries for the whole zone - that's it.

To that end, being able to see the traffic a specific record receives allows optimizations to be made. Tuning TTLs to reduce query consumption is made easier by understanding the specific traffic profile of a given hostname. A record that isn't subject to frequent changes, such as a Mail Exchanger (MX) or Text (TXT) record, may have had an unnecessarily low TTL attached to the record set when it was first created. MX and TXT records are queried frequently in the normal course of delivering email to a domain, but the information they provide does not change all that often. Using information derived from record-level reporting, a user can determine whether the overall query count for the zone can be tuned for lower overall consumption.
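A hedged sketch of how that analysis might look in practice: compare a record's query volume against its TTL to spot high-volume records that rarely change but still carry a very low TTL. The endpoint paths follow the NS1 API reference; the zone, record, and thresholds are placeholders.

```python
# Illustrative sketch: flag records that are queried often, change rarely
# (e.g. MX/TXT), and still carry a low TTL. Paths follow the NS1 API
# reference; zone, record, and thresholds below are placeholders.
import requests

BASE = "https://api.nsone.net/v1"
HEADERS = {"X-NSONE-Key": "YOUR_NS1_API_KEY"}
ZONE, DOMAIN, RTYPE = "example.com", "example.com", "MX"

# Record-level QPS and the record's current configuration
qps = requests.get(f"{BASE}/stats/qps/{ZONE}/{DOMAIN}/{RTYPE}", headers=HEADERS).json()
record = requests.get(f"{BASE}/zones/{ZONE}/{DOMAIN}/{RTYPE}", headers=HEADERS).json()

# Flag busy records that still have a very short TTL
if qps.get("qps", 0) > 0.5 and record.get("ttl", 0) < 300:
    print(f"{DOMAIN}/{RTYPE}: QPS {qps['qps']:.2f} with TTL {record['ttl']}s - "
          "consider raising the TTL to cut query volume")
```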

Another way to leverage record-level reporting data comes in the context of automation. Because this data is rendered in near real time and is available as a feed from the NS1 API, it makes a great integration point with monitoring and alerting solutions you may already have in place. As an example, record-level data can be used for alerting and diagnostics. If the nominal average query count for a specific record is around 0.5 QPS and it suddenly starts returning erratic results indicative of anomalous behavior, that should trigger an alert for being out of band from the norm. With that alerting in place, a site reliability engineer can not only be made aware of a potential attack vector, but also have the context to make that determination much more quickly.
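One way such an alert might be wired up, purely as a sketch: poll the record-level QPS endpoint and post to whatever alerting system you already operate. The webhook URL, thresholds, and record names below are hypothetical.

```python
# Hypothetical alerting hook: poll record-level QPS from the NS1 API and
# fire an alert when it drifts far from the nominal average (~0.5 QPS in
# the example above). The webhook URL and thresholds are placeholders for
# whatever monitoring/alerting stack you already run.
import requests

BASE = "https://api.nsone.net/v1"
HEADERS = {"X-NSONE-Key": "YOUR_NS1_API_KEY"}
NOMINAL_QPS = 0.5
THRESHOLD = 5.0            # alert when observed QPS is 5x the norm

def check_record(zone: str, domain: str, rtype: str) -> None:
    qps = requests.get(f"{BASE}/stats/qps/{zone}/{domain}/{rtype}",
                       headers=HEADERS).json().get("qps", 0.0)
    if qps > NOMINAL_QPS * THRESHOLD:
        requests.post("https://alerts.example.internal/webhook",  # placeholder
                      json={"record": f"{domain}/{rtype}",
                            "observed_qps": qps,
                            "nominal_qps": NOMINAL_QPS})

check_record("example.com", "www.example.com", "A")
```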

Native Monitoring

All of the data that is available as a visual element in our portal can be exported as usable metrics to derive a more comprehensive view into the state of your infrastructure. This isn't limited to query counts for zones and records; it also applies to our built-in monitoring. API calls are available within the NS1 platform to extract information about the decision criteria a monitoring job used to set the status of a specific endpoint.
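A brief sketch of what extracting that monitoring state might look like, assuming the monitoring jobs endpoint in the NS1 API reference; exact field names can vary, so treat this as illustrative.

```python
# Sketch of exporting built-in monitoring state: list monitoring jobs and
# print the status each job last reported. Endpoint per the NS1 API
# reference; field names may differ, so this is illustrative only.
import requests

BASE = "https://api.nsone.net/v1"
HEADERS = {"X-NSONE-Key": "YOUR_NS1_API_KEY"}

jobs = requests.get(f"{BASE}/monitoring/jobs", headers=HEADERS).json()
for job in jobs:
    print(job.get("name"), "->", job.get("status"))
```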

This kind of information can be used however the customer sees fit, but it certainly lends unique context to the relative availability of a given endpoint. Using it, you can make business and logistical decisions about where to stand up data centers to improve service to customers, and spot trends in availability.

Data Feeds

Data coming out of the DNS is important, but the ability to ingest data and use it to inform endpoint selection is part and parcel of the NS1 platform. Every part of your tech stack generates some sort of information that can be used to tune the DNS, and the data feeds that supply our Filter Chain™ feature make this possible. Data from a physical or virtual machine about its performance, connectivity, and availability can be correlated to metadata fields attached to DNS records. Criteria such as high or low watermarks, up or down metrics, and more can all be used to arbitrate the endpoint choice on a query-by-query basis.
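To make that concrete, here is a hypothetical example of pushing server metrics into a data feed. The data source ID, feed label, and metric values are placeholders, and the push endpoint follows the NS1 API reference for data feeds.

```python
# Hypothetical example of feeding server metrics into an NS1 data source so
# the Filter Chain can act on them. The source ID and feed label are
# placeholders; the push endpoint and payload shape follow the NS1 API
# reference for data feeds.
import requests

BASE = "https://api.nsone.net/v1"
HEADERS = {"X-NSONE-Key": "YOUR_NS1_API_KEY"}
SOURCE_ID = "your_datasource_id"       # created under the data sources integration

# Report connection count and availability for the feed labeled "nyc-web-01"
payload = {"nyc-web-01": {"connections": 412, "up": True}}
resp = requests.post(f"{BASE}/feed/{SOURCE_ID}", headers=HEADERS, json=payload)
resp.raise_for_status()
```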

Combining the power of the data feed with the information we’ve covered so far, a comprehensive, real-time algorithm can be built in a Filter Chain™ to enable the best possible end user experience. One example of how this can be accomplished looks a little something like this:

  1. UP Filter fed by NS1's built-in monitoring to understand which endpoints aren't responding at all and should be dropped from consideration.
  2. GEOTARGET either on a regional or more granular basis to keep queries in the relative region they are coming from. The GEOTARGET filters use data from a carefully curated geographic database, which maps queries to endpoints whose GeoIP data indicates they are in the same relative area as the requestor's resolver, thus shortening (BGP notwithstanding) the time to first byte. Additional granularity can be gained by keeping EDNS(0) Client Subnet enabled, which passes along a truncated prefix of the client's IP address (typically with the last octet removed), giving a better indication of the end user's location than the resolver's address alone.
  3. SHED_LOAD Filter fed by machine data (a cron job pumping out collectd client connection metrics from a Linux server, for example) to understand which endpoints are doing the heavy lifting, and shift traffic to other endpoints to keep a server from becoming overloaded. A high watermark criterion sets the upper bound at which the filter starts considering other available endpoints.
  4. SELECT_FIRST_N is the last logical step in any Filter Chain algorithm. This allows the customer to set how many answers are returned to the resolver. As a rule, aside from A/AAAA records, only one answer should be returned. A sketch of how this chain might be expressed through the API follows below.
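As a rough sketch, the four-step chain above could be expressed as a record configuration submitted through the API. The filter type names, config keys, feed IDs, and addresses shown here are illustrative rather than exact, so consult the Filter Chain documentation for the identifiers used in your account.

```python
# Illustrative sketch of the four-step chain above as a record configuration.
# Filter type names, config keys, feed IDs, and IPs are placeholders; check
# the Filter Chain documentation for the exact identifiers.
import requests

BASE = "https://api.nsone.net/v1"
HEADERS = {"X-NSONE-Key": "YOUR_NS1_API_KEY"}

record = {
    "zone": "example.com",
    "domain": "www.example.com",
    "type": "A",
    "answers": [
        {"answer": ["192.0.2.10"], "meta": {"up": {"feed": "feed_id_nyc"}}},
        {"answer": ["198.51.100.20"], "meta": {"up": {"feed": "feed_id_sfo"}}},
    ],
    "filters": [
        {"filter": "up", "config": {}},                     # 1. drop dead endpoints
        {"filter": "geotarget_regional", "config": {}},     # 2. keep queries in-region
        {"filter": "shed_load", "config": {"metric": "connections"}},  # 3. shed hot servers
        {"filter": "select_first_n", "config": {"N": 1}},   # 4. return a single answer
    ],
}

resp = requests.put(f"{BASE}/zones/example.com/www.example.com/A",
                    headers=HEADERS, json=record)
resp.raise_for_status()
```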

Having a solid understanding of the context around your application delivery ecosystem, in short, can make a tremendous difference in how decisions are made around infrastructure spend, load balancing, and end user performance. All of these factors ultimately boil down to a business objective, regardless of the industry or sector your business operates in.

If you are keen to learn more about how NS1 can help move your business to the next stratum of intelligence, be sure to check out some of this suggested content for further reading, or contact us to get a conversation started about the DNS That Moves™.

Suggested Content:

Understanding TTL Values In DNS Records

What Is The Lowest TTL I Can Get Away With?

NS1 API Reference