words by Angelique Medina, Sr. Product Marketing Manager, ThousandEyes
A few weeks ago, NS1 wrote about the importance of DNS to user experience, and how even a single second can be the difference between high-quality engagement and customer churn. Technologically mature organizations understand this well. They know that DNS performance impacts app performance, and they know the devastating consequences when their DNS is slow, compromised, or goes down altogether.
Consider the recent Azure outage that disrupted service and led to customer data loss. The root cause was announced as a DNS outage by an external provider. Microsoft’s infrastructure was up and running, but the lack of DNS effectively cut them off from their users.
The apparent lack of auto-redundancy prolonged remediation efforts, leading customers to publicly express frustration and possibly costing hundreds of thousands of dollars in SLA penalties.
Perhaps even more alarming than a service outage is the very real prospect of your users getting compromised because your DNS records are mapping to IP addresses not under your control. The recent wave of DNS hijackings impacting both enterprises and government agencies has roiled the Internet community. FireEye and Cisco published findings on the techniques and targets of the attacks, which led the US Department of Homeland Security to issue an emergency directive requiring government and defense agencies to perform a DNS audit. ICANN, the organization that oversees many core Internet tasks, recently warned that hijackers are “going after the Internet infrastructure itself.”
If you’re offering a digital experience (particularly for global users), it’s more critical now than ever to protect and ensure the performance of your DNS. Unfortunately, ensuring DNS availability, performance and security can be challenging given the vast number of things that can go wrong. Many dependencies outside of your control, such as user network connectivity, various DNS infrastructures, and multiple Internet service providers can impact DNS service — and each of these dependencies can be different depending on variables such as where your users are located. For example, packet drops or congestion in local or ISP networks can impact performance; compromised records could steer a subset of your users to infrastructure you don’t control; a BGP hijacking or route leak could improperly reroute DNS queries. Even a faulty DNS server could degrade performance, as the public DNS resolver Quad9 recently recounted.
Using a robust managed DNS provider, such as NS1, is a good step in building a scalable, performant DNS — but it doesn’t make you immune to performance or security issues, particularly those that arise from all of the external infrastructure and networks (and bad actors) that can degrade or disrupt user experience.
Key Ingredients of Modern DNS Visibility
If it’s safe to assume that things can and will go wrong with DNS, then the logical question to ask is, when they do, how will you diagnose and quickly resolve issues before they impact your users?
There are certainly standard network utilities, such as dig and traceroute, that you can use to gain visibility. The problem is that these are difficult to scale to the global coverage needed to measure performance and validate DNS integrity from everywhere your users are located. They’re also harder to operationalize for trend analysis, end-to-end troubleshooting, and real-time integrity awareness. If you’re running a digital business, these tools are better than nothing, but they’re not robust enough for business-critical service assurance.
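To illustrate what a tool like dig does under the hood, here is a minimal Python sketch that hand-builds a DNS query packet per RFC 1035. It is purely illustrative, not a monitoring tool; the transaction ID is fixed here for reproducibility, whereas real clients randomize it.

```python
import struct

def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (RFC 1035).

    qtype=1 requests an A record. The transaction ID is fixed here
    for reproducibility; real clients randomize it.
    """
    # Header: ID, flags (recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question section: QNAME as length-prefixed labels, then QTYPE and QCLASS (IN=1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

query = build_query("example.com")
```

Sending `query` over UDP to port 53 of a resolver and timing the reply is essentially what a resolution-time measurement does; the hard part is running that continuously from hundreds of global vantage points, which is where dedicated platforms come in.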
Enterprises that take a holistic approach to assuring their digital experience delivery are embracing a modern approach to DNS visibility. Here are three dimensions that leading enterprises are including in their DNS monitoring checklist:
Monitor the Fundamentals: Availability, Performance and Integrity
At the most fundamental level, modern visibility requires a continuous pulse on the following aspects of sound DNS operations, so enterprise teams are looking to monitor:
- Availability: Are DNS servers responsive to requests?
- Performance: What’s the resolution time for DNS queries?
- Integrity: Are DNS records correct and free from hijacking or cache poisoning?
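The integrity check in the last bullet can be sketched as a comparison of resolved answers against a known-good record set. The hostname and IPs below are placeholders, and a real platform would also account for legitimate variation such as geo-based answers and CDN rotation:

```python
# Known-good answers per hostname -- placeholder values for illustration.
EXPECTED_A_RECORDS = {"www.example.com": {"93.184.216.34"}}

def check_integrity(hostname, resolved_ips, expected=EXPECTED_A_RECORDS):
    """Flag any answer containing IPs outside the expected set, a
    possible sign of record tampering or cache poisoning."""
    unexpected = set(resolved_ips) - expected.get(hostname, set())
    return {"ok": not unexpected, "unexpected_ips": sorted(unexpected)}
```

Run from many vantage points, a check like this can reveal that only a subset of users is being steered to infrastructure you don’t control.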
Given the sensitivity of your digital experience to DNS, most enterprises monitor these aspects of DNS operations every 2-5 minutes. Operations teams use monitoring data to set baselines and alert on significant anomalies, integrating those alerts with IT service management and alerting tools such as ServiceNow, Slack and PagerDuty. Aside from real-time views, it’s also essential for effective troubleshooting to have a historical timeline view of changes.
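As a rough illustration of baseline-and-alert logic, here is a simple z-score sketch in Python. Production monitoring systems use far more sophisticated baselining, so treat this purely as a conceptual example:

```python
import statistics

def detect_anomaly(history_ms, latest_ms, sigma=3.0):
    """Return (is_anomalous, threshold): flag a resolution time more
    than `sigma` standard deviations above the historical baseline."""
    baseline = statistics.fmean(history_ms)   # mean of past resolution times
    spread = statistics.pstdev(history_ms)    # population standard deviation
    threshold = baseline + sigma * spread
    return latest_ms > threshold, threshold
```

For example, with a history of resolution times around 20-23 ms, a 95 ms measurement would trip the alert, while 23 ms would not.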
Global Vantage Points
Geography matters when it comes to DNS visibility: wherever your users or markets are located, you’ll want to ensure that you can maintain a proper monitoring pulse from all of those locations. That also means getting visibility from Internet vantage points that represent where users actually live (Tier 2, Tier 3 and broadband ISPs), not just from public cloud provider data center locations.
Root Cause Identification
It’s not enough to know that DNS is having an issue, at least not if you’re responsible for digital operations. You need to know why things are going wrong. For example, in the widely publicized April 2018 incident affecting Amazon Route 53, what surfaced as a DNS hijacking was actually caused by a BGP hijack that rerouted queries to attacker-controlled infrastructure.
DNS as a service is itself dependent on the proper functioning of other networks and, if you’re using a cloud-based managed DNS service, on the service infrastructure and software of your DNS provider. When issues arise, you need to know whether the problem lies with the managed provider, with a specific transit network experiencing localized packet loss or latency, or with a major Internet routing or traffic outage. So, underneath your basic DNS availability, performance and integrity monitoring, it’s important to be able to see:
- Network paths
- End-to-end as well as hop-by-hop network performance metrics (packet loss, latency, jitter)
- ISP outage events
- BGP routing changes and problems
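As a small illustration of the hop-by-hop metrics in this list, the sketch below summarizes a batch of probe results for a single hop into a loss percentage and a median round-trip time. The input format is an assumption for illustration, not any particular tool’s output:

```python
from statistics import median

def hop_stats(samples):
    """Summarize probe results for one hop: `samples` holds round-trip
    times in milliseconds, with None marking a lost probe."""
    lost = sum(1 for s in samples if s is None)
    rtts = [s for s in samples if s is not None]
    return {
        "loss_pct": 100.0 * lost / len(samples),
        "median_rtt_ms": median(rtts) if rtts else None,
    }
```

Aggregating stats like these per hop, per vantage point is what lets you pin localized loss or latency on a specific transit network rather than on your DNS provider.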
Ultimately, you need all these ingredients for modern DNS visibility because we no longer live in a world where traditional “find and fix” works. Given the sheer number of externalities that every aspect of digital experience delivery relies on, including DNS, you need more data at your disposal on external factors like ISP networks and Internet routing, so you can assemble the “evidence” you need to escalate effectively to external providers and keep as much control over your destiny as is possible in a cloud-based ecosystem.
Is Your Enterprise Ready with Modern DNS Visibility?
Enterprise organizations that are heavily dependent on digital business are increasingly catching on to the need for modern levels of visibility into DNS. At ThousandEyes, we’ve historically seen DNS as a key focus of monitoring in highly digitized industry verticals such as banking and finance, media and entertainment and telecommunications. More recently we’ve seen an uptick in attention paid to DNS by less obvious verticals such as manufacturing. Yet IT teams in many industries still remain under-educated and under-equipped to manage DNS. Get a better understanding of DNS with our DNS ebook. If you’re ready for modern DNS visibility, request a demo and our team will help walk you through how ThousandEyes delivers the industry’s deepest DNS insights.
About the Author
Angelique has worked in technical marketing roles related to network infrastructure and network visibility for the past ten years, most recently at ThousandEyes, where she works on multi-layer visibility spanning application, DNS, L3, and BGP. Prior to joining ThousandEyes, she spent time working on data center networking at Big Switch Networks and visibility switching at VSS Monitoring. You can follow her on Twitter @bitprints.
ThousandEyes is a network intelligence solution that provides visibility into all of the dependencies (e.g. DNS, CDNs, ISPs) that enterprises and SaaS providers rely on to deliver a good digital experience.