February 26, 2024 By Ben Ball 4 min read

Network service outages happen. It’s not a matter of if but when. Cloud platforms and content delivery networks (CDNs) with 100% uptime SLAs aren’t immune. They experience outages just like everything else.

The question is: what do you do when one of your network services goes down? Will the lack of redundant services knock you offline? Or will you fail over to another provider, maintaining a seamless user experience? On the back end, how will that failover process work? Will it be automated or manual?

Most midsize and large organizations have redundant systems in place to help them survive an outage. What they might or might not have in place is the automated mechanism that redirects traffic to those redundant systems when a core service goes down.

IBM NS1 Connect Filter Chain™ technology uses the power of DNS to automatically reroute traffic between service providers when there is a network service disruption. With a few basic rules in place, NS1 Connect monitors your network’s status and switches endpoints as needed. You set the rules and the priorities upfront; everything after that happens automatically.

On the NS1 platform, filter chain configurations are applied to individual records within DNS zones. Filter chains determine how NS1 handles queries against each record—specifically, which answers to return. Each filter chain uses a unique logic to process queries. You can create combinations of filters to achieve a specific outcome based on your operational or business needs.
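To make the idea concrete, here is a minimal sketch of a filter chain as an ordered list of functions, each taking the candidate answers for a record and returning a filtered or reordered list. This is plain Python for illustration only, not the NS1 Connect API; the function names, the `up` field and the `example.net` hostnames are all assumptions.

```python
from typing import Callable

# Hypothetical model: an answer is a dict, and a filter is a function that
# takes a list of answers and returns a shorter or reordered list.
Answer = dict
Filter = Callable[[list[Answer]], list[Answer]]


def run_chain(answers: list[Answer], chain: list[Filter]) -> list[Answer]:
    """Apply each filter in order, feeding its output to the next one."""
    for apply_filter in chain:
        answers = apply_filter(answers)
    return answers


def drop_down_answers(answers: list[Answer]) -> list[Answer]:
    """Keep only answers marked as up (answers without the flag count as up)."""
    return [a for a in answers if a.get("up", True)]


def first_answer(answers: list[Answer]) -> list[Answer]:
    """Return at most one answer."""
    return answers[:1]


answers = [
    {"answer": "cdn-a.example.net", "up": False},
    {"answer": "cdn-b.example.net", "up": True},
]
result = run_chain(answers, [drop_down_answers, first_answer])
print(result)
```

Because each filter only sees the output of the one before it, the order of filters in the chain matters as much as the choice of filters.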

Of course, not everyone wants to direct failover traffic in the same way. So, we’ve put together a quick guide on how to build active-active, active-passive and manual failover systems by using filter chains.

Active-active failover

In this use case, NS1 or third-party data sources monitor the status of individual endpoints in your application delivery infrastructure. When the data indicates an outage on one system, NS1 automatically routes traffic to the secondary systems you choose. It’s called “active-active” because those secondary systems are already up and running as part of your load balancing system. When there is an outage in one system, NS1 just rebalances the load toward the systems that are still active.

The first filter in the chain is “Up”. This filter tells the system whether the service provider’s endpoint is operational or not.

The second filter in the chain is either “Shuffle” or “Weighted Shuffle”. If the “Up” filter returns a “false” answer for any endpoint, the shuffle filter automatically distributes traffic among the remaining providers. Shuffle distributes traffic randomly, while Weighted Shuffle distributes it based on weights you provide.

Finally, specify how many answers you want DNS to return for inbound queries. For CNAME records, only one answer should be returned per query, in line with RFC 1912. The “Select First N” filter lets you set the number of answers returned to the requesting client; for CNAME records, set it to one.
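The active-active chain can be simulated in a few lines. The sketch below is an illustrative model in plain Python, not NS1 code: the function names mirror the filter names, while the `up` and `weight` fields, the hostnames and the fixed random seed are assumptions made for the example.

```python
import random


def up(answers):
    """'Up': drop answers whose endpoint is marked down."""
    return [a for a in answers if a["up"]]


def weighted_shuffle(answers, rng):
    """'Weighted Shuffle': order answers by repeated weighted random draws."""
    pool = list(answers)
    ordered = []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered


def select_first_n(answers, n=1):
    """'Select First N': return at most n answers."""
    return answers[:n]


def resolve(answers, rng, n=1):
    return select_first_n(weighted_shuffle(up(answers), rng), n)


endpoints = [
    {"answer": "cdn-a.example.net", "up": True, "weight": 70},
    {"answer": "cdn-b.example.net", "up": True, "weight": 30},
    {"answer": "cdn-c.example.net", "up": False, "weight": 50},  # simulated outage
]

rng = random.Random(0)  # seeded only so the simulation is repeatable
counts = {}
for _ in range(1000):
    chosen = resolve(endpoints, rng)[0]["answer"]
    counts[chosen] = counts.get(chosen, 0) + 1
print(counts)
```

In this model the endpoint marked down never receives traffic, and the remaining two share queries roughly in proportion to their weights.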

Active-passive failover

As in the active-active use case, NS1 or third-party data sources monitor the status of your application delivery infrastructure and route traffic to secondary systems in the event of a primary system outage. The difference here is that the secondary systems may not be handling traffic already—they’re only spun up when needed as a redundant option.

As in the previous example, the first filter in this chain is “Up”. Drawing from monitoring data, NS1 figures out which of the underlying services are online.

The second filter in this chain is “Priority”. This filter creates a logic that prioritizes active systems over passive or backup systems. If the higher priority answers are available, they will sort to the first position on the possible answer list. If not, NS1 continues down the priority list until it finds an available resource.

Finally, “Select First N” dictates the number of answers to deliver. In this case, you’d want it to deliver one answer.
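A rough sketch of the active-passive chain follows, again as an illustrative Python model rather than NS1 code. It assumes each answer carries a numeric `priority` field where a lower number means higher priority; that convention, the field names and the hostnames are hypothetical.

```python
def up(answers):
    """'Up': drop answers whose endpoint is marked down."""
    return [a for a in answers if a["up"]]


def priority(answers):
    """'Priority': sort so the lowest priority number comes first (assumed convention)."""
    return sorted(answers, key=lambda a: a["priority"])


def select_first_n(answers, n=1):
    """'Select First N': return at most n answers."""
    return answers[:n]


def resolve(answers, n=1):
    return select_first_n(priority(up(answers)), n)


endpoints = [
    {"answer": "primary.example.net", "up": True, "priority": 1},
    {"answer": "backup.example.net", "up": True, "priority": 2},
]

normal = resolve(endpoints)
print(normal[0]["answer"])  # the primary is up, so it is returned

endpoints[0]["up"] = False  # monitoring reports the primary as down
failed_over = resolve(endpoints)
print(failed_over[0]["answer"])  # the backup is the best remaining answer
```

The backup only appears in responses once the primary has been filtered out, which is the behavior that distinguishes this chain from the active-active one.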

Manual failover

Sometimes you want to make failover decisions only after you know more about the situation. In these cases, the filter chain is the implementation mechanism that you use once you’ve determined where you want traffic to go. Instead of pointing a data feed to NS1, you’ll manually turn the filter on when it’s needed by using the active-passive logic.

The first filter in this chain is “Up”, with the difference here that you manually define which services are up and down (instead of a data feed doing that for you).

The second filter in this chain is “Priority”, which ranks active systems above passive or backup systems. If the higher priority answers are available, they sort to the first position on the possible answer list. If not, NS1 continues down the priority list until it finds an available resource.

Finally, “Select First N” dictates the number of answers to deliver. In this case, you’d want it to deliver one answer.
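The manual variant uses the same chain; only the source of the up/down status changes. The sketch below models that with a hypothetical `set_status` helper standing in for the operator's action. It is an illustration in plain Python, not an NS1 API call, and it assumes a lower `priority` number means higher priority.

```python
def set_status(endpoints, answer, is_up):
    """Operator action: return a copy of the list with one endpoint's status changed."""
    return [
        {**e, "up": is_up} if e["answer"] == answer else e
        for e in endpoints
    ]


def resolve(endpoints, n=1):
    live = [e for e in endpoints if e["up"]]            # "Up"
    ranked = sorted(live, key=lambda e: e["priority"])  # "Priority"
    return ranked[:n]                                   # "Select First N"


endpoints = [
    {"answer": "primary.example.net", "up": True, "priority": 1},
    {"answer": "backup.example.net", "up": True, "priority": 2},
]

before = resolve(endpoints)

# The operator decides to fail over and marks the primary as down.
endpoints = set_status(endpoints, "primary.example.net", False)
during = resolve(endpoints)

# Once the incident is resolved, the operator fails back.
endpoints = set_status(endpoints, "primary.example.net", True)
after = resolve(endpoints)

print(before[0]["answer"], during[0]["answer"], after[0]["answer"])
```

Nothing changes in the responses until the operator flips the status, so the decision stays with a person rather than a data feed.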

Multi-cloud or multi-CDN availability

In the “active-active” scenario above, the filter chain uses a simple up/down metric to steer traffic. However, sometimes service availability is more nuanced. For example, services sometimes experience regional outages that result in poor service quality—while the service as a whole is technically “up”, it may not be performing at optimal capacity. This filter chain lets you add some nuance to what is considered “up”, using NS1 Connect’s advanced analytics tool as the data source.

The first filter in this chain is “Pulsar Availability Threshold”. This filter allows you to set a percentage threshold: a service is considered available, and stays in the list of possible answers, only if its availability metrics meet that threshold.

The second filter in the chain is “Weighted Shuffle”, which distributes traffic to other providers that meet the definition of “available” from the first filter. Traffic is distributed based on weights that you provide.

The third filter is “Pulsar Performance Sort”, which takes the weighted distribution from the previous filter and directs traffic to the fastest available service, eliminating low-performing services based on a threshold you define.

Finally, “Select First N” will dictate the number of answers to deliver. In this case, you’d want it to deliver one answer.
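As a simplified model of this four-filter chain, the sketch below treats availability as a percentage and performance as a latency figure. Those fields (`availability_pct`, `latency_ms`, `weight`), the 90% default threshold and the hostnames are assumptions for illustration, and the performance sort here only orders answers by latency. It is not how Pulsar computes its metrics.

```python
import random


def availability_threshold(answers, min_pct):
    """Keep answers whose measured availability meets the threshold."""
    return [a for a in answers if a["availability_pct"] >= min_pct]


def weighted_shuffle(answers, rng):
    """Order answers by repeated weighted random draws."""
    pool = list(answers)
    ordered = []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered


def performance_sort(answers):
    """Fastest first; sorted() is stable, so ties keep the shuffled order."""
    return sorted(answers, key=lambda a: a["latency_ms"])


def select_first_n(answers, n=1):
    return answers[:n]


def resolve(answers, rng, min_pct=90.0, n=1):
    available = availability_threshold(answers, min_pct)
    shuffled = weighted_shuffle(available, rng)
    return select_first_n(performance_sort(shuffled), n)


cdns = [
    {"answer": "cdn-a.example.net", "availability_pct": 99.5, "latency_ms": 40, "weight": 50},
    {"answer": "cdn-b.example.net", "availability_pct": 72.0, "latency_ms": 15, "weight": 30},
    {"answer": "cdn-c.example.net", "availability_pct": 98.0, "latency_ms": 25, "weight": 20},
]

result = resolve(cdns, random.Random(1))
print(result[0]["answer"])
```

In this example the lowest-latency CDN is excluded because its availability falls below the threshold, so the fastest of the remaining two is returned. In this simplified model the shuffle order only matters when two services have identical latency.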

To learn more about using filter chains to improve performance and resilience, decrease costs and more, explore the resource below.

Guard against outages with resilient, redundant network services