How to Improve Network Resilience with NS1 Filter Chains
Posted by Ben Ball on April 6, 2022
Tech Innovation


Network service outages will happen. It’s not a matter of if. It’s a matter of when. Cloud platforms and content delivery networks (CDNs) with 100% uptime SLAs aren’t immune. They experience outages just like everything else.

The question is: what will you do when one of your network services goes down? Will the lack of redundant services knock you offline? Or will you fail over to another provider, maintaining a seamless user experience? On the back end, how will that failover process work: will it be automated or manual?

Most mid-size and large organizations have redundant systems in place to help them survive an outage. What they may or may not have in place is the automated mechanism that redirects traffic to those redundant systems when a core service goes down.

NS1’s Filter Chain™ technology uses the power of DNS to automatically reroute traffic between service providers in the event of a network service disruption. With a few basic rules in place, NS1 will monitor your network’s status and switch endpoints on the fly. You set the rules and the priorities upfront; everything after that happens automatically.

On the NS1 platform, Filter Chain configurations are applied to individual records within DNS zones. Filter Chains determine how NS1 will handle queries against each record: specifically, which answer(s) to return. Each filter uses its own logic to process queries. You can combine filters, in order, to achieve a specific outcome based on your operational or business needs.
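
To make the idea concrete, here is a minimal Python sketch of a chain as an ordered list of filters applied to a record's candidate answers. It is a local simulation only, not NS1's implementation or API; the function names (`up`, `select_first_n`, `apply_chain`), the answer fields, and the hostnames are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Answer = Dict[str, object]
Filter = Callable[[List[Answer]], List[Answer]]


def up(answers: List[Answer]) -> List[Answer]:
    """Keep only the answers whose endpoint is marked as up."""
    return [a for a in answers if a.get("up", True)]


def select_first_n(n: int) -> Filter:
    """Build a filter that keeps the first n answers."""
    def _select(answers: List[Answer]) -> List[Answer]:
        return answers[:n]
    return _select


def apply_chain(answers: List[Answer], chain: List[Filter]) -> List[Answer]:
    """Pass the candidate answers through each filter, in order."""
    for step in chain:
        answers = step(answers)
    return answers


# Hypothetical answers attached to one DNS record.
record_answers = [
    {"answer": "cdn-a.example.net", "up": False},
    {"answer": "cdn-b.example.net", "up": True},
    {"answer": "cdn-c.example.net", "up": True},
]
chain = [up, select_first_n(1)]
print(apply_chain(record_answers, chain))
```

Because each filter only sees the output of the one before it, the order of the chain matters: filtering out unhealthy answers first means later filters never pick an endpoint that is down.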

Of course, not everyone wants to direct failover traffic in the same way. So we’ve put together a quick guide on how to build active-active, active-passive, and manual failover systems using Filter Chains.

Active-active failover

In this use case, NS1 or third-party data sources monitor the status of individual endpoints in your application delivery infrastructure. When the data indicates an outage on one system, NS1 automatically routes traffic to the secondary system(s) you choose. It’s called “active-active” because those secondary systems are probably up and running as part of your load balancing system anyway. In the event of an outage in one system, NS1 would just rebalance load toward the already active systems.

The first filter in the chain is “Up”. This will tell the system whether the service provider’s endpoint is currently operational or not.

The second filter in the chain is either “Shuffle” or “Weighted Shuffle”. When the “Up” filter finds that an endpoint is down, that endpoint drops out of the answer pool and the shuffle filter distributes traffic across the remaining providers. Shuffle distributes traffic randomly, while Weighted Shuffle distributes it according to weights you provide.

Finally, specify how many answers DNS should return to inbound queries. Under RFC 1912, only one answer should be returned for a CNAME query. The “Select First N” filter lets you set the number of answers returned to the requesting client, which in this case should be one.
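
The three steps above can be simulated in a few lines of Python. This is a sketch under stated assumptions, not NS1 code: the `up`, `weight`, and `answer` fields, the hostnames, and the sampling approach inside `weighted_shuffle` (repeated weighted draws without replacement, with positive weights) are illustrative choices.

```python
import random
from collections import Counter


def up(answers):
    """Drop answers whose endpoint is marked as down."""
    return [a for a in answers if a.get("up", True)]


def weighted_shuffle(answers, rng):
    """Order answers randomly, favoring higher weights (assumed > 0)."""
    pool = list(answers)
    ordered = []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered


def select_first_n(answers, n=1):
    """Return at most n answers to the client."""
    return answers[:n]


def resolve(answers, rng):
    """Up -> Weighted Shuffle -> Select First N (N = 1)."""
    return select_first_n(weighted_shuffle(up(answers), rng), n=1)


# Hypothetical endpoints; cdn-c is down despite having the largest weight.
endpoints = [
    {"answer": "cdn-a.example.net", "up": True, "weight": 3},
    {"answer": "cdn-b.example.net", "up": True, "weight": 1},
    {"answer": "cdn-c.example.net", "up": False, "weight": 5},
]
rng = random.Random(0)  # seeded so repeated runs give the same tally
tally = Counter(resolve(endpoints, rng)[0]["answer"] for _ in range(4000))
print(tally)
```

In this simulation the down endpoint never appears in the tally, and the two healthy endpoints share traffic roughly in proportion to their weights (about three to one).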

Active-passive failover

As in the active-active use case, NS1 or third-party data sources monitor the status of your application delivery infrastructure and route traffic to secondary systems in the event of a primary system outage. The difference here is that the secondary systems may not be handling traffic already; they’re only spun up when needed as a redundant option.

If you want to learn more, take a look at our Grubhub case study on active-passive resilience.

As in the previous example, the first filter in this chain is “Up”. Drawing from monitoring data, NS1 will figure out which of the underlying services are online.

The second filter in this chain is “Priority”. This filter ranks active systems above passive or backup systems. If the higher priority answers are available, they sort to the top of the possible answer list. If not, NS1 continues down the priority list until it finds an available resource.

Finally, “Select First N” dictates the number of answers to deliver, which in this case should be one.
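
A minimal Python sketch of this chain follows. It simulates the logic locally and is not NS1's implementation; the field names, the hostnames, and the convention that a lower `priority` number means a more preferred answer are assumptions made for the example.

```python
def up(answers):
    """Drop answers whose endpoint is marked as down."""
    return [a for a in answers if a.get("up", True)]


def priority(answers):
    """Sort so the lowest priority number comes first (assumed convention)."""
    return sorted(answers, key=lambda a: a["priority"])


def select_first_n(answers, n=1):
    """Return at most n answers to the client."""
    return answers[:n]


def resolve(answers):
    """Up -> Priority -> Select First N (N = 1)."""
    return select_first_n(priority(up(answers)), n=1)


# Hypothetical primary and backup endpoints, listed out of order on purpose.
endpoints = [
    {"answer": "backup.example.net", "priority": 2, "up": True},
    {"answer": "primary.example.net", "priority": 1, "up": True},
]
print(resolve(endpoints))    # the primary is returned while it is healthy
endpoints[1]["up"] = False   # monitoring reports the primary as down
print(resolve(endpoints))    # the backup is returned instead
```

When the primary is marked up again, the same chain returns it on the next query, so failback needs no extra rule.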

Manual failover

Sometimes you want to make failover decisions only after you know more about the situation. In these cases, the Filter Chain is the implementation mechanism that you use once you’ve made a determination about where you want traffic to go. Instead of pointing a data feed to NS1, you’ll manually turn the filter on when it’s needed, using the active-passive logic.

The first filter in this chain is “Up”. The difference here is that you manually define which services are up and down, instead of relying on a data feed to do that for you.

The second filter in this chain is “Priority”, which ranks active systems above passive or backup systems. If the higher priority answers are available, they sort to the top of the possible answer list. If not, NS1 continues down the priority list until it finds an available resource.

Finally, “Select First N” dictates the number of answers to deliver, which in this case should be one.
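
The manual variant can be sketched the same way, with an operator call standing in for the data feed. The `ManualFailoverRecord` class, its `set_up` method, and the hostnames below are hypothetical names for a local simulation; they do not correspond to any NS1 API.

```python
class ManualFailoverRecord:
    """Simulated record whose up/down flags are set by hand."""

    def __init__(self, answers):
        # Copy each answer so operator actions don't mutate the caller's data.
        self.answers = [dict(a) for a in answers]

    def set_up(self, answer, is_up):
        """Operator action: flag one endpoint as up or down."""
        for a in self.answers:
            if a["answer"] == answer:
                a["up"] = is_up
                return
        raise KeyError(answer)

    def resolve(self, n=1):
        """Up -> Priority -> Select First N, using the manual flags."""
        live = [a for a in self.answers if a["up"]]
        ranked = sorted(live, key=lambda a: a["priority"])  # lower = preferred
        return [a["answer"] for a in ranked[:n]]


record = ManualFailoverRecord([
    {"answer": "primary.example.net", "priority": 1, "up": True},
    {"answer": "backup.example.net", "priority": 2, "up": True},
])
before = record.resolve()
record.set_up("primary.example.net", False)  # operator decides to fail over
during = record.resolve()
record.set_up("primary.example.net", True)   # operator fails back
after = record.resolve()
print(before, during, after)
```

Nothing changes until someone calls `set_up`, which mirrors the point of this setup: the chain carries out the decision, but a person makes it.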

Multi-cloud or multi-CDN availability

In the “active-active” scenario above, the Filter Chain uses a simple up/down metric to steer traffic. Yet sometimes service availability is more nuanced. For example, services sometimes experience regional outages that result in lower service quality: while the service as a whole is technically “up”, it may not be performing at optimal capacity. This Filter Chain lets you add some nuance to what is considered “up”, using Pulsar, NS1’s advanced analytics tool, as the data source.

The first filter in this chain is “Pulsar Availability Threshold”. This filter lets you set a percentage threshold: a service is only considered “available” if its availability metrics meet that threshold.

The second filter in the chain is “Weighted Shuffle”, which distributes traffic across the providers that meet the definition of “available” from the first filter. Traffic is distributed based on weights you provide.

The third filter is “Pulsar Performance Sort”, which takes the weighted distribution from the previous filter and directs traffic to the fastest available service, eliminating low-performing services based on a threshold you define.

Finally, “Select First N” dictates the number of answers to deliver, which in this case should be one.
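
The four-step chain can be approximated with the Python sketch below. It is one possible reading of the description above, not Pulsar's actual behavior: the `availability_pct` and `latency_ms` fields, the default thresholds, the hostnames, and the choice to model the performance step as a latency cutoff followed by a stable sort are all assumptions.

```python
import random


def availability_threshold(answers, min_pct):
    """Keep answers whose measured availability meets the threshold."""
    return [a for a in answers if a["availability_pct"] >= min_pct]


def weighted_shuffle(answers, rng):
    """Order answers randomly, favoring higher weights (assumed > 0)."""
    pool = list(answers)
    ordered = []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered


def performance_sort(answers, max_latency_ms):
    """Drop answers slower than the cutoff, then sort fastest first."""
    fast_enough = [a for a in answers if a["latency_ms"] <= max_latency_ms]
    return sorted(fast_enough, key=lambda a: a["latency_ms"])


def resolve(answers, rng, min_pct=95.0, max_latency_ms=200.0):
    """Availability threshold -> shuffle -> performance sort -> first 1."""
    available = availability_threshold(answers, min_pct)
    shuffled = weighted_shuffle(available, rng)
    return performance_sort(shuffled, max_latency_ms)[:1]


# Hypothetical measurements for four CDNs.
cdns = [
    {"answer": "cdn-a.example.net", "availability_pct": 99.9, "latency_ms": 40, "weight": 1},
    {"answer": "cdn-b.example.net", "availability_pct": 80.0, "latency_ms": 20, "weight": 1},
    {"answer": "cdn-c.example.net", "availability_pct": 99.5, "latency_ms": 65, "weight": 1},
    {"answer": "cdn-d.example.net", "availability_pct": 99.0, "latency_ms": 350, "weight": 1},
]
print(resolve(cdns, random.Random(7)))
```

Here `cdn-b` has the lowest latency but is excluded for falling below the availability threshold, and `cdn-d` is excluded for exceeding the latency cutoff. Because Python's `sorted` is stable, the shuffled order only decides between answers with identical latency in this sketch.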

