Automated tools are constantly being developed to take the human element out of the grunt work of scaling infrastructure, among many other tasks. So it often comes as a shock when we hear about services going down, and more so when we hear that human error was to blame.
“At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.” - Amazon, in response to their US-EAST-1 S3 Outage on February 28th, 2017
While what happened to Amazon in this case is unfortunate, it could have been prevented with automation. Thankfully, Amazon’s root cause analysis went on to say that changes would be made to the core algorithm that controls provisioning and deprovisioning of services, to prevent illogical values from being inserted.
Granted, this outage was not at the DNS layer, which is what we at NS1 typically talk about, but it highlighted the need for stronger automation and resiliency throughout the stack. A robust, globally distributed DNS solution that works in concert with your network tools has become mission-critical.
Chances are, your automation tools were chosen to speed up deployments and greatly reduce the likelihood of human error on mission-critical services. Wherever automation is used in the technology stack, speed is a critical metric that must always be weighed alongside the precision of the changes being executed.
One advantage to having a tight integration with your managed DNS provider as part of your overall automation strategy is that changes to the DNS can be made precisely when a new resource is ready to serve.
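To illustrate the idea of changing DNS only when a resource is ready to serve, here is a minimal Python sketch. The function name publish_when_ready and its parameters are hypothetical and are not part of any NS1 tool; is_ready stands in for whatever health check your deployment uses, and update_dns stands in for whatever call makes the DNS change.

```python
import time
from typing import Callable


def publish_when_ready(
    is_ready: Callable[[], bool],
    update_dns: Callable[[], None],
    attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call update_dns() once, and only after is_ready() returns True.

    Returns True if the DNS update was made, or False if the resource
    never reported ready within the allowed number of attempts.
    """
    for attempt in range(attempts):
        if is_ready():
            update_dns()
            return True
        # Wait before the next check, but not after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Passing the health check and the DNS update in as callables keeps the gate independent of any particular provider or deployment tool, and makes it easy to exercise with stand-ins before wiring it to real systems.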
Tip: Set Your TTLs Short
Here is a simple lesson in opportunity cost. Yes, setting short TTLs on records that change frequently will increase your billable query rate with a DNS provider. However, consider the cost of changes that cannot propagate quickly. In other words, it is likely less expensive in the long run to pay for the additional query volume than to have a mission-critical change fail to take effect when you need it.
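The trade-off can be put into rough numbers. The Python sketch below is a back-of-the-envelope model, not a billing calculator: it assumes each caching resolver asks for the record at most once per TTL window, and the resolver count of 1,000 is an invented figure used only for illustration. Real resolvers do not always honor TTLs exactly.

```python
SECONDS_PER_DAY = 86_400


def max_queries_per_day(ttl_seconds: int, resolver_count: int) -> int:
    """Upper bound on daily queries for one record.

    Assumes every resolver re-queries at most once per TTL window.
    """
    return (SECONDS_PER_DAY // ttl_seconds) * resolver_count


if __name__ == "__main__":
    resolvers = 1_000  # hypothetical number of caching resolvers
    for ttl in (3600, 60):
        queries = max_queries_per_day(ttl, resolvers)
        # A cached answer can stay in use for up to one full TTL after a change.
        print(f"TTL {ttl}s: up to {queries} queries/day, "
              f"changes visible within about {ttl}s")
```

Under these assumptions, dropping the TTL from one hour to one minute raises the ceiling from 24,000 to 1,440,000 queries per day, while shrinking the window in which resolvers may serve a stale answer from an hour to a minute. Whether that is worth it depends on what an hour of stale answers would cost you.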
Automating Application Delivery
To help cut down on the time required to get this all working properly, NS1 offers an integration with Ansible that allows DNS changes at NS1 to be plugged in as an Ansible role. The most up-to-date information on its implementation can be found on GitHub.
Whether you are using NS1 to augment or replace a front end load balancer, or just using NS1’s Managed DNS to put a friendly name on a resource as it’s deployed, the Ansible integration has the pieces you need to make the appropriate DNS changes. By design, NS1’s Managed DNS will push the changes to the anycast edge quickly, ensuring that the DNS changes are reflected and that the new service that was deployed is available.
If you’re not using Ansible, there are many similar solutions that do the same basic thing. NS1’s API plays nicely with just about any configuration management tool you are using, so regardless of how you are implementing continuous integration, automation, agile, DevOps, or whatever you choose to call it, we’re able to integrate.
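As a rough sketch of what calling the API from your own tooling might look like, the Python below builds (but does not send) an HTTP request to update a record. The base URL, the X-NSONE-Key header, the use of POST for updates, and the shape of the JSON body are assumptions based on NS1's v1 REST API and should be checked against the current API documentation. The function name build_record_update and the example values are hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumed base URL for the NS1 v1 API; confirm against the current API docs.
NS1_API_BASE = "https://api.nsone.net/v1"


def build_record_update(
    api_key: str,
    zone: str,
    domain: str,
    record_type: str,
    addresses: List[str],
    ttl: int,
) -> urllib.request.Request:
    """Build a request that replaces a record's answers and TTL."""
    url = f"{NS1_API_BASE}/zones/{zone}/{domain}/{record_type}"
    body = {
        # Assumed body shape: one answer object per address.
        "answers": [{"answer": [address]} for address in addresses],
        "ttl": ttl,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-NSONE-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

A deployment script could build the request with, for example, build_record_update(key, "example.com", "www.example.com", "A", ["192.0.2.10"], 60) and send it with urllib.request.urlopen once the new resource is ready. Keeping the request construction separate from the sending makes it easy to inspect or log exactly what will change before it happens.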