Working on a product whose goal is to simplify intelligent, automated routing across complex, multihomed infrastructure environments means you get to connect with customers who run impressive traffic management and DevOps teams. These teams are closely aligned with their business goals and solve complex business problems through infrastructure design.
Exposure to the cutting edge of traffic management reinforces the notion that, while no two companies' goals are exactly alike, fine-tuning infrastructure and operations for optimal performance is critical to every company's success. This common driver highlights strong similarities in the journey traffic management teams undertake to evolve and optimize their traffic management infrastructure.
In this article, we'll touch on the paths some of our most sophisticated customers' traffic teams have taken to optimize the availability and performance of their applications and provide the best global user experience:
- Expanding from one to multiple CDNs
- Deploying multi- and hybrid-cloud strategies
- Incorporating self-operated edge infrastructure into existing application and CDN deployments
Engaging a third-party edge delivery network is a common way for traffic teams to move their content closer to their global user base with minimal effort. CDNs provide the benefits of a well-thought-out, well-supported delivery network for those that don't have the scope to build their own.
Build global redundancy by introducing additional CDNs
The need to add redundancy at every level of an application makes the CDN layer a logical place to add a second vendor, since putting your entire site behind one CDN creates a single point of failure. Going multi-CDN often means a second -- and sometimes a third or fourth -- CDN provider. Teams introduce additional providers because even the best CDNs experience latency, service disruptions, or regional outages that can impact customer commitments and service SLAs. Introducing competition is also a great lever in procurement negotiations.
In its most basic form, a multi-CDN setup ensures that traffic can be served from any of the active CDNs. If one CDN network experiences an outage, all traffic can be manually switched over to an unaffected CDN.
These teams usually find that they can hit their CDN commits by simply weighting their traffic to achieve the required traffic spread across the CDNs.
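Weighting traffic to meet CDN commits can be as simple as proportional random selection per request or per DNS response. A minimal sketch of that idea follows; the hostnames and the 70/30 split are hypothetical, not drawn from any real deployment:

```python
import random

# Hypothetical weights expressing contractual traffic commits:
# e.g. roughly 70% of traffic to CDN A and 30% to CDN B.
CDN_WEIGHTS = {
    "cdn-a.example.net": 0.7,
    "cdn-b.example.net": 0.3,
}

def pick_cdn(weights=CDN_WEIGHTS):
    """Select a CDN hostname at random, proportionally to its weight."""
    hosts = list(weights)
    return random.choices(hosts, weights=[weights[h] for h in hosts])[0]
```

Over many requests the observed split converges on the configured weights, which is usually enough to hit a traffic commit; real platforms typically implement this at the DNS or load-balancer layer rather than in application code.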
Augment content delivery with regional CDNs or targeted data centers
Teams that don’t require global redundancy might still find that their CDN doesn’t have the latency or throughput profile they need in certain geographic regions to deliver their application consistently and reliably. By identifying regions that are strategic to business expansion goals, or regions that are underperforming, traffic teams can augment their CDN by introducing a cloud or data center point of presence (POP), or a regional CDN with a strong profile in the targeted region. Either option provides redundancy and regional alternatives for target markets.
These supplemented regions are traditionally geo-fenced to route between different sets of CDNs (i.e., the global CDN and the regional CDN or POP), with resources selected based on cost or capacity.
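A geo-fence of this kind reduces to a lookup from the user's region to a serving CDN, falling back to the global provider everywhere else. Here is a minimal sketch; the region key and hostnames are hypothetical placeholders:

```python
# Hypothetical geo-fence: users in a targeted region are served by a
# regional CDN; all other users stay on the global CDN.
REGIONAL_CDNS = {
    "apac": "regional-cdn.example.net",
}
GLOBAL_CDN = "global-cdn.example.net"

def select_by_geo(user_region):
    """Return the CDN hostname for a user's region, defaulting to global."""
    return REGIONAL_CDNS.get(user_region, GLOBAL_CDN)
```

In practice the region is usually derived from the resolver or client IP by a geo-IP database at the DNS layer, not passed in explicitly.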
Optimize for intelligent traffic routing across CDNs
Building in resiliency with additional CDNs (global or regional) is a great step towards optimizing infrastructure. What a lot of teams have come to realize, however, is that a multi-CDN approach on its own is still effectively choosing resources at random. They find that users could still be directed towards outages and that they’re not really benefiting from the performance strengths of each CDN.
To achieve the best possible performance from their multi-CDN setup, the most advanced teams seek out automated, optimal decision-making capabilities. When their traffic is directed by an intelligent routing engine that ingests real-time performance telemetry (availability, latency, or throughput) as experienced by their users, they can trust that each user will be sent to the best-performing CDN for them, based on current internet conditions. At this point, users experience better performance, and IT teams are more comfortable knowing their users will never be diverted toward outages.
Some teams optimize for both performance and business logic. They’re able to automate complex decision-making, such as preferring the cheaper CDN until the performance telemetry shows it is no longer meeting the required threshold, at which point users are automatically routed to the better-performing CDN.
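The cost-first, performance-fallback rule described above can be sketched in a few lines. This is an illustrative simplification, not any vendor's actual engine; the hostnames, cost ranks, latencies, and 80 ms threshold are all assumed:

```python
# Hypothetical per-CDN state: a relative cost rank and the latest
# real-user latency measurement in milliseconds.
CDNS = [
    {"host": "cdn-cheap.example.net",   "cost": 1, "latency_ms": 95.0},
    {"host": "cdn-premium.example.net", "cost": 3, "latency_ms": 60.0},
]

LATENCY_THRESHOLD_MS = 80.0  # assumed performance SLA threshold

def route(cdns=CDNS, threshold=LATENCY_THRESHOLD_MS):
    """Prefer the cheapest CDN that meets the latency threshold;
    if none qualifies, fall back to the best-performing CDN."""
    acceptable = [c for c in cdns if c["latency_ms"] <= threshold]
    if acceptable:
        return min(acceptable, key=lambda c: c["cost"])["host"]
    return min(cdns, key=lambda c: c["latency_ms"])["host"]
```

With the sample data, the cheap CDN is over threshold, so traffic shifts to the premium CDN; once the cheap CDN's measured latency drops back under 80 ms, it is preferred again automatically.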
Cloud and Data Center Journeys
A lot of teams opt to build their own cloud footprint to serve their application traffic. By analyzing where their users are located and how the application is performing for those users, they can provision and de-provision POPs in specific locations themselves to maintain effective coverage and resiliency. Price variation across regions is very common and can influence the number and location of POPs provisioned.
Introduce additional cloud providers or support hybrid cloud
Many traffic teams find that any one cloud provider’s footprint may not service their distributed user base sufficiently, so they introduce an additional provider to service users from certain regions. This also satisfies their desire to not be locked in with any one specific provider. Load balancing over multiple cloud providers, however, can be difficult.
When the primary strategy is on-premises, it’s common to keep some cloud regions on standby for scaling or bursting when the data centers lack the capacity to support additional load. This also doubles as a solid disaster recovery strategy.
Build an in-house edge application delivery network
Using cloud providers is a great way to quickly build up infrastructure with a known entity to support a growing user base. Those that continue to scale, however, sometimes find that adding more POPs and cloud regions hits a point of diminishing returns, and eventually the costs become prohibitive.
The teams that pursue building their own edge network, with multiple points of presence close to their users, generally support very high traffic volumes and have use cases that go beyond the standard. By building their own edge network, they’re able to move their application logic and code to the edge for faster delivery of dynamic application content.
Some teams are so well advanced in the collection of metrics and data about their users’ performance that they’re able to establish peering relationships and take other measures to minimize transport costs for the most beneficial regions.
Building your own in-house application delivery network or CDN requires time and resources, but organizations that do this effectively see substantial cost savings by bringing the functionality in-house.
Optimize traffic routing across cloud POPs or data centers
Whether built in-house or via a cloud provider, infrastructures built around geographic points of presence have traditionally lent themselves to geo-routing: sending users to the closest POP.
Geo-routing is a good starting point for rough approximations of best performance. However, it’s now widely understood that the closest POP will not always be the best performing at a given point in time. It’s not uncommon for cloud providers to experience regional issues that leave a whole region suffering latency, or shared nodes going completely down. Using geo-routing without factoring in performance can send many users straight toward a poor experience.
This is why advanced teams are now leveraging real-time user measurements of availability, latency, or throughput to automate optimal decision-making, sending users to the most performant POP for them at that moment in time. This ensures that the footprint they’ve worked so hard to build is actually being used at its peak potential. Advanced configurations also factor in load and cost, ensuring the right balance is struck between achieving performance SLAs and neither under- nor over-utilizing resources.
The performance measurements collected from their own users also give them a strong understanding of how well their infrastructure configuration is delivering content. This visibility into actual user performance allows for better long-term capacity planning.
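The shift from geo-routing to measurement-based routing amounts to selecting the POP with the lowest observed latency for a user's vantage point rather than the nearest one on a map. A minimal sketch, with entirely hypothetical region names, POP names, and latency figures:

```python
# Hypothetical real-user latency measurements (ms) per POP,
# keyed by the user's region (vantage point).
RUM_LATENCY = {
    "eu-west": {"pop-london": 25.0, "pop-frankfurt": 31.0, "pop-nyc": 92.0},
    "us-east": {"pop-london": 88.0, "pop-frankfurt": 101.0, "pop-nyc": 18.0},
}

def best_pop(region, measurements=RUM_LATENCY):
    """Return the POP with the lowest measured latency for a region,
    rather than simply the geographically closest one."""
    by_pop = measurements[region]
    return min(by_pop, key=by_pop.get)
```

Note that if `pop-london` suffered a regional issue and its measured latency for `eu-west` users spiked above Frankfurt's, the same function would immediately route those users to `pop-frankfurt`, which is exactly the behavior geo-routing alone cannot provide.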
What comes next?
Recently, even more sophisticated trends have started to emerge, resulting in customized, unique traffic optimization decisions.
- Traffic teams are collaborating more closely with data science teams, experimenting with routing by unique metrics that are meaningful to their business.
- Businesses that have a very strong understanding of performance vs revenue have been able to use monetization metrics to determine how to route traffic in real-time, resulting in some impressive revenue gains.
- Our most sophisticated customers are building engines that incorporate real user telemetry, raw network data, and business rules to create granular routing maps at the IP level to express the optimal answer for a given user. We are delivering tech to our customers to help them manage these exciting new use cases.
It’s an exciting time to be in traffic management and we look forward to continuing to work with the most advanced teams out there, to make the best use of the complex and well thought out infrastructure they’ve put in place.
Pulsar is the only RUM steering solution integrated with authoritative DNS. What does that mean for you? Simple. Boost application performance and reliability without redirecting queries to a separately managed third-party platform. It’s all natively integrated into NS1’s Managed DNS.