Software-defined networking (SDN) has become a prolific buzzword and a topic of heated debate throughout the networking community, especially with the advent of Dockerized systems. However, as a DevOps engineer who has implemented multiple SDNs, I have found that three questions remain largely unanswered. First, what exactly is an SDN? Second, how does it function? Finally, do I even need one? While these three questions may seem simple, they carry far-reaching implications for the future of any platform or system an SDN is integrated with.
What is an SDN?
At a high level, an SDN is a network configuration management tool that is specifically designed to implement a fully interconnected network among your various services and hosts. When using an SDN, every service and host is directly linked with all other services and hosts without the use of a middleman. This design decision is the key that sets an SDN apart from standard VPNs, which generally rely on various middlemen to connect two or more fully segregated networks together in a point-to-point fashion.
SDNs in general are nothing new; they have been around since the advent of virtual networking. SDNs rely on the same technology that you already have in your infrastructures, but they add a layer of automatic configuration management on top of those technologies.
The fact that SDNs use the same technology you currently use makes them inherently good at spanning many types of hardware and hosting platforms. This is arguably the best part about using an SDN. You can connect multiple different cloud providers, private colocation facilities, data centers, and everything in between, in any combination, and, generally, it will just work.
How Does an SDN Function?
The first thing to keep in mind when thinking about how SDNs function is that the SDN process runs on each host participating in the network. SDNs can be configured in one of two modes of operation, depending on the choice of technology and its configuration. The first mode is in-kernel networking, which relies entirely on in-kernel networking technologies like VXLAN or clever use of routing tables. The second mode is userland networking, which relies on a TUN/TAP-like device to pass network traffic through a userland process. Each of these modes comes with its own set of pros and cons that should be carefully considered before picking one over the other. One thing needs to be called out: neither mode of operation is particularly performant or efficient when compared to fully hardware-backed networking. That's not to say they cannot support the majority of use cases, but if you need 10 Gbit networking among all of your hosts, you may not want to use an SDN. However, if that is your use case, you are unlikely to be looking at an SDN at all, and probably already have the specialized hardware and teams required to support that kind of network.
Kernel mode networking has some major advantages over userland networking, specifically in performance and efficiency. Since it relies entirely on in-kernel networking, there are very few extra context switches for each packet traversing your network, and it uses hardened C code to manipulate and handle those packets. The tradeoff, though, is more reliance on the backend network and, specifically, on the kernels being used. For instance, if you need network encryption, your kernel will likely need to support IPsec.
Userland mode networking handles each individual packet in a userland process, which is both its advantage and its disadvantage. The advantage comes from being able to manipulate those packets in many more ways than are available in the kernel. The disadvantage is the large number of context switches that each packet goes through to successfully traverse the network.
The real key to an SDN is the way it stores and manages configuration. Most SDNs use CoreOS’s etcd or HashiCorp’s Consul, but a few technologies have proprietary stores built in to serve the same function. This key/value store is how an SDN manages to add, remove, and update networking configuration on the fly and then propagate those changes to the rest of the network in near real time.
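The propagation idea can be sketched with a toy in-memory store. The Go example below is an assumption-laden illustration, not the etcd or Consul client API: `store`, `watch`, `put`, `remove`, and `peerTable` are all hypothetical names. It shows each host keeping a local table that is updated whenever a key changes in the shared store.

```go
package main

import (
	"fmt"
	"sync"
)

// event describes one change in the store; Deleted marks a removed key.
type event struct {
	Key     string
	Value   string
	Deleted bool
}

// store is a toy in-memory key/value store that notifies watchers of changes.
// It only imitates the watch idea; it is not the etcd or Consul API.
type store struct {
	mu       sync.Mutex
	data     map[string]string
	watchers []func(event)
}

func newStore() *store {
	return &store{data: make(map[string]string)}
}

// watch registers fn to be called for every later change.
func (s *store) watch(fn func(event)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.watchers = append(s.watchers, fn)
}

// notify calls every watcher outside the lock so a watcher may use the store.
func (s *store) notify(ev event) {
	s.mu.Lock()
	fns := append([]func(event){}, s.watchers...)
	s.mu.Unlock()
	for _, fn := range fns {
		fn(ev)
	}
}

// put stores a value and tells every watcher about it.
func (s *store) put(key, value string) {
	s.mu.Lock()
	s.data[key] = value
	s.mu.Unlock()
	s.notify(event{Key: key, Value: value})
}

// remove deletes a key and tells every watcher about it.
func (s *store) remove(key string) {
	s.mu.Lock()
	delete(s.data, key)
	s.mu.Unlock()
	s.notify(event{Key: key, Deleted: true})
}

// peerTable is one host's local view: overlay address -> public endpoint.
type peerTable map[string]string

// apply updates the local view from a single store event.
func (p peerTable) apply(ev event) {
	if ev.Deleted {
		delete(p, ev.Key)
		return
	}
	p[ev.Key] = ev.Value
}

func main() {
	kv := newStore()
	hostA := peerTable{}
	kv.watch(hostA.apply)

	kv.put("10.99.0.2", "203.0.113.7:1099") // a new host joins the network
	fmt.Println(len(hostA), hostA["10.99.0.2"])

	kv.remove("10.99.0.2") // the host leaves again
	fmt.Println(len(hostA))
}
```

In a real deployment the watchers would live on different machines and the notifications would travel over the network, but the shape is the same: one write to the store, and every participating host converges on the new configuration.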
Why Would You Use an SDN?
Using an SDN offers three tangible benefits that make it increasingly attractive to enterprises of all sizes: reducing costs, reducing complexity, and accelerating your network and development operations teams.
An SDN reduces costs simply by serving the same function as networking hardware. Most SDN technologies are fully open source, so they don’t incur any licensing costs, and they can be modified easily if you find that a feature is missing or doesn’t quite work the way you want it to.
The complexity savings come from the automatic configuration and propagation of network settings across the entire network the SDN manages, removing the need to reconfigure switches and routers as hosts or services are added, updated, and removed. SDNs are also far easier to keep up to date and hotfix than their hardware counterparts. I am sure everyone here has had to manage a switch or router upgrade, and then witnessed the woes of what can happen during those upgrades. An SDN upgrade is just a software upgrade; it can usually be done in place and rolled back at the first sign of a problem.
The reductions in both cost and complexity allow your network and development operations teams to move faster. Since an SDN will automatically configure the networking for a newly created server, your network teams no longer need to worry about making sure that all the tedious configurations are in place. Your development operations teams can directly integrate the key/value stores with their larger configuration management pipelines. This lets them focus on the more important work of designing and building new architectures to help solve real problems, instead of worrying about which IP address is assigned to which server.
The State of SDNs
The current state of SDNs is a fractured one of marketing and buzzwords that need to be cut through to get to the real heart of the technology.
The landscape is currently dominated by three technologies that I am sure all of you have heard of: flanneld, Weave, and Calico. Each one has its own set of functionality and its own set of pros and cons. However, they all suffer from similar shortcomings when it comes to specific network topologies.
There are a few specific triumphs shared by all of the current SDN technologies, and they come down to two things. First, they allow for fully automatic configuration. Second, this in turn allows teams to move quickly. SDNs are currently being used by smaller companies to let their teams focus on the most important issues, giving them an edge over slower-moving large enterprises.
The biggest shortcomings stem from the fact that the current technologies are designed specifically to run in a LAN-based environment. This has far-reaching consequences, such as little support for hybrid IPv6 and IPv4 setups and, in my opinion, enforcement of the anti-pattern of giving every container a unique IP address.
Because of the shortcomings of the current technologies, I decided to build a new SDN called Quantum. It’s written in Golang and specifically designed to handle some of the key challenges that I have faced when using the other technologies on the market.
I wanted to overcome a few key challenges when building Quantum. First and foremost, I focused on WAN-based topologies, and specifically hybrid WAN topologies where IPv6 and IPv4 are exceedingly important. The second challenge was to ensure that hosts came first and not containers; for instance, NS1 uses no Docker networking at all and relies purely on host networking. Lastly, I wanted to offer more than just networking technologies as options. I built in a middleware layer so that if compression, or some other type of packet mangling, is required, it is easy to add and quick to integrate.
The most apparent successes of Quantum revolve around its ability to gracefully handle network partitions, which happen regularly and without warning. This is due in part to how Quantum interacts with the backend datastore, but also to how it manages packets. As a side effect of how it handles packets, plugging in new frontend and backend network interfaces is straightforward and follows a simple, easy-to-understand interface. Quantum also fully supports public IPv6 and IPv4 addressing, and any combination thereof. This is mainly due to the configuration aspects of Quantum and its ability to automatically determine which address to use. Lastly, Quantum was designed to run on the host and not in a container. It is a single static binary that can perform a rolling restart without dropping packets. The rolling restart not only picks up new configuration, but also fully supports binary changes.
The main tradeoffs made throughout the development of Quantum to date have been due to development time and personnel constraints. A tough choice was made early on to favor development speed over raw performance, and a second choice was made to use hard-coded plugins rather than injectable plugins. There is also a dependency on the CoreOS etcd key/value store which, again, was chosen because of the time it would take to properly handle multiple key/value stores and their inherent differences.
Other tradeoffs come from the differing capabilities of operating systems and from one key success: focusing on hosts first rather than containers. The differences among operating systems mean that Quantum currently supports only Linux, and that supporting other operating systems in the future will be difficult. As for focusing on hosts first, there is currently no ability to use Docker network plugins with Quantum. This is something that could be done in the future, but it is not a priority right now.
The future of Quantum is about overcoming some of the tradeoffs made previously. The most important goals are improving Quantum's raw performance and ensuring that it can run on more than just Linux. Other operating systems targeted for Quantum's future include BSD, Darwin, and Solaris. Solving these issues is crucial for wider adoption and maximum flexibility, and will be the focus of Quantum development for the rest of its life span.
Interested in learning more? Check out Quantum on GitHub!