Delivering an optimal application experience has never been an easy task from a network perspective. Traditional networks have been designed to move packets without caring too much about the applications.
As a network engineer myself, I have managed many traditional WAN environments and know how little visibility and control over application flows they offer. The task has become a lot harder with the introduction of Internet-native and cloud-native apps. In traditional WAN deployments that are operated in a box-to-box fashion, there is typically a QoS policy with 3, 5, or 8 queues applied on the egress transport interfaces, and that's about it. Is it enough, though? The answer is obviously not.
The Business Need
Let's imagine that you are a network engineer in a big enterprise that has a traditional WAN architecture. One day the company starts streaming a real-time event from a remote branch to the data center and out to the Internet. From the company's perspective, this video stream is a business-critical application and must work as optimally as possible. Just think for a few seconds about each of these real-world problems.
- How do you make sure that if the quality of a circuit drops and packet loss appears, the stream is immediately sent over another circuit?
- What if there is congestion on one of the WAN links? How do you make sure that low-priority traffic is automatically moved to lower-bandwidth links?
- What if the stream viewers are mostly on the Internet? Maybe it is a better solution to just steer the traffic directly to the Internet from the remote branch. How do you do that with a traditional WAN?
These are just a few examples of hard problems that cannot be easily solved at scale using the legacy box-to-box operational model. In today's enterprises where everything is getting digital, there are at least several business-critical applications. Ensuring optimal experience for these applications is a multidimensional problem that cannot be solved with a single network tool such as QoS.
The Cisco SD-WAN solution has a set of capabilities that can improve the overall Application Quality of Experience (AppQoE) of business-critical applications. The package combines some well-known network protocols with some innovative technologies to create the following collection of AppQoE features:
- Bidirectional Forwarding Detection (BFD)
- Quality of Service (QoS)
- Forward Error Correction (FEC)
- Packet Duplication
- Fragmentation Avoidance
- Software-Defined Application Visibility and Control (SD-AVC)
- Application-aware routing (AAR)
- TCP Flow Optimization
- Cloud onRamp for SaaS (We will have another lesson dedicated to this feature, so it is just briefly mentioned here)
You may have dealt with some of these features before. In this lesson, we are going to try to briefly go through most of them.
Quality of Service
The Cisco SD-WAN solution creates a transport-independent overlay fabric, leveraging tunneling techniques such as GRE and IPsec to encapsulate and encrypt traffic before it is sent over all available circuits on the WAN edge routers. By default, this traffic encapsulation "hides" the original packet inside a new one, and therefore the end-to-end QoS marking is lost. Cisco WAN edge routers have the ability to copy the DSCP value of the original IP packet into the outer IP header. There is also the ability to rewrite the DSCP value to match a specific class of service of the service provider's circuit. This makes it possible to map specific applications into the correct QoS classes on the SP side.
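To make the copy and rewrite behaviour more concrete, here is a minimal Python sketch. It is only an illustration of the idea, not Cisco's implementation: the `SP_REWRITE` table and the function names are hypothetical, and the class mapping is made up for the example.

```python
from typing import Dict, Optional

# Hypothetical mapping from enterprise DSCP values to the values a service
# provider expects (illustrative only): EF stays EF, AF41 -> AF31, AF31 -> AF21.
SP_REWRITE: Dict[int, int] = {46: 46, 34: 26, 26: 18}


def outer_dscp(inner_dscp: int, rewrite: Optional[Dict[int, int]] = None) -> int:
    """Return the DSCP value to place in the outer (tunnel) IP header."""
    if not 0 <= inner_dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    if rewrite is None:
        # Plain copy: the inner marking survives encapsulation unchanged.
        return inner_dscp
    # Rewrite: unmapped classes fall back to best effort (assumption).
    return rewrite.get(inner_dscp, 0)


def tos_byte(dscp: int, ecn: int = 0) -> int:
    """DSCP occupies the upper six bits of the IPv4 ToS / DiffServ byte."""
    return (dscp << 2) | (ecn & 0b11)


if __name__ == "__main__":
    print(outer_dscp(46))              # copied as-is
    print(outer_dscp(34, SP_REWRITE))  # rewritten for the SP class
    print(hex(tos_byte(46)))           # byte value carried in the outer header
```

The fallback to best effort for unmapped classes is a design choice of this sketch; a real policy could just as well copy the original value.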
Avoiding packet fragmentation
As we all know, the IP protocol was designed for use on a variety of links. This is especially visible in wide-area networks, where edge routers connect to the WAN via many different access technologies such as SDH, DSL, Ethernet, LTE, satellite links, etc. Each one of these links may enforce a different maximum transmission unit (MTU) value. Overlay features such as tunneling and IPsec can further lower the MTU. The IP protocol accommodates these differences by allowing devices to break larger packets into a number of pieces that can be reassembled later on. This process is called IP packet fragmentation and leads to inefficiencies in the packet flow by adding unnecessary latency and jitter. MTU issues in traditional WAN are well-known to network engineers.
Applications have the option to explicitly prohibit fragmentation by setting the DF (Don't Fragment) flag in the IP header. But this relies on the success of another process called path MTU discovery (PMTUD) that is used to discover the smallest MTU value along the end-to-end traffic path. If this process fails and the DF bit is set, the application flow would not be able to traverse the network and reach its destination. Cisco SD-WAN proactively discovers the path MTU across the overlay fabric and participates in the hosts' PMTUD process by notifying them of the available MTU, as shown in Figure 3.
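The arithmetic behind this can be sketched in a few lines of Python. The path MTU is the smallest MTU of any link on the path, and the overlay encapsulation takes its share on top of that. The link values and the 80-byte tunnel overhead below are assumptions chosen for illustration; the real overhead depends on the encapsulation and cipher in use.

```python
from typing import Iterable


def path_mtu(link_mtus: Iterable[int]) -> int:
    """The end-to-end path MTU is the smallest MTU of any link on the path."""
    return min(link_mtus)


def inner_mtu(link_mtus: Iterable[int], tunnel_overhead: int) -> int:
    """Largest original packet that fits in the tunnel without fragmentation."""
    return path_mtu(link_mtus) - tunnel_overhead


def max_tcp_mss(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload per segment for a given MTU (IPv4, no options)."""
    return mtu - ip_header - tcp_header


if __name__ == "__main__":
    links = [1500, 1492, 1500]   # e.g. Ethernet, PPPoE DSL, Ethernet
    overhead = 80                # assumed tunnel overhead in bytes
    usable = inner_mtu(links, overhead)
    print(usable)                # 1412
    print(max_tcp_mss(usable))   # 1372
```

A host that learns the usable value (1412 bytes in this example) can size its packets so that they never need to be fragmented inside the overlay.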
Circuit Quality
One of the main advantages of SD-WAN is that it can use any available transport at any location in an active-active fashion. This typically means that all Internet links are utilized for application traffic. However, we all know that Internet circuits do not have guaranteed quality, and packet loss may occur at any given time. Cisco SD-WAN provides the following features that protect business-critical apps from packet loss and allow them to work reliably over the Internet.
Forward Error Correction (FEC)
The Forward Error Correction (FEC) feature allows critical apps to work well over unreliable WAN links, usually Internet circuits. The mechanism behind it is borrowed from the logic of RAID arrays. For example, in a RAID 4 array, if one disk fails, it can be replaced with a new one and its contents can be reconstructed from the parity data stored on the parity disk. FEC follows the same logic: for each group of four packets, one "parity packet" is inserted. At the receiving end, if one of the four packets is lost, it can be reconstructed from the parity packet. It is basically a trade-off: extra CPU cycles and bandwidth are spent in exchange for circuit reliability. The process is visualized in figure 4.
In summary, the FEC capability protects applications from packet loss on the transit network path. The feature has the following characteristics:
- Per tunnel - It is enabled on a per tunnel basis. This gives the flexibility to be enabled only on unreliable WAN links.
- Dynamically invoked - FEC can be turned on permanently or it can be dynamically invoked if the SD-WAN fabric detects a certain amount of packet loss.
- Application traffic only - The feature can only be used for application traffic and not for control plane flows such as BFD.
- Only one packet out of four can be reconstructed - If two or more packets of the same group are lost, FEC cannot recover them, so it cannot remedy high packet loss.
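The RAID-like parity idea can be illustrated with a short Python sketch. It assumes a simple byte-wise XOR parity over equal-length packets, which is how RAID 4 parity works; the text does not specify the exact encoding Cisco uses, so treat the function names and the XOR scheme as assumptions made for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet for a group of equal-length packets."""
    if len({len(p) for p in packets}) != 1:
        raise ValueError("this sketch assumes equal-length packets")
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing packet (marked as None) in a group."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost packet")
    present = [p for p in received if p is not None]
    # XOR of the parity with all surviving packets yields the lost one.
    return reduce(xor_bytes, present, parity)


if __name__ == "__main__":
    group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    parity = make_parity(group)
    damaged = [group[0], None, group[2], group[3]]  # second packet lost
    print(recover(damaged, parity))                 # b'BBBB'
```

The `ValueError` raised when more than one packet is missing mirrors the limitation listed above: a single parity packet carries enough information to rebuild only one lost packet per group.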
Packet Duplication
Packet duplication is another SD-WAN capability that is used to increase application reliability. When turned on, the sending WAN edge router transmits the same traffic flow across multiple WAN links, ultimately sending at least two copies of each packet. At the receiving side, the vEdge device can compensate for lost packets by using these multiple copies of the same flow and discarding the unnecessary duplicates.
In summary, the Packet Duplication capability protects against packet loss for critical applications such as Voice at the expense of increased bandwidth consumption. The feature has the following characteristics:
- Protocol agnostic - It works for any transport protocol, TCP or UDP.
- Works only over multiple tunnels.
- Duplicates are discarded on the receiver.
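The receiver-side behaviour can be sketched as follows. The sketch assumes that every copy of a packet carries the same sequence number so that the receiver can recognise duplicates; the sequence numbers, class, and function names are hypothetical and only serve the illustration.

```python
from typing import Iterable, List, Optional, Set, Tuple


class Deduplicator:
    """Delivers the first copy of each packet and drops later copies."""

    def __init__(self) -> None:
        # Unbounded for simplicity; a real device would age entries out.
        self.seen: Set[int] = set()

    def accept(self, seq: int, payload: bytes) -> Optional[bytes]:
        if seq in self.seen:
            return None          # duplicate: discard
        self.seen.add(seq)
        return payload           # first copy: deliver


def deliver(arrivals: Iterable[Tuple[int, bytes]]) -> List[int]:
    """Return the sequence numbers delivered to the application, in order."""
    dedup = Deduplicator()
    return [seq for seq, data in arrivals if dedup.accept(seq, data) is not None]


if __name__ == "__main__":
    # Tunnel A lost packet 2, tunnel B lost packet 3; arrivals are interleaved.
    arrivals = [(1, b"a"), (1, b"a"), (2, b"b"), (3, b"c"), (4, b"d"), (4, b"d")]
    print(deliver(arrivals))     # [1, 2, 3, 4]
```

Each tunnel lost one packet in this example, yet the application still receives the complete flow because the other tunnel carried the missing copy.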
Software-Defined Application Visibility and Control (SD-AVC)
Software-Defined Application Visibility and Control (SD-AVC) is a service that uses the capabilities of Cisco WAN Edge devices to identify, aggregate, and communicate application data in order to make decisions such as prioritizing app traffic using QoS, grouping applications based on business relevance, or choosing different network paths based on real-time SLA statistics.
Cisco SD-WAN devices have a Deep Packet Inspection engine integrated that can go up to Layer 7 of the OSI model and recognize thousands of applications. This is absolutely necessary in order to be able to apply policies against a particular app or service.
Some engineers may argue that we have always had DPI engines and app recognition features like Cisco NBAR. However, the key point here is that updating large volumes of new application signatures across the device fleet is not feasible at all using the legacy box-to-box configuration model. You have to operate the network As-a-System to be able to do that at scale, and that is what SD-WAN allows us to do.
Application-aware routing (AAR)
Application-aware routing is a feature that dynamically chooses the optimal path for a business-critical application based on a pre-defined SLA policy. These policies can be defined in two major ways:
- A specific path is configured to be taken while the path meets the SLA. For example, an MPLS circuit is configured as primary for VoIP traffic.
- Any path that is compliant with the SLA can be used. For example, if the Internet circuit meets the latency, jitter, and packet loss requirements, it can be used for Voice traffic as well.
Let's look at the example shown in figure 7. Application X has a pre-defined SLA policy: latency of at most 200 ms, packet loss below 3%, and jitter below 15 ms. At the moment, only paths 2 and 3 meet this SLA, so only these paths can be used for this app.
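The selection logic for both policy styles can be sketched in Python. The SLA thresholds come from the example above, while the per-path measurements are invented for illustration (the figure's actual values are not reproduced here), and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PathStats:
    name: str
    latency_ms: float
    loss_pct: float
    jitter_ms: float


@dataclass(frozen=True)
class Sla:
    max_latency_ms: float   # latency must be <= this value
    max_loss_pct: float     # loss must be below this value
    max_jitter_ms: float    # jitter must be below this value


def meets_sla(p: PathStats, sla: Sla) -> bool:
    return (p.latency_ms <= sla.max_latency_ms
            and p.loss_pct < sla.max_loss_pct
            and p.jitter_ms < sla.max_jitter_ms)


def usable_paths(paths: List[PathStats], sla: Sla,
                 preferred: Optional[str] = None) -> List[str]:
    """Use the preferred path while it is compliant, else any compliant path."""
    ok = [p.name for p in paths if meets_sla(p, sla)]
    if preferred in ok:
        return [preferred]
    return ok


if __name__ == "__main__":
    sla = Sla(max_latency_ms=200, max_loss_pct=3, max_jitter_ms=15)
    paths = [
        PathStats("path1", latency_ms=250, loss_pct=1.0, jitter_ms=10),  # too slow
        PathStats("path2", latency_ms=150, loss_pct=0.5, jitter_ms=8),
        PathStats("path3", latency_ms=180, loss_pct=2.0, jitter_ms=12),
    ]
    print(usable_paths(paths, sla))                     # ['path2', 'path3']
    print(usable_paths(paths, sla, preferred="path1"))  # ['path2', 'path3']
```

When the preferred path violates the SLA, as `path1` does here, the sketch falls back to every compliant path, which corresponds to the first policy style described above.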
TCP Optimization
The Cisco SD-WAN TCP Optimization feature terminates TCP connections locally at the WAN edge routers and uses TCP Selective Acknowledgment (SACK) in order to better control the TCP window size and maximize the throughput over the WAN links. Every network engineer has seen a bandwidth consumption graph of a TCP flow. It typically has the following sawtooth pattern: a steep increase, a sharp drop of 50%, a steep increase again, another 50% drop, and so on. The goal of this optimization tool is to smooth out this graph by dynamically controlling the window size. However, it must be used with caution because it breaks some fundamental network principles such as end-to-end transport layer connectivity.
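The sawtooth pattern of an unoptimized flow can be reproduced with a toy model. The sketch below is a heavily simplified additive-increase/multiplicative-decrease (AIMD) loop: the window grows by one segment per round trip and is halved whenever it reaches the link capacity. It models classic TCP congestion avoidance, not the Cisco optimizer, and the capacity value and function name are assumptions.

```python
from typing import List


def aimd(rounds: int, capacity: int, start: int = 1) -> List[int]:
    """Return the congestion window (in segments) at each round trip."""
    cwnd = start
    history: List[int] = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd >= capacity:
            # Loss detected at the bottleneck: cut the window by 50%.
            cwnd = max(1, cwnd // 2)
        else:
            # No loss: grow the window by one segment per round trip.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(aimd(rounds=10, capacity=4))  # [1, 2, 3, 4, 2, 3, 4, 2, 3, 4]
```

The repeated climb to the capacity followed by a halving is the pattern the optimization feature tries to flatten, since the flow spends much of its time below the available bandwidth.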
Summary
The Application Quality of Experience (AppQoE) is a multidimensional problem that cannot be solved by a single tool. The Cisco SD-WAN solution has introduced a set of features and capabilities that can improve the overall application experience and optimize the network reliability for business-critical apps. Let me try to make a short summary of all AppQoE tools in the following table.
SD-WAN Feature | Description |
---|---|
Bidirectional Forwarding Detection (BFD) | BFD is a well-known network protocol used to detect faults between two WAN edge devices connected by an IPsec tunnel and to measure the tunnel characteristics. |
Quality of Service (QoS) | QoS is a well-known network tool that is used for the classification and marking of application traffic. |
Forward Error Correction (FEC) | The FEC capability protects applications from incurring packet loss when traversing unreliable WAN links. It works by inserting one parity packet in every group of 4 and then using the metadata in this parity packet to reconstruct any lost one. |
Packet Duplication | The Packet Duplication capability protects against packet loss for critical applications such as Voice at the expense of increased bandwidth consumption by sending two copies of each packet via two different WAN links. |
Fragmentation Avoidance | The SD-WAN overlay fabric detects the MTU value on all tunnels and helps end hosts successfully identify what MTU value to use. |
Software-Defined Application Visibility and Control (SD-AVC) | SD-AVC is a service that uses the DPI engine of Cisco WAN Edge devices to identify, aggregate, and communicate application data in order to make decisions such as prioritizing app traffic using QoS, grouping applications based on business relevance, or choosing different network paths based on real-time SLA statistics. |
Application-aware routing (AAR) | AAR dynamically chooses the optimal path for a business-critical application based on a pre-defined SLA policy. |
TCP Flow Optimization | This is a feature that terminates TCP sessions at the local WAN edge devices and aggregates them into one optimized TCP session. The goal is to better utilize the available WAN bandwidth by controlling the TCP window size. |