NAT Detection
The Cisco SD-WAN solution is designed to run over any WAN transport available to the WAN edge devices, including public networks such as broadband, 4G/5G, LTE, and business Internet. This implies that the overlay fabric must be able to form across every flavor of Network Address Translation (NAT) that these public networks use. In practice, any Cisco SD-WAN device may unknowingly sit behind one or more NAT devices. To discover the public IP addresses and ports allocated by NAT, Cisco SD-WAN devices use the Session Traversal Utilities for NAT (STUN) protocol defined in RFC 5389.
STUN is a client-server protocol built on request/response transactions: a client sends a request to a server, and the server returns a response. As the request (called a STUN Binding Request) passes through a NAT, the NAT modifies the source IP address and port of the packet. The STUN server therefore receives the request with the public IP address and port created by the NAT device closest to the server. The server copies that public address into an XOR-MAPPED-ADDRESS attribute in the STUN Binding Response and sends it back to the client. On the way back through the NAT, the public address and port in the IP header are translated back to the private ones, but the copy of the public address in the body of the STUN response remains untouched. In this way, the client learns the IP address and port allocated by the outermost NAT with respect to the STUN server.
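To make the address copy concrete, here is a minimal Python sketch of how an IPv4 XOR-MAPPED-ADDRESS attribute is encoded by a server and decoded by a client, following the attribute layout in RFC 5389. The function names are hypothetical, and this only illustrates the generic STUN encoding; it is not a description of how the vBond orchestrator implements it internally.

```python
import socket
import struct
from typing import Tuple

MAGIC_COOKIE = 0x2112A442          # fixed value defined by RFC 5389
ATTR_XOR_MAPPED_ADDRESS = 0x0020   # attribute type defined by RFC 5389
FAMILY_IPV4 = 0x01


def encode_xor_mapped_address(ip: str, port: int) -> bytes:
    """Server side: build the attribute from the source IP/port seen on the request."""
    # The port is XORed with the 16 most significant bits of the magic cookie.
    x_port = port ^ (MAGIC_COOKIE >> 16)
    # The IPv4 address is XORed with the full 32-bit magic cookie.
    x_addr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, FAMILY_IPV4, x_port, x_addr)
    # Attribute header: 16-bit type followed by 16-bit value length.
    return struct.pack("!HH", ATTR_XOR_MAPPED_ADDRESS, len(value)) + value


def decode_xor_mapped_address(attr: bytes) -> Tuple[str, int]:
    """Client side: recover the public IP/port from the attribute."""
    attr_type, length = struct.unpack("!HH", attr[:4])
    if attr_type != ATTR_XOR_MAPPED_ADDRESS or length != 8:
        raise ValueError("not an IPv4 XOR-MAPPED-ADDRESS attribute")
    _reserved, family, x_port, x_addr = struct.unpack("!BBHI", attr[4:12])
    if family != FAMILY_IPV4:
        raise ValueError("only IPv4 is handled in this sketch")
    port = x_port ^ (MAGIC_COOKIE >> 16)
    ip = socket.inet_ntoa(struct.pack("!I", x_addr ^ MAGIC_COOKIE))
    return ip, port


# The server saw the request arrive from 60.1.1.1:12346 (the NAT's public side).
attribute = encode_xor_mapped_address("60.1.1.1", 12346)
public_ip, public_port = decode_xor_mapped_address(attribute)
```

Because the address travels XORed inside the message body rather than in the IP header, a NAT that rewrites headers (or naively rewrites address-like byte patterns in payloads) leaves it intact.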
As shown in Figure 1, all Cisco SD-WAN devices have an embedded STUN client, and the vBond orchestrator acts as a STUN server. When the initial control communication to vBond takes place, the SD-WAN device performs the STUN operations and discovers its public IP address and port. Once determined, this information is advertised as part of the TLOC routes to the vSmart controllers and then re-advertised to all other SD-WAN devices.
NAT Types
In a typical production SD-WAN deployment, many remote sites connect over many different Internet connections to a centralized data center or a regional hub. In most regions of the world, Internet providers almost always use some type of private-to-public address translation because of the shortage of public IPv4 addresses. Let's look at the NAT classifications used by the STUN protocol (the four types were defined in the original STUN specification, RFC 3489) and how they affect whether sites can form connections and communicate directly with each other.
Full-Cone NAT
A Full-Cone NAT is one where all packets from the same internal IP address are mapped to the same NAT IP address. This type of address translation is also known as one-to-one.
Additionally, any external host can reach the internal host by sending packets to the mapped NAT IP address.
Restricted-Cone NAT
A Restricted-Cone NAT is also known as an Address-Restricted-Cone NAT. As with a Full-Cone, all packets from the same internal IP address are mapped to the same NAT IP address. The difference from a Full-Cone is that an external host can send packets to the internal host only if the internal host has previously sent a packet to the IP address of that external host. Note that once the NAT mapping state is created, the external host can communicate back to the internal host from any port.
Port-Restricted-Cone NAT
A Port-Restricted-Cone NAT is similar to a Restricted-Cone NAT, but the restriction also includes port numbers. An external host can send packets back to the internal host only if the internal host has previously sent a packet to that host on that exact port number. In a typical Cisco IOS/IOS-XE or Cisco ASA configuration, this feature is known as Port Address Translation (PAT).
Symmetric NAT
Symmetric NAT is the most restrictive of the four types. All requests from the same internal IP address and port to a specific destination IP address and port are mapped to a unique NAT IP address and NAT port, so a different destination produces a different mapping. Furthermore, only the external destination that received a packet can send packets back to the internal host. In a typical Cisco IOS/IOS-XE or Cisco ASA configuration, this feature is known as Port Address Translation (PAT) with port randomization.
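The four behaviours described above can be compared side by side with a toy model. The following Python sketch is only an illustration under stated assumptions: the `NatSim` class, its method names, and the 203.0.113.1 public address are hypothetical, mappings are keyed on the internal IP and port, and timers and port exhaustion are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


@dataclass
class NatSim:
    """Toy NAT: tracks mappings and which remotes each mapping has contacted."""
    nat_type: str  # "full-cone", "restricted-cone", "port-restricted-cone", "symmetric"
    public_ip: str = "203.0.113.1"  # hypothetical documentation address
    next_port: int = 40000
    mappings: Dict[tuple, Endpoint] = field(default_factory=dict)
    contacted: Dict[Endpoint, Set[Endpoint]] = field(default_factory=dict)

    def outbound(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Translate an outgoing packet and return the public endpoint used."""
        # Cone NATs reuse one mapping per internal endpoint;
        # symmetric NAT creates a new mapping per destination.
        key = (src, dst) if self.nat_type == "symmetric" else (src,)
        if key not in self.mappings:
            self.mappings[key] = (self.public_ip, self.next_port)
            self.next_port += 1
        public = self.mappings[key]
        self.contacted.setdefault(public, set()).add(dst)
        return public

    def inbound_allowed(self, remote: Endpoint, public: Endpoint) -> bool:
        """Would a packet from `remote` to `public` be forwarded inside?"""
        if public not in self.mappings.values():
            return False  # no mapping exists for this public endpoint
        seen = self.contacted.get(public, set())
        if self.nat_type == "full-cone":
            return True  # anyone may use the mapping
        if self.nat_type == "restricted-cone":
            return remote[0] in {ip for ip, _port in seen}  # IP must match
        # port-restricted-cone and symmetric: IP and port must both match
        return remote in seen


inside = ("192.168.1.2", 12346)
peer_a = ("198.51.100.10", 12346)
peer_b = ("198.51.100.20", 12346)

nat = NatSim("port-restricted-cone")
public = nat.outbound(inside, peer_a)
reply_ok = nat.inbound_allowed(peer_a, public)      # contacted IP and port
stranger_ok = nat.inbound_allowed(peer_b, public)   # never contacted
```

The detail that matters for SD-WAN is the mapping key: behind a symmetric NAT, the public port learned through STUN toward the vBond is not the port the NAT will use toward another WAN edge, which is why two symmetric sites cannot predict each other's tunnel endpoints.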
Best-Practices
Although Cisco SD-WAN supports several types of NAT, a full-mesh overlay fabric works best when, for every tunnel, one WAN Edge can initiate a connection inbound to the other. This means that at least one side of each tunnel should have a public IP address or sit behind a Full-Cone (one-to-one) NAT. It is also strongly recommended to configure Full-Cone, or one-to-one, address translation at the data centers or regional hub sites so that, regardless of the NAT type at the remote sites (Restricted-Cone, Port-Restricted-Cone, or Symmetric), they can send traffic to the hubs without issues.
vEdge-1 | vEdge-2 | IPsec tunnel can form | GRE tunnel can form |
---|---|---|---|
No-NAT (Public IP) | No-NAT (Public IP) | YES | YES |
No-NAT (Public IP) | Symmetric | YES | NO |
Full Cone (One-to-one) | Full Cone (One-to-one) | YES | YES |
Full Cone (One-to-one) | Restricted-Cone | YES | NO |
Full Cone (One-to-one) | Symmetric | YES | NO |
Restricted-Cone | Restricted-Cone | YES | NO |
Symmetric | Restricted-Cone | NO | NO |
Symmetric | Symmetric | NO | NO |
When the transport attached to one vEdge uses Symmetric NAT, the other vEdge needs a Full-Cone NAT or a public IP address to establish a direct IPsec tunnel between them. Sites that cannot connect directly should be set up in a hub-and-spoke topology so they can reach each other through a regional hub site or data center.
IMPORTANT: For overlay tunnels configured to use GRE encapsulation instead of IPsec, only public IP addressing or one-to-one address translation is supported. Any type of Network Address Translation with port overloading is not supported, because GRE packets have no L4 header and therefore carry no port numbers to translate.
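For quick checks during design, the combination table can be expressed as a small lookup. This Python sketch transcribes only the rows listed in the table; the function name `tunnel_can_form` and the short NAT-type labels are hypothetical, and the sketch assumes the result does not depend on which vEdge is listed first. Combinations that the table does not list return `None` rather than a guess.

```python
from typing import Optional

# Rows transcribed from the combination table: {side A, side B} -> (IPsec, GRE).
# A frozenset is used as the key on the assumption that row order is irrelevant.
_COMBINATIONS = {
    frozenset(["public"]): (True, True),
    frozenset(["public", "symmetric"]): (True, False),
    frozenset(["full-cone"]): (True, True),
    frozenset(["full-cone", "restricted-cone"]): (True, False),
    frozenset(["full-cone", "symmetric"]): (True, False),
    frozenset(["restricted-cone"]): (True, False),
    frozenset(["symmetric", "restricted-cone"]): (False, False),
    frozenset(["symmetric"]): (False, False),
}


def tunnel_can_form(side_a: str, side_b: str, encap: str = "ipsec") -> Optional[bool]:
    """Return True/False per the table, or None if the pair is not listed."""
    if encap not in ("ipsec", "gre"):
        raise ValueError("encap must be 'ipsec' or 'gre'")
    row = _COMBINATIONS.get(frozenset([side_a, side_b]))
    if row is None:
        return None  # the table does not cover this combination
    ipsec_ok, gre_ok = row
    return ipsec_ok if encap == "ipsec" else gre_ok


hub_spoke = tunnel_can_form("full-cone", "symmetric")     # hub behind 1:1 NAT
spoke_spoke = tunnel_can_form("symmetric", "symmetric")   # needs a hub in between
```

A result of `False` for a pair of sites is the cue to route their traffic through a hub, as described above.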
TLOC Routes
Once every WAN edge router discovers its private-public translated address and port, it advertises them to the vSmart controller via OMP using the OMP TLOC routes. The vSmart controller then re-advertises this information across the overlay fabric.
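As a rough mental model, a TLOC route can be pictured as a record that carries both address pairs. The Python sketch below is a simplification with hypothetical class and method names, not the OMP wire format; the field values mirror the 60.60.60.60 TLOC shown in the example output in this section. It assumes remote devices aim their tunnels at the public pair, which matches the DST PUBLIC IP column of the BFD output.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tloc:
    """Simplified view of the NAT-related attributes of a TLOC route."""
    system_ip: str
    color: str
    encap: str
    private_ip: str
    private_port: int
    public_ip: str
    public_port: int

    @property
    def key(self) -> Tuple[str, str, str]:
        # A TLOC is identified by system IP, color, and encapsulation.
        return (self.system_ip, self.color, self.encap)

    @property
    def behind_nat(self) -> bool:
        # If STUN reported a different address or port than the local one,
        # some NAT device sits between this router and the vBond.
        return (self.private_ip, self.private_port) != (self.public_ip, self.public_port)

    def tunnel_destination(self) -> Tuple[str, int]:
        # Assumption: remote sites send to the post-NAT (public) endpoint.
        return (self.public_ip, self.public_port)


tloc = Tloc(
    system_ip="60.60.60.60", color="lte", encap="ipsec",
    private_ip="192.168.1.2", private_port=12346,
    public_ip="60.1.1.1", public_port=12346,
)
```

Here `tloc.behind_nat` is true because the private and public addresses differ even though the port was preserved by the NAT.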
Lastly, let's see an example of two WAN edge devices connected through a Port-Restricted-Cone NAT. They are able to form an IPsec tunnel between themselves, but if we change the encapsulation type to GRE, the data plane tunnel does not come up. Let's quickly verify that.
These are both TLOCs when the encapsulation is set to ipsec.
---------------------------------------------------
tloc entries for 60.60.60.60
lte
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 1.1.0.3
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 256
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 60.1.1.1
public-port 12346
private-ip 192.168.1.2
private-port 12346
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status up
domain-id not set
site-id 60
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 2
gen-id 0x80000010
carrier default
restrict 0
groups ( 0 )
border not set
unknown-attr-len not set
---------------------------------------------------
tloc entries for 70.70.70.70
lte
ipsec
---------------------------------------------------
RECEIVED FROM:
peer 1.1.0.3
status C,I,R
loss-reason not set
lost-to-peer not set
lost-to-path-id not set
Attributes:
attribute-type installed
encap-key not set
encap-proto 0
encap-spi 256
encap-auth sha1-hmac,ah-sha1-hmac
encap-encrypt aes256
public-ip 70.1.1.1
public-port 12426
private-ip 172.16.1.2
private-port 12426
public-ip ::
public-port 0
private-ip ::
private-port 0
bfd-status up
domain-id not set
site-id 70
overlay-id not set
preference 0
tag not set
stale not set
weight 1
version 2
gen-id 0x80000014
carrier default
restrict 1
groups ( 0 )
border not set
unknown-attr-len not set
You can see that the BFD status is up for both TLOCs, which means that the tunnel is up and data plane traffic can flow in both directions. Now let's change the encapsulation type to GRE and see what happens.
vEdge-3(config)# vpn 0
vEdge-3(config-vpn-0)# interface ge0/0
vEdge-3(config-interface-ge0/0)# tunnel-interface
vEdge-3(config-tunnel-interface)# encapsulation ?
Possible completions:
gre ipsec
vEdge-3(config-tunnel-interface)# encapsulation gre
vEdge-3(config-tunnel-interface)# commit
Commit complete.
vEdge-4(config)# vpn 0
vEdge-4(config-vpn-0)# interface ge0/0
vEdge-4(config-interface-ge0/0)# tunnel-interface
vEdge-4(config-tunnel-interface)# encapsulation gre
vEdge-4(config-tunnel-interface)# commit
Commit complete.
Now if we check the status of the BFD sessions we can clearly see that the GRE tunnel is down.
vEdge-4# show bfd sessions
                             SOURCE  REMOTE              DST PUBLIC
SYSTEM IP    SITE ID  STATE  COLOR   COLOR   SOURCE IP   IP          ENCAP  UPTIME
------------------------------------------------------------------------------------
50.50.50.50  50       up     mpls    mpls    10.70.1.1   10.50.1.1   ipsec  0:01:51
60.60.60.60  60       down   lte     lte     172.16.1.2  60.1.1.1    gre    NA
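When many tunnels are involved, the same check can be scripted. The sketch below is a hypothetical helper that parses text in the trimmed nine-column layout shown above and lists sessions that are not up; the full `show bfd sessions` output on a real device has additional columns, so the field list would need adjusting.

```python
import ipaddress
from typing import Dict, List

# Column names for the trimmed layout used in this example (an assumption).
FIELDS = ["system_ip", "site_id", "state", "source_color", "remote_color",
          "source_ip", "dst_public_ip", "encap", "uptime"]

SAMPLE = """\
50.50.50.50 50 up mpls mpls 10.70.1.1 10.50.1.1 ipsec 0:01:51
60.60.60.60 60 down lte lte 172.16.1.2 60.1.1.1 gre NA
"""


def parse_bfd_sessions(text: str) -> List[Dict[str, str]]:
    """Turn data rows into dicts, skipping headers and separator lines."""
    sessions = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != len(FIELDS):
            continue  # header, separator, or unexpected layout
        try:
            ipaddress.ip_address(tokens[0])  # data rows start with a system IP
        except ValueError:
            continue
        sessions.append(dict(zip(FIELDS, tokens)))
    return sessions


def down_sessions(text: str) -> List[Dict[str, str]]:
    """Return only the sessions whose state is not 'up'."""
    return [s for s in parse_bfd_sessions(text) if s["state"] != "up"]


for session in down_sessions(SAMPLE):
    print(session["system_ip"], session["remote_color"], session["encap"])
```

Run against the sample rows, the loop reports only the GRE session toward 60.60.60.60, which is the tunnel that cannot traverse the port-based NAT.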