Configuring & Verifying Dual Cloud & Dual Hub FlexVPN Failover Scenarios

"The What?" - I want to expand on some of the other FlexVPN related blogs shared earlier. This lab will be expanding from an earlier lab where I utilized a Dual Hub FlexVPN in a single cloud topology. For reference that post is here: Configuring & Verifying FlexVPN Redundancy with Dual Hub & Single Cloud

"The Why?" - In this post I will focus on FlexVPN redundancy with dual hub & dual cloud. This configuration is a tad different from the single cloud FlexVPN redundancy post linked above. In the dual cloud approach, spokes have active tunnels to both hubs at all times, whereas in a single cloud configuration (the failover approach) a spoke has an active tunnel to only one hub at a time. The dual cloud topology provides the following benefits:

  • Faster recovery during failure

  • Greater possibilities to distribute traffic among hubs, since there are active tunnels to each hub

"The How?" - Most of the configuration, from the PKI used for VPN authentication to the IKEv2 config, remains the same. The main changes are: introducing iBGP as the overlay routing protocol, adding a second tunnel interface (SVTI) on each spoke (one per hub/cloud), giving each hub its own IP pool so each cloud has a separate range, and updating the secondary hub to use the new overlay range on the loopback that its virtual template references when bringing up DVTIs.

Here is the topology for this lab & post:

Notice that the topology matches the FlexVPN single cloud lab, but with an additional cloud to emulate two separate ISP clouds. During normal operation, both spokes maintain a relationship with both hubs. When a failure occurs, the routing protocol (in this case iBGP, which Cisco recommends) shifts traffic from one hub to the other.

Hub 2 Configuration Changes:

First, I will cover the configuration changes/adds needed to establish the dual cloud FlexVPN. Once that is completed I will dive into deploying iBGP & verifying it. I added an additional overlay range for the second cloud, and the pool on CSR12 (hub2; red cloud) was modified to use that range. I then modified Loopback1, which is referenced in Virtual-Template1:

interface Loopback1
 ip address
ip local pool CIF_POOL

Now on each spoke I deployed an additional SVTI that uses the same underlay but connects to our secondary Flex cloud. On spoke1 (CSR13) I added a second tunnel interface & modified the existing one to specify the exact tunnel destination, as follows:

interface Tunnel0
 ip address negotiated
 ip mtu 1400
 ip nhrp network-id 10
 ip nhrp shortcut virtual-template 1
 ip tcp adjust-mss 1360
 tunnel source GigabitEthernet1.254
 tunnel mode ipsec ipv4
 tunnel destination   #Points to Hub1 
 tunnel protection ipsec profile CIF_IPSEC_PROF
interface Tunnel1
 ip address negotiated
 ip mtu 1400
 ip nhrp network-id 10
 ip nhrp shortcut virtual-template 1
 ip tcp adjust-mss 1360
 tunnel source GigabitEthernet1.254
 tunnel mode ipsec ipv4
 tunnel destination   #Points to Hub2
 tunnel protection ipsec profile CIF_IPSEC_PROF

Remember that the underlay is not the big concern here. Technically I could have configured a more complex underlay, but for the purpose of the lab there is no need. The emphasis is on the dual cloud overlay. The major config changes to support it are on the hubs: the IP pool used to hand out spoke tunnel interface IPs in the IKEv2 Config payload, and the update to the loopback referenced in the virtual template on hub2.

Next I deployed iBGP on spoke1 and both hubs. Note: Omitting Hub2/Spoke2 config as it is essentially the same.

Spoke1 (CSR13) iBGP Config:

router bgp 65001
 bgp log-neighbor-changes
 neighbor remote-as 65001
 neighbor fall-over
 neighbor remote-as 65001
 neighbor fall-over
 maximum-paths ibgp 2

I defined each hub as a neighbor in the same AS. The fall-over command speeds up BGP convergence: the session is torn down and BGP reconverges as soon as the route to the neighbor disappears from the spoke's route table. The catch is that iBGP then relies on the timers of the underlay routing (in this post's case, static routes) instead of the default BGP timers. I enabled maximum-paths for BGP load sharing since we have always-up links to each hub. Without it, BGP installs only one route to our remote emulated LAN. With it our route table looks like this:

And our BGP table looks like this when load sharing is enabled:
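If you are following along on your own spoke, a minimal sketch of the standard IOS commands that should display both tables is below. The output depends entirely on your addressing, so none is shown here:

```
! Run from privileged EXEC on the spoke
! BGP-learned routes installed in the routing table
show ip route bgp
! Full BGP table, including paths that were not selected as best
show ip bgp
```

With maximum-paths ibgp 2 in place, the remote LAN prefix should appear with two next hops, one per hub.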

Hub1 (CSR1) iBGP Config:

router bgp 65001
 bgp log-neighbor-changes
 bgp listen range peer-group SPOKES
 redistribute ospf 10 match internal external 2 route-map OSPF_ROUTES
 neighbor SPOKES peer-group
 neighbor SPOKES remote-as 65001
 neighbor SPOKES fall-over
 neighbor SPOKES next-hop-self all
 neighbor remote-as 65001
 neighbor route-reflector-client
 neighbor next-hop-self all

On each hub I defined a BGP peer group so I did not have to define each neighbor with the same policies (see more on BGP Peer Groups here: BGP Peer Groups Tidbit). The group was defined in the same AS since we are using iBGP. I ended up using OSPF as the IGP between CSR11 (emulated LAN) & CSR1/CSR12. To get the OSPF-learned route into iBGP, I redistributed OSPF and set up a route-map to filter what was redistributed into BGP. The route-map & prefix-list configuration is as follows:

ip prefix-list OSPF_ROUTES seq 5 permit
route-map OSPF_ROUTES permit 10 
 match ip address prefix-list OSPF_ROUTES

I then enabled fall-over on the group. Lastly, for the spoke group I configured next-hop-self all, which makes each hub set itself as the next hop when advertising routes. This is important in this topology because without the command the hubs advertise the remote LAN with the next hop unchanged, which neither spoke can reach.

Spoke1 BGP table WITH next-hop-self all enabled on hubs:

Spoke1 BGP table WITHOUT next-hop-self all enabled on hubs:

You can see that when it is not enabled the next hop is unchanged. Neither spoke installs a route, so both are unaware of the 192 (112/111) networks and there is no connectivity to the remote LAN.

The last part of the hub config defines the other hub as a neighbor in the same AS and as a route reflector client, along with the remaining necessary config such as next-hop-self. In this demo I don't dive into this too much since I am not advertising spoke networks. However, this is important in actual non-lab topologies as it allows the hubs to advertise learned routes to each other. My main focus in this lab is to understand dual cloud failover from a spoke perspective.

Note: I also omitted the OSPF config given the goal of this lab and the simplicity of the configuration. OSPF was activated on the interfaces connecting CSR11 (remote LAN) to both hubs (CSR1 & CSR12). Then on the hubs I advertise the FlexVPN pool networks.

Dual Cloud Failover Verification:

Now that everything is configured, I begin by verifying failover & recovery. In this first scenario I send thousands of pings from spoke1 (CSR13) to our remote LAN, learned via the iBGP peerings to the hubs, and then shut down the hub1 (CSR1) underlay interface to emulate a circuit failure and see how fast we recover. You will see that with no DPD configured in IKE, the crypto sessions remain UP to both hubs even though physical connectivity to hub1 is gone. Note that the crypto session will not terminate until the SA lifetime expires, which is not ideal. In this first scenario failover is ONLY triggered when the BGP timers expire on both spoke1 & hub1. Once this occurs the BGP table is updated, and pings resume to the remote LAN via Hub2 (CSR12).

First I initiated traffic from Spoke1 & then immediately shut down the underlay interface on hub1, which kills the traffic:

Because IKE DPD is not enabled and no other mechanism is used to speed up failover, we wait for the BGP sessions to expire on hub1 & spoke1. During this time our crypto session remains up:

Note that the default BGP hold time (180 seconds on IOS) expires long before the IKE SA lifetime does (86400 seconds by default):

While we wait the BGP table on spoke1 remains the same meaning that it still has both routes to our targeted remote lan via both hubs:

Once BGP timers expire we see the following debugs both ways:

Now the BGP table on the spoke has updated:

And FINALLY traffic continues to the remote LAN through Hub2 (CSR12):

The big problem here is that a lot of traffic is lost in the scenario with no IKE DPD. So in the next scenario I add DPD to the IKEv2 profile config to speed up failover. The added config on the spokes/hubs is as follows:

#dpd 10 2 on-demand

Note: on-demand triggers Dead Peer Detection when IPsec traffic is sent but no reply is received from the peer.

Once that config was added, I cleared the crypto sessions to ensure that DPD is enabled on the SA (to verify, use: #show crypto ikev2 sa detail).
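As a rough sketch of where the dpd line sits, assuming an IKEv2 profile named CIF_IKEV2_PROF (a hypothetical name; the actual profile comes from the earlier single cloud post):

```
! CIF_IKEV2_PROF is a placeholder profile name, not taken from this lab
crypto ikev2 profile CIF_IKEV2_PROF
 ! Syntax: dpd <interval> <retry-interval> {on-demand | periodic}
 dpd 10 2 on-demand
```

Here 10 is the DPD interval in seconds and 2 is the retry interval in seconds. The periodic keyword is the alternative to on-demand and sends DPD messages at the interval regardless of traffic.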

I am now ready to test the failover scenario again with DPD enabled. Again, I initiated traffic from spoke1 & immediately shut down the hub1 underlay interface. Thanks to DPD & the BGP fall-over config we have much faster convergence:

The IKE SA is terminated thanks to DPD:

The BGP table on spoke1 depicts 1 path:

Note: you could also tweak BGP timers as an option to aid in quicker failover. In the first scenario with no DPD all default BGP timers were used. Also, if you want to see the rest of the FlexVPN config see the post via the link in "The What?" section above.
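The timer tweak mentioned above could look something like the following on a hub, reusing the SPOKES peer group from this lab. The 10 and 30 second values are illustrative assumptions, not values I tested here:

```
router bgp 65001
 ! Syntax: neighbor <peer> timers <keepalive> <holdtime>, in seconds
 ! 10/30 are example values only; the IOS defaults are 60/180
 neighbor SPOKES timers 10 30
```

BGP peers negotiate the lower of the two hold times, and new timers only apply once the session re-establishes, so expect to reset the peering after changing them.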

That wraps up this post where I covered dual cloud FlexVPN topology failover scenarios. To summarize, I walked through the configs that need to be added to a FlexVPN topology & stepped through testing failover with & without dead peer detection alongside BGP fall-over. Cheers!

