"The What?" - I want to expand on some of the other FlexVPN-related blogs shared earlier. This lab builds on an earlier one where I utilized dual hub FlexVPN in a single cloud topology. For reference, that post is here: Configuring & Verifying FlexVPN Redundancy with Dual Hub & Single Cloud
"The Why?" - In this post I will be focusing on FlexVPN redundancy with dual hub & dual cloud. This configuration is a tad different from the single cloud FlexVPN redundancy post linked above. In the dual cloud approach, spokes have active tunnels to both hubs at all times, whereas in the single cloud configuration (the failover approach) a spoke has an active tunnel to only one hub at a time. The dual cloud topology provides the following benefits:
Faster recovery during failure
Greater possibilities to distribute traffic among hubs, since there are active tunnels to each hub
"The How?" - Most of the configuration (the PKI for VPN authentication, the IKEv2 config, and everything else for the most part) remains the same. The main changes are: introducing iBGP as the routing protocol used in the overlay, an additional SVTI on each spoke (one for each hub and cloud), two separate IP pools on each respective hub to support each cloud, and updating the secondary hub's loopback, referenced in its virtual template, to use the new overlay range when bringing up DVTIs.
Here is the topology for this lab & post:
Notice how this is the same configuration depicted in the FlexVPN single cloud lab, but with an additional cloud to emulate two separate ISP clouds. During normal operation, both spokes maintain a relationship with both hubs. When a failure occurs, the routing protocol (iBGP in this case, which Cisco recommends for this design) reconverges traffic to the surviving hub.
Hub 2 Configuration Changes:
First, I will cover the configuration changes/additions needed to establish the dual cloud FlexVPN. Once that is completed I will dive into deploying & verifying iBGP. I added the additional overlay range, 10.0.100.0/24, and modified the pool on CSR12 (hub2; red cloud) to utilize it. I then modified Loopback1, which is referenced in virtual-template1:
```
interface Loopback1
 ip address 10.0.100.2 255.255.255.255
!
ip local pool CIF_POOL 10.0.100.3 10.0.100.254
```
Now on each respective spoke I deployed an additional SVTI that utilizes the same underlay but our secondary Flex cloud. On spoke1 (CSR13) I deployed an additional tunnel interface and modified the existing one to explicitly specify the tunnel destination, as follows:
```
interface Tunnel0
 ip address negotiated
 ip mtu 1400
 ip nhrp network-id 10
 ip nhrp shortcut virtual-template 1
 ip tcp adjust-mss 1360
 tunnel source GigabitEthernet1.254
 tunnel mode ipsec ipv4
 tunnel destination 192.168.254.1    ! Points to Hub1
 tunnel protection ipsec profile CIF_IPSEC_PROF
!
interface Tunnel1
 ip address negotiated
 ip mtu 1400
 ip nhrp network-id 10
 ip nhrp shortcut virtual-template 1
 ip tcp adjust-mss 1360
 tunnel source GigabitEthernet1.254
 tunnel mode ipsec ipv4
 tunnel destination 192.168.254.12   ! Points to Hub2
 tunnel protection ipsec profile CIF_IPSEC_PROF
end
```
Remember that the underlay is not really the big concern here. Technically I could have configured a more complex underlay, but for the purposes of this lab there is no need; the emphasis is on the dual cloud overlay. The major config changes needed to support it are on the hubs: the IP pool used to distribute spoke tunnel interface IPs in the IKEv2 config payload, and the updated loopback referenced in hub2's virtual template.
Next I deployed iBGP on spoke1 and both hubs. Note: Omitting Hub2/Spoke2 config as it is essentially the same.
Spoke1 (CSR13) iBGP Config:
```
router bgp 65001
 bgp log-neighbor-changes
 neighbor 10.0.10.1 remote-as 65001
 neighbor 10.0.10.1 fall-over
 neighbor 10.0.100.2 remote-as 65001
 neighbor 10.0.100.2 fall-over
 maximum-paths ibgp 2
```
I defined each hub as a neighbor in the same AS. The fall-over command optimizes BGP convergence: the session is torn down and BGP reconverges as soon as the route to the neighbor disappears from the spoke's routing table. The catch here is that iBGP then relies on the timers of the underlay IGP (in this post's case, static routes) instead of the default BGP timers. I also enabled maximum-paths ibgp 2 for BGP load sharing, since we have always-up links to each respective hub. Without it, BGP installs only one route to our remote emulated LAN. With it, our route table looks like this:
And our BGP table looks like this when load sharing is enabled:
Hub1 (CSR1) iBGP Config:
```
router bgp 65001
 bgp log-neighbor-changes
 bgp listen range 10.0.10.0/24 peer-group SPOKES
 redistribute ospf 10 match internal external 2 route-map OSPF_ROUTES
 neighbor SPOKES peer-group
 neighbor SPOKES remote-as 65001
 neighbor SPOKES fall-over
 neighbor SPOKES next-hop-self all
 neighbor 192.168.121.12 remote-as 65001
 neighbor 192.168.121.12 route-reflector-client
 neighbor 192.168.121.12 next-hop-self all
```
On each hub I defined a BGP peer group so I did not have to configure each neighbor with the same policies (see more on BGP peer groups here: BGP Peer Groups Tidbit). The group was defined in the same AS since we are using iBGP. I ended up using OSPF as the IGP between CSR11 (188.8.131.52 emulated LAN) and CSR1/CSR12. To get the OSPF-learned route into iBGP, I redistributed OSPF and set up a route-map so I could filter what was redistributed into BGP. The route-map & prefix-list configuration is as follows:
```
ip prefix-list OSPF_ROUTES seq 5 permit 184.108.40.206/32
!
route-map OSPF_ROUTES permit 10
 match ip address prefix-list OSPF_ROUTES
```
I then enabled fall-over on the group. Lastly, for the spoke group I configured next-hop-self all, which sets the next hop to the hub's own address when advertising routes to the spokes. This is important in this topology because without it the hubs advertise the remote LAN (220.127.116.11) with the next hop unchanged, which the spokes cannot reach.
Spoke1 BGP table WITH next-hop-self all enabled on hubs:
Spoke1 BGP table WITHOUT next-hop-self all enabled on hubs:
You can see that when it is not enabled the next hop is unchanged. Both spokes install no routes and are unaware of the 192 (112/111) networks, so connectivity to the remote 18.104.22.168 is nonexistent.
The last part of the hub config is defining the other hub as an iBGP neighbor in the same AS, making it a route reflector client, and applying the remaining necessary config such as next-hop-self. In this demo I don't dive into this too much since I am not advertising spoke networks. However, it is important in actual non-lab topologies, as it allows the hubs to advertise learned routes to each other. My main focus in this lab is primarily to understand dual cloud failover from a spoke perspective.
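For completeness, hub2's BGP config mirrors hub1's with the cloud-specific values swapped. A sketch only; the only detail below confirmed earlier in this post is hub2's 10.0.100.0/24 pool range, and hub1's inter-hub address of 192.168.121.1 is an assumption based on hub2 being 192.168.121.12:

```
router bgp 65001
 bgp log-neighbor-changes
 bgp listen range 10.0.100.0/24 peer-group SPOKES    ! Hub2's overlay pool range
 redistribute ospf 10 match internal external 2 route-map OSPF_ROUTES
 neighbor SPOKES peer-group
 neighbor SPOKES remote-as 65001
 neighbor SPOKES fall-over
 neighbor SPOKES next-hop-self all
 neighbor 192.168.121.1 remote-as 65001              ! Hub1 inter-hub address (assumed)
 neighbor 192.168.121.1 route-reflector-client
 neighbor 192.168.121.1 next-hop-self all
```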
Note: I also omitted the OSPF config given the goal of this lab and the simplicity of the configuration. OSPF was activated on the interfaces connecting CSR11 (remote LAN) to both hubs (CSR1 & CSR12). Then, on the hubs, I advertised the FlexVPN pool networks.
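The omitted hub-side OSPF config amounts to something like the following on hub1. This is a sketch only: the process ID matches the redistribute statement shown earlier, but the area and the exact network statement (advertising hub1's 10.0.10.0/24 FlexVPN pool) are assumptions:

```
router ospf 10
 ! Advertise Hub1's FlexVPN pool network into OSPF (area assignment assumed)
 network 10.0.10.0 0.0.0.255 area 0
```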
Dual Cloud Failover Verification:
Now that everything is configured, I begin with verifying failover & recovery. In this first scenario I initiate traffic, via thousands of pings, to our remote LAN (22.214.171.124), learned from our iBGP peering to the hubs, while shutting down the physical link of hub1's (CSR1) underlay interface to emulate a circuit failure toward spoke1 (CSR13) and see how fast we recover. You will see that with no DPD configured for IKE, our crypto sessions remain UP to both hubs even though physical connectivity to hub1 was terminated. Note that the crypto session will not terminate until the SA lifetime expires, which is not ideal. In this first scenario, failover is ONLY triggered when the BGP timers expire on both spoke1 & hub1. Once this occurs the BGP table is updated, and pings resume to the remote LAN via hub2 (CSR12).
First initiated traffic from Spoke1 & then immediately terminated the underlay interface on hub1 which kills the traffic:
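The test boiled down to something like the following. The interface name on hub1 is an assumption; the spokes source their tunnels from GigabitEthernet1.254, but hub1's underlay-facing interface is not shown in this post:

```
! On Spoke1 (CSR13): generate sustained traffic toward the remote LAN
ping 22.214.171.124 repeat 100000

! On Hub1 (CSR1): emulate a circuit failure on the underlay-facing interface (name assumed)
interface GigabitEthernet1
 shutdown
```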
Now, because IKE DPD is not enabled and no other mechanism is used to speed up the failover process, we wait for the BGP sessions to expire/update on hub1 & spoke1. During this time our crypto session remains up:
Note that the default BGP timers will trigger failover way before an IKE SA lifetime expires:
While we wait the BGP table on spoke1 remains the same meaning that it still has both routes to our targeted remote lan via both hubs:
Once BGP timers expire we see the following debugs both ways:
Now the BGP table on the spoke has updated:
And FINALLY traffic continues to the remote LAN through Hub2 (CSR12):
The big problem here is that a lot of traffic is lost in the described scenario with no IKE DPD. So in the next scenario I add DPD to the spoke IKEv2 profile config to speed up failover. The added config on the spokes/hubs is as follows:
```
dpd 10 2 on-demand
```
Note: on-demand triggers Dead Peer Detection when IPsec traffic is sent but no reply is received from the peer.
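In context, the command sits under the IKEv2 profile. A sketch, with the profile name assumed since the actual name is not shown in this post:

```
crypto ikev2 profile CIF_IKEV2_PROF   ! profile name assumed
 ! Check peer liveness after 10 seconds without inbound traffic, retransmitting every 2 seconds,
 ! and only when we are actively sending traffic (on-demand)
 dpd 10 2 on-demand
```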
Once that config is added, I cleared the crypto sessions to ensure that DPD gets enabled on the SA (to verify, use: #show crypto ikev2 sa detail).
I am now ready to test the failover scenario again with DPD enabled. Again, I initiated traffic from spoke1 & immediately shutdown the hub1 underlay interface. Thanks to DPD & the bgp fall-over config we have much faster convergence:
The IKE SA is terminated thanks to DPD:
The BGP table on spoke1 depicts 1 path:
Note: you could also tweak the BGP timers as another option to aid quicker failover. In the first scenario with no DPD, all default BGP timers were used. Also, if you want to see the rest of the FlexVPN config, see the post linked in "The What?" section above.
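As a sketch of that option (the values below are illustrative and were not tested in this lab), the keepalive/hold timers can be lowered on the hub's spoke peer group, or per neighbor on the spokes:

```
router bgp 65001
 ! 10-second keepalive, 30-second hold time (IOS defaults are 60/180)
 neighbor SPOKES timers 10 30
```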
That wraps up this post, where I covered dual cloud FlexVPN topology failover scenarios. To summarize, I walked through the configs that need to be added to a FlexVPN topology & stepped through testing failover with & without dead peer detection alongside BGP fall-over. Cheers!