A couple of years ago I configured a topology for a business partner extranet much like the one sketched below.
No dynamic routing was allowed on the firewall. Layer 9 didn't trust it to run an IGP, so the firewall was configured with static routes:
- Known internal nets (registered and 1918 space) pointed in
- Default route pointed out
Two eBGP sessions were configured to learn business partner prefixes (not shown) from the external switch, and redistribute them into the IGP. It was a small number of prefixes, and they were thoroughly filtered and quantity-limited, making things safe for the IGP.
But it didn't work correctly: only one BGP session would come up at a time, never both at once.
The cause of the error took me more hours of head-scratching than I care to admit. In my defense, the topology was actually quite a bit more complicated than depicted here. Presented here is the bare minimum required to recreate the problem.
The problem was neither a firewall policy issue, nor a typo. Any typos here are just typos.
Can you spot my mistake? Which session comes up, and what's wrong with the other one?
Remote AS number is incorrect.
Remote AS number fixed. The problem lies elsewhere.
Running HSRP? Which is active/passive. You can't create a BGP session with the standby router?
ebgp multihop should be 3
I'm guessing it has something to do with your HSRP setup on vlan20 on both internal routers. The default route on the firewall for internal subnets points to ~20.1, which is the HSRP virtual IP, and that IP only maps to one of the routers at a time. Since BGP is using loopbacks for src/dest IPs, the path to both internal routers will head through default route ~20.1. I bet the firewall drops the traffic to one of the routers because the incoming and outgoing traffic to it are using different interfaces (asymmetric). Removing HSRP should fix it.
Removing HSRP isn't an option: The firewall doesn't run a routing protocol, so some FHRP is required.
The firewall isn't dropping any packets.
Is it not the ebgp multihop either? Very curious on the answer to this.
The problem was a combination of HSRP and eBGP multihop.
Because of the firewall's routes pointing at the HSRP address, one of the internal routers was 3 hops away (in the inbound direction only):
A -> external: 2 hops
B -> external: 2 hops
external -> HSRP primary: 2 hops
external -> HSRP secondary: 3 hops
I fixed this by adding 32-bit routes on the firewall:
192.168.255.1 -> 192.168.20.2
192.168.255.2 -> 192.168.20.3
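Why the host routes help can be sketched with a toy longest-prefix-match lookup. This is only an illustration, not the firewall's actual table: the `192.168.0.0/16` summary route and the `192.168.20.1` HSRP virtual IP are assumptions made for the example, and `lookup` is a hypothetical helper.

```python
import ipaddress

def lookup(routes, destination):
    """Return the next hop of the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

# Assumed "before" table: one internal summary pointing at the HSRP virtual IP.
# Both the /16 and the .20.1 address are illustrative, not from the real config.
before = [("192.168.0.0/16", "192.168.20.1")]

# "After" table: the same summary plus the two host routes from the fix.
after = before + [
    ("192.168.255.1/32", "192.168.20.2"),
    ("192.168.255.2/32", "192.168.20.3"),
]

for loopback in ("192.168.255.1", "192.168.255.2"):
    print(loopback, "before:", lookup(before, loopback), "after:", lookup(after, loopback))
```

With the host routes in place, traffic for each loopback goes straight to the router that owns it, while everything else still follows the summary toward the HSRP address.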
Routing traffic from the VLAN 20 (or 30) interface to the Lo0 interface doesn't count as a hop, so "multihop 2" is sufficient so long as we're not taking the *extra* hop across VLAN 10.
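The hop counts above can be checked with a small TTL model. It is a sketch under stated assumptions: every forwarding device (the firewall included) decrements TTL by one and drops the packet when it reaches zero, delivery to a router's own loopback costs nothing, and "multihop 2" means packets leave with TTL 2. The device names are labels for the example only.

```python
def survives(initial_ttl, forwarders):
    """Return True if a packet sent with initial_ttl reaches its destination.

    forwarders lists the devices that route the packet before the
    destination router receives it. Each one decrements TTL and drops
    the packet if the result is zero.
    """
    ttl = initial_ttl
    for _ in forwarders:
        ttl -= 1
        if ttl <= 0:
            return False
    return True

# Inbound paths from the external switch, before the host routes were added.
to_hsrp_primary = ["firewall"]
to_hsrp_secondary = ["firewall", "hsrp_primary"]  # extra hop across VLAN 10

# Inbound path to either router once the firewall has the host routes.
to_either_after_fix = ["firewall"]

for name, path in [
    ("primary", to_hsrp_primary),
    ("secondary", to_hsrp_secondary),
    ("after fix", to_either_after_fix),
]:
    print(name, "multihop 2:", survives(2, path), "multihop 3:", survives(3, path))
```

In this model the session to the HSRP secondary fails with TTL 2 and works with TTL 3, which matches both the "multihop should be 3" suggestion and the fix that removes the extra hop instead.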
I would also have been concerned about BGP through the firewall. Certain firewalls (e.g. Cisco ASA) may randomise the TCP sequence number, which breaks BGP.
Hey Greg, thanks for your comment.
These were ScreenOS boxes. I don't know if they play games with the TCP ISN, but I didn't have any problems in that regard.
Either way, I /think/ that the ISN randomization is only an issue if MD5 authentication is configured between the neighbors.
I'm not completely sure about that, but can't see how else the BGP session would notice ISN games played by an intermediate device.
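For what it's worth, the TCP MD5 signature option (RFC 2385) hashes the TCP header along with the pseudo-header, payload and key, so the sequence number is part of the digest input. The sketch below is a simplified illustration of that idea, not a wire-accurate implementation: it hashes a bare 20-byte header with no options, and the addresses, ports and key are made up for the example.

```python
import hashlib
import socket
import struct

def tcp_md5_digest(src_ip, dst_ip, src_port, dst_port, seq, ack, payload, key):
    """Simplified RFC 2385 style digest over pseudo-header, header, data, key."""
    header_len = 20  # base TCP header only; real segments also carry options
    # Pseudo-header: source IP, destination IP, zero byte, protocol 6, length.
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 6, header_len + len(payload))
    )
    # Base TCP header with the checksum field treated as zero.
    offset_and_flags = (5 << 12) | 0x002  # 5-word header, SYN flag set
    header = struct.pack(
        "!HHIIHHHH",
        src_port, dst_port, seq, ack, offset_and_flags, 65535, 0, 0,
    )
    return hashlib.md5(pseudo + header + payload + key).digest()

# Hypothetical values: the sender signs its own ISN, then a middlebox rewrites it.
key = b"example-key"
signed = tcp_md5_digest("192.0.2.1", "192.0.2.2", 50000, 179, 1000, 0, b"", key)
seen_by_receiver = tcp_md5_digest("192.0.2.1", "192.0.2.2", 50000, 179, 987654, 0, b"", key)
print("digest still matches after ISN rewrite:", signed == seen_by_receiver)
```

If the intermediate device rewrites the sequence number without knowing the key, the receiver's recomputed digest no longer matches and the segment is discarded. Without MD5, the endpoints have nothing to compare the rewritten number against.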
Chris,
Great blog; I've enjoyed reading through your archives. The problem you describe here is also present when you use 'vpc peer-gateway' on the firewall-facing VLAN. Not that this has much to do with your issue here; just an FYI.
Jeremy Filliben
Hey Jeremy, thank you for the compliment.
Does 'vpc peer-gateway' decrement TTL when bridging through the "wrong" Nexus 7K?
Maybe that's not what you meant.