Effects of packet drop and latency on IPSEC tunnels
When I was a junior engineer, I used to go to customer sites to install leased line modems and perform the initial quality checks on the lines. The most critical moment after provisioning a line was sending the first 100 ICMP packets to see whether there was any packet loss, and even a single lost packet meant a nightmare of figuring out where it was actually dropped: starting from the physical layer, i.e. checking the cabling (or, on a wireless link, the weather conditions 🙂 ), and then moving on to protocol-level investigations. If we were lucky, the problem was on our side and no Telco involvement was required. If it was a Telco problem, things were worse, as convincing the Telco that they had a problem wasn't an easy task in those days.
That was a little story, but in today's networks packet loss isn't so common; still, it happens, and if you have a satellite link, latency might go up to 1000 ms or even more. In this post I will do an experiment to see how IPSEC tunnel establishment is affected by packet loss and latency.
IPSEC LAB 13:
For this lab, we are going to use the above topology, in which there is an established IPSEC tunnel between the branchG and CO-A-1 SRX devices.
root@branchG> show security ipsec sa
  Total active tunnels: 1
  ID       Algorithm      SPI       Life:sec/kb  Mon lsys Port  Gateway
  <131073  ESP:3des/sha1  5070fa96  7130/ unlim  -   root 500   192.168.9.2
  >131073  ESP:3des/sha1  d56e13a8  7130/ unlim  -   root 500   192.168.9.2
First of all, we test the round-trip time between the two IPSEC endpoints.
root@branchG> ping 192.168.9.2 rapid count 5
PING 192.168.9.2 (192.168.9.2): 56 data bytes
!!!!!
--- 192.168.9.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 9.944/12.016/15.021/2.447 ms
According to this test, the average round-trip time is 12 ms, and jitter is negligible I believe.
Now I restart KMD and check how long it takes for the IPSEC tunnel to come back up.
root@branchG> show security ipsec sa
Nov 03 21:41:36
  Total active tunnels: 1
  ID       Algorithm      SPI       Life:sec/kb  Mon lsys Port  Gateway
  <131073  ESP:3des/sha1  b06e9121  7188/ unlim  -   root 500   192.168.9.2
  >131073  ESP:3des/sha1  87b80723  7188/ unlim  -   root 500   192.168.9.2

root@branchG> restart ipsec-key-management
Nov 03 21:41:39
IPSec Key Management daemon started, pid 1321

root@branchG> show security ipsec sa
Nov 03 21:41:40
  Total active tunnels: 1
  ID       Algorithm      SPI       Life:sec/kb  Mon lsys Port  Gateway
  <131073  ESP:3des/sha1  c63e500f  7199/ unlim  -   root 500   192.168.9.2
  >131073  ESP:3des/sha1  418754f   7199/ unlim  -   root 500   192.168.9.2
Almost in the blink of an eye, the tunnel comes back up.
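As a side note (and not what I am doing in this lab), if you prefer not to restart the whole daemon, clearing the IKE and IPsec security associations should also force a fresh negotiation, roughly like this:

root@branchG> clear security ike security-associations
root@branchG> clear security ipsec security-associations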
Now the WAN emulation (netem) part comes into play on the Linux side. This is a wonderful feature, believe me! We add 1000 ms of delay to every packet on the branchG uplink towards the central office device.
root@debian1:~# tc qdisc add dev eth1.956 root netem delay 1000ms
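Incidentally, netem can also emulate jitter on top of a fixed delay. If you wanted, say, 1000 ms ± 100 ms on the same qdisc, something along these lines should do it (the exact values here are only an illustration, not what I use below):

root@debian1:~# tc qdisc change dev eth1.956 root netem delay 1000ms 100ms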
Now we run the same round-trip test to see how the response time is affected.
root@branchG> ping 192.168.9.2 count 5
PING 192.168.9.2 (192.168.9.2): 56 data bytes
64 bytes from 192.168.9.2: icmp_seq=0 ttl=62 time=1014.999 ms
64 bytes from 192.168.9.2: icmp_seq=1 ttl=62 time=1010.173 ms
64 bytes from 192.168.9.2: icmp_seq=2 ttl=62 time=1010.145 ms
64 bytes from 192.168.9.2: icmp_seq=3 ttl=62 time=1010.195 ms
64 bytes from 192.168.9.2: icmp_seq=4 ttl=62 time=1010.182 ms
--- 192.168.9.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1010.145/1011.139/1014.999/1.930 ms
As you can see, the link latency has increased quite dramatically: the average round trip is now about 1011 ms.
Let’s see how long IPSEC tunnel establishment takes.
root@branchG> set cli timestamp
Nov 03 21:46:47
CLI timestamp set to: %b %d %T

root@branchG> show security ipsec sa
Nov 03 21:46:49
  Total active tunnels: 1
  ID       Algorithm      SPI       Life:sec/kb  Mon lsys Port  Gateway
  <131073  ESP:3des/sha1  c63e500f  6890/ unlim  -   root 500   192.168.9.2
  >131073  ESP:3des/sha1  418754f   6890/ unlim  -   root 500   192.168.9.2

root@branchG> restart ipsec-key-management
Nov 03 21:46:54
IPSec Key Management daemon started, pid 1353

root@branchG> show security ipsec sa
Nov 03 21:46:55
  Total active tunnels: 0

root@branchG> show security ipsec sa
Nov 03 21:46:56
  Total active tunnels: 0

root@branchG> show security ipsec sa
Nov 03 21:46:58
  Total active tunnels: 1
  ID       Algorithm      SPI       Life:sec/kb  Mon lsys Port  Gateway
  <131073  ESP:3des/sha1  837b4b74  7199/ unlim  -   root 500   192.168.9.2
  >131073  ESP:3des/sha1  aead0657  7199/ unlim  -   root 500   192.168.9.2
Now there is a delay of a couple of seconds, but it is still reasonable I believe. We could increase the delay further, but I don't see the point, as anything beyond 1000 ms shouldn't be considered acceptable latency anyway.
Now we delete the delay and simulate 20% packet drop instead.
root@debian1:~# tc qdisc show dev eth1.956
qdisc netem 8001: root refcnt 2 limit 1000 delay 1.0s
root@debian1:~# tc qdisc del dev eth1.956 root netem
root@debian1:~# tc qdisc add dev eth1.956 root netem loss 20%
root@debian1:~# tc qdisc show dev eth1.956
qdisc netem 8002: root refcnt 2 limit 1000 loss 20%
And now let's ping the remote IPSEC peer:
root@branchG> ping 192.168.9.2 rapid count 100
PING 192.168.9.2 (192.168.9.2): 56 data bytes
!!!!!!.!!!!!!!!.!!.!.!!!.!...!!.!.!.!!!!!!!!!!!!.!!!!!.!!!.!!!.!!!!!!!!!!!!!!!!!!!!!...!.!.!!!!.!!.!
--- 192.168.9.2 ping statistics ---
100 packets transmitted, 78 packets received, 22% packet loss
round-trip min/avg/max/stddev = 9.822/11.811/15.227/2.263 ms
We have wonderfully lost 22% of the packets, as you can see from the output. You may get slightly different drop rates in your own tests.
Once this packet drop emulation is in place, we take a packet capture between the two endpoints to observe the drops.
If you look closely at the pcap outputs taken on the two interfaces, eth1.251 and eth1.956 (the downstream and upstream interfaces), you can see the packet drops; however, each dropped packet is re-sent by the peer, and these drops delay Quick Mode completion by around 30 seconds.
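If you want to reproduce these captures yourself, a plain tcpdump on the two Linux interfaces, filtered on IKE traffic (UDP port 500), should be enough; the interface and file names below are of course specific to my setup:

root@debian1:~# tcpdump -ni eth1.251 -w downstream.pcap udp port 500 &
root@debian1:~# tcpdump -ni eth1.956 -w upstream.pcap udp port 500 &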
Even if you increase the packet drop rate to 40%, the SRX still establishes the IPSEC tunnel. For example, if the initiator doesn't get any response to its first packet containing the proposals, it keeps retransmitting it at 10-second intervals. With a 40% drop rate, it can take up to 1.5-2 minutes to establish the tunnel. I even made things worse by adding 1000 ms of latency on top, but the tunnel still got established. However, I recall that in several of my earlier tests 35-40% packet loss really did cause trouble, although I haven't seen any issue in my latest tests.
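For reference, the combined worst-case emulation mentioned above (40% loss plus 1000 ms delay) can be configured on a single netem qdisc, roughly like this:

root@debian1:~# tc qdisc del dev eth1.956 root netem
root@debian1:~# tc qdisc add dev eth1.956 root netem loss 40% delay 1000ms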
One last thing I need to mention is the MTU. During the Phase 1 packet exchanges, the largest packet is the first one sent by the initiator, containing the proposals, DPD support and so on, and its total IP packet size probably won't be larger than 350 bytes, so a small MTU shouldn't become a concern here either.
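If you still want to verify that Phase 1 sized packets fit through the path, a quick ping from the SRX with the do-not-fragment bit set should tell you whether packets of roughly that size get through (the 350 bytes above is only my rough estimate, and 400 here is just an illustrative payload size slightly above it):

root@branchG> ping 192.168.9.2 size 400 do-not-fragment count 3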
In a nutshell, this experiment, however imperfect it may be, indicates that neither packet loss nor latency (nor a small MTU) is a real threat to IPSEC tunnel establishment.
If you have any story, please do share here!
I wish you days without any packet loss 🙂
Do you have an SRX multipoint-to-multipoint (full mesh) example I can borrow: hub to B1/B2 and B1 to B2, without B1 or B2 having to go through the hub to reach each other?
Ray, I don't have a full mesh example, but would this help?
http://rtoodtoo.net/2013/08/21/jncie-sec-multipoint-tunnelspolicy-and-route-based-vpns/