Effects of packet drop and latency on IPSEC tunnels

When I was a junior engineer, I used to go to customer sites to install leased line modems and perform the initial quality checks on the lines. The most critical moment after provisioning a line was sending the first 100 ICMP packets to see whether there was any packet loss. Even a single lost packet was a nightmare for us, because we had to find out where the packet was really lost: starting from the physical layer, i.e. checking cabling (or weather conditions if it was a wireless link 🙂), and then moving on to protocol-level investigations. If we were lucky, it was our own problem and no Telco involvement was required. If it was a Telco problem, it was unfortunately worse, as convincing the Telco that they had a problem wasn't an easy task in those days.
I started with a little story, but in today's networks packet loss isn't so common; still, it happens, and if you have a satellite link, latencies might go up to 1000ms or even more. In this post I will do some experimental work on how IPSEC tunnel establishment is affected by packet loss and latency.

IPSEC LAB 13:
[Topology diagram: IPSEC_13_effects_of_packet_drop_jitter_latency]

For this lab, we are going to use the topology above, in which there is an established IPSEC tunnel between the branchG and CO-A-1 SRX devices.

root@branchG> show security ipsec sa    
  Total active tunnels: 1
  ID    Algorithm       SPI      Life:sec/kb  Mon lsys Port  Gateway   
  <131073 ESP:3des/sha1 5070fa96 7130/ unlim   -   root 500   192.168.9.2     
  >131073 ESP:3des/sha1 d56e13a8 7130/ unlim   -   root 500   192.168.9.2     

First of all, we test the round trip time between the two IPSEC end points.

root@branchG> ping 192.168.9.2 rapid count 5 
PING 192.168.9.2 (192.168.9.2): 56 data bytes
!!!!!
--- 192.168.9.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 9.944/12.016/15.021/2.447 ms

According to the test, we have an average round trip time of 12ms. Jitter is also negligible, I believe.

Now I restart KMD (the IPsec key management daemon) and check how long it takes for the IPSEC tunnel to come back up.

root@branchG> show security ipsec sa 
Nov 03 21:41:36
  Total active tunnels: 1
  ID    Algorithm       SPI      Life:sec/kb  Mon lsys Port  Gateway   
  <131073 ESP:3des/sha1 b06e9121 7188/ unlim   -   root 500   192.168.9.2     
  >131073 ESP:3des/sha1 87b80723 7188/ unlim   -   root 500   192.168.9.2     

root@branchG> restart ipsec-key-management 
Nov 03 21:41:39
IPSec Key Management daemon started, pid 1321

root@branchG> show security ipsec sa          
Nov 03 21:41:40
  Total active tunnels: 1
  ID    Algorithm       SPI      Life:sec/kb  Mon lsys Port  Gateway   
  <131073 ESP:3des/sha1 c63e500f 7199/ unlim   -   root 500   192.168.9.2     
  >131073 ESP:3des/sha1 418754f  7199/ unlim   -   root 500   192.168.9.2     

Almost in the blink of an eye, the tunnel comes back up.
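
By the way, if you want to see exactly when the Phase 1 and Phase 2 exchanges complete instead of repeatedly running "show security ipsec sa", IKE traceoptions can help. Something along these lines should work (the file name ike-debug is my own choice, and I haven't pasted the log output here):

root@branchG# set security ike traceoptions file ike-debug
root@branchG# set security ike traceoptions flag all
root@branchG# commit

root@branchG> show log ike-debug

The kmd log entries are timestamped, so you can measure how long the negotiation takes fairly precisely.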

Now the WAN emulation (netem) part comes into play on Linux. This is a wonderful feature, believe me! We are adding 1000ms of delay to every packet on the uplink of branchG towards the central office device.

root@debian1:~# tc qdisc add dev eth1.956 root netem delay 1000ms
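
As a side note, netem can also add jitter on top of a fixed delay, which would be handy for emulating a really poor satellite link. Something like the following should do it (the 200ms jitter value is just my own arbitrary pick); I am sticking to a fixed delay for the tests below:

root@debian1:~# tc qdisc change dev eth1.956 root netem delay 1000ms 200ms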

Now we do the same round trip test to see how the response time is affected.

root@branchG> ping 192.168.9.2 count 5          
PING 192.168.9.2 (192.168.9.2): 56 data bytes
64 bytes from 192.168.9.2: icmp_seq=0 ttl=62 time=1014.999 ms
64 bytes from 192.168.9.2: icmp_seq=1 ttl=62 time=1010.173 ms
64 bytes from 192.168.9.2: icmp_seq=2 ttl=62 time=1010.145 ms
64 bytes from 192.168.9.2: icmp_seq=3 ttl=62 time=1010.195 ms
64 bytes from 192.168.9.2: icmp_seq=4 ttl=62 time=1010.182 ms

--- 192.168.9.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1010.145/1011.139/1014.999/1.930 ms

We can see that our link latency has increased quite dramatically: the average round trip time has become 1011ms.

Let’s see how long IPSEC tunnel establishment takes.

root@branchG> set cli timestamp 
Nov 03 21:46:47
CLI timestamp set to: %b %d %T

root@branchG> show security ipsec sa 
Nov 03 21:46:49
  Total active tunnels: 1
  ID    Algorithm       SPI      Life:sec/kb  Mon lsys Port  Gateway   
  <131073 ESP:3des/sha1 c63e500f 6890/ unlim   -   root 500   192.168.9.2     
  >131073 ESP:3des/sha1 418754f  6890/ unlim   -   root 500   192.168.9.2     

root@branchG> restart ipsec-key-management 
Nov 03 21:46:54
IPSec Key Management daemon started, pid 1353

root@branchG> show security ipsec sa          
Nov 03 21:46:55
  Total active tunnels: 0

root@branchG> show security ipsec sa    
Nov 03 21:46:56
  Total active tunnels: 0

root@branchG> show security ipsec sa    
Nov 03 21:46:58
  Total active tunnels: 1
  ID    Algorithm       SPI      Life:sec/kb  Mon lsys Port  Gateway   
  <131073 ESP:3des/sha1 837b4b74 7199/ unlim   -   root 500   192.168.9.2     
  >131073 ESP:3des/sha1 aead0657 7199/ unlim   -   root 500   192.168.9.2     

Now we have a couple of seconds of delay, but it is still reasonable, I believe. We could increase the delay further, but I don't see the point, as anything beyond 1000ms shouldn't be considered acceptable latency anyway.
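
For reference, if you did want to push the delay further, the existing netem qdisc can be modified in place instead of being deleted and re-added; something like this should work:

root@debian1:~# tc qdisc change dev eth1.956 root netem delay 2000ms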

Now we delete the delay and simulate a 20% packet drop instead.

root@debian1:~# tc qdisc show dev eth1.956
qdisc netem 8001: root refcnt 2 limit 1000 delay 1.0s

root@debian1:~# tc qdisc del dev eth1.956 root netem
root@debian1:~# tc qdisc add dev eth1.956 root netem loss 20%
root@debian1:~# tc qdisc show dev eth1.956
qdisc netem 8002: root refcnt 2 limit 1000 loss 20%
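
Another side note: netem can also make the loss bursty by adding a correlation value, which is often closer to how a flaky link really behaves. The 25% correlation below is just my own pick and isn't used in the tests that follow:

root@debian1:~# tc qdisc change dev eth1.956 root netem loss 20% 25%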

And now let's ping the remote IPSEC peer:

root@branchG> ping 192.168.9.2 rapid count 100    
PING 192.168.9.2 (192.168.9.2): 56 data bytes
!!!!!!.!!!!!!!!.!!.!.!!!.!...!!.!.!.!!!!!!!!!!!!.!!!!!.!!!.!!!.!!!!!!!!!!!!!!!!!!!!!...!.!.!!!!.!!.!
--- 192.168.9.2 ping statistics ---
100 packets transmitted, 78 packets received, 22% packet loss
round-trip min/avg/max/stddev = 9.822/11.811/15.227/2.263 ms

We have wonderfully lost 22% of the packets, as you can see from the output. You may get slightly different drop rates in your own tests.

Once we have added this packet drop emulation, we take a packet capture between the two end points to see the drops.

[Packet capture: ipsec_packet_drop_pcap]

If you look closely at the pcap outputs taken on the two interfaces, eth1.251 and eth1.956 (the downstream and upstream interfaces), you can see the packet drops. Each dropped packet is re-sent by the peer, and these drops delay Quick Mode completion by around 30 seconds.
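
If you want to reproduce this capture yourself, grabbing the IKE (UDP port 500) and ESP (IP protocol 50) traffic on both Linux interfaces should be enough; something along these lines:

root@debian1:~# tcpdump -ni eth1.956 'udp port 500 or ip proto 50'
root@debian1:~# tcpdump -ni eth1.251 'udp port 500 or ip proto 50'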

Even if you increase the packet drop rate to 40%, the SRX still establishes the IPSEC tunnel. For example, if the initiator doesn't get any response to its first packet containing the proposals, it keeps re-sending it at 10-second intervals. With a 40% drop rate, it can take up to 1.5 to 2 minutes to establish the tunnel. I even made things worse by adding 1000ms of latency on top, but the tunnel still got established. I do recall that in several of my earlier tests 35-40% packet loss was really causing trouble, but I haven't seen any issue in my latest tests.
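
For the curious, loss and delay can be combined in a single netem statement, which is roughly how I made things worse above; the exact values here are just an example:

root@debian1:~# tc qdisc change dev eth1.956 root netem loss 40% delay 1000ms
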
One last thing I need to mention is the MTU. During the Phase 1 packet exchanges, the largest packet is the first one sent by the initiator, containing the proposals, DPD support and so on, and its total IP packet size probably won't be larger than around 350 bytes, so a small MTU shouldn't become a concern here either.
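
If you want to convince yourself of this, a quick sanity check is to send a ping of a similar size with the do-not-fragment bit set; 400 bytes of payload is roughly in the ballpark of that first IKE packet. If it goes through cleanly, nothing on the path needs to fragment a packet of that size:

root@branchG> ping 192.168.9.2 size 400 do-not-fragment count 5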

In a nutshell, this experiment, however imperfect it may be, indicates that neither packet loss nor latency is a real threat to IPSEC tunnel establishment, and neither is a small MTU.

If you have any story, please do share here!

I wish you days without any packet loss 🙂
