IPsec TCP-MSS, DF-BIT and Fragmentation
In my previous IPsec troubleshooting post, I didn't talk about how we approach performance issues. This is probably not a JNCIE-SEC topic, but it is a very important one for real networks.
In this topology, I will examine how throughput between the two endpoints of an IPsec tunnel changes depending on the tunnel configuration.
Change 1) Setting DF-BIT to copy
An IPsec tunnel between J23 and J41 is established with no extra configuration. I start a large 1.6GB file download via HTTP:
root@ubuntu3-vm:~# wget http://websrv/test.iso
--2013-08-23 22:25:22-- http://websrv/test.iso
Resolving websrv (websrv)... 212.45.63.2
Connecting to websrv (websrv)|212.45.63.2|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1735393280 (1.6G) [application/x-iso9660-image]
Saving to: `test.iso'
4% [====> ] 75,443,320 2.78M/s eta 12m 33s
It is only around 3MB/sec, which seems low. To see why, I took a packet capture on the Internet-facing interface of J41, i.e. capturing packets after they are encrypted. Look at this horrible picture:
Do you see the fragmented packets? The link is badly underutilized: one packet is filled up to its MTU, and the second fragment of the original packet is less than 100 bytes. On top of that comes the extra processing overhead of fragmentation and reassembly. So the question is: why are the packets fragmented? Don't we have PMTU discovery in place? Actually, the packets are marked with the DF bit, but SRX doesn't copy this flag to the outer header by default. Look:
root@J41-Amsterdam> show security ipsec sa index 131073
ID: 131073 Virtual-system: root, VPN Name: vpn-23
Local Gateway: 212.45.64.2, Remote Gateway: 192.168.179.2
Local Identity: ipv4_subnet(any:0,[0..7]=0.0.0.0/0)
Remote Identity: ipv4_subnet(any:0,[0..7]=0.0.0.0/0)
Version: IKEv1
DF-bit: clear
Bind-interface: st0.0
Port: 500, Nego#: 3, Fail#: 0, Def-Del#: 0 Flag: 600a29
Tunnel Down Reason: Config Change
Direction: inbound, SPI: 2cb2d314, AUX-SPI: 0 , VPN Monitoring: -
Hard lifetime: Expires in 2410 seconds
Lifesize Remaining: Unlimited
Soft lifetime: Expires in 1770 seconds
Mode: Tunnel(0 0), Type: dynamic, State: installed
Protocol: ESP, Authentication: hmac-sha1-96, Encryption: 3des-cbc
Anti-replay service: counter-based enabled, Replay window size: 64
Direction: outbound, SPI: 7271da9f, AUX-SPI: 0 , VPN Monitoring: -
Hard lifetime: Expires in 2410 seconds
Lifesize Remaining: Unlimited
Soft lifetime: Expires in 1770 seconds
Mode: Tunnel(0 0), Type: dynamic, State: installed
Protocol: ESP, Authentication: hmac-sha1-96, Encryption: 3des-cbc
Anti-replay service: counter-based enabled, Replay window size: 64
Do you see “DF-bit: clear” in this output? Because of this, if a packet exceeds the tunnel MTU, SRX fragments it and sends it through the tunnel instead of sending an ICMP “fragmentation needed” message back to the source. You can also take a look at KB25625 for more details. Now I will change this behavior and force SRX to send “fragmentation needed” responses, so that the websrv Linux device reduces its packet size for this destination.
#set security ipsec vpn vpn-23 df-bit copy
Now SRX shouldn’t fragment the packets.
root@ubuntu3-vm:~# wget http://212.45.63.2/test.iso
--2013-08-23 22:58:45-- http://212.45.63.2/test.iso
Connecting to 212.45.63.2:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1735393280 (1.6G) [application/x-iso9660-image]
Saving to: `test.iso'
4% [====> ] 114,070,890 4.00M/s eta 6m 59s
We see an improvement of around 40 percent. Of course, this figure can change depending on how long you let the download run. I have seen 8-9MB/sec after waiting a bit longer.
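To see where the tiny second fragment in the capture can come from, here is a rough Python sketch of the ESP tunnel-mode arithmetic. The overhead values are my assumptions, taken from the SA shown above (3des-cbc with an 8-byte block and IV, hmac-sha1-96 with a 12-byte ICV) plus an IPv4 outer header without options, no NAT-T, and a 1500-byte link MTU. It also assumes the fragmentation happens after encryption. The function names are only for illustration.

```python
# Assumed overhead values for ESP tunnel mode with 3des-cbc / hmac-sha1-96.
OUTER_IP = 20     # outer IPv4 header, no options
ESP_HEADER = 8    # SPI + sequence number
IV = 8            # 3DES-CBC initialization vector
BLOCK = 8         # 3DES block size, padding aligns to this
ESP_TRAILER = 2   # pad length + next header
ICV = 12          # hmac-sha1-96 integrity check value
LINK_MTU = 1500   # assumed MTU of the Internet-facing link


def esp_packet_size(inner_len):
    """Size of the outer IPv4 packet carrying one inner packet."""
    # Inner packet plus trailer is padded up to a multiple of the block size.
    padded = -(-(inner_len + ESP_TRAILER) // BLOCK) * BLOCK
    return OUTER_IP + ESP_HEADER + IV + padded + ICV


def fragment_sizes(packet_len, mtu=LINK_MTU):
    """Sizes of the IPv4 fragments, each including its own 20-byte header."""
    if packet_len <= mtu:
        return [packet_len]
    payload = packet_len - OUTER_IP
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - OUTER_IP) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(OUTER_IP + chunk)
        payload -= chunk
    return sizes


outer = esp_packet_size(1500)          # a full-size inner packet
print(outer, fragment_sizes(outer))
```

Under these assumptions, a full 1500-byte inner packet no longer fits in a single outer packet, so it is split into one full-size fragment and one very small one, which matches what the capture shows.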
Change 2) Modifying TCP-MSS for IPsec-VPN
Another change I will make is to force TCP-MSS to a specific value, e.g. 1350 bytes, on the J41 and J23 devices (though I don't have to do this on J41 for this traffic pattern). This way, I ask the ubuntu3 device not to send any segment with a payload larger than 1350 bytes.
WARNING!!! Following change will flap your tunnel. At least, it flaps during my tests.
#set security flow tcp-mss ipsec-vpn mss 1350
Once this command is active, SRX replaces the TCP-MSS option exchanged during the three-way handshake with this value, so that the peer device lowers its segment size to make room for the extra overhead of encryption and the tunnel header.
root@ubuntu3-vm:~# wget http://212.45.63.2/test.iso
--2013-08-23 23:28:12-- http://212.45.63.2/test.iso
Connecting to 212.45.63.2:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1735393280 (1.6G) [application/x-iso9660-image]
Saving to: `test.iso.2'
4% [====> ] 72,131,898 5.33M/s eta 6m 34s
Now we can see that this modification also improved throughput compared to the default config. In the three-way handshake, we can also see the modified MSS (1350) from J23. As this capture is taken between J41 and ubuntu2, you see that ubuntu2's MSS is 1460, but it will be replaced by the SRX J41 before it is sent to J23.
I hope to have demonstrated why an IPsec tunnel shouldn't be left in its default config but should be adjusted for the highest performance. If you run an active network with IPsec tunnels, you can tweak these settings and find what works best for you.
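As a sanity check on the 1350-byte value, the following Python sketch estimates the largest MSS that still fits into one unfragmented ESP packet. The overhead numbers are assumptions for this tunnel (3des-cbc, hmac-sha1-96, IPv4 without options, no NAT-T, 1500-byte link MTU, 20-byte inner IP and TCP headers), and the function names are hypothetical, so treat the result as an estimate rather than a recommended setting.

```python
# Assumed sizes; adjust them for other ciphers, NAT-T or a different MTU.
OUTER_IP = 20     # outer IPv4 header
ESP_HEADER = 8    # SPI + sequence number
IV = 8            # 3DES-CBC initialization vector
BLOCK = 8         # 3DES block size
ESP_TRAILER = 2   # pad length + next header
ICV = 12          # hmac-sha1-96
INNER_IP = 20     # inner IPv4 header
TCP_HEADER = 20   # TCP header without options


def max_inner_packet(mtu=1500):
    """Largest inner packet whose ESP packet still fits into the MTU."""
    room = mtu - OUTER_IP - ESP_HEADER - IV - ICV
    # The encrypted part must be a whole number of cipher blocks.
    return room // BLOCK * BLOCK - ESP_TRAILER


def max_mss(mtu=1500):
    """Largest TCP MSS that avoids fragmentation of the ESP packet."""
    return max_inner_packet(mtu) - INNER_IP - TCP_HEADER


def outer_size_for_mss(mss):
    """Outer packet size produced by a full segment with the given MSS."""
    inner = mss + INNER_IP + TCP_HEADER
    padded = -(-(inner + ESP_TRAILER) // BLOCK) * BLOCK
    return OUTER_IP + ESP_HEADER + IV + padded + ICV


print(max_mss())                  # upper bound under these assumptions
print(outer_size_for_mss(1350))   # size on the wire with mss 1350
```

With these numbers, 1350 sits comfortably below the upper bound, which leaves headroom for things like NAT-T or an extra encapsulation on the path.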
Hi rtoodtoo,
I have read many of your posts and find them very useful and knowledgeable. They provide some very good insights about TCP and the SRX product.
Hope you continue sharing your knowledge, and all the very best for your JNCIE-SEC exam!
Thank you for this nice feedback Sky. I will try to do my best:)
Hi rtoodtoo,
Nicely explained with the PCAPs!
I was wondering if we can also use window scaling to adjust the throughput (speed) of a connection (IPsec or a normal TCP data transfer).
Please share your ideas.
Sky
Hi Sky,
I am not 100% sure, but window scaling won't make much difference in this scenario. To the best of my knowledge, on Linux for example, you can only turn scaling on or off; you don't really have much control over it. Besides, you don't really use all of the available window anyway. As an example, with window scaling enabled, my Linux host advertises a window of around 13-20K bytes, and in a 100Mbit/s network I can see that the sender pushes 5-6 TCP segments (i.e. 7-8K bytes max) before it gets an ACK. At least my tests show that a larger window size with the help of scaling doesn't make a big difference. As far as I can see, scaling shows its benefit in high-latency networks, but I haven't tested that kind of scenario.
If you have any tests or real-life experience, I would love to hear about them, of course.
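To put rough numbers on the latency point, here is a small Python sketch based on the usual rule that a sender can have at most one window of data in flight per round trip. The link speed and RTT values are made-up examples, not measurements from this lab, and the function names are only for illustration.

```python
MAX_UNSCALED_WINDOW = 65535  # largest window the 16-bit TCP field can carry


def max_throughput_bytes_per_sec(window_bytes, rtt_ms):
    """Upper bound on throughput: one full window per round trip."""
    return window_bytes * 1000 // rtt_ms


def bdp_bytes(link_bits_per_sec, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return link_bits_per_sec // 8 * rtt_ms // 1000


def scale_shift_needed(window_bytes):
    """Smallest window scale shift that can advertise the given window."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes:
        shift += 1
    return shift


# Example values (assumptions): 100Mbit/s link, 1 ms LAN RTT vs 100 ms WAN RTT.
for rtt_ms in (1, 100):
    limit = max_throughput_bytes_per_sec(MAX_UNSCALED_WINDOW, rtt_ms)
    needed = bdp_bytes(100_000_000, rtt_ms)
    print(rtt_ms, limit, needed, scale_shift_needed(needed))
```

On a low-latency LAN the unscaled window is already larger than what the link needs, while at 100 ms the link can only be filled with a scaled window. That fits the observation that scaling matters mainly on high-latency paths.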
Hi,
I need to reproduce a scenario where TCP fragmentation happens. What is the easiest way to achieve this?
regards,
Madhav
Hi
tcp-mss can be used only for TCP traffic. What about ICMP and UDP traffic? If I change the MTU under [edit interfaces st0.0 family inet ] on both peers, this will change the MTU size for any traffic traversing the tunnel interface.
By the way, the MTU for the st0 interface contains only the L4 and L3 headers. I don't understand why the IPsec overhead is not part of the calculation.
Regards
Red1
Red1, unfortunately, to the best of my knowledge there isn't any method for ICMP and UDP, as the protocols themselves don't provide a built-in payload adjustment like the MSS we have for TCP.
The best thing that can be done is to copy the DF bit and let the source reduce its packet size by relying on PMTU discovery.
Hello and thank you for your post!
“As this capture is taken between J41 and ubuntu3, you see ubuntu3’s MSS is 1460 but it will be replaced by SRXJ41 before it is sent to J23.”
Maybe here you meant that ubuntu2’s MSS is 1460?
Very good catch I must say. Thank you very much for the feedback indeed. I have read the post again and replaced both ubuntu3 by ubuntu2.
Genco.
Thank you for your comments. I came across the Juniper KB article you referenced today, and it was nice to see some extra details on how this has improved your VPN performance. In addition to performance, this was the fix for my tunnel not coming up properly after going down between two of my locations. Having the df-bit set to clear was causing phase 2 negotiation issues in my case. I wanted to blame our ISP for MTU issues in their network, but this ended up resolving the issue.
thanks for the feedback Jared.
Excellent Explanation
Setting DF-BIT to copy – Will this disconnect my IPsec tunnel? Should this be done on both SRX?
Good point. I have updated the post. In my tests, I saw that this change actually flaps the tunnel, so it is better to do it during a maintenance window, unless this behavior has changed in a newer release.
As for both SRX devices: fragmentation is expected on the side that sends the oversized payload. If both sites send this type of traffic, then you need it on both sites.