IPsec TCP-MSS, DF-BIT and Fragmentation

In my previous IPsec troubleshooting post, I didn't talk about how we approach performance issues. It is probably not a JNCIE-SEC topic, but it is a very important one for real networks.

[Topology diagram: tcp-mss-df-bit-topology2]

In this topology I will examine how the throughput between the two endpoints of an IPsec tunnel changes depending on the tunnel configuration.

Change 1) Setting DF-BIT to copy

An IPsec tunnel between J23 and J41 is established with no extra configuration. I initiate a large 1.6GB file download via HTTP.

It is only around 3MB/sec, which seems low. To see why, I took a packet capture on the Internet-facing interface of J41, i.e. capturing packets after they are encrypted. Look at this horrible picture:

[Screenshot: ipsec_fragmented — capture showing the encrypted traffic split into fragments]

Do you see the fragmented packets? The link is badly underutilized: one packet is filled up to the MTU while the second fragment of the original packet is less than 100 bytes, and on top of that there is the extra processing overhead of fragmentation and reassembly. So the question is: why are the packets fragmented at all? Don't we have PMTU discovery in place? The packets are indeed marked with the DF bit, but the SRX doesn't copy this flag to the outer header by default. Look:
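The DF-bit handling of a VPN shows up in the IPsec SA details on the SRX. A minimal sketch of how it can be checked (the prompt is illustrative and the output is trimmed to the relevant line):

    root@SRX> show security ipsec security-associations detail | match DF
      DF-bit: clear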

Do you see the "DF-bit: clear" in this output? Because of it, if a packet exceeds the tunnel MTU, the packet is fragmented and sent through the tunnel instead of an ICMP "fragmentation needed" message being sent back to the source. You can also take a look at KB25625 for more details. Now I will change this behavior and force the SRX to send "fragmentation needed" responses so that the websrv Linux device reduces its packet size for this destination.
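The change itself is a single knob under the VPN definition. A sketch of the configuration, assuming the VPN object is named my-ipsec-vpn (the name is a placeholder, not the one used in this lab):

    root@SRX# set security ipsec vpn my-ipsec-vpn df-bit copy
    root@SRX# commit

The other possible values for this knob are clear (the default, which is what we saw above) and set.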

Now SRX shouldn’t fragment the packets.

We have seen an improvement of around 40 percent. This figure can of course change depending on how long you keep the download running; I have seen 8-9MB/sec after waiting a bit longer.

Change 2) Modifying TCP-MSS for IPsec-VPN

Another change I will make is to force TCP-MSS to a specific value, e.g. 1350 bytes, on the J41 and J23 devices (though I don't have to do this on J41 for this traffic pattern). By this I will have asked the ubuntu2 device not to send any segment with a payload larger than 1350 bytes.
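The knob lives under the security flow hierarchy and only affects TCP sessions that traverse an IPsec VPN. A minimal sketch of the configuration used here:

    root@J23# set security flow tcp-mss ipsec-vpn mss 1350
    root@J23# commit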

Once this command is active, the SRX will replace the TCP-MSS option exchanged during the three-way handshake with this value, so that the peer device lowers its packet size to compensate for the extra overhead added by encryption and the tunnel header.

[Screenshot: tcp-mss-ipsec — three-way handshake capture]

Now we can see that this modification also improved the throughput compared to the default config. In the three-way handshake we can also see the modified MSS (1350) arriving from the J23 side. As this capture is taken between J41 and ubuntu2, you see ubuntu2's MSS is 1460, but it will be replaced by SRXJ41 before it is sent to J23.
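If you want to check the clamping yourself, watching the MSS option in the SYN and SYN-ACK packets on the client side is enough. A sketch with tcpdump, where the interface name and the HTTP port are assumptions about this lab:

    # print only SYN/SYN-ACK packets of the download; the MSS value appears in the TCP options
    tcpdump -nni eth0 'tcp port 80 and (tcp[tcpflags] & tcp-syn != 0)'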

I hope to have demonstrated why an IPsec tunnel configuration shouldn't be left at its defaults but should be tuned for the best performance. If you run a network with active IPsec tunnels, you can tweak these settings and find what works best for you.

12 thoughts on "IPsec TCP-MSS, DF-BIT and Fragmentation"

  1. Sky

    Hi rtoodtoo,

    I have read many of your posts and find them very useful and well informed; they provide some
    very good insights about TCP and the SRX product.

    Hope you continue sharing your knowledge 🙂 and
    all the very best for your JNCIE-Sec exam!

  2. Sky

    Hi rtoodtoo,

    Nicely explained with the PCAPS !

    I was wondering if we can also utilize window scaling to adjust the throughput (speed)
    of a connection (an IPsec tunnel or a normal TCP data transfer).

    Please share your ideas.

    Sky

    1. rtoodtoo Post author

      Hi Sky,
      I am not 100% sure, but window scaling won't make much difference in this scenario. To the best of my knowledge, if we take Linux as an example, you can only turn scaling on or off; you don't really have much control beyond that. Besides, you don't really use all of the available window size anyway. To give an example: with window scaling enabled, my Linux OS advertises a window size of around 13-20K bytes, and in a 100Mbit/s network I can see that the sender pushes 5-6 TCP segments (i.e. 7-8K bytes at most) before it gets an ACK. At least in my tests, having a larger window size with the help of scaling doesn't make a big difference. Scaling shows its benefit in high-latency networks as far as I can see, but I haven't tested that kind of scenario.
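      For reference, on a Linux host the scaling knob is just an on/off sysctl; a minimal sketch:

          # check whether window scaling is enabled (1 = on, 0 = off)
          sysctl net.ipv4.tcp_window_scaling
          # toggle it
          sysctl -w net.ipv4.tcp_window_scaling=0
          sysctl -w net.ipv4.tcp_window_scaling=1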
      If you have any tests or real-life experience, I would love to hear about them of course.

  3. Madhav

    Hi,
    I need to reproduce a scenario where TCP fragmentation happens. What is the easiest way to achieve this?

    regards,
    Madhav

  4. red1

    Hi

    tcp-mss can be used only for TCP traffic, so what about ICMP and UDP traffic? If I change the MTU under [edit interfaces st0.0 family inet] on both peers, it will change the MTU for any traffic traversing the tunnel interface.

    By the way, the MTU of the st0 interface covers only the L3 and L4 headers; I don't understand why the IPsec overhead is not part of the calculation.

    Regards
    Red1

    1. rtoodtoo Post author

      Red1, unfortunately, to the best of my knowledge there isn't any equivalent method for ICMP and UDP, as those protocols don't provide a built-in payload-adjustment mechanism like MSS in TCP.
      The best thing that can be done is to copy the DF bit and let the source reduce its packet size by relying on PMTU discovery.

  5. xxx

    Hello and thank you for your post!

    “As this capture is taken between J41 and ubuntu3, you see ubuntu3’s MSS is 1460 but it will be replaced by SRXJ41 before it is sent to J23.”

    Maybe here you meant that ubuntu2’s MSS is 1460?

    1. rtoodtoo Post author

      Very good catch, I must say. Thank you very much for the feedback. I have read the post again and replaced both occurrences of ubuntu3 with ubuntu2.

      Genco.

  6. Jared

    Thank you for your comments. I came across the Juniper KB article you referenced today, and it was nice to see some extra details on how this has improved your VPN performance. Besides performance, this was also the fix for my tunnel not coming back up properly after going down between two of my locations. Having the df-bit set to clear was, in my case, causing phase 2 negotiation issues. I wanted to blame our ISP for MTU settings in their network, but this ended up resolving the issue.


Do you have any feedback?