Traceroute behaviour in MPLS

Traceroute is a great tool for discovering the path a packet takes in the outgoing direction. However, if there is an MPLS cloud on the path, you may see some unexpected behaviour unless you make a few tweaks. First, let’s see how traceroute discovers a path when there is no MPLS cloud.

[Figure: topology for the pure IP traceroute test, GW2 -> GW1 -> Debian1 (traceroute-ttl)]

The network above uses plain IP routing, and we are running traceroute on the GW2 device towards the Debian1 device.

root@GW2> traceroute no-resolve 144.2.3.2    
traceroute to 144.2.3.2 (144.2.3.2), 30 hops max, 40 byte packets
 1  212.6.1.1  19.899 ms  20.021 ms  19.920 ms
 2  144.2.3.2  19.890 ms  20.093 ms  19.964 ms

We can clearly see the two hops in our traceroute. The IP addresses displayed in the output belong to the interfaces on which our probe packets arrived (the ingress interfaces). For this traceroute I also took a packet capture on the ingress interface of GW1, i.e. the 212.6.1.1 side.

By default, both Junos and Linux traceroute send their probes as UDP segments, and each hop receives three of them.

[Figure: packet capture of the UDP probes (UDP-traceroute)]

If we look at the first UDP segment, it is sent from source port 34498 to destination port 33434 with IP TTL=1. GW1 decrements the TTL to 0 and therefore can’t forward the packet; instead, it sends an ICMP "Time to live exceeded" message back to the original source, i.e. 212.6.1.2. Traceroute then displays 19.899 ms, which is the round-trip time (RTT): the time between sending the probe and receiving the ICMP Time Exceeded message. The source host does this three times per hop, using different source and destination ports for each probe. The returned ICMP packet also contains the IP/UDP header of the incoming probe, so that the source device can correlate each response with its request.
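That correlation step can be sketched in Python. This is a minimal sketch, assuming the classic RFC 792 layout of an ICMP error (an 8-byte ICMP header followed by the original IP header and at least the first 8 bytes of the original datagram). The helper names and the sample packet are hypothetical, checksums are neither computed nor validated, and the addresses and ports are the ones from the capture described above.

```python
import socket
import struct
from typing import Dict, NamedTuple, Tuple

class QuotedProbe(NamedTuple):
    icmp_type: int   # 11 = Time Exceeded, 3 = Destination Unreachable
    icmp_code: int
    orig_dst: str    # destination IP of the original probe
    orig_sport: int  # UDP source port of the original probe
    orig_dport: int  # UDP destination port of the original probe

def parse_icmp_error(icmp: bytes) -> QuotedProbe:
    """Extract the quoted probe from an ICMP error (outer IP header already stripped)."""
    icmp_type, icmp_code = icmp[0], icmp[1]
    inner = icmp[8:]                         # quoted original IP header starts here
    ihl = (inner[0] & 0x0F) * 4              # inner IP header length in bytes
    orig_dst = socket.inet_ntoa(inner[16:20])
    sport, dport = struct.unpack("!HH", inner[ihl:ihl + 4])
    return QuotedProbe(icmp_type, icmp_code, orig_dst, sport, dport)

def match_reply(pending: Dict[int, Tuple[int, float]], reply: QuotedProbe,
                recv_time: float) -> Tuple[int, float]:
    """Find the probe by its quoted destination port; return (ttl, rtt_ms).

    `pending` maps dst_port -> (ttl, send time in seconds).
    """
    ttl, sent_at = pending.pop(reply.orig_dport)
    return ttl, (recv_time - sent_at) * 1000.0

def build_time_exceeded(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Build a hypothetical Time Exceeded message quoting a UDP probe (checksums = 0)."""
    udp = struct.pack("!HHHH", sport, dport, 8, 0)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 1, 17, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    return struct.pack("!BBHI", 11, 0, 0, 0) + ip + udp

msg = build_time_exceeded("212.6.1.2", "144.2.3.2", 34498, 33434)
reply = parse_icmp_error(msg)
print(reply)
print(match_reply({33434: (1, 0.0)}, reply, 0.019899))
```

Keying the outstanding probes by destination port works here because every probe uses a different port, so the quoted UDP header alone identifies which probe, and therefore which TTL, a reply belongs to.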

The source host then increments the TTL by 1 and sends three more UDP segments, with IP TTL=2, to the same destination. Because the TTL is 2 this time, GW1 decrements it to 1 and forwards the packet to Debian1.
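The port numbers in these captures (33434 for the very first probe, 33437 for a TTL=2 probe) fit a simple scheme in which the destination port starts at 33434 and is incremented for every probe sent. Here is a minimal sketch of that probe schedule, assuming three probes per hop; `probe_schedule` is a hypothetical name, and real implementations may pick ports differently.

```python
from typing import Iterator, Tuple

def probe_schedule(max_hops: int = 30, probes_per_hop: int = 3,
                   base_port: int = 33434) -> Iterator[Tuple[int, int, int]]:
    """Yield (ttl, probe_index, dst_port) in the order the probes are sent.

    Assumption: the destination port starts at base_port and grows by one
    for every probe, across all hops.
    """
    seq = 0
    for ttl in range(1, max_hops + 1):
        for probe in range(probes_per_hop):
            yield ttl, probe, base_port + seq
            seq += 1

for ttl, probe, port in probe_schedule(max_hops=2):
    print(f"TTL={ttl} probe={probe} dst_port={port}")
```

With three probes per hop, the TTL=1 probes use ports 33434 to 33436 and the TTL=2 probes use 33437 to 33439, which is consistent with port 33437 showing up on Debian1.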

[Figure: packet capture of the TTL=2 probes (traceroute-ttl-2)]

As you can see, the TTL is 2 and the source/destination IPs are the same, but this time the response is a bit different. Instead of an ICMP Time Exceeded message, we receive a "Destination port unreachable" message. This is because the packet has arrived at its final destination and must also be processed by the transport layer. Since no socket is listening on 144.2.3.2:33437, the destination device returns this error. The source host receives this message and concludes that the probe has reached the final destination.
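The decision logic is simple: Time Exceeded means "intermediate hop", Port Unreachable means "destination reached, stop". Below is a simplified simulation in Python (no real packets, one probe per TTL, no timeouts; all names are hypothetical). Fed the GW2 -> GW1 -> Debian1 path, it reproduces the two hops of the GW2 traceroute.

```python
from typing import List, Tuple

TIME_EXCEEDED = 11     # ICMP type 11: TTL exceeded in transit
DEST_UNREACHABLE = 3   # ICMP type 3 (code 3 = port unreachable)

def send_probe(path: List[str], dest: str, ttl: int) -> Tuple[str, int]:
    """Simulate one probe and return (responder address, ICMP type of the reply).

    `path` lists the routers between source and destination, by the address
    of the interface the probe arrives on. Each router decrements the TTL;
    the one that brings it to 0 answers with Time Exceeded.
    """
    for router in path:
        ttl -= 1
        if ttl == 0:
            return router, TIME_EXCEEDED
    # Probe delivered; no socket listens on the high UDP port -> port unreachable
    return dest, DEST_UNREACHABLE

def traceroute(path: List[str], dest: str, max_hops: int = 30) -> List[Tuple[int, str]]:
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, icmp_type = send_probe(path, dest, ttl)
        hops.append((ttl, responder))
        if icmp_type == DEST_UNREACHABLE:   # destination reached: stop probing
            break
    return hops

print(traceroute(["212.6.1.1"], "144.2.3.2"))   # [(1, '212.6.1.1'), (2, '144.2.3.2')]
```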

Everything so far happened in a pure IP network. Now let’s see what happens when there is an MPLS backbone in the path.

[Figure: MPLS test topology (mpls-cloud-traceroute-test)]

In this setup, we will run traceroute from the source device HostE (60.60.60.2) towards Debian1 (144.2.3.2). This flat hexagon :) network is a single AS and all routers are MPLS-aware. Packets enter the cloud via J35, follow J34->J30->J29 and exit at J29. J34 and J30 are BGP-free, i.e. they forward packets only by MPLS label. Now let’s run the traceroute.

root@hostE:~# traceroute -n 144.2.3.2
traceroute to 144.2.3.2 (144.2.3.2), 30 hops max, 60 byte packets
 1  60.60.60.1  2.395 ms  2.417 ms  2.410 ms
 2  31.1.1.1  7.200 ms  7.211 ms  7.201 ms
 3  * * *      <<< This is J34
 4  * * *      <<< This is J30
 5  192.168.196.1  46.993 ms  46.999 ms  46.991 ms
 6  87.1.1.1  46.982 ms  49.451 ms  49.449 ms
 7  144.2.3.2  64.412 ms  59.686 ms  59.675 ms

Something is wrong: we don't get any response from the two P routers. Let's check how J34 would route traffic back to 60.60.60.2:

root@ff34> show route 60.60.60.2 

root@ff34>

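Indeed, there is no route. The probes with TTL 3 and 4 do expire on J34 and J30 (by default, Junos copies the IP TTL into the MPLS label at the ingress router), but J34 has no route for sending the resulting ICMP Time Exceeded message back to the source, so the message never arrives. The same applies to J30. Fortunately, there is a workaround called ICMP tunneling; once we enable it on both devices, we should receive responses.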
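Why the replies vanish can be sketched as a longest-prefix-match lookup on the P router. The routing table below is hypothetical: the addresses come from this article, but the prefix lengths and next hops are assumptions. The point is that a BGP-free router only knows internal prefixes, so the lookup for 60.60.60.2 fails.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical routing table of a BGP-free P router such as J34: only IGP
# prefixes (loopbacks and core links), nothing towards external hosts.
J34_ROUTES: Dict[str, str] = {
    "10.1.1.2/32": "via J30",             # J29 loopback (LSP J35-J29 endpoint)
    "10.1.1.7/32": "via J35",             # J35 loopback (LSP J29-J35 endpoint)
    "172.40.1.0/24": "directly connected",
    "192.168.198.0/24": "directly connected",
}

def lookup(routes: Dict[str, str], dst: str) -> Optional[str]:
    """Longest-prefix match: return the next hop for dst, or None if there is no route."""
    addr = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop

def handle_time_exceeded(routes: Dict[str, str], probe_source: str,
                         icmp_tunneling: bool) -> str:
    """What the P router does with the ICMP Time Exceeded it has just generated."""
    if icmp_tunneling:
        return "forwarded along the LSP"      # no IP route to the source needed
    next_hop = lookup(routes, probe_source)
    if next_hop is None:
        return "dropped: no route to source"  # HostE prints '* * *' for this hop
    return f"sent {next_hop}"

print(handle_time_exceeded(J34_ROUTES, "60.60.60.2", icmp_tunneling=False))
```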

root@ff34# set protocols mpls icmp-tunneling 
root@ff30# set protocols mpls icmp-tunneling

Now let's try the traceroute once again:

root@hostE:~# traceroute -e -n 144.2.3.2
traceroute to 144.2.3.2 (144.2.3.2), 30 hops max, 60 byte packets
 1  60.60.60.1  6.408 ms  6.361 ms  6.377 ms
 2  31.1.1.1  11.371 ms  11.358 ms  11.336 ms
 3  172.40.1.1 ;  56.383 ms  56.378 ms  56.364 ms
 4  192.168.198.1 ;  56.330 ms  56.317 ms  56.295 ms
 5  192.168.196.1  56.275 ms  56.252 ms  56.230 ms
 6  87.1.1.1  56.180 ms  68.306 ms  68.283 ms
 7  144.2.3.2  68.258 ms  63.125 ms  63.091 ms

Hey, it works! But how?

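Note: This time I ran traceroute with the "-e" option to display the MPLS label information on the host, because the label stack is carried in an ICMP multi-part extension (RFC 4884; the MPLS label stack object itself is defined in RFC 4950).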
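For reference, here is a minimal Python decoder for the MPLS label stack object defined in RFC 4950 (Class-Num 1, C-Type 1), where each 4-byte entry uses the RFC 3032 layout: 20-bit label, 3-bit EXP, bottom-of-stack bit, 8-bit TTL. It assumes the object has already been located inside the RFC 4884 extension structure. The sample entry with label 299856 (a label that appears in J34's LSP output in this article) is made up for illustration, and the printed format only imitates the MPLS:L=...,E=...,S=...,T=... style of Linux traceroute.

```python
import struct
from typing import List, NamedTuple

class LabelStackEntry(NamedTuple):
    label: int  # 20-bit MPLS label
    exp: int    # 3-bit EXP / Traffic Class field
    s: int      # bottom-of-stack flag
    ttl: int    # 8-bit label TTL

def decode_mpls_object(obj: bytes) -> List[LabelStackEntry]:
    """Decode an RFC 4950 MPLS Label Stack object (Class-Num 1, C-Type 1).

    `obj` must start at the 4-byte object header (length incl. header,
    Class-Num, C-Type); the RFC 4884 extension header is already skipped.
    """
    length, class_num, c_type = struct.unpack("!HBB", obj[:4])
    if (class_num, c_type) != (1, 1):
        raise ValueError("not an incoming MPLS label stack object")
    entries = []
    for off in range(4, length, 4):
        (word,) = struct.unpack("!I", obj[off:off + 4])
        entries.append(LabelStackEntry(label=word >> 12, exp=(word >> 9) & 0x7,
                                       s=(word >> 8) & 0x1, ttl=word & 0xFF))
    return entries

def encode_entry(label: int, exp: int, s: int, ttl: int) -> bytes:
    """Pack one 32-bit label stack entry (used here only to build sample input)."""
    return struct.pack("!I", (label << 12) | (exp << 9) | (s << 8) | ttl)

# Hypothetical sample: a one-entry stack carrying label 299856.
sample = struct.pack("!HBB", 8, 1, 1) + encode_entry(299856, 0, 1, 1)
for e in decode_mpls_object(sample):
    print(f"MPLS:L={e.label},E={e.exp},S={e.s},T={e.ttl}")  # MPLS:L=299856,E=0,S=1,T=1
```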
[Figure: MPLS label stack carried in the ICMP extension (mpls-icmp-stack)]

Oops, let's go back to the main question: how does ICMP tunneling work?
To see how it works, I ran traceroute once again and took a packet capture between J34 and J30; the reason for picking that link will become clear shortly. First, let's look at the transit LSPs on J34:

root@ff34> show mpls lsp transit 
Transit LSP: 2 sessions
To              From            State   Rt Style Labelin Labelout LSPname 
10.1.1.2        10.1.1.7        Up       0  1 FF  299856   299856 J35-J29
10.1.1.7        10.1.1.2        Up       0  1 FF  299792        3 J29-J35
Total 2 displayed, Up 2, Down 0

The output above indicates that if J34 wanted to route traffic back to J35 (i.e. where the IP packet came from), it would use the LSP J29-J35. Its outgoing label on that LSP is 3, the implicit-null label, so it would simply pop the label and send the packet back to J35. However, this isn't what happens here. The packet capture shows the following:

[Figure: packet capture between J34 and J30 showing the tunneled ICMP message (mpls-icmp-tunneling-return)]

J34 labels the ICMP message with 299856, its outgoing label for the LSP J35-J29, and sends it onwards, i.e. towards the end of the LSP, instead of back towards J35. In other words, once ICMP tunneling is enabled, the MPLS device J34 sends the ICMP Time Exceeded message inside an MPLS frame along the same LSP the probe was travelling on, so the message first travels all the way to the end of the LSP, J29. J29 then routes it back to the source, because it has the IP routing table. This is pretty interesting: the packet traverses the same device twice :)

I hope ICMP tunneling is clear now, but we aren't done yet. There is one more thing I would like to show: the no-decrement-ttl feature, which I also found quite intriguing. First, enable no-decrement-ttl on the ingress router:

[edit]                                  
root@ff35# set protocols mpls no-decrement-ttl

Then run traceroute once more, this time with -q 1 (one probe per hop):

root@hostE:~# traceroute -e -n 144.2.3.2 -q 1
traceroute to 144.2.3.2 (144.2.3.2), 30 hops max, 60 byte packets
 1  60.60.60.1  2.364 ms
 2  31.1.1.1  6.909 ms
 3  192.168.196.1  31.868 ms
 4  87.1.1.1  31.855 ms
 5  144.2.3.2  31.847 ms

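Hmm, the pure MPLS routers J34 and J30 have disappeared from the list as if they didn't exist. With no-decrement-ttl, the ingress router J35 no longer propagates the IP TTL into the LSP, so the probes can no longer expire on J34 or J30, and the whole LSP appears as a single hop. Isn't that cool?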
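The disappearing hops follow from how the TTL is handled at the edges of the LSP. Below is a simplified Python model, built on several assumptions. Every router consumes exactly one TTL and there is no penultimate-hop popping. The P routers' replies can reach HostE (ICMP tunneling on); without tunneling, hops 3 and 4 of the first case would print "* * *". The no-decrement-ttl behaviour is approximated as a label TTL that starts at 255 while the IP TTL is left alone inside the LSP. The role names and functions are hypothetical. Under these assumptions, the model reproduces the seven hops of the ICMP tunneling traceroute and the five hops above.

```python
from typing import List, Optional, Tuple

# Each hop: (address traceroute reports, role). Roles (hypothetical labels):
# "ip" = plain IP router, "ingress"/"egress" = LSP edge, "transit" = P router.
Hop = Tuple[str, str]

def expires_at(path: List[Hop], ttl: int, propagate_ttl: bool) -> Optional[str]:
    """Address of the hop where a probe sent with IP TTL `ttl` expires, or None.

    propagate_ttl=True: the ingress copies the IP TTL into the label and the
    egress copies the smaller value back. propagate_ttl=False: the label TTL
    starts at 255 and the IP TTL is untouched inside the LSP.
    """
    ip_ttl = ttl
    label_ttl = 0
    for addr, role in path:
        if role == "transit":
            label_ttl -= 1                     # P routers only look at the label TTL
            if label_ttl == 0:
                return addr
            continue
        if role == "egress" and propagate_ttl:
            ip_ttl = min(ip_ttl, label_ttl)    # label TTL copied back into the IP header
        ip_ttl -= 1
        if ip_ttl == 0:
            return addr
        if role == "ingress":
            label_ttl = ip_ttl if propagate_ttl else 255   # push the label
    return None

def visible_hops(path: List[Hop], dest: str, propagate_ttl: bool,
                 max_hops: int = 30) -> List[str]:
    """Responders for TTL 1, 2, ... until the probe reaches `dest`."""
    hops = []
    for ttl in range(1, max_hops + 1):
        addr = expires_at(path, ttl, propagate_ttl)
        hops.append(addr if addr is not None else dest)
        if addr is None:
            break
    return hops

# HostE -> Debian1 path from the article, with J35, J34, J30, J29 in the middle.
PATH: List[Hop] = [("60.60.60.1", "ip"), ("31.1.1.1", "ingress"),
                   ("172.40.1.1", "transit"), ("192.168.198.1", "transit"),
                   ("192.168.196.1", "egress"), ("87.1.1.1", "ip")]

print(visible_hops(PATH, "144.2.3.2", propagate_ttl=True))   # 7 hops, J34/J30 included
print(visible_hops(PATH, "144.2.3.2", propagate_ttl=False))  # 5 hops, J34/J30 hidden
```

In the second case, the probe with TTL 3 leaves J35 with IP TTL 1, crosses J34 and J30 on a label TTL that cannot reach 0, and expires only when J29 processes the IP header. That is why 192.168.196.1 follows 31.1.1.1 directly.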

About: rtoodtoo

Worked for more than 10 years as a Network/Support Engineer and also interested in Python, Linux, Security and SD-WAN // JNCIE-SEC #223 / RHCE / PCNSE


5 thoughts on “Traceroute behaviour in MPLS”

  1. Very cool. I’ve come across this while troubleshooting a traceroute that finished but had missing hops, and I had come to the same conclusion. Nice to see you can either hide the hops or make them show up properly. Found your blog while researching Firefly clustering. Great job!

    1. Thanks for the feedback. MPLS is fun to work with actually and sometimes with little surprises:)

  2. Hi,
    thanks for the post.
    I came here while searching for
    how MPLS-aware routers reply to traceroute,
    because we are using pipe mode in our network, and VPN customers can never see the MPLS routers’ IPs.
    Here is my question:
    why, in the topology in your example, does the interface (192.168.196.1) between the P and PE router reply to the traceroute?
    In which implementation of MPLS does this behaviour happen?

    1. Hi Erdem,
      I am not an MPLS expert, but I believe activating no-decrement-ttl puts the network in the pipe model. As for your question of why we see the
      MPLS router’s IP, this is what I have always seen so far. I can see the ingress and egress routers’ IPs, and bear in mind this is only a lab
      setup. So are you saying that, assuming this setup is like yours, you don’t see the ingress/egress routers (31.1.1.1 and 192.168.196.1) in the traceroute?
      I need to check this in my lab during my next study session, as I am interested too.

      1. Hi “rtoodtoo”,
        I’m not sure whether your lab setup is like our topology or not.
        Let me briefly explain our topology:
        it consists of Alcatel 7750 SR routers.
        Alcatel uses pipe mode directly;
        there is no way to implement uniform mode on the Alcatel SR.
        We distribute L3VPN routes via MP-BGPv4.
        So from the customers’ point of view, they never see PE-to-P and P-to-P interface IPs, system loopback IPs, etc.
        They can only see the IPs of their CE routers’ VRF gateways on the PE router.

        Trace from a customer interface connected to PE-A to a customer interface connected to PE-B:

        ################traceroute router 65494 172.25.22.1
        traceroute to 172.25.22.1, 30 hops max, 40 byte packets
        1 172.25.22.1 (172.25.22.1) 20.1 ms 20.0 ms 20.0 ms

        There is only one hop, which is PE-B’s interface towards the customer.

        I hope I have explained the case.
        Thanks for the reply.


Discover more from RtoDto.net
