Category Archives: clustering

Hostname config in Chassis cluster

This is a small post to inform followers of this blog about a common mistake made in SRX cluster configuration, something I have been meaning to write about for a while. A cluster has two different configuration stanzas:

  • Node specific
  • Global

If you want to set something that is specific to only one node, e.g. node0, you configure it at the groups level.
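For example, to set a host name only on node0, the configuration looks roughly like this (the host name is illustrative; apply-groups "${node}" is what makes each node inherit only its own group):

set groups node0 system host-name SRX-FW-node0
set apply-groups "${node}"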

The groups command above means "set this host name only on node0, i.e. not on both nodes." Here comes the mistake:

When you have an SRX chassis cluster and you set the following host-name command under the global config;
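Something along these lines, SRX-FW being the host name used in this example:

set system host-name SRX-FW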

then it overrides your node-specific configuration. This not only causes the same host name SRX-FW to be displayed on the CLI prompt of both devices, but the syslog messages written on each box also carry the same name. Why is this so important?

It is important because it dramatically hinders troubleshooting. So the bottom line is: if you have a chassis cluster, no host-name should be configured under the global config. At least in my humble opinion :)

Firefly Perimeter Cluster (VSRX) Setup on VMware ESX

As you might know, Firefly Perimeter, aka VSRX, is the virtual firewall that runs on VMware ESX and KVM, and an evaluation copy can be downloaded here. I believe it is great for testing most functionality, for example an SRX cluster (high availability), which is the topic of this post. Here I will show:

  • How you can install two firefly instances and operate them in a cluster.
  • How you can set up redundancy groups

Installation of Firefly instances
First, download your evaluation copy. You must have a Juniper Networks account to download one. At the time of this writing, the current version is 12.1X46-D10.2. Once you have downloaded the OVA file to your disk, deploy it to your ESX server via File->Deploy OVF Template.

Give a name, e.g. firefly00, for the first instance. Continue and accept whatever the wizard suggests, then repeat the deployment for the second instance. Now you should have two firefly instances ready to be configured.


SRX cluster ip-monitoring

In an SRX chassis cluster setup, in addition to interface monitoring you can also use
IP monitoring to monitor the health of your upstream path.

[Topology diagram: SRX chassis cluster IP monitoring]

I have a simple topology to explain how IP monitoring works. In this setup, node0 and node1 are part of an SRX chassis cluster, and the reth0.0 interface is part of redundancy group 1 (RG1). Currently node0 is the primary for RG1, which you can verify with "show chassis cluster status".

Now let's configure IP monitoring to detect failures at the network layer.
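A minimal sketch of what such a configuration looks like; the monitored address, secondary address, weights and timers below are illustrative, not taken from the original setup:

set chassis cluster redundancy-group 1 ip-monitoring global-weight 255
set chassis cluster redundancy-group 1 ip-monitoring global-threshold 100
set chassis cluster redundancy-group 1 ip-monitoring retry-interval 3
set chassis cluster redundancy-group 1 ip-monitoring retry-count 5
set chassis cluster redundancy-group 1 ip-monitoring family inet 10.10.10.1 weight 100
set chassis cluster redundancy-group 1 ip-monitoring family inet 10.10.10.1 interface reth0.0 secondary-ip-address 10.10.10.101

If the monitored IP 10.10.10.1 becomes unreachable, its weight counts against the global threshold, and once the threshold is exhausted RG1 fails over.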


error: the routing subsystem is not running

If you haven't seen this error message, you will see it one day when you are dealing with SRX chassis clusters. It may baffle you to have a firewall on which you can't display routes. This is because a chassis cluster treats the two nodes as a single data plane, and routing is handled on the primary node, or let's say the node that holds the active routing engine.

First of all, this error message is by design, so don't panic! The question is: how can the secondary node reach a network that it needs to reach? This is where the "backup-router" configuration comes into play. With this statement the secondary node can reach the network 192.168.103.0/24 via the gateway 10.200.200.3. Can we forward all network ranges to this gateway? We can, but according to KB http://kb.juniper.net/KB15580 this is not recommended. Here is the groups configuration from my SRX cluster:
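A minimal sketch, assuming node1 is the secondary node and using the gateway and destination prefix mentioned above:

set groups node1 system backup-router 10.200.200.3 destination 192.168.103.0/24
set apply-groups "${node}"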

Chassis cluster file operations

There are a couple of handy commands which you can use if you have a JSRP cluster.

For example, the following two commands can be used for copying a file or a directory, e.g. from node0 to node1:
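A sketch from the shell (run "start shell" from the CLI first); the file and directory names are illustrative:

rcp /var/tmp/test.log node1:/var/tmp/
rcp -r /var/tmp/logs node1:/var/tmp/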

If you want to copy a directory, you should also include the "-r" option in rcp, as in the second line above.

The other command which is also good to know is;
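Most likely the following, which logs you into the other node over the internal connection:

request routing-engine login node 1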

This switches you from node0 to node1 in the terminal.

SRX cluster

You can find step-by-step instructions below to set up an SRX firewall chassis cluster on different branch models. Before starting your cluster config, please make sure you have installed the JTAC-recommended release, which you can find at http://kb.juniper.net/KB21476.

Please note that the instructions below cover several branch models, each of which has a slightly different configuration. Pick the one you have. You can also use the HA configuration tool developed by Juniper, available here, for easier configuration.

1) On branch SRX devices (but only 1xx and 2xx models), ethernet switching must be disabled before enabling the cluster.
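A sketch, taking ge-0/0/0 as the example interface (repeat for every interface that has family ethernet-switching configured):

delete interfaces ge-0/0/0 unit 0 family ethernet-switching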

*** Ethernet switching must be disabled on the other interfaces as well, not only on ge-0/0/0.0, which is just an example.

These changes aren't sufficient. Delete the control link and management port configurations as well. For example,

in an SRX210 cluster;

To remove management interface
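On the SRX210 the fe-0/0/6 port becomes fxp0, so:

delete interfaces fe-0/0/6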

To remove control link interface
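The SRX210 control port is fe-0/0/7, so presumably:

delete interfaces fe-0/0/7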

in an SRX650 cluster
management (fxp0)
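Assuming the usual SRX650 port assignment, where ge-0/0/0 becomes fxp0:

delete interfaces ge-0/0/0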

control link
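And for the SRX650 control port, ge-0/0/1:

delete interfaces ge-0/0/1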

If you don't delete these interfaces, you will receive a warning during boot or at commit time.

2) Next, enable clustering by assigning a cluster ID and node ID to each device and rebooting, as sketched below.
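The standard operational-mode commands are the following; cluster-id 1 is just an example, any ID works as long as it matches on both nodes:

on node 0
set chassis cluster cluster-id 1 node 0 reboot

on node 1
set chassis cluster cluster-id 1 node 1 reboot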

The nodes will be rebooted; the cluster may not come up if there is a configuration error.

3) After the systems have booted, check the state with "show chassis cluster status"; you should see both nodes, one as primary and one as secondary.

If this is the case, configure the following management interface (fxp0) only on the primary node, as the config will be pushed to the secondary automatically.

Set up host names and management IP addresses as follows.
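A sketch; the host names and addresses are placeholders:

set groups node0 system host-name SRX-A
set groups node0 interfaces fxp0 unit 0 family inet address 10.0.0.1/24
set groups node1 system host-name SRX-B
set groups node1 interfaces fxp0 unit 0 family inet address 10.0.0.2/24
set apply-groups "${node}"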

fxp0 is the new management interface name in the cluster environment, and one dedicated port is assigned on each branch device. For example, in an SRX210 cluster, the fe-0/0/6 interface of each node must be used as the management interface. For the other branch devices, look at TABLE1.

The configuration will look like this;
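This is simply the rendered view of the sketch above (node1 is analogous):

groups {
    node0 {
        system {
            host-name SRX-A;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 10.0.0.1/24;
                    }
                }
            }
        }
    }
}
apply-groups "${node}";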

4) Configure the fabric links (data plane): the fabric interface is a dedicated interface on each node, and on branch SRX devices you pick any available port. It is used to sync RTOs (Real-Time Objects), e.g. sessions, and it can also pass traffic.

One thing to mention: if we take the SRX240 as an example, ge-5/0/4 is in fact the ge-0/0/4 interface of node1. Don't think that it is a mistake; look at TABLE2 to see why the numbering changes.

SRX240
First make sure there is no logical unit on the fabric interface.
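A sketch using ge-0/0/4 on each node as the fabric port (ge-5/0/4 being node1's ge-0/0/4):

delete interfaces ge-0/0/4 unit 0
delete interfaces ge-5/0/4 unit 0
set interfaces fab0 fabric-options member-interfaces ge-0/0/4
set interfaces fab1 fabric-options member-interfaces ge-5/0/4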

You have to delete the logical unit first; otherwise the commit fails with the "Logical unit is not allowed on fabric member" error shown in the TIPS section below.

Once committed, the fabric link modifications should be propagated to node1 automatically if the cluster is up.

SRX210 (only node1’s fabric interface starts with fe-2)
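A sketch, assuming fe-0/0/5 is chosen on each node (node1's ports are renumbered as fe-2/0/x on the SRX210):

set interfaces fab0 fabric-options member-interfaces fe-0/0/5
set interfaces fab1 fabric-options member-interfaces fe-2/0/5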

SRX650 (if I choose ge-0/0/2 on both nodes as fabric links)

Here is how the configuration looks for the SRX650;
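A sketch, assuming node1's interfaces are renumbered starting at FPC 9 on the SRX650, so its ge-0/0/2 shows up as ge-9/0/2:

set interfaces fab0 fabric-options member-interfaces ge-0/0/2
set interfaces fab1 fabric-options member-interfaces ge-9/0/2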

Check the status;
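The usual verification commands (output omitted here):

show chassis cluster status
show chassis cluster interfaces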

[REDUNDANCY GROUPS]
Assume we have two uplinks connected to two SRX210 devices. Node0 is the primary and node1 is the secondary.


The topology above is deliberately simplistic, as it is only meant to show how a redundancy group works.

Below is the configuration, in which there are two redundancy groups. RG0 is for the control plane, for which preempt is not available. In RG1, node0 has the higher priority and is the primary. The ge-0/0/0 interface is actively monitored with a weight of 255, which means that if it fails, its weight is subtracted from the redundancy group threshold of 255, the result is 0, and RG1 fails over.

Redundancy Group Config 
reth-count defines how many reth interfaces we have.
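A sketch of the redundancy group configuration described above; the priorities and the reth-count value are illustrative:

set chassis cluster reth-count 1
set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 1
set chassis cluster redundancy-group 1 node 0 priority 100
set chassis cluster redundancy-group 1 node 1 priority 1
set chassis cluster redundancy-group 1 preempt
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/0 weight 255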

Redundant Ethernet Config
According to this config, the ge-0/0/1 and ge-2/0/1 (in fact ge-0/0/1 of node1) interfaces form the reth0 interface. As RG1 also actively monitors ge-0/0/0, if that link fails, node1 will take over RG1.
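A sketch of the redundant Ethernet configuration; the reth0 address is illustrative:

set interfaces ge-0/0/1 gigether-options redundant-parent reth0
set interfaces ge-2/0/1 gigether-options redundant-parent reth0
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 192.168.1.1/24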

Cluster Status and Failover
In the "show chassis cluster status" output we can see that node0 is the primary for RG1 and that preempt is enabled.

Once ge-0/0/0 fails, the status output changes.

As can be seen, the priority of node0 is set to zero once the monitored link fails. Because preempt is ON, when the ge-0/0/0 link comes back online, RG1 fails over to node0 again (note that the failover count is incremented).

 THINGS TO CONSIDER

In SRX240 models:

[TIPS]

a) For the control plane link, use ge-0/0/1 on both nodes. You can connect these two interfaces directly to each other.

b) For the fabric link, you can use any interface on either node, but pay attention to the interface numbering in a chassis cluster. ge-5/0/4 is in fact interface ge-0/0/4 of node1; this is because, after clustering is enabled, all of node1's interfaces start with ge-5/0/ on the SRX240.

c) Don't leave any logical unit on any data plane or fabric link interfaces. If you do, you can receive an error like this;

[edit interfaces fab0 fabric-options member-interfaces]
'ge-0/0/4'    Logical unit is not allowed on fabric member
error: commit failed: (statements constraint check failed)

d) If during the configuration you lose synchronization between the nodes, try running "commit full" to remedy the situation.


Here are two tables from the Juniper documentation regarding cluster interface assignments: