# Chapter 7. High Availability

Information availability is a daily part of modern society. People make phone calls, read the news, stream songs, check sports scores, and watch television all over the Internet or on their local provider’s network. At any given time, at any given location, almost any bit of information can be made available over the Internet. Today, and in the near future, it’s expected that there should be no interruptions to this flow of information. Failure to provide all of the world’s information at any user’s fingertips at any time, day or night, will bring great wrath upon whoever’s network is in the way. Welcome to the twenty-first century.

The average user of Internet services is unable to comprehend why the information she desires is not available. All that user knows is that it isn’t, and that is no longer acceptable. Consumers clamor for compensation and complain to all available outlets. Business users call the help desk and demand explanations while escalating their lost connection to all levels. Revenue is lost and the world looks bleak. Information must always be highly available, not just available.

A failure in the network typically occurs somewhere between the client device and the server. This chapter is dedicated to training network administrators on how to ensure that their SRX is not the device that brings down the network. Firewalls are placed in the most critical locations in the network, and when problems occur, trust us, users notice.

A router handles each packet as its own entity. It does not process traffic as though the packets had a relationship to each other. The packet could be attempting to start a new connection, end a connection, or have all sorts of strange data inside. A router simply looks at the Layer 3 header and passes the packet on. Because of this, packets can come in any order and leave the router in any order. If the router only sees one packet out of an entire connection and never sees another, it doesn’t matter. If one router fails, another router can easily pick up all of the traffic using dynamic routing. Designing for high availability (HA) with stateful firewalls is different because of their stateful nature.

Stateful firewalls need to see the creation and teardown of the communication between two devices. All of the packets in the middle of this communication need to be seen as well. If some of the packets are missed, the firewall will start dropping them as it misses changes in the state of communications. Once stateful firewalls came into the picture, the nature of HA changed. The state of traffic must be preserved between redundant firewalls. If one firewall fails, the one attempting to take over must have knowledge of all of the traffic that was passing through the failed device. All established connections will be dropped if the new firewall does not have knowledge of these sessions. This creates the challenge of ensuring that state synchronization can occur between the two devices. If not, the whole reason for having redundancy in the firewalls is lost.

## Understanding High Availability in the SRX

The design of the SRX is extremely robust regardless of the model or platform. It has a complete OS and many underlying processors and subsystems. Depending on the platform, it could have dozens of processors. Because of this, the SRX implements HA in a radically different way than most firewalls. Common features such as configuration and session synchronization are still in the product, but how the two chassis interact is different.

### Chassis Cluster

An SRX HA cluster implements a concept called chassis cluster. A chassis cluster takes the two SRX devices and represents them as a single device. The interfaces are numbered in such a way that they are counted starting at the first chassis and then ending on the second chassis. Figure 7-1 shows a chassis cluster. On the left chassis, the FPC starts counting as normal; on the second chassis, the FPCs are counted as though they were part of the first chassis.
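As a sketch of how this numbering works in practice: on an SRX210, which has two FPC slots per chassis, node 1’s slot numbering continues where node 0’s leaves off (exact slot counts and offsets vary by platform):

```
node 0:  ge-0/0/0, ge-0/0/1, fe-0/0/2 ...   (FPCs 0 and 1)
node 1:  ge-2/0/0, ge-2/0/1, fe-2/0/2 ...   (FPCs 2 and 3)
```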

In Chapter 2, we discussed the concept of the Routing Engine (RE). In an SRX cluster, each SRX has one active RE. When the cluster is created, the two REs work together to provide redundancy. This is similar to the Juniper M Series, T Series, and MX Series routing platforms that support dual REs. The Junos OS is currently limited to supporting two REs per device. Because of this, the SRX cluster can only have one RE per chassis. When the chassis are combined and act as a single chassis, the devices reach the two-RE limit.

Multiple REs in a single SRX are only supported to provide dual control links. They do not provide any other services.

The chassis cluster concept, although new to the SRX, is not new to Juniper Networks. The SRX utilizes the code infrastructure from the TX Matrix products. The TX Matrix is a multichassis router that is considered one of the largest routers in the world. Only the largest service providers and cloud networks utilize the product. Because of its robust design and reliable infrastructure, it’s great to think that the code from such a product sits inside every SRX. When the SRX was designed, the engineers at Juniper looked at the current available options and saw that the TX Matrix provided the infrastructure they needed. This is a great example of how using the Junos OS across multiple platforms benefits all products.

To run a device in clustering mode, a set of specific requirements must be met. For the SRX1400, SRX3000, and SRX5000 lines, the devices must have an identical number of SPCs and the SPCs must be in identical locations. The SRXs, however, can have any number of interface cards and they do not have to be in the same slots. Best practice suggests, though, that you deploy interfaces in the same FPCs or PIC slots, as this will make it easier to troubleshoot in the long run.

For most network administrators, this concept of a single logical chassis is very different from traditional HA firewall deployment. To provide some comparison, in ScreenOS, for example, the two devices were treated independently of each other. The configuration, as well as network traffic state, was synchronized between the devices, but each device had its own set of interfaces.

On the branch SRX Series products, Ethernet switching as of Junos 11.1 is supported when the devices are in chassis cluster mode. You will also need to allocate an additional interface to provide switching redundancy. This is covered later in the chapter.
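For switching redundancy, each node contributes a dedicated interface to a switching fabric pair, swfab0 and swfab1 (swfab0 belongs to node 0, swfab1 to node 1). A minimal sketch for an SRX210 cluster, with the member interfaces chosen purely as an illustration:

```
set interfaces swfab0 fabric-options member-interfaces ge-0/0/5
set interfaces swfab1 fabric-options member-interfaces ge-2/0/5
```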

### The Control Plane

As discussed throughout this book, the SRX has a separated control plane and data plane. Depending on the SRX platform architecture, the separation varies from being separate processes running on separate cores to completely physically differentiated subsystems. For the purposes of this discussion, however, it’s enough to know that the control and data planes are separated.

The control plane is used in HA to synchronize the kernel state between the two REs. It also provides a path between the two devices for hello messages. On the RE runs a daemon called jsrpd, which stands for Juniper Services Redundancy Protocol daemon. This daemon is responsible for sending the hello messages and performing failovers between the two devices. Another daemon, ksyncd, is used for synchronizing the kernel state between the two devices. All of this occurs over the control plane link.

The control plane is always in an active/backup state. This means only one RE can be the master over the cluster’s configuration and state. This ensures that there is only one ultimate truth over the state of the cluster. If the primary RE fails, the secondary takes over for it. Creating an active/active control plane makes synchronization more difficult because many checks would need to be put in place to validate which RE is right.

The two devices’ control planes talk to each other over a control link. This link is reserved for control plane communication. It is critical that the link maintain its integrity to allow for communication between the two devices.
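On the branch SRX Series, the control link uses a designated revenue port, so no extra configuration is needed. On the data center platforms, the control link runs over dedicated ports on the SPCs and must be configured explicitly. A sketch for an SRX5000 cluster, assuming an SPC sits in FPC 1 of each chassis (node 1’s FPC numbering continues from node 0’s):

```
set chassis cluster control-ports fpc 1 port 0
set chassis cluster control-ports fpc 13 port 0
```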

### The Data Plane

The data plane’s responsibility in the SRX is to pass and process traffic based on the administrator’s configuration. All session and service states are maintained on the data plane. Neither the REs nor the control plane is responsible for maintaining state (the RE simply requests data and statistics from the data plane and returns them to the administrator).

The data plane has a few responsibilities when it comes to HA implementation. First and foremost is state synchronization. The state of sessions and services is shared between the two devices. Sessions are the state of the current set of traffic that is going through the SRX, and services are other items such as the VPN, IPS, and ALGs.

On the branch SRX Series, synchronization happens between the flowd daemons running on each node’s data plane. The SRX Series for the branch, as discussed in Chapter 1, runs a single multicore processor with a single multithreaded flowd process. The data center SRX’s distributed architecture handles state synchronization in a similar fashion. Figure 7-2 shows a detailed example.

In Figure 7-2, two SRX data center platforms are shown. Node 0 is shown on the left and node 1 is on the right. Each device is depicted with two SPCs. SPC 0 is the SPC that contains the CP SPU and a second flow SPU. In SPC 1, both SPUs are flow SPUs. Both SRX data center platforms are required to have the same number and location of SPCs and NPCs. This is required because the SPUs talk to their peer SPU in the same FPC and PIC location. As seen in Figure 7-2, the flow SPU in FPC 0 on node 0 sends a message to node 1 on FPC 0 in PIC 1. This is the session synchronization message. Once the SPU on node 1 validates and creates the session, it sends a message to its local CP. As stated in Chapter 1, the CP processors are responsible for maintaining the state for all of the existing sessions on the SRX. The secondary device now has all of the necessary information to handle the traffic in the event of a failover.

Information is synchronized in what is known as a real-time object (RTO). This RTO contains the necessary information to synchronize the data to the other node. The remote side does not send an acknowledgment of the RTO because doing so would slow down the session creation process, and frankly, an acknowledgment is rarely needed. There are many different RTO message types. New ones can be added based on the creation of new features on the SRX. The most commonly used message types are the ones for session creation and session closure.

The second task the SRX needs to handle is forwarding traffic between the two devices. This is also known as data path or Z path forwarding. Figure 7-3 illustrates this. Under most configuration deployments, Z path forwarding is not necessary. However, in specific designs, this operation might be very common. (The details are further explored in the section “Deployment Concepts” later in this chapter.) In the event that traffic is received by a node, the node will always forward the traffic to the node on which the traffic will egress.

The last task for the data link is to send jsrpd messages between the two devices. The jsrpd daemon passes messages over the data plane to validate that it is operating correctly. These are similar to the messages sent over the control link, except that they go through the data plane. By sending these additional messages over the data plane, the RE ensures that the data plane is up and capable of passing traffic. On the branch SRX Series devices, the message exits the control plane, passes through flowd and over the data link, and then to the second device. The second device receives the packet in flowd, which passes it to the control plane and on to jsrpd. Depending on the platform, the rate for the messages will vary.

All of these data plane messages pass over the data link. The data link is also known as the fabric link, depending on the context of the discussion. The size of the link varies based on the requirements. These requirements consist of the amount of data forwarding between devices and the number of new connections per second:

• On the SRX100, SRX110, SRX210, and SRX220, a 100 Mbps Ethernet link is acceptable for the data link.

• For the SRX550, SRX650, and SRX240, it’s suggested that you use a 1 Gbps link.

• On the data center SRXs, a 1 Gbps link is acceptable unless data forwarding is going to occur.

• Even on an SRX5000 Series with a maximum of 380,000 new connections per second (CPS), a 1 Gbps link can sustain the RTO throughput.

• If data forwarding is in the design, a 10 Gbps link is suggested.
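The fabric link itself is defined by binding one physical interface per node to the fab0 and fab1 pseudointerfaces (fab0 belongs to node 0, fab1 to node 1). A minimal sketch for a branch cluster, with the member interfaces chosen as an illustration:

```
set interfaces fab0 fabric-options member-interfaces ge-0/0/2
set interfaces fab1 fabric-options member-interfaces ge-2/0/2
```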

## Getting Started with High Availability

This chapter started with the concept of the chassis cluster because it’s the fundamental concept for the entire chapter. There are several important aspects to the chassis cluster; some concern how the cluster is configured, and others are simply key to the fault tolerance the chassis cluster provides. In this section, we explore the deeper concepts of the chassis cluster.

### Cluster ID

Each cluster must share a unique identifier among all of its members. This identifier is used in a few different ways, but most important it is used when two devices are communicating with each other. Fifteen cluster IDs are available for use when creating a cluster. The cluster ID is also used when determining MAC addresses for the redundant Ethernet interfaces.

### Node ID

The node ID is the unique identifier for a device within a cluster. There are two node IDs: 0 and 1. The node with an ID of 0 is considered the base node. The node ID does not give the device any priority for mastership; it matters only for interface ordering. Node 0 is the first node for the interface numbering in the chassis cluster. The second node, node 1, is the second and last node in the cluster.

### Redundancy Groups

In an HA cluster, the goal is the ability to fail over resources in case something goes wrong. A redundancy group is a collection of resources that need to fail over between the two devices. Only one node at a time can be responsible for a redundancy group; however, a single node can be the primary node for any number of redundancy groups.

Two different items are placed in a redundancy group: the control plane and the interfaces. The default redundancy group is group 0. Redundancy group 0 represents the control plane. The node that is the master over redundancy group 0 has the active RE. The active RE is responsible for controlling the data plane and pushing new configurations. It is considered the ultimate truth in matters regarding what is happening on the device.

The data plane components for redundancy groups exist in numbers 1 and greater. The different SRX platforms support different numbers of redundancy groups. A data plane redundancy group contains one or more redundant Ethernet interfaces. Each member in the cluster has a physical interface bound into a reth. The active node’s physical interface will be active and the backup node’s interface will be passive and will not pass traffic. It is easier to think of this as a binary switch. Only one of the members of the reth is active at any given time. The section “Deployment Concepts” later in this chapter details the use of data plane redundancy groups.
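A redundancy group and its per-node priorities are defined under the chassis cluster hierarchy. A sketch, with the priority values chosen arbitrarily (higher is preferred):

```
set chassis cluster redundancy-group 0 node 0 priority 200
set chassis cluster redundancy-group 0 node 1 priority 100
set chassis cluster redundancy-group 1 node 0 priority 200
set chassis cluster redundancy-group 1 node 1 priority 100
```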

### Interfaces

A network device is of little use to a network unless it participates in traffic processing. An SRX has two different interface types that it can use to process traffic. The first is the reth. A reth is a Junos aggregate Ethernet interface and it has special properties compared to a traditional aggregate Ethernet interface. The reth allows the administrator to add one or more child links per chassis. Figure 7-4 shows an example of this where node 0 is represented on the left and node 1 is represented on the right.

In Figure 7-4, node 0 has interface xe-0/0/0 as a child link of reth0 and node 1 has interface xe-12/0/0. The interface reth0 is a member of redundancy group 1. The active node, in this case node 0, has its link active. Node 1’s link is in an up state but it does not accept or pass traffic. After a failover between nodes, the newly active node sends out gratuitous ARPs (GARPs). Both nodes share the same MAC address on the reth. The surrounding switches will learn the new port that has the reth MAC address. The hosts are still sending their data to the same MAC, so they do not have to relearn anything.
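Building the reth from Figure 7-4 takes three steps: declare how many reths the cluster supports, bind a child interface from each node, and place the reth in a redundancy group. A sketch, with the address an assumption for illustration:

```
set chassis cluster reth-count 1
set interfaces xe-0/0/0 gigether-options redundant-parent reth0
set interfaces xe-12/0/0 gigether-options redundant-parent reth0
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 198.51.100.1/24
```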

The MAC address for the reth is based on a combination of the cluster ID and the reth number. Figure 7-5 shows the algorithm that determines the MAC address. In Figure 7-5, there are two types of fields: the hex field represents one byte using two base-16 digits; the bit field represents a byte in binary with eight bits.

The first four of the six bytes are fixed. They do not change between cluster deployments. The last two bytes vary based on the cluster ID and the reth index. In Figure 7-5, CCCC represents the cluster ID in binary. With four bits, the maximum number is 15, which is the same number of cluster IDs supported. Next, RR represents a reserved field for future expansion; it is currently set to 0 for both bits. VV represents the version of the chassis cluster, which today is set to 0 for both bits. Last is the field filled with XXXXXXXX, which represents the redundant Ethernet index ID. Based on Figure 7-5, it’s easy to see that collision of MAC addresses between clusters can be avoided.

When configured in a chassis cluster, the SRX is also able to support local interfaces. A local interface is an interface that is configured local to a specific node. This method of configuration on an interface is the same method of configuration on a standalone device. The significance of a local interface in an SRX cluster is that it does not have a backup interface on the other chassis, meaning that it is part of neither a reth nor a redundancy group. If this interface were to fail, its IP address would not fail over to the other node. Although this feature might seem perplexing at first, it actually provides a lot of value in complex network topologies, and it is further explored later in this chapter.

## Deployment Concepts

It’s time to apply all these concepts to actual deployment scenarios. For HA clusters, there is a lot of terminology for the mode of actually deploying devices, and this section attempts to give administrators a clear idea of what methods of deployment are available to them.

Earlier in this chapter we discussed control plane redundancy, whereby the control plane is deployed in an active/passive fashion. One RE is active for controlling the cluster, and the second RE is passive. The secondary RE performs some basic maintenance for the local chassis and synchronizes the configuration as well as checks that the other chassis is alive.

In this section, we discuss what can be done with the redundancy groups on the data plane. The configuration on the data plane determines in which mode the SRXs are operating. The SRX is not forced into a specific mode of HA; it operates in a mode based on its configuration. There are three basic modes of operation and one creative alternative:

• Active/passive

• Active/active

• Mixed mode

• The six pack

### Active/passive

In the active/passive mode, the first SRX data plane is actively passing traffic while the second SRX data plane sits passive, not passing traffic. On a fault condition, of course, the passive data plane takes over and begins passing traffic. To accomplish this, the SRX uses one data plane redundancy group and one or more redundant Ethernet interfaces. Figure 7-6 illustrates an example of this active/passive process.

As shown in Figure 7-6, node 0, on the left, is currently active and node 1 is passive. In this example, there are two reth interfaces: reth0 and reth1. Reth0 goes toward the Internet and reth1 goes toward the internal network. Because node 0 is currently active, it is passing all of the traffic between the Internet and the internal network. Node 1’s data plane is (patiently) waiting for any issue to arise so that it can take over and continue to pass traffic. The interfaces on node 1 that are in the reth0 and reth1 groups are physically up but are unable to pass traffic. Because node 0 is currently active, it synchronizes any new sessions that are created to node 1. When node 1 needs to take over for node 0, it will have the same session information locally.
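The Figure 7-6 design reduces to a single data plane redundancy group holding both reths, with node 0 given the higher priority. A sketch, with priorities as assumed values:

```
set chassis cluster reth-count 2
set chassis cluster redundancy-group 1 node 0 priority 200
set chassis cluster redundancy-group 1 node 1 priority 100
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth1 redundant-ether-options redundancy-group 1
```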

### Active/active

In an active/active deployment, both SRXs are simultaneously passing traffic. Although it sounds difficult, the concept is simple—active/active is merely active/passive but done twice. In this case, each member of the cluster is active for its own redundancy group and the other device is passive for the redundancy group. In the event of a failure, the remaining node will take over for the traffic for the failed device. Synchronization happens between both nodes. Sessions for both redundancy groups are available on both nodes.

So, the question remains: what does this type of deployment mean for the administrator? The biggest advantage is that passing traffic over the backup node ensures that the backup data plane is ready and correctly functioning. Nothing is worse than having an HA cluster running for months and then, during the moment of truth, a failure occurs, and the second node is in a degraded state and no one discovered this ahead of time. A good example of avoiding this is to have one of the redundancy groups passing a majority of the traffic while the other redundancy group is used to pass only a single health check. This is a great design because the second device is verified and the administrator doesn’t have to troubleshoot load-sharing scenarios.

Active/active deployments can also be used to share load between the two hosts. The only downside to this design is that it might be difficult to troubleshoot flows going through the two devices, but ultimately that varies based on the administrator and the environment, and it’s probably better to have the option available in the administrator’s tool chest than not. Figure 7-7 shows an example of an active/active cluster.

Figure 7-7 shows an active/active cluster as simply two active/passive configurations. Building from Figure 7-6, the example starts with the same configuration as before. The clusters had a single redundancy group 1 and two reths, reth0 and reth1, with node 0 being the designated primary. In this example, a second redundancy group is added, redundancy group 2, and two additional reths are added to accommodate it. Reth2 is on the Internet-facing side of the firewalls and reth3 is toward the internal network. This redundancy group, however, has node 1 as the primary, so traffic that is localized to redundancy group 2 is only sent through node 1 unless a failure occurs.
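Extending the active/passive sketch to the Figure 7-7 design means adding a second redundancy group whose priorities are mirrored, so node 1 is primary for it. Reth numbers and priorities here are illustrative:

```
set chassis cluster reth-count 4
set chassis cluster redundancy-group 2 node 0 priority 100
set chassis cluster redundancy-group 2 node 1 priority 200
set interfaces reth2 redundant-ether-options redundancy-group 2
set interfaces reth3 redundant-ether-options redundancy-group 2
```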

### Mixed mode

Mixed mode, perhaps the most interesting HA configuration, builds on the concepts already demonstrated but expands to include local interfaces. As we discussed earlier, a local interface is an interface that has configurations local to the node to which it is attached. The other node is not required to have a backup to this interface as in the case of a reth.

This option has significance in two specific use cases.

The first use case is WAN interfaces. For this use case, there are two SRX210s, each with a T1 interface and a single reth to present back to the LAN, as depicted in Figure 7-8. Node 0 on the left has a T1 to provider A and node 1 on the right has a T1 to provider B. Each node has a single interface connected to the LAN switch. These two interfaces are bound together as reth0. The reth0 interface provides a redundant, reliable gateway to present to clients. Because of the way a T1 works, it is not possible to have a common Layer 2 domain between the two T1 interfaces, so each T1 is its own local interface to the local node.

Traffic can enter or exit either T1 interface, and it is always directed out to the correct interface. In the case shown in Figure 7-8, that would be reth0, as it is the only other interface configured. The benefit of this design is that the two T1s provide redundancy and increased capacity, and sessions between the two interfaces are synchronized. It’s great when you are using T1 interfaces as connections to a remote VPN site.
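A sketch of the Figure 7-8 design: each T1 is configured as a local interface on its own node, while the LAN side remains a reth. Interface names and addresses are assumptions (on an SRX210, node 1’s mini-PIM slot counts from FPC 3):

```
set interfaces t1-1/0/0 unit 0 family inet address 203.0.113.1/30
set interfaces t1-3/0/0 unit 0 family inet address 203.0.113.5/30
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 0 family inet address 192.168.1.1/24
```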

A second great use case for mixed mode is with data centers using a dynamic routing integration design. The design is similar to our previous example, but in this case all of the interfaces are Ethernet. The two SRXs each have two interfaces connected into two different M120 routers, all of which can be seen in Figure 7-9. Having two links each going to two different routers provides a better level of redundancy in case links or routers fail. The OSPF routing protocol is enabled between the SRXs and the upstream routers, allowing for simplified failover between the links and ensuring that the four devices can determine the best path to the upstream networks. If a link fails, OSPF recalculates and determines the next best path.

You can see in Figure 7-9 that the southbound interfaces connect into two EX8200 core switches. These switches provide a common Layer 2 domain between the southbound interfaces, which allows for the creation of reth0 (similar to the rest of the designs seen in this chapter).

### Six pack

It’s possible to forgo redundant Ethernet interfaces altogether and use only local interfaces. This is similar to the data center mixed mode design, except it takes the idea one step further and uses local interfaces for both the north- and southbound connections. A common name for this design is six pack. It uses four routers and two firewalls and is shown in Figure 7-10.

Much like the mixed mode design, the two northbound routers in Figure 7-10 are connected to the SRXs with two links. Each router has a connection to each SRX. On the southbound routers, the same design is replicated. This allows for a fully meshed, active/active, and truly HA network to exist. In this case, the SRXs are acting more like how a traditional router would be deployed. OSPF is used for the design to direct traffic through the SRXs, and it’s even possible to use equal cost multipath routing to do balancing for upstream hosts.
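A sketch of the routing side of the six pack, assuming local interfaces xe-1/0/0 (node 0) and xe-13/0/0 (node 1) face the northbound routers; the load-balance policy enables equal cost multipath in the forwarding table:

```
set protocols ospf area 0.0.0.0 interface xe-1/0/0.0
set protocols ospf area 0.0.0.0 interface xe-13/0/0.0
set policy-options policy-statement ecmp then load-balance per-packet
set routing-options forwarding-table export ecmp
```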

The six pack design shows just how flexible the SRXs can be to meet the needs of nearly any environment. These deployments can even be done in either the traditional Layer 3 routing mode or Layer 2 transparent mode.

## Preparing Devices for Deployment

Understanding how a chassis cluster works is half the battle in attaining acceptable HA levels. The rest concerns configuring a cluster.

To be fair, the configuration is actually quite easy—it’s just a few steps to get the cluster up and running. Setting it up correctly is the key to a stable implementation, and needless to say, rushing through some important steps can cause serious pain later on. We therefore suggest that you start with fresh configurations, if possible, even if this means clustering the devices starting with a minimal configuration and then adding on from there.

If there is an existing configuration, set it aside and then create the cluster. After the cluster is running happily, then migrate the configuration back on.

### Differences from Standalone

When an administrator enters configuration mode on a standalone SRX, all of the active users who log in to the device can see the configuration and edit it. When each user’s changes can be seen by the other users on the device, it’s called a shared configuration. Once chassis clustering is enabled, the devices must be configured in what is called configure private, or private, mode, which allows each administrator to see only her own configuration changes. This imposes several restrictions on the end administrator while using configure private mode.

The first notable restriction is that all configuration commits must be done from the root, or top, of the configuration hierarchy. Second, the option to do commit confirmed is no longer allowed, which, as you know, allows for a rollback to the previous configuration if things go wrong. Both are very nice features that are not available when in clustering mode. The reason these are disabled is simple: stability.

A lot of communication is going on between the two SRXs when they are in a clustered mode, so when committing a change, it is best to minimize the chances of differences between the two devices’ local configurations at the time of the commit. If each node had a user modifying the configuration at the same time, this would add an unneeded level of complexity to ensure that the configurations are synchronized. Because of this, private mode is required while making configuration changes.
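Entering private mode looks roughly like this (hostname assumed; the warning about uncommitted changes is normal behavior):

```
{primary:node0}
root@SRX> configure private
warning: uncommitted changes will be discarded on exit
Entering configuration mode

{primary:node0}[edit]
root@SRX# commit
```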

### Activating Juniper Services Redundancy Protocol

The first step in creating a cluster is to place the device into cluster mode. By default, the SRX does not run the jsrpd daemon, so this must be triggered. To enable the jsrpd daemon and turn the device into an eligible chassis cluster member, a few special bits must be set in the nonvolatile random-access memory (NVRAM) on the device, triggering the SRX, on boot, to enable jsrpd and enter chassis cluster mode.

These settings are permanent until they are otherwise removed. An initial reboot is required after setting the cluster ID to get the jsrpd daemon to start. The daemon will start every time the bits are seen in the NVRAM.

It takes a single command, and it takes effect only on reboot. Although it is unfortunate that a reboot is required, it is required only once. You must run the command from operational mode and as a user with superuser privileges.

root@SRX210-H> set chassis cluster cluster-id 1 node 0 reboot
Successfully enabled chassis cluster. Going to reboot now

root@SRX210-H>
*** FINAL System shutdown message from root@SRX210-H ***
System going down IMMEDIATELY

For this command to work, we needed to choose the cluster ID and the node ID. For most implementations, cluster ID 1 is perfectly acceptable, as we discussed earlier. The node ID is easy, too: for the first node that is being set up, use node 0, and for the second node, use node 1. There isn’t a specific preference between the two. Being node 0 or node 1 doesn’t provide any special benefit; it’s only a unique identifier for the device.
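The second device gets the same cluster ID but node ID 1; the hostname here is an assumption:

```
root@SRX210-L> set chassis cluster cluster-id 1 node 1 reboot
Successfully enabled chassis cluster. Going to reboot now
```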

Once the device comes back up, it’s easy to notice the changes. Right above the shell prompt is a new line:

{primary:node0} #<----new to the prompt
root>

This line gives the administrator two important pieces of information. The part to the left of the colon is the current status of the control plane within the cluster, and it defines which state the RE is in.

This only shows the control plane status. This does not show which device has the active data plane. This is a common mistake for those using the SRX. That message should be on its own page all by itself, as it’s that important to remember.

There are several different options for control plane status, as listed in Table 7-1. On boot, the device enters the hold state. During this state, the control plane is preparing itself to enter the cluster. Next the device enters the secondary state when the RE checks to see if there is already a primary RE in the cluster. If not, it then transitions to the primary state.

Table 7-1. Control plane states
| State | Meaning |
| --- | --- |
| Hold | This is the initial state on boot. The RE is preparing to join the cluster. |
| Secondary | The RE is in backup state and is ready to take over for the primary. |
| Primary | The RE is the controller for the cluster. |
| Ineligible | Something has occurred that makes the RE no longer eligible to be part of the cluster. |
| Disabled | The RE is no longer eligible to enter the cluster. It must be rebooted to rejoin the cluster. |
| Unknown | A critical failure has occurred. The device is unable to determine its current state. It must be rebooted to attempt to reenter the cluster. |
| Lost | Communication with the other node is lost. A node cannot be in a lost state; this is only listed under the show chassis cluster status command when the other device was never detected. |
| Secondary-hold | A device enters secondary-hold when it is identified as a secondary but the configured hold-down timer has not yet expired. In the event of a critical failure, the redundancy group can still fail over. |

Following the primary state are three states that occur only when something goes wrong. Ineligible occurs when an event invalidates the member's place in the cluster. From there, after a period of time spent ineligible, the device enters the disabled state. The last state, unknown, occurs only after some disastrous, unexpected event.

Once the system is up, and in either the final primary or secondary state, there are a few steps you can take to validate that the chassis cluster is indeed up and running. First, check that the jsrpd daemon is up and running. If the new cluster status message is above the prompt, it’s pretty certain that the daemon is running.

{primary:node0}
root> show system processes | match jsrpd
863  ??  S      0:00.24 /usr/sbin/jsrpd -N

{primary:node0}
root> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status  Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   1           primary      no       no
node1                   0           lost         n/a      n/a

{primary:node0}
root>

The greatest friend of anyone using a chassis cluster is the show chassis cluster status command. It is the most common command for checking the current status of the cluster, and it is full of information. The first item is the cluster ID, the one that was initially configured; it will likely stay that way until cleared. Next is information on each configured redundancy group, the first of which in our case is redundancy group 0. This group represents the control plane only and has no bearing on which node is actively passing traffic.

Under each redundancy group, each node is listed along with its priority, status, preempt setting, and whether the node is in manual failover. By default, redundancy group 0 is created without user intervention, and each device is given a default priority of 1. Because of this, the first node to become primary remains primary until a failure occurs. Next, the status is listed. The last two columns are preempt and manual failover: preempt allows the device with the higher priority to preempt the device with the lower priority, and the manual failover column states whether the administrator manually failed over to that node.

### Managing Cluster Members

Most Junos devices have a special interface named fxp0 that is used to manage the SRXs. It is typically connected to the RE, although some devices, such as the SRX100 and SRX200 Series, do not have a dedicated port for fxp0 because the devices are designed to provide the maximum number of ports for branch devices. However, when SRX devices are configured in a cluster, the secondary node cannot be directly managed unless it has either a local interface or the fxp0 port. To ease management of the SRX100 and SRX200 Series, the fe-0/0/6 port automatically becomes the fxp0 port. In the section “Node-Specific Information” later in this chapter, we discuss how to configure this port.

The fxp0 interface exists on the majority of Junos devices, a legacy of their service-provider-oriented design. It allows for secure out-of-band management, enabling administrators to access the device no matter what is happening on the network. Because of this, many capabilities and management services have traditionally operated best through the fxp0 port: tools such as NSM and Junos Space work best when talking to fxp0, and updates for IDP and UTM have historically worked best through this interface as well. After Junos 12.1, this is no longer the case, and you are freed from this limitation.

Managing branch devices that are remote can often be a challenge. It might not be possible to directly connect to the backup node. This is especially an issue when using a management tool such as NSM or Junos Space. Luckily, Juniper created a way to tunnel a management connection to the secondary node through the first. This mode is called “cluster-master” mode.

This mode requires a single command to activate.

{secondary:node1}
root@SRX-HA-1# set chassis cluster network-management cluster-master

{secondary:node1}
root@SRX-HA-1# edit chassis cluster

{secondary:node1}[edit chassis cluster]
root@SRX-HA-1# show
reth-count 2;
heartbeat-threshold 3;
network-management {
    cluster-master;
}
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
}

{secondary:node1}[edit chassis cluster]
root@SRX-HA-1#

### Configuring the Control Ports

Now that the devices are up and running, it's time to get them talking. There are two communication paths between the devices, and the first leads to the second. The control port comes first: by configuring it, the devices can begin communicating early on. Then, once the devices are in a cluster, the configuration is automatically synchronized, providing a consistent second method. This can cut the administrator's work in half, as the configuration needs to be entered only once.

Different platforms have different requirements for configuring the control port. Table 7-2 lists each platform and the control port location. Because each platform has different subsystems under the hood, so to speak, there are different ways to configure the control port. The only device that requires manual configuration is the SRX5000. Some devices also support dual or redundant control ports.

Table 7-2. Control ports by platform
| Device | Control port | Description | Dual support? |
| --- | --- | --- | --- |
| SRX100, SRX110, and SRX210 | fe-0/0/7 | Dedicated as a control port upon enabling clustering | No |
| SRX220 | ge-0/0/7 | Dedicated as a control port upon enabling clustering | No |
| SRX240 and SRX650 | ge-0/0/1 | Dedicated as a control port upon enabling clustering | No |
| SRX1400 | ge-0/0/10 and optionally ge-0/0/11 | Dedicated as a control port upon enabling clustering | Yes |
| SRX3000 | Both located on the SFB | No user configuration required | Yes |
| SRX5000 | Located on the SPC | Manual configuration required | Yes |

When connecting the control ports, connect the control port on each device to the other device. Joining two primary devices together is not recommended; it's best to reboot the secondary and then connect the control port. On reboot, the two devices will begin to communicate.

For all of the SRX devices, except the SRX5000, you can do this right after the initial cluster configuration. For the SRX5000 Series, two reboots are required.
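As a refresher, cluster membership itself is enabled from operational mode with a single command on each node; the cluster ID and node numbers here are the same illustrative values used throughout this chapter:

root> set chassis cluster cluster-id 1 node 0 reboot

## on the second device
root> set chassis cluster cluster-id 1 node 1 reboot

Both nodes must share the same cluster ID, each needs a unique node number, and the reboot argument applies the change immediately.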

The SRX5000 Series control ports are located on the SPC because, when Juniper created the SRX5000, the SPC was the only component designed from scratch; the remaining parts were borrowed from the MX Series. Locating the control ports on the SPC keeps them off other components while adding some resiliency: the SPC's traffic processing is physically separate from the control ports even though they share the same card. The SRX5000 must use fiber SFPs to connect the two chassis.

To configure the control ports on the SRX5000, the administrator first needs to determine which ports she wants to configure based on which FPC the control port is located within. Next, the administrator must identify the port number (either port 0 or port 1).

{primary:node0}[edit chassis cluster]
root@SRX5800A# set control-ports fpc 1 port 0
root@SRX5800A# set control-ports fpc 2 port 1
root@SRX5800A# show
control-ports {
    fpc 1 port 0;
    fpc 2 port 1;
}
root@SRX5800A# commit

There is logic to how the control ports should be configured on the SRX5000s. The control ports can be on the same FPC, but ideally the SRX should not be configured that way. If possible, do not place a control port on the same card as the central point (CP) processor, because the CP is used as a hop for the data link. If the FPC holding the CP fails while also carrying a single control link, the SRX cluster can go into split brain, or dual mastership. Because of this, separating the two is recommended: an administrator using dual control links should place each control link on a separate SPC and the CP on a third SPC. This requires at least three SPCs, but it is the recommendation for the ultimate in HA.

Once the control links are up and running, and the secondary node is rebooted and up and running, it’s time to check that the cluster is communicating. Again, we go back to the show chassis cluster status command.

{primary:node0}
root> show chassis cluster status
Cluster ID: 1
Node                Priority      Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   1             primary        no       no
node1                   1             secondary      no       no

{primary:node0}
root>

Both devices should be able to see each other, as shown here, with one device being primary and the other secondary.

Next, because there are two devices, it's possible to check communications between them, this time using the show chassis cluster control-plane statistics command.

{primary:node0}
root> show chassis cluster control-plane statistics
Heartbeat packets sent: 217
Heartbeat packets received: 21
Heartbeat packet errors: 0
Probes sent: 4286
Probe errors: 0

At this point in the cluster creation, you should see only heartbeat messages on the control link, counted under the Heartbeat packets received: statistic. In the preceding output, 21 packets have been received. Typically, the numbers of heartbeat packets sent and received will not match, as one device started before the other and received no messages for a period of time. Once the sent and received counters consistently track each other, everything on the control plane should be in order.

The SRX1400, SRX3000, and the SRX5000 are able to use two control links that are provided for redundancy only. In the event that one of the control links on the device fails, the second is utilized. But to use the second control link, an additional component is needed. The SRX3000 uses a component called the SCM, which is used to activate and control the secondary control link. On the SRX5000, a standard RE can be used. The RE needs to have Junos 10.0 or later loaded to operate the control link. Both the SCM and the secondary RE are loaded into the second RE port on the platform. These modules do not act as an RE or backup to the RE, but rather are used only for the backup control link.

These components must be placed into the chassis while it is powered off. On boot, the secondary link will be up and functional.

A quick look at the output of the show chassis cluster control-plane statistics command shows the second control link working.

root> show chassis cluster control-plane statistics
Heartbeat packets sent: 1114
Heartbeat packet errors: 0
Heartbeat packets sent: 1114
Heartbeat packet errors: 0
Probes sent: 1575937
Probe errors: 0

A final configuration option needs to be set for the control link: control link recovery, which allows automated recovery of the secondary chassis in the event that the control link fails. If the single control link (or both links, when two are used) fails, the secondary device goes into the disabled state.

On the data center SRXs, a feature called unified in-service software upgrade (ISSU) can be used. This method is a graceful upgrade method that allows for the SRXs to upgrade without losing sessions or traffic.

The process might take some time to complete because the kernel state on the two devices must synchronize and the software must be updated. It is suggested that you place all of the redundancy groups on a single member of the cluster before starting. The process is similar to a standard upgrade, except that the command needs to be run on only one SRX.
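If the redundancy groups are not already on one member, a manual failover can consolidate them first; redundancy group 1 here stands in for each configured data plane group:

{primary:node0}
root@SRX5800-1> request chassis cluster failover redundancy-group 1 node 0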

{primary:node0}
root@SRX5800-1> request system software in-service-upgrade junos-srx5000-12.1X44-D10-domestic.tgz reboot

The command will upgrade each node and reboot them as needed. No further commands are required.

There is one last option that the unified ISSU process can use: the no-old-master-upgrade command, which leaves the master in a nonupgraded state. This ensures that there is a working box should the software upgrade fail. After successful completion of the upgrade, the old master is manually upgraded, as shown here.

{primary:node0}
root@SRX5800-1> request system software in-service-upgrade junos-srx5000-12.1X44-D10-domestic.tgz no-old-master-upgrade

## next, on the old master

{primary:node0}
root@SRX5800-1> request system software add junos-srx5000-12.1X44-D10-domestic.tgz

{primary:node0}
root@SRX5800-1> request chassis cluster in-service-upgrade abort

{primary:node0}
root@SRX5800-1> request system reboot

If things do go wrong and the nodes are unable to complete the unified ISSU process, the upgraded node needs to be rolled back. This is simple: first abort the unified ISSU process, then roll back the software on that node, and then reboot the system.

{primary:node0}
root@SRX5800-1> request chassis cluster in-service-upgrade abort

{primary:node0}
root@SRX5800-1> request system software rollback

{primary:node0}
root@SRX5800-1> request system reboot

Returning to control link failure: once the secondary node is disabled, it must be rebooted to recover. The risk is that the device might not see the primary on reboot, and if that occurs, dual mastership, or split brain, will result. The better option is to enable control link recovery, which takes only a single command, as shown in the next example.

{primary:node0}[edit chassis cluster]
root# set control-link-recovery

{primary:node0}[edit chassis cluster]
root# show
control-link-recovery;

Once control link recovery is enabled, a user can manually reconnect the control link. After the control link has been up for about 30 seconds and the SRXs have determined that the link is healthy, the secondary node will reboot. After recovering from the reboot, the cluster will be up and synchronized and ready to operate. Although a reboot seems harsh for such a recovery, it is the best way to ensure that the backup node is up and completely operational.

### Configuring the Switching Fabric Interface

The branch series of devices has the ability to perform local switching. However, once you enter chassis cluster mode, what do you do when you still need to provide local switching? The branch SRX devices now have the ability to share a single switching domain across two devices. This is excellent for small branches that need to offer switching to hosts without even needing to add standalone switches. There are a few things to take into consideration before you enable switching in your cluster.

First, to enable switching in a cluster, you need to dedicate one interface on each SRX to connect to the other cluster member, providing a dedicated switching path between the two nodes. On the smaller SRXs this would eat up another valuable port, which is why the feature is supported only on the SRX240 and up. For the SRX550 or SRX650 with G-PIMs, you need to create a switch fabric interface between each pair of G-PIMs you want to bridge switching between. Also, Q-in-Q features are not supported in chassis cluster mode due to hardware limitations.

{primary:node1}
root@SRX-650# set interfaces swfab0 fabric-options member-interfaces ge-2/0/5

{primary:node1}
root@SRX-650# set interfaces swfab1 fabric-options member-interfaces ge-11/0/5

{primary:node1}
root@SRX-650# show interfaces
-- snip --
swfab0 {
    fabric-options {
        member-interfaces {
            ge-2/0/5;
        }
    }
}
swfab1 {
    fabric-options {
        member-interfaces {
            ge-11/0/5;
        }
    }
}

{primary:node1}
root@SRX-650# run show chassis cluster ethernet-switching statistics

Probe state : UP
Probes sent: 1866
Probe recv errors: 0
Probe send errors: 0

### Node-Specific Information

A chassis cluster HA configuration takes two devices and makes them look as though they are one. However, the administrator might still want some elements to be unique between the cluster members, such as the hostname and the IP address on fxp0, which are typically unique per device. No matter what unique configuration is required or desired, it’s possible to achieve it by using Junos groups. Groups provide the ability to create a configuration and apply it anywhere inside the configuration hierarchy. It’s an extremely powerful feature, and here we use it to create a group for each node.

Each group is named after the node it is applied to, and it’s a special naming that the SRX looks for. After commit, only the group that matches the local node name is applied, as shown in the following configuration:

{primary:node0}[edit groups]
root# show
node0 {
    system {
        host-name SRX210-A;
    }
    interfaces {
        fxp0 {
            unit 0 {
                family inet {
                    address 10.0.1.210/24;
                }
            }
        }
    }
}
node1 {
    system {
        host-name SRX210-B;
    }
    interfaces {
        fxp0 {
            unit 0 {
                family inet {
                }
            }
        }
    }
}

{primary:node0}[edit groups]
root#

In this configuration example, there are two groups, created under the groups hierarchy, which is at the top of the configuration tree. The node0 group has its hostname set as SRX210-A, and node1 has its hostname set as SRX210-B. To apply the groups, the administrator needs to use the apply-groups command at the root of the configuration. When the configuration is committed to the device, Junos will see the command and merge the correct group to match the node name.

{primary:node0}
root# set apply-groups "${node}"

{primary:node0}
root# show apply-groups
## Last changed: 2010-03-31 14:25:09 UTC
apply-groups "${node}";

{primary:node0}
root#

root# show interfaces | display inheritance
fab0 {
    fabric-options {
        member-interfaces {
            fe-0/0/4;
            fe-0/0/5;
        }
    }
}
fab1 {
    fabric-options {
        member-interfaces {
            fe-2/0/4;
            fe-2/0/5;
        }
    }
}
##
## 'fxp0' was inherited from group 'node0'
##
fxp0 {
    ##
    ## '0' was inherited from group 'node0'
    ##
    unit 0 {
        ##
        ## 'inet' was inherited from group 'node0'
        ##
        family inet {
            ##
            ## '10.0.1.210/24' was inherited from group 'node0'
            ##
            address 10.0.1.210/24;
        }
    }
}

{primary:node0}
root#

To apply the configuration to the correct node, a special command was used: set apply-groups "${node}". The variable "${node}" is interpreted as the local node name. Next in the output example is the show interfaces | display inheritance command, which shows which parts of the configuration were inherited from a group: each inherited component is preceded by three lines beginning with ##, the second of which names the group the value came from.

As discussed, the fxp0 management port can be configured like a standard interface, providing a management IP address for each device. It's also possible to configure a shared IP address that always connects to the primary RE, so the administrator does not have to figure out which RE is master before connecting. This shared address is called the master-only IP.

To do so, a tag is added to the end of the command when configuring the IP address, which is configured in the main configuration and not in the groups (because the tag is applied to both devices, there is no need to place it in the groups).

{primary:node0}
root# set interfaces fxp0.0 family inet address 10.0.1.212/24 master-only

{primary:node0}
root# show interfaces fxp0
unit 0 {
    family inet {
        address 10.0.1.212/24 {
            master-only;
        }
    }
}

{primary:node0}
root@SRX210-A> show interfaces fxp0 terse
fxp0                    up    up
fxp0.0                  up    up   inet     10.0.1.210/24
10.0.1.212/24

{primary:node0}
root@SRX210-A>

### Configuring Heartbeat Timers

The SRX sends heartbeat messages on both the control and data links to ensure that the links are up and running. Although the device could simply check whether a link is physically up or down, that alone is not enough validation. Heartbeat messages provide three layers of validation: the link, the daemon, and the internal paths.

The message requires the two jsrpd daemons to successfully communicate, ensuring that the other daemon isn’t in a state of disarray and validating the internal paths between the two daemons, including the physical link and the underlying subsystems. For the data link, the packets are even sent through the data plane, validating that the flow daemons are communicating properly.

Each platform has default heartbeat timers that are appropriate for that device. The differences come down to how well the kernel can guarantee processing time to the jsrpd daemon. Generally, the larger the device, the larger the processor on the RE; the larger the processor, the faster it can process tasks; and the faster the device can process tasks, the quicker it can move on to the next one.

This raises the question of how fast an administrator needs a device to fail over. Of course, the world would like zero downtime and guaranteed reliability for every service, but the practical answer is: as fast as the device can fail over while maintaining stability.

Table 7-3 lists the various configuration options for heartbeat timers based on the SRX platform. The branch platforms use a higher timer because they use slower processors to ensure stability at the branch. Although a faster failover might be desired, stability is the most important goal. If the device fails over but is lost in the process, it is of no use.

Table 7-3. Control plane heartbeats
| Platform | Control plane timer min (ms) | Control plane timer max (ms) | Missed heartbeat threshold min | Missed heartbeat threshold max | Min missing peer detection time (sec) |
| --- | --- | --- | --- | --- | --- |
| SRX100 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX110 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX210 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX220 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX240 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX550 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX650 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX1400 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX3400 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX3600 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX5600 | 1,000 | 2,000 | 3 | 8 | 3 |
| SRX5800 | 1,000 | 2,000 | 3 | 8 | 3 |

The SRXs have a default failover detection time of three seconds, and this can easily be modified. There are two options to set: threshold and interval. Increasing the detection time is needed in many networks; for example, surrounding STP convergence might use high timers, and you might need to increase your failover detection time to match.
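The effective detection time is simply the product of the two settings:

detection time = heartbeat-interval × heartbeat-threshold
defaults:  1,000 ms × 3 = 3 seconds
maximums:  2,000 ms × 8 = 16 seconds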

{primary:node0}[edit chassis cluster]
root@SRX210-A# set heartbeat-interval 2000

{primary:node0}[edit chassis cluster]
root@SRX210-A# set heartbeat-threshold 8

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
control-link-recovery;
heartbeat-interval 2000;
heartbeat-threshold 8;

{primary:node0}[edit chassis cluster]
root@SRX210-A#

### Redundancy Groups

Redundancy groups are the core of the failover mechanism for the SRX, and they are used for both the control and data planes. Any SRX cluster has at least 1 redundancy group and at most 128 (including redundancy group 0). How many you deploy, of course, varies by platform and deployment scenario.

A redundancy group is a collection of objects, and it represents which node is the owner of the objects. The objects are either interfaces or the control plane. Whichever node is the primary owner for the redundancy group is the owner of the items in the redundancy group. On ScreenOS firewalls this was called a VSD (virtual security device). When a cluster is created, redundancy group 0 is also created by default. No additional configuration is required to make it work.

Each node is given a priority within a redundancy group, and the higher-priority device is given mastership over it, subject to a few options. By default, a node with a higher priority will not preempt a node with a lower priority: if a lower-priority node owns a redundancy group and a higher-priority node comes online, ownership does not move. For ownership to move automatically, the preempt option must be enabled; the higher-priority device then takes ownership of the redundancy group when it is healthy enough to do so. Most organizations do not use this option, preferring to move the redundancy group back manually after the failover has been investigated.
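Moving a redundancy group back by hand is done with a manual failover, followed by a reset to clear the manual failover flag; the group number here is illustrative:

{primary:node0}
root> request chassis cluster failover redundancy-group 1 node 0

{primary:node0}
root> request chassis cluster failover reset redundancy-group 1

Until the flag is reset, the target node is held at the manual failover priority of 255; the reset is what returns the group to normal priority-based behavior.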

Creating a redundancy group is the same for the control plane or the data plane; the only difference appears when configuring interfaces. Let's start with an example using redundancy group 0. Remember that this is not required, but doing so sets the node priorities explicitly; if they are not set, they default to 1.

Most organizations use node 0 as the higher-priority device. It’s best when configuring the cluster to keep the configuration logical. When troubleshooting in the middle of the night, it’s great to know that node 0 should be the higher-priority node and that it is the same across the whole organization.

Let’s create the redundancy group:

Default:
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                  Priority        Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   1               primary        no       no
node1                   1           secondary      no       no

{primary:node0}
root@SRX210-A>

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 0 node 0 priority 254

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 0 node 1 priority 1

{primary:node0}[edit chassis cluster]
root@SRX210-A# show redundancy-group 0
node 0 priority 254;
node 1 priority 1;

root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                  Priority       Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   254         primary        no       no
node1                   1           secondary      no       no

{primary:node0}
root@SRX210-A>

Now let’s create redundancy group 1. The most common firewall deployment for the SRX is a Layer 3-routed active/passive deployment. This means the firewalls are configured as a router and that one device is active and the other is passive. To accomplish this, a single data plane redundancy group is created. It uses the same commands as used to create redundancy group 0 except for the name redundancy-group 1.

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 1 node 0 priority 254

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 1 node 1 priority 1

{primary:node0}[edit chassis cluster]
root@SRX210-A# set chassis cluster reth-count 2

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#
{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                  Priority      Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   254         primary        no       no
node1                   1           secondary      no       no

Redundancy group: 1 , Failover count: 1
node0                   254         primary        no       no
node1                   1           secondary      no       no

{primary:node0}
root@SRX210-A>

To keep things consistent, redundancy group 1 also gives node 0 a priority of 254 and node 1 a priority of 1. To be able to commit the configuration, at least one reth has to be enabled (it’s shown here but is further discussed in the next section). After commit, the new redundancy group can be seen in the cluster status. It looks exactly like redundancy group 0 and contains the same properties.
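As a minimal sketch of that prerequisite (the member interfaces and address are assumed values), a reth bound to redundancy group 1 on an SRX210 looks like this:

{primary:node0}[edit]
root@SRX210-A# set interfaces fe-0/0/2 fastether-options redundant-parent reth0
root@SRX210-A# set interfaces fe-2/0/2 fastether-options redundant-parent reth0
root@SRX210-A# set interfaces reth0 redundant-ether-options redundancy-group 1
root@SRX210-A# set interfaces reth0 unit 0 family inet address 192.168.1.1/24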

When creating an active/active configuration utilizing redundant Ethernet interfaces, the SRX needs at least two data plane redundancy groups, so that each node in the cluster has an active redundancy group on it. You configure this redundancy group the same way as the previous one, except that the other node gets the higher priority: in this case, node 1 has priority 254 and node 0 has priority 1.

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 2 node 0 priority 1

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 2 node 1 priority 254

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 2 {
    node 0 priority 1;
    node 1 priority 254;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                 Priority      Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   254         primary        no       no
node1                   1           secondary      no       no

Redundancy group: 1 , Failover count: 1
node0                   254         primary        no       no
node1                   1           secondary      no       no

Redundancy group: 2 , Failover count: 0
node0                   1           secondary      no       no
node1                   254         primary        no       no

{primary:node0}
root@SRX210-A>

Now, three redundancy groups are listed. The newest, redundancy group 2, has node 1 as its primary and node 0 as its secondary. In this arrangement, all of redundancy group 2's traffic flows through node 1 while redundancy group 1's traffic flows through node 0. Because each node maintains a mirrored state table of its peer, either node can take over all redundancy groups in the event of a failover.

It’s important to plan for the possibility that a single device might have to handle all of the traffic for all of the redundancy groups. If you don’t plan for this, the single device can be overwhelmed.

Each redundancy group needs a minimum of one reth in it to operate. Because of this, the total number of redundancy groups is tied to the total number of reths per platform, plus one for redundancy group 0. Table 7-4 lists the number of supported redundancy groups per SRX platform.

Table 7-4. Redundancy groups per platform
| Platform | Redundancy groups |
| --- | --- |
| SRX100 | 9 |
| SRX110 | 9 |
| SRX210 | 9 |
| SRX220 | 9 |
| SRX240 | 25 |
| SRX550 | 69 |
| SRX650 | 69 |
| SRX1400 | 128 |
| SRX3400 | 128 |
| SRX3600 | 128 |
| SRX5600 | 128 |
| SRX5800 | 128 |

As previously discussed, it's possible to have the node with the higher priority preemptively take over the redundancy group; by default, the administrator must manually fail the redundancy group over to the other node. Configuring preempt requires only a single command under the redundancy group, as shown here. Redundancy groups also have a default hold-down timer, the time the redundancy group must wait before it can preempt: on redundancy groups 1 and greater it is set to one second, and on redundancy group 0 it is set to 300 seconds (5 minutes) to prevent instability on the control plane.

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 1 preempt

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
    preempt;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   254         primary      no       no
node1                   1           secondary    no       no

Redundancy group: 1 , Failover count: 1
node0                   254         primary      yes      no
node1                   1           secondary    yes      no

{primary:node0}
root@SRX210-A>

A hold-down timer can be set to prevent unnecessary failovers in a chassis cluster; used in conjunction with preempt, it is the number of seconds to wait before the redundancy group can fail over. As previously mentioned, default hold-down timers are in place: for redundancy group 1, it’s 1 second; for redundancy group 0, it’s 300 seconds. You can customize the timer anywhere between 0 and 1,800 seconds, but best practice suggests never setting redundancy group 0 to less than 300 seconds, to prevent instability on the control plane.

It’s best to set a safe value for each redundancy group: high enough that the network is ready for the failover, but low enough that, in the event of a hard failure on the other node, the redundancy group fails over as quickly as possible.

{primary:node0}[edit chassis cluster]
root@SRX210-A# set redundancy-group 1 hold-down-interval 5

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
node 0 priority 254;
node 1 priority 1;
}
redundancy-group 1 {
node 0 priority 254;
node 1 priority 1;
preempt;
hold-down-interval 5;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#
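The hold-down rules just described can be captured in a small validation helper. This is a hypothetical sketch for illustration only (the function name and error messages are assumptions, not anything Junos provides):

```python
# Sketch of the hold-down rules described above: the interval must fall in
# the 0-1800 second range, and best practice keeps redundancy group 0 at
# 300 seconds or more to protect the control plane.
def check_hold_down(redundancy_group, seconds):
    if not 0 <= seconds <= 1800:
        raise ValueError("hold-down-interval must be between 0 and 1800 seconds")
    if redundancy_group == 0 and seconds < 300:
        raise ValueError("keep redundancy group 0 at 300 seconds or more")

check_hold_down(1, 5)  # the value configured in the example above passes
```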

## Integrating the Cluster into Your Network

Once the SRXs are talking with each other and their configurations are correctly syncing, it is time to integrate the devices into your network. Configuring the network only after enabling the cluster is the best practice to follow: not only does it save time, but it reduces the number of configuration steps, as the configuration is shared across both devices. To use a cluster in your network, you need to create a special interface called a reth (often pronounced like wreath), which is shared between the devices. Although there are other, more advanced methods to add a cluster into a network, the suggested design is to use an active/active cluster.

### Configuring Interfaces

A firewall without interfaces is like a car without tires—it’s just not going to get you very far. In the case of chassis clusters, there are two different options: the reth and the local interface. A reth is a special type of interface that integrates the features of an aggregate Ethernet interface with redundancy groups.

Before redundant Ethernet interfaces can be created, the total number of reths in the chassis must be specified. This is required because the reth is effectively an aggregate Ethernet interface, and an aggregate interface must be provisioned before it can be used.

It is suggested that you only provision the total number of interfaces that are required to conserve resources.

Let’s set the number of reths in the chassis and then move on to configuring the interfaces for redundancy group 1.

{primary:node0}[edit chassis cluster]
root@SRX210-A# set reth-count 2

{primary:node0}[edit chassis cluster]
root@SRX210-A# show
reth-count 2;
redundancy-group 0 {
node 0 priority 254;
node 1 priority 1;
}
redundancy-group 1 {
node 0 priority 254;
node 1 priority 1;
}

{primary:node0}[edit chassis cluster]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show interfaces terse | match reth
reth0                   up    up
reth1                   up    up

Each SRX platform has a maximum number of reths that it can support, as listed in Table 7-5.

Table 7-5. Reth count per platform
| Platform | Redundant Ethernet interfaces |
| --- | --- |
| SRX100 | 8 |
| SRX110 | 8 |
| SRX210 | 8 |
| SRX220 | 8 |
| SRX240 | 24 |
| SRX550 | 58 |
| SRX650 | 68 |
| SRX1400 | 128 |
| SRX3400 | 128 |
| SRX3600 | 128 |
| SRX5600 | 128 |
| SRX5800 | 128 |

Now let’s create a reth. When using a reth, each member of the cluster has one or more local interfaces that participate in the reth.

{primary:node0}[edit interfaces]
root@SRX210-A# set fe-0/0/2 fastether-options redundant-parent reth0

{primary:node0}[edit interfaces]
root@SRX210-A# set fe-2/0/2 fastether-options redundant-parent reth0

{primary:node0}[edit interfaces]
root@SRX210-A# set reth0.0 family inet address 172.16.0.1/24

{primary:node0}
root@SRX210-A# set interfaces reth0 redundant-ether-options redundancy-group 1

{primary:node0}[edit interfaces]
root@SRX210-A# show
fe-0/0/2 {
fastether-options {
redundant-parent reth0;
}
}
fe-2/0/2 {
fastether-options {
redundant-parent reth0;
}
}
fab0 {
fabric-options {
member-interfaces {
fe-0/0/4;
fe-0/0/5;
}
}
}
fab1 {
fabric-options {
member-interfaces {
fe-2/0/4;
fe-2/0/5;
}
}
}
fxp0 {
unit 0 {
family inet {
master-only;
}
}
}
reth0 {
redundant-ether-options {
redundancy-group 1;
}
unit 0 {
family inet {
address 172.16.0.1/24;
}
}
}

{primary:node0}
root@SRX210-A#

In this configuration example, interfaces fe-0/0/2 and fe-2/0/2 have reth0 specified as their parent. The reth0 interface is then given an IP address and made a member of redundancy group 1. From here the interface can be assigned to a zone so that it can be used in security policies for passing network traffic.

After the commit, there are two places to validate that the interface is functioning properly, as shown in the following output. First, the user can look at the interface listing to see the child links as well as the reth itself. Second, under the chassis cluster status, Junos shows whether the interface is up. The reason to use the second method of validation is that although the child links might be physically up, the redundancy groups might have a problem, and the interface could be down as far as jsrpd is concerned (as discussed in the section “Cluster ID” earlier in this chapter).

{primary:node0}
root@SRX210-A> show interfaces terse | match reth0
fe-0/0/2.0              up    up   aenet    --> reth0.0
fe-2/0/2.0              up    up   aenet    --> reth0.0
reth0                   up    up
reth0.0                 up    up   inet     172.16.0.1/24

{primary:node0}
root@SRX210-A> show chassis cluster interfaces

Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Down        Not configured

{primary:node0}
root@SRX210-A>
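When scripting that second validation step, the tabular output above can be parsed with a few lines of Python. This is a sketch that assumes the output format shown in the session above; column layout may vary by release:

```python
# Pull each reth's status out of "show chassis cluster interfaces" text.
# Assumes the column layout shown in the session above.
def parse_reth_status(output):
    status = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows look like: "reth0        Up          1"
        if len(parts) >= 3 and parts[0].startswith("reth"):
            status[parts[0]] = parts[1]
    return status

sample = """\
Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Down        Not configured
"""
```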

With the data center SRX firewalls, it’s possible to utilize multiple child links per node in the cluster, meaning that each node can have up to eight links configured together for its reth interface. The requirement for this to work is that both nodes must have the same number of links on each chassis. It works exactly like a traditional reth: only one chassis has its links active, and the secondary node’s links remain idle until a failover occurs. Configuring this is similar to what was done before; the noted difference is that additional interfaces are made child members of the reth.

{primary:node0}[edit interfaces]
root@SRX5800-1# set xe-6/2/0 gigether-options redundant-parent reth0

{primary:node0}[edit interfaces]
root@SRX5800-1# set xe-6/3/0 gigether-options redundant-parent reth1

{primary:node0}[edit interfaces]
root@SRX5800-1# set xe-18/2/0 gigether-options redundant-parent reth0

{primary:node0}[edit interfaces]
root@SRX5800-1# set xe-18/3/0 gigether-options redundant-parent reth1

{primary:node0}[edit interfaces]
root@SRX5800-1# show interfaces
xe-6/0/0 {
gigether-options {
redundant-parent reth0;
}
}
xe-6/1/0 {
gigether-options {
redundant-parent reth1;
}
}
xe-6/2/0 {
gigether-options {
redundant-parent reth0;
}
}
xe-6/3/0 {
gigether-options {
redundant-parent reth1;
}
}
xe-18/0/0 {
gigether-options {
redundant-parent reth0;
}
}
xe-18/1/0 {
gigether-options {
redundant-parent reth1;
}
}
xe-18/2/0 {
gigether-options {
redundant-parent reth0;
}
}
xe-18/3/0 {
gigether-options {
redundant-parent reth1;
}
}
reth0 {
redundant-ether-options {
redundancy-group 1;
}
unit 0 {
family inet {
}
}
}
reth1 {
redundant-ether-options {
redundancy-group 1;
}
unit 0 {
family inet {
}
}
}

{primary:node0}
root@SRX5800-1#

{primary:node0}
root@SRX5800-1> show interfaces terse | match reth
xe-6/0/0.0              up    up   aenet    --> reth0.0
xe-6/1/0.0              up    up   aenet    --> reth1.0
xe-6/2/0.0              up    down aenet    --> reth0.0
xe-6/3/0.0              up    down aenet    --> reth1.0
xe-18/0/0.0             up    up   aenet    --> reth0.0
xe-18/1/0.0             up    up   aenet    --> reth1.0
xe-18/2/0.0             up    up   aenet    --> reth0.0
xe-18/3/0.0             up    up   aenet    --> reth1.0
reth0                   up    up
reth0.0                 up    up   inet     1.0.0.1/16
reth1                   up    up
reth1.0                 up    up   inet     2.0.0.1/16

{primary:node0}
root@SRX5800-1> show chassis cluster interfaces

Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Up          1

{primary:node0}
root@SRX5800-1>

As seen here, the configuration is identical except that additional interfaces are added as members of the reth. To the switch it is connected to, the interface appears as an aggregate Ethernet interface, link aggregation group, or EtherChannel, depending on the vendor. It’s also possible to use LACP.

When a failover occurs to the secondary node, the node must announce to the world that it now owns the MAC address associated with the reth interface (the reth’s MAC is shared between nodes). It does this using gratuitous ARPs (GARPs), ARP messages that are broadcast without having been requested. Once a GARP is sent, the local switch can update its MAC table to map the MAC address to the correct port. By default, the SRX sends four GARPs per reth on a failover; these are sent from the control plane out through the data plane. The number of GARPs sent is configured on a per-redundancy-group basis, using the set gratuitous-arp-count command with a value between 1 and 16.
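To make the mechanism concrete, here is a minimal sketch of what a gratuitous ARP frame contains: a broadcast ARP whose sender and target IP are both the reth’s own address. This illustrates the packet format only, not how jsrpd builds its GARPs, and the MAC and IP values are placeholders:

```python
import struct

def build_garp(mac, ip):
    """Build a gratuitous ARP: a broadcast ARP in which the sender and
    target protocol addresses are both the reth's own IP."""
    # Ethernet header: broadcast destination, source MAC, ARP EtherType (0x0806)
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)
    # ARP header: hardware type Ethernet (1), protocol IPv4, hlen 6, plen 4, op 1
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + ip            # sender hardware and protocol address
    arp += b"\x00" * 6 + ip    # target MAC unknown; target IP = sender IP
    return eth + arp

# Placeholder MAC and the reth0 address from the earlier example
frame = build_garp(bytes.fromhex("005056000001"), bytes([172, 16, 0, 1]))
```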

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set gratuitous-arp-count 5

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# show
node 0 priority 254;
node 1 priority 1;
gratuitous-arp-count 5;

{primary:node0}
root@SRX210-A#

One last item to mention is the use of local interfaces. A local interface is not bound or configured to a redundancy group; it’s exactly what the name means: a local interface. It is configured like any traditional type of interface on a Junos device and is used in an active/active scenario. It does not have a backup interface on the second device.

## Fault Monitoring

“In the event of a failure, your seat cushion may be used as a flotation device.” If your plane were to crash and you were given notice, you would take the appropriate action to prevent disaster. When working with a chassis cluster, an administrator wants to see the smoke before the fire, and that is what configuring monitoring options in the chassis cluster provides: the administrator is watching to see if the plane is going down so that she can take evasive action before it’s too late. By default, the SRX monitors for various internal failures such as hardware and software issues. But what if other events occur, such as interfaces failing or upstream gateways going away? If the administrator wants to act on these events, she must configure the SRX to do so.

The SRX monitoring options are configured on a per-redundancy-group basis, meaning that if specific items were to fail, that redundancy group can fail over to the other chassis. In complex topologies, this gives the administrator extremely flexible options on what to fail over and when. Two integrated features can be used to monitor the redundancy groups: interface monitoring and IP monitoring.

There are two situations the SRXs can be in when a failure occurs. The first is that the SRXs are communicating and the two nodes in the cluster are both functional. If this is the case and a failure occurs, the failover between the two nodes will be extremely fast, because the two nodes can quickly transfer responsibility for passing traffic between them. The second scenario is when the two nodes lose communication, which could be caused by a loss of power or other factors. In this case, all heartbeats between the chassis must be missed before the secondary node can take over for the primary, taking anywhere from 3 to 16 seconds, depending on the platform.
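The upper bound of that window follows directly from the heartbeat settings. With the values used earlier in this chapter (a 2,000 ms heartbeat-interval and a heartbeat-threshold of 8), the arithmetic works out as follows. This is a simplified model of the worst case, assuming detection takes the full threshold window:

```python
def detection_time_s(heartbeat_interval_ms, heartbeat_threshold):
    # Every heartbeat in the threshold window must be missed before the
    # secondary declares the primary dead.
    return heartbeat_interval_ms * heartbeat_threshold / 1000.0

print(detection_time_s(2000, 8))  # 16.0 -- the worst case cited above
```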

In this section, each failure scenario is outlined so that the administrator can gain a complete understanding of what to expect if or when a failure occurs.

### Interface Monitoring

Interface monitoring tracks the physical status of an interface, checking whether the interface is in an up or down state. When one or more monitored interfaces fail, the redundancy group fails over to the other node in the cluster.

The determining factor is a specific weight, in this case 255: the redundancy group threshold, which is shared between interface monitoring and IP monitoring. Once enough interfaces have failed to reach this weight, the redundancy group fails over. In most situations, interface monitoring is configured so that if one interface fails, the entire redundancy group fails over; it could instead be configured so that two interfaces need to fail. In this first configuration, only one interface needs to fail to initiate a failover.

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set interface-monitor fe-0/0/2 weight 255

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set interface-monitor fe-2/0/2 weight 255

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# show
node 0 priority 254;
node 1 priority 1;
interface-monitor {
fe-0/0/2 weight 255;
fe-2/0/2 weight 255;
}

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A#
root@SRX210-A> show chassis cluster interfaces

Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Down        Not configured

Interface Monitoring:
Interface         Weight    Status    Redundancy-group
fe-2/0/2          255       Up        1
fe-0/0/2          255       Up        1

{primary:node0}
root@SRX210-A>

In this example, interfaces fe-0/0/2 and fe-2/0/2 are configured with a weight of 255. In the event that either interface fails, the redundancy group will fail over.
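The weight arithmetic behind these failovers can be sketched in a few lines. This is a model of the behavior described above, not jsrpd’s actual code:

```python
RG_THRESHOLD = 255  # threshold shared by interface and IP monitoring

def should_fail_over(monitors, down):
    # Failover triggers once the summed weights of down interfaces
    # reach the redundancy group threshold.
    return sum(w for name, w in monitors.items() if name in down) >= RG_THRESHOLD

# One interface at weight 255 is enough, as in the configuration above:
assert should_fail_over({"fe-0/0/2": 255, "fe-2/0/2": 255}, {"fe-0/0/2"})

# At weight 128 each, both interfaces must fail (128 + 128 = 256 >= 255):
halves = {"fe-0/0/2": 128, "fe-2/0/2": 128}
assert not should_fail_over(halves, {"fe-0/0/2"})
assert should_fail_over(halves, {"fe-0/0/2", "fe-2/0/2"})
```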

In the next example, the interface has failed. Node 0 immediately becomes secondary and its priority drops to zero for redundancy group 1, meaning it will only be used as a last resort as the primary for redundancy group 1. Once the cable is restored, everything returns to normal.

{primary:node0}
root@SRX210-A> show chassis cluster interfaces

Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Down        Not configured

Interface Monitoring:
Interface         Weight    Status    Redundancy-group
fe-2/0/2          255       Up        1
fe-0/0/2          255       Down      1

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   254         primary        no       no
node1                   1           secondary      no       no

Redundancy group: 1 , Failover count: 2
node0                   0           secondary      no       no
node1                   1           primary        no       no

{primary:node0}
root@SRX210-A>

In the next example, a second reth is created, and each monitored interface is assigned a weight of 128, so that two interfaces must fail before the redundancy group fails over:

{primary:node0}
root@SRX210-A# set interfaces fe-0/0/3 fastether-options redundant-parent reth1

{primary:node0}
root@SRX210-A# set interfaces fe-2/0/3 fastether-options redundant-parent reth1

{primary:node0}
root@SRX210-A# set interfaces reth0 redundant-ether-options redundancy-group 1

{primary:node0}
root@SRX210-A# set interfaces reth1 redundant-ether-options redundancy-group 1

{primary:node0}
root@SRX210-A# set interfaces reth1.0 family inet address 172.17.0.1/24

{primary:node0}
root@SRX210-A# show interfaces ## Truncated to only show these interfaces
fe-0/0/3 {
fastether-options {
redundant-parent reth1;
}
}
fe-2/0/3 {
fastether-options {
redundant-parent reth1;
}
}
reth1 {
redundant-ether-options {
redundancy-group 1;
}
unit 0 {
family inet {
}
}
}

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set interface-monitor fe-0/0/2 weight 128

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# set interface-monitor fe-2/0/2 weight 128

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A# show
node 0 priority 254;
node 1 priority 1;
interface-monitor {
fe-0/0/2 weight 128;
fe-2/0/2 weight 128;
}

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX210-A#

{primary:node0}
root@SRX210-A> show chassis cluster interfaces

Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Up          1

Interface Monitoring:
Interface         Weight    Status    Redundancy-group
fe-2/0/2          128       Up        1
fe-0/0/2          128       Up        1

{primary:node0}
root@SRX210-A>

Both interfaces are needed to trigger a failover. The next sequence shows node 0 losing one interface from each of its reths, which causes a failover to node 1.

{primary:node0}
root@SRX210-A# show chassis cluster redundancy-group 1
node 0 priority 254;
node 1 priority 1;
interface-monitor {
fe-0/0/2 weight 128;
fe-0/0/3 weight 128;
}

{primary:node0}
root@SRX210-A#

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                  Priority      Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   254         primary        no       no
node1                   1           secondary      no       no

Redundancy group: 1 , Failover count: 3
node0                   254         primary        no       no
node1                   1           secondary      no       no

{primary:node0}
root@SRX210-A> show chassis cluster interfaces

Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Up          1

Interface Monitoring:
Interface         Weight    Status    Redundancy-group
fe-0/0/3          128       Up        1
fe-0/0/2          128       Up        1

{primary:node0}
root@SRX210-A>

{primary:node0}
root@SRX210-A> show chassis cluster interfaces

Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Up          1

Interface Monitoring:
Interface         Weight    Status    Redundancy-group
fe-0/0/3          128       Down      1
fe-0/0/2          128       Down      1

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                Priority       Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   254         primary        no       no
node1                   1           secondary      no       no

Redundancy group: 1 , Failover count: 4
node0                   0           secondary      no       no
node1                   1           primary        no       no

{primary:node0}
root@SRX210-A>

Here it required both interfaces to go down to fail over to the other node.

Only physical interfaces can be monitored. The reths themselves can’t be monitored.

Interface monitoring should be done on nonzero redundancy groups and not on the control plane, because best practice urges you to only allow the control plane to fail over in the event of a hard failure.

### IP Monitoring

IP monitoring allows for the monitoring of upstream gateways. The ping probe validates the entire end-to-end path from the SRX to the remote node and back. The feature is typically used to monitor the SRX’s next-hop gateway, ensuring the gateway is ready to accept packets from the SRX. This is key because the SRX’s link to its local switch could be working while the upstream devices are not.

IP monitoring is configured per redundancy group and has some similarities to interface monitoring. It also uses weights, and when the weights accumulate to meet the redundancy group threshold, a failover is triggered. With IP monitoring, however, the SRX monitors remote gateways rather than interfaces.

In each redundancy group there are four global options that affect all of the hosts that are to be monitored:

• The first option is the global weight. This is the weight that is subtracted from the redundancy group weight for all of the hosts being monitored.

• The second option is the global threshold. This is the number that needs to be met or exceeded by all of the cumulative weights of the monitored IPs to trigger a failover.

• The third option is the retry count: the number of consecutive failed probes required before a monitored IP is declared unreachable. The minimum setting is five retries.

• The fourth option is the retry interval, which specifies the number of seconds between probes. The default is one second.
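One way to read the interaction of these options is sketched below. This is an interpretation of the behavior described above for illustration, not a statement of jsrpd’s implementation:

```python
RG_THRESHOLD = 255  # redundancy group threshold shared with interface monitoring

def ip_monitoring_fails_rg(ip_weights, unreachable, global_weight, global_threshold):
    # Weights of unreachable hosts accumulate against the global threshold...
    failed = sum(w for ip, w in ip_weights.items() if ip in unreachable)
    # ...and once met, the global weight counts against the RG threshold.
    return failed >= global_threshold and global_weight >= RG_THRESHOLD

# Two hosts at weight 128 with global-weight/global-threshold of 255,
# matching the example configured later in this section:
ips = {"1.2.3.4": 128, "1.3.4.5": 128}
assert not ip_monitoring_fails_rg(ips, {"1.2.3.4"}, 255, 255)         # 128 < 255
assert ip_monitoring_fails_rg(ips, {"1.2.3.4", "1.3.4.5"}, 255, 255)  # 256 >= 255
```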

Here the configuration options can be seen using the help prompt.

root@SRX5800-1# set redundancy-group 1 ip-monitoring ?
Possible completions:
+ apply-groups         Groups from which to inherit configuration data
+ apply-groups-except  Don't inherit configuration data from these groups
> family               Define protocol family
global-threshold     Define global threshold for IP monitoring (0..255)
global-weight        Define global weight for IP monitoring (0..255)
retry-count          Number of retries needed to declare reachablity failure
(5..15)
retry-interval       Define the time interval in seconds between retries.
(1..30)
{primary:node0}[edit chassis cluster]
root@SRX5800-1#

These IP monitoring options can be overwhelming, but they are designed to give the user more flexibility. The redundancy group can be configured to fail over if one or more of the monitored IPs fail or if a combination of the monitored IPs and interfaces fail.

In the next example, two monitored IPs are configured; both of them need to fail to trigger a redundancy group failover. The SRX uses routing to resolve which interface should be used to ping each remote host (as of Junos 10.1, the probes can also cross virtual routers).

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring family inet 1.2.3.4 weight 128

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring family inet 1.3.4.5 weight 128

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# show
node 0 priority 200;
node 1 priority 100;
ip-monitoring {
global-weight 255;
global-threshold 255;
family {
inet {
1.2.3.4 weight 128;
1.3.4.5 weight 128;
}
}
}

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1#

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# run show chassis cluster ip-monitoring status
node0:
--------------------------------------------------------------------------

Redundancy group: 1

IP address   Status       Failure count  Reason
1.3.4.5      unreachable    1            redundancy-group state unknown
1.2.3.4      unreachable    1            redundancy-group state unknown

node1:
---------------------------------------------------------------------

Redundancy group: 1

IP address    Status       Failure count  Reason
1.3.4.5       unreachable    1            redundancy-group state unknown
1.2.3.4       unreachable    1            redundancy-group state unknown

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# run show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   200         primary        no       no
node1                   100         secondary      no       no

Redundancy group: 1 , Failover count: 1
node0                   0           primary        no       no
node1                   0           secondary      no       no

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1#

The next example uses a combination of IP monitoring and interface monitoring, and shows how the combined weight of the two triggers a failover.

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# show
node 0 priority 200;
node 1 priority 100;
interface-monitor {
xe-6/1/0 weight 255;
}
ip-monitoring {
global-weight 255;
global-threshold 255;
family {
inet {
1.2.3.4 weight 128;
}
}
}

{primary:node0}
root@SRX5800-1# run show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   200         primary        no       no
node1                   100         secondary      no       no

Redundancy group: 1 , Failover count: 2
node0                   200         secondary      no       no
node1                   100         primary        no       no

{primary:node0}
root@SRX5800-1# run show chassis cluster ip-monitoring status
node0:
-------------------------------------------------------------------

Redundancy group: 1

IP address  Status     Failure count  Reason
1.2.3.4     unreachable  1            redundancy-group state unknown

node1:
--------------------------------------------------------------------

Redundancy group: 1

IP address  Status     Failure count  Reason
1.2.3.4     unreachable  1            redundancy-group state unknown

{primary:node0}
root@SRX5800-1# run show chassis cluster interfaces ?
Possible completions:
<[Enter]>            Execute this command
|                    Pipe through a command
{primary:node0}
root@SRX5800-1# run show chassis cluster interfaces

Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Up          1
reth2        Down        1
reth3        Up          1

Interface Monitoring:
Interface         Weight    Status    Redundancy-group
xe-6/1/0          128       Up        1

{primary:node0}
root@SRX5800-1#

Here the ping for IP monitoring is sourced from the node that is active for the reth, using the IP address configured on the specified interface. Optionally, a secondary IP address can be configured; the probe is then sent from that secondary address out of the backup interface, allowing the administrator to validate the backup path from the secondary node. This ensures that the backup path is working before a failover occurs. Let’s configure this option; it only takes one additional step per monitored IP.

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# set ip-monitoring family inet 1.2.3.4 weight 255 interface reth0.0 secondary-ip-address 1.0.0.10

{primary:node0}[edit chassis cluster redundancy-group 1]
root@SRX5800-1# show
node 0 priority 200;
node 1 priority 100;
ip-monitoring {
global-weight 255;
global-threshold 255;
family {
inet {
1.2.3.4 {
weight 255;
interface reth0.0;
secondary-ip-address 1.0.0.10;
}
}
}
}

{primary:node0}
root@SRX5800-1# run show chassis cluster ip-monitoring status
node0:
----------------------------------------------------------------------

Redundancy group: 1

IP address                Status        Failure count  Reason
1.2.3.4                   unreachable     0            no route to host

node1:
----------------------------------------------------------------------

Redundancy group: 1

IP address                Status        Failure count  Reason
1.2.3.4                   unreachable     0            no route to host

{primary:node0}
root@SRX5800-1#

The SRX5000 Series products support up to 64 monitored IPs, and the SRX3000 Series supports 32. The ping is generated from the second SPU on the system (the first non-CP SPU), so it is not subject to the scheduling or processing restrictions found on the RE. The branch devices operate slightly differently: best practice is to limit the total number of monitored hosts to two, because the more hosts you add, the harder it is to ensure the device has the processing capacity to monitor them.

### Hardware Monitoring

On the SRX, there is a daemon running called chassisd. This process is designed to run and control the system hardware, and it is also used to monitor for faults. If the chassisd determines that the system has experienced specific faults, it will trigger a failover to the other node. Depending on the SRX platform, various components can fail before a complete failover is triggered.

The majority of the branch platforms are not component-based. This means the entire system consists of a single board, and if anything were to go wrong on the main board, generally the complete system would fail. The branch SRX devices also have interface cards, and if the cards fail, the local interfaces are lost. Interface monitoring can be used to detect if the interface has failed.

The data center devices are a different story. These devices have many different boards and system components, and because of this, the failover scenarios can get fairly complex. Both Juniper Networks and customers thoroughly test the reliability of the devices, and each component is failed in a matrix of testing scenarios to ensure that failovers are correctly covered.

#### Routing engine

The RE is the local brain of a chassis. Its job is to maintain control over the local cards in the chassis, ensure that all of them are up and running, and allow the administrator to manage the device. If the RE fails, it can no longer control the local chassis, and if that RE was the primary for the cluster, the secondary RE will wait until enough heartbeats are missed that it assumes mastership.

During this period, the local chassis will continue to forward traffic (the data plane without an RE will continue to run for up to three minutes), but as soon as the other RE contacts the SPUs, they will no longer process traffic. By this time, the secondary data plane will have taken over the traffic.

In the event that the secondary RE fails, that chassis immediately becomes lost. After the heartbeat threshold is passed, the primary RE will assume the other chassis has failed, and any active traffic running on the chassis in redundancy groups will fail over to the remaining node. Traffic that used local interfaces must use another protocol, such as OSPF, to fail over to the other node.

#### Switch control board

The switch control board is a component that is unique to the SRX5000 Series. This component contains three important systems: the switch fabric, the control plane network, and the carrier slot for the RE. It’s a fairly complex component, as it effectively connects everything in the device. The SRX5600 requires one SCB and can have a second for redundancy. The SRX5800 requires two SCBs and can have a third for redundancy.

If an SCB fails in the SRX5600, the system fails over to the second SCB. The failover, however, causes a brief blip in traffic before things start moving along again. The second SCB also requires the use of a local RE, the same simple RE that is used to bring up dual control links. The second RE is needed to activate the local control plane switching chip on the second SCB; without it, the RE would be unable to talk to the rest of the chassis.

The SRX5800’s behavior is different because, by default, it has two SCBs. Both are required to provide full throughput to the entire chassis; if either fails, the throughput of every path in the box is halved until a new SCB is brought online. The same conditions as the SRX5600 also apply here: if the SCB containing the RE were to fail, a secondary RE would need to be in the second SCB to provide the backup control network for the RE to communicate. If a third SCB is installed, it will take over for either of the failed SCBs, although it cannot provide a redundant control link because it cannot contain an RE, and the switchover to the third SCB briefly interrupts traffic.

Now, all of this should pose a question to the careful reader: if the RE is contained in an SCB and the SCB fails, will this affect the RE? The answer depends on the type of failure. If the fabric chips fail, the RE will be fine, as the SCB simply extends the connections from the backplane into the RE. The engineers put the RE in the SCB to conserve slots in the chassis and reserve them for traffic processing cards. It is possible for an SCB to fail in such a way that it disables the RE; it's unlikely, but possible.

#### Switch fabric board

The SFB is a component unique to the SRX3000 Series platform. It contains the switch fabric, the primary control plane switch, the secondary control plane switch, an interface card, and the control ports. If this component were to fail, the chassis would effectively be lost. The SFB’s individual components can fail as well, causing various levels of device degradation. In the end, once the integrity of the card is lost, the services residing in that chassis will fail over to the remaining node.

#### Services Processing Card/Next Generation Services Processing Card

The SPC contains one, two, or four SPUs, depending on the model of the SRX. Each SPU is monitored directly by the chassisd process on the SRX's local RE. If any SPU fails, several events immediately occur. The RE resets all of the cards on the data plane, including interfaces and NPCs. Such an SPU failure causes the chassis monitoring threshold to hit 255, which in turn causes all of the data plane services to fail over to the secondary chassis. Messages relating to failing SPUs can be seen in the jsrpd logs. The entire data plane is reset because it is easier to ensure that everything is up and running after a clean restart than to validate many individual subsystems; each subsystem is validated as part of the clean restart.

#### Network Processing Card

A separate NPC is unique to the SRX3000 Series (on the SRX5000, this functionality resides on the interface cards). The NPCs were separated out to lower component costs and the overall cost of the chassis. The SRX3000 statically binds each interface to an NPC, so if an NPC fails, the interfaces bound to it are effectively lost, as they no longer have access to the switching fabric. The chassis can detect this using simple health checks; alternatively, IP monitoring can be used to validate the next hop. The IP monitoring probe is sent from the SPC and then through the NPC, and because the NPC has failed, the probes will not make it out of the chassis. At this point, IP monitoring triggers a failover to the other node. The NPC failure ultimately triggers a failover to the remaining node in the cluster, and the chassis with the failure restarts all of its cards. If the failed NPC is unable to restart, its interfaces are mapped to the remaining NPCs, assuming there are any. Although the device can run in this degraded state, it's best to leave all of the traffic on the good node and replace the failed component.
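IP monitoring of the sort described here is configured per redundancy group. The following is a minimal sketch; the monitored address, weights, and interface names are hypothetical, and the exact knobs can vary by Junos release:

{primary:node0}[edit]
root@SRX3600-A# set chassis cluster redundancy-group 1 ip-monitoring global-weight 255
root@SRX3600-A# set chassis cluster redundancy-group 1 ip-monitoring global-threshold 100
root@SRX3600-A# set chassis cluster redundancy-group 1 ip-monitoring family inet 10.1.1.1 weight 255
root@SRX3600-A# set chassis cluster redundancy-group 1 ip-monitoring family inet 10.1.1.1 interface reth0.0 secondary-ip-address 10.1.1.101

In this sketch, if probes to 10.1.1.1 fail, the monitored object's weight is deducted, and once the global threshold is crossed, redundancy group 1 fails over.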

#### Interface card

The SRX data center devices both use interface cards, often referred to as input/output cards (IOCs), but there are stark differences between the two platforms. The IOCs on the SRX3000 contain a switching chip (used to connect multiple interfaces to a single bus) and a Field-Programmable Gate Array (FPGA) to connect into the fabric. The IOCs on the SRX5000s contain two or more sets of NPUs, fabric connect chips, and physical interfaces. If an SRX5000 Series interface card fails and it does not contain a monitored interface or the only fabric link, the SRX relies on the administrator to have configured interface monitoring or IP monitoring to detect the failure. The same is true of the SRX3000 Series platforms. On the SRX5000 Series, it is also possible to hot-swap interface cards, whereas the SRX3000 requires that the chassis be powered off to replace a card.
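Interface monitoring is configured per redundancy group by assigning a weight to each watched interface; when the cumulative weight of down interfaces reaches 255, the group fails over. A minimal sketch follows, in which the interface names and weights are hypothetical:

{primary:node0}[edit]
root@SRX5800-1# set chassis cluster redundancy-group 1 interface-monitor xe-6/0/0 weight 255
root@SRX5800-1# set chassis cluster redundancy-group 1 interface-monitor xe-6/1/0 weight 255

With a weight of 255 on each interface, the loss of either interface alone is enough to trigger a failover of redundancy group 1.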

#### Power supplies

It's obvious that if the device's sole source of power fails, the device shuts off. The remaining node then performs Dead Peer Detection (DPD), using jsrpd heartbeats, to determine whether the other node is alive. If the remaining node is primary for the control and data planes, it continues to forward traffic as is, noting that the other node is down because it can no longer communicate with it. If the remaining node was secondary, it waits until all of the heartbeats are missed before determining that the other node has failed. Once the heartbeat threshold is passed, it assumes mastership.
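The heartbeat behavior that drives this detection is tunable under the chassis cluster hierarchy. As a sketch, the values shown here are the typical defaults (a 1,000-ms interval and a threshold of three missed heartbeats, for roughly three seconds of detection time); consult the documentation for the supported ranges on your release:

{primary:node0}[edit]
root@SRX210-A# set chassis cluster heartbeat-interval 1000
root@SRX210-A# set chassis cluster heartbeat-threshold 3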

For devices with redundant power supplies, the remaining supply powers the chassis and it continues to operate. This applies to the SRX650 and the SRX3400. The SRX3600 has up to four power supplies and requires at least two to operate; the other two provide redundancy, so in the best availability deployment, all four should be deployed.

The SRX5000 Series devices each have up to four power supplies; a minimum of three is suggested. Depending on the number of cards running in the chassis, a single power supply may suffice, but if the total draw from the installed components exceeds the available power, all of the cards are turned off, and the RE keeps attempting to start them until enough power is available. It's always best to deploy the SRXs with the maximum number of power supplies to ensure availability.

### Software Monitoring

The SRX is set up to monitor its running software, on both the control and data planes. The SRX attempts to detect a failure within the system as soon as it happens so that it can react accordingly. The SRX platform has some fairly complex internals, but it is built to protect against failures. If a process on the RE fails, the RE can restart it, and the failure is logged for additional troubleshooting.

The branch data plane consists of a single core flowd process. The RE is in constant communication with it to verify that it is operating correctly. In the event that the flowd process crashes or hangs, the data plane quickly fails over to the other node, in less time than it would take to detect a dead node. In any failure case where the two nodes are still in communication, the failover time is quite fast; these cases include IP monitoring, manual failover, and interface monitoring.

On the data center SRX's data plane, each SPU runs both control and data software. The RE talks directly to each SPU's control software for status updates and configuration changes, so the RE will know if the data plane fails. If a flowd process crashes on the data plane (there is one per SPU), the entire data plane is hard-reset, meaning all of the line cards are reset. This is done to ensure that the data plane is brought back up to an acceptable running standard. While this occurs, the data plane services fail over to the secondary node.

### Preserving the Control Plane

If a device is set up to fail over too rapidly, it can jump the gun and attempt a failover for no good reason. When it's time to move between two firewalls, it's best to ensure that a failover is actually warranted. Dynamic routing offers methods for extremely fast failover using a protocol called Bidirectional Forwarding Detection (BFD). Used in conjunction with a routing protocol such as OSPF, BFD can provide 50-ms failovers. That is extremely fast, yet it poses little threat to the network: BFD is rerouting around a link or device failure, typically in a stateless manner, and because the rerouting is stateless, there is little risk to the traffic.
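As a point of comparison, enabling BFD on an OSPF interface might look like the following sketch; the interface name and timers are hypothetical, and a 300-ms minimum interval with a multiplier of 3 yields roughly subsecond detection (lower intervals approach the 50-ms figure where the platform supports them):

{primary:node0}[edit]
root@SRX5800-1# set protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection minimum-interval 300
root@SRX5800-1# set protocols ospf area 0.0.0.0 interface reth0.0 bfd-liveness-detection multiplier 3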

When a stateful firewall does a failover, there is much more in play than simply rerouting traffic. The new device needs to accept all of the traffic and match up the packets with the existing sessions that are synchronized to the second node. Also, the primary device needs to relinquish control of the traffic. On the data plane, it’s a fairly stable process to fail over and fail back between nodes. In fact, this can be done rapidly and nearly continuously without much worry. It’s best to let the control plane fail over only in the event of a failure, as there simply isn’t a need to fail over the control plane unless a critical event occurs.
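A manual failover of the data plane, as described here, is initiated and later cleared with operational-mode commands. As a sketch, moving redundancy group 1 to node 1 and then clearing the manual failover flag looks like this:

{primary:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 1 node 1

{primary:node0}
root@SRX210-A> request chassis cluster failover reset redundancy-group 1

The reset is needed because a manual failover pins the group's mastership until the flag is cleared.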

The biggest reason for concern is that the control plane talks to the various daemons on the other chassis and on the data plane. If rapid failover were to occur, it's possible to destabilize the control plane. This is the exception, not the rule, just as the owner of a car isn't going to jam it into reverse on the highway. Often, administrators want to test the limits of the SRX, dropping them off shelves and whatnot, so it's fair to call this out as a warning before it's tested in production.

## Troubleshooting and Operation

From time to time, things can go wrong. You can be driving along in your car and a tire can blow out; sometimes a firewall can crash. Nothing made by humans is immune to unforeseen failure. Because of this, the administrator must be prepared to deal with the worst possible scenarios. In this section, we discuss methods the administrator can use to troubleshoot a chassis cluster gone awry.

### First Steps

There are a few commands to use when looking into an issue. The administrator first needs to identify the cluster status and determine whether the two nodes are communicating.

The show chassis cluster status command, although simple in nature, shows the administrator the status of the cluster. It shows which member is primary for each redundancy group and the status of the nodes, giving insight into which node should be passing traffic in the network. Here's a sample:

{primary:node1}
root@SRX210-B> show chassis cluster status
Cluster ID: 1
Node              Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                254         secondary      no       no
node1                1           primary        no       no

Redundancy group: 1 , Failover count: 2
node0                254         primary        no       no
node1                1           secondary      no       no

{primary:node1}
root@SRX210-B>

You have seen this output many times in this chapter, as it is used frequently. Things to look for here are that both nodes show up; both have a priority greater than zero; both have a status of primary, secondary, or secondary-hold; and one and only one node is primary for each redundancy group. Generally, if those conditions are met, the cluster is in good shape. If one of the nodes does not appear in this output, communication to that node has been lost; the administrator should then connect to the other node and verify that it can communicate.

To validate that the two nodes can communicate, use the show chassis cluster control-plane statistics command, which shows the messages being sent between the two members. The send and receive numbers should be incrementing on both nodes. If they are not, something might be wrong with the control link, the fabric link, or both. Here is an example with the statistics in bold:

{primary:node0}
root@SRX210-A> show chassis cluster control-plane statistics
Heartbeat packets sent: 124
Heartbeat packet errors: 0
Probes sent: 122
Probe errors: 0

{primary:node0}
root@SRX210-A>

Again, this command should be familiar, as it has been used throughout this chapter. If these (boldface) numbers are not increasing, check the fabric and control plane interfaces. The method for checking the fabric interfaces is the same across all SRX products.

Next let’s check the fabric links. It’s important to verify that the fabric link and the child links show they are in an up state.

{primary:node0}
root@SRX210-A> show interfaces terse
--snip--
fe-0/0/4.0              up    up   aenet    --> fab0.0
fe-0/0/5                up    up
fe-0/0/5.0              up    up   aenet    --> fab0.0
--snip--
fe-2/0/4.0              up    up   aenet    --> fab1.0
fe-2/0/5                up    up
fe-2/0/5.0              up    up   aenet    --> fab1.0
--snip--
fab0                    up    up
fab0.0                  up    up   inet     30.17.0.200/24
fab1                    up    up
fab1.0                  up    up   inet     30.18.0.200/24
--snip--
{primary:node0}
root@SRX210-A>

If any child link of the fabric interface (fab0 or fab1) shows a down state, that interface is physically down on the node and must be restored to reenable communications.

The control link is the most critical to verify, and the method varies per SRX platform type. On the branch devices, check the interface that is configured as the control link; Table 7-2 lists the control ports by platform. The procedure is the same as for any physical interface. Here is an example from an SRX210, showing that the specified interfaces are up.

{primary:node0}
root@SRX210-A> show interfaces terse
--snip--
fe-0/0/7                up    up
--snip--
fe-2/0/7                up    up
--snip--

{primary:node0}
root@SRX210-A>

On the data center SRXs, there is no direct way to check the state of the control ports; because the ports hang off dedicated switches inside the SRX and are not typical interfaces, it's not possible to check them directly. It is possible, however, to check the switch on the SCB to ensure that packets are being received from that card. Generally, if the port is up and configured correctly, there should be no reason why it won't communicate, but checking the internal switch confirms that packets are passing from the SPC to the RE. There will be other communications coming from the card as well, but this at least provides insight into the communication. To check, the node and the FPC that holds the control link must be known. In the following command, the specified port coincides with the FPC number of the SPC with the control port.

{primary:node0}
root@SRX5800-1> show chassis ethernet-switch statistics 1 node 0
node0:
------------------------------------------------------------------
Displaying port statistics for switch 0
Statistics for port 1 connected to device FPC1:
TX Packets 64 Octets        7636786
TX Packets 65-127 Octets    989668
TX Packets 128-255 Octets   37108
TX Packets 256-511 Octets   35685
TX Packets 512-1023 Octets  233238
TX Packets 1024-1518 Octets  374077
TX Packets 1519-2047 Octets  0
TX Packets 2048-4095 Octets  0
TX Packets 4096-9216 Octets  0
TX 1519-1522 Good Vlan frms  0
TX Octets                   9306562
TX Multicast Packets        24723
TX Single Collision frames  0
TX Mult. Collision frames   0
TX Late Collisions          0
TX Excessive Collisions     0
TX Collision frames         0
TX PAUSEMAC Ctrl Frames     0
TX MAC ctrl frames          0
TX Frame deferred Xmns      0
TX Frame excessive deferl   0
TX Oversize Packets         0
TX Jabbers                  0
TX FCS Error Counter        0
TX Fragment Counter         0
TX Byte Counter             1335951885
RX Packets 64 Octets        6672950
RX Packets 65-127 Octets    2226967
RX Packets 128-255 Octets   39459
RX Packets 256-511 Octets   34332
RX Packets 512-1023 Octets  523505
RX Packets 1024-1518 Octets  51945
RX Packets 1519-2047 Octets  0
RX Packets 2048-4095 Octets  0
RX Packets 4096-9216 Octets  0
RX Octets                   9549158
RX Multicast Packets        24674
RX FCS Errors               0
RX Align Errors             0
RX Fragments                0
RX Symbol errors            0
RX Unsupported opcodes      0
RX Out of Range Length      0
RX False Carrier Errors     0
RX Undersize Packets        0
RX Oversize Packets         0
RX Jabbers                  0
RX 1519-1522 Good Vlan frms 0
RX MTU Exceed Counter       0
RX Control Frame Counter    0
RX Pause Frame Counter      0
RX Byte Counter             999614473

{primary:node0}
root@SRX5800-1>

The output looks like standard port statistics from a switch, and examining it validates that packets are coming from the SPC. Because the SRX3000 has its control ports on the SFB, and there is nothing to configure for them, there is little to look at on the interface; it is best to focus on the result of the show chassis cluster control-plane statistics command.

If checking the interfaces yields mixed results, where they seem to be up but are not passing traffic, it's possible to reboot the node in the degraded state. The risk here is that the node could come up in split brain. Because that is a possibility, it's best to disable all of its interfaces, except the control and fabric links, either in the configuration or physically; the ports can even be disabled on the switch to which they are connected. This way, if the node determines on boot that it is master, it will not interrupt traffic. A correctly operating node using the minimal control port and fabric port configuration should be able to communicate with its peer. If, after a reboot, it still cannot communicate with the other node, verify the configuration and cabling. Finally, the chassis itself or the cluster interfaces might be bad.
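As a sketch of the interface-disable approach (the interface names here are hypothetical), the revenue ports can be administratively disabled before the reboot and reenabled once the cluster has reformed:

{primary:node1}[edit]
root@SRX210-B# set interfaces fe-0/0/2 disable
root@SRX210-B# set interfaces fe-0/0/3 disable
root@SRX210-B# commit and-quit

{primary:node1}
root@SRX210-B> request system reboot

After the node rejoins the cluster cleanly, delete the disable statements and commit again to return the ports to service.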

### Checking Interfaces

For the SRX to do its job, its interfaces must be up and able to pass traffic. The SRX can use both local and redundant Ethernet interfaces, and for our purposes here, both are troubleshot in similar ways.

To troubleshoot an interface, first check to see if the interface is physically up. Use the show interfaces terse command to quickly see all of the interfaces in both chassis.

{primary:node0}
root@SRX210-A> show interfaces terse
ge-0/0/0                up    down
ge-0/0/1                up    down
fe-0/0/2                up    up

This should be familiar if you've been reading through this chapter, and certainly throughout the book. The other item to check is the status of the reth within its redundancy group, to see whether the interface is up or down inside the reth. It's possible for the reth to be physically up but logically down (in the event of an issue on the data plane). To check the status of a reth interface, use the show chassis cluster interfaces command.

root@SRX210-A> show chassis cluster interfaces
Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Up          1

Interface Monitoring:
Interface         Weight    Status    Redundancy-group
fe-0/0/2          255       Up        1
fe-2/0/2          255       Up        1

{primary:node0}
root@SRX210-A>

If the interfaces are physically up but the redundant interfaces show that they are in a down state, it’s time to look at the data plane.

### Verifying the Data Plane

The data plane on the SRX passes and processes traffic. Because it is an independent component from the RE, it could be down while the administrator is still logged in to the RE. There are a few things to check on the SRX to validate the data plane.

Because the data plane is very different between the branch SRX platform and the data center platform, there will be some variance between the commands.

Verifying the FPCs and PICs is the first step; this shows the status of the underlying hardware that must be up to process data traffic. On the branch SRX, the data plane is a single multithreaded process, so running the show chassis fpc pic-status command shows the status of the data plane.

root@SRX210-A> show chassis fpc pic-status
node0:
---------------------------------------------------------------------
Slot 0   Online       FPC
PIC 0  Online       2x GE, 6x FE, 1x 3G

node1:
---------------------------------------------------------------------
Slot 0   Online       FPC
PIC 0  Online       2x GE, 6x FE, 1x 3G

{primary:node0}
root@SRX210-A>

This output is from an SRX210, but the command lists the status of the data plane on any SRX. Here it shows a single FPC and a single PIC. Although the output does not mention flowd or the data plane directly, it shows that the SRX is up and ready to pass traffic.

Now let’s show node 1 with a failed data plane.

{primary:node0}
root@SRX210-A> show chassis fpc pic-status
node0:
---------------------------------------------------------------------
Slot 0   Online       FPC
PIC 0  Online       2x GE, 6x FE, 1x 3G

node1:
---------------------------------------------------------------------
Slot 0   Offline      FPC

{primary:node0}
root@SRX210-A>

Here, node 1's data plane went offline, caused by the loss of the flowd process. When this occurs, redundancy groups 1 and greater will also show node 1's priority as zero (to be discussed in the next section).

The output of the pic status command should correlate with the hardware that is in the chassis, which can be seen in the output of show chassis hardware.

{primary:node0}
root@SRX210-A> show chassis hardware
node0:
---------------------------------------------------------------------
Hardware inventory:
Item             Version  Part number  Serial number  Description
Routing Engine   REV 28   750-021779   AAAH2307       RE-SRX210-HIGHMEM
FPC 0                                                 FPC
PIC 0                                               2x GE, 6x FE, 1x 3G
Power Supply 0

node1:
---------------------------------------------------------------------
Hardware inventory:
Item             Version  Part number  Serial number  Description
Routing Engine   REV 28   750-021779   AAAH4743       RE-SRX210-HIGHMEM
FPC 0                                                 FPC
PIC 0                                               2x GE, 6x FE, 1x 3G
Power Supply 0

{primary:node0}
root@SRX210-A>

Here the command shows the hardware for PIC 0, which matches what the pic status command reported. This command is more useful on the data center platforms, where the hardware is more complex and there are typically many different processors.

For example, here’s the PIC status of an SRX5800:

{primary:node0}
root@SRX5800-1> show chassis fpc pic-status
node0:
--------------------------------------------------------------------
Slot 0   Online       SRX5k SPC
PIC 0  Online       SPU Cp
PIC 1  Online       SPU Flow
Slot 1   Online       SRX5k SPC
PIC 0  Online       SPU Flow
PIC 1  Online       SPU Flow
Slot 3   Online       SRX5k SPC
PIC 0  Online       SPU Flow
PIC 1  Online       SPU Flow
Slot 6   Online       SRX5k DPC 4X 10GE
PIC 0  Online       1x 10GE(LAN/WAN) RichQ
PIC 1  Online       1x 10GE(LAN/WAN) RichQ
PIC 2  Online       1x 10GE(LAN/WAN) RichQ
PIC 3  Online       1x 10GE(LAN/WAN) RichQ
Slot 11  Online       SRX5k DPC 40x 1GE
PIC 0  Online       10x 1GE RichQ
PIC 1  Online       10x 1GE RichQ
PIC 2  Online       10x 1GE RichQ
PIC 3  Online       10x 1GE RichQ

node1:
--------------------------------------------------------------------
Slot 0   Online       SRX5k SPC
PIC 0  Online       SPU Cp
PIC 1  Online       SPU Flow
Slot 1   Online       SRX5k  SPC
PIC 0  Online       SPU Flow
PIC 1  Online       SPU Flow
Slot 3   Online       SRX5k SPC
PIC 0  Online       SPU Flow
PIC 1  Online       SPU Flow
Slot 6   Online       SRX5k DPC 4X 10GE
PIC 0  Online       1x 10GE(LAN/WAN) RichQ
PIC 1  Online       1x 10GE(LAN/WAN) RichQ
PIC 2  Online       1x 10GE(LAN/WAN) RichQ
PIC 3  Online       1x 10GE(LAN/WAN) RichQ
Slot 11  Online       SRX5k DPC 40x 1GE
PIC 0  Online       10x 1GE RichQ
PIC 1  Online       10x 1GE RichQ
PIC 2  Online       10x 1GE RichQ
PIC 3  Online       10x 1GE RichQ

{primary:node0}
root@SRX5800-1>

Here the command shows the SPCs that are online, which SPU is the CP, and the interface cards. A correctly operating device should have all of its SPCs online and, unless they are disabled, its interfaces online as well. Cards that have not finished booting show as Offline or Present. On a data center SRX, it can take up to five minutes for the data plane to completely start up. As the cards come online, messages such as the following are printed to the command prompt. They should appear only once during the boot process and are then logged to the messages file.

{primary:node0}
root@SRX5800-1>
Message from syslogd@SRX5800-1 at Mar 13 22:01:48  ...
SRX5800-1 node0.fpc1.pic0 SCHED: Thread 4 (Module Init) ran for 1806 ms without
yielding

Message from syslogd@SRX5800-1 at Mar 13 22:01:49  ...
SRX5800-1 node0.fpc1.pic1 SCHED: Thread 4 (Module Init) ran for 1825 ms without
yielding

{primary:node0}
root@SRX5800-1>

If these messages keep appearing on the CLI, the SPUs are constantly restarting, which indicates a problem, perhaps that not enough power is reaching the data plane.

Next, let's look at the hardware, which should match up with the output of the show chassis fpc pic-status command in the previous example. This lists all of the FPCs that are SPCs, and the administrator should be able to match up which PICs should be online and active.

{primary:node0}
root@SRX5800-1> show chassis hardware
node0:
-----------------------------------------------------------------------
Hardware inventory:
Item             Version  Part number  Serial number  Description
Chassis                                JN112A0AEAGA   SRX 5800
Midplane         REV 01   710-024803   TR8821         SRX 5800 Backplane
FPM Board        REV 01   710-024632   WX3786         Front Panel Display
PDM              Rev 03   740-013110   QCS12365066    Power Distribution Module
PEM 0            Rev 01   740-023514   QCS1233E066    PS 1.7kW; 200-240VAC in
PEM 1            Rev 01   740-023514   QCS1233E02V    PS 1.7kW; 200-240VAC in
PEM 2            Rev 01   740-023514   QCS1233E02E    PS 1.7kW; 200-240VAC in
Routing Engine 0 REV 03   740-023530   9009007746     RE-S-1300
CB 0             REV 03   710-024802   WX5793         SRX5k SCB
CB 1             REV 03   710-024802   WV8373         SRX5k SCB
FPC 0            REV 12   750-023996   XS7597         SRX5k SPC
CPU            REV 03   710-024633   XS6648         SRX5k DPC PMB
PIC 0                   BUILTIN      BUILTIN        SPU Cp
PIC 1                   BUILTIN      BUILTIN        SPU Flow
FPC 1            REV 08   750-023996   XA7212         SRX5k SPC
CPU            REV 02   710-024633   WZ0740         SRX5k DPC PMB
PIC 0                   BUILTIN      BUILTIN        SPU Flow
PIC 1                   BUILTIN      BUILTIN        SPU Flow
FPC 3            REV 12   750-023996   XS7625         SRX5k SPC
CPU            REV 03   710-024633   XS6820         SRX5k DPC PMB
PIC 0                   BUILTIN      BUILTIN        SPU Flow
PIC 1                   BUILTIN      BUILTIN        SPU Flow
FPC 6            REV 17   750-020751   WY2754         SRX5k DPC 4X 10GE
CPU            REV 02   710-024633   WY3706         SRX5k DPC PMB
PIC 0                   BUILTIN      BUILTIN        1x 10GE(LAN/WAN) RichQ
Xcvr 0       REV 02   740-011571   C831XJ039      XFP-10G-SR
PIC 1                   BUILTIN      BUILTIN        1x 10GE(LAN/WAN) RichQ
Xcvr 0       REV 01   740-011571   C744XJ021      XFP-10G-SR
PIC 2                   BUILTIN      BUILTIN        1x 10GE(LAN/WAN) RichQ
PIC 3                   BUILTIN      BUILTIN        1x 10GE(LAN/WAN) RichQ
FPC 11           REV 14   750-020235   WY8697         SRX5k DPC 40x 1GE
CPU            REV 02   710-024633   WY3743         SRX5k DPC PMB
PIC 0                   BUILTIN      BUILTIN        10x 1GE RichQ
--snip--
PIC 1                   BUILTIN      BUILTIN        10x 1GE RichQ
--snip--
PIC 2                   BUILTIN      BUILTIN        10x 1GE RichQ
--snip--
PIC 3                   BUILTIN      BUILTIN        10x 1GE RichQ
Xcvr 0       REV 01   740-013111   8280380        SFP-T
--snip--
Fan Tray 0       REV 05   740-014971   TP8104         Fan Tray
Fan Tray 1       REV 05   740-014971   TP8089         Fan Tray

{primary:node0}
root@SRX5800-1>

### Core Dumps

A core dump occurs when things have gone wrong and a process crashes; the memory for the process is then dumped to local storage. On the SRX, core dumps are stored in several different directories on the local RE. Here's an example of how to find them:

{primary:node0}
root@SRX5800-1> show system core-dumps
node0:
--------------------------------------------------------------------
/var/crash/*core*: No such file or directory
/var/tmp/*core*: No such file or directory
/var/crash/kernel.*: No such file or directory
/tftpboot/corefiles/*core*: No such file or directory

node1:
-----------------------------------------------------------------------
/var/crash/*core*: No such file or directory
-rw-rw----  1 root  wheel   104611 Feb 26 22:22 /var/tmp/csh.core.0.gz
-rw-rw----  1 root  wheel   108254 Feb 26 23:11 /var/tmp/csh.core.1.gz
-rw-rw----  1 root  wheel   107730 Feb 26 23:11 /var/tmp/csh.core.2.gz
/var/crash/kernel.*: No such file or directory
/tftpboot/corefiles/*core*: No such file or directory
total 3

{primary:node0}
root@SRX5800-1>

If core dumps are found, there isn't much for the administrator to troubleshoot directly. Core dumps from csh, the C shell, can occur when a user presses Ctrl-C to terminate a program, and these can generally be ignored. However, if a core dump exists for flowd or another process, it should be reported to JTAC, as it might be an indicator of a more complex problem.

For most administrators, the following output is a disaster:

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node                  Priority      Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   254     primary        no       no
node1                   1       secondary      no       no

Redundancy group: 1 , Failover count: 1
node0                   254     primary        no       no
node1                   0       secondary      no       no

{primary:node0}
root@SRX210-A>

Seeing a priority of zero tends to leave administrators in a state of confusion, but the simple reason it occurs could be a problem on the data plane. Determining the problem can be difficult. Although some of the troubleshooting steps we already discussed can be helpful, you might try another: everything that happens with jsrpd is logged to the file jsrpd in the directory /var/log. You can view the file by using the show log jsrpd command. The contents vary based on the events that have occurred, but the file is typically quite readable.

There are some specific items to check for. The first is coldsync, the initial synchronization between the kernels on the two REs. A failed coldsync will cause the priority to be set to zero: if coldsync cannot complete, the coldsync monitoring weight is set to 255; once it completes, the weight is set to zero. Here's an example of a coldsync log:

{primary:node0}
root@SRX210-A> show log jsrpd | match coldsync

Apr 11 08:44:14 coldsync is completed for all the PFEs. cs monitoring weight
is set to ZERO
Apr 11 13:09:38 coldsync status message received from PFE: 0, status: 0x1
Apr 11 13:09:38 duplicate coldsync completed message from PFE: 0 ignored
Apr 11 13:09:38 coldsync is completed for all the PFEs. cs monitoring weight
is set to ZERO
Apr 11 13:11:20 coldsync status message received from PFE: 0, status: 0x1
Apr 11 13:11:20 duplicate coldsync completed message from PFE: 0 ignored
Apr 11 13:11:20 coldsync is completed for all the PFEs. cs monitoring weight
is set to ZERO
Apr 11 13:19:05 coldsync status message received from PFE: 0, status: 0x1
Apr 11 13:19:05 duplicate coldsync completed message from PFE: 0 ignored
Apr 11 13:19:05 coldsync is completed for all the PFEs. cs monitoring weight
is set to ZERO

If coldsync fails, there are two things to try. First, on either device, issue a commit full command, which will resend the complete configuration to the data and control planes (this might impact traffic as it reapplies all of the policies). The other option is to reboot the secondary node and let it attempt the coldsync process again. (As a last resort, read the next section.)
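Both options are sketched here as CLI sessions; the prompts and hostname follow the earlier examples, and your output may differ:

```
{primary:node0}
root@SRX210-A> configure
Entering configuration mode

{primary:node0}[edit]
root@SRX210-A# commit full

## Or, alternatively, reboot the secondary node to retry coldsync:

{secondary:node1}
root@SRX210-A> request system reboot
```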

In the logfile, the history of interfaces going up and down, node mastership, and other events are kept. Most of the events are quite obvious to administrators and should provide a road map to what happened on the device.

Additional information can be gathered by turning on traceoptions; just be aware that a lot of additional processing can be required, depending on which traceoptions you enable. If all events are enabled, the chassis process will spike to 100 percent utilization.

Do not enable traceoptions for more than a few minutes!

There have been countless cases where administrators left traceoptions enabled for all events, and all sorts of trouble occurred, from service outages to crashed devices, when traceoptions stayed active long enough.
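As a sketch of the workflow, jsrpd-related tracing can be enabled in configuration mode and, crucially, deleted again once the data is captured (the filename jsrpd-trace is our own choice, and the available flags vary by release):

```
{primary:node0}[edit]
root@SRX210-A# set chassis cluster traceoptions file jsrpd-trace
root@SRX210-A# set chassis cluster traceoptions flag all
root@SRX210-A# commit

## ...reproduce the problem, collect /var/log/jsrpd-trace, then:

root@SRX210-A# delete chassis cluster traceoptions
root@SRX210-A# commit
```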

### When All Else Fails

The SRX is a complex and feature-rich product, and Junos provides all sorts of configuration knobs that are not available on other products, all of it engineered with an appreciation that uptime is critical to any organization.

If the SRX is going to be deployed in a complex environment, the administrator should become familiar with the product before deployment. The administrator’s knowledge and understanding of the product is the first line of defense for ensuring that it will work in the environment. The more critical the environment, the more thorough an administrator should be in her testing and knowledge of the product. Before deployment, some administrators spend months learning and staging the SRX. Although that might seem excessive for your network’s needs, the most prepared administrators have the fewest issues. It’s one of the reasons we wrote this book, and hopefully, you’ve read this far into it.

There are other resources for studying and analyzing the SRX. For instance, the J-Net community allows users and product experts to communicate, sharing solutions and issues about Juniper products; it’s also a great place to learn what other users are doing. Another great resource is the juniper-nsp mailing list, which has been around for many years, and the SRX has become a popular topic there.

You might also look at a new and budding series of free Day One booklets from Juniper Networks that cover the SRX product line.

But truly, when all else fails, it’s a good idea to contact JTAC for support. When contacting JTAC, it’s important to provide the correct information. If you share the correct data with JTAC, they can quickly get to the root of the problem.

First, collect the output from the command request support information. The output can be quite large, so if possible, save it locally on the RE using request support information | save SupportInfo.txt, and then copy the file off the box:

{primary:node0}
root@SRX5800-1> file copy SupportInfo.txt scp://172.19.100.50:SupportInfo.txt
SupportInfo.txt      100% 7882     7.7KB/s   00:00

{primary:node0}
root@SRX5800-1>

JTAC might also request the contents of the /var/log directory. If possible, when opening a case, have the support information file, the /var/log contents, any core dumps, and a simple topology diagram readily available. By providing this, you will solve half the problem for JTAC in getting to the root of the issue. If some event occurs and it’s not reflected in the logs, there isn’t much JTAC can do. Be sure to document the event and share what was observed in the network. JTAC can take it from there and work with you to resolve the issue.
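One way to gather the /var/log contents is to bundle the directory into a single compressed archive with the file archive command; the destination filename here is our own choice:

```
{primary:node0}
root@SRX5800-1> file archive compress source /var/log destination /var/tmp/var-log.tgz
```

The resulting archive can then be copied off the box the same way as the support information file.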

### Manual Failover

Although the SRX controls which node is in charge of each redundancy group, sometimes an administrator needs to fail over a redundancy group manually, for maintenance or troubleshooting purposes, for example. Whatever the reason, it’s possible to manually fail over any of the redundancy groups. When a manual failover is executed, the SRX gives the new primary node a priority of 255 (you cannot configure this priority; it is used only for manual failovers).

The only event that can override a manual failover is a hard failure, such as the device itself failing. After performing a manual failover, it’s best to reset the manual failover flag so that the SRX can manage failovers on its own from there.

In this example, redundancy group 1 is failed over between the two chassis and then reset to the default state.

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node              Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0              254         primary        no       no
node1              1           secondary      no       no

Redundancy group: 1 , Failover count: 5
node0              254         primary        no       no
node1              1           secondary      no       no

{primary:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 1 node 1
node1:
----------------------------------------------------------------------

Initiated manual failover for redundancy group 1

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node             Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0              254         primary        no       no
node1              1           secondary      no       no

Redundancy group: 1 , Failover count: 6
node0              254         secondary      no       yes
node1              255         primary        no       yes

{primary:node0}
root@SRX210-A> request chassis cluster failover reset redundancy-group 1
node0:
---------------------------------------------------------------------
No reset required for redundancy group 1.

node1:
---------------------------------------------------------------------
Successfully reset manual failover for redundancy group 1

{primary:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 1 node 0
node0:
---------------------------------------------------------------------
Initiated manual failover for redundancy group 1

root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node             Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0              254         primary        no       no
node1              1           secondary      no       no

Redundancy group: 1 , Failover count: 7
node0              255         primary        no       yes
node1              1           secondary      no       yes

{primary:node0}
root@SRX210-A> request chassis cluster failover reset redundancy-group 1
node0:
---------------------------------------------------------------------
Successfully reset manual failover for redundancy group 1

node1:
---------------------------------------------------------------------
No reset required for redundancy group 1.

{primary:node0}
root@SRX210-A>

Here redundancy group 1 is failed over to node 1. As you can see, the priority is set to 255 and the manual failover flag is set. Once this flag is set, another manual failover cannot occur until it is cleared. Next, the failover is reset for redundancy group 1, using the request chassis cluster failover reset redundancy-group 1 command, allowing the redundancy group to be failed over again. Then the redundancy group is failed back to the original node and the manual failover flag is reset once more. If a hold-down timer is configured, a manual failover cannot occur until the timer has expired.

It is also possible to do this for the control plane. However, it’s best not to fail over the control plane rapidly, and best practice recommends that you use a 300-second hold-down timer to prevent excessive flapping of the control plane (as discussed in the section “Preserving the Control Plane” earlier in this chapter).

In this next manual failover example, redundancy group 0 is failed over, and then the hold-down timer prevents a second manual failover.

{primary:node0}
root@SRX210-A> show configuration chassis cluster
reth-count 2;
heartbeat-interval 2000;
heartbeat-threshold 8;
redundancy-group 0 {
    node 0 priority 254;
    node 1 priority 1;
    hold-down-interval 300;
}
redundancy-group 1 {
    node 0 priority 254;
    node 1 priority 1;
    interface-monitor {
        fe-2/0/2 weight 255;
        fe-0/0/2 weight 255;
    }
}

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node             Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0             254         primary        no       no
node1             1           secondary      no       no

Redundancy group: 1 , Failover count: 7
node0             254         primary        no       no
node1             1           secondary      no       no

{primary:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 0 node 1
node1:
---------------------------------------------------------------------
Initiated manual failover for redundancy group 0

{primary:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node              Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 2
node0              254         secondary-hold no       yes
node1              255         primary        no       yes

Redundancy group: 1 , Failover count: 7
node0              254         primary        no       no
node1              1           secondary      no       no

{secondary-hold:node0}
root@SRX210-A> request chassis cluster failover reset redundancy-group 0
node0:
----------------------------------------------------------------------
No reset required for redundancy group 0.

node1:
----------------------------------------------------------------------
Successfully reset manual failover for redundancy group 0

{secondary-hold:node0}
root@SRX210-A> request chassis cluster failover redundancy-group 0 node 0
node0:
----------------------------------------------------------------------
Manual failover is not permitted as redundancy-group 0 on node0 is in secondary-
hold state.

{secondary-hold:node0}
root@SRX210-A> show chassis cluster status
Cluster ID: 1
Node             Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 2
node0              254         secondary-hold no       no
node1              1           primary        no       no

Redundancy group: 1 , Failover count: 7
node0              254         primary        no       no
node1              1           secondary      no       no

{secondary-hold:node0}
root@SRX210-A>

Here redundancy group 0 is failed over from node 0 to node 1, just as before: the priority of the new primary is set to 255 and the manual failover flag is set to yes. However, node 0 now shows secondary-hold as its status, indicating that it is secondary but still on the hold-down timer; when the timer expires, the status changes to secondary. In the event of a critical failure on the primary device, the secondary-hold unit can still take over. Finally, another manual failover is attempted, and it fails because the node is still on its hold-down timer.

## Sample Deployments

The most common chassis cluster deployment on an SRX is active/passive. This type of deployment has many benefits that outweigh its drawbacks. An active/passive deployment offers resiliency in the event of a failover and is fairly easy to operate. The downside is that the backup box lies dormant until it is needed to step in for the primary device. The risk is that the backup device could develop a problem while waiting its turn, and if that happens, your network will go down. So when running an SRX active/passive cluster, you should routinely fail over between the devices to ensure both remain operational.

For our sample deployment, we show a typical SRX100 branch deployment. Figure 7-11 shows our example topology.

In this deployment, we have two subnets: Trust and Untrust. The Untrust subnet is 10.0.2.0/24 and the Trust subnet is 10.0.1.0/24. We will implement one reth interface for each subnet. We will also utilize fxp0 interfaces for management. This is how our interfaces are configured:

{secondary:node1}
root@SRX-HA-1# show interfaces | display inheritance
fe-0/0/3 {
    fastether-options {
        redundant-parent reth0;
    }
}
fe-0/0/4 {
    fastether-options {
        redundant-parent reth1;
    }
}
fe-1/0/3 {
    fastether-options {
        redundant-parent reth0;
    }
}
fe-1/0/4 {
    fastether-options {
        redundant-parent reth1;
    }
}
fab0 {
    fabric-options {
        member-interfaces {
            fe-0/0/5;
        }
    }
}
fab1 {
    fabric-options {
        member-interfaces {
            fe-1/0/5;
        }
    }
}
fxp0 {
    unit 0 {
        family inet {
            ##
            ## '10.0.1.252/24' was inherited from group 'node1'
            ##
            address 10.0.1.252/24 {
                master-only;
            }
        }
    }
}
reth0 {
    redundant-ether-options {
        redundancy-group 1;
    }
    unit 0 {
        family inet {
        }
        family inet6 {
        }
    }
}
reth1 {
    redundant-ether-options {
        redundancy-group 1;
    }
    unit 0 {
        family inet {
        }
        family inet6 {
        }
    }
}

From this output, we can see only one fxp0 interface, because we are on the secondary node. Using show interfaces | display inheritance, we can see that the fxp0 address is being imported into the configuration from the node groups. Next is our node group configuration, which gives each host a unique hostname and management IP.

{secondary:node1}
root@SRX-HA-1# show groups
node0 {
    system {
        host-name SRX-HA-0;
        backup-router 10.0.1.2 destination 0.0.0.0/0;
    }
    interfaces {
        fxp0 {
            unit 0 {
                family inet {
                }
            }
        }
    }
}
node1 {
    system {
        host-name SRX-HA-1;
        backup-router 10.0.1.2 destination 0.0.0.0/0;
    }
    interfaces {
        fxp0 {
            unit 0 {
                family inet {
                }
            }
        }
    }
}
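These node groups take effect only once they are applied. In a chassis cluster, this is done with the special ${node} variable, which expands to node0 or node1 on each member, so every device inherits only its own group:

```
{secondary:node1}[edit]
root@SRX-HA-1# set apply-groups "${node}"
root@SRX-HA-1# commit
```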

We can verify that the cluster is working correctly through the standard chassis cluster status commands.

{secondary:node1}
root@SRX-HA-1> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 0
node0                   254         primary        no       no
node1                   1           secondary      no       no

Redundancy group: 1 , Failover count: 0
node0                   254         primary        no       no
node1                   1           secondary      no       no

{secondary:node1}
root@SRX-HA-1> show chassis cluster interfaces

Control interfaces:
Index   Interface        Status
0       fxp1             Up

Fabric interfaces:
Name    Child-interface    Status
fab0    fe-0/0/5           Up
fab0
fab1    fe-1/0/5           Up
fab1

Redundant-ethernet Information:
Name         Status      Redundancy-group
reth0        Up          1
reth1        Up          1

{secondary:node1}
root@SRX-HA-1>

## Summary

Because the SRX will be placed in a mission-critical location in the network, it is extremely important to ensure that it is up and functional. Firewalls are placed in between the untrusted and trusted locations within a network. If the firewall fails, there is nothing left to bring the two networks together, causing a major outage. As you saw in this chapter, the SRX has a robust HA architecture that can survive the worst of tragedies.

The biggest benefit to the SRX HA design is the flexibility it gives to the end user. The ability to use redundancy groups and mix and match them with local interfaces is very powerful. It allows you to overcome the traditional limitations of a redundant firewall configuration and explore new design scenarios. At first, the new paradigm of mixing redundant interfaces, redundancy groups, and local interfaces is overwhelming. Hopefully, this chapter will allow you to think more freely and move away from past firewall limitations.

## Study Questions

Questions

1. What is the purpose of the control link?

2. What are the three types of communication that pass over the fabric link?

3. Can configuration groups be used for any other tasks on a Junos device? Be specific.

4. What feature needs to be enabled when using dynamic routing?

5. What are the two most important commands when troubleshooting an SRX cluster?

6. From what Juniper product did the SRX get part of its HA code infrastructure?

7. Which platform supports the automatic upgrade of the secondary node?

8. Are acknowledgments sent for session synchronization messages?

9. What is a redundancy group?

10. Why is the control port so important?

1. The control link is used for the two REs to talk to each other. The kernels synchronize state between each other, the REs talk to the data plane on the other node, and jsrpd communicates. The jsrpd daemon sends heartbeat messages to validate that the other side is up and running.

2. Heartbeats are sent by the jsrpd daemon to ensure that the remote node is up and healthy. The heartbeats pass through the data planes of both devices and back to the other side. This validates the entire path end to end, making sure it is able to pass traffic. In the event that traffic needs to be forwarded between the two nodes, it is done over the data link. Last but not least, the data link is used to synchronize RTO messages between the two chassis. RTOs are used in the maintenance of the state between the two devices. This includes session creation and session closing messages.

3. Node-specific information is configured using Junos groups. This was one of the fundamental features that was created in Junos. Junos groups can also be thought of as configuration templates or snippets. They can be used to do such things as enabling logging on all firewall policies and configuring specific snippets of information. Using Junos groups where it makes sense simplifies the administration of the SRX and makes reading the configuration easier.

4. When using dynamic routing, the graceful restart feature should be enabled. It allows the data plane to keep dynamic routes active if the control plane fails over. It also allows for other routers that surround the SRX to assist it during a control plane failover.
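As a minimal sketch, graceful restart is enabled globally under routing-options (per-protocol restart timers are left at their defaults):

```
root@SRX-HA-0# set routing-options graceful-restart
root@SRX-HA-0# commit
```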

5. The two most important commands are show chassis cluster status and show chassis cluster statistics. They show the current state of the cluster and the current status of communication between the two nodes. Anyone administering a cluster will use these two commands the most.

6. The SRX used code from the TX Series products. The TX Series are some of the largest and most scalable routing products in the world.

7. The data center SRXs support unified in-service software upgrades. This feature allows for an automatic upgrade of the backup node without impacting network availability.

8. Session synchronization messages are not acknowledged. This would take additional time and resources away from the processors by forcing the processing of an additional message.

9. A redundancy group is a logical collection of objects. It can contain either the control plane (redundancy group 0 only) or interfaces (redundancy group 1+).

10. The control port provides critical communication between the two REs. If this link is lost, the two REs cannot synchronize the state of the kernels. Because of this, if the control link goes down, the secondary node will go into a disabled state.