The world is a jungle in general, and the networking game contributes many animals. // RFC 826
This post is the first in a series that will cover gratuitous ARP and its relation to OpenStack. There will be six posts in the series. My plan is to post them all in the span of the next two weeks. You can find the list of “episodes” already published below.
- Introduction into gratuitous ARP (this post)
- Gratuitous ARP in OpenStack Neutron
- The Failure, part 1: It’s OpenStack fault
- The Failure, part 2: Beyond OpenStack
- The Failure, part 3: Diggin’ the Kernel
- The Failure, part 4: Summary
This post will start the series with a discussion of what gratuitous ARP is, and how it helps in IT operations. Later posts will touch on how it applies to OpenStack and the Linux kernel, and also discuss several issues that you may want to be aware of before building your OpenStack operations on the protocol.
Let’s dive in.
ARP (Address Resolution Protocol)
ARP is one of the most widely used protocols in modern networks. Its history goes back to the early 80s, to the times of DARPA-backed internetworking experiments. RFC 826, which first defined the protocol, is dated November 1982 (it’s 35 years old at the time of writing). Despite its age, it’s still the backbone of local IPv4 network connectivity. Even in 2017 (the year I draft this post), it’s still very hard to find an IPv6-only network node, especially outside cloud environments. But that’s IPv4, so how does ARP fit into the picture?
To understand the goal of ARP, let’s first look at how network nodes are connected. The general model can be described as a set of hosts, each having one or more Network Interface Controllers (NICs) connected to a common data link fabric. This fabric comes in different flavors (Ethernet, IEEE 802.11 aka WiFi, or FireWire). Irrespective of the particular flavor, all of them provide similar capabilities. One of the features expected from all of them is some form of endpoint addressing, ideally globally unique, so that network hosts connected to a shared medium can distinguish each other and address transferred data to specific peers. Ethernet and IEEE 802.11 are probably the most popular data link layers in the world, and since they are largely identical in terms of NIC addressing, in the following discussion we will assume an Ethernet fabric unless explicitly stated otherwise.
For Ethernet, each NIC produced in the world gets a unique 48-bit hardware address allocated by its vendor under IEEE supervision, which guarantees that no hardware address is allocated to two NICs. Uniqueness ensures that whichever hardware you plug into your network, its address will never clash with that of any other card attached to the same network. An example of an EUI-48 address would be, in commonly used notation, f4:5c:89:89:cd:54. These addresses are widely known as MAC addresses, and so I will also use this term moving forward.
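To make the notation concrete, here is a minimal Python sketch that turns the example address above into its raw bytes. The helper name `parse_mac` is my own invention for illustration. The split into a three-byte vendor prefix (the OUI that IEEE hands out) and a three-byte vendor-assigned part reflects how EUI-48 allocation is commonly organized.

```python
def parse_mac(text):
    """Convert colon-separated hex notation into the 6 raw address bytes."""
    octets = text.split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    return bytes(int(octet, 16) for octet in octets)


mac = parse_mac("f4:5c:89:89:cd:54")

# First three bytes: prefix assigned to the vendor by IEEE (the OUI).
# Last three bytes: value the vendor picks for the individual card.
oui, vendor_part = mac[:3], mac[3:]

print(len(mac) * 8)          # width of the address in bits
print(oui.hex(":"))          # vendor prefix
print(vendor_part.hex(":"))  # per-card part
```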
It all means that your NIC already has a unique address, so why do you even need IP addresses? Sadly, people are bad at memorizing 48 randomized bits, so an easier scheme would be handy. Another problem is that whenever your NIC dies and you replace it with a new one, the new card will have a different unique address, and so you would need to advertise the new MAC address to all your network peers that may need to access your host.
And so engineers were looking for a better scheme to address network hosts. One of the successful alternative addressing proposals was IPv4. In this scheme, addresses are 32 bits long. That is still a lot, but the crucial point is that now you could pick addresses for your NICs yourself. With that freedom, you could pick the same bit prefix for all your hosts and distinguish them by a shorter number of trailing bits. You then memorize just those unique bits and configure your networking software to use the common prefix when communicating with other hosts. Whenever you want to address a host, you pass the unique trailing bits assigned to it into your networking stack and let the stack produce the resulting address by prepending the common prefix.
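The prefix-plus-trailing-bits idea can be sketched with Python's standard `ipaddress` module. The prefix `192.168.1.0/24` and the helper name `host_address` are assumptions picked purely for illustration.

```python
import ipaddress

# Shared prefix for all hosts on the network (example value, an assumption).
network = ipaddress.ip_network("192.168.1.0/24")


def host_address(network, host_bits):
    """Prepend the shared network prefix to the host-specific trailing bits."""
    if not 0 <= host_bits < network.num_addresses:
        raise ValueError("host bits do not fit next to this prefix")
    return network.network_address + host_bits


# You only need to remember "42"; the stack supplies the rest.
print(host_address(network, 42))  # 192.168.1.42
```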
The only problem with this approach is that now you have two address schemes: MAC addresses and IP addresses, with no established mapping between them. Of course, in small networks, you could maintain static IP-to-MAC mappings in sync on every host, but that is error-prone and doesn’t scale well.
And that’s where ARP comes onto the stage. Instead of maintaining static mappings across hosts, the protocol allows hosts to disseminate the information dynamically on the wire.
Quoting the abstract of RFC 826:
Presented here is a protocol that allows dynamic distribution of the information needed to build tables to translate an address A in protocol P’s address space into a 48.bit Ethernet address.
And that’s exactly what we need.
While the abstract and even the RFC title talk about Ethernet, the mechanism proved so successful that it was later extended to other data links, including FireWire.
The protocol defines both the ARP packet format and its state machine. Sadly, the RFC doesn’t contain a visual diagram of ARP packets, but we can consult the protocol’s Wikipedia page.
The RFC describes an address translation (ARP) table for each host storing IP-to-MAC mappings. It also defines two operations: a REQUEST and a REPLY. Whenever a host wants to contact an IP address for which there is no mapping in the local ARP table, the host sends a REQUEST ARP packet to the broadcast destination MAC address, asking the question “Who has this IP address?” The host carrying the IP address is then expected to send a REPLY ARP packet back with its own MAC address set in the “Sender hardware address” field. The original host will then update its ARP table with the new IP-to-MAC mapping and will use the newly learned value as the destination MAC address for all communication with that IP address.
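For readers who prefer bytes to diagrams, here is a rough Python sketch of the 28-byte ARP payload for IPv4 over Ethernet, covering one REQUEST and the matching REPLY. The helper names and the addresses (taken from the 192.0.2.0/24 documentation range, plus a made-up second MAC) are assumptions for illustration. The surrounding Ethernet header, where the broadcast destination MAC lives, is left out.

```python
import socket
import struct

# hardware type, protocol type, hardware length, protocol length, operation,
# sender MAC, sender IP, target MAC, target IP (all in network byte order)
ARP_FORMAT = "!HHBBH6s4s6s4s"
REQUEST, REPLY = 1, 2


def mac_bytes(text):
    """Turn 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


def build_arp(oper, sender_mac, sender_ip, target_mac, target_ip):
    """Pack an ARP payload for Ethernet (type 1) carrying IPv4 (0x0800)."""
    return struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, oper,
        mac_bytes(sender_mac), socket.inet_aton(sender_ip),
        mac_bytes(target_mac), socket.inet_aton(target_ip),
    )


# "Who has 192.0.2.2?" The target MAC is unknown, so it is left zeroed.
request = build_arp(REQUEST, "f4:5c:89:89:cd:54", "192.0.2.1",
                    "00:00:00:00:00:00", "192.0.2.2")

# The owner of 192.0.2.2 answers with its own MAC in the sender field.
reply = build_arp(REPLY, "02:00:00:00:00:02", "192.0.2.2",
                  "f4:5c:89:89:cd:54", "192.0.2.1")

fields = struct.unpack(ARP_FORMAT, reply)
learned_ip = socket.inet_ntoa(fields[6])
learned_mac = fields[5].hex(":")
print(len(request), learned_ip, learned_mac)
```

The requester reads the “Sender hardware address” and “Sender protocol address” fields of the reply to fill in its table entry.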
One thing to clarify before we move forward: this is all true assuming both interacting hosts are on the same layer-2 network segment, without an IP gateway (router) in between. If hosts are located in different segments, then the connection between them is established through a router. In this case, a host wanting to communicate with a host in another segment will determine that fact by inspecting its IP routing table. Since the destination IP address would then not belong to the local network IP prefix, the host will instead send the data to the default router IP address. (Of course, at this point the host may also determine that its ARP table doesn’t contain an entry for the gateway IP address yet, in which case it will use ARP to learn the router’s MAC address.)
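The decision about whose MAC address to resolve can be sketched as follows. This is a deliberately simplified model with a single local prefix and a single default gateway. Real routing tables hold more entries, and the function name and addresses here are assumptions for illustration.

```python
import ipaddress


def next_hop(destination, local_network, default_gateway):
    """Return the IP address whose MAC must be resolved via ARP."""
    dest = ipaddress.ip_address(destination)
    if dest in ipaddress.ip_network(local_network):
        # Same segment: talk to the destination directly.
        return dest
    # Different segment: hand the packet to the router instead.
    return ipaddress.ip_address(default_gateway)


print(next_hop("192.168.1.42", "192.168.1.0/24", "192.168.1.1"))  # 192.168.1.42
print(next_hop("198.51.100.7", "192.168.1.0/24", "192.168.1.1"))  # 192.168.1.1
```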
ARP table invalidation
One interesting aspect of the original RFC is that it doesn’t define a mechanism to update existing ARP table entries with new MAC addresses. Back in 1982, mobile IP stations that roam across network segments and switch on the fly between the devices they use to reach the outside world (think about how your smartphone seamlessly switches from WiFi to LTE) were probably not considered a very realistic use case. But even then, the “Related issue” section of the document captured some ideas on how this could be implemented if needed.
One suggestion was for every host to define an “aging time” for its ARP entries. If a peer host is detected as unreachable (probably because there was no incoming traffic using both the MAC and IP addresses stored in the ARP table), the originating host could remove the corresponding ARP entry from its table after it has “aged”. This mechanism is indeed used in most modern ARP implementations, with 60 seconds being the common default for Linux systems (it can be overridden using the gc_stale_time sysctl setting).
It means that your connectivity to a roaming IP host will heal itself after a minute of temporary downtime. While that’s great, some use cases would benefit from a more rapid reaction of hosts to network changes.
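Here is a toy Python model of the aging idea. It is a sketch under simplifying assumptions and not a description of how the Linux kernel tracks neighbour state: the class name is hypothetical, and an entry is simply dropped once it has gone unconfirmed for longer than `max_age` seconds. The clock is injectable so the demo doesn't have to wait a real minute.

```python
import time


class AgingArpTable:
    """IP-to-MAC table whose entries expire when not refreshed in time."""

    def __init__(self, max_age=60.0, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self.entries = {}  # ip -> (mac, time the mapping was last confirmed)

    def learn(self, ip, mac):
        """Add a mapping or refresh its timestamp."""
        self.entries[ip] = (mac, self.clock())

    def lookup(self, ip):
        """Return the MAC for ip, or None if unknown or aged out."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, confirmed_at = entry
        if self.clock() - confirmed_at > self.max_age:
            del self.entries[ip]  # stale: forget it and force a new ARP request
            return None
        return mac


# Demo with a fake clock (a one-element list we can advance by hand).
now = [0.0]
table = AgingArpTable(max_age=60.0, clock=lambda: now[0])
table.learn("192.0.2.10", "f4:5c:89:89:cd:54")
fresh = table.lookup("192.0.2.10")
now[0] = 61.0
stale = table.lookup("192.0.2.10")
print(fresh, stale)
```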
And that’s where gratuitous ARP comes into play.
Gratuitous ARP is an ARP packet that was never asked for (hence its alternative name, unsolicited ARP). The “Related issue” section of RFC 826 mentions an algorithm to update existing ARP table entries in the network based on unsolicited ARP packets. But it was only RFC 2002, “IP Mobility Support” from 1996, that made it part of a standard and introduced the very term “gratuitous ARP”.
RFC 2002 discusses protocol enhancements for IP networks that allow IP devices to roam across networks without introducing significant connectivity delays or disruptions. Among other things, it defines the algorithm to be used to update existing ARP table entries with new MAC addresses. For this purpose, it adopts the proposal from RFC 826, where a host can broadcast a gratuitous ARP packet into a network, and its peers then update their tables with the new MAC address sent, restoring connectivity even before old ARP entries expire.
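The update mechanism can be sketched in a few lines of Python. This is a simplified model, not a full implementation of either RFC: the names are hypothetical, packets are plain tuples rather than bytes on the wire, and the receive logic keeps only the core of the RFC 826 reception algorithm (refresh an entry you already have, or learn a new one if the packet is addressed to you). The gratuitous packet follows the REPLY flavor described in RFC 2002, where both the sender and the target IP fields carry the address being updated.

```python
from collections import namedtuple

ArpPacket = namedtuple(
    "ArpPacket", "oper sender_mac sender_ip target_mac target_ip")

REPLY = 2


def gratuitous_arp(ip, new_mac):
    """Announce that `ip` now lives behind `new_mac` (REPLY flavor)."""
    return ArpPacket(oper=REPLY, sender_mac=new_mac, sender_ip=ip,
                     target_mac=new_mac, target_ip=ip)


def receive(table, my_ip, packet):
    """Refresh a known sender entry; learn a new one only if we are the target."""
    if packet.sender_ip in table or packet.target_ip == my_ip:
        table[packet.sender_ip] = packet.sender_mac


# A peer that already talks to 192.0.2.10 through the old NIC.
peer_table = {"192.0.2.10": "f4:5c:89:89:cd:54"}
# A peer that never contacted 192.0.2.10.
idle_table = {}

announcement = gratuitous_arp("192.0.2.10", "02:00:00:00:00:02")
receive(peer_table, "192.0.2.20", announcement)
receive(idle_table, "192.0.2.30", announcement)

print(peer_table)  # entry now points at the new MAC
print(idle_table)  # still empty: nothing to update
```

Note that only peers holding an entry for the address pick up the change. Everyone else will resolve the address through a regular REQUEST when they need it.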
There are two main use cases for gratuitous ARP. One is to quickly switch between multiple devices on the same host. Another is to move services exposed through an IP address from one host to another transparently to network peers.
This last scenario may happen either as a planned action by an Ops team managing a service, or be triggered by a self-healing mechanism used in networks to guarantee availability of services in case of software or network failures. One popular piece of software that can fail over IP addresses from one host to another is keepalived, which uses the VRRP protocol to negotiate between hosts which node should carry the IP addresses managed by the software.
In OpenStack Neutron, gratuitous ARP is how floating IP addresses roam between ports; it also helps with failing over IP addresses between HA router instances.
In the next post, I will expand on how OpenStack Neutron uses and implements gratuitous ARP.