Chapter 12. HTTP 2.0

HTTP 2.0 will make our applications faster, simpler, and more robust—a rare combination—by allowing us to undo many of the HTTP 1.1 workarounds previously done within our applications and address these concerns within the transport layer itself. Even better, it also opens up a whole host of entirely new opportunities to optimize our applications and improve performance!

The primary goals for HTTP 2.0 are to reduce latency by enabling full request and response multiplexing, minimize protocol overhead via efficient compression of HTTP header fields, and add support for request prioritization and server push. To implement these requirements, there is a large supporting cast of other protocol enhancements, such as new flow control, error handling, and upgrade mechanisms, but these are the most important features that every web developer should understand and leverage in their applications.

HTTP 2.0 does not modify the application semantics of HTTP in any way. All the core concepts, such as HTTP methods, status codes, URIs, and header fields, remain in place. Instead, HTTP 2.0 modifies how the data is formatted (framed) and transported between the client and server, both of which manage the entire process, and hides all the complexity from our applications within the new framing layer. As a result, all existing applications can be delivered without modification. That’s the good news.

However, we are not just interested in delivering a working application; our goal is to deliver the best performance! HTTP 2.0 enables a number of new optimizations our applications can leverage, which were previously not possible, and our job is to make the best of them. Let’s take a closer look under the hood.

Standard under construction

HTTP 2.0 is under active construction: core architectural designs, principles, and features are well defined, but the same cannot be said for specific, low-level implementation details. For this reason, our discussion will focus on the architecture and its implications, with a very brief look at the wire format—just enough to understand how the protocol works.

For the latest draft and status of the HTTP 2.0 standard, visit the IETF tracker.

History and Relationship to SPDY

SPDY is an experimental protocol, developed at Google and announced in mid-2009, whose primary goal was to try to reduce the load latency of web pages by addressing some of the well-known performance limitations of HTTP 1.1. Specifically, the outlined project goals were set as follows:

  • Target a 50% reduction in page load time (PLT).
  • Avoid the need for any changes to content by website authors.
  • Minimize deployment complexity, avoid changes in network infrastructure.
  • Develop this new protocol in partnership with the open-source community.
  • Gather real performance data to (in)validate the experimental protocol.

To achieve the 50% PLT improvement, SPDY aimed to make more efficient use of the underlying TCP connection by introducing a new binary framing layer to enable request and response multiplexing, prioritization, and to minimize and eliminate unnecessary network latency; see “Latency as a Performance Bottleneck”.

Not long after the initial announcement, Mike Belshe and Roberto Peon, both software engineers at Google, shared their first results, documentation, and source code for the experimental implementation of the new SPDY protocol:

 

So far we have only tested SPDY in lab conditions. The initial results are very encouraging: when we download the top 25 websites over simulated home network connections, we see a significant improvement in performance—pages loaded up to 55% faster.

 
  -- A 2x Faster Web, Chromium Blog

Fast-forward a few years to 2012, and the new experimental protocol was supported in Chrome, Firefox, and Opera, and many large web destinations (e.g., Google, Twitter, Facebook) were offering SPDY to compatible clients. In other words, SPDY proved to offer great performance benefits and was on track to become a de facto standard through growing industry adoption. As a result, the HTTP Working Group (HTTP-WG) kicked off the new HTTP 2.0 effort in early 2012 to take the lessons learned from SPDY and apply them to the official standard.

The Road to HTTP 2.0

SPDY was the catalyst for HTTP 2.0, but SPDY is not HTTP 2.0. An open call for HTTP 2.0 proposals was made in early 2012, and following much discussion within the HTTP-WG, the SPDY specification was adopted as a starting point for further work on the standard. Since then, many changes and improvements have been and will continue to be made to the official HTTP 2.0 standard.

However, before we get too far ahead, it is worth reviewing the drafted charter for HTTP 2.0, as it highlights the scope and the key design criteria of the protocol:

 

It is expected that HTTP/2.0 will:

  • Substantially and measurably improve end-user perceived latency in most cases, over HTTP 1.1 using TCP.
  • Address the "head of line blocking" problem in HTTP.
  • Not require multiple connections to a server to enable parallelism, thus improving its use of TCP, especially regarding congestion control.
  • Retain the semantics of HTTP 1.1, leveraging existing documentation, including (but not limited to) HTTP methods, status codes, URIs, and where appropriate, header fields.
  • Clearly define how HTTP 2.0 interacts with HTTP 1.x, especially in intermediaries.
  • Clearly identify any new extensibility points and policy for their appropriate use.

The resulting specification(s) are expected to meet these goals for common existing deployments of HTTP; in particular, Web browsing (desktop and mobile), non-browsers ("HTTP APIs"), Web serving (at a variety of scales), and intermediation (by proxies, corporate firewalls, "reverse" proxies and Content Delivery Networks). Likewise, current and future semantic extensions to HTTP/1.x (e.g., headers, methods, status codes, cache directives) should be supported in the new protocol.

 
  -- HTTPbis WG charter, HTTP 2.0

In short, HTTP 2.0 aims to address the well-known performance limitations of preceding standards, but it is also extending, not replacing, the previous 1.x standards. The application semantics of HTTP are the same, and no changes are being made to the offered functionality or core concepts such as HTTP methods, status codes, URIs, and header fields; these changes are explicitly out of scope. With that in mind, is the "2.0" really warranted?

The reason for the major revision increment to 2.0 is due to the change in how the data is exchanged between the client and server. To achieve the outlined performance goals, HTTP 2.0 adds a new binary framing layer, which is not backward compatible with previous HTTP 1.x servers and clients. Hence, 2.0.

Unless you are implementing a web server, or a custom client, by working with raw TCP sockets, chances are that you will not even notice any of the actual technical changes in HTTP 2.0: all the new, low-level framing is performed by the browser and server on your behalf. Perhaps the only difference may be the availability of new and optional API capabilities like server push!

Finally, it is important to discuss the timeline for HTTP 2.0. Developing a major revision of a protocol underlying all web communication is a nontrivial task, and one that requires a lot of careful thought, experimentation, and coordination. As such, crystal ball gazing for HTTP 2.0 timelines is dangerous business: it will be ready when it’s ready. Having said that, the HTTP-WG is making good progress, and the current official milestones are set as follows:

  • March 2012: Call for proposals for HTTP 2.0
  • September 2012: First draft of HTTP 2.0
  • July 2013: First implementation draft of HTTP 2.0
  • April 2014: Working Group last call for HTTP 2.0
  • November 2014: Submit HTTP 2.0 to IESG as a Proposed Standard

The big gap between 2012 and 2014 is where the bulk of the editorial and experimental work is planned to happen. Depending on the progress, and the feedback from implementers and the industry at large, the dates will be adjusted as needed. The good news is, as of 2013, the schedule looks well on track!

Design and Technical Goals

HTTP 1.x was intentionally designed for simplicity of implementation: HTTP 0.9 was a one-line protocol to bootstrap the World Wide Web; HTTP 1.0 documented the popular extensions to 0.9 in an informational standard; HTTP 1.1 introduced an official IETF standard; see Chapter 9. As such, HTTP 0.9-1.x delivered exactly what it set out to do: HTTP is one of the most ubiquitous and widely adopted application protocols on the Internet.

Unfortunately, implementation simplicity also came at a cost of application performance, which is the exact gap that HTTP 2.0 is designed to fill:

 

The HTTP/2.0 encapsulation enables more efficient use of network resources and reduced perception of latency by allowing header field compression and multiple concurrent messages on the same connection. It also introduces unsolicited push of representations from servers to clients.

 
  -- HTTP/2.0 Draft 4

HTTP 2.0 is a work in progress, which means that the specific details of how the bits are encoded within each frame, the names of individual fields, and similar low-level details may change. However, while the "how" will continue to evolve, the core design and technical goals are what matters most for our discussion; these are well understood and agreed upon.

Binary Framing Layer

At the core of all performance enhancements of HTTP 2.0 is the new binary framing layer (Figure 12-1), which dictates how the HTTP messages are encapsulated and transferred between the client and server.

Figure 12-1. HTTP 2.0 binary framing layer

The "layer" refers to a design choice to introduce a new mechanism between the socket interface and the higher HTTP API exposed to our applications: the HTTP semantics, such as verbs, methods, and headers, are unaffected, but the way they are encoded while in transit is what’s different. Unlike the newline delimited plaintext HTTP 1.x protocol, all HTTP 2.0 communication is split into smaller messages and frames, each of which is encoded in binary format.

As a result, both client and server must use the new binary encoding mechanism to understand each other: an HTTP 1.x client won’t understand an HTTP 2.0 only server, and vice versa. Thankfully, our applications remain blissfully unaware of all these changes, as the client and server perform all the necessary framing work on our behalf.

HTTPS is another great example of binary framing in action: all HTTP messages are transparently encoded and decoded on our behalf (“TLS Record Protocol”), enabling secure communication between the client and server, without requiring any modifications of our applications. HTTP 2.0 works in a similar way.

Streams, Messages, and Frames

The introduction of the new binary framing mechanism changes how the data is exchanged (Figure 12-2) between the client and server. To describe this process, we need to introduce some new HTTP 2.0 terminology:

Stream
A bidirectional flow of bytes within an established connection.
Message
A complete sequence of frames that map to a logical message.
Frame
The smallest unit of communication in HTTP 2.0, each containing a frame header, which at minimum identifies the stream to which the frame belongs.

All HTTP 2.0 communication is performed within a connection that can carry any number of bidirectional streams. In turn, each stream communicates in messages, which consist of one or multiple frames, each of which may be interleaved and then reassembled via the embedded stream identifier in the header of each individual frame.

Figure 12-2. HTTP 2.0 streams, messages, and frames

All HTTP 2.0 frames use binary encoding, and header data is compressed. As such, the preceding diagram illustrates the relationship between streams, messages, and frames, not their exact encoding on the wire—for that, skip to “Brief Introduction to Binary Framing”.

There is a lot of information packed into those few terse sentences. Let’s review it one more time. The terminology of streams, messages, and frames is essential knowledge for understanding HTTP 2.0:

  • All communication is performed within a single TCP connection.
  • The stream is a virtual channel within a connection, which carries bidirectional messages. Each stream has a unique integer identifier (1, 2, …, N).
  • The message is a logical HTTP message, such as a request, or response, which consists of one or more frames.
  • The frame is the smallest unit of communication, which carries a specific type of data—e.g., HTTP headers, payload, and so on.

In short, HTTP 2.0 breaks down the HTTP protocol communication into small individual frames, which map to messages within a logical stream. In turn, many streams can be exchanging messages, in parallel, within a single TCP connection.
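
To make the terminology concrete, here is a minimal Python sketch—purely illustrative, not part of any real HTTP 2.0 implementation—that models frames tagged with a stream identifier and shows how an interleaved sequence of frames is reassembled into per-stream messages:

from collections import defaultdict, namedtuple

# A frame carries, at minimum, the ID of the stream it belongs to, a type
# (e.g., HEADERS, DATA), flags, and an opaque payload.
Frame = namedtuple("Frame", ["stream_id", "type", "flags", "payload"])

def reassemble(frames):
    """Group an interleaved sequence of frames into per-stream messages."""
    messages = defaultdict(list)
    for frame in frames:
        # The embedded stream identifier is all we need to demultiplex.
        messages[frame.stream_id].append(frame)
    return messages

# Frames for streams 1 and 3 arrive interleaved over a single connection...
wire = [
    Frame(1, "HEADERS", 0, b""),
    Frame(3, "HEADERS", 0, b""),
    Frame(1, "DATA", 0, b"<html>..."),
    Frame(3, "DATA", 0, b"body { ... }"),
]
# ...and are reassembled into two independent logical messages.
print(sorted(reassemble(wire)))   # [1, 3]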

Request and Response Multiplexing

With HTTP 1.x, if the client wants to make multiple parallel requests to improve performance, then multiple TCP connections must be used; see “Using Multiple TCP Connections”. This behavior is a direct consequence of the HTTP 1.x delivery model, which ensures that only one response can be delivered at a time (response queuing) per connection. Worse, this also results in head-of-line blocking and inefficient use of the underlying TCP connection.

The new binary framing layer in HTTP 2.0 removes these limitations, and enables full request and response multiplexing, by allowing the client and server to break down an HTTP message into independent frames (Figure 12-3), interleave them, and then reassemble them on the other end.

Figure 12-3. HTTP 2.0 request and response multiplexing within a shared connection

The snapshot in Figure 12-3 captures multiple streams in flight within the same connection: the client is transmitting a DATA frame (stream 5) to the server, while the server is transmitting an interleaved sequence of frames to the client for streams 1 and 3. As a result, there are three parallel request-response exchanges in flight!

The ability to break down an HTTP message into independent frames, interleave them, and then reassemble them on the other end is the single most important enhancement of HTTP 2.0. In fact, it introduces a ripple effect of numerous performance benefits across the entire stack of all web technologies, enabling us to do the following:

  • Interleave multiple requests in parallel without blocking on any one
  • Interleave multiple responses in parallel without blocking on any one
  • Use a single connection to deliver multiple requests and responses in parallel
  • Deliver lower page load times by eliminating unnecessary latency
  • Remove unnecessary HTTP 1.x workarounds from our application code
  • And much more…

The new binary framing layer in HTTP 2.0 resolves the head-of-line blocking problem found in HTTP 1.1 and eliminates the need for multiple connections to enable parallel processing and delivery of requests and responses. As a result, this makes our applications faster, simpler, and cheaper to deploy.

Support for request and response multiplexing allows us to eliminate many of the HTTP 1.x workarounds, such as concatenated files, image sprites, and domain sharding; see “Optimizing for HTTP 1.x”. Similarly, by lowering the number of required TCP connections, HTTP 2.0 also lowers the CPU and memory costs for both the client and server.
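
To illustrate the sending side of the same idea, the following sketch—again, a simplification that ignores the real wire encoding—breaks two response bodies into frames and interleaves them round-robin onto one connection, so the small response is never stuck behind the large one:

from itertools import zip_longest

def frames_for(stream_id, payload, chunk=1024):
    """Break one response body into (stream_id, chunk) frames."""
    return [(stream_id, payload[i:i + chunk]) for i in range(0, len(payload), chunk)]

def interleave(*streams):
    """Naive round-robin scheduler: one frame from each stream in turn."""
    for group in zip_longest(*streams):
        for frame in group:
            if frame is not None:
                yield frame

stream1 = frames_for(1, b"x" * 4096)   # a large response
stream3 = frames_for(3, b"y" * 1024)   # a small response
for stream_id, chunk in interleave(stream1, stream3):
    # stream 3's only frame is interleaved after the first frame of stream 1
    print(stream_id, len(chunk))

With HTTP 1.x, the 1 KB response would have had to wait behind the full 4 KB response (or use a second connection); here it is delivered after just one frame of stream 1.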

Request Prioritization

Once an HTTP message can be split into many individual frames, the exact order in which the frames are interleaved and delivered can be optimized to further improve the performance of our applications. To facilitate this, each stream can be assigned a 31-bit priority value:

  • 0 represents the highest priority stream.
  • \(2^{31}-1\) represents the lowest priority stream.

With priorities in place, the client and server can apply different strategies to process individual streams, messages, and frames in an optimal order: the server can prioritize stream processing by controlling the allocation of resources (CPU, memory, bandwidth), and once the response data is available, prioritize delivery of high-priority frames to the client.

HTTP 2.0 does not specify any particular algorithm for dealing with priorities; it simply provides the mechanism by which the priority data can be exchanged between client and server. As such, priorities are hints, and the prioritization strategy can vary based on the implementation of client and server: the client should provide good priority data, and the server should adapt its processing and delivery based on indicated stream priorities.

As a result, while you may not control the quality of sent priority data from the client, chances are you can control the server; choose your HTTP 2.0 server carefully! To illustrate the point, let’s consider the following questions:

  • What if the server disregards all priority information?
  • Should higher-priority streams always take precedence?
  • Are there cases where different priority streams should be interleaved?

If the server disregards all priority information, then it may unintentionally slow the application—e.g., it may send images while the browser is blocked waiting for critical CSS and JavaScript. However, dictating a strict priority ordering may also generate suboptimal scenarios, as it may reintroduce the head-of-line blocking problem—e.g., a single slow request unnecessarily blocking delivery of other resources.

Frames from multiple priority levels can and should be interleaved by the server. Where possible, high-priority streams should be given precedence, both during the processing stage and with respect to the bandwidth allocation between client and server. However, to make the best use of the underlying connection, a mix of priority levels is required.
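
Since the standard deliberately leaves the strategy open, the following is just one hypothetical server-side approach, sketched in Python: pending frames are grouped by priority value, and the scheduler always serves the best-priority stream that actually has data ready, so a stalled high-priority response cannot starve the connection:

class PriorityScheduler:
    """One possible (non-normative) strategy for ordering outgoing frames."""

    def __init__(self):
        self.queues = {}   # priority value -> list of pending frames

    def enqueue(self, priority, frame):
        self.queues.setdefault(priority, []).append(frame)

    def next_frame(self):
        # 0 is the highest priority; scan upward and serve the first stream
        # that has a frame ready instead of idling the connection.
        for priority in sorted(self.queues):
            if self.queues[priority]:
                return self.queues[priority].pop(0)
        return None

scheduler = PriorityScheduler()
scheduler.enqueue(2**31 - 1, "hero-image DATA frame")   # lowest priority
scheduler.enqueue(0, "critical.css HEADERS frame")      # highest priority
print(scheduler.next_frame())   # the CSS frame is delivered first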

One Connection Per Origin

With the new binary framing mechanism in place, HTTP 2.0 no longer needs multiple TCP connections to multiplex streams in parallel; each stream is split into many frames, which can be interleaved and prioritized. As a result, all HTTP 2.0 connections are persistent, and only one connection should be used between the client and server.

 

Through lab measurements, we have seen consistent latency benefits by using fewer connections from the client. The overall number of packets sent by HTTP 2.0 can be as much as 40% less than HTTP. Handling large numbers of concurrent connections on the server also does become a scalability problem, and HTTP 2.0 reduces this load.

 
  -- HTTP/2.0 Draft 2

One connection per origin significantly reduces the associated overhead: fewer sockets to manage along the connection path, smaller memory footprint, and better connection throughput. Plus, many other benefits at all layers of the stack:

  • Consistent prioritization between all streams
  • Better compression through use of a single compression context
  • Reduced impact on network congestion due to fewer TCP connections
  • Less time in slow-start and faster congestion and loss recovery

Most HTTP transfers are short and bursty, whereas TCP is optimized for long-lived, bulk data transfers. By reusing the same connection between all streams, HTTP 2.0 is able to make more efficient use of the TCP connection.

The move to HTTP 2.0 should not only reduce the network latency, but also help improve throughput and reduce the operational costs!

Flow Control

Multiplexing multiple streams over the same TCP connection introduces contention for shared bandwidth resources. Stream priorities can help determine the relative order of delivery, but priorities alone are insufficient to control how the resource allocation is performed between the streams or multiple connections. To address this, HTTP 2.0 provides a simple mechanism for stream and connection flow control:

  • Flow control is hop-by-hop, not end-to-end.
  • Flow control is based on window update frames: receiver advertises how many bytes it is prepared to receive on a stream and for the entire connection.
  • Flow control window size is updated by a WINDOW_UPDATE frame, which specifies the stream ID and the window size increment value.
  • Flow control is directional: receiver may choose to set any window size that it desires for each stream and for the entire connection.
  • Flow control can be disabled by a receiver, both for an individual stream or for the entire connection.

When the HTTP 2.0 connection is established, the client and server exchange SETTINGS frames, which set the flow control window sizes in both directions. Optionally, either side can also disable flow control on an individual stream or the entire connection.

Does the preceding list remind you of TCP flow control? It should; the mechanism is effectively identical—see “Flow Control”. However, because TCP flow control cannot differentiate among the many streams within a single HTTP 2.0 connection, it is insufficient on its own. Hence the reason for HTTP 2.0 flow control.

The HTTP 2.0 standard does not specify any particular algorithm or values, nor when the WINDOW_UPDATE frames should be sent; implementers are free to select their own algorithm to match their use case and deliver the best performance.

In addition to priority, which determines the relative order of delivery, flow control can regulate the amount of resources consumed by each stream within an HTTP 2.0 connection: the receiver can advertise a lower window size on a specific stream to limit the rate at which the data is delivered!
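
A minimal sketch of the sender-side bookkeeping, with made-up window sizes and no real frame encoding: every DATA byte consumes both the stream and the connection window, and sending resumes only after the receiver grants more space with a WINDOW_UPDATE:

class Connection:
    def __init__(self, window):
        self.window = window              # connection-level send window

class Stream:
    def __init__(self, window, connection):
        self.window = window              # per-stream send window
        self.connection = connection

    def send(self, data):
        allowed = min(len(data), self.window, self.connection.window)
        # Both windows shrink as data is sent; when either reaches zero,
        # the sender must wait for a WINDOW_UPDATE before continuing.
        self.window -= allowed
        self.connection.window -= allowed
        return data[:allowed], data[allowed:]    # (sent now, still pending)

    def window_update(self, increment):
        # WINDOW_UPDATE names the stream and the window size increment.
        self.window += increment

conn = Connection(window=65535)
stream = Stream(window=4096, connection=conn)
sent, pending = stream.send(b"z" * 10000)
print(len(sent), len(pending))   # 4096 5904 -> blocked until a WINDOW_UPDATE
stream.window_update(8192)       # receiver grants more space; sending resumes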

Server Push

A powerful new feature of HTTP 2.0 is the ability of the server to send multiple replies for a single client request. That is, in addition to the response for the original request, the server can push additional resources to the client (Figure 12-4), without the client having to explicitly request each one!

Figure 12-4. Server initiates new streams (promises) for push resources

When the HTTP 2.0 connection is established, the client and server exchange SETTINGS frames, which can limit the maximum number of concurrent streams in both directions. As a result, the client can limit the number of pushed streams or disable server push entirely by setting this value to zero.

Why would we need such a mechanism? A typical web application consists of dozens of resources, all of which are discovered by the client by examining the document provided by the server. So why not eliminate the extra latency and let the server push the associated resources to the client ahead of time? The server already knows which resources the client will require; that’s server push. In fact, if you have ever inlined CSS, JavaScript, or any other asset via a data URI (see “Resource Inlining”), then you already have hands-on experience with server push!

By manually inlining the resource into the document, we are, in effect, pushing that resource to the client, without waiting for the client to request it. The only difference with HTTP 2.0 is that we can now move this workflow out of the application and into the HTTP protocol itself, which offers important benefits:

  • Pushed resources can be cached by the client.
  • Pushed resources can be declined by the client.
  • Pushed resources can be reused across different pages.
  • Pushed resources can be prioritized by the server.

All pushed resources are subject to the same-origin policy. As a result, the server cannot push arbitrary third-party content to the client; the server must be authoritative for the provided content.

In effect, server push obsoletes most of the cases where inlining is used with HTTP 1.x. The only case where direct resource inlining still makes sense is if the inlined resource is needed on only a single page, and the resource does not incur high encoding overhead; see “Resource Inlining”. In all other cases, your application should be using HTTP 2.0 server push!

Implementing HTTP 2.0 server push

Server push opens many new possibilities for optimized delivery of our applications. However, how does the server determine which resources can or should be pushed? As with prioritization, the HTTP 2.0 standard does not specify any specific algorithm and the decision is left to the implementers. As a result, there are many possible strategies, each of which can be tailored to the context of the application or the server in use:

  • The application can explicitly initiate server push within its application code. This requires tight coupling with the HTTP 2.0 server in use but provides full control to the developer.
  • The application can signal to the server the associated resources it wants pushed via an additional HTTP header. This decouples the application from the HTTP 2.0 server API—e.g., Apache’s mod_spdy looks for X-Associated-Content header, which lists the resources to be pushed.
  • The server can automatically learn the associated resources without relying on the application. The server can parse the document and infer the resources to be pushed, or it can analyze the incoming traffic and make the appropriate decisions—e.g., the server can collect the dependency data based on the Referer header, and then automatically push the critical resources to the client.

This is not a complete list of strategies, but it illustrates the wide range of possibilities: from hands-on control via a low-level API all the way through to a fully automated implementation. Similarly, should the server push the same resources every time, or could it implement a smarter strategy? The server can be smart and try to infer which resources are already in the client’s cache—based on its own model, a client cookie, or another mechanism—and act accordingly. Long story short, server push opens up a lot of new opportunities for innovation.
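
As a concrete illustration of the second strategy above, the application can emit its push hints as a response header and leave the actual pushing to the server. The sketch below is a bare WSGI callable and assumes a front end that honors an X-Associated-Content style header, as mod_spdy does; the exact header value syntax varies by server, so treat it as illustrative:

def application(environ, start_response):
    """Hypothetical WSGI app: the server in front of it performs the push."""
    headers = [
        ("Content-Type", "text/html"),
        # Hint which associated resources should be pushed with this page.
        ("X-Associated-Content", '"https://example.com/css/critical.css",'
                                 '"https://example.com/js/app.js"'),
    ]
    start_response("200 OK", headers)
    return [b"<html>...</html>"]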

Finally, it is important to note that pushed resources go directly into the client’s cache, just as if the client initiated the request. There is no client-side API or JavaScript callback to notify the application that a pushed resource has arrived. The entire mechanism is transparent to web applications running within the browser.

Header Compression

Each HTTP transfer carries a set of headers that describe the transferred resource and its properties. In HTTP 1.x, this metadata is always sent as plain text and adds 500–800 bytes of overhead per request, and kilobytes more if HTTP cookies are required; see “Measuring and Controlling Protocol Overhead”. To reduce this overhead and improve performance, HTTP 2.0 compresses header metadata:

  • Instead of retransmitting the same data on each request and response, HTTP 2.0 uses "header tables" on both the client and server to track and store previously sent key-value pairs.
  • Header tables persist for the entire HTTP 2.0 connection and are incrementally updated both by the client and server.
  • Each new header key-value pair is either appended to the existing table or replaces a previous value in the table.

As a result, both sides of the HTTP 2.0 connection know which headers have been sent, and their previous values, which allows a new set of headers to be coded as a simple difference (Figure 12-5) from the previous set.

Figure 12-5. Differential coding of HTTP 2.0 headers

The definitions of the request and response header fields in HTTP 2.0 remain unchanged, with a few minor exceptions: all header keys are lowercase, and the request line is now split into individual :method, :scheme, :host, and :path key-value pairs.

In the previous example, the second request needs to communicate only the single path header that has changed between requests; all other headers are inherited from the previous working set. As a result, HTTP 2.0 avoids transmitting redundant header data, which significantly reduces the overhead of each request.

Common key-value pairs that rarely change throughout the lifetime of a connection (e.g., user-agent, accept header, and so on) need to be transmitted only once. In fact, if no headers change between requests (e.g., a polling request requesting the same resource), then the header overhead is zero bytes. All headers are automatically inherited from the previous request!
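
The exact compression format is still in flux, but the differential idea is easy to demonstrate. The Python sketch below—an illustration of the concept, not the actual wire encoding—keeps a per-connection header table and transmits only the key-value pairs that changed since the previous request:

def diff_headers(table, headers):
    """Return only the headers that differ from the connection's header table,
    then update the table so the next request diffs against the new state."""
    delta = {k: v for k, v in headers.items() if table.get(k) != v}
    table.update(headers)
    return delta

table = {}   # persists for the lifetime of the HTTP 2.0 connection

request1 = {":method": "GET", ":scheme": "https", ":host": "example.com",
            ":path": "/index.html", "user-agent": "Mozilla/5.0 (...)"}
request2 = dict(request1, **{":path": "/style.css"})

print(len(diff_headers(table, request1)))   # 5 -> full set on the first request
print(diff_headers(table, request2))        # {':path': '/style.css'} only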

Efficient HTTP 2.0 Upgrade and Discovery

The switch to HTTP 2.0 cannot happen overnight: millions of servers must be updated to use the new binary framing, and billions of clients must similarly update their browsers and networking libraries.

The good news is, most modern browsers use efficient background update mechanisms, which will enable HTTP 2.0 support quickly and with minimal intervention for a large portion of existing users. However, despite this, some users will be stuck on older browsers, and servers and intermediaries will also have to be updated to support HTTP 2.0, which is a much longer, and labor- and capital-intensive, process.

HTTP 1.x will be around for at least another decade, and most servers and clients will have to support both 1.x and 2.0 standards. As a result, an HTTP 2.0 capable client must be able to discover whether the server, and any and all intermediaries, support the HTTP 2.0 protocol when initiating a new connection. There are three cases to consider:

  • Initiating a new HTTPS connection via TLS and ALPN
  • Initiating a new HTTP connection with prior knowledge
  • Initiating a new HTTP connection without prior knowledge

Application Layer Protocol Negotiation (ALPN) is used to discover and negotiate HTTP 2.0 support as part of the regular HTTPS negotiation; see “TLS Handshake” and “Application Layer Protocol Negotiation (ALPN)”. Reducing network latency is a critical criterion for HTTP 2.0, and for this reason ALPN negotiation is always used when establishing an HTTPS connection.

Establishing an HTTP 2.0 connection over a regular, non-encrypted channel will require a bit more work. Because both HTTP 1.x and HTTP 2.0 run on the same port (80), in the absence of any other information about server support for HTTP 2.0, the client will have to use the HTTP Upgrade mechanism to negotiate the appropriate protocol:

GET /page HTTP/1.1
Host: server.example.com
Connection: Upgrade, HTTP2-Settings
Upgrade: HTTP/2.0 1
HTTP2-Settings: (SETTINGS payload) 2

HTTP/1.1 200 OK 3
Content-length: 243
Content-type: text/html

(... HTTP 1.1 response ...)

          (or)

HTTP/1.1 101 Switching Protocols 4
Connection: Upgrade
Upgrade: HTTP/2.0

(... HTTP 2.0 response ...)

1. Initial HTTP 1.1 request with HTTP 2.0 upgrade header
2. Base64 URL encoding of HTTP/2.0 SETTINGS payload
3. Server declines upgrade, returns response via HTTP 1.1
4. Server accepts HTTP 2.0 upgrade, switches to new framing

Using the preceding Upgrade flow, if the server does not support HTTP 2.0, then it can immediately respond to the request with an HTTP 1.1 response. Alternatively, it can confirm the HTTP 2.0 upgrade by returning the 101 Switching Protocols response in HTTP 1.1 format and then immediately switch to HTTP 2.0 and return the response using the new binary framing protocol. In either case, no extra roundtrips are incurred.

To confirm that both the server and client are knowingly electing to speak HTTP 2.0, both also have to send a "connection header," which is a well-known sequence of bytes defined in the standard. This exchange acts as a "fail-fast" mechanism to avoid clients, servers, and intermediaries that sometimes accept the requested upgrade without understanding the new protocol. This exchange does not incur any extra roundtrips, just a few extra bytes at the beginning of the connection.

Finally, if the client chooses to, it may also remember or obtain the information about HTTP 2.0 support through some other means—e.g., DNS record, manual configuration, and so on—instead of having to rely on the Upgrade workflow. Armed with this knowledge, it may choose to send HTTP 2.0 frames right from the start, over an unencrypted channel, and hope for the best. In the worst case, the connection will fail and the client will fall back to Upgrade workflow or switch to a TLS tunnel with ALPN negotiation.
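
Putting the three cases together, the client-side decision amounts to a small decision tree. The sketch below is illustrative only: the helper functions are hypothetical placeholders for a real client, not an actual API:

from urllib.parse import urlparse

class ProtocolError(Exception):
    pass

# Hypothetical transport helpers; each stands in for real connection logic.
def tls_connect_with_alpn(url, protocols): return ("alpn", url)
def http2_connect(url): return ("prior-knowledge", url)
def upgrade_connect(url): return ("http-upgrade", url)

def connect(url, known_http2_hosts=()):
    if url.startswith("https://"):
        # Case 1: negotiate HTTP 2.0 during the TLS handshake via ALPN.
        return tls_connect_with_alpn(url, ["HTTP/2.0", "HTTP/1.1"])
    if urlparse(url).hostname in known_http2_hosts:
        # Case 2: prior knowledge (DNS record, manual configuration, ...);
        # send HTTP 2.0 frames right away, fall back if the attempt fails.
        try:
            return http2_connect(url)
        except ProtocolError:
            pass
    # Case 3: no prior knowledge; use the HTTP Upgrade flow shown earlier.
    return upgrade_connect(url)

print(connect("http://server.example.com/page"))   # ('http-upgrade', ...)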

Brief Introduction to Binary Framing

At the core of all HTTP 2.0 improvements is the new binary, length-prefixed framing layer. Compared with the newline delimited plaintext HTTP 1.x, binary framing offers more compact representation and is both easier and more efficient to process in code.

Once an HTTP 2.0 connection is established, the client and server communicate by exchanging frames, which serve as the smallest unit of communication within the protocol. All frames share a common 8-byte header (Figure 12-6), which contains the length of the frame, its type, a bit field for flags, and a 31-bit stream identifier.

Figure 12-6. Common 8-byte frame header
  • The 16-bit length prefix tells us that a single frame can carry \(2^{16}-1\) bytes of data: ~64 KB, which excludes the 8-byte header size.
  • The 8-bit type field determines how the rest of the frame is interpreted.
  • The 8-bit flags field allows different frame types to define frame-specific messaging flags.
  • The 1-bit reserved field is always set to 0.
  • The 31-bit stream identifier uniquely identifies the HTTP 2.0 stream.

When debugging HTTP 2.0 traffic, some may prefer to work with their favorite hex viewer. Alternatively, there are plug-ins for Wireshark and similar tools that present a much easier and human-friendly representation—e.g., Google Chrome allows you to inspect the decoded exchange in chrome://net-internals#spdy.

Given this knowledge of the shared HTTP 2.0 frame header, we can now write a simple parser that can examine any HTTP 2.0 bytestream and identify different frame types, report their flags, and report the length of each by examining the first eight bytes of every frame. Further, because each frame is length-prefixed, the parser can skip ahead to the beginning of the next frame both quickly and efficiently, a big performance improvement over HTTP 1.x.
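
Here is what such a parser could look like in Python, assuming the draft field layout described above (16-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream identifier); frame type codes are taken from the current draft and may change:

import struct

def parse_frames(data):
    """Walk an HTTP 2.0 bytestream by reading the common 8-byte frame header."""
    offset = 0
    while offset + 8 <= len(data):
        length, ftype, flags, stream = struct.unpack_from(">HBBI", data, offset)
        stream &= 0x7FFFFFFF           # drop the single reserved bit
        yield {"length": length, "type": ftype, "flags": flags, "stream": stream}
        offset += 8 + length           # length prefix: skip straight to the next frame

# Example: one DATA frame (type 0x0 in the current draft) on stream 1,
# carrying a 5-byte payload.
frame = struct.pack(">HBBI", 5, 0x0, 0x0, 1) + b"hello"
print(list(parse_frames(frame)))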

Once the frame type is known, the remainder of the frame can be interpreted by the parser. The HTTP 2.0 standard defines the following types:

DATA
Used to transport HTTP message bodies
HEADERS
Used to communicate additional header fields for a stream
PRIORITY
Used to assign or reassign priority of referenced resource
RST_STREAM
Used to signal abnormal termination of a stream
SETTINGS
Used to signal configuration data about how two endpoints may communicate
PUSH_PROMISE
Used to signal a promise to create a stream and serve referenced resource
PING
Used to measure the roundtrip time and perform "liveness" checks
GOAWAY
Used to inform the peer to stop creating streams for current connection
WINDOW_UPDATE
Used to implement flow control on a per-stream or per-connection basis
CONTINUATION
Used to continue a sequence of header block fragments

The GOAWAY frame allows the server to indicate to the client the last processed stream ID, which eliminates a number of request races and allows the browser to intelligently retry or cancel "in-flight" requests. An important and necessary feature for enabling safe multiplexing!

The exact implementation of the preceding taxonomy of frames is mostly only relevant to server and client implementers, who need to worry about the semantics of flow control, error handling, connection termination, and many other details. And the good news is that all of these are covered extensively in the official standard. If you are curious, check out the latest draft.

Having said that, even though the framing layer is hidden from our applications, it is useful for us to go just one step further and look at the two most common workflows: initiating a new stream and exchanging application data. Having an intuition for how a request, or a response, is translated into individual frames can help answer a lot of questions about HTTP 2.0 performance.

Initiating a New Stream

Before any application data can be sent, a new stream must be created and the appropriate metadata, such as stream priority and HTTP headers, must be sent. With HTTP 2.0, both the client and the server can initiate new streams; hence there are two cases to consider:

  • The client initiates a new request by sending a HEADERS frame (Figure 12-7), which includes the common header with a new stream ID, an optional 31-bit priority value, and a set of HTTP header key-value pairs within its payload.
  • The server initiates a push stream by sending a PUSH_PROMISE frame, which is effectively identical to a HEADERS frame, except that it carries an extra "promised stream ID," instead of a priority value.
Figure 12-7. HEADERS frame with optional priority

Both types of frames are used to communicate only the metadata about each new stream; the payload is delivered independently, within the DATA frames. Also, because both sides can initiate new streams, the stream counters are offset: client-initiated streams have odd-numbered stream IDs and server-initiated streams have even-numbered stream IDs. This offset eliminates collisions in stream IDs between the server and the client: each keeps a simple counter, and increments it when initiating a new stream.
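
The offset counters themselves are trivial; a minimal sketch:

def stream_ids(first):
    """Yield stream IDs for one endpoint; the counter simply increments by two."""
    next_id = first
    while True:
        yield next_id
        next_id += 2

client_ids = stream_ids(1)   # client-initiated streams: 1, 3, 5, ...
server_ids = stream_ids(2)   # server-initiated (push) streams: 2, 4, 6, ...
print(next(client_ids), next(client_ids), next(server_ids))   # 1 3 2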

Because stream metadata delivery is separate from application data, the client and server can manage each with different priorities—e.g., "control traffic" can be delivered with higher priority, and flow control is applied only to DATA frames.

Sending Application Data

Once a new stream is created and the HTTP headers are sent, DATA frames (Figure 12-8) are used to send the application payload if one is present. The payload can be split between multiple DATA frames, with the last frame indicating the end of message by toggling the END_STREAM flag in the header of the frame.

Figure 12-8. DATA frame

No extra encoding or compression is performed on the payload. The choice of the encoding mechanism is deferred to the application or server—e.g., plain text, gzip compression, or the choice of image or video compression format. And with that, there is literally nothing more to say about the DATA frame! The entire frame consists of the common 8-byte header, followed by the HTTP payload.

Technically, the length field of the DATA frame allows payloads of up to \(2^{16}-1\) (65535) bytes per frame. However, to reduce head-of-line blocking, the HTTP 2.0 standard requires that DATA frames not exceed \(2^{14}-1\) (16383) bytes per frame—messages that exceed this threshold must be broken up into multiple DATA frames.
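
Splitting a message body accordingly is straightforward. A sketch, assuming the draft’s 16,383-byte limit and an END_STREAM flag value of 0x1 (both subject to change as the standard evolves):

MAX_DATA = 2**14 - 1   # 16,383 bytes per DATA frame, per the current draft
END_STREAM = 0x1       # flag toggled on the final frame of the message

def data_frames(stream_id, payload):
    """Split one HTTP message body into DATA frames for the given stream."""
    chunks = [payload[i:i + MAX_DATA] for i in range(0, len(payload), MAX_DATA)] or [b""]
    for i, chunk in enumerate(chunks):
        flags = END_STREAM if i == len(chunks) - 1 else 0x0
        yield {"stream": stream_id, "flags": flags, "payload": chunk}

frames = list(data_frames(1, b"x" * 40000))
print([len(f["payload"]) for f in frames])   # [16383, 16383, 7234]
print(frames[-1]["flags"] == END_STREAM)     # True: last frame ends the message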

Analyzing HTTP 2.0 Frame Data Flow

With basic knowledge of the different frame types, we can now revisit the diagram (Figure 12-9) we encountered earlier in “Request and Response Multiplexing” and analyze the data flow.

Figure 12-9. HTTP 2.0 request and response multiplexing within a shared connection
  • There are three active streams: 1, 3, and 5.
  • All three stream IDs are odd; all three are client-initiated streams.
  • There are no server-initiated streams in this exchange.
  • The server is sending multiple DATA frames for stream 1, which carry the application response to the client’s earlier request. This also indicates that the response HEADERS frame was transferred earlier.
  • The server has interleaved a HEADERS and DATA frame for stream 3 between the DATA frames for stream 1—response multiplexing in action!
  • The client is transferring a DATA frame for stream 5, which indicates that a HEADERS frame was transferred earlier.

In short, the preceding connection is currently multiplexing three streams in parallel, each at various stages of the processing cycle. The server determines the order of the frames, and we do not have to worry about the type or content of each stream. Stream 1 could be a large data transfer or a video stream, but it does not block the other streams within the shared connection!