Chapter 6. Media Type Selection and Design

A good contract lets friends stay friends.

It is common to hear developers explain how REST-based systems are better than alternatives for building Web APIs because they are simpler. The lack of contracts, such as WSDL (Web Service Description Language), is often cited as one of the reasons for the simplicity.

However, when building distributed systems, you cannot avoid coming to some kind of prearranged agreement between components. Without some form of shared knowledge, those components cannot interact in a meaningful way.

This chapter is about the types of contracts used in web architecture, the process of selecting the best contracts to meet our needs, and identifying where it’s necessary to create new contracts.

Self-Description

One of the key concepts around designing contracts is self-description. Ideally, a message should contain all the information the recipient needs to understand the intent of the sender, or at least provide references to where the necessary information can be found.

Imagine you receive a letter in the mail that has the numbers 43.03384,–71.07338 written on it. That letter provides you with all the data that you need to achieve a very specific goal, but you are missing all of the context required to do anything useful with it. If I were to tell you that the pair of numbers were latitude and longitude coordinates, then you would have some understanding of their meaning. Obviously, I am assuming that you either already understand that coordinate system or are capable of searching for information on how to use it. Self-descriptive does not mean that the message needs to actually include a description of the longitude/latitude system. It simply needs to reference it in some way.

Knowing that the information included in the letter is coordinates is only half the story. What are you supposed to do with those coordinates? More context is needed for you to know what you should do with that information. If you see that the letter’s return address is "Great Places To Drink, PO Box 2000, Nevada," you have a hint that maybe the coordinates are the location of a recommended pub.

Types of Contracts

In the web world, media types are used to convey what a resource represents, and a link relation suggests why you should care about that resource. These are the contracts that we can use when building evolvable systems. These pieces of the architecture represent the shared knowledge. These are the parts of the system that we have to be careful to design well, because when they change, they could break components that depend on them.

Media Types

Media types are platform-independent types designed for communication between distributed systems. Media types carry information. How that information is represented is defined in a written specification.

Unfortunately, the potential for media types is woefully underused in the world of Web APIs. The vast majority of Web APIs limit their support to application/xml and application/json. These two media types have very little capability to carry meaningful semantics and often lead people to use out-of-band knowledge to interpret them. Out-of-band knowledge is the opposite of self-descriptive. To return to our letter example, if the "Great Places To Drink" company were to tell you ahead of your receiving the letter that the numbers written on your letter would be geographic coordinates, that would be considered out-of-band knowledge. The information needed to interpret the data is communicated in some way separate from the message itself. In the case of generic types like application/xml and application/json, it requires us to communicate the semantics of the message in some other way. Depending on how we do that, it can make evolving a system much more difficult because it requires communicating changes to the system in that out-of-band manner. When out-of-band knowledge is used, clients assume that the server will send certain content and therefore they are unable to automatically adapt if a server returns something different. The result is that server behavior becomes locked down by the existence of clients. This can become a major inhibitor of change.

Primitive Formats

This section includes examples that show how the same set of data is communicated through different media types.

The media type application/octet-stream, shown in Example 6-1, is about as basic a media type as you can get. It is simply a stream of bytes. User agents who receive this media type usually cannot do anything with the payload other than allow the user to save the bytes in a file. There are no application semantics defined at all.

Example 6-1. A stream of bytes
 GET /some-mystery-resource
 200 OK
 Content-Type: application/octet-stream
 Content-Length: 20

 00 3b 00 00 00 0d 00 01 00 11 00 1e 00 08 01 6d 00 03 FF FF

In Example 6-2, media type text/plain tells us that the content can be safely rendered directly to a end user, who will be able to read the data. In its current form, the example body does not provide any hints as to what the data is for; however, there is nothing to stop a server from including a paragraph of prose in the body with an explanation of the information.

Example 6-2. Human readable
 GET /some-mystery-resource
 200 OK
 Content-Type: text/plain
 Content-Length: 29

 59,0,13,1,17,30,8,365,3,65535

In Example 6-3, the media type text/csv provides some structure to the information being returned. The data model is defined as a set of comma-separated values that are then broken down into lines of (usually) structurally similar data. We still have no idea what the data is, but we could at least format the data for presentation to a user, assuming the user knows what she is looking at.

Example 6-3. Simply structured data
 GET /some-mystery-resource
 200 OK
 Content-Type: text/csv
 Content-Length: 29

 59,0
 13,1
 17,30
 8,365
 3,65535

New Formats

Now let’s consider Example 6-6.

Example 6-6. Service-specific format
 GET /some-mystery-resource
 200 OK
 Content-Type: application/vnd.acme.cache-stats+xml
 Content-Length: ??

 <cacheStats>
        <cacheMaxAge percent="59" daysLowerLimit="0" daysUpperLimit="0">
        <cacheMaxAge percent="13" daysLowerLimit="0" daysUpperLimit="1">
        <cacheMaxAge percent="17" daysLowerLimit="1" daysUpperLimit="30">
        <cacheMaxAge percent="8" daysLowerLimit="30" daysUpperLimit="365">
        <cacheMaxAge percent="3" daysLowerLimit="365" daysUpperLimit="65535">
 </cacheStats>

This media type finally conveys that the data we have been dealing with is the series of data points for a graph that shows the frequency distribution of the length of the max-age cache control header of requests on the Internet. This media type provides all the information a client needs for rendering a graph of this information. However, the applicability of this media type is fairly specific. How often is someone going to write an application that needs to render a graph of caching statistics? The idea of writing a specification and submitting that specification to IANA for registration seems like overkill. The vast majority of today’s Web APIs create these narrowly focused payloads but just don’t bother with the specification and registration part of the process. There are alternatives, though, that provide all the information needed by the client, and yet can be applicable to far more scenarios.

Consider the scenario in Example 6-7.

Example 6-7. Domain-specific format
 GET /some-mystery-resource
 200 OK
 Content-Type: application/data-series+xml
 Content-Length: ??

 <series        xAxisType="range"
                        yAxisType="percent"
                        title="% of requests with their max-age value in days">
        <dataPoint yValue="59" xLowerValue="0" xUpperValue="0">
        <dataPoint yValue="13" xLowerValue="0" xUpperValue="1">
        <dataPoint yValue="17" xLowerValue="1" xUpperValue="30">
        <dataPoint yValue="8" xLowerValue="30" xUpperValue="365">
        <dataPoint yValue="3" xLowerValue="365" xUpperValue="65535">
 </series>

In this case, we have created a media type whose purpose is to deliver a series of data points that can be used to plot a graph. It could be a line graph, a pie chart, a histogram, or even just a table of data. The client can understand the semantics of the data points from the perspective of drawing a graph. It doesn’t know what the graph is about; that is left for the human consumer to appreciate. However, the additional semantics allow the client to do things like overlay graphs, switch axes, and zoom to portions of the graph.

The reusability of this media type is vastly higher than that of the application/vnd.acme.cachestats+xml. Any application scenario where there is some data to be graphed could make use of this media type. The time and effort put into writing a specification to completely describe this format could quickly pay off.

It is my opinion that this kind of domain-specific, but not service-specific, media type is the optimal balance of semantics that media types should convey. There are a few examples of this sort of media type that have proven quite successful:

  • HTML was conceived as a way of communicating hyperlinked textual documents.
  • Atom was designed as a way to enable syndication of web-based blogs.
  • ActivityStream is a way of representing streams of events.
  • Json-home is designed to enable discovery of resources made available in an API.
  • Json-problem is a media type designed to provide details on errors returned from an API.

All of these media type examples have several things in common. They have semantics that are intended to solve a specific problem, but they are not specific to any particular application. Every API needs to return error information; most applications have areas where there is some stream of events. These media types are defined in a completely platform- and language-agnostic way, which means that they can be used by every developer in any type of application. There remain many opportunities for defining new media types that would be reusable across many applications.

Hypermedia Types

Hypermedia types are a class of media types that are usually text based and contain links to other resources. By providing links within representations, user agents are able to navigate from one representation to another based on their understanding of the meaning of the link.

Hypermedia types play a huge role in decoupling clients from servers. Through hypermedia, clients no longer need to have preexisting knowledge of resources exposed on the Web. Resources can be discovered at runtime.

Despite the obvious benefits of hypermedia in HTML for web applications, hypermedia has so far played a very minor role in the development of Web APIs. Web application developers have tended to avoid hypermedia due to lack of tooling, a perception that links create unnecessary bloat in the size of representations, and a general lack of appreciation of its benefits.

There are scenarios where the cost of hypermedia cannot be justified—for example, when performance is absolutely critical. For performance-critical systems, the HTTP protocol is probably not the best choice either. When evolvability is a key goal in an HTTP-based system, hypermedia cannot be ignored.

Media Type Explosion

So far we have seen how generic media types require out-of-band knowledge to provide semantics, and we have seen examples of more specific media types that carry domain semantics. Some members of the web development community are reluctant to encourage the creation of new media types. There is a fear of an "explosion of media types"—that the creation of a large number of new media types would produce badly designed specifications, duplicated efforts, and service-specific types, and any possibility of serendipitous reuse would be severely hindered. It is not an unfounded fear, but it’s likely that, just as with evolution, the strong would survive and the weak would have little impact.

Generic Media Types and Profiles

There is another approach to media types that is favored by some. Its basic premise is to use a more generic media type and use a secondary mechanism to layer semantics onto the representation.

One example of this is a media type called Resource Description Framework (RDF). To dramatically oversimplify, RDF is a media type that allows you to make statements about things using triples, where a triple consists of a subject, an object, and a predicate that describes the relationship between the subject and object. The significant portion of the semantics of an RDF representation are provided by standardized vocabularies of predicates that have documented meanings. The RDF specification provides the way to relate pieces of data together but does not actually define any domain semantics itself.

In Example 6-8, taken from the RDF Wikipedia entry, the URI http://purl.org./dc/elements/1.1 refers to a vocabulary defined by the Dublin Core Metadata Initiative.

Example 6-8. RDF Example
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
                <dc:title>Tony Benn</dc:title>
                <dc:publisher>Wikipedia</dc:publisher>
        </rdf:Description>
</rdf:RDF>

Another example of layering semantics is the use of application-level profile semantics (ALPS). ALPS is a method of specifying domain semantics that can then be applied to a base media type like XHTML, as shown in Example 6-9. The recently ratified link relation profile is a way of attaching these kinds of additional semantic specifications to existing media types.

Example 6-9. ALPS over XHTML
 GET /some-mystery-resource
 200 OK
 Content-Type: application/xhtml
 Content-Length: 29

 <html>
        <head>
        <link rel="profile" href="http://example.org/profiles/stats" />
    </head>
        <title>% of requests with their cache-control: max-age value in days </title>
        <body>
                <table class="data-series">
                        <thead>
                                <td>from</td>
                                <td>to (days)</td>
                                <td>percent</td>
                        </thead>
                        <tr class="data-point">
                                <td class="xLowerValue"></td>
                                <td class="xUpperValue">0</td>
                                <td class="yValue">59</td>
                        </tr>
                        <tr class="data-point">
                                <td class="xLowerValue">0</td>
                                <td class="xUpperValue">1</td>
                                <td class="yValue">13</td>
                        </tr>
                        <tr class="data-point">
                                <td class="xLowerValue">1</td>
                                <td class="xUpperValue">30</td>
                                <td class="yValue">17</td>
                        </tr>
                        <tr class="data-point">
                                <td class="xLowerValue">30</td>
                                <td class="xUpperValue">365</td>
                                <td class="yValue">8</td>
                        </tr>
                        <tr class="data-point">
                                <td class="xLowerValue">365</td>
                                <td class="xUpperValue"></td>
                                <td class="yValue">3</td>
                        </tr>
                </table>
        </body>
 </html>


GET http://example.org/profiles/stats
200 OK
Content-Type: application/alps+xml

<alps version="1.0">
    <doc format="text">
        Types to support the domain of statistical data
    </doc>

   <descriptor id="data-series" type="semantic">
        <descriptor id="data-point" type="semantic">
         <doc>A data point</doc>
                <descriptor id="xValue" type="semantic"">
                        <doc>X value on graph</doc>
                </descriptor>
                <descriptor id="xLowerValue" type="semantic">
                        <doc>Lower bound on X range of values</doc>
                </descriptor>
                <descriptor id="xUpperValue" type="semantic">
                        <doc>Upper bound on X range of values</doc>
                </descriptor>
                <descriptor id="yValue" type="semantic" >
                        <doc>Y value on graph</doc>
                </descriptor>
                <descriptor id="yLowerValue" type="semantic">
                        <doc>Lower bound on Y range of values</doc>
                </descriptor>
                <descriptor id="yUpperValue" type="semantic">
                        <doc>Upper bound on Y range of values</doc>
                </descriptor>
                </descriptor>
   </descriptor>


</alps>

The Hypermedia Application Language (HAL), demonstrated in Example 6-10, is a generic media type that uses link relations as a way to apply domain semantics.

Example 6-10. HAL in both application/hal+xml and application/hal+json
<resource       xAxisType="range"
                        yAxisType="percent"
                        title="% of requests with their max-age value in days">
        <resource       rel="http://example.org/stats/data-point"
                                yValue="59"
                                xLowerValue="0"
                                xUpperValue="0">
        <resource       rel="http://example.org/stats/data-point"
                                yValue="13"
                                xLowerValue="0"
                                xUpperValue="1">
        <resource       rel="http://example.org/stats/data-point"
                                yValue="17"
                                xLowerValue="1"
                                xUpperValue="30">
        <resource       rel="http://example.org/stats/data-point"
                                yValue="8"
                                xLowerValue="30"
                                xUpperValue="365">
        <resource       rel="http://example.org/stats/data-point"
                                yValue="3"
                                xLowerValue="365"
                                xUpperValue="65535">
 </resource>

 {
        "xAxisType" : "range",
        "yAxisType" : "percent",
        "title" : "% of requests with their max-age value in days",
        "_embedded" : {
                "http://example.org/stats/data-point" :
                { "yValue" : "59", "xLowerValue" : "0", "xUpperValue" : "0"},
                "http://example.org/stats/data-point" :
                { "yValue" : "13", "xLowerValue" : "0", "xUpperValue" : "1"},
                "http://example.org/stats/data-point" :
                { "yValue" : "17", "xLowerValue" : "1", "xUpperValue" : "30"},
                "http://example.org/stats/data-point" :
                { "yValue" : "8", "xLowerValue" : "30", "xUpperValue" : "365"},
                "http://example.org/stats/data-point" :
                { "yValue" : "3", "xLowerValue" : "365", "xUpperValue" : "65535"}

        }

 }

HAL relies on link relations to provide the semantics needed to interpret the non-HAL parts of the representation. This means that in the documentation for the link relation type http://example.org/stats/data-point, the meaning of yValue, xLowerValue, and xUpperValue needs to be defined. HAL doesn’t care whether these values are specified as attributes or elements; it is up to the consumer of the HAL document to discover where the information is stored.

One challenge with using the link relation to communicate semantics is that entry point URIs often do not have link relations. When you type a link into a browser address bar, there is no link relation. There are a couple of workarounds for this. You can keep the root resource limited to just embedded resources and links, or you can use a link with the link relation type to associate semantics to the root resource.

The advantage of using a generic media type is that tooling to generate, parse, and render those formats likely already exists and can be reused. It also means that you can define semantic profiles that have mappings to multiple different base media types. This can be advantageous when one particular media type is more suitable for a particular platform. When defining domain-specific media types, if it is desirable to support both XML and JSON variants, you must write two specifications because the format and semantics are both defined by the media type.

Efforts are under way to try to formalize the process of describing a semantic profile, and it is likely that there will be multiple viable approaches.

The disadvantage of using the generic media type combined with a secondary semantic profile is that the semantics of the message are less visible to intermediary components. Media types specified in the Content-Type header can easily be processed by any layer in the HTTP architecture. Using techniques like link relations and profiles to attach semantics makes it more difficult for intermediaries to discover that information and make processing decisions based on it. As is commonly the case in web architecture, there is not only one way to do things. There are advantages and disadvantages, and systems must be engineered to meet the requirements.

The use of a secondary semantic profile is definitely an interesting space to watch, and I look forward to a future where there are more prescriptive solutions to defining the semantics of a message.

From these examples, you can see there are many ways you can communicate the same data between the client and server. Some media types carry more semantics, some less. How much application-specific semantics you use to drive your client depends on the availability of existing standard media types that suit your needs and your tolerance for coupling.

Other Hypermedia Types

The last few years have seen a number of new hypermedia types introduced. The following sections summarize a few examples.

Collection+Json

The Collection+Json media type addresses the domain of lists of things. It is interesting in that it is generic and specific at the same time. It specifically supports only lists of things, but it does not care what the list of things is. It also has interesting semantic affordances that can describe how to query the list and add new items to it. Although it has minimal semantics for the items in the list, it does support the notion of profiles for describing the list items.

Siren

Siren is another fairly new hypermedia type that is similar to HAL in that it is effective in representing nested data structures and embedded resources. It differs from HAL in the way it attaches semantics to data elements. Siren borrows the notion of a class from HTML as a way to identify semantic information. It also makes a distinction between links that are for navigating between resources and those that represent behaviors. The action links also have their own style of link hints that describe to the client how to invoke the action.

Although some people argue that we should all just standardize on a single format, I would rather let natural selection take its course than try to force-fit a single hypermedia format into every scenario. HTTP makes it easy for APIs to support multiple representations of a resource, and clients can pick the one they prefer, so having multiple formats in use is not a major hindrance to progress.

Up to this point, we have talked primarily about media types as a way of communicating semantics, but as we hinted when discussing HAL, semantics can also be communicated via link relations. The next section will discuss this further.

Designing a New Media Type Contract

When trying to identify the best media type to use for a particular resource, you should always first look for standard media types. Creating media can be challenging and sometimes standard media types may not be an exact fit, but they may be capable of carrying enough semantics to allow the client to achieve the users' goals.

In the case where you determine that there is no existing media type or link relation that carries the needed semantics, then it may be worth considering creating one. When creating a media type, aim for the following characteristics:

  • The captured semantics could be used by more than one application.
  • The required syntax is minimal and makes an open world assumption. In other words, the absence of information does not make any statement about that information.
  • Unrecognized information should be ignored unless it specifically violates other rules.

Selecting a Format

Most of the time when developers think of selecting a format for their media type, they think of using XML or JSON as the base format. The advantage is that there is so much tooling available that can process these formats, and they provide a flexible structure on which to define additional semantics. Both formats have varying strengths and weaknesses, and in many cases it is simply a matter of preference. However, the types of clients that you wish to support do have an influence on the decision. If you expect that JavaScript clients will be the primary consumers of your media type, then JSON is the obvious choice. For systems that are integrating with large enterprise applications, XML may be more appropriate. Either way, as web developers we need to become comfortable with both formats and use whichever one makes the most sense.

However, I would caution against doing both. XML and JSON are quite different in their approach to data representation, and you risk creating a lowest-common-denominator format that doesn’t really take advantage of either format. If you really do feel you need to support both, then recognize that managing two different spec formats is double the work and the community using the media type will end up being fragmented. For generic formats like HAL, it may make sense to support both variants, but it is a decision that you should not take lightly: having two variants that work slightly differently may end up causing more confusion than it is worth to attract the larger audience that refuses to adopt the single chosen format.

Sometimes, neither XML nor JSON may be the right choice. I’ve seen numerous occasions where people have been using a JSON document for the purposes of updating a single property value. In some cases, a plain-text format representation is the simplest choice. All languages have libraries that allow converting from simple text into native data types. If all you want to do is transfer a simple value, then consider text/plain or some derivative of it as an option.

The use of text/plain format is a good example of why it is smart to keep metadata out of the body of your HTTP representation. Many APIs have taken this approach, including status codes and other metadata in the body of their responses. However, if you do this, you are limiting the media types that you can use and duplicating the intent of the HTTP headers. If you are forced to support clients that cannot access HTTP headers, then define special media types just for those clients. Try not to constrain your API for other, more capable clients.

Creative use of media types does not need to be limited to text-based types. In a blog post, Roy Fielding shows how to use a monochrome image format as a sparse array to identify resources that have changed in a single representation to avoid polling large numbers of them.

Enabling Hypermedia

As covered in Chapter 1, for media types that are not text based, the best option for hypermedia is to use link headers as defined in RFC 5988. For text-based formats, we discussed the variety of link syntaxes that are currently in use when covering link relations. However, there are a few other concerns that we need to address at the media type level. Should the media type allow two links with the same relation? If so, how can a user agent distinguish between those two links? In HAL, a link can have a name attribute associated with the link to identify it. HTML allows tags to have an id attribute associated for the purposes of identification.

Is there a need for hyperlinks to reference a particular element within an instance of the media type? Should a syntax be defined for identifying fragments?

What are the rules for resolving relative URIs?

Optional, Mandatory, Omitted, Applicable

When designing media types, especially ones used to represent writeable resources, I have found it necessary to convey semantics about the presence or absence of a particular piece of information. The most obvious scenario is that of a mandatory element. In the case of our Issue item media type, the title attribute is the one property that is mandatory.

For properties that are not mandatory, there are a number of reasons why a property may not be present in a representation. A resource may have chosen to include only a subset of properties in a representation for performance reasons, and therefore certain properties may be omitted. Another possibility is that the property is considered nonapplicable.

Applicability refers to when the relevance of one property is dependent on the value of another property of the resource. For example, in an employee record, there may be a TerminatedDate field. If the employee status is Current, then it is likely that the TerminatedDate field is not applicable. Database tables and classes don’t have the flexibility to change their shape dynamically per entity instance, so we often end up using null values to indicate that a property does not have any meaningful value. Unfortunately, a null value is also used to indicate that a property has not yet been supplied a value. This is not the same as a property being not applicable.

With media type representations, we can completely omit any syntax relating to the nonapplicable property and include the property syntax, but set the value as null or empty for those where a value has not yet been defined.

The policy of removing nonapplicable properties from representations can simplify client code and reduce coupling. There is often business logic that correlates the controlling property and the dependent properties. If the client assumes that the presence of a property infers applicability, then the client never needs to be aware of that business logic and can evolve without impacting the client.

The ability to distinguish between mandatory, applicable, and omitted properties is one example of how explicitly defined media types are more expressive than object serialization formats, as those formats are limited to the semantics that can be expressed by a class.

When defining representations that contain only a subset of the resource properties, you need to avoid ambiguity between information that is being omitted and information that’s nonapplicable. Attribute groups can sometimes help in these cases in the same way they allow partitioning of mandatory fields.

Embedded Versus External Metadata

Annotating representations is one way to include metadata such as a mandatory flag, type definitions, and range conditions. For example:

<foo>
    <fooDate required="true" type="Date" minValue="2001/01/01"
                 maxValue="2020/12/31">2010/04/12</fooDate>
</foo>

This approach makes it easy for the client to access the metadata because it will be parsed at the same time as the actual data. However, as the amount of metadata increases it can significantly increase the size of the representation, and usually the metadata changes far less frequently than the data does. Also, the same metadata can usually be reused for many resources that belong to a single resource class.

When a resource contains two distinct sets of data that have different lifetimes, often the best option is to break it into two resources and link the resource with the shorter lifetime to the resource with the longer lifetime. This allows caching layers to reduce the data transmitted across the network.

One challenging aspect of using external metadata is that we must correlate which pieces of metadata apply to which pieces of representation data. Some media types define a selection syntax that allows us to point to a specific fragment of data within a document. For example, CSS stylesheets use selectors, and XML-based representations can use XPath queries to identify nodes within a document.

Extensibility

Media types are the point of coupling between the client and the server. A breaking change to a media type specification will potentially break clients. Ensuring that media types are designed to be extensible will help to minimize breaking changes while accommodating changing requirements. Writing client code to handle extensible formats can be trickier, as you can’t make as many assumptions about the format. However, the up-front cost of writing more tolerant parsing code will quickly pay off.

One common tactic is to achieve extensibility to ignore unknown content. This may seem like counterintuitive advice to someone with lots of experience with schemas like XSD. However, this enables existing parsers to continue to process new versions of the media type that contain additional information. Returning to our minimal information model of an Issue, it could be represented in XML by:

<Issue>
   <Title>This is a bug</Title>
   <Description>Here are the details of the bug.</Description>
</Issue>

Assuming we wrote a parser that looked for the elements using the XPath queries /Issue/Title and /Issue/Description, then if the media type were enhanced to allow:

<Issue>
   <Title>This is a bug</Title>
   <FoundBy href='http://issueapi.example.org/user/342'/>
   <Description>Here are the details of the bug.</Description>
</Issue>

the existing parser would still be able to process this document even if there were missing information. What happens to that extra information is very much dependent on the use case. In some scenarios, it can be safely ignored. In others, the user should be warned about the fact that some information cannot be processed. Some services may choose to refuse to accept additional information that is not understood. All these are valid options, but there is no need to constrain the media type to support only one of the scenarios.

Another constraint that is often applied by XSD schemas is the ordering of elements. Unless the order of properties has some semantic significance, there is no need for a media type specification to enforce the order. I can imagine there are some simplicity and performance benefits for the parsing logic when properties appear in a specific order; however, once you allow unknown properties, those benefits are minimal. Facilitating extensibility is as much about avoiding unnecessary constraints as it is about applying constraints.

A media type specification should try to limit itself to constraints defined by the domain and not be limited by the implementation constraints of the service. For example, when a service is backed by a database, it is common to define field lengths. Field lengths are a constraint of the database used by the service; other implementations that use the media type will likely have different physical constraints. Arbitrarily enforcing a lowest-common-denominator constraint due to current implementation limitations is both unnecessary and unwise.

As a general rule of thumb, when it comes to defining a constraint in a media type, I ask myself if it is possible to parse the representation without the constraint and still convey the same meaning. If so, then I drop the constraint. The fewer the rules, the more likely it is for any extension to the media type to be harmless to existing code.

Registering the Media Type

In order for this "distributed type system" of media types to work, there needs to be a way for people to discover what valid types exist. The IANA registry is that central location. However, we have to be honest and say the current state of the IANA media type registry is pretty dismal. Currently, it consists of a few web pages with a bunch of links. Many of those links point to little more than an email message from 20 years ago. However, the fact that these entries still exist highlights the permanence of deploying types on the Internet. Once a new type has been let out onto the Internet, there is no guaranteed way of removing it. This is another reason why versioning media types can be problematic. There is no easy way to say "don’t use that version anymore."

Many of the IANA registries have been updated to a newer XML/XSLT/XHTML format that makes them easier to consume by crawlers. However, the media type registry has not yet had this makeover and remains a pain to harvest.

The registration process is also fairly barbaric. It suggests you read six different RFCs and then asks you to submit an HTML form. It is recommended that before you submit the application form, you publish the proposed specification on the Internet and send an announcement of intent to submit to the IETF Types mailing list. The experts who moderate this list will likely provide feedback to assist in any perceived problems with the proposal. Be aware that these experts provide this guidance for free and are not there to educate people on media type design. When participating in these lists, try not to misinterpret bluntness and brevity for hostility!

There definitely is a need for some community pressure on IANA to improve the media type registry and its registration process. This is a critical piece of Internet infrastructure, and yet many visitors get the impression it is obsolete and abandoned because of its unkempt appearance.

Media Types in the Issue Tracking Domain

In Chapter 4, we identified several different categories of resources: list resources, item resources, discovery resources, and search resources. For each of these resource categories, we need to identify which media types we believe are the most appropriate to carry the required semantics.

List Resources

For resources that return a list of items, we will be using the media type called collection+json. This is a hypermedia-enabled type designed explicitly to support lists of items. This media type supports associating an arbitrary set of data with each item in the list. It includes queries to enable searching for various subsets of items as well as a template property to facilitate creating a new item in the list.

We could have used HAL or even XHTML, as both are capable of representing a list of items; however, as collection+json is specifically designed for the purpose of representing lists, it seems a more natural fit. Example 6-13 demonstrates how collection+json can be used to represent a list of issues.

Example 6-13. Sample issue list
{
  "collection": {
    "href": "http://localhost:8080/Issue",
    "links": [],
    "items": [
      {
        "href": "http://localhost:8080/issue/1",
        "data": [
          {
            "name": "Description",
            "value": "This is an issue"
          },
          {
            "name": "Status",
            "value": "Open"
          },
          {
            "name": "Title",
            "value": "An issue"
          }
        ],
        "links": [
          {
            "rel": "http://webapibook.net/rels#issue-processor",
            "href": "http://localhost:8080/issueprocessor/1?action=transition"
          },
          {
            "rel": "http://webapibook.net/rels#issue-processor",
            "href": "http://localhost:8080/issueprocessor/1?action=close"
          }
        ]
      },
      {
        "href": "http://localhost:8080/issue/2",
        "data": [
          {
            "name": "Description",
            "value": "This is a another issue"
          },
          {
            "name": "Status",
            "value": "Closed"
          },
          {
            "name": "Title",
            "value": "Another Issue"
          }
        ],
        "links": [
          {
            "rel": "http://webapibook.net/rels#issue-processor",
            "href": "http://localhost:8080/issueprocessor/2?action=transition"
          },
          {
            "rel": "http://webapibook.net/rels#issue-processor",
            "href": "http://localhost:8080/issueprocessor/2?action=open"
          }
        ]
      }
    ],
    "queries": [
      {
        "rel": "http://webapibook.net/rels#search",
        "href": "/issue",
        "prompt": "Issue search",
        "data": [
          {
            "name": "SearchText",
            "prompt": "Text to match against Title and Description"
          }
        ]
      }
    ],
    "template": {
      "data": []
    }
  }
}

Item Resources

We have several options for representing each individual issue. We could use HAL and define a link relation issue that specifies the content. We could use XHTML and define a semantic profile that annotates the HTML with semantics from the issue tracking domain. Or we could define a new media type to represent an issue.

The notion of an issue is sufficiently generic that it could easily be reused by many services, and therefore it justifies the creation of a new media type. This is not a niche domain; it is one used by every software developer and many customer support call centers. Having an interoperable format, even if implementation variations prevent a full-fidelity communication, has the potential to be extremely valuable.

A sample representation of this media type is shown in Example 6-14. For the moment, this media type will be defined as JSON. Early adopters of web technology are more likely to be comfortable with JSON, and if the media type gains traction, then an XML variant will be defined to enable wider adoption.

The full specification for this media type can be found in Appendix E.

Example 6-14. Sample issue
{
  "id": "1",
  "title": "An issue",
  "description": "This is an issue",
  "status": "Open",
  "Links": [
    {
      "rel": "self",
      "href": "http://localhost:8080/issue/1"
    },
    {
      "rel": "http://webapibook.net/rels#issue-processor",
      "href": "http://localhost:8080/issueprocessor/1?action=transition",
      "action": "transition"
    },
    {
      "rel": "http://webapibook.net/rels#issue-processor",
      "href": "http://localhost:8080/issueprocessor/1?action=close",
      "action": "close"
    }
  ]
}

Discovery Resource

The discovery resource is an entry point resource that points to other resources that are available in the system. For this resource, we will be using a recently proposed media type called json-home. This media type is designed specifically to provide a representation for an entry point resource that allows dynamic discovery of resources. It is similar to the Atom Service Document but not limited to pointing to Atom feeds. The json-home document can have links to any arbitrary resource and can contain additional metadata that can be used to discover how to activate those links. Example 6-15 shows a possible json-home document for the Issue Tracker API.

Example 6-15. Sample root resource
{
  "resources": {
    "http://webapibook.net/rels#issue": {
      "href": "/issue/{id}",
      "hints": {
        "allow": [
          "GET"
        ],
        "formats": {
          "application/json": {},
          "application/vnd.issue+json": {}
        }
      }
    },
    "http://webapibook.net/rels#issues": {
      "href": "/issue",
      "hints": {
        "allow": [
          "GET"
        ],
        "formats": {
          "application/json": {},
          "application/vnd.collection+json": {}
        }
      }
    },
    "http://webapibook.net/rels#issue-processor": {
      "href": "/issueprocessor/{id}{?action}",
      "hints": {
        "allow": [
          "POST"
        ]
      }
    }
  }
}

Search Resource

For searching we will likely be able to rely on the query capability of collection+json. Where this proves insufficient, we will try to use the link relation search and the protocol defined by OpenSearch.

Conclusion

Media types and link relations are the tools used to manage the coupling between the components in your distributed application. This chapter has covered the different ways to use that coupling to communicate application semantics. Being aware of existing specifications, and how and when to create new ones, provides a solid foundation on which to actually start building an API. In the next chapter, we begin to write a sample API based on the knowledge we have gained.