Chapter 14. Programming for the Web

When you think about the Web, you probably think of web-based applications and services. If you are asked to go deeper, you may consider tools such as web browsers and web servers that support those applications and move data around the network. But it’s important to note that standards and protocols, not the applications and tools themselves, have enabled the Web’s growth. Since the earliest days of the Internet, there have been ways to move files from here to there, and document formats that were just as powerful as HTML, but there was not a unifying model for how to identify, retrieve, and display information, nor was there a universal way for applications to interact with that data over the network.

Since the web explosion began, HTML has reigned supreme as a common format for documents, and most developers have at least some familiarity with it. In this chapter, we’re going to talk a bit about its cousin, HTTP, the protocol that handles communications between web clients and servers, and URLs, which provide a standard for naming and addressing objects on the Web.

Java provides a very simple API for working with URLs to address objects on the Web. In this chapter, we’ll discuss how to write web clients that can interact with the servers using the HTTP GET and POST methods and also say a bit about web services, which are the next step up the evolutionary chain. In Chapter 15, we’ll jump over to the server side and take a look at servlets and web services, which are Java programs that run on web servers and implement the other side of these conversations.

Uniform Resource Locators (URLs)

A URL points to an object on the Internet. It’s a text string that identifies an item, tells you where to find it, and specifies a method for communicating with it or retrieving it from its source. A URL can refer to any kind of information source. It might point to static data, such as a file on a local filesystem, a web server, or an FTP site; or it can point to a more dynamic object such as an RSS news feed or a record in a database. URLs can even refer to more dynamic resources such as communication sessions and email addresses.

Because there are many different ways to locate an item on the Net and different mediums and transports require different kinds of information, URLs can have many forms. The most common form has four components: a network host or server, the name of the item, its location on that host, and a protocol by which the host should communicate:

 protocol://hostname/path/item-name
    

protocol (also called the “scheme”) is an identifier such as http or ftp; hostname is usually an Internet host and domain name; and the path and item components form a unique path that identifies the object on that host. Variants of this form allow extra information to be packed into the URL, specifying, for example, port numbers for the communications protocol and fragment identifiers that reference sections inside documents. Other, more specialized types of URLs such as “mailto” URLs for email addresses or URLs for addressing things like database components may not follow this format precisely, but do conform to the general notion of a protocol followed by a unique identifier. (Some of these would more properly be called URIs, which we’ll discuss later.)

Because most URLs have the notion of a hierarchy or path, we sometimes speak of a URL that is relative to another URL, called a base URL. In that case, we are using the base URL as a starting point and supplying additional information to target an object relative to that URL. For example, the base URL might point to a directory on a web server and a relative URL might name a particular file in that directory or in a subdirectory.
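As a small sketch of relative URLs, the following uses the two-argument constructor of the java.net.URL class (introduced in the next section) to resolve a relative name against a base URL. The class name RelativeUrlSketch and the file names are our own illustrations, not part of any API.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrlSketch {
    // Resolve a relative specification against a base URL.
    static URL resolve(String base, String relative) throws MalformedURLException {
        URL baseUrl = new URL(base);
        return new URL(baseUrl, relative);   // two-argument form: context URL plus spec
    }

    public static void main(String[] args) throws MalformedURLException {
        // The base points at a directory; note the trailing slash.
        String base = "http://foo.bar.com/documents/";
        System.out.println(resolve(base, "homepage.html"));
        System.out.println(resolve(base, "images/logo.gif"));
    }
}
```

The trailing slash on the base matters: without it, the last path segment is treated as a file name and is replaced by the relative part rather than extended.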

The URL Class

Bringing this down to a more concrete level is the Java URL class. The URL class represents a URL address and provides a simple API for accessing web resources, such as documents and applications on servers. It can use an extensible set of protocol and content handlers to perform the necessary communication and in theory even data conversion. With the URL class, an application can open a connection to a server on the network and retrieve content with just a few lines of code. As new types of servers and new formats for content evolve, additional URL handlers can be supplied to retrieve and interpret the data without modifying your applications.

A URL is represented by an instance of the java.net.URL class. A URL object manages all the component information within a URL string and provides methods for retrieving the object it identifies. We can construct a URL object from a URL string or from its component parts:

try {
    URL aDoc =
      new URL( "http://foo.bar.com/documents/homepage.html" );
    URL sameDoc =
      // the file component needs its leading slash to match the URL above
      new URL("http","foo.bar.com","/documents/homepage.html");
} catch ( MalformedURLException e ) { ... }

These two URL objects point to the same network resource, the homepage.html document on the server foo.bar.com. Whether the resource actually exists and is available isn’t known until we try to access it. When initially constructed, the URL object contains only data about the object’s location and how to access it. No connection to the server has been made. We can examine the various parts of the URL with the getProtocol(), getHost(), and getFile() methods. We can also compare it to another URL with the sameFile() method (which has an unfortunate name for something that may not point to a file). sameFile() determines whether two URLs point to the same resource. It can be fooled, but sameFile() does more than compare the URL strings for equality; it takes into account the possibility that one server may have several names as well as other factors. (It doesn’t go as far as to fetch the resources and compare them, however.)
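To make the accessor methods concrete, here is a minimal sketch that prints the parts of a URL and compares two URLs with sameFile(). The class name UrlPartsSketch and the #intro fragment are illustrative assumptions. We use localhost as the host so that any name lookup sameFile() performs stays on the local machine.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsSketch {
    // Summarize the components that a URL object has parsed out of its string.
    static String describe(URL url) {
        return "protocol=" + url.getProtocol()
             + " host=" + url.getHost()
             + " port=" + url.getPort()      // -1 when the URL names no port
             + " file=" + url.getFile();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL doc = new URL("http://localhost/documents/homepage.html");
        URL section = new URL("http://localhost/documents/homepage.html#intro");

        System.out.println(describe(doc));
        // sameFile() leaves the fragment (#intro) out of the comparison;
        // equals() takes it into account.
        System.out.println("sameFile: " + doc.sameFile(section));
        System.out.println("equals:   " + doc.equals(section));
    }
}
```

None of these calls opens a connection to a web server; they work only with the information held in the URL object.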

When a URL is created, its specification is parsed to identify just the protocol component. If the protocol doesn’t make sense, or if Java can’t find a protocol handler for it, the URL constructor throws a MalformedURLException. A protocol handler is a Java class that implements the communications protocol for accessing the URL resource. For example, given an http URL, Java prepares to use the HTTP protocol handler to retrieve documents from the specified web server.

As of Java 7, URL protocol handlers are guaranteed to be provided for http, https (secure HTTP), and ftp, as well as local file URLs and jar URLs that refer to files inside JAR archives. Outside of that, it gets a little dicey. We’ll talk more about the issues surrounding content and protocol handlers a bit later in this chapter.
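Since there is no standard API for listing the installed handlers, one rough way to find out whether a protocol is supported is simply to try constructing a URL and see whether a MalformedURLException results. The sketch below does this; the class name HandlerProbeSketch and the made-up scheme xyzzy are our own assumptions.

```java
import java.net.MalformedURLException;
import java.net.URL;

class HandlerProbeSketch {
    // True if this Java runtime can find a protocol handler for the scheme.
    static boolean hasHandler(String protocol) {
        try {
            // The component constructor looks up the handler for the protocol
            // without having to parse a complete URL string.
            new URL(protocol, "localhost", "/");
            return true;
        } catch (MalformedURLException e) {
            return false;   // thrown when no handler is found for the protocol
        }
    }

    public static void main(String[] args) {
        String[] schemes = { "http", "https", "ftp", "file", "jar", "xyzzy" };
        for (String scheme : schemes) {
            System.out.println(scheme + ": "
                + (hasHandler(scheme) ? "handler found" : "no handler"));
        }
    }
}
```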

Stream Data

The lowest-level and most general way to get data back from a URL is to ask for an InputStream from the URL by calling openStream(). Getting the data as a stream may also be useful if you want to receive continuous updates from a dynamic information source. The drawback is that you have to parse the contents of the byte stream yourself. Working in this mode is basically the same as working with a byte stream from socket communications, but the URL protocol handler has already dealt with all of the server communications and is providing you with just the content portion of the transaction. Not all types of URLs support the openStream() method because not all types of URLs refer to concrete data; you’ll get an UnknownServiceException if the URL doesn’t.

The following code prints the contents of an HTML file on a web server:

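try {
    URL url = new URL("http://server/index.html");

    // try-with-resources closes the reader even if reading fails
    try ( BufferedReader bin = new BufferedReader(
            new InputStreamReader( url.openStream() )) ) {
        String line;
        while ( (line = bin.readLine()) != null ) {
            System.out.println( line );
        }
    }
} catch (IOException e) {
    // also covers MalformedURLException, a subclass of IOException
    System.err.println( "Could not read URL: " + e );
}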

We ask for an InputStream with openStream() and wrap it in a BufferedReader to read the lines of text. Because we specify the http protocol in the URL, we enlist the services of an HTTP protocol handler. Note that we haven’t talked about content handlers yet. In this case, because we’re reading directly from the input stream, no content handler (no transformation of the content data) is involved.

Getting the Content as an Object

As we said previously, reading raw content from a stream is the most general mechanism for accessing data over the Web. openStream() leaves the parsing of data up to you. The URL class, however, was intended to support a more sophisticated, pluggable, content-handling mechanism. We’ll discuss this now, but be aware that it is not widely used because of lack of standardization and limitations in how you can deploy new handlers. Although the Java community made some progress in recent years in standardizing a small set of protocol handlers, no such effort was made to standardize content handlers. This means that although this part of the discussion is interesting, its usefulness is limited.

The way it’s supposed to work is that when Java knows the type of content being retrieved from a URL and a proper content handler is available, you can retrieve the URL content as an appropriate Java object by calling the URL’s getContent() method. In this mode of operation, getContent() initiates a connection to the host, fetches the data for you, determines the type of data, and then invokes a content handler to turn the bytes into a Java object. It acts sort of as if you had read a serialized Java object, as in Chapter 13. Java will try to determine the type of the content by looking at its MIME type, its file extension, or even by examining the bytes directly.

For example, given the URL http://foo.bar.com/index.html , a call to getContent() uses the HTTP protocol handler to retrieve data and might use an HTML content handler to turn the data into an appropriate document object. Similarly, a GIF file might be turned into an AWT ImageProducer object using a GIF content handler. If we access the GIF file using an FTP URL, Java would use the same content handler but a different protocol handler to receive the data.

Since the content handler must be able to return any type of object, the return type of getContent() is Object. This might leave us wondering what kind of object we got. In a moment, we’ll describe how we could ask the protocol handler about the object’s MIME type. Based on this, and whatever other knowledge we have about the kind of object we are expecting, we can cast the Object to its appropriate, more specific type. For example, if we expect an image, we might cast the result of getContent() to ImageProducer:

try  {
    ImageProducer ip = (ImageProducer)myURL.getContent();
} catch ( ClassCastException e ) { ... }

Various kinds of errors can occur when trying to retrieve the data. For example, getContent() can throw an IOException if there is a communications error. Other kinds of errors can occur at the application level: some knowledge of how the application-specific content and protocol handlers deal with errors is necessary. One problem that could arise is that a content handler for the data’s MIME type wouldn’t be available. In this case, getContent() invokes a special “unknown type” handler that returns the data as a raw InputStream (back to square one).
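Because getContent() returns a plain Object, code that calls it usually has to test what came back. The following sketch shows one cautious way to do that. The class name ContentSketch and the category labels are illustrative assumptions. A throwaway local file stands in for a remote resource so that the example runs without a network.

```java
import java.awt.image.ImageProducer;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

class ContentSketch {
    // Fetch the content object and report which broad kind of object came back.
    static String classify(URL url) throws IOException {
        Object content = url.getContent();
        if (content instanceof ImageProducer) {
            return "image";
        } else if (content instanceof InputStream) {
            // A stream: the caller still has to parse the bytes.
            ((InputStream) content).close();
            return "stream";
        } else {
            return "other: " + content.getClass().getName();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("content-sketch", ".txt");
        Files.write(tmp, "hello".getBytes("UTF-8"));
        System.out.println(classify(tmp.toUri().toURL()));
        Files.delete(tmp);
    }
}
```

Testing with instanceof before casting avoids the ClassCastException that a blind cast risks when the content turns out to be something unexpected.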

In some situations, we may also need knowledge of the protocol handler. For example, consider a URL that refers to a nonexistent file on an HTTP server. When requested, the server returns the familiar “404 Not Found” message. To deal with protocol-specific operations like this, we may need to talk to the protocol handler, which we’ll discuss next.

Managing Connections

Upon calling openStream() or getContent() on a URL, the protocol handler is consulted and a connection is made to the remote server or location. Connections are represented by a URLConnection object, subtypes of which manage different protocol-specific communications and offer additional metadata about the source. The HttpURLConnection class, for example, handles basic web requests and also adds some HTTP-specific capabilities such as interpreting “404 Not Found” messages and other web server errors. We’ll talk more about HttpURLConnection later in this chapter.
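To see the “404 Not Found” case in action without depending on an outside host, the sketch below starts a throwaway web server on the local machine and asks it for a page it doesn’t have. The server comes from the com.sun.net.httpserver package that ships with the JDK; using it here is purely our own device for making the example self-contained. The class name NotFoundSketch and the path /missing.html are also assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class NotFoundSketch {
    // Ask the server about a URL and return the HTTP status code.
    static int statusOf(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try {
            return connection.getResponseCode();   // connects implicitly
        } finally {
            connection.disconnect();
        }
    }

    // Start a local server that serves nothing, request a page, and return the status.
    static int demo() throws IOException {
        HttpServer server = HttpServer.create(
            new InetSocketAddress("127.0.0.1", 0), 0);   // port 0: pick any free port
        server.start();
        try {
            int port = server.getAddress().getPort();
            return statusOf(new URL("http://127.0.0.1:" + port + "/missing.html"));
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        int status = demo();
        System.out.println(status == HttpURLConnection.HTTP_NOT_FOUND
            ? "404 Not Found" : "status " + status);
    }
}
```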

We can get a URLConnection from our URL directly with the openConnection() method. One of the things we can do with the URLConnection is ask for the object’s content type before reading data. For example:

URLConnection connection = myURL.openConnection();
String mimeType = connection.getContentType();
InputStream in = connection.getInputStream();

Despite its name, a URLConnection object is initially created in a raw, unconnected state. In this example, the network connection was not actually initiated until we called the getContentType() method. The URLConnection does not talk to the source until data is requested or its connect() method is explicitly invoked. Prior to connection, network parameters and protocol-specific features can be set up. For example, we can set timeouts on the initial connection to the server and on reads:

URLConnection connection = myURL.openConnection();
connection.setConnectTimeout( 10000 ); // milliseconds
connection.setReadTimeout( 10000 ); // milliseconds
InputStream in = connection.getInputStream();
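Putting these pieces together, the following sketch configures a connection, connects explicitly, queries the metadata, and then reads the data. It runs against a temporary local file so that it needs no network. The class name ConnectionSketch and the file contents are our own assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

class ConnectionSketch {
    // Configure, connect, query, and read; returns the number of bytes read.
    static long inspect(URL url) throws IOException {
        URLConnection connection = url.openConnection();   // created, but not yet connected
        connection.setConnectTimeout(10000);   // milliseconds; set before connecting
        connection.setReadTimeout(10000);
        connection.connect();                  // explicit; otherwise the first query would connect

        System.out.println("type:   " + connection.getContentType());
        System.out.println("length: " + connection.getContentLengthLong());

        long count = 0;
        try (InputStream in = connection.getInputStream()) {
            while (in.read() != -1) count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("connection-sketch", ".txt");
        Files.write(tmp, "hello, web".getBytes("UTF-8"));
        System.out.println("read:   " + inspect(tmp.toUri().toURL()));
        Files.delete(tmp);
    }
}
```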

As we’ll see in the section “Using the POST Method,” we can get at the protocol-specific information by casting the URLConnection to its specific subtype.

Handlers in Practice

The content- and protocol-handler mechanisms we’ve described are very flexible; to handle new types of URLs, you need only add the appropriate handler classes. One interesting application of this would be Java-based web browsers that could handle new and specialized kinds of URLs by downloading them over the Net. The idea for this was touted in the earliest days of Java. Unfortunately, it never came to fruition. There is no API for dynamically downloading new content and protocol handlers. In fact, there is no standard API for determining what content and protocol handlers exist on a given platform.

Java currently mandates protocol handlers for HTTP, HTTPS, FTP, FILE, and JAR. While in practice you will generally find these basic protocol handlers with all versions of Java, that’s not entirely comforting, and the story for content handlers is even less clear. The standard Java classes don’t, for example, include content handlers for HTML, GIF, JPEG, or other common data types. Furthermore, although content and protocol handlers are part of the Java API and an intrinsic part of the mechanism for working with URLs, specific content and protocol handlers aren’t defined. Even those protocol handlers that have been bundled in Java are still packaged as part of the Sun implementation classes and are not truly part of the core API for all to see.

In summary, the Java content- and protocol-handler mechanism was a forward-thinking approach that never quite materialized. The promise of web browsers that dynamically extend themselves for new types of protocols and new content is, like flying cars, always just a few years away. Although the basic mechanics of the protocol-handler mechanism are useful (especially now with some standardization) for decoding content in your own applications, you should probably turn to other, newer frameworks that have a bit more specificity.

Useful Handler Frameworks

The idea of dynamically downloadable handlers could also be applied to other kinds of handler-like components. For example, the Java XML community is fond of referring to XML as a way to apply semantics (meaning) to documents and to Java as a portable way to supply the behavior that goes along with those semantics. It’s possible that an XML viewer could be built with downloadable handlers for displaying XML tags.

The JavaBeans APIs touch upon this subject with the Java Activation Framework (JAF), which provides a way to detect the data stream type and “encapsulate access to it” in a Java bean. If this sounds suspiciously like the content handler’s job, it is. Unfortunately, it looks like these APIs will not be merged and, outside of the JavaMail API, the JAF has not been widely used.

Fortunately, for working with URL streams of images, music, and video, very mature APIs are available. The Java Advanced Imaging API (JAI) includes a well-defined, extensible set of handlers for most image types, and the Java Media Framework (JMF) can play most common music and video types found online.

Talking to Web Applications

Web browsers are the universal clients for web applications. They retrieve documents for display and serve as a user interface, primarily through the use of HTML, JavaScript, and linked documents. In this section, we’ll show how to write client-side Java code that uses HTTP through the URL class to work with web applications directly using GET and POST operations to retrieve and send data. Later in this chapter, we’ll begin a discussion of web services, which marry HTTP with XML to enable cross-platform application-to-application communications using web standards.

There are many reasons an application might want to communicate via HTTP. For example, compatibility with another browser-based application might be important, or you might need to gain access to a server through a firewall where direct socket connections (and RMI) are problematic. HTTP is the lingua franca of the Net, and despite its limitations (or more likely because of its simplicity), it has rapidly become one of the most widely supported protocols in the world. As for using Java on the client side, all the other reasons you would write a client-side GUI or non-GUI application (as opposed to a pure web/HTML-based application) also present themselves. A client-side GUI can perform sophisticated presentation and validation while, with the techniques presented here, still using web-enabled services over the network.

The primary task we discuss here is sending data to the server, specifically HTML form-encoded data. In a web browser, the name/value pairs of HTML form fields are encoded in a special format and sent to the server using one of two methods. The first method, using the HTTP GET command, encodes the user’s input into the URL and requests the corresponding document. The server recognizes that the first part of the URL refers to a program and invokes it, passing along the information encoded in the URL as a parameter. The second method uses the HTTP POST command to ask the server to accept the encoded data and pass it to a web application as a stream. In Java, we can create a URL that refers to a server-side program and request or send it data using the GET and POST methods. (In Chapter 15, we’ll see how to build web applications that implement the other side of this conversation.)

Using the GET Method

Using the GET method of encoding data in a URL is pretty easy. All we have to do is create a URL pointing to a server program and use a simple convention to tack on the encoded name/value pairs that make up our data. For example, the following code snippet opens a URL to an old-school CGI program called login.cgi on the server myhost and passes it two name/value pairs. It then prints whatever text the CGI sends back:

URL url = new URL(
    // this string should be URL-encoded
    "http://myhost/cgi-bin/login.cgi?Name=Pat&Password=foobar");

BufferedReader bin = new BufferedReader (
  new InputStreamReader( url.openStream() ));

String line;
while ( (line = bin.readLine()) != null ) {
    System.out.println( line );
}

To form the URL with parameters, we start with the base URL of login.cgi; we add a question mark (?), which marks the beginning of the parameter data, followed by the first name/value pair. We can add as many pairs as we want, separated by ampersand (&) characters. The rest of our code simply opens the stream and reads back the response from the server. Remember that creating a URL doesn’t actually open the connection. In this case, the URL connection was made implicitly when we called openStream(). Although we are assuming here that our server sends back text, it could send anything.

It’s important to point out that we have skipped a step here. This example works because our name/value pairs happen to be simple text. If any “nonprintable” or special characters (including ? or &) are in the pairs, they must be encoded first. The java.net.URLEncoder class provides a utility for encoding the data. We’ll show how to use it in the next example.

Another important thing is that although this small example sends a password field, you should never send sensitive data using this simplistic approach. The data in this example is sent in clear text across the network (it is not encrypted). And in this case, the password field would appear anywhere the URL is printed as well (e.g., server logs and bookmarks). We’ll talk about secure web communications later in this chapter and when we discuss writing web applications using servlets in Chapter 15.
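As a quick sketch of the encoding step for a GET request, the following helper joins name/value pairs into a query string, running each name and value through URLEncoder. The class name QuerySketch and the sample values are our own assumptions, chosen to contain characters that must be escaped.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QuerySketch {
    // Join name/value pairs into an encoded query string of the form a=1&b=2
    static String encodePairs(Map<String, String> pairs)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> pair : pairs.entrySet()) {
            if (sb.length() > 0) sb.append('&');    // separator between pairs
            sb.append(URLEncoder.encode(pair.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(pair.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("Name", "Pat & Sam");
        pairs.put("Password", "foo?bar=baz");
        // prints http://myhost/cgi-bin/login.cgi?Name=Pat+%26+Sam&Password=foo%3Fbar%3Dbaz
        System.out.println("http://myhost/cgi-bin/login.cgi?" + encodePairs(pairs));
    }
}
```

Only the names and values are encoded. The ? that starts the query, and the & and = that structure it, are added afterward so that they keep their special meaning.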

Using the POST Method

Here’s a small application that acts like an HTML form. It gathers data from two text fields—name and password—and posts the data to a specified URL using the HTTP POST method. This Swing-based client application works with a server-side web-based application, just like a web browser.

Here’s the code:

//file: Post.java
import java.net.*;
import java.io.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

public class Post extends JPanel implements ActionListener {
  JTextField nameField, passwordField;
  String postURL;

  GridBagConstraints constraints = new GridBagConstraints(  );
  
  void addGB( Component component, int x, int y ) {
    constraints.gridx = x;  constraints.gridy = y;
    add ( component, constraints );
  }

  public Post( String postURL ) {
      
    this.postURL = postURL;  
      
    setBorder(BorderFactory.createEmptyBorder(5, 10, 5, 5));
    JButton postButton = new JButton("Post");
    postButton.addActionListener( this );
    setLayout( new GridBagLayout(  ) );
    constraints.fill = GridBagConstraints.HORIZONTAL;
    addGB( new JLabel("Name ", JLabel.TRAILING), 0, 0 );
    addGB( nameField = new JTextField(20), 1, 0 );
    addGB( new JLabel("Password ", JLabel.TRAILING), 0, 1 );
    addGB( passwordField = new JPasswordField(20), 1, 1 );
    constraints.fill = GridBagConstraints.NONE;
    constraints.gridwidth = 2;
    constraints.anchor = GridBagConstraints.EAST;
    addGB( postButton, 1, 2 );
  }

  public void actionPerformed(ActionEvent e) {
    postData(  );
  }

  protected void postData(  ) {
    try {
      // Encode with an explicit charset; the one-argument encode() is deprecated
      StringBuffer sb = new StringBuffer(  );
      sb.append( URLEncoder.encode("Name", "UTF-8") + "=" );
      sb.append( URLEncoder.encode(nameField.getText(  ), "UTF-8") );
      sb.append( "&" + URLEncoder.encode("Password", "UTF-8") + "=" );
      sb.append( URLEncoder.encode(passwordField.getText(  ), "UTF-8") );
      String formData = sb.toString(  );

      URL url = new URL( postURL );
      HttpURLConnection urlcon =
          (HttpURLConnection) url.openConnection(  );
      urlcon.setRequestMethod("POST");
      urlcon.setRequestProperty("Content-type",
          "application/x-www-form-urlencoded");
      urlcon.setDoOutput(true);
      urlcon.setDoInput(true);
      PrintWriter pout = new PrintWriter( new OutputStreamWriter(
          urlcon.getOutputStream(  ), "8859_1"), true );
      pout.print( formData );
      pout.flush(  );

      // read results...
      if ( urlcon.getResponseCode(  ) == HttpURLConnection.HTTP_OK )
        System.out.println("Posted ok!");
      else {
        System.out.println("Bad post...");
        return;
      }
      //InputStream in = urlcon.getInputStream(  );
      // ...

    } catch (MalformedURLException e) {
      System.out.println(e);     // bad postURL
    } catch (IOException e2) {
      System.out.println(e2);    // I/O error
    }
  }

  public static void main( String [] args ) {
    JFrame frame = new JFrame("SimplePost");
    frame.add( new Post( args[0] ), "Center" );
    frame.pack(  );
    frame.setVisible(true);
  }
}

When you run this application, you must specify the URL of the server program on the command line. For example:

% java Post http://www.myserver.example/cgi-bin/login.cgi

The beginning of the application creates the form; there’s nothing here that won’t be obvious after you’ve read Chapters 16 through 18, which cover the AWT and Swing GUI toolkits. All the magic happens in the protected postData() method. First, we create a StringBuffer and load it with name/value pairs, separated by ampersands. (We don’t need the initial question mark when we’re using the POST method because we’re not appending to a URL string.) Each pair is first encoded using the static URLEncoder.encode() method. We run the name fields through the encoder as well as the value fields, even though we know that in this case they contain no special characters.

Next, we set up the connection to the server program. In our previous example, we weren’t required to do anything special to send the data because the request was made by the simple act of opening the URL on the server. Here, we have to carry some of the weight of talking to the remote web server. Fortunately, the HttpURLConnection object does most of the work for us; we just have to tell it that we want to do a POST to the URL and the type of data we are sending. We obtain the URLConnection object by calling the URL’s openConnection() method. We know that we are using the HTTP protocol, so we should be able to cast it to an HttpURLConnection type, which has the support we need. Because HTTP is one of the guaranteed protocols, we can safely make this assumption.

We then use setRequestMethod() to tell the connection we want to do a POST operation. We also use setRequestProperty() to set the Content-Type field of our HTTP request to the appropriate type—in this case, the proper MIME type for encoded form data. (This is necessary to tell the server what kind of data we’re sending.) Finally, we use the setDoOutput() and setDoInput() methods to tell the connection that we want to both send and receive stream data. The URL connection infers from this combination that we are going to do a POST operation and expects a response. Next, we get an output stream from the connection with getOutputStream() and create a PrintWriter so that we can easily write our encoded data.

After we post the data, our application calls getResponseCode() to see whether the HTTP response code from the server indicates that the POST was successful. Other response codes (defined as constants in HttpURLConnection) indicate various failures. At the end of our example, we indicate where we could have read back the text of the response. For this application, we’ll assume that simply knowing that the post was successful is sufficient.

Although form-encoded data (as indicated by the MIME type we specified for the Content-Type field) is the most common, other types of communications are possible. We could have used the input and output streams to exchange arbitrary data types with the server program. The POST operation could send any kind of data; the server application simply has to know how to handle it. One final note: if you are writing an application that needs to decode form data, you can use the java.net.URLDecoder to undo the operation of the URLEncoder. If you use the Servlet API, this happens automatically, as you’ll see in Chapter 15.
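For the receiving side, here is a minimal sketch of decoding form data by hand with URLDecoder. The class name FormDecodeSketch and the sample input are our own assumptions. The string is split on the & and = separators first and decoded afterward, so that an encoded separator inside a value is not mistaken for structure.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormDecodeSketch {
    // Split form-encoded data into pairs, then decode each name and value.
    static Map<String, String> decodePairs(String formData)
            throws UnsupportedEncodingException {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String pair : formData.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name  = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? ""   : pair.substring(eq + 1);
            pairs.put(URLDecoder.decode(name, "UTF-8"),
                      URLDecoder.decode(value, "UTF-8"));
        }
        return pairs;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // prints {Name=Pat & Sam, Password=foo?bar=baz}
        System.out.println(decodePairs("Name=Pat+%26+Sam&Password=foo%3Fbar%3Dbaz"));
    }
}
```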

The HttpURLConnection

Other information about the response is available from the HttpURLConnection as well. We could use getContentType() and getContentEncoding() to determine the MIME type and encoding of the response. We could also interrogate the HTTP response headers by using getHeaderField(). (HTTP response headers are metadata name/value pairs carried with the response.) Two convenience methods, getHeaderFieldInt() and getHeaderFieldDate(), parse integer and date-formatted header fields and return an int and a long, respectively. The content length and last modification date are provided through getContentLength() and getLastModified().
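The sketch below exercises several of these methods in a non-GUI setting. It posts form data, then reads the response code, a few headers, and the body. So that it runs anywhere, it talks to a throwaway local server built with the JDK’s com.sun.net.httpserver package that simply echoes the posted data. That server, the class name ResponseInfoSketch, the /echo path, and the X-Sketch header are all our own assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class ResponseInfoSketch {
    // POST form data, then summarize the status, selected headers, and body.
    static String postAndDescribe(URL url, String formData) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type",
            "application/x-www-form-urlencoded");
        connection.setDoOutput(true);
        try (OutputStream out = connection.getOutputStream()) {
            out.write(formData.getBytes("ISO-8859-1"));
        }
        int code = connection.getResponseCode();
        String type = connection.getContentType();
        int length = connection.getContentLength();
        String custom = connection.getHeaderField("X-Sketch");  // hypothetical header
        String body;
        try (InputStream in = connection.getInputStream()) {
            body = new String(readAll(in), "ISO-8859-1");
        }
        return code + " " + type + " " + length + " " + custom + " " + body;
    }

    // Read a stream to its end and return the bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) buffer.write(chunk, 0, n);
        return buffer.toByteArray();
    }

    // Run against a local server that echoes the posted body back as text.
    static String demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            byte[] posted = readAll(exchange.getRequestBody());
            exchange.getResponseHeaders().set("Content-Type", "text/plain");
            exchange.getResponseHeaders().set("X-Sketch", "demo");
            exchange.sendResponseHeaders(200, posted.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(posted);
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/echo");
            return postAndDescribe(url, "Name=Pat&Password=foobar");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```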

SSL and Secure Web Communications

The previous examples sent a field called Password to the server. However, standard HTTP doesn’t provide encryption to hide our data. Fortunately, adding security for GET and POST operations like this is easy (trivial in fact, for the client-side developer). Where available, you simply have to use a secure form of the HTTP protocol—HTTPS:

https://www.myserver.example/cgi-bin/login.cgi

HTTPS is a version of the standard HTTP protocol run over Secure Sockets Layer (SSL), which uses public-key encryption techniques to encrypt the browser-to-server communications. Most web browsers and servers currently come with built-in support for HTTPS (or raw SSL sockets). Therefore, if your web server supports HTTPS and has it configured, you can use a browser to send and receive secure data simply by specifying the https protocol in your URLs. There is much more to learn about SSL and related aspects of security such as authenticating whom you are actually talking to, but as far as basic data encryption goes, this is all you have to do. It is not something your code has to deal with directly. The Java JRE standard edition ships with SSL and HTTPS support, and beginning with Java 5.0, all Java implementations must support HTTPS as well as HTTP for URL connections. We’ll discuss writing secure web applications in more detail in Chapter 15.
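To illustrate how little changes on the client side, this sketch opens (but does not connect) a connection for an http URL and an https URL and checks which kind of connection object Java hands back. The class name HttpsSketch is an assumption, and the host is the placeholder used in the earlier examples. No network traffic occurs because the connections are never used.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;

class HttpsSketch {
    // Open, without connecting, and report whether we got the HTTPS flavor.
    static boolean isSecure(String spec) throws IOException {
        URLConnection connection = new URL(spec).openConnection();  // unconnected
        return connection instanceof HttpsURLConnection;
    }

    public static void main(String[] args) throws IOException {
        String path = "://www.myserver.example/cgi-bin/login.cgi";
        System.out.println("http  secure? " + isSecure("http" + path));
        System.out.println("https secure? " + isSecure("https" + path));
    }
}
```

Because HttpsURLConnection is a subclass of HttpURLConnection, the cast used in the POST example works unchanged for https URLs.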

URLs, URNs, and URIs

Earlier, we discussed URLs and noted that some identifiers are more properly called URIs. To explain why, we need to distinguish URLs from the concept of URNs. Whereas a URL points to a specific location on the Net and specifies a protocol or scheme for accessing its contents, a URN is simply a globally unique name. A URL is analogous to giving someone your phone number. But a URN is more like giving them your social security number. Your phone number may change, but your social security number is supposed to uniquely identify you forever.

While it’s possible that some mechanism might be able to look at a given URN and tie it to a location (a URL), it is not necessarily so. URNs are intended only to be permanent, unique, abstract identifiers for an item, whereas a URL is a mechanism you can use to get in touch with a resource right now. You can use a phone number to contact me today, but you can use my social security number to uniquely identify me anytime.

An example of a URN is http://www.w3.org/1999/XSL/Transform, which is the identifier for a version of the Extensible Stylesheet Language, standardized by the W3C. Now, it also happens that this is a URL (you can go to that address and find information about the standard), but that is for convenience only. This URN’s primary mission is to uniquely label the version of the programming language in a way that never changes.

Collectively, URLs and URNs are called Uniform Resource Identifiers or URIs. A URI is simply a URL or URN. So, URLs and URNs are kinds of URIs. The reason for this abstraction is that URLs and URNs, by definition, have some things in common. All URIs are supposed to be human-readable and “transcribable” (it should be possible to write them on the back of a napkin). They always have a hierarchical structure, and they are always unique. Both URLs and URNs also share some common syntax, which is described by RFC 2396.

The java.net.URI class formalizes these distinctions. The difference between the URI and URL classes is that the URI class does not try to parse the contents of the identifier and apply any “meaning.” Whereas the URL class immediately attempts to parse the scheme portion of the URL and locate a protocol handler, the URI class doesn’t interpret its content. It serves only to allow us to work with the identifier as structured text, according to the general rules of URI syntax. With the URI class, you can construct the string, resolve relative paths, and perform equality or comparison operations, but no hostname or protocol resolution is done.
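The following sketch contrasts the two classes. It uses URI to resolve a relative path, normalize a path, and compare identifiers, and then shows that a scheme with no protocol handler is acceptable to URI but rejected by URL. The class name UriSketch and the made-up scheme xyzzy are our own assumptions.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UriSketch {
    // True if the URL class accepts the string, meaning a handler was found.
    static boolean urlAccepts(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("http://foo.bar.com/documents/");

        // Relative resolution and path normalization are pure text operations.
        System.out.println(base.resolve("../images/logo.gif"));
        System.out.println(URI.create("http://foo.bar.com/a/./b/../c").normalize());

        // Equality follows URI syntax rules: scheme and host ignore case.
        System.out.println(URI.create("HTTP://FOO.BAR.COM/a")
            .equals(URI.create("http://foo.bar.com/a")));

        // URI does not need a protocol handler for the scheme; URL does.
        String spec = "xyzzy://foo.bar.com/item";   // hypothetical scheme
        System.out.println(URI.create(spec).getScheme());
        System.out.println("URL accepts it? " + urlAccepts(spec));
    }
}
```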

Web Services

Web services is a big, fast-moving topic and the subject of many other fine O’Reilly books. However, because we have already covered so many of the basic networking concepts (and we’ll cover XML in detail in Chapter 24), we would be shirking our duties if we didn’t provide an introduction to this important area of application development. We conclude this chapter on client-side web communications with a small example of invoking a web service.

In contrast to regular web applications intended to be visited by web browsers, web services are application-level APIs intended to be invoked by other application components. The primary distinction from other types of interapplication communications mechanisms is that they use web standards and XML to maximize cross-platform interoperability. We will leave the analysis of when exactly this is important and the cost versus benefits tradeoffs out of our discussion here. But the value in this idea should be evident from the explosion of web-based business applications in the past few years. Web services allow web-based applications to provide well-defined, cross-platform interfaces for other web-based applications.

XML-RPC

The term web services means different things to different people and has spawned many (too many) new standards in recent years. In fact, there are so many web service standards named with the prefix “WS” now that they are collectively known as “WS-*” (affectionately referred to as WS “splat” or WS “death star”). However, the original concept is simple: web services take the ubiquitous, universally understood, and easily implemented HTTP transaction and marry it with XML to define a standard for invoking application services over the Web. The process is a type of remote procedure call in which HTTP plays its traditional role as the basic communication provider and XML adds a “business envelope” in which structured data is passed. This RPC-style web service interaction defines both the basic structure of an invocation request and also a set of XML encodings for marshaling the primitive data types, allowing data parameters and results to be exchanged in a truly cross-platform way. In contrast, another form of web services—termed “document style”—places more emphasis on the exchange of application-specific XML documents than on RPC-style data marshaling and unmarshaling. We will concentrate on RPC-style web services because they currently provide the tightest coupling to Java.
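To give a feel for what “marshaling” means here, the following is a rough sketch that encodes a method name and a few primitive arguments into an XML request body, loosely following the simple XML-RPC wire format. It is not what the JAX-WS tools described below generate (they use the more elaborate SOAP envelope), and the class name XmlRpcSketch and the method name passed in main() are hypothetical. Only int, boolean, and string values are handled.

```java
public class XmlRpcSketch {
    // Escape the characters that are special inside XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Encode one argument using a type-specific XML element. Other
    // types (doubles, dates, arrays, structs) are omitted from this sketch.
    static String marshal(Object value) {
        if (value instanceof Integer) {
            return "<value><int>" + value + "</int></value>";
        }
        if (value instanceof Boolean) {
            return "<value><boolean>" + (((Boolean) value) ? 1 : 0) + "</boolean></value>";
        }
        return "<value><string>" + escape(String.valueOf(value)) + "</string></value>";
    }

    // Build the complete request document for a remote method invocation.
    static String methodCall(String methodName, Object... params) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>");
        xml.append("<methodCall><methodName>")
           .append(escape(methodName))
           .append("</methodName><params>");
        for (Object param : params) {
            xml.append("<param>").append(marshal(param)).append("</param>");
        }
        return xml.append("</params></methodCall>").toString();
    }

    public static void main(String[] args) {
        // The method name is made up; a real service defines its own names.
        System.out.println(methodCall("weather.getCityWeatherByZIP", "63132"));
    }
}
```

The resulting string would be sent as the body of an HTTP POST, and the response would come back as a similar XML document to be unmarshaled into Java values.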

WSDL

A key component of web services technology is the Web Services Description Language (WSDL). Using this standard, a structured XML document describes a web service, the individual functions (methods) it offers, and the XML data types for their respective arguments and return values. WSDL is a type of interface definition language (IDL) and plays that role for web services. However, a WSDL document can also specify the service location and other features that are not traditionally part of the service definition.

For the client-side web services developer, the WSDL document describing a service contains all of the information needed to generate the client-side code used to invoke the service from Java or any other language. As we’ll see in our example, it is not even necessary to have an understanding of WSDL to use the service. One can simply generate the client-side interfaces and use them from a Java language viewpoint. We’ll see in Chapter 15 that we can generate the WSDL document for a new service directly from our own Java code as well.

The Tools

The Java API for XML Web Services (JAX-WS) was bundled with the JDK from Java 6 through Java 10 (it was removed in Java 11 and is now available as a separate library) and contains all of the tools necessary to use, create, and work with web services in Java. It’s even possible to deploy web services for testing in simple scenarios using out-of-the-box tools. As you might imagine, Java web services make extensive use of the JAXP APIs for working with XML. JAX-WS adds the classes necessary for remote calls, as well as the development-time wsimport and wsgen tools. The wsimport tool reads a WSDL description file and generates the Java interface and implementation classes required to invoke the service it describes. The wsgen tool reads Java code containing web service annotations and can generate WSDL and other deployment-related files.

There are many application servers that provide their own mechanisms for deploying web services and generating client-side code. The Apache CXF project is another popular Java web services alternative that can work with JAX-WS and other standards.

The Weather Service Client

This example shows just how easy it is to use a web service from client-side code. We’re going to create a client for a web-based weather lookup service. The service accepts a U.S. zip code as an argument and returns the city, state, and weather conditions as a result. Please note that the server-side component of this example is hosted by a company called cdyne.com, which is a professional web services provider. Because this is a third-party site, we cannot guarantee that it will remain active. If for any reason this service disappears, don’t fret—we’ll build our own example in Chapter 15, where we implement a simple web service ourselves.

All that we need to get started is the web service WSDL description file. You can view the WSDL for the weather service at http://wsf.cdyne.com/WeatherWS/Weather.asmx?WSDL. It’s an XML file that defines a set of operations and data types for arguments and results. The file is not really intended to be human-readable, but it should make more sense after we discuss XML in Chapter 24.

To generate the client code needed to interact with the service, we run the wsimport utility that is found in the JDK bin and pass it the WSDL location like so:

% wsimport http://wsf.cdyne.com/WeatherWS/Weather.asmx?WSDL

When wsimport completes, you should find a new directory tree named com/cdyne/ws/weatherws that contains compiled Java classes for the weather service client interface and an implementation. The wsimport command has many useful options: you may wish to use the -keep option to retain the generated source code for the client classes so that you can store the source with your application. There is also a -p option that lets you override the generated Java package name.

The generated code contains a class called Weather that represents the overall service and an interface called WeatherSoap that represents various ports or groups of methods on the service, among other implementation classes. (The “port” is WSDL terminology for a group of functions on a web service.) If you retain the source code (with -keep) and take a look at it, you’ll see that the generated classes use Java annotations to identify the service elements. The Weather class is marked with @WebServiceClient and the WeatherSoap interface is marked as @WebService. Furthermore, the methods of the WeatherSoap interface are marked with @WebMethod. These annotations add metadata to the code to identify the service and capture the information needed from the WSDL to map to the service XML. We’ll discuss web service annotations more when we build and deploy the server side of a web service in the next chapter. We’ll also see annotations used in analogous ways when we discuss XML binding with JAXB in Chapter 24.

Our client application can now use these classes to invoke the service. The following code looks up the current weather in the 63132 zip code:

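import com.cdyne.ws.weatherws.*;

public class WSTest {
    public static void main( String[] args )
    {
        // Weather is the generated service class; WeatherSoap is its port interface
        WeatherSoap weatherService = new Weather().getWeatherSoap();
        // Invoke the remote operation; the generated code handles the XML and HTTP
        WeatherReturn weather = weatherService.getCityWeatherByZIP( "63132" );
        // %n ends the line so the shell prompt doesn't run into the output
        System.out.format("%s, %s : %s : Temperature: %s, Wind: %s%n",
            weather.getCity(), weather.getState(), weather.getDescription(),
            weather.getTemperature(), weather.getWind() );
    }
}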

Remember that you need to either add the compiled service classes to your classpath or compile the generated source files along with the example code. If you run it, you should see output like the following. Note that although this service has returned the values as strings, in general, web service bindings to Java would allow elements like the temperature to be returned as numeric types.

Saint Louis, MO : Partly Cloudy : Temperature: 25, Wind: CALM
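Because this particular service hands back the temperature as a string, a client that wants to do arithmetic with it has to convert the value itself. The following is a minimal sketch of one defensive way to do that. The class TemperatureParser and its parseTemperature() helper are hypothetical names and are not part of the generated client code.

```java
import java.util.OptionalInt;

public class TemperatureParser {
    // Convert a string such as "25" to an int, returning an empty
    // OptionalInt if the value is missing or is not a whole number.
    static OptionalInt parseTemperature(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // In the real client, this string would come from weather.getTemperature()
        OptionalInt temperature = parseTemperature("25");
        if (temperature.isPresent()) {
            System.out.println("Numeric temperature: " + temperature.getAsInt());
        } else {
            System.out.println("Temperature not available");
        }
    }
}
```

Returning an OptionalInt rather than throwing keeps the calling code simple when a remote service sends an empty or unexpected value.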

We’ll return to the topic of web services and implement our own web service in the next chapter, where we hop over to the server side of things and start building web applications.