Chapter 11. Core Utilities

In this chapter, we’ll continue our look at the core Java APIs, covering more of the tools of the java.util package. The java.util package includes a wide range of utilities including tools for mathematical operations, fundamental data structures (collections), working with dates and times, storing user preference data, and logging.

Math Utilities

Java supports integer and floating-point arithmetic directly in the language. Higher-level math operations are supported through the java.lang.Math class. As you may have seen by now, wrapper classes for primitive data types allow you to treat them as objects. Wrapper classes also hold some methods for basic conversions.

First, a few words about built-in arithmetic in Java. Java handles errors in integer arithmetic by throwing an ArithmeticException:

    int zero = 0;

    try {
        int i = 72 / zero;
    } catch ( ArithmeticException e ) {
        // division by zero
    }

To generate the error in this example, we created the intermediate variable zero. The compiler is somewhat crafty and would have caught us if we had blatantly tried to perform division by a literal zero.

Floating-point arithmetic expressions, on the other hand, don’t throw exceptions. Instead, they take on the special out-of-range values shown in Table 11-1.

Table 11-1. Special floating-point values

Value

Mathematical representation

POSITIVE_INFINITY

1.0/0.0

NEGATIVE_INFINITY

-1.0/0.0

NaN

0.0/0.0

The following example generates an infinite result:

    double zero = 0.0;
    double d = 1.0/zero;

    if ( d == Double.POSITIVE_INFINITY )
        System.out.println( "Division by zero" );

The special value NaN (not a number) indicates the result of dividing zero by zero. This value has the special mathematical distinction of not being equal to itself (NaN != NaN evaluates to true). Use Float.isNaN() or Double.isNaN() to test for NaN.

The java.lang.Math Class

The java.lang.Math class is Java’s math library. It holds a suite of static methods covering all of the usual mathematical operations like sin(), cos(), and sqrt(). The Math class isn’t very object-oriented (you can’t create an instance of Math). Instead, it’s really just a convenient holder for static methods that are more like global functions. As we saw in Chapter 6, it’s possible to use the static import functionality to import the names of static methods and constants like this directly into the scope of our class and use them by their simple, unqualified names.

Table 11-2 summarizes the methods in java.lang.Math.

Table 11-2. Methods in java.lang.Math

Method

Argument type(s)

Functionality

Math.abs(a)

int, long, float, double

Absolute value

Math.acos(a)

double

Arc cosine

Math.asin(a)

double

Arc sine

Math.atan(a)

double

Arc tangent

Math.atan2(a,b)

double

Angle part of rectangular-to-polar coordinate transform

Math.ceil(a)

double

Smallest whole number greater than or equal to a

Math.cbrt(a)

double

Cube root of a

Math.cos(a)

double

Cosine

Math.cosh(a)

double

Hyperbolic cosine

Math.exp(a)

double

Math.E to the power a

Math.floor(a)

double

Largest whole number less than or equal to a

Math.hypot(a,b)

double

Precision calculation of the sqrt() of a2 + b2

Math.log(a)

double

Natural logarithm of a

Math.log10(a)

double

Log base 10 of a

Math.max(a, b)

int, long, float, double

The value a or b closer to Long.MAX_VALUE

Math.min(a, b)

int, long, float, double

The value a or b closer to Long.MIN_VALUE

Math.pow(a, b)

double

a to the power b

Math.random()

None

Random-number generator

Math.rint(a)

double

Converts double value to integral value in double format

Math.round(a)

float, double

Rounds to whole number

Math.signum(a)

double, float

Get the sign of the number at 1.0, –1.0, or 0

Math.sin(a)

double

Sine

Math.sinh(a)

double

Hyperbolic sine

Math.sqrt(a)

double

Square root

Math.tan(a)

double

Tangent

Math.tanh(a)

double

Hyperbolic tangent

Math.toDegrees(a)

double

Convert radians to degrees

Math.toRadians(a)

double

Convert degrees to radians

log(), pow(), and sqrt() can throw a runtime ArithmeticException. abs(), max(), and min() are overloaded for all the scalar values, int, long, float, or double, and return the corresponding type. Versions of Math.round() accept either float or double and return int or long, respectively. The rest of the methods operate on and return double values:

    double irrational = Math.sqrt( 2.0 ); // 1.414...
    int bigger = Math.max( 3, 4 );  // 4
    long one = Math.round( 1.125798 ); // 1

For convenience, Math also contains the static final double values E and PI:

    double circumference = diameter  * Math.PI;

Big/Precise Numbers

If the long and double types are not large or precise enough for you, the java.math package provides two classes, BigInteger and BigDecimal, that support arbitrary-precision numbers. These full-featured classes have a bevy of methods for performing arbitrary-precision math and precisely controlling rounding of remainders. In the following example, we use BigDecimal to add two very large numbers and then create a fraction with a 100-digit result:

    long l1 = 9223372036854775807L; // Long.MAX_VALUE
    long l2 = 9223372036854775807L;
    System.out.println( l1 + l2 ); // -2 ! Not good.
     
    try {
        BigDecimal bd1 = new BigDecimal( "9223372036854775807" );
        BigDecimal bd2 = new BigDecimal( 9223372036854775807L );
        System.out.println( bd1.add( bd2 ) ); // 18446744073709551614
     
        BigDecimal numerator = new BigDecimal(1);
        BigDecimal denominator = new BigDecimal(3);
        BigDecimal fraction =
            numerator.divide( denominator, 100, BigDecimal.ROUND_UP );
        // 100 digit fraction = 0.333333 ... 3334
    }
    catch (NumberFormatException nfe) { }
    catch (ArithmeticException ae) { }

If you implement cryptographic or scientific algorithms for fun, BigInteger is crucial. Other than that, you’re not likely to need these classes.

Floating-Point Components

As we mentioned in Chapter 4, Java uses the IEEE 754 standard to represent floating-point numbers (float and double types) internally. Those of you familiar with how floating-point math works will already know that “decimal” numbers are represented in binary in this standard by separating the number into three components: a sign (positive or negative), an exponent representing the magnitude in powers of 2 of the number, and a mantissa using up most of the bits to represent the precise value irrespective of its magnitude. While for most applications the precision of float and double-type floating-point numbers is sufficient enough that we don’t need to worry about running into limitations, there are times when specialized apps may wish to work with the floating-point values more directly.

By definition, floating-point numbers trade off precision and scale. Even the smallest Java floating-point type, float, can represent (literally) astronomical numbers ranging from negative 10–45 to positive 1038. This is accomplished, put in decimal terms, by having the mantissa part of the floating-point value represent a fixed number of “digits” and the exponent tell us where to put the decimal point. As the numbers get larger in magnitude, the “precision” therefore gets shifted to the “left” as more digits appear to the left of the decimal point. What this means is that floating-point numbers can very precisly (with a large number of digits) represent small values like pi, but for bigger numbers (in the billions and trillions) those digits will be taken up with the more signifcant digits. Therefore, the gap between any two consecutive numbers that can be represented by a floating-point value grows larger as the numbers get bigger.

For some applications, knowing the limitations may be important. The java.lang.Math class therefore provides a few methods for interrogating floats and doubles about their precision. The Math.ulp() method retrieves the “unit of least precision” for a given floating-point number, which is the smallest value that bits in the mantissa represent at their current exponent. Another way to say this is that the ulp() is the approximate distance from the floating-point number to the next closest higher or lower floating-point number that can be represented. Adding positive values smaller than half the ULP to a float will not yield a new number. Adding values between half and the full ULP will result in the value plus the ULP. The Math.nextUp() method is a convenience that will take a float and tell you the next number that can be represented by adding the ULP.

        float trillionish = (float)1e12; // trillionish ~= 999,999,995,904
        float ulp = Math.ulp( f ); // ulp = 65536
        float next = Math.nextUp( f ); // next ~= 1000000061440
        trillionish += 32767; // trillionish still ~= 999,999,995,904. No change!

Additionally, the java.lang.Math class contains the method getExponent(), which retrieves the exponent part of a floating-point number (and from there one could determine the mantissa by division). It is also possible to get the raw bits of a float or double using their corresponding wrapper class methods floatToIntBits() and doubleToRawLongBits() and pick out the (IEEE standard) bits yourself.

Random Numbers

You can use the java.util.Random class to generate random values. It’s a pseudorandom-number generator that is initialized with a 48-bit seed.[31] Because it’s a pseudorandom algorithm, you’ll get the same series of values every time you use the same seed value. The default constructor uses the current time to produce a seed, but you can specify your own value in the constructor:

    long seed = mySeed;
    Random rnums = new Random( seed );

After you have a generator, you can ask for one or more random values of various types using the methods listed in Table 11-3.

Table 11-3. Random-number methods

Method

Range

nextBoolean()

true or false

nextInt()

–2147483648 to 2147483647

nextInt(int n )

0 to (n – 1) inclusive

nextLong()

–9223372036854775808 to 9223372036854775807

nextFloat()

0.0 inclusive to 1.0 exclusive

nextDouble()

0.0 inclusive to 1.0 exclusive

nextGaussian()

Gaussian distributed double with mean 0.0 and standard deviation of 1.0

By default, the values are uniformly distributed. You can use the nextGaussian() method to create a Gaussian (bell curve) distribution of double values, with a mean of 0.0 and a standard deviation of 1.0. (Lots of natural phenomena follow a Gaussian distribution rather than a strictly uniform random one.)

The static method Math.random() retrieves a random double value. This method initializes a private random-number generator in the Math class the first time it’s used, using the default Random constructor. Thus, every call to Math.random() corresponds to a call to nextDouble() on that random-number generator.

Dates and Times

Working with dates and times without the proper tools can be a chore. Fortunately, Java has three classes that handle most of the work for you. The java.util.Date class encapsulates a point in time. The java.util.GregorianCalendar class, which extends the abstract java.util.Calendar, translates between a point in time and calendar fields like month, day, and year. Finally, the java.text.DateFormat class knows how to generate and parse string representations of dates and times in many languages.[32]

The separation of the Date and Calendar classes is analogous to having a class representing temperature and a class that translates that temperature to Celsius units. A Date represents an absolute point in time as defined by a number of milliseconds from the reference point: midnight, Jan 1, 1970, GMT. This is the same frame of reference used by the System.currentTimeMillis() call. A Calendar encapsulates a point in time and maps it to higher-level (and messier) notions like years, months, weeks, and days, and deals with discontinuities like leap years. Conceivably, we could define subclasses of Calendar other than the default GregorianCalendar, say JulianCalendar or LunarCalendar, that map time using other sociological or cultural conventions.[33]

Working with Calendars

The default GregorianCalendar constructor creates a calendar initialized to the current time, in the current time zone:

    GregorianCalendar now = new GregorianCalendar();

However, more generally we can just ask the Calendar class for an appropriate calendar instance without worrying about what type of calendar system the world is using this century:

    Calendar now = Calendar.getInstance();

In either case, all the real work is done through the main set() and get() methods of Calendar. These methods use static identifiers to refer to calendar fields and values. For example:

    Calendar birthday = Calendar.getInstance();
    birthday.set( Calendar.YEAR, 1972 );
    birthday.set( Calendar.MONTH, Calendar.MAY );
    birthday.set( Calendar.DATE, 20 );

Here, we set the year, month, and day values on the calendar, altering the internal Date of the Calendar object. Any remaining fields that we did not set are left as they were initialized (to the current date and time when it was constructed). In this case, we did not really specify a full date and time; we simply overrode individual fields in the calendar.

The Calendar class contains identifiers for all of the standard date and time fields, as well as values such as days of the week and months of the year. The following are the most common identifiers:

  • YEAR, MONTH

  • WEEK_OF_YEAR, WEEK_OF_MONTH

  • DATE, DAY_OF_YEAR, DAY_OF_MONTH, DAY_OF_WEEK

  • HOUR, HOUR_OF_DAY, AM_PM

  • MINUTE, SECOND, MILLISECOND

  • ZONE_OFFSET, DST_OFFSET

DATE and DAY_OF_MONTH are synonymous. HOUR is a 12-hour clock that can be combined with AM_PM. The values are just what you would expect, as shown in the following:

  • SUNDAY, MONDAY, TUESDAY...

  • JANUARY, FEBRUARY, MARCH...

  • AM, PM

In addition to the set() method for changing field values, the Calendar class has two additional methods for performing date math, add() and roll(). Using add(), you can move a calendar forward or backward in any unit of time easily, without having to calculate the other fields. For example, we can move our calendar forward four weeks:

    Calendar cal = Calendar.getInstance();
    System.out.println( cal.getTime() );
    // Thu Nov 04 16:39:06 CST 2004
     
    cal.add( Calendar.WEEK_OF_YEAR, 4 );
    System.out.println( cal.getTime() );
    // Thu Dec 02 16:39:06 CST 2004

The roll() method, by contrast, does not alter the other fields of the calendar, but arbitrarily adjusts individual fields. See the Spinner example in Chapter 17 for additional information about adding and subtracting time periods using the add() method.

Finally, you can always get the internal Date of the Calendar object or reinitialize the calendar to a specific Date using the getTime() and setTime() method:

    // Get the absolute time the Calendar references
    Date date = calendar.getTime();
     
    // Reinitialize this calendar to the current date and time
    Date now = new Date();
    calendar.setTime( now );

Time Zones

An instance of the TimeZone class represents a time zone and the knowledge of daylight savings time at that location. You can construct a time zone from a string specifier in a number of ways. The most general approach is to use an offset from GMT, but many human-readable formats are included. (For a list, use TimeZone.getAvailableIDs().)

    TimeZone.getTimeZone("US/Central");       // CST
    TimeZone.getTimeZone("GMT-06");           // CST
    TimeZone.getTimeZone("America/Chicago");  // CST

A Calendar inherits the default time zone from the platform on which it was created. You can set a different time zone with the setTimeZone() method:

    GregorianCalendar smokey = new
    GregorianCalendar();
    smokey.setTimeZone( TimeZone.getTimeZone("US/Mountain") );

It’s important to think about dates and time zones in the right way. Remember that a Date is an absolute point in time, while a Calendar translates that Date into localized fields that may depend on where you are. In a sense, it is meaningless to talk about the date “Nov 1, 2004,” without specifying a time zone because at any given moment on earth, “now” could be one of two different calendar days. Even specifying a date and time such as “Nov 1, 2004, 9:01 pm” is ambiguous, because that particular combination of calendar and time fields occurs at 24 separate times over the span of a day as the world turns (see Figure 11-1). Only a complete date, time, and time zone specifies an absolute point in time, such as “Nov 1, 2004, 9:01 pm EST.” So it’s important to remember that the Calendar class defaults all of these fields for you even if you haven’t set them.

Calendars translate an absolute point in time to a localized data and time
Figure 11-1. Calendars translate an absolute point in time to a localized data and time

The following example prints the day of the week for the same Date object in two different time zones:

    Date date = new Date(); // point in time
     
    TimeZone CST = TimeZone.getTimeZone( "America/Chicago" );
    Calendar usa = Calendar.getInstance( CST );
    usa.setTime( date );
    System.out.println( usa.get( Calendar.DAY_OF_WEEK ) );  // 1
     
    TimeZone GMT8 = TimeZone.getTimeZone( "GMT+08"); // Beijing
    Calendar china = Calendar.getInstance( GMT8 );
    china.setTime( date );
    System.out.println( china.get( Calendar.DAY_OF_WEEK ) ); // 2

In this example, we could also have simply changed the time zone on the calendar usa using the setTimeZone() method. Unlike the field set() methods, setting the time zone does not change the underlying Date value of the calendar, only the interpretation of the fields.

The meaning of the Date object and its relationship to Calendar become particularly important when dealing with APIs for things such as databases that construct dates from incomplete date and time fields. If, as is entirely possible, you end up sending your Date object from a client application in one part of the world to a server in another, you may be surprised that the calendar fields have changed. In these situations, it’s important to work with Calendars to translate the date fields and avoid the temptation to “fix” the problem by adding or subtracting real time from the date.

Locale

It should be clear now that Calendar is not just a fancy Date, but rather is something in between a time-keeping device and a time-formatting device. This point is brought home by the fact that the Calendar class is also locale-sensitive. In addition to the notion of a time zone, a Calendar has a Locale that governs conventions such as on which day the week begins and ends. You can specify an alternate locale with the setLocale() method. Most locale-specific details, however, are handled by the DateFormat class, which we’ll discuss next.

Parsing and Formatting with DateFormat

As its name suggests, the DateFormat class formats Date objects and not Calendars, so the first step in formatting dates and times from a Calendar is to get back to a Date with the getTime() method:

    Date birthDate = calendar.getTime();

To create string representations of dates and times, create a DateFormat object and apply its format() method to a Date object. Like the NumberFormat object we looked at in the previous chapter, DateFormat itself is abstract, but it has several static (“factory”) methods that return useful DateFormat subclass instances. To get a default DateFormat, simply call getInstance():

    DateFormat simple = DateFormat.getInstance();
    String now = simple.format( new Date() );         // 4/12/06 6:06 AM

You can generate a date string or a time string, or both, using the getDateInstance(), getTimeInstance(), and getDateTimeInstance() factory methods. The argument to these methods describes what level of detail you’d like to see. DateFormat defines four constants representing detail levels: they are SHORT, MEDIUM, LONG, and FULL. There is also a DEFAULT, which is the same as MEDIUM. The following code creates three DateFormat instances: one to format a date, one to format a time, and one to format a date and time together. getDateTimeInstance() requires two arguments: the first specifies how to format the date, the second how to format the time:

    // 12-Apr-06
    DateFormat df  = DateFormat.getDateInstance(DateFormat.DEFAULT);

    // 9:18:27 AM
    DateFormat tf  = DateFormat.getTimeInstance(DateFormat.DEFAULT);

    // Wednesday, April 12, 2006 9:18:27 o'clock AM EDT
    DateFormat dtf =
      DateFormat.getDateTimeInstance( DateFormat.FULL, DateFormat.FULL );

We’re showing only how to create the DateFormat objects here. In order to actually generate a String from a date, you’ll need to call the format() method of these objects, passing a Date as an argument.

Formatting dates and times for other countries is just as easy. Overloaded factory methods accept a Locale argument:

    // 12 avr. 06
    DateFormat df =
      DateFormat.getDateInstance( DateFormat.DEFAULT, Locale.FRANCE );

    // 9:27:49
    DateFormat tf =
      DateFormat.getTimeInstance( DateFormat.DEFAULT, Locale.GERMANY );

    // mercoledi 12 aprile 2006 9.27.49 GMT-04:00
    DateFormat dtf =
        DateFormat.getDateTimeInstance(
            DateFormat.FULL, DateFormat.FULL, Locale.ITALY );

To parse a string representing a date, we use the parse() method of the DateFormat class. The result is a Date object. The parsing algorithms are finicky, so it’s safest to parse dates and times that are in the same format produced by the DateFormat. The parse() method throws a ParseException if it doesn’t understand the string you give it. All of the following calls to parse() succeed except the last; we don’t supply a time zone, but the format for the time is LONG. Other exceptions are occasionally thrown from the parse() method. To cover all the bases, catch NullPointerExceptions and StringIndexOutOfBoundsExceptions also:

    try {
      Date d;
      DateFormat df;

      df = DateFormat.getDateTimeInstance(
               DateFormat.FULL, DateFormat.FULL);
      d = df.parse("Wednesday, April 12, 2006 2:22:22 o'clock PM EDT");

      df = DateFormat.getDateTimeInstance(
               DateFormat.MEDIUM, DateFormat.MEDIUM);
      d = df.parse("12-Apr-06 2:22:22 PM");

      df = DateFormat.getDateTimeInstance(
               DateFormat.LONG, DateFormat.LONG);
      d = df.parse("April 12, 2006 2:22:22 PM EDT");

      // throws a ParseException; detail level mismatch
      d = df.parse("12-Apr-06 2:22:22 PM");
    }
    catch (Exception e) { ... }

Printf-Style Date and Time Formatting

The printf-style formatting covered in Chapter 10 can render dates and times to strings in completely arbitrary ways, without having to resort to Calendar methods to get components.

All date and time format strings use the same conversion character, t or T, followed by a suffix character that identifies the actual format or date/time component to be generated. For example, the format string %tc turns a Date argument into the string equivalent of what you get with the standard Date toString() method:

    System.out.printf( "The date is %tc\n", new Date() );
    // The date is Thu Nov 04 22:32:00 CST 2004

As with other conversion characters, the only difference between t and T is that the latter forces all of the output to uppercase. All time and date formatting is locale-sensitive, including the names of days and months and the A.M./P.M. identifier. To format a Date for another language, simply pass the Locale as the first argument:

    System.out.printf( Locale.ITALIAN, "The date is %tc\n", new Date() );
    // The date is gio nov 04 22:32:00 CST 2004

There are two additional composite, date-only formats and three composite time-only formats, as shown in the following table. The format string description in the third column of Table 11-4 refers to date and time component formats discussed in Tables 11-5 and 11-6.

Table 11-4. Composite date and time formats

Format suffix

Example

Components

c

Thu Nov 04 22:32:00 CST 2004

%ta %tb %td %tT %tZ %tY

D

11/04/04

%tm/%td/%ty

F

2004-11-04

%tY-%tm-%td

r

10:32:00 PM

%tI:%tM:%tS %Tp

R

22:32

%tH:%tM

T

22:32:00

%tH:%tM:%tS

Table 11-5 lists formats for accessing date components.

Table 11-5. Date component formats

Format suffix

Examples

Description

a

Sun, Mon, Tue...

Abbreviated day of week

A

Sunday, Monday...

Full day of week

b

Jan, Feb, Mar, ...

Abbreviated month

B

January, February, ...

Full month

Y

1999, 2004

Four-digit year

C

2004 = 20

High two digits of year

y

1999 = 99

Low two digits of year

j

001 ... 366

Day of year

m

01 ... 13

Month of year

d

01 ... 31

Day of month

e

1 ... 31

Day of month, no leading zeros

Table 11-6 lists formats for accessing time components.

Table 11-6. Time component formats

Format suffix

Examples

Description

H

00 ... 23

24-hour clock

k

0 ... 23

24-hour clock, no leading zeros

I

01 ... 12

12-hour clock

l

1 ... 12

12-hour clock, no leading zeros

M

00 ... 59

Minute

S

00 ... 60[a]

Second

L

000 ... 999

Millisecond

p

am, pm

Morning or afternoon designator

Z

CST, EST

Time zone name

z

-0600

Time zone GMT offset

[a] The second value (60) is a convention used to support leap seconds.

Timers

Java includes two handy classes for timed code execution. If you write a clock application, for example, you might want to update the display every second. You might want to play an alarm sound at some predetermined time. You could accomplish these tasks using multiple threads and calls to Thread.sleep(). But the java.util.Timer and java.util.TimerTask classes handle this for you.

The Timer class is a scheduler. Each instance of Timer has a single thread that runs in the background, watching the clock and executing one or more TimerTasks at appropriate times. You could, for example, schedule a task to run once at a specific time like this:

    import java.util.*;

    public class Y2K {
      public static void main(String[] args) {
        Timer timer = new Timer();

        TimerTask task = new TimerTask() {
          public void run() {
            System.out.println("Y2K!");
          }
        };

        Calendar cal = new GregorianCalendar( 2000, Calendar.JANUARY, 1 );
        timer.schedule( task, cal.getTime());
      }
    }

TimerTask implements the Runnable interface. To create a task, you can simply subclass TimerTask and supply a run() method. Here, we’ve created a simple anonymous subclass of TimerTask that prints a message to System.out. Using the schedule() method of Timer, we’ve asked that the task be run on January 1, 2000. If the scheduled time has already passed (as in our example), the task is run immediately.

There are some other varieties of schedule(); you can run tasks once or at recurring intervals. There are two kinds of recurring tasks—fixed delay and fixed rate. Fixed delay means that a fixed amount of time elapses between the end of the task’s execution and the beginning of the next execution. Fixed rate means that the task should begin execution at fixed time intervals. The difference comes into play when the time to execute the task is long relative to the interval. Keep in mind that tasks are executed by the Timer’s single scheduler thread. If one task takes a very long time, other tasks may be delayed, in which case they run as soon as the thread becomes available.

You could, for example, update a clock display every second with code like this:

    Timer timer = new Timer();

    TimerTask task = new TimerTask() {
        public void run() {
            repaint(); // update the clock display
        }
    };

    timer.scheduleAtFixedRate( task, 0, 1000 );

A TimerTask can be canceled before its execution with its cancel() method.

Collections

Collections are data structures that are fundamental to all types of programming. Whenever we need to refer to a group of objects, we have some kind of collection. At the core language level, Java supports collections in the form of arrays. But arrays are static and because they have a fixed length, they are awkward for groups of things that grow and shrink over the lifetime of an application. Arrays also do not represent abstract relationships between objects well. In the early days, the Java platform had only two basic classes to address these needs: the java.util.Vector class, which represents a dynamic list of objects, and the java.util.Hashtable class, which holds a map of key/value pairs. Today, Java has a more comprehensive approach to collections called the Collections Framework. The older classes still exist, but they have been retrofitted into the framework (with some eccentricities) and are generally no longer used.

Though conceptually simple, collections are one of the most powerful parts of any programming language. Collections implement data structures that lie at the heart of managing complex problems. A great deal of basic computer science is devoted to describing the most efficient ways to implement certain types of algorithms over collections. Having these tools at your disposal and understanding how to use them can make your code both much smaller and faster. It can also save you from reinventing the wheel.

Prior to Java 5, the Collections Framework had two major drawbacks. The first was that—not having generic types to work with—collections were by necessity untyped and worked only with anonymous Objects instead of real types like Dates and Strings. This meant that you had to perform a type cast every time you took an object out of a collection. This flew in the face of Java’s compile-time type safety. But in practice, this was less a problem than it was just plain cumbersome and tedious. The second issue was that, for practical reasons, collections could work only with objects and not with primitive types. This meant that any time you wanted to put a number or other primitive type into a collection, you had to store it in a wrapper class first and unpack it later upon retrieving it. The combination of these factors made code working with collections less readable and more dangerous to boot.

This all changed with the introduction of generic types and autoboxing of primitive values. First, the introduction of generic types, as we described in Chapter 8, has made it possible for truly typesafe collections to be under the control of the programmer. Second, the introduction of autoboxing and unboxing of primitive types means that you can generally treat objects and primitives as equals where collections are concerned. The combination of these new features can significantly reduce the amount of code you write and add safety as well. As we’ll see, all of the collections classes now take advantage of these features.

The Collections Framework is based around a handful of interfaces in the java.util package. These interfaces are divided into two hierarchies. The first hierarchy descends from the Collection interface. This interface (and its descendants) represents a container that holds other objects. The second, separate hierarchy is based on the Map interface, which represents a group of key/value pairs where the key can be used to retrieve the value in an efficient way.

The Collection Interface

The mother of all collections is an interface appropriately named Collection. It serves as a container that holds other objects, its elements. It doesn’t specify exactly how the objects are organized; it doesn’t say, for example, whether duplicate objects are allowed or whether the objects are ordered in any way. These kinds of details are left to child interfaces. Nevertheless, the Collection interface defines some basic operations common to all collections:

public boolean add( element )

This method adds the supplied object to this collection. If the operation succeeds, this method returns true. If the object already exists in this collection and the collection does not permit duplicates, false is returned. Furthermore, some collections are read-only. Those collections throw an UnsupportedOperationException if this method is called.

public boolean remove( element )

This method removes the specified object from this collection. Like the add() method, this method returns true if the object is removed from the collection. If the object doesn’t exist in this collection, false is returned. Read-only collections throw an UnsupportedOperationException if this method is called.

public boolean contains( element )

This method returns true if the collection contains the specified object.

public int size()

Use this method to find the number of elements in this collection.

public boolean isEmpty()

This method returns true if this collection has no elements.

public Iterator iterator()

Use this method to examine all the elements in this collection. This method returns an Iterator, which is an object you can use to step through the collection’s elements. We’ll talk more about iterators in the next section.

Additionally, the methods addAll(), removeAll(), and containsAll() accept another Collection and add, remove, or test for all of the elements of the supplied collection.

Generics and collections

When using generics, the Collection type is parameterized with a specific type of element that the collection will hold. This makes a generic collection of “anything” into a specific collection of some type of element. The parameter type becomes the compile-time type of the element arguments in all of the methods of Collection (in this case, the add(), remove(), and contains() methods listed earlier). For example, in the following code, we create a Collection that works with Dates:

    Collection<Date> dates = new ArrayList<Date>(); // = new ArrayList<>() would 
                                                        // also work.
    dates.add( new Date() );
    dates.add( "foo" ) // Error; string type where Date expected!!

ArrayList is just one implementation of Collection; we’ll talk about it a bit later. The important thing is that we’ve declared the variable dates to be of the type Collection<Date>; that is, a collection of Dates, and we’ve allocated our ArrayList to match. Because our collection has been parameterized with the type Date, the add() method of the collection becomes add( Date date ) and attempting to add any type of object other than a Date to the list would have caused a compile-time error.

If you are working with very old Java code that predates generics, you can simply drop the types and perform the appropriate casts. For example:

    Collection dates = new ArrayList();
    dates.add( new Date() );  // unchecked, compile-time warning
    Date date = (Date)dates.get( 0 );

In this case, we’ll get a compile time warning that we’re using ArrayList in a potentially unsafe (nongeneric typesafe) way.

As we’ve described earlier in the book, this is essentially what the Java compiler is doing for us with generics. When using collections (or any generic classes) in this way under Java 5 or later, you will get compile-time warnings indicating that the usage is unchecked, meaning that it is possible to get an error at runtime if you have made a mistake. In this example, a mistake would not be caught until someone tried to retrieve the object from the collection and cast it to the expected type.

Legacy code and runtime type safety

If you are working with legacy Java code that predates Java 5 generics and you do not wish to introduce generics to it, you can still add a layer of type safety at runtime by switching to a runtime type-checked version of your collection types. Java supplies runtime-checked wrappers for all of the basic collection types. These wrappers enforce a specific Java element type at runtime by throwing ClassCastException if the wrong element is inserted. For example:

    List list = new ArrayList();
    list = Collections.checkedList( list, Date.class );
    list.add( new Date() );
    list.add( "foo" ); // Runtime ClassCastException!

Here, the static Collections.checkedList() method has wrapped our collection, list, in a wrapper that implements all of the methods of List, but checks that we are only holding Dates. The second argument to the method is the literal Date.class reference to the Class of Date. This serves to tell the wrapper what type we want to enforce. Corresponding “checked” collection methods exist for all of the basic collection interfaces that we’ll see, including the base Collection, List, Set, and Map.

Converting between collections and arrays

Converting between collections and arrays is easy. For convenience, the elements of a collection can be retrieved as an array using the following methods:

    public Object[] toArray()
    public <E> E[] toArray( E[] a )

The first method returns a plain Object array. With the second form, we can be more specific and get back an array of the correct element type. If we supply an array of sufficient size, it will be filled in with the values. But if the array is too short (e.g., zero length), a new array of the same type but the required length will be created and returned to us. So you can just pass in an empty array of the correct type like this:

    Collection<String> myCollection = ...;    
    String [] myStrings = myCollection.toArray( new String[0] );

(This trick is a little awkward and it would be nice if Java let us specify the type explicitly using a Class reference, but for some reason, this isn’t the case.) Going the other way, you can convert an array of objects to a List collection with the static asList() method of the java.util.Arrays class:

    String [] myStrings = ...;    List list = Arrays.asList( myStrings );

Iterator

An iterator is an object that lets you step through a sequence of values. This kind of operation comes up so often that it is given a standard interface: java.util.Iterator. The Iterator interface has only two primary methods:

public E next()

This method returns the next element (an element of generic type E) of the associated collection.

public boolean hasNext()

This method returns true if you have not yet stepped through all the Collection’s elements. In other words, it returns true if you can call next() to get the next element.

The following example shows how you could use an Iterator to print out every element of a collection:

    public void printElements(Collection c, PrintStream out) {
        Iterator iterator = c.iterator();
        while ( iterator.hasNext() )
            out.println( iterator.next() );
    }

In addition to the traversal methods, Iterator provides the ability to remove an element from a collection:

public void remove()

This method removes the most recent object returned from next() from the associated Collection.

Not all iterators implement remove(). It doesn’t make sense to be able to remove an element from a read-only collection, for example. If element removal is not allowed, an UnsupportedOperationException is thrown from this method. If you call remove() before first calling next(), or if you call remove() twice in a row, you’ll get an IllegalStateException.

For loop over collections

A form of the for loop, described in Chapter 4, can operate over all types of Collection objects. For example, we can now step over all of the elements of a typed collection of Date objects like so:

    Collection<Date> col = ...
    for( Date date : col )
       System.out.println( date );

This feature of the Java built-in for loop is called the "enhanced" for loop (as opposed to the pregenerics, numeric-only for loop). The enhanced for loop applies only to Collection type collections, not Maps. Maps are another type of beast that really contain two distinct sets of objects (keys and values), so it’s not obvious what your intentions would be in such a loop.

java.util.Enumeration

Prior to the introduction of the Collections API there was another iterator interface: java.util.Enumeration. It used the slightly more verbose names nextElement() and hasMoreElements(), but accomplished the same thing. Many older classes provide Enumerations where they would now use Iterator. If you aren’t worried about performance, you can just convert your Enumeration into a List with a static convenience method of the java.util.Collections class:

    Enumeration myEnumeartion = ...;    
    List list = Collections.list( myEnumeration );

Collection Types

The Collection interface has three child interfaces. Set represents a collection in which duplicate elements are not allowed. List is a collection whose elements have a specific order. The Queue interface is a buffer for objects with a notion of a “head” element that’s next in line for processing.

Set

Set has no methods besides the ones it inherits from Collection. It simply enforces its no-duplicates rule. If you try to add an element that already exists in a Set, the add() method simply returns false. SortedSet maintains elements in a prescribed order; like a sorted list that can contain no duplicates. It adds the methods add() and remove() to the Set interface. You can retrieve subsets (which are also sorted) using the subSet(), headSet(), and tailSet() methods. These methods accept one or a pair of elements that mark the boundaries. The first(), last(), and comparator() methods provide access to the first element, the last element, and the object used to compare elements (more on this later).

Java 7 adds NavigableSet, which extends SortedSet and adds methods for finding the closest match greater or lesser than a target value within the sort order of the Set. This interface can be implemented efficiently using techniques such as skip lists, which make finding ordered elements fast (Java 7 supplies such an implementation, which we’ll note later).

List

The next child interface of Collection is List. The List is an ordered collection, similar to an array but with methods for manipulating the position of elements in the list:

public boolean add( E element )

This method adds the specified element to the end of the list.

public void add( int index , E element )

This method inserts the given object at the supplied position in the list. If the position is less than zero or greater than the list length, an IndexOutOfBoundsException will be thrown. The element that was previously at the supplied position, and all elements after it, are moved up one index position.

public void remove( int index )

This method removes the element at the specified position. All subsequent elements move down one index position.

public E get( int index )

This method returns the element at the given position.

public Object set( int index , E element )

This method changes the element at the given position to the specified object. There must already be an object at the index or else an IndexOutOfBoundsException is thrown.

The type E in these methods refers to the parameterized element type of the List class. Collection, Set, and List are all interface types. We’ll look at concrete implementations of these shortly.

Queue

A Queue is a collection that acts like a buffer for elements. The queue maintains the insertion order of items placed into it and has the notion of a “head” item. Queues may be first in, first out (FIFO) or last in, first out (LIFO) depending on the implementation:

public boolean offer( E element ), public boolean add( E element )

The offer() method attempts to place the element into the queue, returning true if successful. Different Queue types may have different limits or restrictions on element types (including capacity). This method differs from the add() method inherited from Collection in that it returns a Boolean value instead of throwing an exception to indicate that the element cannot be accepted.

public E poll(), public E remove()

The poll() method removes the element at the head of the queue and returns it. This method differs from the Collection method remove() in that if the queue is empty, null is returned instead of throwing an exception.

public E peek()

This method returns the head element without removing it from the queue. If the queue is empty, null is returned.

Java 7 added Deque, which is a “double-ended” queue that supports adding, querying, and removing elements from either end of the queue (the head or the tail). Dequeue has versions of the queue methods—offer, poll, and peek—that operate on the first or last element: offerFirst(), pollFirst(), peekFirst(), offerLast(), pollLast(), peekLast(). Note that Deque extends Queue and so is still a type of Queue. If you use the plain Queue methods offer(), poll(), and peek() on a Deque, they operate as a FIFO queue. Specifically, calling offer() is equivalent to offerLast() and calling poll() or peek() is the same as calling pollFirst() or peekFirst(), respectively.

Finally, Java has a legacy Stack class that acts as a LIFO queue with “push” and “pop” operations, but Deque is generally better and should serve as a general replacement for Stack. Simply use addFirst() for “push” and pollFirst() for “pop.”

BlockingQueue

BlockingQueue is part of the java.util.concurrent package. It extends the Queue interface for queues that may have a fixed capacity or other time-based limitations and allows the user to block, waiting for insertion of an item or for an available item to retrieve. It adds timed wait versions of offer() and poll() and additional, blocking take() and put() methods:

public boolean offer( E element, long time, TimeUnit units )

This method attempts to place the element into the queue, just like the method of the base Queue interface, but blocks for up to the specified period of time as it waits for space to become available.

public E poll( long time, timeUnit unit )

This method attempts to remove the element at the head of the queue, just like the method of the base Queue interface, but blocks for up to the specified period of time as it waits for an element to become available.

public E take()

This method retrieves the element at the head of the queue, blocking if necessary until one becomes available.

public void put( E element )

This method adds an element to the queue, blocking if necessary until space becomes available.

public boolean add( E element )

This method attempts to add an element to the queue immediately. If successful, it returns true. If no space is available, it throws an IllegalStateException. This method is useful for cases where you are not expecting the queue to ever reject an item.

The Map Interface

The Collections Framework also includes the java.util.Map, which is a collection of key/value pairs. Other names for map are “dictionary” or “associative array.” Maps store and retrieve elements with key values; they are very useful for things like caches or minimalist databases. When you store a value in a map, you associate a key object with a value. When you need to look up the value, the map retrieves it using the key.

With generics, a Map type is parameterized with two types: one for the keys and one for the values. The following snippet uses a HashMap, which is an efficient type of map implementation that we’ll discuss later:

    Map<String, Date> dateMap = new HashMap<String, Date>();
    dateMap.put( "today", new Date() );
    Date today = dateMap.get( "today" );

In legacy code, maps simply map Object types to Object types and require the appropriate cast to retrieve values.

The basic operations on Map are straightforward. In the following methods, the type K refers to the key parameter type and the type V refers to the value parameter type:

public V put( K key , V value )

This method adds the specified key/value pair to the map. If the map already contains a value for the specified key, the old value is replaced and returned as the result.

public V get( K key )

This method retrieves the value corresponding to key from the map.

public V remove( K key )

This method removes the value corresponding to key from the map. The value removed is returned.

public int size()

Use this method to find the number of key/value pairs in this map.

You can retrieve all the keys or values in the map:

public Set keySet()

This method returns a Set that contains all the keys in this map.

public Collection values()

Use this method to retrieve all the values in this map. The returned Collection can contain duplicate elements.

Map has one child interface, SortedMap. A SortedMap maintains its key/value pairs sorted in a particular order according to the key values. It provides the subMap(), headMap(), and tailMap() methods for retrieving sorted map subsets. Like SortedSet, it also provides a comparator() method, which returns an object that determines how the map keys are sorted. We’ll talk more about that later. Java 7 adds a NavigableMap with functionality parallel to that of NavigableSet; namely, it adds methods to search the sorted elements for an element greater or lesser than a target value.

Finally, we should make it clear that although related, Map is not literally a type of Collection (Map does not extend the Collection interface). You might wonder why. All of the methods of the Collection interface would appear to make sense for Map, except for iterator(). A Map, again, has two sets of objects: keys and values, and separate iterators for each. This is why a Map does not implement Collection.

One more note about maps: some map implementations (including Java’s standard HashMap) allow null to be used as a key or value, but others may not.

ConcurrentMap

The ConcurrentMap interface is part of the java.util.concurrent package. It extends the base Map interface and adds atomic put, remove, and replace functionality that is useful for concurrent programming:

public V putIfAbsent( K key, V value )

This method associates the value with the key only if the key was not already in use. If the key exists, no action is taken. If the key does not exist, it is created. The resulting value (existing or new) is returned.

public boolean remove( Object key, Object value )

This method removes the mapping (key and value) only if the current value associated with the key equals the supplied value. It returns true if the value was removed, false if not.

public boolean replace( K key, V existingValue, V newValue)

This method replaces the value associated with the key only if the existing value equals the existingValue argument. It returns true if the value was replaced.

public boolean replace( K key, V value )

This method replaces the value associated with the key only if a mapping already exists for the key (i.e., it already has some value).

Collection Implementations

Up until this point, we’ve talked only about interfaces. But you can’t instantiate interfaces; you need concrete implementations. Of course, the Collections Framework includes useful implementations of all of the collections interfaces. In some cases, there are several alternatives from which to choose. To understand the tradeoffs between these implementations, it helps to have a basic understanding of a few of the most common data structures used in all programming: arrays, linked lists, trees, and hash maps. Many books have been written about these data structures, and we will not drag you into the mind-numbing details here. We’ll hit the highlights briefly as a prelude to our discussion of the Java implementations. We should stress before we go on that the differences in the implementations of Java collections are only significant when working with very large numbers of elements or with extreme time sensitivity. For the most part, they all behave well enough to be interchangeable.

Arrays

It should be fairly obvious that plain old Java arrays, shown in Figure 11-2, would be good at holding an ordered collection of elements. As we mentioned earlier, however, one limitation of arrays is that they cannot grow. This means that to support a true Collection, arrays must be copied into larger arrays as capacity demands increase. Another problem with using arrays for lists is that inserting an element into the middle of an array or taking one out generally also involves copying large parts of the array to and fro, which is an expensive operation.

Array structure
Figure 11-2. Array structure

Because of this, arrays are described as consuming constant time for retrieval, but linear time for insertion into or deletion from the body of the array. The term constant time here means that, in general, the time to retrieve an element stays roughly constant even as you add more elements to the array (this is due to the fact that arrays are fully indexed). Linear time means that the time to insert or delete an element takes longer and longer as the array adds elements; the time expense grows linearly with the number of elements. There are worse things too: exponential time, as you might imagine, means that an algorithm is useless for very large numbers of elements. Unless otherwise stated, all of the Java collection implementations work in linear time or better.

Arrays are useful when you are mostly reading or exclusively appending to the end of the collection.

Linked lists

A linked list, shown in Figure 11-3, holds its elements in a chain of nodes, each referencing the node before and after it (if any). In this way, the linked list forms an ordered collection that looks something like an array. Unlike the magic of an array, however, to retrieve an element from a linked list, you must traverse the list from either the head or tail to reach the correct spot. As you might have guessed, this is a linear-time operation that gets more expensive as the number of elements grows. The flip side is that once you’re at the spot, inserting or deleting an element is a piece of cake: simply change the references and you’re done. This means that insertions and deletions—at least near the head and tail of a linked list—are said to be in constant time.

Linked list structure
Figure 11-3. Linked list structure

Linked lists are useful when you are doing a lot of insertions or deletions on a collection. An interesting variation on the basic linked list is the “skip list,” which is a kind of linked list that maintains a hierarchy of references spanning increasing ranges of elements instead of only pointing to the next element in the chain. The idea is that when you need to jump to the middle, you can use one of these “express lane” pointers to jump to the approximate location and then move forward or backward with finer granularity as needed by descending the pointer hierarchy. Java 7 adds skip list implementations of the NavigableMap and NavigableSet interfaces in the java.util.concurrent package for concurrent programming.

Trees

A tree is like a linked list in that it holds its elements in nodes that point to their neighbors. However, a tree, as its name suggests, does not have a linear structure, but instead arranges its elements in a cascade of branches like a family tree. The power of the tree structure is in sorting and searching elements that have a specified order. A binary search tree, as shown in Figure 11-4, arranges its elements such that the children divide up a range of values. One child holds values greater than the node and one child holds values lower. By applying this knowledge recursively on a properly “balanced” tree, we can rapidly find any value. The effort to search the tree is described as log(n) time, which means that it grows only with the logarithm of the number of elements, which is much better than the linear time it would take to check all of the elements by brute force.

Tree structure
Figure 11-4. Tree structure

Trees are useful for maintaining and searching large collections of sorted elements. A similar concept that can be used for data that rarely requires updates is to use a plain sorted array with a binary search algorithm. In a binary search, you make (exponentially) decreasing size jumps into the sorted array to approximate locations for the element and then choose your next jump based on whether you have overshot or undershot the target. The Java Arrays class has several binarySearch() methods that operate on different types of arrays.

Hash maps

Hash maps are strange and magical beasts. A hash map (or hash table, as it is also called) uses a mathematical hash algorithm applied to its key value to distribute its element values into a series of “buckets.” The algorithm relies on the hash algorithm to distribute the elements as uniformly (randomly) as possible. To retrieve an element by its key simply involves searching the correct bucket. Because the hash calculation is fast and can have a large number of buckets, there are few elements to search and retrieval is very fast. As we described in Chapter 7, all Java Objects have a hash value as determined by the hashCode() method. We’ll say more about hash codes and key values for maps later in this chapter.

Hash map performance is governed by many factors, including the sophistication of the hash algorithm implemented by its elements (see Figure 11-5). In general, with a good hash function implementation, the Java HashMap operates in constant time for putting and retrieving elements. Hash maps are fast at mapping unsorted collections.

Hash map structure
Figure 11-5. Hash map structure

Java Collections implementations

Table 11-7 lists the implementations of the Java Collections Framework by interface type.

Table 11-7. Collections Framework implementation classes

Interface

Implementation

Set

HashSet

LinkedHashSet

CopyOnWriteArraySet

EnumSet

CopyOnWriteArraySet

SortedSet

TreeSet

ConcurrentSkipListSet

List

ArrayList

LinkedList

Vector [a]

Stack

CopyOnWriteArrayList

Map

HashMap

EnumMap

LinkedHashMap

IdentityHashMap

Hashtable a

ConcurrentMap

ConcurrentHashMap

ConcurrentSkipListMap

SortedMap

TreeMap

Queue / Dequeue

LinkedList

ArrayDeque

PriorityQueue

DelayQueue

SynchronousQueue

ConcurrentLinkedQueue

ConcurrentLinkedDequeue

BlockingQueue

ArrayBlockingQueue

LinkedBlockingQueue

PriorityBlockingQueue

[a] Vector and Hashtable are legacy classes and should generally be avoided in favor of ArrayList and HashMap, respectively.

ArrayList and LinkedList provide the array and linked list implementations of the List interface that was described earlier. ArrayList is satisfactory for most purposes, but you should use LinkedList when you plan to do a lot of insertions or deletions at various points in the list.

HashSet and HashMap provide a good hash map implementation of the Set and Map interfaces. The LinkedHashSet and LinkedHashMap implementations combine the hash algorithm with a linked list that maintains the insertion order of the elements. Note that these linked collections are ordered, but not sorted collections.

TreeSet and TreeMap maintain sorted collections using a tree data structure. In the case of TreeMap, it is the key values that are sorted. The sorting is accomplished by a comparator object. We’ll discuss sorting later in this chapter.

Queue is implemented both by LinkedList (which implements List, Queue, and—as of Java 7, Deque) and PriorityQueue. A PriorityQueue’s prioritization comes from a sorting order determined by a comparator supplied with its constructor. Elements that sort “least” or “lowest” have the highest priority. The various implementations of BlockingQueue mirror these for concurrency-aware queues.

Finally, IdentityHashMap is an alternate type of HashMap that uses object identity instead of object equality to determine which keys match which objects. Normally, any two objects that test equal with equals() operate as the same key in a Map. With IdentityHashMap, only the original object instance retrieves the element. We’ll talk about hash codes and keys more in the next section.

We should also mention three specialized collections that we’ll talk about later: EnumSet and EnumMap are specifically designed to work with Java enumerations. WeakHashMap uses weak references to cooperate with Java garbage collection.

Hash Codes and Key Values

The term hash in Hashtable and HashMap refers to the key hash value that these collections use to make their associations. Specifically, an element in a Hashtable or HashMap is not associated with a key strictly by the key object’s identity but rather by a function of the key’s contents. This allows keys that are equivalent to access the same object. By “equivalent,” we mean those objects that compare true with equals(). If you store an object in a Hashtable using one object as a key, you can use any other object that equals() tells you is equivalent to retrieve the stored object.

It’s easy to see why equivalence is important if you remember our discussion of strings. You may create two String objects that have the same characters in them but that are different objects in Java. In this case, the == operator tells you that the String objects are different, but the equals() method of the String class tells you that they are equivalent. Because they are equivalent, if we store an object in a HashMap using one of the String objects as a key, we can retrieve it using the other.

The hash code of an object makes this association based on content. As we mentioned in Chapter 7, the hash code is like a fingerprint of the object’s data content. HashMap uses it to store the objects so that they can be retrieved efficiently. The hash code is nothing more than a number (an integer) that is a function of the data. The number is always the same for identical data, but the hashing function is intentionally designed to generate as different (random looking) a number as possible for different combinations of data. In other words, a very small change in the data should produce a big difference in the number. It should be unlikely that two nonidentical datasets, even very similar ones, would produce the same hash code.

As we described earlier, internally, HashMap really just keeps a number of lists of objects, but it puts objects into the lists based on their hash code. When it wants to find the object again, it can look at the hash code and know immediately how to get to the appropriate list. The HashMap still might end up with a number of objects to examine, but the list should be short. For each object in the short list it finds, it does the following comparison to see if the key matches:

    if ((keyHashcode == storedKeyHashcode) && key.equals(storedKey))
        return object;

There is no prescribed way to generate hash codes. The only requirement is that they be somewhat randomly distributed and reproducible (based on the data). This means that two objects that are not the same could end up with the same hash code by accident. This is unlikely (there are 232 possible integer values); moreover, it shouldn’t cause a problem because as you can see in the preceding snippet, the HashMap ultimately checks the actual keys using equals(), as well as the hash codes, to find the match. Therefore, even if two key objects have the same hash code, they can still coexist in the HashMap as long as they don’t test equal to one another as well. (To put it another way, if two keys’ hashcodes are the same and the equals method says they are the same, then they will be considered the same key and retrieve the same value object.)

Hash codes are computed by an object’s hashCode() method, which is inherited from the Object class if it isn’t overridden. The default hashCode() method simply assigns each object instance a unique number to be used as a hash code. If a class does not override this method, each instance of the class will have a unique hash code. This goes along well with the default implementation of equals() in Object, which only compares objects for identity using ==; the effect being that these arbitrary objects serve as unique keys in maps.

You must override equals() in any classes for which equivalence of different objects is meaningful. Likewise, if you want equivalent objects to serve as equivalent keys, you must override the hashCode() method as well to return identical hash code values. To do this, you need to create some suitably randomizing, arbitrary function of the contents of your object. The only criterion for the function is that it should be almost certain to return different values for objects with different data, but the same value for objects with identical data.

Synchronized and Unsynchronized Collections

The java.util.Collections class contains important static utility methods for working with Sets and Maps. All the methods in Collections operate on interfaces, so they work regardless of the actual implementation classes you’re using. The first methods we’ll look at involve creating synchronized versions of our collections.

Most of the default collection implementations are not synchronized; that is, they are not safe for concurrent access by multiple threads. The reason for this is performance. In many applications, there is no need for synchronization, so the Collections API does not provide it by default. Instead, you can create a synchronized version of any collection using the following methods of the Collections class:

    public static Collection synchronizedCollection(Collection c)
    public static Set synchronizedSet(Set s)
    public static List synchronizedList(List list)
    public static Map synchronizedMap(Map m)
    public static SortedSet synchronizedSortedSet(SortedSet s)
    public static SortedMap synchronizedSortedMap(SortedMap m)

These methods return synchronized, threadsafe versions of the supplied collection, by wrapping them (in a new object that implements the same interface and delegates the calls to the underlying collection). For example, the following shows how to create a threadsafe List:

    List list = new ArrayList();
    List syncList = Collections.synchronizedList(list);

Multiple threads can call methods on this list safely and they will block as necessary to wait for the other threads to complete.

In contrast to the norm, the older Hashtable and Vector collections are synchronized by default (and, therefore, may be a bit slower when that’s not needed). The “copy on write” collection implementations that we’ll talk about later also do not require synchronization for their special applications. Finally, the ConcurrentHashMap and ConcurrentLinkedQueue implementations that we’ll cover later are threadsafe and designed specifically to support a high degree of concurrent access without incurring a significant penalty for their internal synchronization.

Synchronizing iterators

This is important, so remember this! Although synchronized collections are threadsafe, the Iterators returned from them are not. If you obtain an Iterator from a collection, you should do your own synchronization to ensure that the collection does not change as you’re iterating through its elements. A convention does this by synchronizing on the collection itself with a synchronized block:

    synchronized(syncList) {
        Iterator iterator = syncList.iterator();
        // do stuff with the iterator here
    }

If you do not synchronize on the collection while iterating and the collection changes, Java attempts to throw a ConcurrentModificationException. However, this is not guaranteed.

ConcurrentHashMap and ConcurrentLinkedQueue

The java.util.concurrent.ConcurrentHashMap class is part of the concurrency utilities package and provides a Map that performs well under multithreaded access. A ConcurrentHashMap is safe for access from multiple threads, but it does not necessarily block threads during operations. Instead, some degree of overlapping operations, such as concurrent reads, are permitted safely. The ConcurrentHashMap can even allow a limited number of concurrent writes to happen while reads are being performed. These operations and iterators over the map do not throw a ConcurrentModificationException, but no guarantees are made as to exactly when one thread will see another thread’s work. All views of the map are based upon the most recently committed writes.

Similarly, the ConcurrentLinkedQueue implementation provides the same sort of benefits for a linked queue, allowing some degree of overlapping writes and reads by concurrent users.

Read-Only and Read-Mostly Collections

You can use the Collections class to create read-only versions of any collection:

    public static Collection unmodifiableCollection(Collection c)
    public static Set unmodifiableSet(Set s)
    public static List unmodifiableList(List list)
    public static Map unmodifiableMap(Map m)
    public static SortedSet unmodifiableSortedSet(SortedSet s)
    public static SortedMap unmodifiableSortedMap(SortedMap m)

Making unmodifiable versions of collections is a useful way to ensure that a collection handed off to another part of your code is not modified intentionally or inadvertently. Attempting to modify a read-only collection results in an UnsupportedOperationException.

Copy-on-write (“read-mostly”) collections

The java.util.concurrent package contains the CopyOnWriteArrayList and CopyOnWriteArraySet List and Set implementations. These classes are threadsafe and do not require explicit synchronization, but are heavily optimized for read operations. Any write operation causes the entire data structure to be copied internally in a blocking operation. The advantage is that if you are almost always reading, these implementations are extremely fast and no synchronization is required.

WeakHashMap

In Chapter 5, we introduced the idea of weak references—object references that don’t prevent their objects from being removed by the garbage collector. WeakHashMap is an implementation of Map that makes use of weak references in its keys and values. This means that you don’t have to remove key/value pairs from a Map when you’re finished with them. Normally, if you removed all references to a key object from the rest of your application, the Map would still contain a reference and keep the object “alive,” preventing garbage collection. WeakHashMap changes this; once you remove all references to a key object from the rest of the application, the WeakHashMap lets go of it, too and both the key and its corresponding value (if it is similarly unreferenced) are eligible for garbage collection.

EnumSet and EnumMap

EnumSet and EnumMap are collections designed to work specifically with the limited domain of objects defined by a Java enumerated type. (Enums are discussed in Chapter 5.) Java enums are Java objects and there is no reason you can’t use them as keys or values in collections otherwise. However, the EnumSet and EnumMap classes are highly optimized, taking advantage of the knowledge that the set or keys in the map, respectively, may be one of only a few individual objects. With this knowledge, storage can be compact and fast using bit fields internally. The idea is to make using collection operations on enumerations efficient enough to replace the general usage pattern of bit-flags and make binary logic operations unnecessary. Instead of using:

    int flags = getFlags();
    if ( flags & ( Constants.ERROR | Constants.WARNING ) != 0 )

we could use set operations:

    EnumSet flags = getFlags();
    if ( flags.contains( Constants.Error) || 
        flags.contains( Constants.Warning ) )

This code may not be as terse, but it is easier to understand and should be just as fast.

Sorting Collections

The Collections utilities include methods for performing common operations like sorting. Sorting comes in two varieties:

public static void sort(List list )

This method sorts the given list. You can use this method only on lists whose elements implement the java.lang.Comparable interface. Luckily, many classes already implement this interface, including String, Date, BigInteger, and the wrapper classes for the primitive types (Integer, Double, etc.).

public static void sort(List list, Comparatorc)

Use this method to sort a list whose elements don’t implement the Comparable interface. The supplied java.util.Comparator does the work of comparing elements. You might, for example, write an ImaginaryNumber class and want to sort a list of them. You would then create a Comparator implementation that knew how to compare two imaginary numbers.

The sorted collections we discussed earlier, SortedSet and SortedMap, maintain their collections in a specified order using the Comparable interface of their elements. If the elements do not implement Comparable, you must supply a Comparator object yourself in the constructor of the implementation. For example:

    Comparator myComparator = ...
    SortedSet mySet = new TreeSet( myComparator );

Collections give you some other interesting capabilities, too. If you’re interested in learning more, check out the min(), max(), binarySearch(), and reverse() methods.

A Thrilling Example

Collections is a bread-and-butter topic, which means it’s hard to create exciting examples. The example in this section reads a text file, parses all its words, counts the number of occurrences, sorts them, and writes the results to another file. It gives you a good feel for how to use collections in your own programs. This example also shows features including generics, autoboxing, and the Scanner API.

    import java.io.*;
    import java.util.*;
     
    public class WordSort
    {
      public static void main(String[] args) throws IOException
      {
        if ( args.length < 2 ) {
          System.out.println("Usage: WordSort inputfile outputfile");
          return;
        }
        String inputfile = args[0];
        String outputfile = args[1];
     
        /*  Create the word map. Each key is a word and each value is an
            Integer that represents the number of times the word occurs
            in the input file.
        */
        Map<String,Integer> map = new TreeMap<>();
     
        Scanner scanner = new Scanner( new File(inputfile) );
        while ( scanner.hasNext() ) {
            String word = scanner.next();
            Integer count = map.get( word );
            count = ( count == null ? 1 : count +1 );
            map.put( word, count );
        }
        scanner.close();
     
        // get the map's keys
        List<String> keys = new ArrayList<>( map.keySet() );
     
        // write the results to the output file
        PrintWriter out = new PrintWriter( new FileWriter(outputfile) );
        for ( String key : keys )
          out.println( key + " : " + map.get(key) );
        out.close();
      }
    }

Suppose, for example, that you have an input file named Ian Moore.txt:

    Well it was my love that kept you going
    Kept you strong enough to fall
    And it was my heart you were breaking
    When he hurt your pride

    So how does it feel
    How does it feel
    How does it feel
    How does it feel

You could run the example on this file using the following command:

    % java WordSort "Ian Moore.txt" count.txt

The output file, count.txt, looks like this:

    And : 1
    How : 3
    Kept : 1
    So : 1
    Well : 1
    When : 1
    breaking : 1
    does : 4
    enough : 1
    ...

The results are case-sensitive: “How” and “how” are recorded separately. You could modify this behavior by converting words to all lowercase after retrieving them from the Scanner.

Properties

The java.util.Properties class is a specialized hash table for strings. Properties are generally used to hold textual configuration data. Examples of this are the Java System properties, which are passed to a Java application on the command line. We’ll cover those later in this section. More generally, you can use a Properties table to hold arbitrary configuration information for an application in an easily accessible format. The neat thing about a Properties object is that it can load and store its information in a plain text or XML text format using streams (see Chapter 12 for information on streams).

Any string values can be stored as key/value pairs in a Properties table. However, the convention is to use a dot-separated naming hierarchy to group property names into logical structures. (Unfortunately, this is just a convention, and you can’t really work with groups of properties in a hierarchical way as this might imply.) For example, you can create an empty Properties object and add String key/value pairs just as you could with a Map:

    Properties props = new Properties();
    props.setProperty("myApp.xsize", "52");
    props.setProperty("myApp.ysize", "79");

Thereafter, you can retrieve values with the getProperty() method:

    String xsize = props.getProperty( "myApp.xsize" );

If the named property doesn’t exist, getProperty() returns null. You can get an Enumeration of the property names with the propertyNames() method:

    for ( Enumeration e = props.propertyNames(); e.hasMoreElements(); ) {
        String name = e.nextElement();
        ...
    }

When you create a Properties object, you can specify a second object for default property values:

    Properties defaults = ...
    Properties props = new Properties( defaults );

Now, when you call getProperty(), the method searches the default table if it doesn’t find the named property in the current table. An alternative version of getProperty() also accepts a default value; this value is returned instead of null if the property is not found in the current or default lists:

    String xsize = props.getProperty( "myApp.xsize", "50" );

Loading and Storing

You can save a Properties table to an OutputStream using the save() method. The property information is output in a flat ASCII format. We’ll talk about I/O in the next chapter, but bear with us for now. Continuing with the previous example, output the property information using the System.out stream as follows:

    props.save( System.out, "Application Parameters" );

System.out is a standard output stream that prints to the console or command line of an application. We could also save the information to a file using a FileOutputStream as the first argument to save(). The second argument to save() is a String that is used as a header for the data. The previous code outputs something like the following to System.out:

    #Application Parameters
    #Mon Feb 12 09:24:23 CST 2001
    myApp.ysize=79
    myApp.xsize=52

The load() method reads the previously saved contents of a Properties object from an InputStream:

    FileInputStream fin;
    ...
    Properties props = new Properties()
    props.load( fin );

The list() method is useful for debugging. It prints the contents to an OutputStream in a format that is more human-readable but not retrievable by load(). It truncates long lines with an ellipsis (...).

The Properties class also contains storeToXML() and loadFromXML() methods. These operate just like the save() and load() methods but write an XML file like the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd"
    >
    <properties>
    <comment>My Properties</comment>
    <entry key="myApp.ysize">79</entry>
    <entry key="myApp.xsize">52</entry>
    </properties>

We’ll cover XML in detail in Chapter 24.

System Properties

The java.lang.System class provides access to basic system environment information through the static System.getProperties() method. This method returns a Properties table that contains system properties. System properties take the place of environment variables in some programming environments. Table 11-8 summarizes system properties that are guaranteed to be defined in any Java environment.

Table 11-8. System properties

System property

Meaning

java.vendor

Vendor-specific string

java.vendor.url

URL of vendor

java.version

Java version

java.home

Java installation directory

java.class.version

Java class version

java.class.path

The classpath

os.name

Operating system name

os.arch

Operating system architecture

os.version

Operating system version

file.separator

File separator (such as / or \)

path.separator

Path separator (such as : or ;)

line.separator

Line separator (such as \n or \r\n)

user.name

User account name

user.home

User’s home directory

Java applets and other Java applications that run with restrictions may be prevented from reading the following properties: java.home, java.class.path, user.name, user.home, and user.dir. As you’ll see later, these restrictions are implemented by a SecurityManager object.

Your application can set system properties with the static method System.setProperty() . You can also set your own system properties when you run the Java interpreter, using the -D option:

    % java -Dfoo=bar -Dcat=Boojum MyApp

Because it is common to use system properties to provide parameters such as numbers and colors, Java provides some convenience routines for retrieving property values and parsing them into their appropriate types. The classes Boolean, Integer, Long, and Color each come with a “get” method that looks up and parses a system property. For example, Integer.getInteger("foo") looks for a system property called foo and then returns it as an Integer.

The Preferences API

The Java Preferences API accommodates the need to store both system and per-user configuration data persistently across executions of the Java VM. The Preferences API is like a portable version of the Windows registry, a mini-database in which you can keep small amounts of information, accessible to all applications. Entries are stored as name/value pairs, where the values may be of several standard types including strings, numbers, Booleans, and even short byte arrays. We should stress that the Preferences API is not intended to be used as a true database and you can’t store large amounts of data in it.

Preferences are stored logically in a tree. A preferences object is a node in the tree located by a unique path. You can think of preferences as files in a directory structure; within the file are stored one or more name/value pairs. To store or retrieve items, you ask for a preferences object for the correct path. Here is an example; we’ll explain the node lookup shortly:

    Preferences prefs = Preferences.userRoot().node("oreilly/learningjava");

    prefs.put("author", "Niemeyer");
    prefs.putInt("edition", 4);

    String author = prefs.get("author", "unknown");
    int edition = prefs.getInt("edition", -1);

In addition to the String and int type accessors, there are the following get methods for other types: getLong(), getFloat(), getDouble(), getByteArray(), and getBoolean(). Each of these get methods takes a key name and default value to be used if no value is defined. And, of course, for each get method, there is a corresponding “put” method that takes the name and a value of the corresponding type. Providing defaults in the get methods is mandatory. The intent is for applications to function even if there is no preference information or if the storage for it is not available, as we’ll discuss later.

Preferences are stored in two separate trees: system preferences and user preferences. System preferences are shared by all users of the Java installation. But user preferences are maintained separately for each user; each user sees his or her own preference information. In our example, we used the static method userRoot() to fetch the root node (preference object) for the user preferences tree. We then asked that node to find the child node at the path oreilly/learningjava, using the node() method. The corresponding systemRoot() method provides the system root node.

The node() method accepts either a relative or an absolute path. A relative path asks the node to find the path relative to itself as a base. We also could have gotten our node this way:

    Preferences prefs =
       Preferences.userRoot().node("oreilly").node("learningjava");

But node() also accepts an absolute path, in which case the base node serves only to designate the tree that the path is in. We could use the absolute path /oreilly/learningjava as the argument to any node() method and reach our preferences object.

Preferences for Classes

Java is an object-oriented language, and so it’s natural to wish to associate preference data with classes. In Chapter 12, we’ll see that Java provides special facilities for loading resource files associated with class files. The Preferences API follows this pattern by associating a node with each Java package. Its convention is simple: the node path is just the package name with the dots (.) converted to slashes (/). All classes in the package share the same node.

You can get the preference object node for a class using the static Preferences.userNodeForPackage() or Preferences.systemNodeForPackage() methods, which take a Class as an argument and return the corresponding package node for the user and system trees, respectively. For example:

    Preferences datePrefs = Preferences.systemNodeForPackage( Date.class );
    Preferences myPrefs = Preferences.userNodeForPackage( MyClass.class );
    Preferences morePrefs =
        Preferences.userNodeForPackage( myObject.getClass() );

Here, we’ve used the .class construct to refer to the Class object for the Date class in the system tree and to our own MyClass class in the user tree. The Date class is in the java.util package, so we’ll get the node /java/util in that case. You can get the Class for any object instance using the getClass() method.

Preferences Storage

There is no need to “create” nodes. When you ask for a node, you get a preferences object for that path in the tree. If you write something to it, that data is eventually placed in persistent storage, called the backing store. The backing store is the implementation-dependent storage mechanism used to hold the preference data. All the put methods return immediately, and no guarantees are made as to when the data is actually stored. You can force data to the backing store explicitly using the flush() method of the Preferences class. Conversely, you can use the sync() method to guarantee that a preferences object is up-to-date with respect to changes placed into the backing store by other applications or threads. Both flush() and sync() throw a BackingStoreException if data cannot be read or written for some reason.

You don’t have to create nodes, but you can test for the existence of a data node with the nodeExists() method, and you can remove a node and all its children with the removeNode() method. To remove a data item from a node, use the remove() method, specifying the key; or you can remove all the data from a node with the clear() method (which is not the same as removing the node).

Although the details of the backing store are implementation-dependent, the Preferences API provides a simple import/export facility that can read and write parts of a preference tree to an XML file. (The format for the file is available at http://java.sun.com/dtd/.) A preference object can be written to an output stream with the exportNode() method. The exportSubtree() method writes the node and all its children. Going the other way, the static Preferences.importPreferences() method can read the XML file and populate the appropriate tree with its data. The XML file records whether it is user or system preferences, but user data is always placed into the current user’s tree, regardless of who generated it.

It’s interesting to note that because the import mechanism writes directly to the tree, you can’t use this as a general data-to-XML storage mechanism (other APIs play that role). Also, although we said that the implementation details are not specified, it’s interesting how things really work in the current implementation. On some systems, Java creates a directory hierarchy for each tree at $JAVA_HOME/jre/.systemPrefs and $HOME/.java/.userPrefs, respectively. In each directory, there is an XML file called prefs.xml corresponding to that node.

Change Notification

Often your application should be notified if changes are made to the preferences while it’s running. You can get updates on preference changes using the PreferenceChangeListener and NodeChangeListener interfaces. These interfaces are examples of event listener interfaces, and we’ll see many examples of these in Chapters 16 through 18. We’ll also talk about the general pattern later in this chapter in the section “Observers and Observables”. For now, we’ll just say that by registering an object that implements PreferenceChangeListener with a node, you can receive updates on added, removed, and changed preference data for that node. The NodeChangeListener allows you to be told when child nodes are added to or removed from a specific node. Here is a snippet that prints all the data changes affecting our /oreilly/learningjava node:

    Preferences prefs =
        Preferences.userRoot().node("/oreilly/learningjava");

    prefs.addPreferenceChangeListener( new PreferenceChangeListener() {
        public void preferenceChange(PreferenceChangeEvent e) {
            System.out.println("Value: " + e.getKey()
                + " changed to "+ e.getNewValue() );
        }
    } );

In brief, this example listens for changes to preferences and prints them. If this example isn’t immediately clear, it should be after you’ve read about events in Chapter 16 and beyond.

The Logging API

The java.util.logging package provides a highly flexible and easy-to-use logging framework for system information, error messages, and fine-grained tracing (debugging) output. With the logging package, you can apply filters to select log messages, direct their output to one or more destinations (including files and network services), and format the messages appropriately for their consumers.

Most importantly, much of this basic logging configuration can be set up externally at runtime through the use of a logging setup properties file or an external program. For example, by setting the right properties at runtime, you can specify that log messages are to be sent both to a designated file in XML format and also logged to the system console in a digested, human-readable form. Furthermore, for each of those destinations, you can specify the level or priority of messages to be logged, discarding those below a certain threshold of significance. By following the correct source conventions in your code, you can even make it possible to adjust the logging levels for specific parts of your application, allowing you to target individual packages and classes for detailed logging without being overwhelmed by too much output. The Logging API can even be controlled remotely via Java Management Extensions MBean APIs.

Overview

Any good logging API must have at least two guiding principles. First, performance should not inhibit the developer from using log messages freely. As with Java language assertions (discussed in Chapter 4), when log messages are turned off, they should not consume any significant amount of processing time. This means that there’s no performance penalty for including logging statements as long as they’re turned off. Second, although some users may want advanced features and configuration, a logging API must have some simple mode of usage that is convenient enough for time-starved developers to use in lieu of the old standby System.out.println(). Java’s Logging API provides a simple model and many convenience methods that make it very tempting.

Loggers

The heart of the logging framework is the logger, an instance of java.util.logging.Logger. In most cases, this is the only class your code will ever have to deal with. A logger is constructed from the static Logger.getLogger() method, with a logger name as its argument. Logger names place loggers into a hierarchy with a global, root logger at the top and a tree and children below. This hierarchy allows the configuration to be inherited by parts of the tree so that logging can be automatically configured for different parts of your application. The convention is to use a separate logger instance in each major class or package and to use the dot-separated package and/or class name as the logger name. For example:

    package com.oreilly.learnjava;
    public class Book {
        static Logger log = Logger.getLogger("com.oreilly.learnjava.Book");

The logger provides a wide range of methods to log messages; some take very detailed information, and some convenience methods take only a string for ease of use. For example:

    log.warning("Disk 90% full.");
    log.info("New user joined chat room.");

We cover methods of the logger class in detail a bit later. The names warning and info are two examples of logging levels; there are seven levels ranging from SEVERE at the top to FINEST at the bottom. Distinguishing log messages in this way allows us to select the level of information that we want to see at runtime. Rather than simply logging everything and sorting through it later (with negative performance impact) we can tweak which messages are generated. We’ll talk more about logging levels in the next section.

We should also mention that for convenience in very simple applications or experiments, a logger for the name “global” is provided in the static field Logger.global. You can use it as an alternative to the old standby System.out.println() for those cases where that is still a temptation:

    Logger.global.info("Doing foo...")

Handlers

Loggers represent the client interface to the logging system, but the actual work of publishing messages to destinations (such as files or the console) is done by handler objects. Each logger may have one or more Handler objects associated with it, which includes several predefined handlers supplied with the Logging API: ConsoleHandler, FileHandler, StreamHandler, and SocketHandler. Each handler knows how to deliver messages to its respective destination. ConsoleHandler is used by the default configuration to print messages on the command line or system console. FileHandler can direct output to files using a supplied naming convention and automatically rotate the files as they become full. The others send messages to streams and sockets, respectively. There is one additional handler, MemoryHandler, that can hold a number of log messages in memory. MemoryHandler has a circular buffer, which maintains a certain number of messages until it is triggered to publish them to another designated handler.

As we said, loggers can be set to use one or more handlers. Loggers also send messages up the tree to each of their parent logger’s handlers. In the simplest configuration, this means that all messages end up distributed by the root logger’s handlers. We’ll soon see how to set up output using the standard handlers for the console, files, etc.

Filters

Before a logger hands off a message to its handlers or its parent’s handlers, it first checks whether the logging level is sufficient to proceed. If the message doesn’t meet the required level, it is discarded at the source. In addition to level, you can implement arbitrary filtering of messages by creating Filter classes that examine the log message before it is processed. A Filter class can be applied to a logger externally at runtime in the same way that the logging level, handlers, and formatters, which are discussed next, can be. A Filter may also be attached to an individual Handler to filter records at the output stage (as opposed to the source).

Formatters

Internally, messages are carried in a neutral format, including all the source information provided. It is not until they are processed by a handler that they are formatted for output by an instance of a Formatter object. The logging package comes with two basic formatters: SimpleFormatter and XMLFormatter. The SimpleFormatter is the default used for console output. It produces short, human-readable summaries of log messages. XMLFormatter encodes all the log message details into an XML record format. The DTD for the format can be found at http://java.sun.com/dtd/.

Logging Levels

Table 11-9 lists the logging levels from most to least significant.

Table 11-9. Logging API logging levels

Level

Meaning

SEVERE

Application failure

WARNING

Notification of potential problem

INFO

Messages of general interest to end users

CONFIG

Detailed system configuration information for administrators

FINE,

FINER,

FINEST

Successively more detailed application tracing information for developers

These levels fall into three camps: end user, administrator, and developer. Applications often default to logging only messages of the INFO level and above (INFO, WARNING, and SEVERE). These levels are generally seen by end users and messages logged to them should be suitable for general consumption. In other words, they should be written clearly so they make sense to an average user of the application. Often these kinds of messages are presented to the end user on a system console or in a pop-up message dialog.

The CONFIG level should be used for relatively static but detailed system information that could assist an administrator or installer. This might include information about the installed software modules, host system characteristics, and configuration parameters. These details are important, but probably not as meaningful to an end user.

The FINE, FINER, and FINEST levels are for developers or others with knowledge of the internals of the application. These should be used for tracing the application at successive levels of detail. You can define your own meanings for these. We’ll suggest a rough outline in our example, coming up next.

A Simple Example

In the following (admittedly very contrived) example, we use all the logging levels so that we can experiment with logging configuration. Although the sequence of messages is nonsensical, the text is representative of messages of that type.

    import java.util.logging.*;

    public class LogTest {
        public static void main(String argv[])
        {
            Logger logger = Logger.getLogger("com.oreilly.LogTest");

            logger.severe("Power lost - running on backup!");
            logger.warning("Database connection lost, retrying...");
            logger.info("Startup complete.");
            logger.config("Server configuration: standalone, JVM version 1.5");
            logger.fine("Loading graphing package.");
            logger.finer("Doing pie chart");
            logger.finest("Starting bubble sort: value ="+42);
        }
    }

There’s not much to this example. We ask for a logger instance for our class using the static Logger.getLogger() method, specifying a class name. The convention is to use the fully qualified class name, so we’ll pretend that our class is in a com.oreilly package.

Now, run LogTest. You should see output like the following on the system console:

    Jan 6, 2002 3:24:36 PM LogTest main
    SEVERE: Power lost - running on backup!
    Jan 6, 2002 3:24:37 PM LogTest main
    WARNING: Database connection lost, retrying...
    Jan 6, 2002 3:24:37 PM LogTest main
    INFO: Startup complete.

We see the INFO, WARNING, and SEVERE messages, each identified with a date and timestamp and the name of the class and method (LogTest main) from which they came. Notice that the lower-level messages did not appear. This is because the default logging level is normally set to INFO, meaning that only messages of severity INFO and above are logged. Also note that the output went to the system console and not to a logfile somewhere; that’s also the default. Now we’ll describe where these defaults are set and how to override them at runtime.

Logging Setup Properties

As we said in the introduction, probably the most important feature of the Logging API is the ability to configure so much of it at runtime through the use of external properties or applications. The default logging configuration is stored in the file jre/lib/logging.properties in the directory where Java is installed. It’s a standard Java properties file (of the kind we described earlier in this chapter).

The format of this file is simple. You can make changes to it, but you don’t have to. Instead, you can specify your own logging setup properties file on a case-by-case basis using a system property at runtime, as follows:

    % java -Djava.util.logging.config.file=myfile.properties

In this command line, myfile is your properties file that contains the directive, which we’ll describe next. If you want to make this file designation more permanent, you can do so by setting the filename in the corresponding entry using the Java Preferences API described earlier in this chapter. You can go even further and instead of specifying a setup file, supply a class that is responsible for setting up all logging configuration, but we won’t get into that here.

A very simple logging properties file might look like this:

    # Set the default logging level
    .level = FINEST
    # Direct output to the console
    handlers = java.util.logging.ConsoleHandler

Here, we have set the default logging level for the entire application using the .level (that’s dot-level) property. We have also used the handlers property to specify that an instance of the ConsoleHandler should be used (just like the default setup) to show messages on the console. If you run our application again, specifying this properties file as the logging setup, you will now see all our log messages.

But we’re just getting warmed up. Next, let’s look at a more complex configuration:

    # Set the default logging level
    .level = INFO

    # Ouput to file and console
    handlers = java.util.logging.FileHandler, java.util.logging.ConsoleHandler

    # Configure the file output
    java.util.logging.FileHandler.level = FINEST
    java.util.logging.FileHandler.pattern = %h/Test.log
    java.util.logging.FileHandler.limit = 25000
    java.util.logging.FileHandler.count = 4
    java.util.logging.FileHandler.formatter = java.util.logging.XMLFormatter

    # Configure the console output
    java.util.logging.ConsoleHandler.level = WARNING

    # Levels for specific classes
    com.oreilly.LogTest.level = FINEST

In this example, we have configured two log handlers: a ConsoleHandler with the logging level set to WARNING and also an instance of FileHandler that sends the output to an XML file. The file handler is configured to log messages at the FINEST level (all messages) and to rotate logfiles every 25,000 lines, keeping a maximum of four files.

The filename is controlled by the pattern property. Forward slashes in the filename are automatically localized to backslash (\) if necessary. The special symbol %h refers to the user home. You can use %t to refer to the system temporary directory. If filenames conflict, a number is appended automatically after a dot (starting at zero). Alternatively, you can use %u to indicate where a unique number should be inserted into the name. Similarly, when files rotate, a number is appended after a dot at the end. You can take control of where the rotation number is placed with the %g identifier.

In our example, we specified the XMLFormatter class. We could also have used the SimpleFormatter class to send the same kind of simple output to the console. The ConsoleHandler also allows us to specify any formatter we wish, using the formatter property.

Finally, we promised earlier that you could control logging levels for parts of your applications. To do this, set properties on your application loggers using their hierarchical names:

    # Levels for specific logger (class) names
    com.oreilly.LogTest.level = FINEST

Here, we’ve set the logging level for just our test logger, by name. The log properties follow the hierarchy, so we could set the logging level for all classes in the oreilly package with:

    com.oreilly.level = FINEST

Logging levels are set in the order in which they are read in the properties file, so set the general ones first. Also note that the levels set on the handlers allow the file handler to filter only the messages being supplied by the loggers. So setting the file handler to FINEST won’t revive messages squelched by a logger set to SEVERE (only the SEVERE messages will make it to the handler from that logger).

The Logger

In our example, we used the seven convenience methods named for the various logging levels. There are also three groups of general methods that can be used to provide more detailed information. The most general are:

    log(Level level, String msg)
    log(Level level, String msg, Object param1)
    log(Level level, String msg, Object params[])
    log(Level level, String msg, Throwable thrown)

These methods accept as their first argument a static logging level identifier from the Level class, followed by a parameter, array, or exception type. The level identifier is one of Level.SEVERE, Level.WARNING, Level.INFO, and so on.

In addition to these four methods, there are four corresponding methods named logp() that also take a source class and method name as the second and third arguments. In our example, we saw Java automatically determine that information, so why would we want to supply it? The answer is that Java may not always be able to determine the exact method name because of runtime dynamic optimization. The p in logp stands for “precise” and allows you to control this yourself.

There is yet another set of methods named logrb()—which probably should have been named logprb()—that take both the class and method names and a resource bundle name. The resource bundle localizes the messages (see the section “Resource Bundles” in Chapter 10). More generally, a logger may have a resource bundle associated with it when it is created, using another form of the getLogger method:

    Logger.getLogger("com.oreilly.LogTest", "logMessages");

In either case, the resource bundle name is passed along with the log message and can be used by the formatter. If a resource bundle is specified, the standard formatters treat the message text as a key and try to look up a localized message. Localized messages may include parameters using the standard message format notation and the form of log(), which accepts an argument array.

Finally, there are convenience methods called entering(), exiting(), and throwing() that developers can use to log detailed trace information.

Performance

In the introduction, we said that performance is a priority of the Logging API. To that end we’ve described that log messages are filtered at the source, using logging levels to cut off processing of messages early. This saves much of the expense of handling them. However, it cannot prevent certain kinds of setup work that you might do before the logging call. Specifically, because we’re passing things into the log methods, it’s common to construct detailed messages or render objects to strings as arguments. Often this kind of operation is costly. To avoid unnecessary string construction, you should wrap expensive log operations in a conditional test using the Logger isLoggable() method to test whether you should carry out the operation:

    if ( log.isLoggable( Level.CONFIG ) ) {
        log.config("Configuration: "+ loadExpensiveConfigInfo() );
    }

Observers and Observables

The java.util.Observer interface and java.util.Observable class are relatively small utilities, but they provide a glimpse of a fundamental design pattern in Java. Observers and observables are part of the MVC (Model-View-Controller) framework. It is an abstraction that lets a number of client objects (the observers) be notified whenever a certain object or resource (the observable) changes in some way. We will see this pattern used extensively in Java’s event mechanism, which is covered in Chapters 16 through 19. Although these classes are not often used directly, it’s worth looking at them in order to understand the pattern.

The Observable object has a method that an Observer calls to register its interest. When a change happens, the Observable sends a notification by calling a method in each of the Observers. The observers implement the Observer interface, which specifies that notification causes an Observer object’s update() method to be called.

In the following example, we create a MessageBoard object that holds a String message. MessageBoard extends Observable, from which it inherits the mechanism for registering observers (addObserver()) and notifying observers (notifyObservers()). To observe the MessageBoard, we have Student objects that implement the Observer interface so that they can be notified when the message changes:

    //file: MessageBoard.java
    import java.util.*;
      
    public class MessageBoard extends Observable {
        private String message;
      
        public String getMessage() {
            return message;
        }
        public void changeMessage( String message ) {
            this.message = message;
            setChanged();
            notifyObservers( message );
        }
        public static void main( String [] args ) {
            MessageBoard board = new MessageBoard();
            Student bob = new Student();
            Student joe = new Student();
            board.addObserver( bob );
            board.addObserver( joe );
            board.changeMessage("More Homework!");
        }
    } // end of class MessageBoard
      
    class Student implements Observer {
        public void update(Observable o, Object arg) {
            System.out.println( "Message board changed: " + arg );
        }
    }

Our MessageBoard object extends Observable, which provides a method called addObserver(). Each Student object registers itself using this method and receives updates via its update() method. When a new message string is set using the MessageBoard’s changeMessage() method, the Observable calls the setChanged() and notifyObservers() methods to notify the observers. notifyObservers() can take as an argument an Object to pass along as an indication of the change. This object—in this case, the String containing the new message—is passed to the observer’s update() method as its second argument. The first argument to update() is the Observable object itself.

The main() method of MessageBoard creates a MessageBoard and registers two Student objects with it. Then it changes the message. When you run the code, you should see each Student object print the message as it is notified.

You can imagine how you could implement the observer/observable relationship yourself using a List to hold the list of observers. In Chapter 16 and beyond, we’ll see that the Java AWT and Swing event model extends this design pattern to use strongly typed observables and observers, which are called events and event listeners. But for now, we turn our discussion of core utilities to another fundamental topic: I/O.



[31] The generator uses a linear congruential formula. See The Art of Computer Programming, Volume 2: Semi-numerical Algorithms by Donald Knuth (Addison-Wesley).

[32] Prior to Java 1.1, the Date class handled some of the functions of a calendar as well. Most of these methods have now been deprecated. Today, the only purpose of the Date class is to represent a point in time.

[33] Java’s GregorianCalendar class is actually both a Julian and Gregorian calendar with a programmable cut-over date. For a wealth of information about time and world time-keeping conventions, see the U.S. Navy Directorate of Time.