Chapter 7. Logging

Logging is a critical piece of application infrastructure, particularly in a modern JavaScript application. A typical JavaScript application will involve events in a browser, events on the server, events related to third-party API communication, and a real need for deep profiling and analysis of application performance characteristics. Logging serves every aspect of the application business, including:

Debugging

For debugging, it's critical to log all application errors and warnings. The log messages that serve other functions will also come in handy while you're debugging, because they can supply context when something goes wrong.

During development, it's often useful to log debug messages to investigate specific issues. In those scenarios, it's useful to log:

  • The value of a related variable

  • If a particular function was called

  • The return value for a function call

  • API responses

  • The sequence of particular events

Sometimes it's useful to know which functions called which at a particular spot in your codebase. It's possible to log a stack trace with console.trace():

  var foo = function foo() {
      console.trace();
    },
    bar = function bar() {
      foo();
    };

  bar();

  /*
  Trace
      at foo (repl:2:9)
      at bar (repl:5:1)
      at repl:1:1
      at REPLServer.self.eval (repl.js:110:21)
      at repl.js:249:20
      at REPLServer.self.eval (repl.js:122:7)
      at Interface.<anonymous> (repl.js:239:12)
      at Interface.EventEmitter.emit (events.js:95:17)
      at Interface._onLine (readline.js:202:10)
      at Interface._line (readline.js:531:8)
  */

It can also be useful to know the time it takes to complete a particular operation, especially if you want to avoid blocking the event loop. To find out, you can log start and end times. This is useful both for debugging purposes and for app performance profiling, to discover bottlenecks in your code:

  console.time('timer label');

  var foo = [];

  // Warning: Avoid large set iterations blocking 
  // the event loop like this one does.
  for (var i = 0, end = 1000000; i < end; i++) {
    foo[foo.length] = i;
  }

  console.timeEnd('timer label');

  timer label: 87ms

The timer label names your timer so that you can refer back to it when you call console.timeEnd().

Server Operations

Logs are useful for server operations. In case of a failure, it's often possible to determine which services are affected and which servers might be the source of a problem by examining the logs.

It's usually good idea to collect all of your logs in one data aggregation service so that it's easy check each critical piece of system infrastructure and ensure that it's behaving as expected. For example, you may notice that your database service is returning a higher number of errors than expected, and that could give you a great place to start looking for the source of the problem.

Security

Security breaches can be traced by examining the logs to track breakins, including originating user accounts and IP addresses. Security experts can use logs to establish a trail of breadcrumbs that can lead back to the source of the intrusion, uncover the attack vectors, and discover backdoors and malware that may have been injected.

Security forensics involves examining:

  • Who was involved

  • What was compromised

  • When it was compromised

  • How it was compromised

  • Which systems were involved

  • Attack vectors (intruder path and sequence of events)

Because a system that has been breached can no longer be trusted, it's important to get an offline snapshot of the system as soon as the breach is detected before attempting to perform any diagnostic operations.

Taking the system offline is similar to a police investigation taping off a crime scene. The idea is to preserve the evidence as much as possible. If the system is still active, the intruder could still be active on it, destroying evidence. Even if the intruder is long gone, the actions of the investigators could destroy evidence, like police offers covering a perpetrator's footprints with their own.

Note that with virtual machines, it's possible to take snapshots of a running machine. You can take an offline snapshot and isolate the running instance from other network components in order to lure the intruder into returning to the crime scene. As soon as the isolated system is accessed again, you've caught your perpetrator red-handed.

Using this technique, it's possible for investigators to observe the intruder's behavior in realtime in a safe, sandboxed environment. It's even possible to simulate full machine clusters with phony data.

Sandboxed clusters like this are sometimes established as a preventive measure prior to any known intrusion. The only purpose they serve is to lure intruders, so you know that any IP address that accesses the system is suspect. Cross-checking those IP addresses with the access logs of your other machines can reveal intrusion vectors that might otherwise go unnoticed.

Auditing

Some organizations require in-depth audit trails of every user of the system in order to verify policy compliance. In those cases, any time a user is authenticated or authorized, that event should be logged for auditing purposes. Every authentication or authorization attempt should be logged, regardless of the outcome. Combined with the detailed logs collected for security purposes and general analytics, you should be able to piece together a very detailed account of every system interaction.

Business Analytics

For business analytics, you should be logging anything that can move the needle on your business KPIs (Key Performance Indicators).

For a SaaS business (Software as a Service), there are a handful of KPIs that you must be able to measure. Be sure that you're logging enough data to calculate all of them:

Viral factor (K-factor)

Churn rate

Percentage of users who stop using the product from one month to the next.

Monthly Recurring Revenue (MRR)

The amount of revenue earned per customer per month.

Customer acquisition cost

In order to calculate this value, it's important to link conversions back to campaign spending wherever possible. Be sure that marketing campaign identifiers are logged with your transactions.

Customer Lifetime Value (CLTV)

How much does each customer spend during the lifetime of his activity, on average?

Viral Factor

Word of mouth will make or break your app. If people aren't sharing it, you won't achieve organic growth and your app will fail. The standard measure for viral factor is taken from biology's K-factor, which is designed to measure the infection potential of a virus. In order to measure it, you need to track two variables:

i = distribution

Number of invites sent by existing customers before they go inactive

c = infection

Invite conversion rate

With those measurements tracked, calculating k is easy:

k = i × c

If k = 1, your application audience will remain constant. It won't grow or shrink. For k > 1, your application is in exponential growth (this is what you want).

For k < 1, your application is in exponential decline (this is the worst-case scenario).

If your app lacks an invite mechanism and a way to log invitations, you can't directly track or log distribution.

If you're not tracking conversions (new user signups), you can't calculate your application's virality. That's like flying an airplane without knowing how high above the ground you are.

Would you board an airplane with no windshield and no altitude measurement? Always track your viral factor.

Logging Checklist

Now that you have a basic overview of typical data capture needs, it might be useful to get specific about exactly what you should be logging, and how. The following checklist will help you determine what particular data you should consider logging and go into some detail about how you might log it:

  • Requests from clients (GET, POST, etc.)

  • Errors

  • Failure alerts

  • Conversion goals (sales, social-media shares, etc.)

  • Authorizations (user requesting a protected resource)

  • Feature activations (moving between pages and views, etc.)

  • Performance metrics

  • Server response time

  • Time to page-ready event

  • Time to page-rendered event

Logging Requests

This probably goes without saying, but for most applications, you should log all your server requests, regardless of the outcome.

For each request, you may want to log the following:

  • Timestamp

  • Hostname of the server (diagnose a clustered server)

  • Request ID (to link related log messages together)

  • Requester IP address

  • Request method

  • Request URL

  • Request headers

  • Request protocol (usually HTTP or HTTPS)

  • Referrer

  • User-Agent HTTP request header

  • Response

  • Status

  • Headers

  • Response time

  • Request ID

What not to log:

  • Passwords

  • Secret tokens

  • Credit card numbers

  • Information that would hurt user privacy

  • Classified information

Make sure your request serializer is set up to redact information that could compromise your users’ privacy or your system security.

Your request logger should be the among the first items in your middleware stack. Something like Bunyan might come in handy. Bunyan is a lightweight logger that supports extensible streams and custom serialization of different types of log messages.

Bunyan allows you to create custom object serializers. Basically, a Bunyan serializer transforms the object you pass in to log the properties you want to log and ignore the properties you don't want to log.

You can add as many serializers as you want. It comes with serializers for req, res, and err out of the box, but you'll probably want to customize them based on the needs of your app. Here are some replacement serializers that add requestId and responseTime to the standard serializers and make sure that the correct IP gets logged, even if your server is behind a reverse proxy (a very common production configuration):

  var serializers = {
    req: function reqSerializer(req) {
      if (!req || !req.connection) {
        return req;
      }

      return {
        url: req.url,
        method: req.method,
        protocol: req.protocol,
        requestId: req.requestId,

        // In case there's a proxy server:
        ip: req.headers['x-forwarded-for'] ||
          req.connection.remoteAddress,
        headers: req.headers
      };
    },
    res: function resSerializer(res) {
      if (!res) {
        return res;
      }

      return {
        statusCode: res.statusCode,
        headers: res._header,
        requestId: res.requestId,
        responseTime: res.responseTime
      };
    },
    err: function errSerializer(err) {
      if (!err || !err.stack) {
        return err;      
      }

      return {
          message: err.message,
          name: err.name,
          stack: getFullStack(err),
          code: err.code,
          signal: err.signal,
          requestId: err.requestId
      };    
    }
  };

To use a serializer, you need to pass an object with a key sharing the name of the serializer into the log call. For example, instead of:

  log.info(req);

You want to trigger the request serializer by passing in the request on the req key:

  log.info({req: req});

You'll want middleware to log requests, responses, and errors:

  log.requestLogger = function
      createRequestLogger() {

    return function requestLogger(req, res,
        next) {

      // Used to calculate response times:
      var startTime = Date.now();

      // Add a unique identifier to the request.
      req.requestId = cuid();

      // Log the request
      log.info({req: req});

      // Make sure responses get logged, too:
      req.on('end', function () {
        res.responseTime = Date.now() - startTime;
        res.requestId = req.requestId;
        log.info({res: res});
      });

      next();
    };
  };

  log.errorLogger = function
      createErrorLogger() {

    return function errorLogger(err, req, res,
        next) {

      // Add the requestId so we can link the
      // error back to the originating request.
      err.requestId = req.requestId;

      log.error({err: err});
      next(err);
    };
  };

Most apps will also need a way to log client-side events. For this, you can use a /log.gif endpoint.

Counterintuitively, a GET endpoint has significant advantages over a POST endpoint for this purpose. Unlike POST requests, it can easily track email opens, goal-page views on third-party ecommerce and affiliate pages, and other events and impressions on third-party sites. For fast and flexible client-side logging, a tracking pixel is still the way to go:

  // Tracking pixel / web bug
  // 
  // Using a 1x1 transparent gif allows you to
  // use the logger in emails or embed the 
  // tracking pixel on third party sites without
  // requiring JavaScript.
  log.route = function route() {
    return function pixel(req, res) {
      var data;

      if (settings.logParams && req.params) {
        data = mixIn({}, req.params, {
          requestId: req.requestId
        });
        log.info(req.params);
      }

      res.header('content-type', 'image/gif');

      // GIF images can be so small, it's
      // easy to just inline it instead of
      // loading from a file:
      res.send(
        'GIF89a\u0001\u0000\u0001\u0000' +
        '\u00A1\u0001\u0000\u0000\u0000\u0000' +
        '\u00FF\u00FF\u00FF\u00FF\u00FF\u00FF' +
        '\u00FF\u00FF\u00FF\u0021\u00F9\u0004' +
        '\u0001\u000A\u0000\u0001\u0000\u002C' +
        '\u0000\u0000\u0000\u0000\u0001\u0000' +
        '\u0001\u0000\u0000\u0002\u0002\u004C' +
        '\u0001\u0000;');
    };
  };

For your convenience, I've published all this on npm as bunyan-request-logger. To use it in your app:

  $ npm install --save bunyan-request-logger

There's a demo client side /log.gif endpoint service included that shows how to use it. You can find it in ./examples/log-only.js. Here's what it looks like:

  'use strict';

  var express = require('express'),
    logger = require('../request-logger.js'),
    noCache = require('connect-cache-control'),
    log = logger(),
    app = express(),
    port = 3000;

  app.use( log.requestLogger() );

  // Route to handle client side log messages.
  //
  // This route prepends the cache-control
  // middleware so that the browser always logs
  // to the server instead of fetching a useless
  // OK message from its cache.
  app.get( '/log.gif', noCache, log.route() );

  app.listen(port, function () {
    log.info('Listening on port ' + port);
  });

Logging Errors

In an error log message, there are three critical pieces of information:

  • Timestamp

  • Error message

  • Stack trace

The most useful of these, perhaps, is the stack trace. It tells you where an error was thrown, right down to the line number.

In some cases, that information alone is enough to determine the root cause of the error. In other cases, you'll need to do more digging, but at least you'll know where to start looking.

To add error logging to the previous example app, from bunyan-request-logger (available in the examples folder):

  // Log request errors:
  app.use(log.errorLogger);

That refers to the following Express error handler:

  log.errorLogger = function
      createErrorLogger() {

    return function errorLogger(err, req, res,
        next) {

      // Add the requestId so we can link the
      // error back to the originating request.
      err.requestId = req.requestId;

      log.error({err: err});
      next(err);
    };
  };

As you can see, the middleware logs the error and passes the error to the next error handler in the middleware stack. It's easier to log first than it is to remember to add logging calls to every possible path in the error handler.

Remember, regardless of the outcome of the error handling, you're going to log the client response anyway. That's already handled by the request logger, so you don't have to worry about it.

Here's a sample app that demonstrates error logging in context. To use it, you'll need to install express-error-handler:

  $ npm install --save express-error-handler

The code for the sample app is in bunyan-request-logger/examples/app.js:

  'use strict';

  var express = require('express'),
    logger = require('../request-logger.js'),
    noCache = require('connect-cache-control'),
    errorHandler = require('express-error-handler'),
    log = logger(),
    app = express(),
    env = process.env,
    port = env.myapp_port || 3000,
    http = require('http'),
    server;

  app.use( log.requestLogger() );

  // Route to handle client-side log messages.
  //
  // This route prepends the cache-control
  // middleware so that the browser always logs
  // to the server instead of fetching a useless
  // OK message from its cache.
  app.get( '/log.gif', noCache, log.route() );

  // Route that triggers a sample error:
  app.get('/error', function createError(req,
      res, next) {
    var err = new Error('Sample error');
    err.status = 500;
    next(err);
  });

  // Log request errors:
  app.use( log.errorLogger() );

  // Create the server object that we can pass
  // in to the error handler:
  server = http.createServer(app);

  // Respond to errors and conditionally shut
  // down the server. Pass in the server object
  // so the error handler can shut it down
  // gracefully:
  app.use( errorHandler({server: server}) );

  server.listen(port, function () {
    log.info('Listening on port ' + port);
  });

Sample Log Output

Take a look at what happens in the console when you start the sample app:

  $ cd ./node_modules/bunyan-request-logger
  $ node ./examples/app.js

Default Bunyan log messages look like this:

  {
     "name":"unnamed app",
     "hostname":"unnamed-host",
     "pid":62136,
     "level":30,
     "msg":"Listening on port 3000",
     "time":"2013-10-07T07:52:59.236Z",
     "v":0
  }

The app name and host name make it easy to identify the particular source of the log message, even after it's aggregated on a different server. Anything you pass to the log.info() call will appear as the message, and of course, the timestamp tells you when the event was logged.

A successful GET request log entry looks like this:

  $ curl localhost:3000/log?somedata=yay
  {
     "name":"unnamed app",
     "hostname":"unnamed-host",
     "pid":62136,
     "level":30,
     "req":{
        "url":"/log?somedata=yay",
        "method":"GET",
        "protocol":"http",
        "requestId":"chmhekaby0000y0ixelvt7y7u",
        "ip":"127.0.0.1",
        "headers":{
           "user-agent":"curl/7.24.0 
           (x86_64-apple-darwin12.0) 
           libcurl/7.24.0 OpenSSL/0.9.8x 
           zlib/1.2.5",
           "host":"localhost:3000",
           "accept":"*/*"
        }
     },
     "msg":"",
     "time":"2013-10-07T07:53:07.150Z",
     "v":0
  }

As with the previous message, all the basics are there, but this one includes the serialized request object, including the requested URL, method, protocol, the request ID that will help you tie this log entry to other events triggered by this particular request, the user's IP address, and request headers.

The corresponding response log entry looks like this:

  {
     "name":"unnamed app",
     "hostname":"unnamed-host",
     "pid":62136,
     "level":30,
     "res":{
        "statusCode":200,
        "headers":"HTTP/1.1 200 OK
        X-Powered-By: Express
        Expires: 0
        Cache-Control: no-store, no-cache,
        must-revalidate, max-age=0
        Pragma: no-cache
        Content-Type: text/plain
        Content-Length: 2
        Date: Mon, 07 Oct 2013 07:53:07 GMT
        Connection: keep-alive",
        "requestId":"chmhekaby0000y0ixelvt7y7u",
        "responseTime":3
     },
     "msg":"",
     "time":"2013-10-07T07:53:07.154Z",
     "v":0
  }

This time we get the corresponding response object, including the response headers, and critically, the status code. Status codes can let you know about all sorts of problems, such as authorization issues, broken links, 500 errors, and a whole lot more. Each endpoint might have a list of expected status codes, and any log items that deviate from that list should be investigated. Again, the request ID helps you link this particular log entry to the specific request that triggered it.

When a request triggers an error, first it logs the request as usual. As you can see, there's nothing in particular about this request that would indicate that anything went wrong (yet):

  $ curl localhost:3000/error
  {
     "name":"unnamed app",
     "hostname":"unnamed-host",
     "pid":62136,
     "level":30,
     "req":{
        "url":"/error",
        "method":"GET",
        "protocol":"http",
        "requestId":"chmhekbuq0001y0ix6k6brxq6",
        "ip":"127.0.0.1",
        "headers":{
           "user-agent":"curl/7.24.0
           (x86_64-apple-darwin12.0)
           libcurl/7.24.0 OpenSSL/0.9.8x
           zlib/1.2.5",
           "host":"localhost:3000",
           "accept":"*/*"
        }
     },
     "msg":"",
     "time":"2013-10-07T07:53:09.122Z",
     "v":0
  }

Then it logs the error object, complete with a stack trace (truncated here for sanity):

  {
     "name":"unnamed app",
     "hostname":"unnamed-host",
     "pid":62136,
     "level":50,
     "err":{
        "message":"Sample error",
        "name":"Error",
        "requestId":"chmhekbuq0001y0ix6k6brxq6",
        "stack":"Error: Sample error
        at createError (/Users/user/Dropbox/dev/
        pja-source/bunyan-request-logger/examples/
        app.js:39:13)
        at ..."
     },
     "msg":"",
     "time":"2013-10-07T07:53:09.124Z",
     "v":0
  }

Logging the stack trace here could give you a good start on finding the root cause.

Finally, it logs the error response that gets sent to the user:

  {
     "name":"unnamed app",
     "hostname":"unnamed-host",
     "pid":62136,
     "level":30,
     "res":{
        "statusCode":500,
        "headers":"HTTP/1.1 500 Internal Server
        Error
        X-Powered-By: Express
        Content-Type: text/plain
        Content-Length: 21
        Date: Mon, 07 Oct 2013 07:53:09 GMT
        Connection: keep-alive",
        "requestId":"chmhekbuq0001y0ix6k6brxq6",
        "responseTime":3
     },
     "msg":"",
     "time":"2013-10-07T07:53:09.125Z",
     "v":0
  }

Logging Service Alerts

All of the services you depend on should be constantly monitored for availability, and alerts should be configured. When an alert is triggered, the alert should be logged in addition to having the alert dispatched to the technicians in charge of repairing the service.

Be sure that you're also logging all your service start-ups so that you can also log when the service becomes available again.

Logging Goals

Conversion goals can be measured using tools like Google Analytics without explicitly building the log calls into the source code. You should also log (at a minimum):

  • New visitors

  • New signups

  • Customer transactions

  • Application invites

For transactions:

  • Timestamp

  • User's IP address

  • User

  • A flag indicating whether the user is a new or repeat customer

  • Purchased items

  • Total transaction amount

  • Entry page

  • Keyword used to find the entry page, if any

  • Marketing campaign, if known

  • Referring user, if known

Most of this data will come from the user session and shopping cart.

Profiling and Instrumentation

It's possible to profile performance and capture errors without explicitly logging all over your codebase. You can instrument code using tools such as New Relic and AppDynamics as a potent supplement to your own log messaging.

Both providers deliver deep insight into the critical performance metrics of your app. New Relic and AppDynamics provide a PaaS service and a freemium model so that you can deploy and evaluate the solution to determine whether or not it will meet your needs before you scale up and pay full price.

New Relic recently developed a Node.js agent, and AppDynamics has acquired Nodetime in order to provide instrumentation for Node.js applications. You should strongly consider instrumentation tools like these before you deploy your production application.

Getting started with New Relic is easy:

  $ npm install --save newrelic

  # assuming ./ is app root
  $ cp node_modules/newrelic/newrelic.js ./

Edit newrelic.js and replace license_key's value with the license key for your account. You might want to replace it with an environment variable so that you can commit the file without committing your key to your application repository.

Add the following as the first line of your app's main module:

  require('newrelic');

The next time you start your app, it will start reporting profiling data.

Similarly, for AppDynamics Nodetime:

  $ npm install nodetime
  require('nodetime').profile({
    accountKey: your_account_key
  });

Either of these services can show you round-trip times, and Google Analytics can supply timing information as well (under Content → Site Speed).

Logging Client-Side Events

Sometimes actions will occur on the client that you want to keep track of. For those events, it's helpful to establish a log endpoint that simply records all incoming messages. You've already seen how you can do that, simply by logging all incoming requests and then creating a specialized /log.gif endpoint that doesn't cache and returns a 1 × 1 transparent beacon.

Here's a stripped down implementation, sans error handling:

var express = express = require('express'),
  logger = require('../request-logger.js'),
  noCache = require('connect-cache-control'),
  log = logger(),
  app = express(),
  port = 3000;

app.use( log.requestLogger() );

// Route to handle client-side log messages.
//
// Counter to intuition, client-side logging
// works best with GET requests.
// 
// AJAX POST sends headers and body in two steps,
// which slows it down.
// 
// This route prepends the cache-control
// middleware so that the browser always logs
// to the server instead of fetching a useless
// No Content message from its cache.
app.get( '/log.gif', noCache, log.route() );

app.listen(port, function () {
  log.info('Listening on port ' + port);
});

Deciphering Data

Logs don't do you any good if you have to sift through mountains of data to find the information you're really looking for. It's important to invest in the right tools to help you find the data you need, quickly. Google Analytics provides a lot of data logging, parsing, and visualization out of the box, along with the ability to log custom events, set up custom goals to track, performance profiling, and a lot more. You'll get a lot of actionable data just by adding Google Analytics to your technology mix.

Another great way to gain application insights is to connect your logging mechanisms to data mining tools such as Splunk, Mixpanel, or KISSmetrics.

Splunk turns mountainous logs into actionable operational intelligence. It can aggregate data from just about any machine-data source, including server logs, network activity, even customer support issues, and phone logs. It has an uncanny ability to make sense of all of it and make it easy to search for data and create detailed reports that allow you to highlight the story that's important to the individual using the data.

For example, the security team can create custom reports that follow an intruder's path spanning multiple systems, while somebody from the marketing is simultaneously looking at a graph of conversion rate changes before and during a promotion.

Mixpanel and KISSmetrics are both focused more specifically on understanding important metrics for web applications. KISSmetrics prides themselves on their ability to give you detailed insights about your customers, while Mixpanel touts the ease of creating and using custom dashboards to track KPIs.

Conclusion

I hope you have a newfound respect for the importance of logging in your application. As you can see, logging can provide a lot of very valuable information.

There is a reason that airplanes have lots of dials on their dashboards. As an ASP, your logging is the nervous system that supplies all of the information for those dials. If you're not delivering actionable data, you're flying blind.