Chapter 1. Introduction

We live in a 3D world. People move, think and experience in three dimensions.

Much of our media is also 3D, though it is usually presented on flat screens. Animated films are created from computer-generated 3D images. Online map services allow us to explore our destination virtually, in a 3D environment. Most video games, whether running on dedicated consoles or mobile phones, are rendered in 3D. Even the news has gone 3D: the sight of a CNN analyst meandering through a virtual set, comically awkward a few years ago, has become an accepted part of the broadcast vocabulary as cable channels vie for attention in a twenty-four-hour news cycle.

3D graphics is nearly as old as the computer itself, tracing its roots back to the 1960s. It has been used in applications spanning engineering, education, training, architecture, finance, sales and marketing, gaming and entertainment. Historically, 3D applications have relied on high-end computer systems and expensive software. But that has changed in the last decade. 3D processing hardware now ships in every computer and mobile device, and the consumer smart phone of today possesses more graphics power than the professional workstation of fifteen years ago. More importantly, the software required to render 3D is now not only universally accessible; it’s free. It’s called a web browser.

Figure 1-1 shows an excerpt from 100,000 Stars, a browser-based 3D flythrough simulation of our stellar neighbors in the Milky Way. Using the mouse, you can rotate about the galactic plane and zoom in to a star of interest. Stars are represented with renderings that approximate their apparent magnitude and color. Each star is labeled with its common name; when you mouse over the label, it highlights. Click on the label, and an overlay appears displaying the Wikipedia entry for that star. Click on a hyperlink in the overlay text, and the browser will launch that link in a new tab. 100,000 Stars is a stunningly produced interactive experience featuring beautiful renderings, pulsing animations, a majestic soundtrack, and an artfully integrated 2D user interface.

Figure 1-1. The 100,000 Stars project by Google (http://workshop.chromeexperiments.com/stars/)

100,000 Stars was created as an experiment by Google’s Data Arts Team to demonstrate the rich capabilities of the Chrome browser. While the application is experimental, the technologies underlying it are not: it was built using HTML5 features available today in most browsers. The galaxy and stars are rendered in real-time using WebGL, the new standard for hardware-accelerated 3D web graphics; the labels are placed relative to their stars using 3D transforms now available in CSS3; and the overlays blend seamlessly with the 3D content because browsers combine, or composite, all page elements into a unified presentation.

Just a few years ago, an experience like 100,000 Stars could only have been achieved in a native client application requiring a large download and installation, produced using complex tools in a time-consuming and expensive development process. Today, it can be built using a browser, free and open source tools, and a standard web technology stack. What’s more, updates are instantly available by simply reloading the page; information from anywhere on the web can be loaded via URL; and hyperlinks from the 3D scene can take you to more information.

This book is about taking advantage of the awesome power of the modern browser to create a new breed of connected, visual application. Some of this breed will look a lot like its ancestors, essentially ports of traditional 3D products, refactored to reach new customers and reduce costs. But far more exciting are the possibilities for novel consumer applications in advertising, product marketing, customer support, education, training, tourism, gaming and entertainment, to name a few. 3D brings a new dimension to the interactive experience; combined with web technology, the third dimension is now accessible to everyone on the planet.

100,000 Stars is a tour de force in interactive media development. Michael Chang, one of the creators, wrote a great case study of the project. To see what went into its development, go to

http://www.html5rocks.com/en/tutorials/casestudies/100000stars/

HTML5: A New Visual Medium

HTML has come a long way since the days of static pages, forms and the Submit button. In the early 2000s, browsers introduced rich interaction by allowing portions of a page to be changed dynamically via Ajax techniques. Still, the ways in which pages could be changed with Ajax were constrained by the graphical features of HTML and CSS. If developers wished to go beyond those limits, they had to use media plugins such as Flash and QuickTime.

This was pretty much the status quo during the 2000s, but things have changed over the last few years. Several browser advances under development during this period came together into HTML5. With HTML5, the web browser has become a platform capable of running sophisticated applications that rival native code in features and performance. HTML5 represents a massive overhaul of the HTML standard, including syntax cleanups, new JavaScript language features and APIs, mobile capabilities, and breakthrough multimedia support. Central to the HTML5 platform is a set of advanced graphics technologies that are the focus of this book:

  • WebGL for hardware-accelerated 3D rendering with JavaScript. Based on the time-tested graphics API OpenGL, WebGL is a standard supported by nearly all web browsers on the desktop as well as a growing number of mobile browsers.

  • CSS3 3D transforms, transitions and custom filters for advanced page effects. CSS has evolved over the past several years to include 3D rendering and animation features accessible through the style sheet language. (A one-line example appears just after this list.)

  • The Canvas Element and its 2D drawing context API. Universally supported in browsers, this JavaScript API allows developers to draw arbitrary graphics to the surface of a DOM element. Though Canvas is a 2D API, with the help of additional JavaScript libraries it can be used to render 3D effects—providing an alternative for platforms where WebGL or CSS3 3D are not supported.
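To make the CSS3 bullet concrete, here is a minimal sketch of a 3D transform applied to an ordinary DOM element from JavaScript; the “card” element id is assumed for illustration, and older browsers may require vendor-prefixed property names:

    // A minimal sketch: applying a CSS3 3D transform from JavaScript.
    // The browser rotates the element in 3D space; no WebGL or Canvas
    // code is involved. (The "card" element id is assumed here; older
    // browsers may need a prefixed name such as webkitTransform.)
    var card = document.getElementById("card");
    card.style.transition = "transform 0.5s ease-in-out"; // animate the change
    card.style.transform = "perspective(500px) rotateY(45deg)";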

Each of these features has its strengths, weaknesses and technical tradeoffs, and each has a role to play in delivering interactive and visually compelling 3D experiences. Which ones you use can depend on several factors: what you are trying to build, which platforms you have to support, performance concerns and so on. Let’s say, for example, that you are creating a first-person shooter game and you need the highest-quality graphics. This will be hard to pull off without using WebGL’s extensive access to the rendering hardware. On the other hand, maybe you are developing a fancy channel tuner interface for a video web site, including live video thumbnails, rotation effects on rollovers, and dissolve transitions between clips; in that case CSS3 might have everything you need to deliver a killer experience.
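As a concrete starting point, here is a minimal sketch of the kind of feature test an application might perform, preferring WebGL and falling back to the 2D canvas; production code would probe more carefully and handle failures gracefully:

    // A minimal feature-detection sketch: prefer WebGL, fall back to the
    // universally supported 2D canvas.
    function getRenderingContext(canvas) {
      // Older browsers exposed WebGL under the "experimental-webgl" name
      var gl = canvas.getContext("webgl") ||
               canvas.getContext("experimental-webgl");
      if (gl) {
        return { type: "webgl", context: gl };
      }
      return { type: "2d", context: canvas.getContext("2d") };
    }

    var canvas = document.createElement("canvas");
    var renderer = getRenderingContext(canvas);
    console.log("Rendering with: " + renderer.type);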

“And one standard to rule them all...”

What most web developers think of informally as HTML5 is actually a collection of technologies and standards. Some of these are already fully ratified by the W3C and implemented in all browsers. Others are less mature as standards, but nevertheless widely supported. Still others, such as WebGL, are mature and stable standards, but not controlled by the W3C.

The Browser as Platform

HTML5 brings rich graphics to the web, but by itself this would not amount to much without other essential browser improvements. In particular, a handful of advances have paved the way for true rich Internet application development with HTML5:

  • JavaScript Virtual Machine (VM) Performance. WebGL and Canvas 2D are JavaScript APIs; animation and interaction will only run as fast as the JavaScript code behind them. A few years ago, virtual machine performance would have made 3D development a non-starter for practical use. Thankfully, today’s VMs scream.

  • Accelerated Compositing. The browser is responsible for combining, or compositing, the various elements on the page quickly and without visual tearing or other adverse effects. As content has become more dynamic, browsers have made huge improvements in compositing, including using the 3D hardware-rendering pipeline for all visual elements, both 2D and 3D.

  • Animation Support. The function requestAnimationFrame() was introduced as an alternative to using setInterval() or setTimeout() to drive animations, greatly enhancing performance and eliminating visual artifacts. (A minimal animation loop using it appears just after this list.)
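Here is what such a loop looks like in practice, as a minimal sketch with the scene-update logic left as a placeholder:

    // A minimal animation loop driven by requestAnimationFrame(). The
    // browser invokes the callback just before each repaint, typically
    // 60 times per second, and throttles it when the tab is hidden.
    function animate(timestamp) {
      // ... update scene state and redraw here ...
      requestAnimationFrame(animate); // schedule the next frame
    }
    requestAnimationFrame(animate); // kick off the loop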

HTML5 browsers also include features for multi-threaded programming (Web Workers), full-duplex TCP/IP networking (WebSockets), local data storage, and more that developers can use to deliver world-class application functionality. These features, taken together with WebGL, CSS3 3D and the Canvas, represent a revolutionary new platform for delivering connected visual applications on any computer or device.

Figure 1-2 shows a demonstration version of Epic Games’ Epic Citadel running (as of this writing) in a development build of Firefox. Epic Citadel uses WebGL to render the graphics, but what really sets this work apart is the breakthrough in game engine performance. The game uses a version of Epic’s Unreal engine that has been ported from its native C++/operating system-dependent code to a browser-based implementation, using the Emscripten compiler (https://github.com/kripken/emscripten/wiki) and asm.js, a new optimized low-level subset of JavaScript. By simply entering a URL, web browser users can access a beautifully rendered, full screen console game experience running at 60 frames per second, with very little download time and no installation required.

Figure 1-2. Epic Citadel Demonstration Running in Firefox: 60FPS Browser Gaming Powered by WebGL and asm.js

Browser Realities

As of this writing, 3D feature coverage is not complete across the various browsers. Worse, each browser supports a slightly different subset. We will explore these issues in detail in subsequent chapters, but here are the highlights:

  • WebGL is supported in all desktop browsers with the exception of Internet Explorer (which currently holds under 30% market share, and shrinking); however, word has leaked to the public that Microsoft has implemented WebGL for IE internally and will ship it sometime in 2013;

  • WebGL is fully supported in mobile Chrome, Intel Tizen and the Blackberry 10 browser. WebGL is supported in a limited fashion in mobile Safari (in the iAd framework);

  • CSS Custom Filters are only supported experimentally in desktop Chrome, Safari, mobile Safari and Blackberry 10 — not in IE or Firefox.

Clearly, this is not an optimal situation, but it’s the sort of thing that comes with the territory when developing web applications. Cross-browser support has always been notoriously difficult; with the explosion of features in HTML5 and the proliferation of devices and operating systems, it hasn’t gotten any better. The only consolation is that the alternative is far worse: native applications are even harder to build, test, deploy and port. Oh well... such is the life of a web developer in the twenty-first century.

With all these standards, we should be approaching a state where we only have to write our code once. However, as we have become painfully aware, the mantra “write once, run anywhere” has been replaced by the lament “write once, debug everywhere.”

3D Graphics Basics

What is 3D?

Given that you picked up this book, chances are you have at least an informal idea about what we are talking about when we use the term 3D graphics. But to make sure we are clear, we are going to get formal and examine a definition. Here is the Wikipedia entry (from http://en.wikipedia.org/wiki/3D_computer_graphics):

3D computer graphics (in contrast to 2D computer graphics) are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for the purposes of performing calculations and rendering 2D images. Such images may be stored for viewing later or displayed in real-time.

Let’s break this down into its components: 1) the data is represented in a 3D coordinate system; 2) it is ultimately drawn (“rendered”) as a 2D image, for example, on your computer monitor; and 3) it can be displayed in real-time: when the 3D data changes as it is being animated or manipulated by the user, the rendered image is updated without a perceivable delay. This last part is key for creating interactive applications. In fact, it is so important that it has spawned a multi-billion dollar industry dedicated to specialized graphics hardware supporting real-time 3D rendering, with several companies you have probably heard of, such as NVIDIA, ATI, and Qualcomm, leading the charge.

As important as what this definition says is what it doesn’t say: 3D graphics does not require special input hardware like trackballs and joysticks, though those can greatly enhance a 3D experience. Nor does it require custom display hardware: no stereo glasses required; no OmniMax theatre tickets as the price of entry. 3D graphics are most commonly rendered on a flat, 2D display. This is not to say that 3D can’t be displayed in stereo and seen with glasses or on a stereo TV, simply that it’s not a requirement.

3D programming requires new skills and knowledge beyond that of the typical web developer. However, armed with a little starter knowledge and the right tools, we can get going fairly quickly. The remainder of this chapter is devoted to understanding basic 3D programming concepts that will be used throughout the book. It is by no means exhaustive—entire books are devoted to learning the subject in detail—but it should be enough to get started. If you already have experience with 3D programming, feel free to move on to Chapter 2.

3D Coordinate Systems

If you are familiar with 2D Cartesian coordinate systems such as the window coordinates of an HTML document, you know about x and y values. These 2D coordinates define where <div> tags are located on a page, or where the virtual ‘pen’ or ‘brush’ draws in the HTML Canvas element. Similarly, 3D drawing takes place (not surprisingly) in a 3D coordinate system, where the additional coordinate, z, describes depth, that is, how far into or out of the screen an object is drawn. The coordinate systems we will work with in this book are arranged as depicted in Figure 1-3, with x running horizontally left to right, y running vertically, and positive z coming out of the screen. If you are already comfortable with the concept of the 2D coordinate system, the transition to a 3D coordinate system should be straightforward.

Note that WebGL defines positive y as going from the bottom to the top of the window, while the 2D Canvas API and CSS transforms define positive y as going down. This is unfortunate, but it reflects the different heritages of the two technologies: WebGL is based on long-lived graphics standards that use the y-up convention, while Canvas and CSS are based on the HTML coordinate y-down convention—itself a descendant of time-worn window system coordinate schemes. If you end up working in both technologies on a project, you will have to keep this distinction straight. But it could be worse... z could also be reversed! Fortunately, it’s not.
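To see the difference in code, here is a minimal sketch built around a hypothetical helper (introduced here only for illustration) that converts a point from y-down window coordinates into WebGL’s y-up clip space, where x and y both run from -1 to +1:

    // Convert a point from y-down window coordinates (used by Canvas 2D
    // and CSS) into y-up WebGL clip space.
    function windowToClip(x, y, width, height) {
      return {
        x: (x / width) * 2 - 1,     // 0..width maps to -1..+1
        y: -((y / height) * 2 - 1)  // same mapping, but flipped: +y is up
      };
    }

    // The top-left corner of a 640x480 canvas maps to the upper left
    // of clip space:
    console.log(windowToClip(0, 0, 640, 480)); // { x: -1, y: 1 }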

Figure 1-3. A 3D Coordinate System (https://commons.wikimedia.org/wiki/File:3D_coordinate_system.svg), Creative Commons Attribution-Share Alike 3.0 Unported License

Meshes, Polygons and Vertices

While there are several ways to draw 3D graphics, by far the most common is to use a mesh. A mesh is an object composed of one or more polygonal shapes, constructed out of vertices (x, y, z triples) defining coordinate positions in 3D space. The polygons most typically used in meshes are triangles (groups of three vertices) and quads (groups of four vertices). 3D meshes are often referred to as models.
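To make this concrete, here is a minimal sketch of the raw vertex data for one of the simplest possible meshes: a unit square built from two triangles, laid out in the flat array format that WebGL ultimately consumes:

    // The raw data behind a mesh: a unit square ("quad") in the z = 0
    // plane, built from two triangles. Each group of three numbers is
    // one vertex (x, y, z); each group of three vertices is one triangle.
    var quadVertices = new Float32Array([
      // triangle 1
      -0.5, -0.5, 0.0,  // bottom left
       0.5, -0.5, 0.0,  // bottom right
       0.5,  0.5, 0.0,  // top right
      // triangle 2
      -0.5, -0.5, 0.0,  // bottom left
       0.5,  0.5, 0.0,  // top right
      -0.5,  0.5, 0.0   // top left
    ]);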

Figure 1-4 illustrates a 3D mesh. The dark lines outline the quads that make up the mesh, defining the shape of the face. (You would not see these lines in the final rendered image; they are included for reference.) The x, y and z components of the mesh’s vertices define the shape only; surface properties of the mesh, such as the color and shading, are defined using additional attributes, as we will discuss below.

Figure 1-4. A 3D Mesh (http://upload.wikimedia.org/wikipedia/commons/8/88/Blender3D_UVTexTut1.png), Creative Commons Attribution-Share Alike 3.0 Unported license

Materials, Textures and Lights

The surface of a mesh is defined using additional attributes beyond the x, y, and z vertex positions. Surface attributes can be as simple as a single solid color, or they can be complex, comprising several pieces of information that define, for example, how light reflects off the object or how shiny the object looks. Surface information can also be represented using one or more bitmaps, known as texture maps (or simply textures). Textures can define the literal surface look (such as an image printed on a T-shirt), or they can be combined with other textures to achieve sophisticated effects such as bumpiness or iridescence. In most graphics systems, the surface properties of a mesh are referred to collectively as materials. Materials typically rely on the presence of one or more lights, which (as you may have guessed) define how a scene is illuminated.
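As a preview of how these ideas look in code, here is a minimal sketch using the open source Three.js library, one of several toolkits that express these concepts; the geometry and scene variables are assumed to exist, and API details vary between versions:

    // A material describes the surface: a purple base color with shiny
    // highlights. A light describes the illumination.
    var material = new THREE.MeshPhongMaterial({
      color: 0x8844aa,     // base (diffuse) color
      specular: 0xffffff,  // color of the shiny highlight
      shininess: 30        // how tight the highlight is
    });
    var mesh = new THREE.Mesh(geometry, material); // geometry assumed

    var light = new THREE.DirectionalLight(0xffffff, 1); // white light
    light.position.set(-1, 1, 1); // shining from the upper left

    scene.add(mesh);  // scene assumed
    scene.add(light);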

The head in Figure 1-4 has a material with a purple color and shading defined by a light source emanating from the left of the model. Note the shadows on the right side of the face.

Transforms and Matrices

3D meshes are defined by the positions of their vertices. It would get really tedious to change a mesh’s vertex positions every time you want to move it to a different part of the view, especially if the mesh were continually animating. For this reason, most 3D systems support transforms, operations that move the mesh by a relative amount without having to loop through every vertex, explicitly changing its position. Transforms allow a rendered mesh to be scaled, rotated and translated (moved) around, without actually changing any values in its vertices.

A transform is typically represented by a matrix, a mathematical object containing an array of values used to compute the transformed positions of vertices. If you are a linear algebra geek like me, you probably feel comfortable with this idea. If not, please don’t break into a cold sweat. The toolkits we are using in this book let us treat matrices like black boxes: we just say translate, rotate or scale and the right thing happens.
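Continuing the earlier Three.js sketch (the mesh variable is assumed from that example), the black-box approach looks like this; each line updates the mesh’s transform matrix behind the scenes, and the vertex data itself never changes:

    mesh.position.set(0, 2, -5);   // translate: up 2 units, 5 into the screen
    mesh.rotation.y = Math.PI / 4; // rotate 45 degrees about the y axis
    mesh.scale.set(2, 2, 2);       // scale to twice the original size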

Cameras, Perspective, Viewports and Projections

Every rendered scene requires a point of view from which the user will be viewing it. 3D systems typically use a camera, an object that defines where (relative to the scene) the user is positioned and oriented, as well as other real-world camera properties such as the size of the field of view, which defines perspective (i.e. objects farther away appearing smaller). The camera’s properties combine to deliver the final rendered image of a 3D scene into a 2D viewport defined by the window or canvas.

Cameras are almost always represented using a couple of matrices. The first matrix defines the position and orientation of the camera, much like the matrix used for transforms (see above). The second matrix is a specialized one that represents the translation from the 3D coordinates of the camera into the 2D drawing space of the viewport. It is called the projection matrix. I know: more math. But the details of camera matrices are nicely hidden in most tools, so you usually can just point, shoot and render.
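For example, here is a minimal Three.js sketch: the constructor arguments determine the projection matrix, while the position and lookAt calls determine the position-and-orientation matrix (the scene variable is assumed):

    var camera = new THREE.PerspectiveCamera(
      45,                                      // vertical field of view, degrees
      window.innerWidth / window.innerHeight,  // viewport aspect ratio
      0.1,                                     // near clipping plane
      1000                                     // far clipping plane
    );
    camera.position.set(0, 0, 10);  // back the camera away from the origin
    camera.lookAt(scene.position);  // aim it at the center of the scene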

Figure 1-5 depicts the core concepts of the camera, viewport and projection. At the lower left we see an icon of an eye; this represents the location of the camera. The red vector pointing to the right (in this diagram labeled as the X axis) represents the direction in which the camera is pointing. The blue cubes are the objects in the 3D scene. The green and red rectangles are, respectively, the near and far clipping planes. These two planes define the boundaries of a subset of the 3D space, known as the view volume or view frustum. Only objects within the view volume are actually rendered to the screen. The near clipping plane is equivalent to the viewport, where we will see the final rendered image.

Figure 1-5. Camera, Viewport and Projection (http://obviam.net/index.php/3d-programming-with-android-projections-perspective/), reproduced with permission

Cameras are extremely powerful, as they ultimately define the viewer’s relationship to a 3D scene and provide a sense of realism. They also provide another weapon in the animator’s arsenal: by dynamically moving the camera around you can create cinematic effects and control the narrative experience.

Shaders

In order to render the final image for a mesh, a developer must define exactly how vertices, transforms, materials, lights and the camera interact with each other to create that image. This is done using shaders. A shader (also known as a programmable shader) is a chunk of program code that implements algorithms to get the pixels for a mesh onto the screen. The graphics hardware understands vertices, textures and little else; it has no concept of material, light, transform, or camera. Those high-level structures are interpreted by the shader program. Shaders are typically defined in a high-level C-like language and compiled into code usable by the graphics-processing unit (GPU).
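Here is a minimal sketch of what this looks like in WebGL: a vertex/fragment shader pair written in GLSL ES and stored as JavaScript strings, followed by the calls that compile one of them. Error handling is omitted, gl is assumed to be a WebGL context, and the uniform names are common conventions rather than requirements:

    // The smallest useful shader pair: the vertex shader transforms each
    // vertex by the camera matrices; the fragment shader colors pixels.
    var vertexSource = [
      "attribute vec3 position;",
      "uniform mat4 modelViewMatrix;",
      "uniform mat4 projectionMatrix;",
      "void main() {",
      "  gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);",
      "}"
    ].join("\n");

    var fragmentSource = [
      "precision mediump float;",
      "void main() {",
      "  gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);", // every pixel: opaque red
      "}"
    ].join("\n");

    var vertexShader = gl.createShader(gl.VERTEX_SHADER);
    gl.shaderSource(vertexShader, vertexSource);
    gl.compileShader(vertexShader);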

All modern computers and devices come equipped with a graphics-processing unit, or GPU, a separate processor from the CPU that is dedicated to rendering 3D graphics. The majority of the 3D programming techniques discussed in this book assume the presence of a GPU.

Shaders put amazing power at the programmer’s fingertips: full control over every pixel, each time the image is rendered. Shaders power the incredible visuals we see in Hollywood special effects, “CG” animated films, and real-time rendering in today’s video games. With shader support now in web browsers, we can get the same production value as a top video game in our WebGL applications, as well as fine control over how CSS elements are presented and animated on a page.

Figure 1-6 shows a WebGL water simulation rendered using a programmable shader. The rippling water and dancing lights are incredibly realistic, and you can interact with the scene while it is simulating, all in real-time. Reminder: this is running in a web browser!

Figure 1-6. WebGL water simulation using programmable shaders, by Evan Wallace (http://madebyevan.com/webgl-water/)

These types of effects aren’t limited to WebGL. Figure 1-7 shows the before/after of a DOM element using a CSS Custom Filter to create a “crumple” effect. When the mouse rolls over the element, a shader program distorts the vertices that make up the display rectangle for the element, animating them over a short time interval until the element appears like crumpled paper. What is most significant about animating with CSS Custom Filters is that the contents of the DOM element are standard HTML: a few bits of text with styles, plus an image. CSS Custom Filters allow web developers to leverage their existing knowledge of HTML while creating new eye-catching interactive effects.

Figure 1-7. Crumple Shader, a CSS3 Custom Filter by Altered Qualia (http://alteredqualia.com/css-shaders/crumple.html)

Here are a few subtle things to note about shaders relative to the technologies we will cover in the book:

  • WebGL and CSS custom filters both use shaders defined in the OpenGL ES Shader Language (called GLSL ES). There are some differences between the shaders you write for WebGL vs. CSS, but the base languages are identical.

  • WebGL requires the developer to supply shaders in order for objects to be drawn. If no shader is supplied, or there is an error in compiling or loading the shader, nothing will render on the screen.

  • With CSS3 Filters, shaders are optional. When shaders are used with a CSS3 Filter, it is referred to as a custom filter.

  • The Canvas 2D API does not support programmable shaders. If you plan to employ 2D Canvas drawing as a fallback to WebGL rendering, you will need to account for this in your rendering code. More on this in Chapter XX.

Shaders represent a bit of a learning curve, with new concepts, another programming language, and great care required. If you find this daunting, don’t worry. There are many popular open source libraries and tools to choose from that hide the gory details of shaders. You may even be able to get through your entire 3D programming career without ever writing a line of GLSL code—though I recommend you try it anyway, just to be able to say you did.

Those are the basics of 3D graphics. Each of the technologies in the book treats the details a little differently, but the concepts translate fairly well across them. In the next few chapters we are going to dive deep into the details of creating 3D with WebGL, CSS3 and Canvas 2D. It’s time: everyone into the pool!