Chapter 3. <audio>/<video> for Publishers

One of the most exciting features of HTML5 is its native support for audio and video content. On the Web, this means that reliance on browser plugins to display multimedia content is becoming a thing of the past. On the ereader side, HTML5 and EPUB 3 open the door to embedding this same multimedia content directly within an ebook. Let’s take a quick look at HTML5’s new <audio> and <video> elements.

A Two-Minute Introduction to the <audio> and <video> Elements

The standard HTML5 <audio> element looks like this:

<audio id="new_slang">
<source src="new_slang.wav" type="audio/wav"/>
<source src="new_slang.mp3" type="audio/mp3"/>
<source src="new_slang.ogg" type="audio/ogg"/>
<em>(Sorry, &lt;audio&gt; element not supported in your
  browser/ereader, so you will not be able to listen to
  this song.)</em>
</audio>

The <audio> element serves as a container, which holds a series of <source> elements that reference your audio files (src attribute) in whichever formats you have available (type attribute). If you only have one format available, you can abbreviate the markup as follows:

<audio id="new_slang" src="new_slang.wav">No song for you!</audio>

However, current best practice is to provide audio in multiple formats—usually WAV, MP3, and Ogg—in order to ensure compatibility across the range of HTML5 audio–compliant browsers and ereaders (see “HTML5 Audio/Video Compatibility in the Browser and Ereaders”).
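A browser or ereader works through the <source> elements in order and plays the first format it can handle. The sketch below imitates that selection in JavaScript, which can be handy for checking which file a given reading system would end up with. The function name pickSource and its canPlay argument are illustrative assumptions, not part of any API; in a browser, canPlay would typically wrap the standard canPlayType() method of a media element.

```javascript
// Hypothetical helper: returns the first source whose MIME type the
// player reports it can handle, or null if none is playable.
// canPlay(type) is expected to behave like HTMLMediaElement.canPlayType():
// it returns "" for unsupported types and "maybe" or "probably" otherwise.
function pickSource(sources, canPlay) {
    for (var i = 0; i < sources.length; i++) {
        if (canPlay(sources[i].type) !== "") {
            return sources[i];
        }
    }
    return null;
}

// Same files and types as the <source> elements shown earlier
var newSlangSources = [
    { src: "new_slang.wav", type: "audio/wav" },
    { src: "new_slang.mp3", type: "audio/mp3" },
    { src: "new_slang.ogg", type: "audio/ogg" }
];

// In a browser, the real capability check could be wired in like this:
//   var probe = document.createElement("audio");
//   var chosen = pickSource(newSlangSources, function (type) {
//       return probe.canPlayType(type);
//   });
```

Because the list is checked in order, putting the preferred format first matters when a reading system supports more than one of them.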

The <audio> element also accepts a handful of optional boolean attributes for customizing playback: controls, which displays a standard set of audio playback control buttons for the user; autoplay, which makes the audio play automatically, as soon as it’s been loaded; and loop, which makes the audio repeat over and over and over...

<audio id="new_slang" src="new_slang.wav" controls autoplay loop>
No song for you!
</audio>

Note that HTML5 permits boolean attributes to be supplied without a corresponding value, but at the present time, for better compatibility in ereaders that are expecting XHTML content, I recommend including attribute values equal to the attribute name (which is also valid in EPUB 3):

<audio id="new_slang" src="new_slang.wav" controls="controls" autoplay="autoplay"
loop="loop">No song for you!</audio>
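If your markup is generated by a script or build step, the name="name" form is easy to produce mechanically. The helper below is a small sketch of that idea; the function name xhtmlBooleanAttributes is an assumption made for illustration and is not part of HTML5, EPUB 3, or any library.

```javascript
// Hypothetical helper: serializes boolean attribute names in the
// XHTML-friendly name="name" form recommended above.
function xhtmlBooleanAttributes(names) {
    return names.map(function (name) {
        return name + '="' + name + '"';
    }).join(" ");
}

var playbackAttributes = xhtmlBooleanAttributes(["controls", "autoplay", "loop"]);
// playbackAttributes is now: controls="controls" autoplay="autoplay" loop="loop"
```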

The standard HTML5 <video> element is structured similarly to <audio>:

<video id="dancing_pony" width="300" height="300">
<source src="dancing_pony.mp4" type="video/mp4"/>
<source src="dancing_pony.ogg" type="video/ogg"/>
(Sorry, &lt;video&gt; element not supported in your
  browser/ereader, so you will not be able to watch the pony dance.)
</video>

The width and height attributes on the <video> element specify the dimensions of the video. <video> also supports the same boolean controls, autoplay, and loop attributes as <audio>, as well as the same shorthand markup if you only have one video format:

<video id="dancing_pony" width="300" height="300" src="dancing_pony.mp4" 
controls="controls" autoplay="autoplay" loop="loop">
No pony for you!
</video>

Also, as with <audio>, browser/ereader compatibility varies for different video formats. Encoding video in both MPEG-4 and Ogg formats is a safe bet (see “HTML5 Audio/Video Compatibility in the Browser and Ereaders” for more details). In the following sections, we’ll look at a couple of simple demos of audio and video in action.

An Audio-Enabled Glossary

One great use of the HTML5 <audio> element is to add supplemental text-to-speech functionality to your book content. In this example, we’ll add audio functionality to a glossary so that you can click/tap a button to hear the pronunciation of each term. We’ll use the <audio> element to embed the sound bites, and JavaScript to control the audio playback. Example 3-1 shows the HTML for our glossary, which defines a few terms ebook publishers will likely be familiar with.

Example 3-1. Audio-enabled glossary HTML (glossary.html)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Digital Publishing Mini-Glossary</title>
<script src="modernizr-1.6.min.js"></script>
<script src="glossary.js"></script>
<style media="screen" type="text/css">
dl {
  width: 400px;
}

dt {
  padding-top: 10px;
  padding-bottom: 5px;
  font-style: italic;
  color: red;
}

dd {
  margin-left: 1.5em;
}

.play-button {
  font-style: normal;
  color: blue;
  padding: 3px;
  border: 2px solid;
  border-color: black;
  background-color: gray;
}

dt .play-button {
  margin-left: 6px;
}
</style>
</head>
<body>
<h1>Digital Publishing Mini-Glossary</h1>
<p>Click the <span class="play-button">&#x25b6;</span> button to hear the
  pronunciation of a term</p>
<!--Audio content -->
<audio id="epub">
<source src="audio/epub.wav" type="audio/wav"/>
<source src="audio/epub.mp3" type="audio/mp3"/>
<source src="audio/epub.ogg" type="audio/ogg"/>
<em>(Sorry, &lt;audio&gt; element not supported in your
  browser/ereader.)</em>
</audio>
<audio id="mobi">
<source src="audio/mobi.wav" type="audio/wav"/>
<source src="audio/mobi.mp3" type="audio/mp3"/>
<source src="audio/mobi.ogg" type="audio/ogg"/>
</audio>
<audio id="pdf">
<source src="audio/pdf.wav" type="audio/wav"/>
<source src="audio/pdf.mp3" type="audio/mp3"/>
<source src="audio/pdf.ogg" type="audio/ogg"/>
</audio>
<div class="glossary">
<dl>
<dt>EPUB <input type="submit" class="play-button"
id="epub_button" value="&#x25b6;"/></dt>
<dd>An open standard for reflowable ebook content created and maintained by the <a
  href="">International Digital Publishing Forum
  (IDPF)</a> based on HTML, CSS, and XML technologies. Version 3.0 of
  EPUB will support HTML5.</dd>
<dt>Mobipocket <input type="submit" class="play-button"
id="mobi_button" value="&#x25b6;"/></dt>
<dd>A proprietary standard for reflowable ebook content developed by <a
  href="">Mobipocket SA</a>,
  and used by Amazon on its hardware and software Kindle readers.</dd>
<dt>Portable Document Format (PDF) <input type="submit" class="play-button"
id="pdf_button" value="&#x25b6;"/></dt>
<dd>An open standard for page-based (non-reflowable) electronic documents created by
Adobe Systems that has been in use since the 1990s. Many ereader devices support
  PDF files, as well as EPUB or Mobi.</dd>
</dl>
</div>
</body>
</html>

Each glossary term is followed by an <input> button styled with CSS to resemble a play button. Figure 3-1 shows the glossary displayed in iBooks for iPad.

Figure 3-1. Audio-enabled glossary in iBooks

Next, we’ll write some JavaScript that initiates the audio playback when one of the <input> buttons is clicked. Example 3-2 shows the code.

Example 3-2. Glossary JavaScript (glossary.js)
window.addEventListener('load', eventWindowLoaded, false);

function eventWindowLoaded() {
    // Only wire up the buttons if the <audio> element is supported
    if (audio_support()) {
        set_up_audio();
    }
}

function audio_support () {
    // Modernizr (loaded in glossary.html) reports <audio> support
    return Modernizr.audio;
}

function set_up_audio() {
    var epub_audio = document.getElementById("epub");
    var mobi_audio = document.getElementById("mobi");
    var pdf_audio = document.getElementById("pdf");
    // Add play button functionality
    var epub_play_button = document.getElementById("epub_button");
    var mobi_play_button = document.getElementById("mobi_button");
    var pdf_play_button = document.getElementById("pdf_button");
    epub_play_button.addEventListener("click", play_epub, false);
    mobi_play_button.addEventListener("click", play_mobi, false);
    pdf_play_button.addEventListener("click", play_pdf, false);
    function play_epub() {
        epub_audio.play();
    }
    function play_mobi() {
        mobi_audio.play();
    }
    function play_pdf() {
        pdf_audio.play();
    }
}

As we’ve seen in previous examples, event listeners are used to track when each term’s play button is clicked, and call the corresponding play_format function. The one piece of audio-specific code is the play() method called on each of the <audio> elements. As you’d expect, this triggers the playback of the audio.
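The three handlers follow one pattern: look up an <audio> element by ID, look up the matching button, and call play() when the button is clicked. As a sketch of that pattern only, and not the code that ships with the example, the helper below wires up any number of terms from a list of IDs. It assumes the naming convention used in Example 3-1, where the button for the audio element with ID term has the ID term_button; the function name wire_play_buttons and its doc parameter are illustrative.

```javascript
// Hypothetical generalization of set_up_audio() from Example 3-2.
// doc is the document to search; passing it in keeps the function
// easy to exercise outside a browser.
function wire_play_buttons(doc, term_ids) {
    term_ids.forEach(function (term_id) {
        var audio = doc.getElementById(term_id);
        var button = doc.getElementById(term_id + "_button");
        if (!audio || !button) {
            return; // Skip terms that are missing from the page
        }
        button.addEventListener("click", function () {
            audio.play();
        }, false);
    });
}

// In the glossary page, this call could stand in for the body of
// set_up_audio():
//   wire_play_buttons(document, ["epub", "mobi", "pdf"]);
```

With this approach, adding a fourth glossary term means adding its markup and one more ID to the list, rather than another variable, listener, and handler function.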

Try loading the glossary in your browser to hear the terms spoken aloud in all their glory. You can also download the code and audio media from GitHub.

An HTML5 Video About HTML5 Canvas

Chapter 1 gave an overview of the HTML Canvas and many of its applications, but wouldn’t it have been cool if we had also included a video illustrating the Canvas in action? Well, now we know how to do that with the <video> element. Example 3-3 shows an HTML5 page that includes a clip from O’Reilly’s Client-side Graphics with HTML5 Canvases demoing a Canvas adaptation of the arcade game Asteroids.

Example 3-3. Native HTML5 video content (video.html)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>HTML5 Video Illustrating HTML5 Canvas</title>
</head>
<body>
<h1>HTML5 Video Illustrating HTML5 Canvas</h1>
<p>Check out this excerpt from <a
href=""><em>Client-side Graphics
with HTML5 Canvases</em></a> showing the retro arcade game Asteroids implemented
using HTML5 Canvas.</p>
<video id="asteroids_video" width="480" height="270" controls="controls">
<source src="video/html5_asteroids.mp4" type="video/mp4"/>
<source src="video/html5_asteroids.ogg" type="video/ogg"/>
<em>(Sorry, &lt;video&gt; element not supported in your
  browser/ereader, so you will not be able to watch this video.)</em>
</video>
</body>
</html>

Note the width and height values that set the dimensions of the video, and the addition of the controls attribute to give the user access to the traditional video-player buttons for controlling playback. For increased web browser compatibility, two video files are made available: one in MPEG-4 format and one in Ogg format.

If you’re planning to embed <video> content in EPUB, however, I’d currently recommend limiting video files to MP4 format, which is supported by both iBooks and NOOK Color/Tablet. Ogg files are not supported by either of these ereaders, and may interfere with video display.

Additionally, when embedding video in EPUB, you may want to optimize for file size, as large video files can quickly bloat your EPUB document—another good reason to stick with just one video format.

Take a look at the video clip in your browser. The code and video clips are available for download in GitHub.

EPUB 3 Media Overlays

The preceding examples are well suited to situations in which you want to intersperse audio and video throughout your content, but what if you want to incorporate more comprehensive functionality—say, provide an audio track for an entire book? For cases like these, EPUB 3 provides a specification for media overlay documents that allows you to sync audio with text:

Books featuring synchronized audio narration are found in mainstream e-books, educational tools and e-books formatted for persons with print disabilities. In EPUB 3, these types of books are created by using Media Overlay Documents to describe the timing for the pre-recorded audio narration and how it relates to the EPUB Content Document markup. The file format for Media Overlays is defined as a subset of SMIL, a W3C recommendation for representing synchronized multimedia information in XML.

The Media Overlays feature is designed to be transparent to EPUB Reading Systems that do not support the feature. The inclusion of Media Overlays in an EPUB Publication has no impact on the ability of Media Overlay-unaware Reading Systems to render that Publication as a “regular” EPUB Publication.

Although future versions of this specification may incorporate support for video media (e.g., synchronized text/sign-language books), this version supports only synchronizing audio media with the EPUB Content Document.[1]

As stated above, media overlays are currently limited only to audio content (no support for syncing video to text at the present time), and furthermore, support for overlays is optional, so EPUB 3–compliant ereaders are allowed to ignore them.

To sync audio with text using media overlays, you make use of Media Overlay Documents, which are based on the Synchronized Multimedia Integration Language (SMIL) standard, an XML vocabulary for multimedia content. Media Overlay Documents are structured as a series of <par> elements that map text in the HTML content documents to the appropriate portion of corresponding audio files. For example:

<par id="hamlet_act_3_scene_1">
   <text src="act3_scene_1.xhtml#to_be_or_not_to_be"/>
   <audio src="to_be_or_not_to_be.mp3" clipBegin="0s" clipEnd="45s"/>
</par>
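When a narration has many clips, writing each <par> by hand gets tedious. As a rough sketch, the function below generates <par> elements like the one above from a clip description. The function name build_par and the shape of the clip object are assumptions made for illustration; only the <par>, <text>, and <audio> markup comes from the example above. It does no XML escaping, so it assumes IDs and filenames contain no characters such as ampersands or quotes.

```javascript
// Hypothetical helper: builds one <par> element that maps a text
// fragment to a clip of an audio file. begin and end are in seconds.
function build_par(clip) {
    return [
        '<par id="' + clip.id + '">',
        '   <text src="' + clip.text + '"/>',
        '   <audio src="' + clip.audio + '" clipBegin="' + clip.begin +
            's" clipEnd="' + clip.end + 's"/>',
        '</par>'
    ].join("\n");
}

// Reproduces the Hamlet example from a plain object
var soliloquy = build_par({
    id: "hamlet_act_3_scene_1",
    text: "act3_scene_1.xhtml#to_be_or_not_to_be",
    audio: "to_be_or_not_to_be.mp3",
    begin: 0,
    end: 45
});
```

Calling build_par once per clip and joining the results gives the series of <par> elements that a Media Overlay Document is built from; the surrounding document structure still has to follow the spec.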

Full details and a sample Media Overlay Document structure can be found in the spec, which also covers how to incorporate Media Overlay Documents into the EPUB 3 package document.

HTML5 Audio/Video Compatibility in the Browser and Ereaders

HTML5 Audio/Video is currently supported across most major web browsers (including Firefox, Safari, Google Chrome, and even [finally!] Internet Explorer), but the specific audio/video formats supported vary from platform to platform, as the HTML5 spec itself is currently format-agnostic. Wikipedia has some nice tables tracking the current status of HTML5 audio and video support across the different browsers, but here’s a quick summary of the audio and video formats you should supply to ensure good compatibility:

  • HTML5 Audio: WAV, MP3, Ogg

  • HTML5 Video: H.264 MPEG-4, Ogg

On EPUB ereaders, HTML5 audio/video support is more widespread than support for either Canvas or Geolocation. Here’s a rundown of formats supported by the major HTML5 Audio/Video–compliant ereaders:

iBooks (v.1.1.1 and higher) for iPhone/iPod/iPad

Video: MP4 (H.264)

Audio: MP3, AAC, WAV

NOOK Color

Video: “3gp, 3g2, mp4, m4v; MPEG-4 Simple Profile up to 854x480; H.263 up to 352x288; H.264 Baseline profile up to 854x480”[2]

Audio: MP3, WAV, Ogg

Kindle for iOS (no other Kindle platforms support <audio> or <video>)

Video: MP4 (H.264)

Audio: MP3

Adobe Digital Editions does not support HTML5 audio/video, but does support the embedding of Flash video in EPUB documents; see Liza Daly’s tutorial, “Using Flash video in ePub,” for details.
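To see which formats are safe across all three HTML5 Audio/Video–compliant reading systems, the rundown above can be treated as data and intersected. This is only a sketch: the object below is a hand-transcribed, simplified copy of the list (the NOOK Color video entry is reduced to its container formats, with names normalized to uppercase), and the names ereader_formats and common_formats are illustrative.

```javascript
// Simplified transcription of the format rundown above
var ereader_formats = {
    "iBooks":         { video: ["MP4"], audio: ["MP3", "AAC", "WAV"] },
    "NOOK Color":     { video: ["3GP", "3G2", "MP4", "M4V"], audio: ["MP3", "WAV", "Ogg"] },
    "Kindle for iOS": { video: ["MP4"], audio: ["MP3"] }
};

// Returns the formats of the given kind ("audio" or "video") that
// every reading system in the table supports.
function common_formats(table, kind) {
    var readers = Object.keys(table);
    if (readers.length === 0) {
        return [];
    }
    return table[readers[0]][kind].filter(function (format) {
        return readers.every(function (reader) {
            return table[reader][kind].indexOf(format) !== -1;
        });
    });
}

var safe_audio = common_formats(ereader_formats, "audio"); // ["MP3"]
var safe_video = common_formats(ereader_formats, "video"); // ["MP4"]
```

The result lines up with the earlier recommendation to stick with MP4 when embedding video in EPUB, and suggests MP3 as the audio format with the widest ereader reach.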

Bibliography/Additional Resources

If you’re interested in learning more about HTML5 Audio and Video, you may be interested in some of these resources:

HTML5 Media by Shelley Powers (O’Reilly Media)

A comprehensive look at incorporating audio/video content in HTML5 documents, converting media files to different formats, styling media with CSS, and advanced scripting with JavaScript.

Native Video in HTML5: An O’Reilly Breakdown by David Griffiths (O’Reilly Media)

A nice series of video tutorials on HTML5 video.

HTML5 Canvas by Steve Fulton and Jeff Fulton (O’Reilly Media)

Chapter 6 of HTML5 Canvas, “Mixing HTML5 and Canvas,” shows how to “draw” video content on the Canvas, and take advantage of the Canvas API to manipulate video in exciting ways.

“Jaraoke” by Randall A. Gordon

A slick implementation of karaoke using HTML5 audio.

jPlayer’s “HTML5 <audio> and Audio() Support Tester”

Test your Web browser’s audio format support.

[1] From 8 September 2011 draft of “EPUB Media Overlays 3.0” specification:

[2] From the NOOK Color FAQs: Note that these FAQs have been removed from the NOOK support site, but I was unable to find anything more up to date online.