# Chapter 1. Working with Strings and Regular Expressions

## Introduction to Strings as Object and Literal

The JavaScript string is the most fundamental data type in JavaScript. Though you may get numeric values from web page forms, the values are retrieved as strings, which you then have to convert into numeric values.

Strings are also used as parameters when invoking server-side application calls through Ajax, as well as forming the basic serialization format of every JavaScript object. One of the methods that all JavaScript objects share is toString, which returns a string containing the serialized format of the object.

A JavaScript string can be either a primitive data type or a String object. As a primitive type, it joins four other JavaScript primitive types: number, Boolean (true or false), null (no value), and undefined (unknown). In addition, as a primitive data type, strings are also JavaScript literals: a collection that includes numbers (as either floating point or integer), Booleans, and the literal formats for arrays, objects, and regular expressions.

String literals are created just by quoting some text in single or double quotes. It doesn’t matter what type of quote you use, though you may want to adjust what you use based on the string content. If the string contains a single quote, you’ll want to use double quotes:

var str = "This isn't a String object";

Or you can use single quotes, and escape the contained single quote:

var str = 'This isn\'t a String object';

You create a String object using the new operator and the String constructor:

var str = "this is a string literal";

var strObj = new String("this is a string object");

If you use tools such as JSLint, you will get a warning when you use the String constructor.

The reason why is that string literals and objects are not the same thing, but are often treated the same. However, if you try to use strict equality with the two different types, the equality expression will fail because the data types differ. Still, it is through the String object that we have access to various functions.

When we invoke a function on a string literal, what’s really happening is that the literal is wrapped in an object, the function call is processed, and then the temporary String object is discarded. From a performance perspective, then, it makes more sense to use String objects if we know we’re going to use String functions.
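As a minimal sketch of the distinction described above, the following compares a string literal with a String object holding the same text. The variable names are illustrative only.

```javascript
// A string literal (primitive) and a String object with the same text
var strLiteral = "this is a string";
var strObj = new String("this is a string");

// typeof reveals the different data types
var literalType = typeof strLiteral; // "string"
var objectType = typeof strObj;      // "object"

// loose equality converts the object to its primitive value first
var looseEqual = (strLiteral == strObj);   // true

// strict equality fails because the data types differ
var strictEqual = (strLiteral === strObj); // false

// valueOf returns the primitive string held by the object
var unwrappedEqual = (strLiteral === strObj.valueOf()); // true

// calling a method on a literal works through a temporary wrapper object
var upper = strLiteral.toUpperCase(); // "THIS IS A STRING"

console.log(literalType, objectType, looseEqual, strictEqual, unwrappedEqual, upper);
```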

## Concatenating Two or More Variables

### Problem

You want to concatenate two or more variables into a single string.

### Solution

Concatenate the variables using the addition (+) operator:

var string1 = "This is a ";
var string2 = "number: ";
var number1 = 5;

// creates a new string with "This is a number: 5"
var stringResult = string1 + string2 + number1;

### Discussion

The addition operator (+) is typically used to add numbers together:

var newValue = 1 + 3; // result is 4

In JavaScript, though, the addition operator is overloaded, which means it can be used for multiple data types, including strings. When used with strings, the results are concatenated.

var string3 = string1 + string2;

or you can add multiple strings:

var string1 = "This";
var string2 = "is";
var string3 = "a";
var string4 = "test";
var stringResult = string1 + " " + string2 + " " +
string3 + " " + string4; // result is "This is a test"

You can also concatenate variables of different data types into a string, as long as one of the variables (or values) is a string. For instance, the following code snippet adds the numbers numerically:

var result = 4 + 5 + 3; // 12

But adding in a string changes everything:

var result = "" + 5 + 4 + 3; // string with "543"

As in real estate, though, location is everything. If the string is concatenated after the numbers, the results can be surprising:

var result = 5 + 4 + 3 + ""; // string with "12"

If the + operator is used with numeric values, it adds the values. It’s only when it encounters a different data type (the string) that the result is converted into a string. Based on this information, what do you think the following result would contain?

var result = 5 + 4 + "" + 3;
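One way to check your answer is to trace the evaluation from left to right, as this sketch does. The intermediate variable names are illustrative only.

```javascript
// + is evaluated left to right, so the two numbers are added first
var step1 = 5 + 4;       // 9 (number)
var step2 = step1 + "";  // "9" (string, after coercion)
var step3 = step2 + 3;   // "93" (string concatenation)

var result = 5 + 4 + "" + 3;
console.log(result === step3); // true
console.log(typeof result);    // string
```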

String coercion also works with other data types, such as Booleans and dates:

var dt = new Date(); // creates date object with current date and time

var result = "Today's date is " + dt; // string with text and date

The reason why this form of concatenation works is that all standard JavaScript objects inherit a method, toString, that returns a string representation of the object’s contents. When "adding" the string to the object, the toString method on the object is invoked and the results are concatenated to the string.
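The same mechanism applies to objects you define yourself. The following is a small sketch, using a hypothetical point object that is not part of the recipe, of how a custom toString method changes the concatenated result:

```javascript
// A hypothetical object that supplies its own toString method
var point = {
  x: 1,
  y: 2,
  toString: function() {
    return "(" + this.x + ", " + this.y + ")";
  }
};

// An object that relies on the toString inherited from Object
var plain = { x: 1, y: 2 };

var custom = "Point: " + point;    // "Point: (1, 2)"
var inherited = "Plain: " + plain; // "Plain: [object Object]"

console.log(custom);
console.log(inherited);
```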

There is a shortcut to concatenating strings, and that’s the JavaScript shorthand assignment operator (+=). The following code snippet, which uses this operator:

var oldValue = "apples";
oldValue += " and oranges"; // string now has "apples and oranges"

is equivalent to:

var oldValue = "apples";
oldValue = oldValue + " and oranges";

The shorthand assignment operator works with strings by concatenating the string on the right side of the operator to the end of the string on the left.

There is a built-in String method that can concatenate multiple strings: concat. It takes one or more string parameters, each of which is appended to the end of the string:

// returns "This is a string"
var nwStrng = "".concat("This ","is ","a ","string");

The concat method can be a simpler way to generate a string from multiple values, such as generating a string from several form fields. However, the addition operator is the more commonly used approach.

## Conditionally Comparing Strings

### Problem

You want to compare two strings to see if they’re the same.

### Solution

Use the equality operator (==) within a conditional test:

var strName = prompt("What's your name?", "");

if (strName == "Shelley") {
} else {
}

If you want to ensure that two variables are of the same type as well as having the same content, use the strict equality operator (===):

// true only if both variables are string literal
// or string objects, and the string is exactly the same
if (someString === anotherString) {
...
}

### Discussion

Two strings can be compared using the equality operator (==). When used within a conditional statement, a block of code is run if the test evaluates to true (the strings are equal):

if (strName == "Shelley") {
}

If the strings are not equal, the first statement following the conditional statement block is processed. If an if…else conditional statement is used, the block of code following the else keyword is the one that’s processed:

if (strName == "Shelley") {
...
} else {
}

There are factors that can influence the success of the string comparison. For instance, strings have case, and can consist of uppercase characters, lowercase characters, or a combination of both. Unless case is an issue, you’ll most likely want to convert the string to all lowercase or uppercase, using the built-in String methods toLowerCase and toUpperCase, before making the comparison, as shown in the following code:

var strName = prompt("What's your name?", "");

if (strName.toUpperCase() == "SHELLEY") {
} else {
}

Note that the toUpperCase and toLowerCase methods do not take any parameters.
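If you make this kind of comparison often, it can be wrapped in a small function. The following is only a sketch, and equalsIgnoreCase is a hypothetical helper name rather than a built-in method:

```javascript
// Hypothetical helper: compares two strings without regard to case
function equalsIgnoreCase(first, second) {
  return first.toLowerCase() === second.toLowerCase();
}

console.log(equalsIgnoreCase("Shelley", "SHELLEY")); // true
console.log(equalsIgnoreCase("Shelley", "Shelly"));  // false
```

Converting both sides means neither value has to be typed in a particular case.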

In “Concatenating Two or More Variables”, I discussed that data type conversion occurs automatically when concatenating different data types with a string. Automatic conversion also occurs with the equality operator when one value is a string and the other is a number, though in the opposite direction: the string is converted into a number. In the following, the string "10" is converted into the number 10, and then compared with the numeric value 10.00:

var numVal = 10.00;
if (numVal == "10") alert("The value is ten"); // succeeds
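The direction of the conversion matters. As a brief sketch (the variable names are illustrative), the following shows that a string compared with a number is converted to a number, while two strings are compared character by character:

```javascript
var numVal = 10.00;

// number vs. string: the string is converted to a number
var looseNumStr = (numVal == "10");    // true
var looseDecimal = (numVal == "10.0"); // true, "10.0" converts to 10

// string vs. string: no conversion, the characters are compared
var strVsStr = ("10.0" == "10");       // false

// strict equality never converts
var strictNumStr = (numVal === "10");  // false

console.log(looseNumStr, looseDecimal, strVsStr, strictNumStr);
```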

There may be times, though, when you don’t want automatic data conversion to occur, and instead want the comparison to fail if the values are of different data types. For instance, if one value is a string literal and the other is a String object, you might want the comparison to fail because the two variables are of different data types, regardless of their respective values. In this case, you’ll want to use a different equality operator, the strict equality operator (===):

var strObject = new String("Boston");
var strLiteral = "Boston";

if (strObject == strLiteral) // this comparison succeeds

...

if (strObject === strLiteral) // fails - different data types

The comparison fails if the two variables being compared are different data types, even though their primitive string values are the same.

Sometimes, you might want to specifically test that two strings are not alike, rather than whether they are alike. The operators to use then are the inequality operator (!=) and strict inequality operator (!==). Unlike the equality operators, the comparison only succeeds if the two values are not equal:

var strnOne = "one";
var strnTwo = "two";
if (strnOne != strnTwo) // true

The strict inequality operator returns true if the strings are not the same value or the data type of the two operands (values on either side of the operator) is different:

var strObject = new String("Boston");
var strLiteral = "Boston";
if (strObject !== strLiteral) // true, data types differ

Comparison operators work numerically with numbers, but lexically with strings. For instance, the value dog would be lexically greater than cat, because the letter d in dog occurs later in the alphabet than the letter c in cat:

var sOne = "cat";
var sTwo = "dog";
if (sOne > sTwo) // false, because "cat" is lexically less than "dog"

If two string literals vary only by case, the uppercase characters are lexically less than the lowercase characters, because uppercase letters have lower character code values:

var sOne = "Cat";
var sTwo = "cat";
if (sOne >= sTwo) // false, because 'C' is lexically less than 'c'
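To see where this ordering comes from, you can inspect the character codes with the String charCodeAt method. This is a short sketch with illustrative variable names:

```javascript
// Uppercase letters have lower character codes than lowercase letters
var upperCode = "C".charCodeAt(0); // 67
var lowerCode = "c".charCodeAt(0); // 99

var catVsDog = "cat" < "dog";   // true, 'c' comes before 'd'
var capVsLower = "Cat" < "cat"; // true, 'C' (67) comes before 'c' (99)
var zVsA = "Zebra" < "apple";   // true, every uppercase letter sorts before 'a'

console.log(upperCode, lowerCode, catVsDog, capVsLower, zVsA);
```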

There are no strict greater than or strict less than operators, so it makes no difference if the data types of the operands differ:

var sOne = new String("cat");
var sTwo = "cat";
if (sOne <= sTwo) // same string value, so true

If you’re comparing strings that contain numbers, you’ll most likely want to convert the values to numbers first. The reason why is that lexically, a value such as "12" is greater than "111", but numerically, the number 12 is less than 111.

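var a = '12';
var b = '111';

// lexical comparison: "12" is greater than "111"
if (a > b) {
  console.log('greater'); // this branch runs
} else {
  console.log('lesser');
}

// numeric comparison: 12 is less than 111
if (parseInt(a,10) > parseInt(b,10)) {
  console.log('greater');
} else {
  console.log('lesser'); // this branch runs
}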

By not converting the strings to numbers, you might get unexpected results. Of course, by converting the strings to numbers, you might get unexpected results, too. That’s just part of the fun with developing in JavaScript.
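As a sketch of the kind of surprise conversion can bring, the following compares parseInt with the Number function on input that isn't purely numeric. The sample values are made up for illustration.

```javascript
// parseInt reads the leading digits and ignores what follows
var parsedPx = parseInt("12px", 10); // 12

// Number requires the whole string to be numeric
var numberPx = Number("12px");       // NaN

// an empty string converts differently with each approach
var parsedEmpty = parseInt("", 10);  // NaN
var numberEmpty = Number("");        // 0

// NaN is never greater than, less than, or equal to anything
var nanCompare = (numberPx > 5) || (numberPx <= 5); // false

console.log(parsedPx, numberPx, parsedEmpty, numberEmpty, nanCompare);
```

Checking the converted value with isNaN before comparing is one way to guard against these cases.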

## Finding a Substring in a String

### Problem

You want to find out if a substring (a specific sequence of characters) exists in a string.

### Solution

Use the String object’s built-in indexOf method to find the position of the substring, if it exists:

var testValue = "This is the Cookbook's test string";
var subsValue = "Cookbook";

var iValue = testValue.indexOf(subsValue); // returns 12

if (iValue != -1) // succeeds, because substring exists

### Discussion

The String indexOf method returns a number representing the index, or position of the first character of the substring, with 0 being the index position of the first character in the string.

To test whether the substring exists, compare the returned value to -1, which is the value returned if the substring isn’t found:

if (iValue != -1) // true if substring found

The indexOf method takes two parameters: the substring, and an optional second parameter, an index value of where to begin a search:

var tstString = "This apple is my apple";
var iValue = tstString.indexOf("apple", 10); // returns 17, index of second substring

The indexOf method works from left to right, but sometimes you might want to find the index of a substring by searching within the string from right to left. There’s another String method, lastIndexOf, which returns the index position of the last occurrence of a substring within a string:

var txtString = "This apple is my apple";
var iValue = txtString.lastIndexOf("apple"); // returns 17

Like indexOf, lastIndexOf also takes an optional second parameter, which is the index at which to start the search. The index is still counted from the left, but the search runs backward from that position toward the beginning of the string:

"This apple is my apple".lastIndexOf("apple"); // returns 17
"This apple is my apple".lastIndexOf("apple",12); // returns 5
"This apple is my apple".lastIndexOf("apple", 3); // returns -1

Notice that the value returned from lastIndexOf changes based on the starting position: only occurrences that begin at or before that index are found.

It’s odd to see a String method called directly on quoted text, but in JavaScript, there’s no difference in calling the method on a string literal, directly, or on a string variable—at least, not from the perspective of the developer. There is a difference, of course, to the JavaScript engine.
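Because indexOf accepts a starting position, it can be called repeatedly to locate every occurrence of a substring. The following is a minimal sketch; findAllIndexes is a hypothetical helper name, not a built-in method.

```javascript
// Hypothetical helper: returns the index of every occurrence of subStr
function findAllIndexes(str, subStr) {
  var indexes = [];
  if (subStr.length === 0) {
    return indexes; // avoid looping forever on an empty substring
  }
  var position = str.indexOf(subStr);
  while (position !== -1) {
    indexes.push(position);
    // resume the search just past the start of the previous match
    position = str.indexOf(subStr, position + 1);
  }
  return indexes;
}

var found = findAllIndexes("This apple is my apple", "apple");
console.log(found); // [ 5, 17 ]
```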

## Breaking a Keyword String into Separate Keywords

### Problem

You have a string with keywords, separated by commas. You want to break the string into an array of separate keywords, and then print the keywords out with a keyword label.

### Solution

Use the String split method to split the string on the commas. Loop through the array, printing out the separate values. Example 1-1 shows a complete web page demonstrating this approach. The keywords are provided by the web page reader, via a prompt window, and are then processed and printed out to the web page.

Example 1-1. Demonstrating use of String split to get keyword list
<!DOCTYPE html>
<html>
<title>Example 1-1</title>
<script type="text/javascript">

// wait until the page has loaded, so the result element exists
window.onload = function() {

  // get keyword list
  var keywordList = prompt("Enter keywords, separated by commas","");

  // use split to create array of keywords
  var arrayList = keywordList.split(",");

  // build result HTML
  var resultString = "";
  for (var i = 0; i < arrayList.length; i++) {
    resultString+="keyword: " + arrayList[i] + "<br />";
  }

  // print out to page
  var blk = document.getElementById("result");
  blk.innerHTML = resultString;
};

</script>
<body>
<div id="result">
</div>
</body>
</html>

### Discussion

The String split method takes two parameters: the separator on which to split, given as a string or a regular expression; and an optional number that limits how many entries are placed in the resulting array. In Example 1-1, the separator is a comma (,), and no second parameter is provided. In the following example, the second parameter limits the result to an array with only two entries:

var strList = "keyword1,keyword2,keyword3,keyword4";
var arrayList = strList.split(",",2); // results in two element array

Not specifying the second parameter will split on every occurrence of the separator found:

var arrayList = strList.split(","); // four element array

Here’s an interesting use of split: if you want to split a string on every character, specify the empty string ('') or ("") as the separator:

var arrayList = strList.split("");

You can also use a regular expression as the parameter to split, though this can be a little tricky. For instance, to extract a comma-separated list of items that is embedded in a sentence, you could use a couple of regular expressions:

var sentence = "This is one sentence. This is a sentence with a list of items: " +
  "cherries, oranges, apples, bananas.";
var val = sentence.split(/:/);
alert(val[1].split(/\./)[0]);

The regular expression looks for a colon first, which is then used for the first split. The second split uses a regular expression on the resulting value from the first split, to look for the period. The list is then in the first array element of this result.

Tricky, and a little hard to get your head around, but using regular expressions with split could be a handy option when nothing else works.
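A more everyday use of a regular expression separator is cleaning up the spaces that people tend to type around commas. The following is a sketch under the assumption that the keywords arrive in a single string; the sample input is made up.

```javascript
var keywordList = "apples , oranges,bananas ,  cherries";

// a plain comma separator leaves the surrounding spaces in place
var rough = keywordList.split(",");
// ["apples ", " oranges", "bananas ", "  cherries"]

// a regular expression separator absorbs the spaces around each comma
var clean = keywordList.split(/\s*,\s*/);
// ["apples", "oranges", "bananas", "cherries"]

// the optional second parameter caps the number of entries returned
var firstTwo = keywordList.split(/\s*,\s*/, 2);
// ["apples", "oranges"]

console.log(rough, clean, firstTwo);
```

Whitespace at the very start or end of the whole string is not removed by this separator.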

## Inserting Special Characters

### Problem

You want to insert a special character, such as a line feed, into a string.

### Solution

Use one of the escape sequences in the string. For instance, to add the copyright symbol into a block of text to be added to the page, use the escape sequence \u00A9:

var resultString = "<p>This page \u00A9 Shelley Powers </p>";

// print out to page
var blk = document.getElementById("result");
blk.innerHTML = resultString;

### Discussion

The escape sequences in JavaScript all begin with the backslash character (\). This character lets the application processing the string know that what follows is a sequence of characters that needs special handling.

Table 1-1 lists the other escape sequences.

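Table 1-1. Escape sequences

| Sequence | Character |
|----------|-----------|
| \' | Single quote |
| \" | Double quote |
| \\ | Backslash |
| \b | Backspace |
| \f | Form feed |
| \n | Newline |
| \r | Carriage return |
| \t | Horizontal tab |
| \ddd | Octal sequence (3 digits: ddd) |
| \xdd | Hexadecimal sequence (2 digits: dd) |
| \udddd | Unicode sequence (4 hex digits: dddd) |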

The last three escape sequences in Table 1-1 are patterns, where providing different values will result in differing escape sequences.

The first several escape sequences listed in Table 1-1 can also be represented as a Unicode escape sequence. For instance, the horizontal tab (\t), can also be represented as the Unicode escape sequence, \u0009. Of course, if the user agent disregards the special character, as browsers do with the horizontal tab, the use is moot.

One of the most common uses of escape sequences is to include double or single quotes within strings delimited by the same character:

var newString = 'You can\'t use single quotes in a string surrounded by single quotes';
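The equivalences described in this discussion can be checked directly. The following is a short sketch with illustrative variable names:

```javascript
// the same character written with different escape sequences
var tabsMatch = ("\t" === "\u0009");        // true
var copyrightMatch = ("\xA9" === "\u00A9"); // true

// an escape sequence produces a single character in the string
var copyrightLength = "\u00A9".length;      // 1

// escaped quotes and backslashes each count as one character
var quoted = 'isn\'t';      // isn't
var backslash = "C:\\temp"; // C:\temp

console.log(tabsMatch, copyrightMatch, copyrightLength);
console.log(quoted.length, backslash.length); // 5 7
```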

## Trimming Whitespace from the Ends of a String

### Problem

You want to trim the whitespace around a string that you’ve accessed from a form element.

### Solution

Use the String trim method. In the following code snippet, text from a textarea element is split on the newline character (\n), and the resulting lines are trimmed of leading and trailing whitespace before being concatenated into a new string.

var txtBox = document.getElementById("test");
var lines = txtBox.value.split("\n");
var resultString = "";

for (var i = 0; i < lines.length; i++) {
  var strng = lines[i].trim();
  resultString += strng + "-";
}

### Discussion

Prior to the release of ECMAScript 5, you had to use regular expressions and the String replace method to trim the unwanted whitespace from around a string. Now, trimming a string is as simple as calling the trim method.

Where things can get complicated is if you want to trim only the leading or following white space. Here is where the browsers differ.

Microsoft supports ltrim and rtrim for trimming white space from the left or right of the string, while other browsers support trimLeft and trimRight. ECMAScript 5 does not standardize trimming white space from only the left or only the right; later editions of the standard added trimStart and trimEnd for this purpose.

The best approach for left or right trimming only is to define the methods directly, making use of regular expressions. What you name them is up to you. You can define the custom methods directly on the String object, using the prototype, though you need to be aware that using the same name as one already defined for the object in the browser will override the standard implementation:

String.prototype.ltrim = function() {
  return this.replace(/^\s+/,"");
};
String.prototype.rtrim = function() {
  return this.replace(/\s+$/,"");
};

Or you can create standalone functions:

function trimLeft(str) {
  return str.replace(/^\s+/,"");
}
function trimRight(str) {
  return str.replace(/\s+$/,"");
}

Speaking of the use of regular expressions, I cover these in the last sections of this chapter.
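As a quick check of the standalone approach, the following sketch applies the two functions to a padded string and compares the combined result with the built-in trim method. The sample string is made up for illustration.

```javascript
// standalone functions that trim one side only
function trimLeft(str) {
  return str.replace(/^\s+/, "");
}
function trimRight(str) {
  return str.replace(/\s+$/, "");
}

var padded = "   some text \t\n";

var left = trimLeft(padded);            // "some text \t\n"
var right = trimRight(padded);          // "   some text"
var both = trimRight(trimLeft(padded)); // "some text"

// trimming both ends matches the built-in trim method
console.log(both === padded.trim()); // true
```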

## Introduction to Regular Expressions and the RegExp Object

Regular expressions are search patterns that can be used to find text that matches a given pattern. For instance, we can look for a substring Cookbook within a longer string using the indexOf method:

var testValue = "This is the Cookbook's test string";
var subsValue = "Cookbook";

var iValue = testValue.indexOf(subsValue); // returns value of 12, index of substring

This code snippet works because we are looking for an exact match, which is all that indexOf supports.

What if we want a more general search? For instance, we want to search for the words Cook and Book, in strings such as Joe’s Cooking Book or JavaScript Cookbook?

When we’re looking for strings that match a pattern rather than an exact substring, we need to use regular expressions.

JavaScript provides for regular expression literals, delimited with forward slashes:

var re = /regular expression/;

The regular expression pattern is contained between opening and closing forward slashes. Note that this pattern is not a string: you do not want to use single or double quotes around the pattern, unless the quotes themselves are part of the pattern to match.

Regular expressions are made up of characters, either alone or in combination with special characters, that provide for more complex matching. For instance, the following is a regular expression for a pattern that matches against a string that contains the word Cook and the word Book in that order, and separated by one or more whitespace characters:

var re = /Cook\s+Book/;

The special characters in this example are the backslash (\) and the plus sign (+). The backslash has two purposes: either it’s used with a regular character, to designate that it’s a special character; or it’s used with a special character, such as the plus sign (+), to designate that the character should be treated literally. In this case, the backslash is used with s, which transforms the letter s into a special character designating a whitespace character, such as a space, tab, line feed, or form feed. The \s special character is followed by the plus sign, \s+, which is a signal to match the preceding character (in this example, a whitespace character) one or more times. This regular expression would work with the following:

Cook Book

It would also work with the following:

Cook     Book

It would not work with:

CookBook

It doesn’t matter how much whitespace is between Cook and Book, because of the use of \s+. However, the use of the plus sign does require at least one whitespace character.
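These cases can be verified with the RegExp test method, which returns true if the pattern is found anywhere in the string. This is a small sketch; the variable names and the extra tab and lowercase cases are illustrative additions.

```javascript
var re = /Cook\s+Book/;

var oneSpace = re.test("Cook Book");       // true
var manySpaces = re.test("Cook     Book"); // true
var tabBetween = re.test("Cook\tBook");    // true, a tab is whitespace
var noSpace = re.test("CookBook");         // false
var lowerCase = re.test("cook book");      // false, matching is case sensitive

console.log(oneSpace, manySpaces, tabBetween, noSpace, lowerCase);
```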

Table 1-2 shows the most commonly used special characters in JavaScript applications.

Table 1-2. Regular expression special characters

## Finding and Highlighting All Instances of a Pattern

### Problem

You want to find all instances of a pattern within a string.

### Solution

Use the RegExp exec method and the global flag (g) in a loop to locate all instances of a pattern, such as any word that begins with t and ends with e, with any number of characters in between:

var searchString = "Now is the time and this is the time and " +
  "that is the time and we all have the time and no " +
  "one has the whatever";
var pattern = /t\w*e/g;
var matchArray;

var str = "<p>";
while((matchArray = pattern.exec(searchString)) !== null) {
  str+="at " + matchArray.index + " we found " + matchArray[0] + "<br />";
}
str+="</p>";
document.getElementById("results").innerHTML=str;

### Discussion

The RegExp exec method executes the regular expression, returning null if a match is not found or an array of information if a match is found. Included in the returned array is the actual matched value, the index in the string where the match is found, any parenthetical substring matches, and the original string.

index
The index of the located match

input
The original input string

[0] or accessing array directly
The matched value

[1],…,[n]
Parenthetical substring matches

In the solution, the index where the match was found is printed out in addition to the matched value.

The solution also uses the global flag (g). This triggers the RegExp object to preserve the location of each match, and to begin the search after the previously discovered match. When used in a loop, we can find all instances where the pattern matches the string. In the solution, the following are printed out:

at 7 we found the
at 11 we found time
at 28 we found the
at 32 we found time
at 49 we found the
at 53 we found time
at 74 we found the
at 78 we found time
at 98 we found the
at 105 we found teve

Both time and the match the pattern, but so does teve from whatever.
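If you want only whole words, one option is to add the word boundary special character (\b) to both ends of the pattern. The following sketch compares the two patterns; collectMatches is a hypothetical helper name introduced for this illustration.

```javascript
var searchString = "Now is the time and this is the time and " +
  "that is the time and we all have the time and no " +
  "one has the whatever";

// Hypothetical helper: collects every match of a global pattern
function collectMatches(pattern, str) {
  var matches = [];
  var matchArray;
  while ((matchArray = pattern.exec(str)) !== null) {
    matches.push(matchArray[0]);
  }
  return matches;
}

// without word boundaries, "teve" inside "whatever" matches
var loose = collectMatches(/t\w*e/g, searchString);

// with \b on both sides, only whole words match
var whole = collectMatches(/\bt\w*e\b/g, searchString);

console.log(loose.length, loose[loose.length - 1]); // 10 teve
console.log(whole.length, whole[whole.length - 1]); // 9 the
```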

Let’s look at the nature of global searching in action. In Example 1-2, a web page is created with a textarea and an input text box for accessing both a search string and a pattern. The pattern is used to create a RegExp object, which is then applied against the string. A result string is built, consisting of both the unmatched text and the matched text, except the matched text is surrounded by a span element, with a CSS class used to highlight the text. The resulting string is then inserted into the page, using the innerHTML for a div element.

Example 1-2. Using exec and global flag to search and highlight all matches in a text string
<!DOCTYPE html>
<html>
<title>Searching for strings</title>
<style type="text/css">
.found
{
background-color: #ff0;
}
</style>
<script type="text/javascript">

function doSearch() {
// get pattern
var pattern = document.getElementById("pattern").value;
var re = new RegExp(pattern,"g");

// get string
var searchString = document.getElementById("incoming").value;

var matchArray;
var resultString = "<pre>";
var first=0; var last=0;

// find each match
while((matchArray = re.exec(searchString)) !== null) {
last = matchArray.index;
// get all of string up to match, concatenate
resultString += searchString.substring(first, last);

resultString += "<span class='found'>" + matchArray[0] + "</span>";
first = re.lastIndex;
}

// finish off string
resultString += searchString.substring(first,searchString.length);
resultString += "</pre>";

// insert into page
document.getElementById("searchResult").innerHTML = resultString;
}

</script>
<body>
<textarea id="incoming" cols="150" rows="10">
</textarea>
<p>
Search pattern: <input id="pattern" type="text" /></p>
<button type="button" onclick="doSearch()">Search for pattern</button>
<div id="searchResult"></div>
</body>
</html>

Figure 1-1 shows the application in action on William Wordsworth’s poem, The Kitten and the Falling Leaves, after a search for the following pattern:

lea(f|ve)

All instances of leaves are highlighted.

The bar (|) denotes alternation: the pattern matches if the text on either side of the bar matches. So a word like leaf matches, as does a word like leave, but not a word like leap.

When the global flag is set, the RegExp’s lastIndex property holds the index immediately after the most recent match, which is where the next search begins. Example 1-2 uses it to find where the unmatched text resumes after each match.
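The alternation can be tried outside the page as well. In this sketch, the test strings are made up for illustration, and exec is used to show which alternative matched:

```javascript
var re = /lea(f|ve)/;

var leafMatches = re.test("leaf");     // true
var leavesMatches = re.test("leaves"); // true, matches the "leave" in "leaves"
var leapMatches = re.test("leap");     // false

// exec also reports which alternative matched, as a parenthetical substring
var result = re.exec("falling leaves");
console.log(result[0]);    // leave
console.log(result[1]);    // ve
console.log(result.index); // 8
```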

“Replacing Patterns with New Strings” describes another way to do a standard find-and-replace behavior.

## Replacing Patterns with New Strings

### Problem

You want to replace all matched substrings with a new substring.

### Solution

Use the String object’s replace method, with a regular expression:

var searchString = "Now is the time, this is the time";
var re = /t\w{2}e/g;
var replacement = searchString.replace(re, "place");
console.log(replacement); // Now is the place, this is the place

### Discussion

In Example 1-2 in “Finding and Highlighting All Instances of a Pattern”, we used the RegExp global flag (g) in order to track each occurrence of the regular expression. Each match was highlighted using a span element and CSS.
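The replace method can produce a similar result in a single call. The following sketch contrasts a pattern with and without the global flag, and uses the $& replacement pattern, which stands for the matched text, to wrap each match in a span element:

```javascript
var searchString = "Now is the time, this is the time";

// without the g flag, only the first match is replaced
var firstOnly = searchString.replace(/t\w{2}e/, "place");
// "Now is the place, this is the time"

// with the g flag, every match is replaced
var all = searchString.replace(/t\w{2}e/g, "place");
// "Now is the place, this is the place"

// $& in the replacement string stands for the matched text
var highlighted = searchString.replace(/t\w{2}e/g,
  "<span class='found'>$&</span>");

console.log(firstOnly);
console.log(all);
console.log(highlighted);
```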

## Searching for Special Characters

### Problem

We need to search for regular expression special characters themselves.

### Solution

Use the backslash to escape the pattern-matching character:

var re = /\\d/;
var pattern = "\\d{4}";
var pattern2 = pattern.replace(re,"\\D");

### Discussion

In the solution, a regular expression is created that matches the literal characters of the special character \d, which is normally used to match any digit. The pattern is itself escaped in the string that needs to be searched. The digit special character is then replaced with \D, the special character that matches anything but a digit.

Sounds a little convoluted, so I’ll demonstrate with a longer application. Example 1-3 shows a small application that first searches for a sequence of four digits in a string, and replaces them with four asterisks (****). Next, the application modifies the search pattern by replacing the \d with \D, and then runs it against the same string.

Example 1-3. Regular expression matching on regular expression characters
<!DOCTYPE html>
<html>
<title>Replacement Insanity</title>
<body>
<p>content</p>
<script>

// search for \d
var re = /\\d/;
var pattern = "\\d{4}";
var str = "I want 1111 to find 3334 certain 5343 things 8484";
var re2 = new RegExp(pattern,"g");
var str1 = str.replace(re2,"****");
console.log(str1);
var pattern2 = pattern.replace(re,"\\D");
var re3 = new RegExp(pattern2,"g");
var str2 = str.replace(re3, "****");
console.log(str2);

</script>
</body>
</html>

Here is the original string:

I want 1111 to find 3334 certain 5343 things 8484

The first string printed out is the original string with the numbers converted into asterisks:

I want **** to find **** certain **** things ****

The second string printed out is the same string, but after blocks of four sequential non-numeric characters have been converted into asterisks:

****nt 1111******** 3334******** 5343********8484

Though this example is short, it demonstrates some of the challenges when you want to search on regular expression characters themselves. Not all non-numeric characters or white space have been converted, because of the requirement for four sequential non-numeric characters.
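A related need is searching for text that happens to contain special characters, such as a price. One common approach, sketched here, is to escape every special character before building the pattern. The escapeRegExp function is a hypothetical helper, not a built-in method, and the sample strings are made up.

```javascript
// Hypothetical helper: escapes characters that are special in a pattern
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var price = "$4.99 (sale)";
var escaped = escapeRegExp(price); // \$4\.99 \(sale\)

// the escaped text can now be used to build a pattern
var re = new RegExp(escaped, "g");
var result = "Was $5.99, now $4.99 (sale)".replace(re, "[price]");

console.log(escaped);
console.log(result); // Was $5.99, now [price]
```

Without the escaping step, characters such as the dollar sign, period, and parentheses would be interpreted as special characters and the pattern would not match the intended text.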