Chapter 10. Storing Structured Data

You can learn more about working with records in Chapter 7 of Erlang Programming, Section 3.9 of Programming Erlang, Section 2.11 of Erlang and OTP in Action, and Chapter 9 of Learn You Some Erlang For Great Good!. ETS and DETS are in Chapter 10 of Erlang Programming, Chapter 15 of Programming Erlang, Section 2.14 and Chapter 6 of Erlang and OTP in Action, and Chapter 25 of Learn You Some Erlang For Great Good!. Mnesia is covered in Chapter 13 of Erlang Programming, Chapter 17 of Programming Erlang, Section 2.7 of Erlang and OTP in Action, and Chapter 29 of Learn You Some Erlang For Great Good!.

Étude 10-1: Using ETS

In honor of Erlang’s heritage as a language designed for telephony applications, this étude will set up a small database that keeps track of phone calls.

Part One

Create a file named phone_records.hrl that defines a record with these fields:

  • Phone number
  • Starting date (month, day, and year)
  • Starting time (hours, minutes, and seconds)
  • End date (month, day, and year)
  • End time (hours, minutes, and seconds)

You may name the record whatever you wish, and you may use any field names you wish.

Part Two

In a module named phone_ets, create an ETS table for phone calls by reading a file. The function that does this will be named setup/1, and its argument will be the name of the file containing the data.

Copy the following text into a file named call_data.csv and save the file in the same directory where you did part one.

650-555-3326,2013-03-10,09:01:47,2013-03-10,09:05:11
415-555-7871,2013-03-10,09:02:20,2013-03-10,09:05:09
729-555-8855,2013-03-10,09:00:55,2013-03-10,09:02:18
729-555-8855,2013-03-10,09:02:57,2013-03-10,09:03:56
213-555-0172,2013-03-10,09:00:59,2013-03-10,09:03:49
946-555-9760,2013-03-10,09:01:20,2013-03-10,09:03:10
301-555-0433,2013-03-10,09:01:44,2013-03-10,09:04:06
301-555-0433,2013-03-10,09:05:17,2013-03-10,09:07:53
301-555-0433,2013-03-10,09:10:05,2013-03-10,09:13:14
729-555-8855,2013-03-10,09:04:40,2013-03-10,09:07:29
213-555-0172,2013-03-10,09:04:26,2013-03-10,09:06:00
213-555-0172,2013-03-10,09:06:59,2013-03-10,09:10:35
946-555-9760,2013-03-10,09:03:36,2013-03-10,09:04:23
838-555-1099,2013-03-10,09:00:43,2013-03-10,09:02:44
650-555-3326,2013-03-10,09:05:48,2013-03-10,09:09:08
838-555-1099,2013-03-10,09:03:43,2013-03-10,09:06:26
838-555-1099,2013-03-10,09:07:54,2013-03-10,09:10:10
301-555-0433,2013-03-10,09:14:07,2013-03-10,09:15:08
415-555-7871,2013-03-10,09:06:15,2013-03-10,09:09:32
650-555-3326,2013-03-10,09:10:12,2013-03-10,09:13:09

So, how do you read a file? Take just the first three lines, and put them into a file called smallfile.csv, then do the following commands from erl

1> {ResultCode, InputFile} = file:open("smallfile.csv", [read]).
{ok,<0.33.0>}
2> io:get_line(InputFile, "").
"650-555-3326,2013-03-10,09:01:47,2013-03-10,09:05:11\n"
3> io:get_line(InputFile, "").
"415-555-7871,2013-03-10,09:02:20,2013-03-10,09:05:09\n"
4> io:get_line(InputFile, "").
"729-555-8855,2013-03-10,09:00:55,2013-03-10,09:02:18\n"
5> io:get_line(InputFile, "").
eof
6> file:open("nosuchfile", [read]).
{error,enoent}

In the preceding example, lines 1 through 5 show how to open a file and read it. You can tell you are at the end of file when you get an atom (eof) instead of a list (remember, Erlang strings are lists). Line 6 shows what happens if you try to open a file that doesn’t exist.

The phone number is the key for this data. Since there are multiple calls per phone number, you will need a bag type table. To get the individual items from each line, use re:split/2, much as you did in Étude 5-2.

Part Three

Write functions to summarize the number of minutes for a single phone number (summary/1) or for all phone numbers. (summary/0). These functions return a list of tuples in the form:

[{phoneNumber1, minutes]},{phoneNumber2, minutes}, …]

You could write your own code to do time and date calculations to figure out the duration of a phone call, but there’s a limit on how much you really want to re-invent the wheel, especially with something as complex as calendar calculations. Consider, for example, a call that begins on 31 December 2013 at 11:58:36 p.m. and ends on 1 January 2014 at 12:14:22 p.m. I don’t even want to think about calls that start on 28 February and go to the next day.

So, instead, use the calendar:datetime_to_gregorian_seconds/1 function to convert a date and time to the number of seconds since the year zero. (I swear I am not making this up.) The argument to this function is a tuple in the form:

{{year, month, day}, {hours, minutes, seconds}} %% for example
{{2013, 07, 14}, {14, 49, 21}}

Round up any number of seconds to the next minute for ech call. Thus, if a call lasts 4 minutes and 6 seconds, round it up to 5 minutes. Hint: add 59 to the total number of seconds before you div 60.

Now might be the time to rewrite part two so that your dates and times are stored in the appropriate format. That way, you do the conversion from string to tuple only once, instead of every time you ask for a summary.

Here is the sample output.

1> c(phone_ets).
{ok,phone_ets}
2> phone_ets:setup("call_data.csv").
ok
3> phone_ets:summary("415-555-7871").
[{"415-555-7871",7}]
4> phone_ets:summary().
[{"946-555-9760",3},
 {"415-555-7871",7},
 {"729-555-8855",6},
 {"301-555-0433",12},
 {"213-555-0172",9},
 {"650-555-3326",11}]

See a suggested solution in Appendix A.

Étude 10-2: Using Mnesia

I have good news and bad news. First, the bad news. Mnesia is not a relational detabase management system. If you try to use a query list comprehension to join three tables, Erlang will complain that joins with more than two tables are not efficient.

Now, the good news. While trying to find a way around this, I discovered something about query list comprehensions that is really pretty neat, and I’m happy to share it with you.

In this étude, you will use add a table of customer names and use Mnesia query list comprehensions to join data from those tables when producing a summary.

Part One

You will need to add a record for customers to phone_records.hrl. Its fields will be:

  • Phone Number (this is the key)
  • Customer’s last name
  • Customer’s first name
  • Customer’s middle name
  • Rate paid per minute (float)

Again, you may name the record whatever you wish, and you may use any field names you wish.

Part Two

In a module named phone_mnesia, create the Mnesia tables for the two files. The function that does this will be named setup/2, and its arguments will be the names of the file containing the data.

Use the phone call data from Étude 10-1, and use this data for the customers. Put it in a file named customer_data.csv or whatever other name you wish.

213-555-0172,Nakamura,Noriko,,0.12
301-555-0433,Ekberg,Erik,Engvald,0.07
415-555-7871,Alvarez,Alberto,Agulto,0.15
650-555-3326,Girard,Georges,Gaston,0.10
729-555-8855,Tran,Truong,Thai,0.09
838-555-1099,Smith,Samuel,Steven,0.10
946-555-9760,Bobrov,Bogdan,Borisovitch,0.14

You could write two functions that all open a file, read data, split it into fields, write the data to the Mnesia table, and then keep going until end-of-file. These would share a lot of common code. Instead, try writing just one function that does the reading, and pass a higher-order function to it to do the appropriate "split-and-write" operation.

When I solved this problem, my fill_table/5 function took these arguments:

  • The name of the table (an atom)
  • The name of the file to read (a string)
  • The function that adds the data (a higher-order fun)
  • The record_info for the field
  • The type of table. The phone call data is a bag, the customer data is a set.

Part Three

Write a function named summary/3 that takes a last name, first name, and middle name. It produces a tuple that contains the person’s phone number, total number of minutes, and total cost for those minutes.

Here is some sample output.

1> c(phone_mnesia).
{ok,phone_mnesia}
2> phone_mnesia:setup("call_data.csv", "customer_data.csv").
{atomic,ok}
3> phone_mnesia:summary("Smith", "Samuel", "Steven").
[{"838-555-1099",9,0.9}]
4> phone_mnesia:summary("Nakamura", "Noriko", "").
[{"213-555-0172",9,1.08}]

As promised, here’s the good news about query list comprehensions. In this module, you need to access the customer table to match the phone number to the name when collecting the calls for the customer. You also need to access the customer table in order to access the customer’s rate per minute. You don’t want to have to write the specification for the guards on the customer table twice.

As Introducing Erlang notes, "you can use the qlc:q function to hold a list comprehension and the qlc:e function to process it." Specifically, the qlc:q function returns a query handle which you can evaluate and which you can also use in place of a list name in a query list comprehension.

Here’s an example. Let’s say you have tables of people and their pets. In the pet table, the owner_id references the id_number of someone in the person table.

-record(person,
  {id_number, name, age, gender, city, amount_owed}).
-record(animal,
  {id_number, name, species, gender, owner_id}).

You could do a query like this to find a specific set of people, and then to find information about their pets:

get_info() ->
  People = mnesia:transaction(
    fun() -> qlc:e(
      qlc:q( [ P ||
        P <- mnesia:table(person),
        P#person.age >= 21,
        P#person.gender == "M",
        P#person.city == "Podunk"]
        )
      )
    end
  ),

  Pets = mnesia:transaction(
    fun() -> qlc:e(
      qlc:q( [{A#animal.name, A#animal.species, P#person.name} ||
        P <- mnesia:table(person),
        P#person.age >= 21,
        P#person.gender == "M",
        P#person.city == "Podunk",
        A <- mnesia:table(animal),
        A#animal.owner_id == P#person.id_number])
      )
    end
  ),
  [People, Pets].

To avoid duplicating the list and guards for the person table, you can make a query list handle for that query and use it again in the animal search. Note that you don’t have to be in a transacation to create a query handle, but you must be in a transaction to process it.

get_info_easier() ->

  %% "Pre-process" the list comprehension for finding people

  QHandle = qlc:q( [ P ||
    P <- mnesia:table(person),
    P#person.age >= 21,
    P#person.gender == "M",
    P#person.city == "Podunk"]
  ),

  %% Evaluate it to retrieve the people you want

  People = mnesia:transaction(
    fun() -> qlc:e( QHandle ) end
  ),

  %% And use the handle again when retrieving
  %% information about their pets

  Pets = mnesia:transaction(
    fun() -> qlc:e(
      qlc:q( [{A#animal.name, A#animal.species, P#person.name} ||
        P <- QHandle,
        A <- mnesia:table(animal),
        A#animal.owner_id == P#person.id_number])
      )
    end
  ),
  [People, Pets].

See a suggested solution in Appendix A.