A Concise Guide to Erlang
Copyright ©2010, David Matuszek

About Erlang

Erlang is an expression-oriented, single-assignment, garbage-collected, purely functional language. There are no loops, so recursion is heavily used.

Erlang is quite a small language. It is of interest primarily because of its approach to concurrency, using Actors. Actors have subsequently been incorporated into other languages, most importantly Clojure and Scala. Erlang is most suitable for building extremely reliable, fault-tolerant systems that do not need to be shut down in order to be upgraded. Its extremely convenient bit-manipulation makes it an excellent language for low-level communications.

Running Erlang

As with many languages, Erlang can be run in a REPL (Read-Eval-Print-Loop) "shell." Short pieces of code can be tested directly in the shell. To start the shell, enter erl at the command line.

Compared to the REPL for many other languages, Erlang's REPL is quite limited:

Directives

Every Erlang program should begin with a module directive, of the form
          -module(filename).
and saved in a file with the name filename.erl.

To provide functions defined in this file to other programs, use
          -export([function1/arity1, ..., functionN/arityN]).
where the "arity" is the number of parameters expected by the function.

To use functions defined in another file, use
          -import(filename, [function1/arity1, ..., functionN/arityN]).
where the "arity" is as above. Imported functions may be called without a filename: prefix.
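As a minimal sketch of how these three directives fit together, the module below would be saved as shapes.erl. The module name, the function names, and the tuple shapes are hypothetical, chosen only for illustration; lists:sum/1 is a standard library function.

```erlang
-module(shapes).
-export([area/1, total_area/1]).   % functions visible to other modules
-import(lists, [sum/1]).           % lets us write sum(...) instead of lists:sum(...)

%% area/1 has arity 1: it takes a single argument (a tagged tuple).
area({square, Side}) -> Side * Side;
area({rectangle, Width, Height}) -> Width * Height.

%% total_area/1 uses the imported sum/1 without the lists: prefix.
total_area(Shapes) -> sum([area(S) || S <- Shapes]).
```

From another module or the shell, the exported functions would be called with the module prefix, for example shapes:area({square, 3}).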

To define a record:
          -record(Name, {Key1 = Default1, ..., KeyN = DefaultN}).
where the Keys are atoms; the default values are optional. Records may be defined in Erlang source files or in files with the extension .hrl, but may not be defined in the REPL.

To specify compiler options:
          -compile(Options).
The export_all option is useful for debugging, but should be avoided in production code.

Documentation

Comments begin with a % character and continue to the end of the line.

Erlang uses EDoc (inspired by Javadoc). EDoc comments go before a module or a function. Some of the tags that can be used for a module are @author, @copyright, @deprecated, @doc (followed by XHTML), and @version. Some of the tags that can be used for a function are @deprecated, @doc (followed by XHTML), @private, and @spec.

Variables

Erlang is a single-assignment language. That is, once a variable has been given a value, it cannot be given a different value. In this sense it is like algebra rather than like most conventional programming languages.

Variables must begin with a capital letter or an underscore, and are composed of letters, digits, and underscores.

The special variable _ is a "don't care" variable--it does not retain its value. It is as if every occurrence of _ is a new, different variable.

Erlang issues a warning if a variable occurs only once in a function. To eliminate this warning, use an underscore as the first character of the variable name.
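The rules above can be illustrated with a small sketch. The module and function names here are hypothetical. The first function shows that a bound variable cannot take a different value (the failed match raises a badmatch error); the second shows _ and an underscore-prefixed variable.

```erlang
-module(vars_demo).
-export([try_rebind/1, second/1]).

%% X is bound once. Matching it against a different value fails
%% with a badmatch error rather than reassigning it.
try_rebind(New) ->
    X = 1,
    try
        X = New,          % succeeds only when New is also 1
        ok
    catch
        error:{badmatch, Value} -> {cannot_rebind, Value}
    end.

%% _ ignores the first element. _Rest is named for readability but
%% never used; the leading underscore suppresses the warning.
second([_, Second | _Rest]) -> Second.
```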

Data types

Erlang has:

Type tests and conversions

To test for or convert "strings," recall that strings are actually lists of integers.

Type tests

is_atom(X) is_function(X) is_number(X) is_tuple(X)
is_binary(X) is_function(X, N) is_pid(X) is_record(X)
is_constant(X) is_integer(X) is_port(X) is_record(X, Tag)
is_float(X) is_list(X) is_reference(X) is_record(X, Tag, N)

Type conversions

atom_to_list(Atom) float_to_list(Float) list_to_binary(List) round(Float)
binary_to_list(Binary) integer_to_list(Integer) list_to_integer(List) trunc(Float)
float(Integer) list_to_atom(List) list_to_tuple(List)  
list_to_float(List) list_to_existing_atom(List) tuple_to_list(Tuple)
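A few of these tests and conversions are sketched below as pattern matches that succeed when the left-hand value equals the result. The module name is hypothetical; every function called is a built-in.

```erlang
-module(convert_demo).
-export([run/0]).

run() ->
    "abc"        = atom_to_list(abc),
    abc          = list_to_atom("abc"),
    "42"         = integer_to_list(42),
    42           = list_to_integer("42"),
    [97, 98, 99] = "abc",                 % a string is a list of integers
    {1, 2}       = list_to_tuple([1, 2]),
    [1, 2]       = tuple_to_list({1, 2}),
    3.0          = float(3),
    4            = round(3.5),
    3            = trunc(3.9),
    true         = is_list("abc"),        % so is_list/1 is true for strings
    false        = is_atom("abc"),
    ok.
```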

Operations

Arithmetic operations
Operation Description
+X unary plus
-X unary minus
X * Y multiplication
X / Y division (yields a float)
X div Y integer division
X rem Y remainder
X + Y addition
X - Y subtraction
Term Comparisons
Comparison Description
X < Y less than
X =< Y equal or less than (not X <= Y !)
X == Y equal (use only for comparing integers and floats)
X /= Y not equal (use only for comparing integers and floats)
X >= Y greater or equal
X > Y greater
X =:= Y equal/identical to
X =/= Y unequal/not identical to
Any term may be compared with any other term. The ordering is: number < atom < reference < fun < port < pid < tuple < list < binary.
Boolean operations
Operation Description
not X not
X and Y and
X or Y or
X xor Y exclusive or
X andalso Y short-circuit and
X orelse Y short-circuit or
Bitwise operations
Operation Description
bnot X bitwise not
X band Y bitwise and
X bor Y bitwise or
X bxor Y bitwise exclusive or
X bsl N bitshift left by N
X bsr N bitshift right by N
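The operator tables above can be checked with a short sketch. The module and function names are hypothetical; each line is a match that succeeds only if the operator behaves as described.

```erlang
-module(ops_demo).
-export([run/0, safe_positive_ratio/2]).

run() ->
    3.5   = 7 / 2,               % / always yields a float
    3     = 7 div 2,
    1     = 7 rem 2,
    true  = (1 == 1.0),          % == compares by numeric value
    false = (1 =:= 1.0),         % =:= also requires the same type
    true  = (2 =< 3),
    true  = (1 < an_atom),       % number < atom in the term ordering
    true  = ({a, b} < [a, b]),   % tuple < list
    8     = 12 band 10,          % 1100 band 1010 = 1000
    14    = 12 bor 10,           % 1100 bor  1010 = 1110
    6     = 12 bxor 10,          % 1100 bxor 1010 = 0110
    16    = 1 bsl 4,
    2     = 16 bsr 3,
    ok.

%% andalso does not evaluate its right side when the left side is false,
%% so the division is skipped when Y is 0.
safe_positive_ratio(X, Y) ->
    (Y =/= 0) andalso (X div Y > 0).
```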

Pattern matching

The pattern matching expression

Pattern matching is the fundamental operation in Erlang. A simple pattern matching expression looks like an assignment statement in other languages:
     pattern = expression.
This says to evaluate the expression, and try to match the result to the pattern. In this context, it is an error if the pattern match does not succeed. Note that every statement in Erlang ends with a period.

In general, pattern matching succeeds in the following cases:

Examples

Variable = expression.
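The expression is evaluated. If Variable is unbound, it is bound to the resulting value; if it is already bound, the match succeeds only if the two values are equal.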
[H|T] = expression.
If the value of the expression is a nonempty list, H is matched against the head of the list (the first element) and T is matched against the tail of the list (the remaining elements). If either fails to match, or if the expression does not evaluate to a nonempty list, the pattern match fails. Note that H and T may be variables, literals, or expressions.
[H1, H2, ..., HN|T] = expression.
H1, H2, ..., HN are matched against the first N elements of the list, and T is matched against the remaining elements. If any part fails to match, the pattern match fails.
{A, B, C} = {X, Y, Z}.
The expressions on the right are evaluated and compared, in order, against the patterns on the left (that is, A=X, B=Y, C=Z). In order for the pattern match to succeed, the tuples must be the same length, and corresponding parts must match.
#Name{Key = Variable, ..., Key = Variable} = Record.
The Variables are matched against the values of the named Keys in the Record.
<<Pattern:Size, ..., Pattern:Size>> = Binary.
The values in the Binary are unpacked into their component parts and matched against the Patterns.
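Some of the patterns above are sketched here in function heads, where they work the same way as on the left of =. The module and function names are hypothetical.

```erlang
-module(match_demo).
-export([split/1, first_two/1, nibbles/1, unwrap/1]).

%% [H|T] matches any nonempty list.
split([H | T]) -> {H, T}.

%% The first two elements are matched individually, the rest as a list.
first_two([A, B | Rest]) -> {A, B, Rest}.

%% Unpack one byte into two 4-bit fields.
nibbles(<<Hi:4, Lo:4>>) -> {Hi, Lo}.

%% A tuple pattern; a failed match with = raises a badmatch error.
unwrap(Result) ->
    try
        {ok, Value} = Result,
        Value
    catch
        error:{badmatch, Other} -> {no_match, Other}
    end.
```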
 

Case expressions

The case expression uses pattern matching, and has the following syntax:

case Expression of
Pattern1 [when Guard1] -> Expression_sequence1;
Pattern2 [when Guard2] -> Expression_sequence2;
...
PatternN [when GuardN] -> Expression_sequenceN
end

The brackets indicate that the when part (which is just a condition) is optional. The expression is evaluated, and the patterns are tried, in order. When a pattern matches and its associated guard, if present, is true, the corresponding expression sequence is evaluated. The value of an expression sequence is the value of the last expression, and that becomes the value of the case.
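A minimal sketch of a case expression with and without guards follows. The module name, function name, and result atoms are hypothetical.

```erlang
-module(case_demo).
-export([describe/1]).

describe(X) ->
    case X of
        0                           -> zero;
        N when is_integer(N), N > 0 -> positive;
        N when is_integer(N)        -> negative;
        []                          -> empty_list;
        [_ | _]                     -> nonempty_list;
        _                           -> something_else   % catch-all pattern
    end.
```

Because the patterns are tried in order, the catch-all _ must come last.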

If expressions

The if expression is like a case expression without the pattern matching.

if
Guard1 ->
Expression_sequence1;
Guard2 ->
Expression_sequence2;
...
GuardN ->
Expression_sequenceN
end

The value of the if expression is the value of the expression sequence that is chosen. The value of an expression sequence is the value of the last expression executed. It is an error if no guard succeeds; hence, it is common to use true as the last guard.
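For instance, a small sketch using true as the final guard (the module and function names are hypothetical):

```erlang
-module(if_demo).
-export([sign/1]).

sign(N) ->
    if
        N > 0 -> positive;
        N < 0 -> negative;
        true  -> zero        % final guard always succeeds
    end.
```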

Guards

Guards may not have side effects. To ensure this, user-defined functions are not allowed in guards. Things that may be used are: type tests, boolean operators, bitwise operators, arithmetic operators, relational operators, and the following BIFs (Built In Functions):

abs(Number) hd(List) node(X) size(TupleOrBinary)
element(Integer, Tuple) length(List) round(Number) trunc(Number)
float(Number) node() self() tl(List)
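The sketch below uses guards built from type tests, arithmetic, comparisons, and the length/1 BIF. The module and function names are hypothetical. It also relies on a detail not stated above: within a guard, a comma combines tests with "and" and a semicolon combines them with "or".

```erlang
-module(guard_demo).
-export([classify/1]).

classify(X) when is_integer(X), X rem 2 =:= 0 -> even_integer;
classify(X) when is_integer(X)                -> odd_integer;
classify(X) when is_list(X), length(X) > 2    -> long_list;
classify(X) when is_tuple(X); is_list(X)      -> short_list_or_tuple;
classify(_)                                   -> other.
```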

Defining functions

A function is a value, or first-class object. That means it can be assigned to a variable, or given as an argument to a function, or returned as the value of a function.

Named functions

The syntax for a named function is a series of one or more clauses:

name(Patterns1) -> Body1;
name(Patterns2) -> Body2;
...
name(PatternsN) -> BodyN.
where

Recursion

Recursion is when a function calls itself, either directly (f calls f) or indirectly (f calls g, which calls h, ..., which calls f). Any program which uses a loop can be rewritten to use recursion, and vice versa. Erlang has no loops, therefore recursion is used heavily.

Here is one way to write the equivalent of a loop in Erlang:

myFunction(Args1) ->
    Args2 = someExpression(Args1),
    myFunction(Args2).

Tail recursion is when the recursive call is the very last thing done in the function. As an example, the usual definition of the factorial function,
    factorial(0) -> 1;
    factorial(N) -> N * factorial(N - 1).

is not tail recursive, because a multiplication is performed after the recursive call.

In general, each recursive call adds information to an internal stack; very deep recursions can cause Erlang to run out of memory. Tail recursion is desirable because the compiler can easily change a tail recursion into a loop, which does not add information to the stack, and therefore does not cause memory problems.

Functions that are not tail recursive (such as factorial) can usually be rewritten as tail recursive functions, with the aid of a helper function. As with many optimizations, this is not recommended until proven necessary, because the resultant code is harder to read and understand.
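One possible tail-recursive rewrite of factorial is sketched below. The helper factorial/2 and its accumulator argument are illustrative choices, not the only way to do this; the module name is hypothetical.

```erlang
-module(fact).
-export([factorial/1]).

%% Public entry point: start the accumulator at 1.
factorial(N) when is_integer(N), N >= 0 -> factorial(N, 1).

%% Helper: the multiplication happens before the recursive call,
%% so the call is the very last thing done in the clause.
factorial(0, Acc) -> Acc;
factorial(N, Acc) -> factorial(N - 1, N * Acc).
```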

Anonymous functions

The syntax for an anonymous function is

fun(Patterns1) -> Body1;
(Patterns2) -> Body2;
...
(PatternsN) -> BodyN
end

Functions as first-class objects

Functions are values. That is, they may be assigned to variables, passed as arguments to functions, and returned as the result of functions.

An anonymous function may be used as a literal value. A named function may be referred to by using the syntax fun FunctionName/Arity.
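A brief sketch of these uses follows: an anonymous function bound to a variable, a named function passed with fun Name/Arity, a function returned from a function, and a multi-clause anonymous function. All names are hypothetical.

```erlang
-module(fun_demo).
-export([run/0, double/1]).

double(X) -> 2 * X.

%% Takes a function as an argument.
apply_twice(F, X) -> F(F(X)).

%% Returns a function as its result.
make_adder(N) -> fun(X) -> X + N end.

run() ->
    Square = fun(X) -> X * X end,
    16 = apply_twice(Square, 2),
    8  = apply_twice(fun double/1, 2),
    Add3 = make_adder(3),
    10 = Add3(7),
    Sign = fun(0) -> zero;
              (N) when N > 0 -> positive;
              (_) -> negative
           end,
    negative = Sign(-5),
    ok.
```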

Lists

A list literal can be written as a bracketed, comma-separated list of values. The values may be of different types. Example: [5, "abc", [3.2, {a, <<255>>}]].

A list comprehension has the syntax [Expression || Generator, GuardOrGenerator, ..., GuardOrGenerator]
where

Example list comprehension:
     N = [1, 2, 3, 4, 5].
     L = [10 * X + Y || X <- N, Y <- N, X < Y].  % Result is [12,13,14,15,23,24,25,34,35,45]

hd(L) returns the first element in the list L; tl(L) returns the list of remaining elements.

Selected operations on lists

The following operations are predefined.

The following functions are in the lists module. To call them, either first import them, or prepend lists: to the function call. The definitions are copied from http://www.erlang.org/doc/man/lists.html. Of these, the operations map, filter, foldl, and seq are the most commonly used.
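As a short sketch of the four most commonly used functions, called here with the lists: prefix (the module name lists_demo is hypothetical):

```erlang
-module(lists_demo).
-export([run/0]).

run() ->
    Ns = lists:seq(1, 5),
    [1, 2, 3, 4, 5]  = Ns,
    [2, 4, 6, 8, 10] = lists:map(fun(X) -> 2 * X end, Ns),
    [2, 4]           = lists:filter(fun(X) -> X rem 2 =:= 0 end, Ns),
    %% foldl passes each element and the accumulator, left to right.
    15               = lists:foldl(fun(X, Sum) -> X + Sum end, 0, Ns),
    [5, 4, 3, 2, 1]  = lists:foldl(fun(X, Acc) -> [X | Acc] end, [], Ns),
    1                = hd(Ns),
    [2, 3, 4, 5]     = tl(Ns),
    ok.
```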

Selected operations on strings

Strings are lists of ASCII values, so all the list operations apply. The following are in the string module, so either import them or prepend each function call with string:. The definitions are copied from http://www.erlang.org/doc/man/string.html.

Records

Records are declared in a file with the syntax -record(Name, {Key1 = Default1, ..., KeyN = DefaultN}).

To read the record declarations from a file, use the function rr("records.hrl").

To create a record (an instance of a declared record type), use the syntax
          Variable1 = #Name{Key = Value, ..., Key = Value}.

The default value is used for any omitted Key=Value pairs. A new, modified record may be created with the syntax
          Variable2 = Variable1#Name{Key = Value, ..., Key = Value}.

Values may be extracted from a record by using pattern matching:
          #Name{Key = Variable, ..., Key = Variable} = Record.
This assigns to the Variables the corresponding Values in the Record.

Pattern matching may be used in function definitions:
          FunctionName(#Name{Key = Variable, ..., Key = Variable} = Variable) -> FunctionBody.
This makes the selected values, and the entire record (the last Variable) available in the function body.

A record is actually a tuple; the keys are just syntactic sugar available to the compiler. The function rf(Record) tells Erlang to drop the keys and treat the variable Record as the tuple {Name, Variable1, ..., VariableN}. This changes the appearance of the variable in the program, not its actual value.
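The record operations above can be sketched in one small module. The record name person and its fields are hypothetical. The field-access form Record#Name.Key used on one line is standard Erlang syntax, though it is not described above.

```erlang
-module(record_demo).
-export([run/0]).

-record(person, {name, age = 0, city = "unknown"}).

%% Pattern match in the function head: Age is extracted, and P is
%% bound to the entire record.
birthday(#person{age = Age} = P) -> P#person{age = Age + 1}.

run() ->
    P1 = #person{name = "Ann", age = 30},   % city takes its default
    "unknown" = P1#person.city,
    P2 = birthday(P1),                      % a new, modified record
    #person{name = Name, age = Age} = P2,
    {"Ann", 31} = {Name, Age},
    %% Underneath, the record is a tuple tagged with the record name.
    {person, "Ann", 31, "unknown"} = P2,
    ok.
```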

The process dictionary

The process dictionary is a mutable hash table that is private to the current process. Keys are atoms; the value associated with a key may be changed. The use of a process dictionary negates many of the advantages of a single-assignment functional language, hence its use is strongly discouraged. Supplied operations are:

put(Key, Value) -> OldValue
Associates the Value with the Key, returning the previous value associated with the Key, or the atom undefined.
get(Key) -> Value
Returns the Value currently associated with the Key, or the atom undefined.
get() -> [{Key, Value}, ..., {Key, Value}]
Returns a list of all Key/Value tuples.
get_keys(Value) -> [Key, ..., Key]
Returns a list of Keys having the given Value.
erase(Key) -> Value
Returns the Value currently associated with the Key, or the atom undefined, and removes the Key/Value pair from the process dictionary.
erase() -> [{Key, Value}, ..., {Key, Value}]
Returns a list of all Key/Value tuples, and erases the contents of the process dictionary.
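A minimal sketch of these operations follows; the module name and the key counter are hypothetical. It runs in a freshly spawned process, so that process's dictionary starts out empty, and it reports the outcome back to the caller.

```erlang
-module(pd_demo).
-export([run/0]).

run() ->
    Parent = self(),
    spawn(fun() -> Parent ! {pd_demo, check()} end),
    receive
        {pd_demo, Result} -> Result
    after 1000 ->
        timeout
    end.

check() ->
    undefined = put(counter, 1),   % no previous value
    1         = put(counter, 2),   % returns the old value
    2         = get(counter),
    undefined = get(missing),
    [counter] = get_keys(2),
    2         = erase(counter),
    undefined = get(counter),
    ok.
```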

Concurrency

Concurrent programming is very simple in Erlang. There are three primitives:

Primitive Description
Pid = spawn(Fun)

Creates and starts a new process ("Actor") and tells it to evaluate Fun. The new process is a very lightweight thread, managed by Erlang, not an operating system process.

A previously defined function may be passed in with the syntax fun FunctionName/arity.

Pid ! Message Sends the Message to the Pid process. This is an asynchronous operation, that is, execution continues without waiting for a reply. The value of the expression is the Message itself.
receive
  Pattern1 [when Guard1] -> Expression_sequence1;
  Pattern2 [when Guard2] -> Expression_sequence2;
  ...
  PatternN [when GuardN] -> Expression_sequenceN
after Timeout ->
  TimeoutExpressionSequence
end

The semantics are similar to those of the case expression. The syntax is a bit complex so that a process can handle messages of many different types.

The after clause is optional. If used:

  • A positive Timeout will cause Erlang to execute the TimeoutExpressionSequence if no message is received within that number of milliseconds.
  • A zero Timeout will cause Erlang to handle a matching message, if any, then immediately execute the TimeoutExpressionSequence.
  • A Timeout of the atom infinity will cause the TimeoutExpressionSequence to never be executed.

If there are no patterns between receive and after, the statement "sleeps" for the given number of milliseconds.

Every process has a mailbox. Messages go into the mailbox on a first-come, first-served basis. When the receiving process examines the mailbox (with the receive statement), it takes the first message that can be matched by some Pattern, and executes the corresponding Expression_sequence. Unmatched messages are left in the mailbox.

To send a message to a process, you must know its process id (Pid). If you want the process to send you back a response, you must tell it your Pid, usually as part of the message; for example, Pid ! {MyPid, MessageData}.
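The three primitives can be seen together in the following sketch of a tiny server process. The module name, the message shapes, and the 1000 ms timeout are all hypothetical choices. Each request carries the sender's Pid so the server knows where to reply.

```erlang
-module(echo_demo).
-export([start/0, ask/2]).

%% spawn/1 returns the Pid of the new process.
start() -> spawn(fun loop/0).

%% The server handles one message, then calls itself (a tail call)
%% to wait for the next one.
loop() ->
    receive
        {From, {double, N}} when is_number(N) ->
            From ! {self(), 2 * N},
            loop();
        {From, stop} ->
            From ! {self(), stopped};        % no recursive call: process ends
        {From, _Other} ->
            From ! {self(), unknown_request},
            loop()
    end.

%% Client side: send a request, then wait for a reply from that Pid.
ask(Pid, Request) ->
    Pid ! {self(), Request},
    receive
        {Pid, Reply} -> Reply
    after 1000 ->
        timeout
    end.
```

Because Pid is already bound in the receive pattern {Pid, Reply}, only replies from that particular process match; other messages stay in the mailbox.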

It is also possible to register a Pid, thus making it globally available. Here are the BIFs (built-in functions) for doing that:

Exceptions

In addition to exceptions resulting from program errors, there are three kinds of exceptions that the programmer can deliberately generate:

Code which might throw an exception can be placed in a try..catch statement, with this syntax:

try FunctionOrExpressionSequence of
    Pattern1 [when Guard1] -> Expressions1;
    ...
catch
    ExType1:ExPattern1 [when ExGuard1] -> ExExpressions1;
    ...
after
    AfterExpressions
end

and this semantics:
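As a hedged illustration, the sketch below wraps a call in try..catch and reports what happened. It assumes the three standard exception classes, throw, exit, and error, which are raised by the built-ins throw/1, exit/1, and erlang:error/1. The module name, function name, and result tuples are hypothetical.

```erlang
-module(exc_demo).
-export([classify/1]).

%% F is a zero-argument fun to be evaluated under protection.
classify(F) ->
    try F() of
        Value -> {ok, Value}              % no exception: match the result
    catch
        throw:Term   -> {thrown, Term};
        exit:Reason  -> {exited, Reason};
        error:Reason -> {error, Reason}
    after
        %% Evaluated whether or not an exception occurred;
        %% its value is discarded.
        ok
    end.
```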

Input/Output

As with most languages, there are a lot of I/O routines. Files can be read as binary, as a sequence of lines, or as Erlang terms. This paper describes only line-oriented I/O.

On output, data is interpolated (inserted) into the FormatString at the following locations (excluding ~n):

Input from the console

Line = io:get_line(Prompt). % Prompt is a string or an atom

Output to the console

io:format(FormatString, ListOfData).
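A small sketch of format strings follows. It uses io_lib:format/2, which builds the text instead of printing it, so the result can be inspected; io:format/2 accepts the same format string and data list. The control sequences used are ~s (string), ~p (pretty-printed term), ~w (term in standard syntax), and ~n (newline). The module and function names are hypothetical.

```erlang
-module(format_demo).
-export([greeting/2, print_greeting/2]).

%% Build the formatted text as a flat string.
greeting(Name, Items) ->
    lists:flatten(
      io_lib:format("Hello, ~s! You have ~p items: ~w~n",
                    [Name, length(Items), Items])).

%% Print the same text to the console; io:format returns ok.
print_greeting(Name, Items) ->
    io:format("Hello, ~s! You have ~p items: ~w~n",
              [Name, length(Items), Items]).
```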

Input from a file

{ok, Stream} = file:open(FileName, read).
Line = io:get_line(Stream, ''). % May return eof
file:close(Stream).

Output to a file

{ok, Stream} = file:open(FileName, write).
io:format(Stream, FormatString, ListOfData).
file:close(Stream).
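Putting the file calls together, here is a sketch that writes a list of lines and reads them back. The module and function names are hypothetical, and the open mode is given in list form. Each line read keeps its trailing newline.

```erlang
-module(file_demo).
-export([write_lines/2, read_lines/1]).

%% Write each string in Lines on its own line; returns ok.
write_lines(FileName, Lines) ->
    {ok, Stream} = file:open(FileName, [write]),
    lists:foreach(fun(Line) -> io:format(Stream, "~s~n", [Line]) end, Lines),
    file:close(Stream).

%% Read all lines until eof.
read_lines(FileName) ->
    {ok, Stream} = file:open(FileName, [read]),
    Lines = read_all(Stream, []),
    ok = file:close(Stream),
    Lines.

%% Tail-recursive helper: accumulate lines in reverse, then reverse once.
read_all(Stream, Acc) ->
    case io:get_line(Stream, '') of
        eof             -> lists:reverse(Acc);
        {error, Reason} -> erlang:error(Reason);
        Line            -> read_all(Stream, [Line | Acc])
    end.
```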

Unix-like commands

cd("path/to/directory").
Change directory. Uses forward slashes (/) even on Windows, and argument must be quoted. Responds ok even if given a bad path, so look
pwd().
Prints the working directory.
ls().
Prints the contents of the current directory.
ls("path/to/directory").
Prints the contents of the given directory.
 

Unit Testing

Unit tests can be put in the same file (module) as the functions being tested, or they can be placed in a separate file. Tests in a separate file have access only to the functions being exported.

To do unit testing, include the declaration

-include_lib("eunit/include/eunit.hrl").
Test functions must have a name ending in _test_. The body of the test function is typically a list of calls to an assert function. Some assert functions are:

Detailed information is available at http://www.erlang.org/doc/apps/eunit/.
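A minimal sketch of a test kept in the same module as the function it tests is shown below. The module name and the function double/1 are hypothetical. ?_assertEqual and ?_assert are macros from eunit.hrl; the leading underscore makes each one a test object, so the _test_ function can return them as a list.

```erlang
-module(math_tests).
-include_lib("eunit/include/eunit.hrl").

%% The function under test.
double(N) -> 2 * N.

%% The name ends in _test_, and the body is a list of assertions.
double_test_() ->
    [?_assertEqual(4, double(2)),
     ?_assertEqual(0, double(0)),
     ?_assert(double(-1) < 0)].
```

After compiling, the tests can be run from the shell with eunit:test(math_tests).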