Functions that give access to the system from OCaml are grouped into two modules. The first module, Sys, contains those functions common to Unix and other operating systems under which OCaml runs. The second module, Unix, contains everything specific to Unix.
In what follows, we will refer to identifiers from the Sys and
Unix modules without specifying which modules they come from. That is, we
will suppose that we are within the scope of the directives
open Sys and
open Unix. In complete examples, we explicitly write
open, in order to be truly complete.
The Sys and Unix modules can redefine certain
identifiers of the
Pervasives module, hiding previous
definitions. For example,
Pervasives.stdin is different from
Unix.stdin. The previous definitions can always be obtained
through a prefix.
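To compile an OCaml program that uses the Unix library, do this:
    ocamlc -o prog unix.cma mod1.ml mod2.ml mod3.ml
where the program
prog is assumed to consist of the three modules mod1, mod2 and
mod3. The modules can also be compiled separately:
    ocamlc -c mod1.ml
    ocamlc -c mod2.ml
    ocamlc -c mod3.ml
and linked with:
    ocamlc -o prog unix.cma mod1.cmo mod2.cmo mod3.cmo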
In both cases, the argument
unix.cma is the Unix library
written in OCaml. To use the native-code compiler rather than the
bytecode compiler, replace ocamlc with ocamlopt and unix.cma with unix.cmxa.
If the compilation tool
ocamlbuild is used, simply add the
following line to the _tags file:
    <prog.{native,byte}> : use_unix
The Unix system can also be accessed from the interactive system,
also known as the “toplevel”. If your platform supports dynamic
linking of C libraries, start an
ocaml toplevel and type in the directive #load "unix.cma";;
Otherwise, you will need to create an interactive system containing the pre-loaded system functions:
This toplevel can be started by:
When running a program from a shell (command interpreter), the shell passes arguments and an environment to the program. The arguments are the words on the command line that follow the name of the command. The environment is a set of strings of the form variable=value, representing the global bindings of environment variables: bindings set with setenv var val for the csh shell, or with var=val; export var for the sh shell.
The arguments passed to the program are in the string array Sys.argv.
The environment of the program is obtained by the function Unix.environment, which returns it as an array of strings.
A more convenient way of looking up the environment is to use the function Sys.getenv.
Sys.getenv v returns the value associated with the variable name v in
the environment, raising the exception
Not_found if this
variable is not bound.
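The lookup described above can be sketched as follows. This is a minimal illustration that uses only the standard library; the helper name getenv_default is hypothetical and not part of Sys or Unix.

```ocaml
(* Look up an environment variable, falling back to a default value when
   it is not bound. Sys.getenv raises Not_found for an unbound variable.
   The name getenv_default is an assumption made for this example. *)
let getenv_default (name : string) (default : string) : string =
  try Sys.getenv name with Not_found -> default

let () =
  (* HOME is usually bound on Unix systems; the fallback covers the rest. *)
  print_endline ("HOME = " ^ getenv_default "HOME" "(unbound)")
```

Catching Not_found at the point of lookup keeps the exception from propagating to the top of the program when a missing variable is not an error.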
As a first example, here is the
echo program, which prints a
list of its arguments, as does the Unix command of the same name.
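A minimal sketch of such a program, using only Sys.argv and the standard library. The helper echo_line is a name introduced here so that the joining logic can be checked independently of the command line; this version always prints a final newline.

```ocaml
(* Join the arguments that follow the command name with single spaces.
   argv.(0) is the command name itself and is skipped. *)
let echo_line (argv : string array) : string =
  let n = Array.length argv in
  if n <= 1 then ""
  else String.concat " " (Array.to_list (Array.sub argv 1 (n - 1)))

(* Print the arguments of the current program, followed by a newline. *)
let () = print_endline (echo_line Sys.argv)
```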
A program can be terminated at any point with a call to exit.
The argument is the return code to send back to the calling program. The
convention is to return 0 if all has gone well, and to return a
non-zero code to signal an error. In conditional constructions, the
sh shell interprets the return code 0 as the boolean
“true”, and all non-zero codes as the boolean “false”.
When a program terminates normally after executing all of the
expressions of which it is composed, it makes an implicit call to
exit 0. When a program terminates prematurely because an
exception was raised but not caught, it makes an implicit call to exit 2.
The function exit always flushes the buffers of all channels open for
writing. The function
at_exit lets one register other actions
to be carried out when the program terminates.
The last function to be registered is called first. A function registered with
at_exit cannot be unregistered. However, this is not a
real restriction: we can easily get the same effect with a function
whose execution depends on a global variable.
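The idea of a registered action controlled by a global variable can be sketched as follows. All names here (cleanup_enabled, trace, cleanup, cancel_cleanup) are hypothetical and serve only to illustrate the technique.

```ocaml
(* Global switch: setting it to false cancels the registered action. *)
let cleanup_enabled = ref true

(* Records which actions actually ran, so that the effect is observable. *)
let trace : string list ref = ref []

(* The action itself does nothing once the switch is turned off. *)
let cleanup () =
  if !cleanup_enabled then trace := "cleanup" :: !trace

(* Register the action; it will be called when the program terminates. *)
let () = at_exit cleanup

(* at_exit offers no way to unregister a function, but disabling the
   switch has the same effect. *)
let cancel_cleanup () = cleanup_enabled := false
```

The function stays registered for the lifetime of the program; only its behavior changes.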
Unless otherwise indicated, all functions in the Unix module
raise the exception
Unix_error in case of error. This exception carries three arguments, of types error, string and string.
The second argument of the
Unix_error exception is the name of
the system call that raised the error. The third argument identifies,
if possible, the object on which the error occurred; for example, in
the case of a system call taking a file name as an argument, this file name will be
in the third position in
Unix_error. Finally, the first argument
of the exception is an error code indicating the nature of the
error. It belongs to the variant type error.
Constructors of this type have the same names and meanings as those
used in the POSIX convention and certain errors from
UNIX98 and BSD. All other errors use the constructor EUNKNOWNERR.
Given the semantics of exceptions, an error that is not specifically
foreseen and intercepted by a
try propagates up to the top of a
program and causes it to terminate prematurely. In small
applications, treating unforeseen errors as fatal is a good practice.
However, it is appropriate to display the error clearly. To do this, the
Unix module supplies the function handle_unix_error.
The call handle_unix_error f x applies function
f to the argument
x. If this raises the exception Unix_error, a
message is displayed describing the error, and the program is
terminated with exit 2. A typical use is handle_unix_error prog (),
where the function
prog : unit -> unit executes the body of the
program. For reference, here is how handle_unix_error can be written:
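A sketch of such a definition, assuming the program is linked with the Unix library. It follows the behavior described in this section and is not claimed to reproduce the library's source verbatim. The listing is laid out so that line 6 reads Sys.argv.(0) and line 16 calls error_message.

```ocaml
open Unix
let handle_unix_error f arg =
  try
    f arg
  with Unix_error (err, fun_name, arg) ->
    prerr_string Sys.argv.(0);            (* name of the command *)
    prerr_string ": \"";
    prerr_string fun_name;                (* system call that failed *)
    prerr_string "\" failed";
    if String.length arg > 0 then begin   (* object concerned, if any *)
      prerr_string " on \"";
      prerr_string arg;
      prerr_string "\""
    end;
    prerr_string ": ";
    prerr_endline (error_message err);    (* description of the error *)
    exit 2
```

Because exit never returns, the handler has the same type as f arg, so the wrapper can be applied to functions with any result type.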
Functions of the form
prerr_xxx are like the functions
print_xxx, except that they write on the error channel
stderr rather than on the standard output channel stdout.
The primitive error_message, of type
error -> string, returns a message describing the error given as an
argument (line 16). The argument number zero of the program,
Sys.argv.(0), contains the name of the command
that was used to invoke the program (line 6).
handle_unix_error handles fatal errors, i.e. errors
that stop the program. An advantage of OCaml is that it requires
all errors to be handled, if only at the highest level by
halting the program. Indeed, any error in a system call raises an
exception, and the execution thread in progress is interrupted up to
the level where the exception is explicitly caught and handled. This avoids
continuing the program in an inconsistent state.
Errors of type
Unix_error can, of course, be
selectively matched. We will often see the following
function later on:
which is used to execute a function and to restart it automatically when it executes a system call that is interrupted (see section 4.5).
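A minimal sketch of such a restarting function, assuming the Unix library is linked. The name restart_on_EINTR is an assumption made for this example. It matches only the EINTR error code and lets every other exception propagate.

```ocaml
(* Apply f to x, retrying for as long as the call fails with EINTR,
   i.e. for as long as the system call is interrupted.
   Any other error propagates to the caller unchanged. *)
let rec restart_on_EINTR f x =
  try f x with Unix.Unix_error (Unix.EINTR, _, _) -> restart_on_EINTR f x
```

The wildcards ignore the name of the system call and its argument, since only the error code matters for the decision to retry.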
As we will see throughout the examples, system programming often repeats the same patterns. To reduce the code of each application to its essentials, we will want to define library functions that factor out the common parts.
Whereas in a complete program one knows precisely which errors can be raised (and these are often fatal, resulting in the program being stopped), we generally do not know the execution context in the case of library functions. We cannot suppose that all errors are fatal. It is therefore necessary to let the error return to the caller, which will decide on a suitable course of action (e.g. stop the program, or handle or ignore the error). However, the library function in general will not allow the error to simply pass through, since it must maintain the system in a consistent state. For example, a library function that opens a file and then applies an operation to its file descriptor must take care to close the descriptor in all cases, including those where the processing of the file causes an error. This is in order to avoid a file descriptor leak, leading to the exhaustion of file descriptors.
Furthermore, the operation applied to a file may be defined by a function that was received as an argument, and we don’t know precisely when or how it can fail (but the caller in general will know). We are thus often led to protect the body of the processing with “finalization” code, which must be executed just before the function returns, whether normally or exceptionally.
There is no built-in finalize construct in
the OCaml language, but one can easily be defined:
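A minimal sketch of such a function, under the name try_finalize used later in this section. It relies only on the core language; the parameter names are chosen to match the description that follows.

```ocaml
(* Run [f x], then always run [finally y], whether [f x] returned
   normally or raised an exception. *)
let try_finalize f x finally y =
  (* On failure, run the finalizer and raise the exception again. *)
  let res = try f x with exn -> finally y; raise exn in
  (* On success, run the finalizer and return the saved result. *)
  finally y;
  res
```

For instance, try_finalize input_line ic close_in ic reads one line from the input channel ic and closes the channel whether or not the read succeeds.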
This function takes the main body
f and the finalizer
finally, each in the form of a function, and two parameters x and
y, which are passed to their respective functions. The body
of the program
f x is executed first, and its result is kept
aside to be returned after the execution of the finalizer
finally. In case the program fails, i.e. raises an exception exn,
the finalizer is run and the exception
exn is raised
again. If both the main function and the finalizer fail, the
finalizer’s exception is raised (one could choose to have the main
function’s exception raised instead).
In the rest of this course, we use an auxiliary library Misc,
which contains several useful functions like
try_finalize that are often
used in the examples. We will introduce them as they are needed. To
compile the examples of the course, the definitions of the Misc
module need to be collected and compiled.
The Misc module also contains certain functions, added for
illustration purposes, that will not be used in the course. These
simply enrich the
Unix library, sometimes by redefining the
behavior of certain functions. The
Misc module must thus take
precedence over the Unix module.
The course provides numerous examples. They can be compiled with OCaml, version 4.01.0. Some programs will have to be slightly modified in order to work with older versions.
There are two kinds of examples: “library functions” (very general functions that can be reused) and small applications. It is important to distinguish between the two. In the case of library functions, we want their context of use to be as general as possible. We will thus carefully specify their interface and handle all special cases with care. In the case of small applications, an error is often fatal and causes the program to stop executing. It is sufficient to report the cause of an error, without needing to return to a consistent state, since the program is stopped immediately thereafter.