Functions that give access to the system from OCaml are grouped into two modules. The first module, Sys, contains those functions common to Unix and other operating systems under which OCaml runs. The second module, Unix, contains everything specific to Unix.
In what follows, we refer to identifiers from the Sys and Unix modules without specifying which module they come from; that is, we assume that we are within the scope of the directives open Sys and open Unix. In complete examples, we write the open directives explicitly so that the examples are truly self-contained.
The Sys and Unix modules redefine certain identifiers of the Pervasives module, so opening them hides the previous definitions. For example, Pervasives.stdin is different from Unix.stdin. The previous definitions can always be obtained through a prefix.
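To see the shadowing concretely, the following sketch compares stdin before and after open Unix. It assumes OCaml 4.07 or later, where the standard library is reachable through the prefix Stdlib; in the version used by this course the prefix is Pervasives. The type annotations make the compiler check which definition each occurrence refers to.

```ocaml
(* Before [open Unix], [stdin] is the standard library's input channel. *)
let chan : in_channel = stdin

open Unix

(* After [open Unix], [stdin] denotes Unix.stdin, a file descriptor. *)
let fd : file_descr = stdin

(* The hidden definition remains reachable through its module prefix
   (Stdlib on OCaml 4.07 and later, Pervasives in older versions). *)
let chan_again : in_channel = Stdlib.stdin
```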
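To compile an OCaml program that uses the Unix library, run:
ocamlc -o prog unix.cma mod1.ml mod2.ml mod3.ml
where the program prog is assumed to consist of the three modules mod1, mod2 and mod3. The modules can also be compiled separately:
ocamlc -c mod1.ml
ocamlc -c mod2.ml
ocamlc -c mod3.ml
and linked with:
ocamlc -o prog unix.cma mod1.cmo mod2.cmo mod3.cmo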
In both cases, the argument unix.cma is the Unix library written in OCaml. To use the native-code compiler rather than the bytecode compiler, replace ocamlc with ocamlopt and unix.cma with unix.cmxa (and, when linking separately compiled modules, the .cmo object files with .cmx files).
If the compilation tool ocamlbuild is used, simply add a line such as the following to the _tags file, which attaches the use_unix tag to the program:
<prog.{native,byte}> : use_unix
The Unix system can also be accessed from the interactive system, also known as the “toplevel”. If your platform supports dynamic linking of C libraries, start an ocaml toplevel and type in the directive:
#load "unix.cma";;
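Otherwise, you will need to create an interactive system containing the pre-loaded system functions, for example with ocamlmktop (the name ocamlunix is arbitrary):
ocamlmktop -o ocamlunix unix.cma
This toplevel can be started by:
./ocamlunix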
When running a program from a shell (command interpreter), the shell passes arguments and an environment to the program. The arguments are the words on the command line that follow the name of the command. The environment is a set of strings of the form variable=value, representing the global bindings of environment variables: bindings set with setenv var val for the csh shell, or with var=val; export var for the sh shell.
The arguments passed to the program are in the array Sys.argv : string array.
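A minimal sketch of inspecting the argument array; the helper name format_args is hypothetical.

```ocaml
(* Render each command-line argument with its index.
   Sys.argv.(0) holds the name under which the program was invoked. *)
let format_args (argv : string array) : string list =
  Array.to_list
    (Array.mapi (fun i arg -> Printf.sprintf "argv.(%d) = %s" i arg) argv)

let () = List.iter print_endline (format_args Sys.argv)
```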
The environment of the program is obtained by the function Unix.environment : unit -> string array.
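Each string returned by Unix.environment has the form variable=value. The sketch below splits such strings into pairs; the helper names split_binding and bindings are hypothetical, and treating an entry without = as having an empty value is an assumption of this sketch.

```ocaml
(* Split one "variable=value" string into a (variable, value) pair.
   Only the first '=' separates the two parts, since values may contain '='.
   Assumption: an entry without '=' is mapped to an empty value. *)
let split_binding (s : string) : string * string =
  match String.index_opt s '=' with
  | Some i -> (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
  | None -> (s, "")

(* All bindings of the current process environment. *)
let bindings () : (string * string) list =
  Array.to_list (Array.map split_binding (Unix.environment ()))

let () = Printf.printf "%d environment bindings\n" (List.length (bindings ()))
```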
A more convenient way of looking up the environment is to use the function Sys.getenv : string -> string. Sys.getenv v returns the value associated with the variable name v in the environment, raising the exception Not_found if this variable is not bound.
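Since an unbound variable raises Not_found, callers often supply a default. The sketch below uses a hypothetical helper getenv_default; from OCaml 4.05 on, Sys.getenv_opt offers the same lookup returning an option instead.

```ocaml
(* Look up [var] in the environment, returning [default] when
   Sys.getenv raises Not_found because the variable is unbound. *)
let getenv_default (var : string) (default : string) : string =
  try Sys.getenv var with Not_found -> default

let () = print_endline ("HOME is " ^ getenv_default "HOME" "(unset)")
```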
As a first example, consider the echo program, which prints its arguments, as does the Unix command of the same name.
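One possible way to write this program is sketched below; it is not necessarily the course's own version. The hypothetical helper echo_line builds the output line from the argument array, skipping Sys.argv.(0), and the program prints it followed by a newline.

```ocaml
(* Build the line that echo prints: every argument except
   Sys.argv.(0) (the command name), separated by single spaces. *)
let echo_line (argv : string array) : string =
  let n = Array.length argv in
  if n <= 1 then ""
  else String.concat " " (Array.to_list (Array.sub argv 1 (n - 1)))

let () = print_endline (echo_line Sys.argv)
```

With no arguments, this version prints an empty line, as the Unix echo command does.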
A program can be terminated at any point with a call to exit : int -> 'a. The argument is the return code sent back to the calling program. The convention is to return 0 if all has gone well and a non-zero code to signal an error. In conditional constructions, the sh shell interprets the return code 0 as the boolean “true” and all non-zero codes as the boolean “false”.
When a program terminates normally after executing all of the expressions of which it is composed, it makes an implicit call to exit 0. When a program terminates prematurely because an exception was raised but not caught, it makes an implicit call to exit 2.
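The following sketch illustrates the return-code convention. The helper check_args is hypothetical; it only computes the code, so that the call to exit can be placed at the very end of a program.

```ocaml
(* Compute the return code: 0 when at least one argument is given,
   1 (an error) otherwise, after printing a usage message on stderr. *)
let check_args (argv : string array) : int =
  if Array.length argv >= 2 then 0
  else begin
    let name = if Array.length argv > 0 then argv.(0) else "prog" in
    prerr_endline ("usage: " ^ name ^ " <arg>...");
    1
  end

(* In a standalone program, the last line would be:
     let () = exit (check_args Sys.argv)
   A sh conditional such as "if prog x; then ...; fi" then treats
   code 0 as "true" and code 1 as "false". *)
```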
The function exit always flushes the buffers of all channels open for writing. The function at_exit lets one register other actions to be carried out when the program terminates. The last function registered is called first. A function registered with at_exit cannot be unregistered. However, this is not a real restriction: we can easily get the same effect with a function whose execution depends on a global variable.
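The sketch below registers two actions to show the order in which they run and uses a global flag, cleanup_enabled (a hypothetical name), to disable one of them after the fact.

```ocaml
(* A global flag lets a registered action be "unregistered" later. *)
let cleanup_enabled = ref true

let () =
  (* Registered first, so it runs last. *)
  at_exit (fun () -> print_endline "second: registered first");
  (* Registered last, so it runs first, but only while the flag is set. *)
  at_exit (fun () -> if !cleanup_enabled then print_endline "first: cleanup")

(* Disabling the action has the same effect as unregistering it. *)
let disable_cleanup () = cleanup_enabled := false
```

If the program terminates without calling disable_cleanup, it prints first: cleanup and then second: registered first.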
Unless otherwise indicated, all functions in the Unix module raise the exception Unix_error in case of error.
The second argument of the Unix_error exception is the name of the system call that raised the error. The third argument identifies, if possible, the object on which the error occurred; for example, for a system call taking a file name as an argument, this file name is the third argument of Unix_error. Finally, the first argument of the exception is an error code indicating the nature of the error. It belongs to the variant type error.
Constructors of this type have the same names and meanings as those used in the POSIX convention, plus certain errors from UNIX98 and BSD. All other errors use the constructor EUNKNOWNERR, which carries the raw error number.
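Because Unix_error carries the error code, the system call name and its argument, a handler can match on specific codes. In the sketch below, try_open is a hypothetical helper that attempts to open a file read-only and returns the error components instead of raising; the path /nonexistent/file is assumed not to exist.

```ocaml
(* Try to open [path] read-only. On failure, return the three
   components of Unix_error: (error code, system call name, argument). *)
let try_open (path : string) : (unit, Unix.error * string * string) result =
  match Unix.openfile path [ Unix.O_RDONLY ] 0 with
  | fd -> Unix.close fd; Ok ()
  | exception Unix.Unix_error (err, call, arg) -> Error (err, call, arg)

let () =
  match try_open "/nonexistent/file" with
  | Ok () -> print_endline "opened"
  | Error (Unix.ENOENT, call, arg) ->
      (* A specific error code matched selectively. *)
      Printf.printf "%s: no such file (%s)\n" arg call
  | Error (err, call, arg) ->
      Printf.printf "%s: %s (%s)\n" arg (Unix.error_message err) call
```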
Given the semantics of exceptions, an error that is not specifically foreseen and intercepted by a try propagates up to the top of the program and causes it to terminate prematurely. In small applications, treating unforeseen errors as fatal is good practice. However, the error should be displayed clearly. To do this, the Unix module supplies the function handle_unix_error : ('a -> 'b) -> 'a -> 'b.
The call handle_unix_error f x applies the function f to the argument x. If this raises the exception Unix_error, a message describing the error is displayed and the program is terminated with exit 2. A typical use is handle_unix_error prog (), where the function prog : unit -> unit executes the body of the program. Its implementation relies on the functions described below.
Functions of the form prerr_xxx are like the functions print_xxx, except that they write on the error channel stderr rather than on the standard output channel stdout.
The primitive error_message, of type error -> string, returns a message describing the error given as an argument. Argument number zero of the program, namely Sys.argv.(0), contains the name of the command that was used to invoke the program.
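A function in the spirit of handle_unix_error can be sketched from this description. The names format_unix_error and report_unix_errors are hypothetical, and the exact wording of the message printed by the library's own handle_unix_error may differ.

```ocaml
(* Build a message of the form
     <program>: "<syscall>" failed on "<arg>": <description>
   The " on ..." part is omitted when the error has no associated object. *)
let format_unix_error (err : Unix.error) (call : string) (arg : string) : string =
  let where = if arg = "" then "" else Printf.sprintf " on \"%s\"" arg in
  Printf.sprintf "%s: \"%s\" failed%s: %s"
    Sys.argv.(0) call where (Unix.error_message err)

(* Apply [f] to [x]; if it raises Unix_error, print the message on
   stderr and terminate the program with exit code 2. *)
let report_unix_errors (f : 'a -> 'b) (x : 'a) : 'b =
  try f x with
  | Unix.Unix_error (err, call, arg) ->
      prerr_endline (format_unix_error err call arg);
      exit 2
```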
The function handle_unix_error handles fatal errors, i.e. errors that stop the program. An advantage of OCaml is that it requires all errors to be handled, if only at the highest level by halting the program. Indeed, any error in a system call raises an exception, and the current thread of execution is interrupted up to the level where the exception is explicitly caught and handled. This avoids continuing the program in an inconsistent state.
Errors of type Unix_error can, of course, be matched selectively. Later on, we will often use a function that executes another function and restarts it automatically when a system call it performs is interrupted (see section 4.5).
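Such a function can be sketched as follows, under the hypothetical name restart_on_eintr. It matches only the EINTR error code (an interrupted system call) and lets every other exception propagate.

```ocaml
(* Call [f x]; if the underlying system call is interrupted by a signal
   (error code EINTR), call it again. Any other error propagates. *)
let rec restart_on_eintr (f : 'a -> 'b) (x : 'a) : 'b =
  try f x with Unix.Unix_error (Unix.EINTR, _, _) -> restart_on_eintr f x
```

For example, restart_on_eintr (Unix.read fd buf 0) (Bytes.length buf) retries a read that a signal interrupts before any data is transferred.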
As we will see throughout the examples, system programming often repeats the same patterns. To reduce the code of each application to its essentials, we will want to define library functions that factor out the common parts.
Whereas in a complete program one knows precisely which errors can be raised (and these are often fatal, resulting in the program being stopped), we generally do not know the execution context in the case of library functions. We cannot suppose that all errors are fatal. It is therefore necessary to let the error return to the caller, which will decide on a suitable course of action (e.g. stop the program, or handle or ignore the error). However, the library function in general will not allow the error to simply pass through, since it must maintain the system in a consistent state. For example, a library function that opens a file and then applies an operation to its file descriptor must take care to close the descriptor in all cases, including those where the processing of the file causes an error. Otherwise, file descriptors would leak and could eventually be exhausted.
Furthermore, the operation applied to a file may be defined by a function that was received as an argument, and we don’t know precisely when or how it can fail (but the caller in general will know). We are thus often led to protect the body of the processing with “finalization” code, which must be executed just before the function returns, whether normally or exceptionally.
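The following sketch shows the pattern just described, written with a plain exception handler: with_input_file (a hypothetical name) applies a function received as argument to a descriptor and closes that descriptor on both the normal and the exceptional path, so no descriptor leaks.

```ocaml
(* Open [path] read-only, apply [f] to the descriptor, and close the
   descriptor whether [f] returns normally or raises an exception. *)
let with_input_file (path : string) (f : Unix.file_descr -> 'a) : 'a =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  match f fd with
  | result -> Unix.close fd; result
  | exception e -> Unix.close fd; raise e
```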
There is no built-in try … finalize construct in the OCaml language, but it is easy to define as a function.
This function takes the main body f and the finalizer finally, each in the form of a function, and two parameters x and y, which are passed to their respective functions. The body f x is executed first, and its result is kept aside to be returned after the finalizer finally has run. If the body fails, i.e. raises an exception exn, the finalizer is run and exn is raised again. If both the body and the finalizer fail, the finalizer’s exception is raised (one could choose to raise the body’s exception instead).
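A definition matching this description is sketched below, followed by a hypothetical helper read_prefix that uses it to close a file descriptor in all cases.

```ocaml
(* Run [f x], then [finally y], and return the result of [f x].
   If [f x] raises, run [finally y] and re-raise the same exception;
   if [finally y] itself raises, its exception wins. *)
let try_finalize f x finally y =
  let res = try f x with exn -> finally y; raise exn in
  finally y;
  res

(* Read at most [n] bytes from the start of [path]; the descriptor is
   closed whether or not the read succeeds. A single read may return
   fewer than [n] bytes. *)
let read_prefix (path : string) (n : int) : string =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  try_finalize
    (fun fd ->
      let buf = Bytes.create n in
      let k = Unix.read fd buf 0 n in
      Bytes.sub_string buf 0 k)
    fd Unix.close fd
```

Since OCaml 4.08, the standard library offers Fun.protect ~finally f for the same purpose; it differs in that an exception raised by the finalizer is wrapped in Fun.Finally_raised.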
In the rest of this course, we use an auxiliary library Misc, which contains several useful functions like try_finalize that are often used in the examples. We will introduce them as they are needed. To compile the examples of the course, the definitions of the Misc module need to be collected and compiled.
The Misc module also contains certain functions, added for illustration purposes, that will not be used in the course. These simply enrich the Unix library, sometimes by redefining the behavior of certain functions. The Misc module must thus take precedence over the Unix module.
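The precedence rule relies on the fact that a later open shadows identifiers of an earlier one. The sketch below uses a hypothetical module Misc_sketch, not the course's Misc, which redefines sleep from Unix to accept a fractional number of seconds by calling Unix.sleepf.

```ocaml
open Unix

(* Hypothetical stand-in for a Misc-like module: it redefines [sleep]
   so that it accepts a fractional number of seconds. *)
module Misc_sketch = struct
  let sleep (seconds : float) : unit = Unix.sleepf seconds
end

(* Opened after Unix, so its [sleep] shadows Unix.sleep : int -> unit. *)
open Misc_sketch

(* Unix identifiers that Misc_sketch does not redefine stay visible. *)
let elapsed_since (t0 : float) : float = gettimeofday () -. t0

let () = sleep 0.01
```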
The course provides numerous examples. They can be compiled with OCaml, version 4.01.0. Some programs will have to be slightly modified in order to work with older versions.
There are two kinds of examples: “library functions” (very general functions that can be reused) and small applications. It is important to distinguish between the two. In the case of library functions, we want their context of use to be as general as possible. We will thus carefully specify their interface and attentively treat all particular cases. In the case of small applications, an error is often fatal and causes the program to stop executing. It is sufficient to report the cause of an error, without needing to return to a consistent state, since the program is stopped immediately thereafter.