Generalities

Functions that give access to the system from OCaml are grouped into two modules. The first module, Sys, contains those functions common to Unix and other operating systems under which OCaml runs. The second module, Unix, contains everything specific to Unix.

In what follows, we will refer to identifiers from the Sys and Unix modules without specifying which modules they come from. That is, we will suppose that we are within the scope of the directives open Sys and open Unix. In complete examples, we explicitly write open, in order to be truly complete.

The Sys and Unix modules can redefine certain identifiers of the Pervasives module, hiding previous definitions. For example, Pervasives.stdin is different from Unix.stdin. The previous definitions can always be obtained through a prefix.

where the program prog is assumed to consist of the three modules mod1, mod2 and mod3. The modules can also be compiled separately:

In both cases, the argument unix.cma is the Unix library written in OCaml. To use the native-code compiler rather than the bytecode compiler, replace ocamlc with ocamlopt and unix.cma with unix.cmxa.

If the compilation tool ocamlbuild is used, simply add the following line to the _tags file:

The Unix system can also be accessed from the interactive system, also known as the “toplevel”. If your platform supports dynamic linking of C libraries, start an ocaml toplevel and type in the directive:

Otherwise, you will need to create an interactive system containing the pre-loaded system functions:

1.2 Interface with the calling program

When running a program from a shell (command interpreter), the shell passes arguments and an environment to the program. The arguments are words on the command line that follow the name of the command. The environment is a set of strings of the form variable=value, representing the global bindings of environment variables: bindings set with setenv var val for the csh shell, or with var=val; export var for the sh shell.
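To make the two inputs concrete, here is a minimal sketch that prints the arguments and the environment a program receives. It assumes that the arguments are read from the array Sys.argv and the environment from Unix.environment (), which returns one variable=value string per binding. The helper split_binding is a hypothetical name introduced only for this illustration.

```ocaml
(* Hypothetical helper: split a "variable=value" string at the first '='.
   Returns None if the string contains no '='. *)
let split_binding s =
  try
    let i = String.index s '=' in
    Some (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
  with Not_found -> None

let () =
  (* Arguments: Sys.argv.(0) is the command name, the rest follow it. *)
  Array.iteri (fun i arg -> Printf.printf "argv[%d] = %s\n" i arg) Sys.argv;
  (* Environment: one "variable=value" string per binding. *)
  Array.iter
    (fun binding ->
       match split_binding binding with
       | Some (var, value) -> Printf.printf "%s -> %s\n" var value
       | None -> ())
    (Unix.environment ())
```

Splitting at the first '=' only is a deliberate choice in this sketch: a value may itself contain '=' characters.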

A more convenient way of looking up the environment is to use the function Sys.getenv:

Sys.getenv v returns the value associated with the variable name v in the environment, raising the exception Not_found if this variable is not bound.
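Since an unbound variable is reported through an exception, callers usually wrap the lookup. The following is a small sketch of one way to do so; the helper name getenv_default is an assumption of this example and is not part of the Sys module.

```ocaml
(* Hypothetical helper, not part of Sys: look up the variable [v],
   falling back to [default] when it is not bound. *)
let getenv_default v default =
  try Sys.getenv v with Not_found -> default

let () =
  (* HOME is usually bound under Unix; the fallback covers the other case. *)
  print_endline (getenv_default "HOME" "(HOME is not set)")
```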

Example

As a first example, here is the echo program, which prints a list of its arguments, as does the Unix command of the same name.

let echo () =
  let len = Array.length Sys.argv in
  if len > 1 then begin
    print_string Sys.argv.(1);
    for i = 2 to len - 1 do
      print_char ' ';
      print_string Sys.argv.(i);
    done;
    print_newline ();
  end;;
echo ();;

* * *

The argument of exit is the return code to send back to the calling program. The convention is to return 0 if all has gone well, and to return a non-zero code to signal an error. In conditional constructions, the sh shell interprets the return code 0 as the boolean “true”, and all non-zero codes as the boolean “false”. When a program terminates normally after executing all of the expressions of which it is composed, it makes an implicit call to exit 0. When a program terminates prematurely because an exception was raised but not caught, it makes an implicit call to exit 2. The function exit always flushes the buffers of all channels open for writing. The function at_exit lets one register other actions to be carried out when the program terminates.

The last function to be registered is called first. A function registered with at_exit cannot be unregistered. However, this is not a real restriction: we can easily get the same effect with a function whose execution depends on a global variable.
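The global-variable technique mentioned above can be sketched as follows. All names here (cleanup_enabled, log, cleanup, disable_cleanup) are hypothetical and chosen only for the illustration; the handler records its action in a list rather than touching real resources.

```ocaml
(* Hypothetical flag: while it is true, the registered handler does its work. *)
let cleanup_enabled = ref true

(* Records what the handler did, most recent entry first. *)
let log = ref []

(* The handler consults the flag each time it is called. *)
let cleanup () =
  if !cleanup_enabled then log := "cleanup" :: !log

(* Register the handler once; it cannot be removed afterwards. *)
let () = at_exit cleanup

(* "Unregistering" amounts to turning the flag off:
   the handler still runs at exit, but does nothing. *)
let disable_cleanup () = cleanup_enabled := false
```

A program would call disable_cleanup () once the cleanup is no longer needed, for instance after the temporary resource has already been released by other means.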

1.3 Error handling

Unless otherwise indicated, all functions in the Unix module raise the exception Unix_error in case of error.

The second argument of the Unix_error exception is the name of the system call that raised the error. The third argument identifies, if possible, the object on which the error occurred; for example, in the case of a system call taking a file name as an argument, this file name will be in the third position in Unix_error. Finally, the first argument of the exception is an error code indicating the nature of the error. It belongs to the variant type error:

Constructors of this type have the same names and meanings as those used in the POSIX convention and certain errors from UNIX98 and BSD. All other errors use the constructor EUNKNOWNERR.
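As an illustration of the three arguments of Unix_error, the sketch below attempts to open a file for reading and reports what went wrong. The function name try_open and the message format are assumptions of this example; it relies on Unix.openfile, Unix.close and Unix.error_message.

```ocaml
(* Hypothetical helper: try to open [path] for reading.
   Returns None on success, or Some message describing the failure. *)
let try_open path =
  try
    let fd = Unix.openfile path [Unix.O_RDONLY] 0 in
    Unix.close fd;
    None
  with
  | Unix.Unix_error (Unix.ENOENT, _, _) ->
      (* Match one specific error code. *)
      Some "no such file"
  | Unix.Unix_error (err, fun_name, arg) ->
      (* Any other code: report the system call, its object and the cause. *)
      Some (Printf.sprintf "%s failed on %S: %s"
              fun_name arg (Unix.error_message err))
```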

Given the semantics of exceptions, an error that is not specifically foreseen and intercepted by a try propagates up to the top of a program and causes it to terminate prematurely. In small applications, treating unforeseen errors as fatal is a good practice. However, it is appropriate to display the error clearly. To do this, the Unix module supplies the handle_unix_error function:

The call handle_unix_error f x applies function f to the argument x. If this raises the exception Unix_error, a message is displayed describing the error, and the program is terminated with exit 2. A typical use is handle_unix_error prog (), where the function prog : unit -> unit executes the body of the program. For reference, here is how handle_unix_error is implemented.

 1 open Unix;;
 2 let handle_unix_error f arg =
 3   try
 4     f arg
 5   with Unix_error(err, fun_name, arg) ->
 6     prerr_string Sys.argv.(0);
 7     prerr_string ": \"";
 8     prerr_string fun_name;
 9     prerr_string "\" failed";
10     if String.length arg > 0 then begin
11       prerr_string " on \"";
12       prerr_string arg;
13       prerr_string "\""
14     end;
15     prerr_string ": ";
16     prerr_endline (error_message err);
17     exit 2;;

Functions of the form prerr_xxx are like the functions print_xxx, except that they write on the error channel stderr rather than on the standard output channel stdout.

The primitive error_message, of type error -> string, returns a message describing the error given as an argument (line 16). The argument number zero of the program, namely Sys.argv.(0), contains the name of the command that was used to invoke the program (line 6).

The function handle_unix_error handles fatal errors, i.e. errors that stop the program. An advantage of OCaml is that it requires all errors to be handled, if only at the highest level by halting the program. Indeed, any error in a system call raises an exception, and the execution thread in progress is interrupted up to the level where the exception is explicitly caught and handled. This avoids continuing the program in an inconsistent state.

Errors of type Unix_error can, of course, be selectively matched. We will often see the following function later on:

which is used to execute a function and to restart it automatically when it executes a system call that is interrupted (see section 4.5).
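A minimal sketch of such a restarting function is given below. The name restart_on_EINTR is an assumption of this example; the sketch selectively matches the error code EINTR and lets every other exception propagate.

```ocaml
(* Apply [f] to [x]; if a system call inside [f] is interrupted (EINTR),
   start [f x] again. Any other error is left to the caller. *)
let rec restart_on_EINTR f x =
  try f x
  with Unix.Unix_error (Unix.EINTR, _, _) -> restart_on_EINTR f x
```

Note that the whole call f x is restarted, so f should be safe to run more than once.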

1.4 Library functions

As we will see throughout the examples, system programming often repeats the same patterns. To reduce the code of each application to its essentials, we will want to define library functions that factor out the common parts.

Whereas in a complete program one knows precisely which errors can be raised (and these are often fatal, resulting in the program being stopped), we generally do not know the execution context in the case of library functions. We cannot suppose that all errors are fatal. It is therefore necessary to let the error return to the caller, which will decide on a suitable course of action (e.g. stop the program, or handle or ignore the error). However, the library function in general will not allow the error to simply pass through, since it must maintain the system in a consistent state. For example, a library function that opens a file and then applies an operation to its file descriptor must take care to close the descriptor in all cases, including those where the processing of the file causes an error. This is in order to avoid a file descriptor leak, leading to the exhaustion of file descriptors.

Furthermore, the operation applied to a file may be defined by a function that was received as an argument, and we don’t know precisely when or how it can fail (but the caller in general will know). We are thus often led to protect the body of the processing with “finalization” code, which must be executed just before the function returns, whether normally or exceptionally.

There is no built-in finalize construct try …finalize in the OCaml language, but it can be easily defined¹:

let try_finalize f x finally y =
  let res = try f x with exn -> finally y; raise exn in
  finally y;
  res

This function takes the main body f and the finalizer finally, each in the form of a function, and two parameters x and y, which are passed to their respective functions. The body of the program f x is executed first, and its result is kept aside to be returned after the execution of the finalizer finally. In case the program fails, i.e. raises an exception exn, the finalizer is run and the exception exn is raised again. If both the main function and the finalizer fail, the finalizer’s exception is raised (one could choose to have the main function’s exception raised instead).
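The following sketch shows one possible use of try_finalize for the situation described earlier: a library function that opens a file and must close it in all cases. It uses a standard input channel in place of a raw file descriptor to stay short; the names with_input_file and first_line are hypothetical.

```ocaml
(* Same definition as in the text, repeated so the sketch is self-contained. *)
let try_finalize f x finally y =
  let res = try f x with exn -> finally y; raise exn in
  finally y;
  res

(* Hypothetical helper: run [f] on an input channel opened on [path],
   closing the channel whether [f] returns normally or raises. *)
let with_input_file path f =
  let ic = open_in path in
  try_finalize f ic close_in ic

(* Hypothetical use: read the first line of a file. *)
let first_line path = with_input_file path input_line
```

If f raises, the caller still sees the exception, but only after the channel has been closed, so no descriptor is leaked.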

Note

In the rest of this course, we use an auxiliary library Misc which contains several useful functions like try_finalize that are often used in the examples. We will introduce them as they are needed. To compile the examples of the course, the definitions of the Misc module need to be collected and compiled.

The Misc module also contains certain functions, added for illustration purposes, that will not be used in the course. These simply enrich the Unix library, sometimes by redefining the behavior of certain functions. The Misc module must thus take precedence over the Unix module.

Examples

The course provides numerous examples. They can be compiled with OCaml, version 4.01.0. Some programs will have to be slightly modified in order to work with older versions.

There are two kinds of examples: “library functions” (very general functions that can be reused) and small applications. It is important to distinguish between the two. In the case of library functions, we want their context of use to be as general as possible. We will thus carefully specify their interface and attentively treat all particular cases. In the case of small applications, an error is often fatal and causes the program to stop executing. It is sufficient to report the cause of an error, without needing to return to a consistent state, since the program is stopped immediately thereafter.