Quicksort lends itself well to parallelization, because the sorting is done recursively on independent sub-arrays. This can be delegated to threads whose only synchronization is to wait for all the threads to finish their sorting, in order for the sub-array to be sorted.

1 let qsort cmp arr =
2 let rec qsort lo hi =
3 if hi - lo > 0 then
4 begin
5 let mid = (lo + hi) lsr 1 in
6 if cmp arr.(mid) arr.(lo) then swap arr mid lo;
7 if cmp arr.(hi) arr.(mid) then
8 begin
9 swap arr mid hi;
10 if cmp arr.(mid) arr.(lo) then swap arr mid lo
11 end;
12 let pivot = arr.(mid) in
13 let i = ref (lo + 1) and j = ref (hi - 1) in
14 while !i < !j do
15 while not (cmp pivot arr.(!i)) do incr i done;
16 while not (cmp arr.(!j) pivot) do decr j done;
17 if !i < !j then swap arr !i !j;
18 done;
19 let u = Thread.create (qsort lo) (!i-1) in
20 let v = Thread.create (qsort (!i+1)) hi in
21 Thread.join u;
22 Thread.join v
23 end in
24 qsort 0 (Array.length arr - 1);;

It would be correct, but uninteresting, to exchange lines 20 and 21. In effect, that would wait for the lower portion of the array to be sorted before starting the sort of the upper portion. We would thus obtain the behavior of a sequential program, with the additional cost of threads but without obtaining any benefit.

In practice, we should limit parallelization to a reasonable factor and continue sequentially after that.

* * *