pH Manifesto

pH: a parallel Haskell

Arvind (MIT)
Lennart Augustsson (Chalmers)
Jamey Hicks (Motorola MCRC)
Rishiyur Nikhil (DEC CRL)
Simon Peyton Jones (Glasgow)
Joe Stoy (Oxford)
John Williams (IBM research)

This note describes pH, a parallel variant of Haskell[1]. It is also an invitation to join this effort.

Why pH?

pH is an attempt to draw together the lazy functional community (as represented by Haskell) and the dataflow community (as represented by Id and Sisal). By sharing a large common language base (Haskell), we hope that we will find it easier to share ideas and implementations, exchange programs and reduce harmful diversity. Application programmers will find it easier to grapple with only one syntax and type system.

We do not seek uniformity. There are clear differences between pH and traditional Haskell. But having a common framework will allow us to highlight and study these differences more easily than when the languages involved differed more widely.

We hope that pH will become a collaborative effort, as the Haskell language itself is, rather than being associated with any one research group. If enough people are interested, we will form some kind of committee and mailing list to steer the language. If you are interested in joining in the development of pH, or just want to comment on it, send email to pH@abp.lcs.mit.edu. If you would like to subscribe to the pH mailing list, send email to pH-request@abp.lcs.mit.edu.

What is pH?

pH is all about parallelism. There is more than one approach to parallel implementations of functional languages. For example, it is possible to exploit the implicit parallelism of a Haskell program by concurrent evaluation of the arguments of strict operators. The absence of side effects ensures that this concurrent evaluation cannot change the result. Strictness analysis techniques widen the scope of this idea, by allowing parallel evaluation of any strict function argument. Further, it is possible to provide programmer annotations to indicate sub-expressions which should be evaluated in parallel, even when the compiler cannot prove that the value of all the sub-expressions will be required. Unlike similar annotations in imperative languages, the annotations can only affect termination, and not the results (if any). All these approaches take *demand-driven* evaluation as the starting point: in the absence of annotations, expressions are evaluated only if their value is sure to be required.

An alternative approach is to use a *parallel* evaluation order: that is, all redexes are reduced in parallel, not merely those whose results are sure to be needed. This strategy also implements non-strict semantics, provided that no redex has its evaluation delayed indefinitely. A parallel evaluation strategy is followed by the Id programming language[2], in which any redex not in the body of a conditional or lambda abstraction will be reduced.

pH is a variant of the Haskell programming language which follows Id's parallel evaluation strategy:

           pH = Haskell syntax and type system
                 + Id evaluation strategy

In practice, implementations of pH are unlikely to guarantee that every redex will eventually be reduced, which in turn may mean that some programs fail to give a result and terminate when they would do so in "traditional" Haskell. However, if pH gives a proper (non-error, non-bottom) result for a particular program, then so does "traditional" Haskell, and, these results are the same.

I-structures

Both Haskell and Id provide arrays, constructed using array comprehensions. But sometimes array comprehensions aren't quite what you want. So Id provides I-structures as the "nuts and bolts" through which array comprehensions are implemented, and which can also be used by the sophisticated programmer to implement other encapsulated functional data structures. I-structures are interesting in their own right: via type declarations they can also be used to allow arbitrary data structures with assignable "holes" in them. However, each of these holes can be filled at most once. I-structures are closely related to *logic variables* and are used in similar manner in languages like KL/1 and Strand.

I-structures depend critically on a precise specification of the evaluation strategy --- specifically, it must be easy to tell just what will be evaluated and what will not. Id's parallel evaluation strategy makes this question easy to answer, in contrast to a demand-driven strategy where it is hard to say just what will get evaluated. (The notion of a control block in an Id program, that is, a group of statements that is guaranteed to be completely executed once initiated, corresponds closely to similar notions in imperative languages.) I-structures preserve determinacy (aka the Church-Rosser property) but do not allow duplication (and hence free-wheeling substitution) of unevaluated expressions, since single assignments may thereby become multiple assignments, which are forbidden.

pH adopts I-structures.

(Historical note. The introduction of array comprehensions has done much to eliminate the need for explicit I-structure programming, so that I-structures are now much less prominent in Id programs than they used to be. Nowadays, I-structures are mainly used by the compiler in desugaring of array and list comprehensions, library code, or in certain performance-critical loops which, for example, generate two arrays at once. They are also used extensively as a way to express complex constructs in more primitive forms.)

State and assignment

Purely-functional languages deliberately lack assignment, though this leads to a lack of expressiveness which is sometimes trying. Strict (almost)-functional languages, such as SML or Scheme, often re-introduce assignment, but at the cost of specifying a precise sequential evaluation order for all programs, even when it would not otherwise be required. Introducing state into non-strict functional languages is an active area of research (e.g. monads, mutable abstract data types), but these approaches come mostly at the cost of imposing some sequentiality, in order to preserve a well-defined functional semantics.

Id demonstrates an alternative approach, based on M-structures, which does not come at the cost of ubiquitous sequentiality. It doesn't come for free either -- there is a substantial semantic cost -- but M-structures allow one to express some parallel algorithms which simply cannot be expressed without them. Proper programming with M-structures requires more fine grain control over the default evaluation order of Id. This is achieved through the use of local barriers, that is, sequencing of a group of statements, in Id. M-structures lose determinacy.

pH adopts M-structures, but we recognize that they are still immature. We hope, that with some experience, a more disciplined use of M-structures will evolve and will be more tractable semantically.

Language structure

pH is a layered language:

pH(F) is the purely-functional subset of pH. Its main addition to Haskell is for- and while-loops. For example, one could compute the sum of the integers between 1 and n like this:

      let sum = 0
      in for i <- [1..n] do 
              next sum = sum + i
      finally sum

Loops are purely functional, and can easily be translated into a tail recursive function. Ability to group statements for sequential execution is also part of pH(F), and may be used for resource control.

pH(I) adds I-structures to pH(F).

pH(M) adds M-structures to pH(I). Full pH is the same as pH(M).

Stricter versions of pH:

It is well known that it is often more difficult to compile non-strict languages efficiently than the corresponding strict languages. Thus it also makes sense to explore another dimension: making pH stricter in different ways. Three variants immediately come to mind:

Make function applications strict;
Make all data structures strict;
Both of the above.

All these variants should affect only the termination and not the answers produced. However, for pH(M), these variants would also reduce the set of possible solutions.

Pragmas and Other Annotations:

We expect that there will be a common set of pragmas as well as implementation-specific sets of pragmas for efficient compiling of pH on a variety of architectures. In particular, we expect to introduce storage layout pragmas in the spirit of HPF (High Performance FORTRAN), to ease the compilation on distributed memory machines.

Each dialect of Haskell may have different pragmas, but their correct use should not affect the result of the un-pragma'd program.

Implementations:

Initial implementations of pH will come from Arvind's group at the MIT Lab for Computer Science, and from Nikhil at the DEC Cambridge Research Lab. These implementations would rely on Haskell front ends produced by other groups. Ultimately, they will replace their authors' Id implementations and tools. Mechanical translators from Id to pH will be easy to create --- the process is little more than transliteration.

References:

[1] Report on Programming Language Haskell: A Non-strict, Purely Functional Language, Version 1.2, ACM SIGPLAN Notices, Vol 27 No 5, May 1992.

[2] Nikhil, Rishiyur S., Id Language Reference Manual, Version 90.1, Computation Structures Group Memo 284-2, Laboratory for Computer Science, M.I.T., July 15 1991.

email to Haskell mailing list, September 1, 1993