Voyager Papers:

Boon S. Ang, Derek Chiou, Larry Rudolph, Arvind. Message Passing Support on StarT-Voyager. In Proceedings of the 5th International Conference on High Performance Computing, Chennai (Madras), India. Dec, 98.
[postscript 307k] - [compressed 112k] - [gzipped 79k]

Abstract: No single message passing mechanism can efficiently support all the different types of communication that occur naturally in most parallel or distributed programs. MIT's StarT-Voyager, a hybrid message passing/shared memory parallel machine, provides four message passing mechanisms to achieve very high performance over a wide spectrum of communication types and sizes. Hardware and operating system enforced protection allows direct user-level access to message passing facilities in a multiuser environment. StarT's protection scheme improves upon past designs by not requiring strictly synchronized gang-scheduling, and by supporting non-monolithic protection domains. To minimize the development effort and cost, the machine is designed to use unmodified commercial PowerPC 604-based SMP systems as the building block. A Network End-point Subsystem (NES) card which plugs into one of each SMP's processor card slots provides the interface to Arctic, a low-latency, high-bandwidth network currently under development at MIT. This paper describes the message passing mechanisms and their predicted performance.

Boon S. Ang, Derek Chiou, Daniel Rosenband, Mike Ehrlich, Larry Rudolph, Arvind. StarT-Voyager: A Flexible Platform for Exploring Scalable SMP Issues. MIT Laboratory for Computer Science, CSG Memo 415, December 1998. (Also in Proceedings of SuperComputing '98, November 1998, Orlando, Florida.)
[postscript 336k] - [compressed 159k] - [gzipped 86k]

Boon S. Ang, Derek Chiou, Larry Rudolph, Arvind. The StarT-Voyager Parallel System. In Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, Paris, France. Oct, 98.
[postscript 490k] - [compressed 117k] - [gzipped 112k]


Boon S. Ang, Derek Chiou, Larry Rudolph, Arvind, Message Passing Support for Multi-grained, Multi-threading, and Multi-tasking Environments. MIT Laboratory for Computer Science, CSG Memo 394, November 1996.
[postscript 880k] - [compressed 327k] - [gzipped 228k]


Architecture

Creating a Wider Bus Using Caching Techniques

Abstract: The effective bandwidth of a bus and external communication ports can be increased by using a variant of data compression techniques that compacts words instead of data streams. The compaction is performed by caching the high-order bits in a table and sending the index into the table along with the low-order bits. A coherent table at the receiving end expands the word into its original form. Compaction/expansion units can be placed between processor and memory, between processor and local bus, and between devices that access the system bus. Simulations have shown that over 90% of all information transferred can be sent in a single cycle when using a 32-bit processor connected by a 16-bit-wide bus to a 32-bit memory module. This holds for all forms of bus traffic (addresses, data, and instructions), including when a cache-based processor is used.
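
The compaction scheme in this abstract can be illustrated with a small sketch. It is not the paper's design: the 12/4 split between low-order bits and table index, the round-robin replacement policy, and all names (HighBitsTable, compact, expand) are assumptions chosen only so that an index plus the low-order bits fit a 16-bit bus. The one property taken from the abstract is that sender and receiver keep coherent tables, which the sketch achieves by applying the same update on every miss.

```python
LOW_BITS = 12                  # assumed: low-order bits sent verbatim
INDEX_BITS = 4                 # assumed: index width (4 + 12 = 16-bit bus)
TABLE_SIZE = 1 << INDEX_BITS
LOW_MASK = (1 << LOW_BITS) - 1


class HighBitsTable:
    """Cache of high-order bit patterns; round-robin replacement (assumed)."""

    def __init__(self):
        self.entries = [None] * TABLE_SIZE
        self.next_victim = 0

    def lookup(self, high):
        """Return the index holding `high`, or None on a miss."""
        try:
            return self.entries.index(high)
        except ValueError:
            return None

    def insert(self, high):
        """Store `high` in the next round-robin slot."""
        self.entries[self.next_victim] = high
        self.next_victim = (self.next_victim + 1) % TABLE_SIZE


def compact(word, table):
    """Sender side: emit (index, low bits) on a hit, the full word on a miss."""
    high, low = word >> LOW_BITS, word & LOW_MASK
    index = table.lookup(high)
    if index is not None:
        return ("hit", index, low)   # fits one transfer on the narrow bus
    table.insert(high)
    return ("miss", word)            # full word, needs more than one transfer


def expand(message, table):
    """Receiver side: rebuild the word and mirror the sender's table update."""
    if message[0] == "hit":
        _, index, low = message
        return (table.entries[index] << LOW_BITS) | low
    _, word = message
    table.insert(word >> LOW_BITS)   # same update as the sender keeps tables coherent
    return word


sender, receiver = HighBitsTable(), HighBitsTable()
words = [0x12345678, 0x12345ABC, 0xDEADBEEF, 0x12345000]
messages = [compact(w, sender) for w in words]
restored = [expand(m, receiver) for m in messages]
```

In this run the second and fourth words share their high-order bits with the first, so they travel as a table index plus low-order bits, while the first and third are misses that update both tables.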
Scheduling

Gang Scheduling for Highly Efficient Distributed Multiprocessor Systems

Abstract: In this paper we present the design, implementation, and cost-benefit trade-offs for various components of a gang scheduling system for workstation clusters and massively parallel systems with highly efficient message passing interconnects, which are typically operated in dedicated mode. Though this system enables time-sharing of individual nodes, we architect the system so that the reliability and the efficiency of a dedicated system are preserved and no significant serialization or extra resource consumption is introduced. The design we present here is highly modular and scalable and can easily be adapted to a variety of MPP systems. The system supports various scheduling policies.

Coscheduling Based on Run-Time Identification of Activity Working Sets (301463 bytes) or compressed postscript (112989 bytes)

Abstract: This paper introduces a method for runtime identification of sets of interacting activities ("working sets") with the purpose of coscheduling them, i.e. scheduling them so that all the activities in the set execute simultaneously on distinct processors. The identification is done by monitoring access rates to shared communication objects: activities that access the same objects at a high rate thereby interact frequently, and therefore would benefit from coscheduling. Simulation results show that coscheduling with our runtime identification scheme can give better performance than uncoordinated scheduling based on a single global activity queue. The finer-grained the interactions among the activities in a working set, the better the performance differential. Moreover, coscheduling based on automatic runtime identification achieves about the same performance as coscheduling based on manual identification of working sets by the programmer. Keywords: coscheduling, gang scheduling, on-line algorithms, activity working set.
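
The identification step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the fixed rate threshold, the measurement interval, and the choice to merge activities into sets with a union-find structure (so that sets are the connected components of "shares a heavily used object") are all hypothetical, as is the function name working_sets.

```python
from collections import defaultdict


def working_sets(access_counts, interval, threshold):
    """Group activities that access a common object at a high rate.

    access_counts maps (activity, object) to the number of accesses
    observed during `interval`; `threshold` is an assumed cutoff rate.
    """
    parent = {}

    def find(a):
        # Union-find lookup with path halving.
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    heavy_users = defaultdict(list)
    for (activity, obj), count in access_counts.items():
        find(activity)  # register every activity, even low-rate ones
        if count / interval >= threshold:
            heavy_users[obj].append(activity)

    # Activities that heavily use the same object join one working set.
    for users in heavy_users.values():
        for other in users[1:]:
            union(users[0], other)

    groups = defaultdict(set)
    for activity in list(parent):
        groups[find(activity)].add(activity)
    return list(groups.values())


counts = {
    ("a", "ch1"): 100, ("b", "ch1"): 90, ("c", "ch1"): 1,
    ("c", "ch2"): 80, ("d", "ch2"): 70, ("e", "ch3"): 50,
}
sets = working_sets(counts, interval=10, threshold=5)
```

Here activity c touches ch1 only rarely, so it is grouped with d through ch2 rather than with a and b. A scheduler would then try to run the members of each set simultaneously on distinct processors.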

Optics:

D. Feitelson and L. Rudolph, ``The Promise of Optical Free-Space Interconnections for Concurrent Memory Access,'' Technical Report 95-6, Department of Computer Science, Hebrew University, Jerusalem, Israel, 1995. (postscript)

Larry Rudolph, ``Bit-Parallel, Free-Space, Optical Communication'' CPAM 48, 1995 (Postscript)

D. G. Feitelson, L. Rudolph, and E. Schenfeld, ``A three-dimensional optical interconnection network with distributed control,'' Intl. J. Optoelectronics Vol. 10 no. 3, 1995, pp. 163--177. ( Abstract and some early version of the paper in Postscript ).

D. Feitelson, E. Schenfeld, and L. Rudolph, ``Limitations on Free-Space Optical Interconnection Networks,'' Proceedings of the 3rd International Congress on Optical Science and Engineering, The Hague, The Netherlands, March 1990. ( Abstract and some early version of the paper in Postscript ).