Boon S. Ang, Derek Chiou, Larry Rudolph, Arvind. Message Passing Support on StarT-Voyager. In Proceedings of the 5th International Conference on High Performance Computing, Chennai (Madras), India. December 1998.
Abstract: No single message passing mechanism can efficiently support all
the different types of communication that occur naturally in most parallel or
distributed programs. MIT's StarT-Voyager, a hybrid message passing/shared memory
parallel machine, provides four message passing mechanisms to achieve very high
performance over a wide spectrum of communication types and sizes. Hardware-
and operating-system-enforced protection allows direct user-level access to
message passing facilities in a multiuser environment. StarT's protection scheme
improves upon past designs by not requiring strictly synchronized gang-scheduling,
and by supporting non-monolithic protection domains. To minimize the development
effort and cost, the machine is designed to use unmodified commercial PowerPC
604-based SMP systems as its building blocks. A Network End-point Subsystem (NES)
card, which plugs into one of the processor card slots of each SMP, provides the interface
to Arctic, a low-latency, high-bandwidth network currently under development
at MIT. This paper describes the message passing mechanisms and their predicted
performance.
Boon S. Ang, Derek Chiou, Daniel Rosenband, Mike Ehrlich, Larry Rudolph, Arvind. StarT-Voyager: A Flexible Platform for Exploring Scalable SMP Issues. MIT Laboratory for Computer Science, CSG Memo 415, December 1998. (Also in Proceedings of SuperComputing '98, November 1998, Orlando, Florida.)
Boon S. Ang, Derek Chiou, Larry Rudolph, Arvind. The StarT-Voyager Parallel System. In Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, Paris, France. October 1998.
Creating a Wider Bus Using Caching Techniques
Abstract: The effective bandwidth of a bus and of external communication ports can be increased by using a variant of data compression that compacts individual words instead of data streams. The compaction is performed by caching the high-order bits in a table and sending the index into the table along with the low-order bits. A coherent table at the receiving end expands the word back into its original form. Compaction/expansion units can be placed between processor and memory, between processor and local bus, and between devices that access the system bus. Simulations have shown that over 90% of all information transferred can be sent in a single cycle when a 32-bit processor is connected by a 16-bit-wide bus to a 32-bit memory module. This holds for all kinds of information (addresses, data, and instructions), including when a cache-based processor is used.
Gang Scheduling for Highly Efficient Distributed Multiprocessor Systems
Abstract: In this paper we present the design, implementation, and cost-benefit trade-offs of various components of a gang scheduling system for workstation clusters and massively parallel systems with highly efficient message-passing interconnects, which are typically operated in dedicated mode. Although this system enables time-sharing of individual nodes, we architect it so that the reliability and efficiency of a dedicated system are preserved and no significant serialization or extra resource consumption is introduced. The design we present is highly modular and scalable and can easily be adapted to a variety of MPP systems. The system also supports various scheduling policies.
Coscheduling Based on Run-Time Identification of Activity Working Sets
Abstract: This paper introduces a method for run-time identification of sets of interacting activities ("working sets") for the purpose of coscheduling them, i.e., scheduling them so that all the activities in a set execute simultaneously on distinct processors. The identification is done by monitoring access rates to shared communication objects: activities that access the same objects at a high rate interact frequently and would therefore benefit from coscheduling. Simulation results show that coscheduling with our run-time identification scheme can give better performance than uncoordinated scheduling based on a single global activity queue. The finer-grained the interactions among the activities in a working set, the larger the performance advantage. Moreover, coscheduling based on automatic run-time identification achieves about the same performance as coscheduling based on manual identification of working sets by the programmer. Keywords: coscheduling, gang scheduling, on-line algorithms, activity working set.
Optics:
D. Feitelson and L. Rudolph, ``The Promise of Optical Free-Space Interconnections for Concurrent Memory Access,'' Technical Report 95-6, Department of Computer Science, Hebrew University, Jerusalem, Israel, 1995.
L. Rudolph, ``Bit-Parallel, Free-Space, Optical Communication,'' CPAM 48, 1995.
D. G. Feitelson, L. Rudolph, and E. Schenfeld, ``A Three-Dimensional Optical Interconnection Network with Distributed Control,'' Intl. J. Optoelectronics, Vol. 10, No. 3, pp. 163--177, 1995.
D. Feitelson, E. Schenfeld, and L. Rudolph, ``Limitations on Free-Space Optical Interconnection Networks,'' Proceedings of the 3rd International Congress on Optical Science and Engineering, The Hague, The Netherlands, March 1990.