I just met with Arvind. It looks like he has just done his first serious reading. He says he still does not have a good feel for what to say, but he did point out what he thought were the weaknesses in the paper, and what he thinks should be mentioned. He is taking this as a "position" paper, and therefore needs to think through and be convinced of the issues and the solutions to them. He mentioned the following points:

(1) Main weakness:
    (a) Why multithreaded architecture at all? What does multithreaded architecture solve? We need to provide motivation for doing it before jumping in to say what is needed to support it.
        [Boon's comments: I'm not sure what multithreaded architecture really is, and I'm not sure if Arvind knows exactly, but some points became clearer during the course of our discussion. Will talk to them later.]
    (b) In general, he is looking for a better framework for the paper.

(2) The topics that will be discussed are:
    (a) Post-Monsoon assessment
    (b) Weakness (incompleteness) of *T (ISCA '90)
    (c) New *T model
    (d) New *T implementation (may side-step somewhat)

(3) He suggested the following "lead-in" to the paper.
    (a) Talk about several common programming models, describe them, and talk about possible optimizations (software, RTS) and architectural features/hacks that can help. The purpose is to arrive at a set of features needed for parallel processing.
        Models:
        - Data parallel: single-threaded; global-memory access;
        - Threaded C: multiple threads of control, spawning and synchronization.
        - Id-like model, both strict and non-strict versions: multiple threads of control; frames and heap...
        Granularity of parallel execution; partitioning of data; static vs dynamic scheduling...
        * need global accesses
        * need some notion of synchronization: various granularity.
        * size of thread.
        Maybe talk about Dataflow graphs.
    (b) Two fundamental issues:
        (a) memory latency
            options: split-phase, caches
        (b) synchronization
            options: busy waiting, context switching.
    (c) Monsoon's solution to these issues. What we have learned; what we have missed.
        Dataflow: uniform solution for both problems; always "split-phase"; switching to something else immediately; good message passing capabilities -> simple, coherent solution.
        Arvind also mentioned keeping the paths to local and global memory separate as a shortcoming, since it makes it difficult to add caching.
    (d) *T-like solution --> multithreaded ideas. Ideas have crept up in Monsoon:
        - state only left in frame slots.
        - hardware continuation stack.
        ...
        [Things sort of tailed off here. Not clear what *T (old) is good or not good for.]
    (e) New *T model:
        - addition of global caches.
        - multithreading *not* for tolerating memory latency, but for synchronization.
    (f) Either separate ourselves from Alewife and FLASH, or say that we have all reached the same point, starting from different positions.

The plan is for Arvind to read the TAM (JPDC) paper and the FLASH paper tomorrow. I'll go read the 61X manual. We'll get back together on Thu to discuss again before jumping into more writing and reorganization. Will keep you informed.

How are things at home? Is it freezing cold?

- Boon