Modern processors are capable of very high performance,
but their operation bears little resemblance to the simple
models that have been used to develop algorithms. Consequently
many "classical" algorithms are not well suited to modern
machines. This disparity and how it might be
addressed are the subject of this seminar.
We will read technical papers in the following areas:
"Memory-friendly" algorithms that explicitly take into account
the memory hierarchy in their design and operation.
Algorithms for searching, sorting, and basic
linear algebra operations will be studied from this perspective.
"Friendly-memory" machines that use latency-hiding
mechanisms to decrease or eliminate the effect of the
memory hierarchy for independent memory operations.
Example mechanisms include vector units and multithreading.
Design-time, compile-time and run-time techniques that support
and automate some of the techniques above.
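As a rough illustration of what "memory-friendly" can mean for a basic linear algebra operation, the sketch below contrasts a textbook matrix multiply with a blocked (tiled) variant that works on small sub-matrices so that each one can be reused while it is still in fast memory. This is only a minimal sketch and is not taken from any of the papers on the reading list; the function names, the flat row-major layout, and the default tile size are assumptions made for the example. In Python the interpreter overhead hides most cache effects, so the point is the loop structure rather than measured speed.

```python
def matmul_naive(a, b, n):
    """Textbook triple loop on n x n matrices stored as flat row-major lists."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                # b is read with stride n here, which is unfriendly to caches
                s += a[i * n + k] * b[k * n + j]
            c[i * n + j] = s
    return c


def matmul_blocked(a, b, n, tile=4):
    """Same product computed tile by tile (tile size is an assumed parameter)."""
    c = [0.0] * (n * n)
    for ii in range(0, n, tile):          # tile row of c
        for kk in range(0, n, tile):      # tile along the shared dimension
            for jj in range(0, n, tile):  # tile column of c
                # Multiply one tile of a by one tile of b into one tile of c.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i * n + k]
                        for j in range(jj, min(jj + tile, n)):
                            # unit-stride access to both b and c
                            c[i * n + j] += aik * b[k * n + j]
    return c
```

On a real machine the tile size would be chosen so that the three tiles in use fit together in a given cache level; that choice is exactly the kind of decision the design-time, compile-time and run-time techniques above try to support or automate.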
Participants in this seminar are expected to present at least
one of the papers and to contribute to the discussion of the
other papers.
Announcements
Reading Assignments
For 02/20/98: Narlikar & Blelloch, "Space-Efficient Implementation
of Nested Parallelism".
For 02/13/98: Blelloch, Gibbons & Matias, "Provably Efficient Scheduling
for Languages with Fine-Grained Parallelism".
For 02/06/98: Blumofe & Leiserson, "Scheduling Multithreaded
Computations by Work Stealing".
For 01/30/98: Blumofe & Leiserson, "Space-Efficient Scheduling of
Multithreaded Computations".
For 01/23/98: Blumofe et al., "Cilk: An
Efficient Multithreaded Runtime System".
For 01/16/98: Vishkin, "Can parallel algorithms enhance serial
implementation?" (CACM position statement and longer technical
report).
Reading List
(NB: not yet finalized)
Latency-hiding
Vishkin, U., "Can parallel algorithms enhance serial implementation?",
Communications of the ACM, 39,9 (1996), 88-91
David S. Lecomber, K. Ronald Sujithan, and Jonathan M. D. Hill
"Architecture-independent locality analysis and efficient
PRAM simulations"
In HPCN'97, Springer-Verlag, Vienna, April 1997.
G. Blelloch, P. Gibbons, Y. Matias, M. Zagha, "Accounting for
Memory Bank Contention and Delay in High-Bandwidth Multiprocessors",
SPAA 96, ACM.
P. Gibbons, Y. Matias, V. Ramachandran, "Efficient Low-Contention
Parallel Algorithms", Proc. 6th SPAA, ACM, 1994.
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul,
Charles E. Leiserson, Keith H. Randall, and Yuli Zhou,
"Cilk: An Efficient Multithreaded Runtime System",
In 5th ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming (PPoPP '95), July 19-21, 1995,
Santa Barbara, California, pp. 207-216.
Sivan Toledo, "Improving memory performance of sparse matrix
operations".
Language and compile-time techniques
Induprakas Kodukula, Nawaaz Ahmed, and Keshav Pingali,
"Data-centric Multi-level Blocking", In Programming Language
Design and Implementation, June 1997.