Revised: Thu Nov 5 2009 by prins@cs.unc.edu
Overview
This is an introductory graduate course covering several aspects of
high-performance computing, primarily focused on parallel computing.
Upon completion, you should
- be able to design and analyze parallel algorithms for a variety of
problems and computational models,
- be familiar with the hardware and software organization of
high-performance parallel computing systems, and
- have experience with the implementation of parallel applications
on high-performance computing systems, and be able to measure,
tune, and report on their performance.
Additional information, including the course syllabus, can be found in the
course overview.
All parallel programming models discussed in this class are supported on
Bass,
which will be available for use in this class.
Announcements
- The shared-memory node available for performance analysis is no longer running background jobs that interfere with scaling to large processor counts. You may resubmit scaling results for Programming Assignment #1 (PA1) until Tue Nov 3.
- According to the registrar, the final exam for this class will be held Saturday Dec 12 at 4PM (!!). To avoid this uncivil and inconvenient time, the final will be a take-home exam taken over a 24-hour period, probably Dec 10 - 11.
- Midterm exam in class Thu Oct 15 - open book/papers/notes.
Reading Assignments
- (for Tue Nov 10) Read Foster, Chapter 8,
Overview of MPI.
- (for Tue Oct 27) Read Skillicorn et al., Questions and
Answers about BSP.
- (for Thu Oct 8) Read Nyland et al.,
Fast N-Body Simulation with CUDA.
- (for Tue Oct 6) Read Patterson & Hennessy Ch. 8, sections
8.3 - 8.6 (shared memory, synchronization primitives in shared
memory, and memory consistency models).
- (for Thu Oct 1) Read
Memory consistency models tutorial.
- (For Thu Sep 24) Read
The Implementation of the Cilk-5 Multithreaded Language,
sections 1 - 4.
- (For Tue Sep 22) Read the
OpenMP Tutorial, sections 4.8 - 7.
- (For Tue Sep 15) Look through the
OpenMP Tutorial up to and including the worksharing DO/for construct (sections 1 - 4.6).
- (For Thu Sep 10) Read the overview of
Memory Hierarchy in Cache-based Systems.
- (For Tue Sep 8) PRAM Handout, section 5 (and review section 4.1).
- (For Thu Sep 3) PRAM Handout, sections 3.4, 3.6.
- (For Tue Sep 1) PRAM Handout, sections 3.2, 3.3, 3.5.
- (For Thu Aug 27) Read
PRAM Handout, sections 1, 2, 3.1.
Written and Programming Assignments
- (Thu Aug 27) Written Assignment #1 - due Thu Sep 10.
- (Tue Sep 15) Programming Assignment #1 - due Tue Oct 6.
- (Tue Nov 3) Written Assignment #2 - due Tue Nov 17.
On-line Handouts
(some material local-access only)
Software
NOTE: Some of the following material is out of date and will be updated
during the semester.
OpenMP
- OpenMP reference: Specification of the
OpenMP bindings for C/C++.
- Bass-specific material
- Getting started on Bass.
- When you log in to bass.cs.unc.edu you are connected to the front end. You can compile
programs there, but do not run a program there for more than a few seconds or with more than 4
processors! To get accurate performance information, run your programs
on a dedicated node, either as a batch job with a shell script myjob using
qsub -P comp633 -pe smp 16 myjob
or interactively via
qlogin -P comp633 -pe smp 16
Do not park yourself on this node, as everyone else in the class will be held up.
In any case, each login or batch job is terminated after 5 minutes.
- A directory with the sample diffusion program
discussed in class.
- Command lines for the compilation and execution of programs on
bass.cs.unc.edu. To access the SunStudio Ceres compiler (5.10), make
sure that /opt/sunstudioceres/bin is on your path before
/usr/bin (otherwise you will get the gcc compiler).
- C compilation to create a sequential program (compiler ignores OpenMP
directives and does not link with the OpenMP runtime library):
cc -fast -o prog prog.c (SunStudio Ceres compiler 5.10) or
gcc -O3 -o prog prog.c (GNU C compiler 4.1.2)
- C compilation to create a parallel program (OpenMP directives honored
and program linked with the OpenMP runtime library):
cc -xopenmp=parallel -fast -o prog prog.c (SunStudio Ceres compiler 5.10) or
gcc -fopenmp -O3 -o prog prog.c (GNU C compiler 4.1.2)
Cilk
- This Cilk reference manual
refers to a slightly older revision of the Cilk system, but
is accurate with respect to the language definition.
- Currently no platform in the department runs
Cilk, but implementations exist for shared-memory multiprocessor
Linux platforms and can be installed.
Cilk++
CUDA
Java
- Java threads reference material
- Java threads execute in parallel on the following CS machines:
- Linux/ia32
- Java 1.6.0 (Java 2 runtime) in
/usr/java/bin/ on seca (4 processors) and swan (2 processors with 2-way hyperthreading).
UPC
MPI
- MPI reference material
- Sample command lines for the compilation and execution of C/MPI programs
on the Linux cluster Topsail (C++ and Fortran programs can also use MPI).
- To specify compilation and runtime environment:
topsail% module load hpc/mvapich-intel
- To compile and link with the MPI library (be sure to #include
<mpi.h> in your programs):
topsail% mpicc -o prog prog.c
- To submit a job that runs on 64 processors:
topsail% bsub -q 128cpu -x -n 64 -o result%J -a mvapich mpirun ./prog
Bibliography
This list will evolve throughout the semester. Specific reading
assignments are listed above.
- PRAM Algorithms, S. Chatterjee, J. Prins,
course notes, 2007.
- Memory Hierarchy in Cache-Based Systems,
R. v.d. Pas, Sun Microsystems, 2003.
- OpenMP Tutorial, B. Barney.
- Multithreaded, Parallel and Distributed Programming,
G. Andrews, Addison-Wesley, 2000.
- Computer Architecture: A Quantitative Approach, 2nd ed.,
D. Patterson, J. Hennessy, Morgan Kaufmann, 1996.
- Fast N-Body Simulation with CUDA, L. Nyland, M. Harris, J. Prins,
in GPU Gems 3, H. Nguyen, ed., Addison-Wesley, 2007.
- "Questions and Answers about BSP", D. Skillicorn, J. Hill,
and W. McColl, Scientific Programming 6, 1997.
- Designing and Building Parallel Programs, I. Foster,
Addison-Wesley, 1995.
Online text.
- Introduction to Parallel Computing: Design and Analysis of
Algorithms,
V. Kumar, A. Grama, A. Gupta, G. Karypis, Benjamin-Cummings, 1994.
This page is maintained by
prins@cs.unc.edu.
Send mail if you find problems.