Revised: Tue Dec 3 2013 by email@example.com
This is an introductory graduate course covering several aspects of
parallel and high-performance computing.
Upon completion, you should
- be able to design and analyze parallel algorithms for a variety of
problems and computational models,
- be familiar with the hardware and software organization of
high-performance parallel computing systems, and
- have experience with the implementation of parallel applications
on high-performance computing systems, and be able to measure,
tune, and report on their performance.
Additional information, including the course syllabus, is linked below.
All parallel programming models discussed in this class are supported on the Bass system,
which will be available for use in this class.
- This course will use Piazza to manage class questions and discussions online.
(some material local-access only)
- (for Thu Nov 21) Read Kumar et al.,
Basic Communication Operations.
- (for Thu Nov 10) Read Foster, Chapter 8,
Overview of MPI.
- (for Tue Nov 5) Read Skillicorn et al.,
Questions and Answers about BSP.
- Read Nyland et al.,
Fast N-Body Simulation with CUDA.
Check supplementary materials for CUDA in the Software section below.
- (for Thu Oct 10) Read
Hennessy & Patterson Ch. 8, sections 8.5 - 8.6
(synchronization primitives in shared memory, and
memory consistency models).
- (for Tue Oct 8) Read
Memory consistency models tutorial (sections 1-6, pp. 1-17).
- (For Thu Sep 26)
The Implementation of the Cilk-5 Multithreaded Language,
sections 1 - 3.
- (For Thu Sep 19) Look through
OpenMP Tutorial sections 7-9. Details specific
to OpenMP support on Bass (using gcc 4.4.7) can
be found here.
- (For Tue Sep 17) Look through
OpenMP Tutorial sections 1-6. Most examples
are in Fortran, but don't require any knowledge of Fortran
beyond the obvious.
Ignore WORKSHARE and TASK directives, and discussion of
nested parallel constructs.
- (for Thu Sep 12) Read the overview of
Memory Hierarchy in Cache-based Systems (pg 1-9).
- (For Thu Sep 5) PRAM Handout, (review section 4.1), section 5.
- (For Tue Sep 3) PRAM Handout, sections 3.4, 3.6.
- (For Thu Aug 29) PRAM handout, sections 3.2, 3.3, 3.5.
- (For Tue Aug 27) Read PRAM Handout, sections 1, 2, 3.1 (pp. 1-8).
- (For Thu Aug 22) Look over the course overview.
Written and Programming Assignments
- (Aug 27) Written assignment WA1 is available. Due date is Sep 10.
- (Sep 10) Programming assignment PA1a is available. Due date is extended to Thu Sep 26.
- (Sep 24) Programming assignment PA1b is available. Due date is Tue Oct 8.
- Submission instructions:
Submit your performance graph on paper, and from the bass login node copy the files
needed to build your code into your pa1b submission directory (the AFS path shown in the scp command below),
where <yourlogin> should be replaced with your CS login.
If you have trouble writing to the AFS file system, you can instead upload individual files
or directories using the command
scp -pr <localfile-or-dir> <yourlogin>@classroom.cs.unc.edu:/afs/unc/project/courses/comp633/Submit/<yourlogin>/pa1b
In this case you will be prompted for your CS password.
- (Oct 3) Written Assignment WA2 is available. Due date is Tue Oct 15.
Sample solutions are available for WA2.
- (Oct 31) Programming Assignment PA2 is available.
Project selection by Nov. 12; the due date is Tue Dec. 3, with submissions accepted through
Friday Dec. 6.
- Project selection: Please create a text file named project.txt in your
pa2 submission directory containing:
- your project choice (if you discussed a non-standard project with me, write a keyword or two that will help me recall it), and
- your partner in the project (if a team project).
- CUDA workstations You can develop and test CUDA programs on Bass, but not all
the Nvidia profiling tools will work with the GPUs. There are a few workstations with
more modern GPUs in GLAB (second floor) owned by the Manocha/Lin and Frahm groups.
To inquire about availability for brief use please contact
Abhinav Golas (firstname.lastname@example.org) or Jared Heinly (email@example.com).
- Project submission: Please provide a short description of what you accomplished
in the project you selected,
focusing on some aspect of the performance analysis of your algorithm and implementation.
Upload the description (preferably in PDF form, but Word or ASCII is ok too)
and your code to your pa2 submission directory (see above).
- (Nov 7) Written Assignment WA3 is available. Due date is Tue Nov. 26.
We will be using the Bass system for
programming assignments. The Bass system supports all the programming models studied in
this course. The general instructions for
getting started on Bass
are supplemented below with specific instructions for each programming model.
When you login to bass.cs.unc.edu you are connected to a specific node on bass dedicated
to interactive program development. You can compile programs on this node.
Shared-memory programs run within an individual node on bass. Distributed-memory programs run
across multiple nodes in Bass. The login node should not be used to run your programs,
although a short debug test lasting a few seconds and using no more than 4 cores should be OK.
In general programs that need multiple nodes or dedicated nodes or GPUs should be
submitted to queues that are managed by the Grid Engine job scheduler.
- OpenMP reference: Specification of
OpenMP 3.0 API for C/C++.
You may be more interested in
OpenMP support in gcc 4.4.7 (the compiler on Bass).
- Bass-specific material
Getting started on Bass.
- To get accurate performance information run your programs
on a dedicated node as a batch job with a shell script myjob using
qsub -pe smp 16 myjob
or interactively via
qlogin -pe smp 16
Do not park yourself on this node, as everyone else in the class will be held up.
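The qsub route expects a small shell script. A minimal sketch of such a myjob script is shown below; this is an assumed layout, not the official course template, and the program name ./prog is a placeholder.

```shell
#!/bin/bash
# Sketch of a Grid Engine batch script; submit with: qsub -pe smp 16 myjob
#$ -cwd                     # run the job from the submission directory
#$ -j y                     # merge stdout and stderr into one output file

export OMP_NUM_THREADS=16   # match the slot count requested with -pe smp 16
./prog                      # placeholder for your compiled program
```

Grid Engine reads the #$ lines as embedded qsub options, so the same flags can be given either in the script or on the qsub command line.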
- A directory with the sample diffusion program
discussed in class.
- Command lines for the compilation and execution of programs on Bass:
- C compilation to create a sequential program (compiler ignores OpenMP
directives and does not link with the OpenMP runtime library):
gcc -O3 -o prog prog.c (Gnu C compiler 4.4.7)
- C compilation to create a parallel program (OpenMP 3.0 directives honored
and program linked with the OpenMP runtime library):
gcc -fopenmp -O3 -o prog prog.c (Gnu C compiler 4.4.7)
Cilk and Cilk++
- This Cilk reference manual
refers to a slightly older revision of the Cilk system, but
is accurate with respect to the language definition.
- Cilk runs on Bass, but there is no public installation.
We recommend you install and use Cilk++ instead of Cilk.
- Cilk++ can be downloaded from
Intel. For Bass select the 64-bit Linux version. You can install it
in your home directory. You can run it via qlogin or batch submission just
like OpenMP programs.
- CUDA 4.2 on Bass
- CUDA 5.0 on killdevil and CUDA 5.5 on other GPU-equipped machines
- Java threads reference material
- Java threads execute in parallel on any Bass node.
- MPI reference material
- Running MPI programs on Bass
This list will evolve throughout the semester. Specific reading
assignments are listed above.
- PRAM Algorithms, S. Chatterjee, J. Prins,
course notes, 2013.
- Memory Hierarchy in Cache-Based Systems,
R. van der Pas, Sun Microsystems, 2003.
- OpenMP tutorial, Blaise Barney, LLNL, 2013.
- The Implementation of the Cilk-5 Multithreaded Language,
M. Frigo, C. Leiserson, K. Randall, in
Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI), 1998.
- Shared Memory Consistency Models: A Tutorial,
S. V. Adve, K. Gharachorloo, DEC Western Research Labs Report 95/7, 1995.
- Computer Architecture: A Quantitative Approach, 2nd ed.,
J. Hennessy, D. Patterson, Morgan Kaufmann, 1996.
- "Questions and Answers about BSP", D. Skillicorn, J. Hill,
and W. McColl, Scientific Programming 6, 1997.
- Designing and Building Parallel Programs, I. Foster, Addison-Wesley, 1995.
- Introduction to Parallel Computing: Design and Analysis of Algorithms,
V. Kumar, A. Grama, A. Gupta, G. Karypis, Benjamin-Cummings, 1994.
This page is maintained by the course instructor.
Send mail if you find problems.