COMP 633 Parallel Computing (Fall 2021)

Revised: Nov 30 2021 by prins@cs.unc.edu

Fall 2021 (Thu Aug 19 - Tue Nov 30)
TTh 3:30 - 4:45 PM SN011

Instructor: Jan Prins
FB 334, Tel: 919-590-6213
email: prins@cs.unc.edu
Office hours: MW 2-4 PM

TA: Misha Shvets, SN257
email: mshvets@cs.unc.edu
Office hours: TBD

Overview and Syllabus

This is an introductory graduate course on parallel computing. Upon completion, you should
- be able to design and analyze parallel algorithms for a variety of problems and computational models,
- be familiar with the hardware and software organization of high-performance parallel computing systems, and
- have experience with the implementation of parallel applications on high-performance computing systems, and be able to measure, tune, and report on their performance.

Questions, answers, and discussions outside of lectures
- We will use Piazza for asynchronous discussions outside of class. The service is already paid for, so you do not need to make a contribution.
- I have enrolled you using your email of record with the registrar. If you prefer a different login, I believe you can add one for our discussion group using this link: http://piazza.com/unc/fall2021/comp633. Be sure to sign up using a UNC email.
- Please access the COMP 633 discussion group using this link: http://piazza.com/unc/fall2021/comp633/home
We will use the shared memory multiprocessor phaedra.cs.unc.edu in this class.
- Log in with your onyen. You will have a home directory on phaedra.

(some material local-access only)

Lecture Slides

(For Thu Nov 11) Look over Kumar et al., Basic Communication Operations.
(For Tue Nov 9) Skim MPI tutorial by Blaise Barney, LLNL.
(For Tue Oct 26) Skim the Questions and Answers about BSP pp 1-25. We will not use BSPLib directly, rather we use the BSP model together with communication operations from the MPI library.
(For Oct 14/19) Look over the Cuda programming guide (9.2) and the Cuda best practices guide (9.2).
(Oct 12) All classes canceled.
(For Oct 7) in-class midterm exam
(For Oct 5) Midterm Review Topics
(For Thu Sep 30) Read Nyland et al., Fast N-Body Simulation with CUDA.
(For Tue Sep 28) Read Hennessy & Patterson Ch. 8, sections 8.5 - 8.6 on Synchronization.
(For Thu Sep 23) Look through Memory consistency models tutorial (sections 1 - 6, pp 1 - 17).
(For Tue Sep 21) Look through section 8 of the OpenMP Tutorial
(For Tue Sep 14) Look through OpenMP Tutorial sections 3-5 and section 6 only up to the first exercise. Most examples are shown in C/C++ and in Fortran, so read examples using whichever language you prefer.
(For Tue Sep 7) Read the overview of Memory Hierarchy in Cache-based Systems
(For Thu Sep 2) Read PRAM Handout secn 5 (pp 20 - 23)
(For Tue Aug 31) Read PRAM Handout secns 3.6, 4.1 (pp 15 - 19)
(For Thu Aug 26) Read PRAM Handout secns 3.2, 3.3, 3.5 (pp 9 - 15)
(For Tue Aug 24) Read PRAM Handout secns 1, 2, 3.1 (pp 1 - 8)

Written Assignments
- (Assigned Sep 2, due Sep 16)
  WA1: PRAM and sample solutions
- (Assigned Nov 1, due Nov 23)
  WA2: BSP
Programming Assignments
- (Assigned Sep 21, due Sep 28)
  PA1(a): Sequential implementation of gravitational n-body simulation
- (Assigned Sep 28, due Oct 14)
  PA1(b): Parallel implementation of gravitational n-body simulation
- (Assigned Nov 1, due Nov 30)
  PA2: K-means or alternative project

phaedra is an Intel Xeon E5-2650v4 compute server dedicated to this class. It has 20 cores and an attached Nvidia Titan V100 accelerator. OpenMP, Cilk, and Cuda programming models are supported. Log in with your onyen (instead of a CS login).
longleaf is a research computing cluster with ~350 Intel Xeon E5-2643 nodes providing 24 cores per node. OpenMP and Cilk programming models are supported on individual nodes. Compute jobs are submitted using slurm.
dogwood is a research computing cluster with ~240 Intel Xeon E5-2699A nodes providing 44 cores per node. The MPI programming model is supported to coordinate and communicate among nodes. A subset of the nodes have Intel Xeon Phi (KNL) accelerators. The individual nodes support MPI, OpenMP, OpenACC, and Intel offload (Intel Xeon Phi) programming models. Compute jobs are submitted using slurm.

All students in COMP 633 can log in to phaedra.cs.unc.edu using their onyen.

GNU compiler (gcc/g++ 11.2.0)
- supports OpenMP 5.1
- to use gcc/g++ on phaedra make sure you have /usr/local/gcc/ on your path.
Intel C/C++ compiler (icc/icpc 2020) [Note: not available yet, use gcc for the time being]
- supports OpenMP 4.5 with tasking and accelerator offload.
- On phaedra, source /opt/intel/bin/compilervars.sh intel64 (bash) or source /opt/intel/bin/compilervars.csh intel64 (csh) to access the Intel compilers and tools.
- On research computing clusters use "module add icc" to access Intel compilers.

Shared memory parallel programming: Specification of the OpenMP 4.5 API for C/C++. For a more accessible introduction, see the tutorial for OpenMP 3.1 in the Bibliography below.

Nvidia GPUs: programmed using Cuda C (Cuda toolkit 9.2 on phaedra; the Volta-generation V100 has compute capability 7.0).
- CUDA C Programming Guide (v9.2 Aug 2018)
- Be sure to include /usr/local/cuda-9.2/bin in your path on phaedra to access the nvcc compiler and other Nvidia tools.

MPI reference material
- Comprehensive site organizing all reference and tutorial information on MPI.
- A short overview of MPI.
MPI programs can be submitted to dogwood.

This list may evolve throughout the semester. Specific reading assignments are listed above.

PRAM Algorithms, S. Chatterjee, J. Prins, COMP 633 course notes, 2020.
Memory Hierarchy in Cache-Based Systems, R. v.d. Pas, Sun Microsystems, 2003.
OpenMP tutorial, Blaise Barney, LLNL.
Cilk Tutorial, Michael Graf, Andrei Papancea, David Bunde, Knox Univ.
Shared Memory Consistency Models: A Tutorial, S. V. Adve, K. Gharachorloo, DEC Western Research Labs Report 95/7, 1995.
Computer Architecture: A Quantitative Approach, 2nd ed., J. Hennessy, D. Patterson, Morgan Kaufmann, 1996.
Fast N-Body Simulation with CUDA, L. Nyland, M. Harris, J. Prins, GPUGems 3, 2008.
Questions and Answers about BSP, D. Skillicorn, J. Hill, and W. McColl, Scientific Programming 6, 1997.
Message Passing Interface, Blaise Barney, LLNL, 2015.
Introduction to Parallel Computing: Design and Analysis of Algorithms - Chapter 3, V. Kumar, A. Grama, A. Gupta, G. Karypis, Benjamin-Cummings, 1994.

This page is maintained by prins@cs.unc.edu. Send mail if you find problems.