Parallel Solutions to Irregular Problems using HPF
Lars Nyland, Siddhartha Chatterjee, Jan Prins
February 24, 1997, HPF User's Group Meeting
The Source of Irregular Problems...
Goal: Parallelize code to get high performance!
Start with scalar functions
First step: vectorize functions
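A minimal sketch of this first step, using a hypothetical interaction kernel (the names dist_one, r, x, y, z are illustrative, not from the original code):

  ! Scalar starting point: one pair at a time
  pure function dist_one(xi, yi, zi, xj, yj, zj) result(r)
    real, intent(in) :: xi, yi, zi, xj, yj, zj
    real :: r
    r = sqrt((xi - xj)**2 + (yi - yj)**2 + (zi - zj)**2)
  end function dist_one

  ! Vectorized form: whole-array operations compute the distances
  ! from particle i to all particles at once (x, y, z are rank-1 arrays)
  r(:) = sqrt((x(i) - x)**2 + (y(i) - y)**2 + (z(i) - z)**2)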
One More Step Leads to Nested Data Parallelism
Next step: Add another dimension to vector solution to create nested data parallelism
Examples
Nesting depth potentially unlimited
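A sketch of the added dimension, continuing the kernel above (illustrative only; with a cutoff the inner extent varies per particle, which is exactly what makes the data irregular):

  ! First level: array operations over j for a fixed i
  ! Second level: a forall over i turns the computation into a 2D (nested) one
  forall (i = 1:n)
    r(i, i+1:n) = sqrt((x(i) - x(i+1:n))**2 + (y(i) - y(i+1:n))**2 + (z(i) - z(i+1:n))**2)
  end forall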
Nested Sequences: Regular or Not?
Some are regular
Some are in between
Some are not
Support for Nested Data Structures in HPF
Multidimensional arrays (from Fortran 90/95)
Segmented vectors (in the HPF library), represented as a flat data vector plus a segment descriptor
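For example, a ragged nested sequence such as [[3,1], [7], [2,8,5]] could be held as a flat data vector plus a segment descriptor; one common convention (the one used by the HPF_LIBRARY segmented scans) is a LOGICAL array whose value changes exactly where a new segment starts. The literal values below are illustrative:

  ! nested sequence:    [[3,1], [7], [2,8,5]]
  ! flat data vector:      3 1    7    2 8 5
  integer :: vals(6) = (/ 3, 1, 7, 2, 8, 5 /)
  ! segment descriptor: value toggles at each segment boundary
  logical :: seg(6)  = (/ .false., .false., .true., .false., .false., .false. /)
  ! per-segment running sums, using the HPF library segmented scan:
  ! sums = SUM_PREFIX(vals, SEGMENT=seg)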
An Example: Electrostatic Interaction with Cutoff
Ideal Implementation
pairlist = (/ (i, j) where distance(i,j) <= cutoff, i = 1, n, j = i+1, n /)

forall (i, j) in pairlist
  F(i) += force(i,j)
  F(j) -= force(i,j)

function force(i,j) = k * q(i) * q(j) * (X(i) - X(j)) / norm2(X(i) - X(j))**3
A Simple, Serial Implementation
do i = 1, n
  do j = i+1, n
    r = dist(i,j)
    if (r <= cutoff) then
      k = G * mass(i) * mass(j) / r ** 3
      xx = x(i) - x(j)
      Fxx = k * xx
      Fx(i) = Fx(i) + Fxx
      Fx(j) = Fx(j) - Fxx
      ! do same for Fy and Fz ...
    endif
  enddo
enddo
HPF Solution, Using Array Operators
Perform masked array operations
mask = .false.
forall (i = 1:n)
  forall (j = i+1:n, dist(i,j) <= cutoff)
    mask(i,j) = .true.
    r(i,j) = dist(i,j)
    k(i,j) = G * mass(i) * mass(j) / r(i,j) ** 3
    xx(i,j) = x(i) - x(j)
    Fxx(i,j) = k(i,j) * xx(i,j)
    ! do same for y and z ...
  end forall
end forall
Fx = sum(Fxx, dim=2, mask=mask) - sum(Fxx, dim=1, mask=mask)
...
Or eliminate one 'sum' operation by adding two more assignments:
mask = .false.
forall (i = 1:n)
  forall (j = i+1:n, dist(i,j) <= cutoff)
    mask(i,j) = .true.
    mask(j,i) = .true.
    r(i,j) = dist(i,j)
    k(i,j) = G * mass(i) * mass(j) / r(i,j) ** 3
    xx(i,j) = x(i) - x(j)
    Fxx(i,j) = k(i,j) * xx(i,j)
    Fxx(j,i) = -Fxx(i,j)
    ! do same for y and z ...
  end forall
end forall
Fx = sum(Fxx, dim=2, mask=mask)
...
HPF Solution, Using Segmented Vectors: Part I
Calculate compressed pairlist of indices
mask = .false.
forall (i = 1:n)
  forall (j = i+1:n, dist(i,j) <= cutoff)
    f_seg(i,j) = i
    f_ind(i,j) = j
    mask(i,j) = .true.
  end forall
end forall
nz = count(mask)
allocate(idx1(nz))   ! and allocate idx2, F, k
idx1 = pack(f_seg, mask)
idx2 = pack(f_ind, mask)
HPF Solution, Using Segmented Vectors: Part II
Now perform a series of general indexing operations on dense vectors
! Array ops for x,y,z force computation
forall (i = 1:nz)
  k(i) = G * mass(idx1(i)) * mass(idx2(i)) / (dist(idx1(i), idx2(i)) ** 3)
end forall
F = k * (x(idx1) - x(idx2))
allocate(zero(n))
zero = 0.
Fx = sum_scatter(F, zero, idx1) - sum_scatter(F, zero, idx2)
! Do same array operations for Y and Z forces ...
Data Mapping Strategy
Do-loop Version
Array Version
Dense Vector Version
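A sketch of what the mappings might look like; the specific directives are illustrative, not necessarily the ones used in the experiments:

  ! Do-loop and array versions: 2D interaction arrays, distributed by rows
  !HPF$ DISTRIBUTE Fxx(BLOCK, *)
  !HPF$ ALIGN mask(i, j) WITH Fxx(i, j)
  !HPF$ ALIGN x(i) WITH Fxx(i, *)

  ! Dense (packed) vector version: 1D pair arrays, block-distributed
  !HPF$ DISTRIBUTE idx1(BLOCK)
  !HPF$ ALIGN idx2(i) WITH idx1(i)
  !HPF$ ALIGN F(i) WITH idx1(i)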
Performance Results
Experiments on SP-2 at Cornell
Compiled successfully, but...
Improvements
Profiling tools helpful
Pack slow in pghpf
Difficult to explain distribution and alignment costs
Updated Performance Results
Experiments on Origin2000 at NCSA
Algorithms
Difficult to explain lack of performance improvement, despite multiple attempts at ALLOCATE |
Comments on Implementations
The segmented-vector version is mostly O(n) dense vector operations
Didn't use segmented operations
Implementation of F90 and HPF library routines
Sensitivity to coding details
e.g., change forall (..., dist(i,j) <= cutoff) to the pair forall (...) d(i,j) = dist(i,j) followed by forall (..., d(i,j) <= cutoff), spelled out below
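The two forms look roughly like this (the body statement mask(i,j) = .true. stands in for the real force computation):

  ! Form 1: the mask expression references dist(i,j) directly
  forall (i = 1:n)
    forall (j = i+1:n, dist(i,j) <= cutoff)
      mask(i,j) = .true.
    end forall
  end forall

  ! Form 2: hoist the distance computation into a temporary array d,
  ! then mask on the stored values
  forall (i = 1:n)
    forall (j = i+1:n)
      d(i,j) = dist(i,j)
    end forall
  end forall
  forall (i = 1:n)
    forall (j = i+1:n, d(i,j) <= cutoff)
      mask(i,j) = .true.
    end forall
  end forall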
General Parallelization Technique
Any pure scalar function can be 'parallelized' for nested data parallel implementation
Also pure vector functions
Write scalar code
For a first level of parallelism
For a second level of parallelism
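A hedged sketch of the recipe, with a hypothetical pure scalar kernel f (purity is what lets it be invoked inside FORALL at any nesting level):

  ! Scalar code: a pure function of scalar arguments
  pure real function f(a, b)
    real, intent(in) :: a, b
    f = a * b + sqrt(abs(a - b))
  end function f

  ! First level of parallelism: apply it across one index
  forall (i = 1:n) y(i) = f(u(i), v(i))

  ! Second level: apply it across two indices (nested data parallelism)
  forall (i = 1:n, j = 1:m) z(i,j) = f(u(i), v(j))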
HPF Support for Specific Operations
HPF supports vector extension of pointwise operations
Expansion operations
Conditionals
Reduction
Prefix, Suffix, Scatter
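A few of the corresponding Fortran 90 intrinsics and HPF_LIBRARY procedures, applied to hypothetical arrays (sketch only; assumes USE HPF_LIBRARY for the prefix, suffix, and scatter routines):

  ! Conditionals: masked assignment
  where (r <= cutoff) k = G * mass1 * mass2 / r**3
  ! Expansion: replicate a vector along a new dimension
  xx = spread(x, dim=2, ncopies=n) - spread(x, dim=1, ncopies=n)
  ! Reduction
  total = sum(F)
  ! Prefix / suffix scans
  running  = sum_prefix(F)
  trailing = sum_suffix(F)
  ! Scatter: accumulate F into Fx at the positions given by idx1
  Fx = sum_scatter(F, Fx, idx1)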
Automatic Methods
Compilers can generate flat and nested data parallel versions of functions automatically
Relies upon:
Conclusions
HPF has enough power to adequately express solutions for irregular problems
Parallelization involves condensing irregular 2D data into segmented vectors
The technique can be implemented
Good parallel implementations of HPF library functions are critical
What I like about HPF
First high-level language (with user group > epsilon) that supports irregular computations
Supports nested data structures through multidimensional arrays, nestable using pointers
Supports nested parallel iterative constructs
Exactly what is required for nested data parallelism (multi-level vectorization of scalar functions)
What I'd like in HPF
Support for 0-length segments in segmented arrays
Segmented-reduce intrinsics
Indexing ragged arrays
Multi-level nested ragged arrays?