Hardware-Aware Analysis and Optimization of Stable Fluids

Theodore Kim
IBM TJ Watson Research Center


We perform a detailed flop and bandwidth analysis of Jos Stam's Stable Fluids algorithm on the CPU, GPU, and Cell. In all three cases, we find that the algorithm is bandwidth bound, with the cores sitting idle up to 96% of the time. Knowing this, we propose two modifications to accelerate the algorithm. The first is a Mehrstellen discretization for the pressure solver, which reduces the running time of the solver by a third. The second is a static caching scheme that eliminates roughly 99% of the random lookups in the advection stage, giving a 2x speedup in that stage. Both modifications apply equally well to all three architectures.
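
To make the first modification concrete, here is a minimal sketch of a Jacobi iteration using a Mehrstellen (compact, fourth-order) stencil alongside the standard second-order one. It is an illustration under stated assumptions, not the paper's implementation: it uses the textbook 2D 9-point Mehrstellen stencil on a small Poisson test problem with a known solution, where a fluid solver would work in 3D on the pressure field. The function name `jacobi_max_error` and the choice of test problem are hypothetical.

```python
import math

def jacobi_max_error(n, iters, mehrstellen):
    """Run Jacobi iterations for laplace(u) = f on the unit square with
    u = 0 on the boundary, then return the max error against the exact
    solution u = sin(pi x) sin(pi y). Illustrative only (assumed setup)."""
    h = 1.0 / (n - 1)
    exact = [[math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
              for j in range(n)] for i in range(n)]
    # Right-hand side that matches the exact solution above.
    f = [[-2.0 * math.pi ** 2 * exact[i][j] for j in range(n)]
         for i in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        new = [row[:] for row in u]  # Jacobi reads old values, writes new
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                faces = u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]
                if mehrstellen:
                    # 9-point stencil: (4*faces + corners - 20*u) / (6 h^2)
                    corners = (u[i-1][j-1] + u[i-1][j+1] +
                               u[i+1][j-1] + u[i+1][j+1])
                    # Right-hand side is averaged: f + (h^2 / 12) laplace_h(f)
                    rhs = (8.0 * f[i][j] + f[i-1][j] + f[i+1][j] +
                           f[i][j-1] + f[i][j+1]) / 12.0
                    new[i][j] = (4.0 * faces + corners
                                 - 6.0 * h * h * rhs) / 20.0
                else:
                    # 5-point stencil: (faces - 4*u) / h^2
                    new[i][j] = (faces - h * h * f[i][j]) / 4.0
        u = new
    return max(abs(u[i][j] - exact[i][j])
               for i in range(n) for j in range(n))

if __name__ == "__main__":
    print("standard   :", jacobi_max_error(17, 1200, False))
    print("Mehrstellen:", jacobi_max_error(17, 1200, True))
```

The Mehrstellen update reads more neighbors per cell than the standard one, which suits a solver whose cores are otherwise idle waiting on memory. How this trade-off plays out on each architecture is covered in the full paper, not in this sketch.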

Full Paper: Appearing in ACM Symposium on Interactive 3D Graphics and Games 2008 [0.5 MB PDF]

Movie: [1.3 MB MOV] Standard and Mehrstellen Jacobi solvers