Title: Image-Space Decomposition Algorithms for Sort-First Parallel Volume Rendering of Unstructured Grids
Authors: Huseyin Kutluca, Tahsin M. Kurc, and Cevdet Aykanat
Status: Published in Journal of Supercomputing, vol. 15, no. 1, pp. 51-93, 2000.

Abstract:

Twelve adaptive image-space decomposition algorithms are presented for sort-first parallel direct volume rendering (DVR) of unstructured grids on distributed-memory architectures. The algorithms are presented under a novel taxonomy based on the dimension of the screen decomposition, the dimension of the workload arrays used in the decomposition, and the scheme used for workload-array creation and querying the workload of a region. For the 2D decomposition schemes using 2D workload arrays, a novel scheme is proposed to query the exact number of screen-space bounding boxes of the primitives in a screen region in constant time.  A probe-based chains-on-chains partitioning algorithm is exploited for load balancing in optimal 1D decomposition and iterative 2D rectilinear decomposition (RD).  A new probe-based optimal 2D jagged decomposition (OJD) is proposed which is much faster than the dynamic-programming based OJD scheme proposed in the  literature. The summed-area table is successfully exploited to query the workload of a rectangular region in constant time in both OJD and RD schemes for the subdivision of general 2D workload arrays. Two orthogonal recursive bisection (ORB) variants are adapted to relax the straight-line division restriction in conventional ORB through using the medians-of-medians approach on regular mesh and quadtree superimposed on the screen.  Two approaches based on the Hilbert space-filling curve and graph-partitioning are also proposed. An efficient primitive classification scheme is proposed for redistribution in 1D, and 2D rectilinear and jagged decompositions. The performance comparison of the decomposition algorithms is modeled by establishing appropriate quality measures for load-balancing, amount of primitive replication and parallel execution time. The experimental results on a  Parsytec CC system using a set of benchmark volumetric datasets verify the validity of the proposed performance models.  The performance evaluation of the decomposition algorithms is also carried out through the sort-first parallelization of an efficient DVR algorithm.

Full paper