In distributed memory machines, large data array are frequently partioned between local memories of processors. Regular mapping like BLOCK, CYCLIC, or BLOCK-CYCLIC distributions are not suitable for many applications which have irregular computation and communication structure, such as computational fluid dynamics, molecular dynamics, unstructured mesh codes, or galaxy simulations.
In irregular problems, patterns of data access cannot be predicted until runtime. This prevents compile time optimization. The following example illustrats code with irregular data and loop distribution.
TEMPLATE T(100) INTEGER MAP(100) REAL X(100), Y(100) ALIGN X(I) WITH T(I) ALIGN Y(I) WITH T(I) DISTRIBUTE IRREGULAR T(MAP) ... some computation which sets the values of the MAP array ... EXECUTE (I) on T(I) DO I = 1, N X(I) = X(I) + X(Y(I)) ENDDO
Once the values MAP(I) are computed, the X and Y arrays are distributed according to the values of MAP: the I-th MAP array gives the number of the processor to which the template element T(I) is assigned. Moreover, which X in X(Y(I)) will be accessed cannot be known until runtime.
In order to reduce the volumn of communication and the number of messages, indirection patterns have to be preprocessed, and the elements to be sent and received by each processor must be precomputed to prefetch off processor data to hide communication latencies. This is called inspector phase. Once inspector is done. the loop is executed using the data structures build by the inspector. This phase is called executor.
Chaos is a run-time library designed to help users efficiently program irregular problems.
In this project, we applied Polaris to accept fortran program with distribution directives and output fortran program which applies Chaos library to deal with irregular program.
input1.f: the input fortran program. output1.f: the output fortran program
input2.f: the input fortran program. output2.f: the output fortran program
smvm.F: complete sparse matrix-vector multiplication from output1.f that has been tested.