AbstractMiroproessor speed has been growing exponentially faster than memory speed in the reent past. In this paper, we onsider the long term impliations of a ontinuation of this trend. We de ne a program property known as salable loality, whih measures our ability to apply ever faster uniproessors to inreasingly large problems (just as salable parallelism measures our ability to apply more numerous proessors to larger problems). We then show that exploiting salable loality in sienti programs often requires advaned ompile-time reordering of loop iterations. We provide an algorithm that derives an exeution order, storage mapping, and ahe requirement for any desired degree of loality, for ertain programs that an be made to exhibit salable loality. Our approah is unusual in that it derives the program transformation and ahe requirement from the dataow of the alulation (a fundamental harateristi of the algorithm), instead of searhing a spae of possible transformations of the exeution order and array layout used by the programmer (whih are artifats of the expression of the algorithm). We inlude empirial results showing the e etiveness of our transformation on two small kernels and a non-trivial benhmark program. Our transformation an produe speedups for data sets residing in L2 ahe, main memory, or virtual memory. This report uni es the ma jor results of DCS-TR-379 and DCS-TR-378, presents them in terms of a general algorithm rather than a olletion of heuristis, and gives a learer presentation of the empirial data.
RightsThis Item is protected by copyright and/or related rights.You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use.For other uses you need to obtain permission from the rights-holder(s).