ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

Minimizing Cache Misses Using Minimum-Surface Bodies (2002)

Biegel, Bryan ; Frumkin, Michael ; VanderWijngaart, Rob

In: CASI

Details

Publication Date: 2013-08-29

Description: A number of known techniques for improving cache performance in scientific computations involve reordering the iteration space. Some of these reorderings can be viewed as coverings of the iteration space by sets with a good surface-to-volume ratio. Using such sets reduces the number of cache misses in computations of local operators that have the iteration space as their domain. First, we derive lower bounds on the number of cache misses that any algorithm must suffer while computing a local operator on a grid. Then we explore coverings of iteration spaces represented by structured and unstructured grids which allow us to approach these lower bounds. For structured grids we introduce a covering by successive minima tiles of the interference lattice of the grid. We show that the covering has a low surface-to-volume ratio and present a computer experiment showing the actual reduction in cache misses achieved by using these tiles. For planar unstructured grids we show the existence of a covering which reduces the number of cache misses to the level of structured grids. On the other hand, we present a triangulation of a 3-dimensional cube such that any local operator on the corresponding grid has a significantly larger number of cache misses than a similar operator on a structured grid.

Keywords: Computer Programming and Software

Format: application/pdf

NASA TECHNICAL REPORTS

2

Unknown

Algorithms for parallel flow solvers on message passing architectures (1995)

Vanderwijngaart, Rob F.

In: CASI

Details

Publication Date: 2019-06-28

Description: The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique, even though it has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these, the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process can be determined. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of careless grid partitioning in flow solvers that employ pipeline algorithms. If grid blocks at boundaries are not at least as large in the wall-normal direction as those immediately adjacent to them, then the first processor in the pipeline will receive a computational load that is less than that of subsequent processors, magnifying the pipeline slowdown effect. Extra compensation is needed for grid boundary effects, even if all grid blocks are equally sized.

Keywords: FLUID MECHANICS AND HEAT TRANSFER

Type: NASA-CR-197758 , NAS 1.26:197758 , MCAT-95-15

Format: application/pdf

NASA TECHNICAL REPORTS

3

Unknown

Multi-partitioning for ADI-schemes on message passing architectures (1994)

Vanderwijngaart, Rob F.

In: CASI

Details

Publication Date: 2019-06-28

Description: A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.

Keywords: COMPUTER PROGRAMMING AND SOFTWARE

Type: NASA-CR-196434 , NAS 1.26:196434 , MCAT-94-06

Format: application/pdf

NASA TECHNICAL REPORTS

4

Unknown

Charon Message-Passing Toolkit for Scientific Computations (1998)

Saini, Subhash ; VanderWijngaart, Rob F.

In: Other Sources

Details

Publication Date: 2019-07-18

Description: The Charon toolkit for piecemeal development of high-efficiency parallel programs for scientific computing is described. The portable toolkit, callable from C and Fortran, provides flexible domain decompositions and high-level distributed constructs for easy translation of serial legacy code or design to distributed environments. Gradual tuning can subsequently be applied to obtain high performance, possibly by using explicit message passing. Charon also features general structured communications that support stencil-based computations with complex recurrences. Through the separation of partitioning and distribution, the toolkit can also be used for blocking of uni-processor code, and for debugging of parallel algorithms on serial machines. An elaborate review of recent parallelization aids is presented to highlight the need for a toolkit like Charon. Some performance results of parallelizing the NAS Parallel Benchmark SP program using Charon are given, showing good scalability.

Keywords: Computer Systems

Type: National Energy Research Scientific Computing Center Meeting; Aug 26, 1998; Berkeley, CA; United States

Format: text

NASA TECHNICAL REPORTS

5

Unknown

Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design (1997)

VanderWijngaart, Rob F. ; Kutler, Paul

In: Other Sources

Details

Publication Date: 2019-07-18

Description: Charon is a software toolkit that enables engineers to develop high-performing message-passing programs in a convenient and piecemeal fashion. Emphasis is on rapid program development and prototyping. In this report a detailed description of the functional design of the toolkit is presented. It is illustrated by the stepwise parallelization of two representative code examples.

Keywords: Computer Programming and Software

Format: text

NASA TECHNICAL REPORTS

6

Unknown

Communication Improvement for the LU NAS Parallel Benchmark: A Model for Efficient Parallel Relaxation Schemes (1997)

Yarrow, Maurice ; VanderWijngaart, Rob ; Kutler, Paul

In: Other Sources

Details

Publication Date: 2019-07-18

Description: The first release of the MPI version of the LU NAS Parallel Benchmark (NPB2.0) performed poorly compared to its companion NPB2.0 codes. The later LU release (NPB2.1 & 2.2) runs up to two and a half times faster, thanks to a revised point access scheme and related communications scheme. The new scheme sends substantially fewer messages, is cache "friendly", and has a better load balance. We detail the observations and modifications that resulted in this efficiency improvement, and show that the poor behavior of the original code resulted from deriving a message passing scheme from an algorithm originally devised for a vector architecture.

Keywords: Computer Systems

Type: NAS-97-032

Format: text

NASA TECHNICAL REPORTS

7

Unknown

Predicting Cost/Performance Trade-offs For Whitney: A Commodity Computing Cluster (1998)

Tweten, Dave ; Nitzberg, Bill ; VanDerWijngaart, Rob F. ; [et al.]

In: Other Sources

Details

Publication Date: 2019-07-18

Description: Recent advances in low-end processor and network technology have made it possible to build a "supercomputer" out of commodity components. We develop simple models of the NAS Parallel Benchmarks version 2 (NPB 2) to explore the cost/performance trade-offs involved in building a balanced parallel computer supporting a scientific workload. By measuring single processor benchmark performance, network latency, and network bandwidth, and using closed form expressions detailing the number and size of messages sent by each benchmark, our models predict benchmark performance to within 30%. A comparison based on total system cost reveals that current commodity technology (200 MHz Pentium Pros with 100baseT Ethernet) is well balanced for the NPBs up to a total system cost of around $1,000,000.

Keywords: Computer Operations and Hardware

Type: HICSS-31; Jan 09, 1998; Kohala Coast, HI; United States

Format: text

NASA TECHNICAL REPORTS

8

Unknown

RANS-MP: A Portable Parallel Navier-Stokes Solver (1996)

Tu, Eugene ; VanderWijngaart, Rob F.

In: Other Sources

Details

Publication Date: 2019-07-18

Description: RANS-MP, a new implementation of a single-grid Navier-Stokes solver using the diagonalized Beam-Warming approximate-factorization scheme, is presented. This first release of the completely rewritten solver employs the following optimizations: (1) Bi-directional multi-partition method for the ADI solver part; this improves granularity and load balance; (2) Improved cache usage through elimination of non-unit-stride array access (possible in part due to multi-partitioning); (3) Preprocessing of communicating boundary conditions to streamline logic during time stepping; (4) Truly parallel, high-performance I/O using the newly-developed MPI-IO library; (5) Elimination of large amounts of redundant operations through efficient use of workspace. Results of some realistic wing computations on the IBM SP2 computer will be presented. We will demonstrate that excellent absolute performance and scalability are obtained with RANS-MP, even for relatively small grid sizes. Besides high performance, an outstanding feature of RANS-MP is its true portability, due to the use of the portable message passing and I/O libraries MPI and MPI-IO.

Keywords: Fluid Mechanics and Thermodynamics

Type: Third Annual Computational Aerosciences Workshop; Aug 13, 1996 - Aug 15, 1996; Moffett Field, CA; United States

Format: text

NASA TECHNICAL REPORTS

9

Unknown

Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes (2004)

VanderWijngaart, Rob ; Biswas, Rupak ; Mavriplis, Catherine ; [et al.]

In: Other Sources

Details

Publication Date: 2019-07-18

Description: High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation are demonstrated by the reduction in the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of our OpenMP parallel implementation on different architectures such as the SGI Origin2000, SGI Altix, and Cray MTA-2.

Keywords: Fluid Mechanics and Thermodynamics

Type: International Conference on Spectral and High Order Methods; Jun 21, 2004 - Jun 25, 2004; RI; United States

Format: text

NASA TECHNICAL REPORTS

10

Unknown

Efficacy of Code Optimization on Cache-Based Processors (1997)

VanderWijngaart, Rob F. ; Saphir, William C. ; Chancellor, Marisa K.

In: Other Sources

Details

Publication Date: 2019-07-18

Description: In this paper a number of techniques for improving the cache performance of a representative piece of numerical software are presented. Target machines are popular processors from several vendors: MIPS R5000 (SGI Indy), MIPS R8000 (SGI PowerChallenge), MIPS R10000 (SGI Origin), DEC Alpha EV4 + EV5 (Cray T3D & T3E), IBM RS6000 (SP Wide-node), Intel PentiumPro (Ames' Whitney), Sun UltraSparc (NERSC's NOW). The optimizations all attempt to increase the locality of memory accesses, but they meet with rather varied and often counterintuitive success on the different computing platforms. We conclude that it may be genuinely impossible to obtain portable performance on the current generation of cache-based machines. At the least, it appears that the performance of modern commodity processors cannot be described with parameters defining the cache alone.

Keywords: Computer Programming and Software

Type: SC97: High Performance Networking and Computing; Nov 15, 1997 - Nov 21, 1997; San Jose, CA; United States

Format: text

NASA TECHNICAL REPORTS
