ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

Parallel implementation of an adaptive scheme for 3D unstructured grids on the SP2 (1996)

Strawn, Roger C. ; Biswas, Rupak ; Oliker, Leonid

In: CASI

Publication Date: 2019-06-28

Description: Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10 percent of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all the mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

Keywords: Computer Programming and Software

Type: NASA-CR-201396 , RIACS-TR-96-11 , NAS 1.26:201396

Format: application/pdf
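
As a reading aid for the speedup figures quoted in the description above, the sketch below converts them to parallel efficiency (speedup divided by processor count). The function name and the case labels are illustrative assumptions and are not taken from the report.

```python
def parallel_efficiency(speedup, processors):
    """Return speedup / processors, the fraction of ideal linear speedup achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


PROCESSORS = 64

# Speedups as quoted in the record description; the labels are our own shorthand.
reported_speedups = {
    "random refinement": 47.0,
    "localized refinement": 7.7,
    "localized refinement, repartitioned first": 43.6,
}

efficiencies = {
    case: parallel_efficiency(s, PROCESSORS)
    for case, s in reported_speedups.items()
}

for case, eff in efficiencies.items():
    print(f"{case}: {eff:.1%} efficiency")
```

Under this definition the three cases come to roughly 73, 12, and 68 percent of ideal linear speedup on 64 processors.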

NASA TECHNICAL REPORTS

2

Unknown

Parallel Load Balancing for Adaptive Unstructured Meshes (1998)

Biswas, Rupak ; Bailey, David

In: Other Sources

Publication Date: 2019-07-18

Description: Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We describe a novel method to dynamically balance the processor workloads with a global view. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model will also be presented that predicts the remapping cost. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented will demonstrate that this is an effective dynamic load balancing strategy which remains viable on a large number of processors.

Keywords: Computer Programming and Software

Type: NEC Europe Ltd. Conference; May 04, 1998 - May 08, 1998; Sankt Augustin; Germany

Format: text
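
The description above says a cost model decides whether the gain from a balanced workload offsets the cost of data movement. The sketch below shows one simple way such a test could look. It is not the model from the paper: the function name, the parameters, and the assumption that a step's run time is set by the most heavily loaded processor are all hypothetical.

```python
def should_remap(old_loads, new_loads, seconds_per_work_unit,
                 steps_until_next_adaption, remap_cost_seconds):
    """Return True if the predicted time saved by rebalancing exceeds the remapping cost.

    Assumption (ours, not the paper's): each solver step takes as long as the
    most heavily loaded processor needs, so the gain per step is the reduction
    in the maximum per-processor load.
    """
    gain_per_step = (max(old_loads) - max(new_loads)) * seconds_per_work_unit
    total_gain = gain_per_step * steps_until_next_adaption
    return total_gain > remap_cost_seconds


# Hypothetical loads: adaption has concentrated work on the first processor.
loads_before = [100, 20, 20, 20]
loads_after = [40, 40, 40, 40]  # same total work, evenly spread

cheap_move = should_remap(loads_before, loads_after, 0.01, 50, 10.0)
costly_move = should_remap(loads_before, loads_after, 0.01, 50, 40.0)
```

With these made-up numbers the predicted saving is about 30 seconds, so the remap is accepted when data movement costs 10 seconds and rejected when it costs 40.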

3

Unknown

NAS Applications and Advanced Architectures (1997)

Bailey, David H. ; Biswas, Rupak ; VanDerWijngaart, Rob

In: Other Sources

Publication Date: 2019-07-18

Description: This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.

Keywords: Computer Programming and Software

Format: text

4

Unknown

Self-Avoiding Walks over Adaptive Triangular Grids (1998)

Biswas, Rupak ; Heber, Gerd ; Saini, Subhash ; [et al.]

In: CASI

Publication Date: 2019-07-13

Description: In this paper, we present a new approach to constructing a "self-avoiding" walk through a triangular mesh. Unlike the popular approach of visiting mesh elements using space-filling curves, which is based on a geometric embedding, our approach is combinatorial in the sense that it uses the mesh connectivity only. We present an algorithm for constructing a self-avoiding walk which can be applied to any unstructured triangular mesh. The complexity of the algorithm is O(n log n), where n is the number of triangles in the mesh. We show that for hierarchical adaptive meshes, the algorithm can be easily parallelized by taking advantage of the regularity of the refinement rules. The proposed approach should be very useful in the run-time partitioning and load balancing of adaptive unstructured grids.

Keywords: Computer Programming and Software

Type: 39th Symposium on Foundations of Computer Science; Nov 08, 1998 - Nov 11, 1998; Palo Alto, CA; United States

Format: application/pdf

5

Unknown

Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms (2000)

Oliker, Leonid ; Heber, Gerd ; Biswas, Rupak

In: CASI

Publication Date: 2019-07-13

Description: The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.

Keywords: Computer Programming and Software

Type: Parallel and Distributed Computing Systems; Aug 08, 2000 - Aug 10, 2000; Las Vegas, NV; United States

Format: application/pdf
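
For readers unfamiliar with the method named in the description above, here is a minimal serial sketch of Conjugate Gradient in which the sparse matrix-vector multiply is the dominant step of each iteration. It is the textbook algorithm on a matrix stored in compressed sparse row (CSR) form, not the parallel code evaluated in the paper. The function names, the tolerance, and the 2x2 test matrix are assumptions chosen for illustration.

```python
def csr_matvec(data, indices, indptr, x):
    """Multiply a CSR-format sparse matrix by a dense vector x."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        total = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            total += data[k] * x[indices[k]]
        y[row] = total
    return y


def dot(u, v):
    """Dense dot product."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(data, indices, indptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_matvec(data, indices, indptr, p)   # the SPMV step
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Illustrative SPD system: A = [[4, 1], [1, 3]], b = [1, 2].
A_data, A_indices, A_indptr = [4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4]
solution = conjugate_gradient(A_data, A_indices, A_indptr, [1.0, 2.0])
```

The row order of the CSR arrays determines the memory access pattern inside `csr_matvec`, which is where the ordering strategies mentioned in the description would take effect.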

6

Unknown

Dynamic Load Balancing for Adaptive Unstructured Grids (1998)

Saini, Subhash ; Biswas, Rupak

In: Other Sources

Publication Date: 2019-07-18

Description: Dynamic mesh adaptation on unstructured grids is a powerful tool for computing unsteady three-dimensional problems that require grid modifications to efficiently resolve solution features. By locally refining and coarsening the mesh to capture phenomena of interest, such procedures make standard computational methods more cost effective. Highly refined meshes are required to accurately capture shock waves, contact discontinuities, vortices, and shear layers in fluid flow problems. Adaptive meshes have also proved to be useful in several other areas of computational science and engineering like computer vision and graphics, semiconductor device modeling, and structural mechanics. Local mesh adaptation provides the opportunity to obtain solutions that are comparable to those obtained on globally-refined grids but at a much lower cost. Additional information is contained in the original extended abstract.

Keywords: Computer Programming and Software

Type: MASCOTS 1998; Jul 19, 1998 - Jul 24, 1998; Montreal; Canada

Format: text

7

Unknown

Parallel Computing Strategies for Irregular Algorithms (2002)

Oliker, Leonid ; Biswas, Rupak ; Biegel, Bryan ; [et al.]

In: CASI

Publication Date: 2019-07-10

Description: Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

Keywords: Computer Programming and Software

Format: application/pdf

8

Unknown

Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations (2002)

Li, Xiaoye ; Biswas, Rupak ; Husbands, Parry ; [et al.]

In: CASI

Publication Date: 2019-07-10

Description: The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.

Keywords: Computer Programming and Software

Format: application/pdf

9

Unknown

Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations (2003)

Ethier, Stephane ; Djomehri, Jahed ; Skinner, David ; [et al.]

In: CASI

Publication Date: 2019-07-13

Description: The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the performance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.

Keywords: Computer Programming and Software

Type: Supercomputing 2003; Nov 15, 2003 - Nov 21, 2003; Phoenix, AZ; United States

Format: application/pdf

10

Unknown

Tools and Techniques for Measuring and Improving Grid Performance (2001)

Biegel, Bryan ; Wong, P. ; Smith, W. ; [et al.]

In: CASI

Publication Date: 2019-07-13

Description: This viewgraph presentation provides information on NASA's geographically dispersed computing resources, and the various methods by which the disparate technologies are integrated within a nationwide computational grid. Many large-scale science and engineering projects are accomplished through the interaction of people, heterogeneous computing resources, information systems and instruments at different locations. The overall goal is to facilitate the routine interactions of these resources to reduce the time spent in design cycles, particularly for NASA's mission critical projects. The IPG (Information Power Grid) seeks to implement NASA's diverse computing resources in a fashion similar to the way in which electric power is made available.

Keywords: Computer Programming and Software

Type: APART Workshop; Nov 16, 2001; Denver, CO; United States

Format: application/pdf
