ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes (1999)

Biswas, Rupak ; Gao, Guang R. ; Heber, Gerd

In: CASI

Publication Date: 2019-07-13

Description: Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special attention should be paid to their amenability to parallelization. In this paper, a novel parallel method for the dynamic partitioning of adaptive unstructured meshes is described. It is based on a linear representation of the mesh using self-avoiding walks.

Keywords: Computer Systems

Type: IPPS'99; Apr 12, 1999 - Apr 16, 1999; San Juan; Puerto Rico

Format: application/pdf

NASA TECHNICAL REPORTS

2

Unknown

Communication Studies of DMP and SMP Machines (1997)

Biswas, Rupak ; Chancellor, Marisa K. ; Sohn, Andrew

In: CASI

Publication Date: 2019-07-10

Description: Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors is consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.

Keywords: Computer Systems

Type: NAS-96-005

Format: application/pdf

3

Unknown

Efficient Load Balancing and Data Remapping for Adaptive Grid Calculations (1997)

Biswas, Rupak ; Oliker, Leonid

In: CASI

Publication Date: 2019-07-13

Description: Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method to dynamically balance the processor workloads with a global view. This paper presents, for the first time, the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. Previous results indicated that mesh repartitioning and data remapping are potential bottlenecks for performing large-scale scientific calculations. We resolve these issues and demonstrate that our framework remains viable on a large number of processors.

Keywords: Computer Systems

Type: NASA-CR-204487 , NAS 1.26:204487 , RIACS-TR-97-03 , Parallel Algorithms and Architectures; Jun 22, 1997 - Jun 25, 1997; Newport, RI; United States

Format: application/pdf

4

Unknown

Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA (1999)

Biswas, Rupak ; Oliker, Leonid

In: CASI

Publication Date: 2019-07-13

Description: The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.

Keywords: Computer Systems

Type: Supercomputing; Nov 13, 1999 - Nov 19, 1999; Portland, OR; United States

Format: application/pdf

5

Unknown

Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor (1996)

Sohn, Andrew ; Biswas, Rupak

In: CASI

Publication Date: 2019-07-13

Description: Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.

Keywords: Computer Systems

Type: NASA-CR-200964 , NAS 1.26:200964 , RIACS-TR-96-07 , 10th ACM International Conference on Supercomputing; May 25, 1996 - May 28, 1996; Philadelphia, PA; United States

Format: application/pdf

6

Unknown

Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems (1996)

Sohn, Andrew ; Biswas, Rupak ; Oliker, Leonid

In: CASI

Publication Date: 2019-07-13

Description: Dynamic mesh adaption on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35% of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives almost a sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3% off the optimal solutions but requires only 1% of the computational time.

Keywords: Computer Systems

Type: NASA-CR-202186 , NAS 1.26:202186 , RIACS-TR-96-16 , Supercomputing 1996; Nov 17, 1996 - Nov 22, 1996; Pittsburgh, PA; United States

Format: application/pdf

7

Unknown

HARP: A Dynamic Inertial Spectral Partitioner (1997)

Simon, Horst D. ; Sohn, Andrew ; Biswas, Rupak

In: CASI

Publication Date: 2019-07-13

Description: Partitioning unstructured graphs is central to the parallel solution of computational science and engineering problems. Spectral partitioners, such as recursive spectral bisection (RSB), have proven effective in generating high-quality partitions of realistically-sized meshes. The major problem which hindered their widespread use was their long execution times. This paper presents a new inertial spectral partitioner, called HARP. The main objective of the proposed approach is to quickly partition the meshes at runtime in a manner that works efficiently for real applications in the context of distributed-memory machines. The underlying principle of HARP is to find the eigenvectors of the unpartitioned vertices and then project them onto the eigenvectors of the original mesh. Results for various meshes ranging in size from 1000 to 100,000 vertices indicate that HARP can indeed partition meshes rapidly at runtime. Experimental results show that our largest mesh can be partitioned sequentially in only a few seconds on an SP2, which is several times faster than other spectral partitioners while maintaining the solution quality of the proven RSB method. A parallel MPI version of HARP has also been implemented on IBM SP2 and Cray T3E. Parallel HARP, running on 64 processors of the SP2 and T3E, can partition a mesh containing more than 100,000 vertices into 64 subgrids in about half a second. These results indicate that graph partitioning can now be truly embedded in dynamically-changing real-world applications.

Keywords: Computer Systems

Type: NASA-CR-204489 , NAS 1.26:204489 , RIACS-TR-97-01 , Parallel Algorithms and Architectures; Jun 22, 1997 - Jun 25, 1997; Newport, RI; United States

Format: application/pdf

8

Unknown

Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems (1996)

Oliker, Leonid ; Sohn, Andrew ; Biswas, Rupak

In: CASI

Publication Date: 2019-07-13

Description: Dynamic mesh adaptation on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalances among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35 percent of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives an almost sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3 percent off the optimal solutions, but requires only 1 percent of the computational time.

Keywords: Computer Systems

Type: NASA-CR-203532 , NAS 1.26:203532 , NAS-96-013 , Supercomputing 1996; Nov 17, 1996 - Nov 22, 1996; Pittsburgh, PA; United States

Format: application/pdf

9

Unknown

Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2 (1996)

Strawn, Roger C. ; Biswas, Rupak ; Oliker, Leonid

In: CASI

Publication Date: 2019-07-13

Description: Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

Keywords: Computer Systems

Type: NASA-TM-112033 , NAS 1.15:112033 , NAS-96-011 , International Workshop on Parallel Algorithms for Irregularly Structured Problems (IRREGULAR'96); Aug 19, 1996 - Aug 21, 1996; Santa Barbara, CA; United States

Format: application/pdf

10

Unknown

Impact of Load Balancing on Unstructured Adaptive Grid Computations for Distributed-Memory Multiprocessors (1996)

Simon, Horst D. ; Biswas, Rupak ; Sohn, Andrew

In: CASI

Publication Date: 2019-07-13

Description: The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.

Keywords: Computer Systems

Type: NASA-TM-112034 , NAS 1.15:112034 , NAS-96-012 , IEEE Symposium on Parallel and Distributed Processing (SPDP'96); Oct 23, 1996 - Oct 26, 1996; New Orleans, LA; United States

Format: application/pdf
