ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

21

Unknown

HARP: A Dynamic Inertial Spectral Partitioner (1997)

Simon, Horst D. ; Sohn, Andrew ; Biswas, Rupak

In: CASI

Details

Publication Date: 2019-07-13

Description: Partitioning unstructured graphs is central to the parallel solution of computational science and engineering problems. Spectral partitioners, such as recursive spectral bisection (RSB), have proven effective in generating high-quality partitions of realistically-sized meshes. The major problem that hindered their widespread use was their long execution times. This paper presents a new inertial spectral partitioner, called HARP. The main objective of the proposed approach is to quickly partition the meshes at runtime in a manner that works efficiently for real applications in the context of distributed-memory machines. The underlying principle of HARP is to find the eigenvectors of the unpartitioned vertices and then project them onto the eigenvectors of the original mesh. Results for various meshes ranging in size from 1000 to 100,000 vertices indicate that HARP can indeed partition meshes rapidly at runtime. Experimental results show that our largest mesh can be partitioned sequentially in only a few seconds on an SP2, which is several times faster than other spectral partitioners, while maintaining the solution quality of the proven RSB method. A parallel MPI version of HARP has also been implemented on IBM SP2 and Cray T3E. Parallel HARP, running on 64 processors of the SP2 and T3E, can partition a mesh containing more than 100,000 vertices into 64 subgrids in about half a second. These results indicate that graph partitioning can now be truly embedded in dynamically-changing real-world applications.

Keywords: Computer Systems

Type: NASA-CR-204489 , NAS 1.26:204489 , RIACS-TR-97-01 , Parallel Algorithms and Architectures; Jun 22, 1997 - Jun 25, 1997; Newport, RI; United States

Format: application/pdf

NASA TECHNICAL REPORTS

22

Unknown

Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems (1996)

Oliker, Leonid ; Sohn, Andrew ; Biswas, Rupak

In: CASI

Details

Publication Date: 2019-07-13

Description: Dynamic mesh adaptation on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalances among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35 percent of the mesh is randomly adapted. For large scale scientific computations, our load balancing strategy gives an almost sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3 percent off the optimal solutions, but requires only 1 percent of the computational time.

Keywords: Computer Systems

Type: NASA-CR-203532 , NAS 1.26:203532 , NAS-96-013 , Supercomputing 1996; Nov 17, 1996 - Nov 22, 1996; Pittsburgh, PA; United States

Format: application/pdf
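
Note: The description above mentions a heuristic remapper that assigns partitions to processors so that the redistribution cost is minimized. The following is a minimal sketch of one plausible greedy heuristic of that kind, not the authors' implementation. The names `greedy_remap`, `data_moved`, and the `similarity` matrix layout are hypothetical, chosen only for illustration. It assumes one new partition per processor and that `similarity[p][j]` is the amount of data of new partition `j` that already resides on processor `p`.

```python
def greedy_remap(similarity):
    """Greedily assign each new partition to a distinct processor.

    similarity[p][j] is the (assumed) amount of data belonging to new
    partition j that already lives on processor p. Keeping large entries
    in place reduces the data that has to be redistributed.
    Returns proc_of, where proc_of[j] is the processor given partition j.
    """
    n = len(similarity)
    # Visit (processor, partition) pairs from largest overlap to smallest;
    # ties are broken by index so the result is deterministic.
    entries = sorted(
        ((similarity[p][j], p, j) for p in range(n) for j in range(n)),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    proc_of = [None] * n
    proc_used = [False] * n
    for _, p, j in entries:
        if proc_of[j] is None and not proc_used[p]:
            proc_of[j] = p
            proc_used[p] = True
    return proc_of


def data_moved(similarity, proc_of):
    """Total data that must change processors under the given assignment."""
    total = sum(sum(row) for row in similarity)
    kept = sum(similarity[p][j] for j, p in enumerate(proc_of))
    return total - kept
```

A greedy pass like this sorts the n*n matrix entries once, so it is much cheaper than solving the assignment problem exactly, at the price of a possibly suboptimal mapping.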

NASA TECHNICAL REPORTS

23

Unknown

Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2 (1996)

Strawn, Roger C. ; Biswas, Rupak ; Oliker, Leonid

In: CASI

Details

Publication Date: 2019-07-13

Description: Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

Keywords: Computer Systems

Type: NASA-TM-112033 , NAS 1.15:112033 , NAS-96-011 , International Workshop on Parallel Algorithms for Irregularly Structured Problems (IRREGULAR'96); Aug 19, 1996 - Aug 21, 1996; Santa Barbara, CA; United States

Format: application/pdf

NASA TECHNICAL REPORTS

24

Unknown

Impact of Load Balancing on Unstructured Adaptive Grid Computations for Distributed-Memory Multiprocessors (1996)

Simon, Horst D. ; Biswas, Rupak ; Sohn, Andrew

In: CASI

Details

Publication Date: 2019-07-13

Description: The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.

Keywords: Computer Systems

Type: NASA-TM-112034 , NAS 1.15:112034 , NAS-96-012 , IEEE Symposium on Parallel and Distributed Processing (SPDP'96); Oct 23, 1996 - Oct 26, 1996; New Orleans, LA; United States

Format: application/pdf

NASA TECHNICAL REPORTS

25

Unknown

Parallel Performance Characterization of Columbia (2004)

Biswas, Rupak

In: Other Sources

Details

Publication Date: 2019-07-18

Description: Using a collection of benchmark problems of increasing levels of realism and computational effort, we will characterize the strengths and limitations of the 10,240-processor Columbia system to deliver supercomputing value to application scientists. Scientists need to be able to determine if and how they can utilize Columbia to carry extreme workloads, either in terms of ultra-large applications that cannot be run otherwise (capability), or in terms of very large ensembles of medium-scale applications to populate response matrices (capacity). We select existing application benchmarks that scale from a small number of processors to the entire machine, and that highlight different issues in running supercomputing-class applications, such as the various types of memory access, file I/O, inter- and intra-node communications and parallelization paradigms. http://www.nas.nasa.gov/Software/NPB/

Keywords: Computer Systems

Type: Supercomputing 2004; Nov 06, 2004 - Nov 12, 2004; Pittsburgh, PA; United States

Format: text

NASA TECHNICAL REPORTS

26

Unknown

Impact of the Columbia Supercomputer on NASA Space and Exploration Mission (2006)

Kwak, Dochan ; Kiris, Cetin ; Biswas, Rupak ; [et al.]

In: CASI

Details

Publication Date: 2019-07-13

Description: NASA's 10,240-processor Columbia supercomputer gained worldwide recognition in 2004 for increasing the space agency's computing capability ten-fold, and enabling U.S. scientists and engineers to perform significant, breakthrough simulations. Columbia has amply demonstrated its capability to accelerate NASA's key missions, including space operations, exploration systems, science, and aeronautics. Columbia is part of an integrated high-end computing (HEC) environment comprised of massive storage and archive systems, high-speed networking, high-fidelity modeling and simulation tools, application performance optimization, and advanced data analysis and visualization. In this paper, we illustrate the impact Columbia is having on NASA's numerous space and exploration applications, such as the development of the Crew Exploration and Launch Vehicles (CEV/CLV), effects of long-duration human presence in space, and damage assessment and repair recommendations for remaining shuttle flights. We conclude by discussing HEC challenges that must be overcome to solve space-related science problems in the future.

Keywords: Computer Systems

Type: Second International Conference on Space Mission Challenges for Information Technology 2006; Jul 17, 2006 - Jul 21, 2006; Pasadena, CA; United States

Format: application/pdf

NASA TECHNICAL REPORTS

27

Unknown

An Application-Based Performance Characterization of the Columbia Supercluster (2005)

Kiris, Cetin ; Saini, Subhash ; Jin, Haoqiang ; [et al.]

In: CASI

Details

Publication Date: 2019-07-13

Description: Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and currently ranked as the second-fastest computer in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks, and some of the NAS Parallel Benchmarks including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA, one from molecular dynamics, and two from computational fluid dynamics. Our results show that both the NUMAlink4 and the InfiniBand hold promise for application scaling to a large number of processors.

Keywords: Computer Systems

Type: Supercomputing Conference 2005; Nov 12, 2005 - Nov 18, 2005; Seattle, WA; United States

Format: application/pdf

NASA TECHNICAL REPORTS

28

Unknown

NASA Advanced Computing Environment for Science and Engineering (2014)

Biswas, Rupak ; Mehrotra, Piyush

In: CASI

Details

Publication Date: 2019-11-16

Description: High-fidelity modeling, simulation, and analysis, enabled by supercomputing, are becoming increasingly important to NASA's broad spectrum of missions. This paper describes NASA's advanced supercomputing environment at Ames Research Center that is geared toward solving the space agency's most challenging science and engineering problems.

Keywords: Computer Systems

Type: ARC-E-DAA-TN15006 , International Conference on Parallel Computational Fluid Dynamics Parallel; May 20, 2014 - May 22, 2014; Trondheim; Norway

Format: application/pdf

NASA TECHNICAL REPORTS
