ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

A Hierarchical and Distributed Approach for Mapping Large Applications to Heterogeneous Grids using Genetic Algorithms (2003)

Sanyal, Soumya ; Jain, Amit ; Das, Sajal K. ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: In this paper, we propose a distributed approach for mapping a single large application to a heterogeneous grid environment. To minimize the execution time of the parallel application, we distribute the mapping overhead to the available nodes of the grid. This approach not only provides a fast mapping of tasks to resources but is also scalable. We adopt a hierarchical grid model and accomplish the job of mapping tasks to this topology using a scheduler tree. Results show that our three-phase algorithm provides high quality mappings, and is fast and scalable.

Keywords: Computer Systems

Type: IEEE 5th International Conference on Cluster Computing; Dec 01, 2003 - Dec 04, 2003; Hong Kong; China

Format: text

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

2

Unknown

Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms (2000)

Oliker, Leonid ; Heber, Gerd ; Biswas, Rupak

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.

Keywords: Computer Programming and Software

Type: Parallel and Distributed Computing Systems; Aug 08, 2000 - Aug 10, 2000; Las Vegas, NV; United States

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

3

Unknown

Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems (2000)

Li, Xiaoye ; Heber, Gerd ; Oliker, Leonid ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: The ability of computers to solve hitherto intractable problems and simulate complex processes using mathematical models makes them an indispensable part of modern science and engineering. Computer simulations of large-scale realistic applications usually require solving a set of non-linear partial differential equations (PDES) over a finite region. For example, one thrust area in the DOE Grand Challenge projects is to design future accelerators such as the SpaHation Neutron Source (SNS). Our colleagues at SLAC need to model complex RFQ cavities with large aspect ratios. Unstructured grids are currently used to resolve the small features in a large computational domain; dynamic mesh adaptation will be added in the future for additional efficiency. The PDEs for electromagnetics are discretized by the FEM method, which leads to a generalized eigenvalue problem Kx = AMx, where K and M are the stiffness and mass matrices, and are very sparse. In a typical cavity model, the number of degrees of freedom is about one million. For such large eigenproblems, direct solution techniques quickly reach the memory limits. Instead, the most widely-used methods are Krylov subspace methods, such as Lanczos or Jacobi-Davidson. In all the Krylov-based algorithms, sparse matrix-vector multiplication (SPMV) must be performed repeatedly. Therefore, the efficiency of SPMV usually determines the eigensolver speed. SPMV is also one of the most heavily used kernels in large-scale numerical simulations.

Keywords: Computer Systems

Type: Irregular; May 01, 2000; Cancun; Mexico

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

4

Unknown

Parallel Computing Strategies for Irregular Algorithms (2002)

Oliker, Leonid ; Biswas, Rupak ; Biegel, Bryan ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-10

Description: Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

Keywords: Computer Programming and Software

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

5

Unknown

Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations (2002)

Li, Xiaoye ; Biswas, Rupak ; Husbands, Parry ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-10

Description: The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(O) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gains. A implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.

Keywords: Computer Programming and Software

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

6

Unknown

Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations (2003)

Ethier, Stephane ; Djomehri, Jahed ; Skinner, David ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the perfor- mance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.

Keywords: Computer Programming and Software

Type: Supercomputing 2003; Nov 15, 2003 - Nov 21, 2003; Phoenix, AZ; United States

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

7

Unknown

Tools and Techniques for Measuring and Improving Grid Performance (2001)

Biegel, Bryan ; Wong, P. ; Smith, W. ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: This viewgraph presentation provides information on NASA's geographically dispersed computing resources, and the various methods by which the disparate technologies are integrated within a nationwide computational grid. Many large-scale science and engineering projects are accomplished through the interaction of people, heterogeneous computing resources, information systems and instruments at different locations. The overall goal is to facilitate the routine interactions of these resources to reduce the time spent in design cycles, particularly for NASA's mission critical projects. The IPG (Information Power Grid) seeks to implement NASA's diverse computing resources in a fashion similar to the way in which electric power is made available.

Keywords: Computer Programming and Software

Type: APART Workshop; Nov 16, 2001; Denver, CO; United States

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

8

Unknown

Load Balancing Strategies for Multi-Block Overset Grid Applications (2002)

Lopez-Benitez, Noe ; Biegel, Bryan ; Biswas, Rupak ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies far overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.

Keywords: Computer Programming and Software

Type: ISCA 18th International Conference on Computers and Their Applications; Mar 26, 2003 - Mar 28, 2003; Honolulu, HI; United States

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

9

Unknown

A Performance Evaluation of the Cray X1 for Scientific Applications (2003)

Shan, Hongzhang ; Skinner, David ; Carter, Jonathan ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.

Keywords: Computer Systems

Type: 6th International Meeting on High Performance Computing for Computational Science; Jun 28, 2004 - Jun 30, 2004; Valencia; Spain

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

10

Unknown

Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures (2003)

Djomehri, M. Jahed ; Biswas, Rupak

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: This paper presents a detailed performance analysis of a multi-block overset grid compu- tational fluid dynamics app!ication on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SNIP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.

Keywords: Computer Systems

Type: HiPC International Conference on High Performance Computing; Dec 17, 2003 - Dec 20, 2003; Hyderabad; India

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview