ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

A Hierarchical and Distributed Approach for Mapping Large Applications to Heterogeneous Grids using Genetic Algorithms (2003)

Sanyal, Soumya ; Jain, Amit ; Das, Sajal K. ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: In this paper, we propose a distributed approach for mapping a single large application to a heterogeneous grid environment. To minimize the execution time of the parallel application, we distribute the mapping overhead to the available nodes of the grid. This approach not only provides a fast mapping of tasks to resources but is also scalable. We adopt a hierarchical grid model and accomplish the job of mapping tasks to this topology using a scheduler tree. Results show that our three-phase algorithm provides high quality mappings, and is fast and scalable.

Keywords: Computer Systems

Type: IEEE 5th International Conference on Cluster Computing; Dec 01, 2003 - Dec 04, 2003; Hong Kong; China

Format: text

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

2

Unknown

Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems (2000)

Li, Xiaoye ; Heber, Gerd ; Oliker, Leonid ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: The ability of computers to solve hitherto intractable problems and simulate complex processes using mathematical models makes them an indispensable part of modern science and engineering. Computer simulations of large-scale realistic applications usually require solving a set of non-linear partial differential equations (PDES) over a finite region. For example, one thrust area in the DOE Grand Challenge projects is to design future accelerators such as the SpaHation Neutron Source (SNS). Our colleagues at SLAC need to model complex RFQ cavities with large aspect ratios. Unstructured grids are currently used to resolve the small features in a large computational domain; dynamic mesh adaptation will be added in the future for additional efficiency. The PDEs for electromagnetics are discretized by the FEM method, which leads to a generalized eigenvalue problem Kx = AMx, where K and M are the stiffness and mass matrices, and are very sparse. In a typical cavity model, the number of degrees of freedom is about one million. For such large eigenproblems, direct solution techniques quickly reach the memory limits. Instead, the most widely-used methods are Krylov subspace methods, such as Lanczos or Jacobi-Davidson. In all the Krylov-based algorithms, sparse matrix-vector multiplication (SPMV) must be performed repeatedly. Therefore, the efficiency of SPMV usually determines the eigensolver speed. SPMV is also one of the most heavily used kernels in large-scale numerical simulations.

Keywords: Computer Systems

Type: Irregular; May 01, 2000; Cancun; Mexico

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

3

Unknown

I/O Performance Characterization of Lustre and NASA Applications on Pleiades (2012)

Chang, Johnny ; Mehrotra, Piyush ; Saini, Subhash ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: In this paper we study the performance of the Lustre file system using five scientific and engineering applications representative of NASA workload on large-scale supercomputing systems such as NASA s Pleiades. In order to facilitate the collection of Lustre performance metrics, we have developed a software tool that exports a wide variety of client and server-side metrics using SGI's Performance Co-Pilot (PCP), and generates a human readable report on key metrics at the end of a batch job. These performance metrics are (a) amount of data read and written, (b) number of files opened and closed, and (c) remote procedure call (RPC) size distribution (4 KB to 1024 KB, in powers of 2) for I/O operations. RPC size distribution measures the efficiency of the Lustre client and can pinpoint problems such as small write sizes, disk fragmentation, etc. These extracted statistics are useful in determining the I/O pattern of the application and can assist in identifying possible improvements for users applications. Information on the number of file operations enables a scientist to optimize the I/O performance of their applications. Amount of I/O data helps users choose the optimal stripe size and stripe count to enhance I/O performance. In this paper, we demonstrate the usefulness of this tool on Pleiades for five production quality NASA scientific and engineering applications. We compare the latency of read and write operations under Lustre to that with NFS by tracing system calls and signals. We also investigate the read and write policies and study the effect of page cache size on I/O operations. We examine the performance impact of Lustre stripe size and stripe count along with performance evaluation of file per process and single shared file accessed by all the processes for NASA workload using parameterized IOR benchmark.

Keywords: Computer Systems

Type: ARC-E-DAA-TN6025 , HiPC 2012; Dec 18, 2012 - Dec 21, 2012; Pune; India

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

4

Unknown

An Application-Based Performance Evaluation of NASAs Nebula Cloud Computing Platform (2012)

Biswas, Rupak ; Mehrotra, Piyush ; Chang, Johnny ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing as it promises high potential. In this paper, we examine the feasibility, performance, and scalability of production quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work represents the comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA s Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.

Keywords: Computer Systems

Type: ARC-E-DAA-TN5169 , 14th IEEE International Conferenc eon HPCC-2012; Jun 25, 2012; Liverpool; United Kingdom

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

5

Unknown

A Look at the Impact of High-End Computing Technologies on NASA Missions (2012)

Rogers, Stuart ; Hardman, John ; Bailey, F. Ron ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-12

Description: From its bold start nearly 30 years ago and continuing today, the NASA Advanced Supercomputing (NAS) facility at Ames Research Center has enabled remarkable breakthroughs in the space agency s science and engineering missions. Throughout this time, NAS experts have influenced the state-of-the-art in high-performance computing (HPC) and related technologies such as scientific visualization, system benchmarking, batch scheduling, and grid environments. We highlight the pioneering achievements and innovations originating from and made possible by NAS resources and know-how, from early supercomputing environment design and software development, to long-term simulation and analyses critical to design safe Space Shuttle operations and associated spinoff technologies, to the highly successful Kepler Mission s discovery of new planets now capturing the world s imagination.

Keywords: Computer Systems

Type: ARC-E-DAA-TN4714

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

6

Unknown

A Performance Evaluation of the Cray X1 for Scientific Applications (2003)

Shan, Hongzhang ; Skinner, David ; Carter, Jonathan ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.

Keywords: Computer Systems

Type: 6th International Meeting on High Performance Computing for Computational Science; Jun 28, 2004 - Jun 30, 2004; Valencia; Spain

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

7

Unknown

Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures (2003)

Djomehri, M. Jahed ; Biswas, Rupak

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: This paper presents a detailed performance analysis of a multi-block overset grid compu- tational fluid dynamics app!ication on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SNIP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.

Keywords: Computer Systems

Type: HiPC International Conference on High Performance Computing; Dec 17, 2003 - Dec 20, 2003; Hyderabad; India

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

8

Unknown

Beyond the NAS Parallel Benchmarks: Measuring Dynamic Program Performance and Grid Computing Applications (2001)

Biegel, Bryan ; VanderWijngaart, Rob F. ; Feng, Huiyu ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-13

Description: The contents include: 1) A brief history of NPB; 2) What is (not) being measured by NPB; 3) Irregular dynamic applications (UA Benchmark); and 4) Wide area distributed computing (NAS Grid Benchmarks-NGB). This paper is presented in viewgraph form.

Keywords: Computer Systems

Type: Workshop on the Performance Characterization of Algorithms; Jul 16, 2001 - Jul 17, 2001; Oakland, CA; United States

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

9

Unknown

Design of Unstructured Adaptive (UA) NAS Parallel Benchmark Featuring Irregular, Dynamic Memory Accesses (2001)

Biswas, Rupak ; Feng, Hui-Yu ; Biegel, Bryan ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-10

Description: We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.

Keywords: Computer Systems

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview

10

Unknown

Message Passing and Shared Address Space Parallelism on an SMP Cluster (2002)

Singh, Jaswinder P. ; Biswas, Rupak ; Biegel, Bryan ; [et al.]

In: CASI

add to mindlist on the mindlist

Details

Publication Date: 2019-07-10

Description: Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.

Keywords: Computer Systems

Format: application/pdf

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

NASA TECHNICAL REPORTS

S·F·X

Overview