ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

feed icon rss

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
  • 1
    facet.materialart.
    Unknown
    In:  Other Sources
    Publication Date: 2019-07-18
    Description: Using a collection of benchmark problems of increasing levels of realism and computational effort, we will characterize the strengths and limitations of the 10,240 processor Columbia system to deliver supercomputing value to application scientists. Scientists need to be able to determine if and how they can utilize Columbia to carry extreme workloads, either in terms of ultra-large applications that cannot be run otherwise (capability), or in terms of very large ensembles of medium-scale applications to populate response matrices (capacity). We select existing application benchmarks that scale from a small number of processors to the entire machine, and that highlight different issues in running supercomputing-calss applicaions, such as the various types of memory access, file I/O, inter- and intra-node communications and parallelization paradigms. http://www.nas.nasa.gov/Software/NPB/
    Keywords: Computer Systems
    Type: Supercomputing 2004; Nov 06, 2004 - Nov 12, 2004; Pittsburgh, PA; United States
    Format: text
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 2
    Publication Date: 2019-07-10
    Description: We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.
    Keywords: Computer Systems
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 3
    Publication Date: 2019-07-10
    Description: Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
    Keywords: Computer Systems
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 4
    Publication Date: 2019-07-13
    Description: In this paper, we propose a distributed approach for mapping a single large application to a heterogeneous grid environment. To minimize the execution time of the parallel application, we distribute the mapping overhead to the available nodes of the grid. This approach not only provides a fast mapping of tasks to resources but is also scalable. We adopt a hierarchical grid model and accomplish the job of mapping tasks to this topology using a scheduler tree. Results show that our three-phase algorithm provides high quality mappings, and is fast and scalable.
    Keywords: Computer Systems
    Type: IEEE 5th International Conference on Cluster Computing; Dec 01, 2003 - Dec 04, 2003; Hong Kong; China
    Format: text
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 5
    Publication Date: 2019-07-13
    Description: The contents include: 1) A brief history of NPB; 2) What is (not) being measured by NPB; 3) Irregular dynamic applications (UA Benchmark); and 4) Wide area distributed computing (NAS Grid Benchmarks-NGB). This paper is presented in viewgraph form.
    Keywords: Computer Systems
    Type: Workshop on the Performance Characterization of Algorithms; Jul 16, 2001 - Jul 17, 2001; Oakland, CA; United States
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 6
    Publication Date: 2019-07-13
    Description: The ability of computers to solve hitherto intractable problems and simulate complex processes using mathematical models makes them an indispensable part of modern science and engineering. Computer simulations of large-scale realistic applications usually require solving a set of non-linear partial differential equations (PDES) over a finite region. For example, one thrust area in the DOE Grand Challenge projects is to design future accelerators such as the SpaHation Neutron Source (SNS). Our colleagues at SLAC need to model complex RFQ cavities with large aspect ratios. Unstructured grids are currently used to resolve the small features in a large computational domain; dynamic mesh adaptation will be added in the future for additional efficiency. The PDEs for electromagnetics are discretized by the FEM method, which leads to a generalized eigenvalue problem Kx = AMx, where K and M are the stiffness and mass matrices, and are very sparse. In a typical cavity model, the number of degrees of freedom is about one million. For such large eigenproblems, direct solution techniques quickly reach the memory limits. Instead, the most widely-used methods are Krylov subspace methods, such as Lanczos or Jacobi-Davidson. In all the Krylov-based algorithms, sparse matrix-vector multiplication (SPMV) must be performed repeatedly. Therefore, the efficiency of SPMV usually determines the eigensolver speed. SPMV is also one of the most heavily used kernels in large-scale numerical simulations.
    Keywords: Computer Systems
    Type: Irregular; May 01, 2000; Cancun; Mexico
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 7
    Publication Date: 2019-07-13
    Description: The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.
    Keywords: Computer Systems
    Type: 6th International Meeting on High Performance Computing for Computational Science; Jun 28, 2004 - Jun 30, 2004; Valencia; Spain
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 8
    Publication Date: 2019-07-13
    Description: This paper presents a detailed performance analysis of a multi-block overset grid compu- tational fluid dynamics app!ication on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SNIP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.
    Keywords: Computer Systems
    Type: HiPC International Conference on High Performance Computing; Dec 17, 2003 - Dec 20, 2003; Hyderabad; India
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 9
    Publication Date: 2019-07-10
    Description: Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology, and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than that under PLUM by overlapping processing and data migration.
    Keywords: Computer Systems
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...