ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

feed icon rss

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
  • 1
  • 2
    Publication Date: 1999-01-01
    Description: Modern dialects of Fortran enjoy wide use and good support on high‐performance computers as performance‐oriented programming languages. By providing the ability to express nested data parallelism, modern Fortran dialects enable irregular computations to be incorporated into existing applications with minimal rewriting and without sacrificing performance within the regular portions of the application. Since performance of nested data‐parallel computation is unpredictable and often poor using current compilers, we investigatethreadingandflattening, two source‐to‐source transformation techniques that can improve performance and performance stability. For experimental validation of these techniques, we explore nested data‐parallel implementations of the sparse matrix‐vector product and the Barnes–Hut n‐body algorithm by hand‐coding thread‐based (using OpenMP directives) and flattening‐based versions of these algorithms and evaluating their performance on an SGI Origin 2000 and an NEC SX‐4, two shared‐memory machines.
    Print ISSN: 1058-9244
    Electronic ISSN: 1875-919X
    Topics: Computer Science , Media Resources and Communication Sciences, Journalism
    Published by Hindawi
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 3
    Publication Date: 1999-01-01
    Description: This paper describes the design and implementation of high performance numerical software in Java. Our primary goals are to characterize the performance of object‐oriented numerical software written in Java and to investigate whether Java is a suitable language for such endeavors. We have implemented JLAPACK, a subset of the LAPACK library in Java. LAPACK is a high‐performance Fortran 77 library used to solve common linear algebra problems. JLAPACK is an object‐oriented library, using encapsulation, inheritance, and exception handling. It performs within a factor of four of the optimized Fortran version for certain platforms and test cases. When used with the native BLAS library, JLAPACK performs comparably with the Fortran version using the native BLAS library. We conclude that high‐performance numerical software could be written in Java if a handful of concerns about language features and compilation strategies are adequately addressed.
    Print ISSN: 1058-9244
    Electronic ISSN: 1875-919X
    Topics: Computer Science , Media Resources and Communication Sciences, Journalism
    Published by Hindawi
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 4
    Publication Date: 2019-06-28
    Description: Aggregate data objects (such as arrays) are distributed across the processor memories when compiling a data-parallel language for a distributed-memory machine. The mapping determines the amount of communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: an alignment that maps all the objects to an abstract template, followed by a distribution that maps the template to the processors. This paper describes algorithms for solving the various facets of the alignment problem: axis and stride alignment, static and mobile offset alignment, and replication labeling. We show that optimal axis and stride alignment is NP-complete for general program graphs, and give a heuristic method that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. We also show how local graph contractions can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. We show how to model the static offset alignment problem using linear programming, and we show that loop-dependent mobile offset alignment is sometimes necessary for optimum performance. We describe an algorithm with for determining mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself or can be used to improve performance. We describe an algorithm based on network flow that replicates objects so as to minimize the total amount of broadcast communication in replication.
    Keywords: Computer Programming and Software
    Type: NASA-CR-202184 , NAS 1.26:202184 , RIACS-96.14 , Journal of Parallel and Distributed Computing
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 5
    Publication Date: 2019-07-27
    Description: We investigate the problem of evaluating FORTRAN 90 style array expressions on massively parallel distributed-memory machines. On such machines, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of evaluating the expression. The choice of where to perform the operation then affects this cost. We present algorithms based on dynamic programming to solve this problem efficiently for a wide variety of interconnection schemes, including multidimensional grids and rings, hypercubes, and fat-trees. We also consider expressions containing operations that change the shape of the arrays, and show that our approach extends naturally to handle this case.
    Keywords: COMPUTER PROGRAMMING AND SOFTWARE
    Type: RIACS-TR-92-17 , ; 38 p.
    Format: text
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 6
    Publication Date: 2019-07-13
    Description: Implementing a data-parallel language such as Fortran 90 on a distributed-memory parallel computer requires distributing aggregate data objects (such as arrays) among the memory modules attached to the processors. The mapping of objects to the machine determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. We present a program representation called the alignment-distribution graph that makes these communication requirements explicit. We describe the details of the representation, show how to model communication cost in this framework, and outline several algorithms for determining object mappings that approximately minimize residual communication.
    Keywords: COMPUTER PROGRAMMING AND SOFTWARE
    Type: NASA-CR-194609 , RIACS-TR-93.06 , NAS 1.26:194609 , Annual Languages and Compilers for Parallelism Workshop; Aug 12, 1993 - Aug 14, 1993; Portland, OR; United States
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 7
    Publication Date: 2019-07-13
    Description: Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We show that for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution, and a computation involving the regular section A, the local memory access sequence for any processor is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little runtime overhead and acceptable preprocessing time.
    Keywords: COMPUTER SYSTEMS
    Type: NASA-CR-194605 , RIACS-TR-93.03 , NAS 1.26:194605 , ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming; May 01, 1993; San Diego, CA; United States
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 8
    Publication Date: 2019-07-13
    Description: Implementing a data-parallel language such as Fortran 90 on a distributed-memory parallel computer requires distributing aggregate data objects (such as arrays) among the memory modules attached to the processors. The mapping of objects to the machine determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. We present a program representation called the alignment distribution graph that makes these communication requirements explicit. We describe the details of the representation, show how to model communication cost in this framework, and outline several algorithms for determining object mappings that approximately minimize residual communication.
    Keywords: NUMERICAL ANALYSIS
    Type: NASA-CR-194292 , NAS 1.26:194292 , RIACS-TR-93-06 , Aug 12, 1993 - Aug 14, 1993; Portland, OR; United States
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 9
    Publication Date: 2019-07-13
    Description: When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.
    Keywords: COMPUTER SYSTEMS
    Type: NASA-CR-194294 , NAS 1.26:194294 , RIACS-TR-93-08 , 1993 Proceedings of Supercomputing; Nov 15, 1993 - Nov 19, 1993; Portland, OR; United States
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 10
    Publication Date: 2019-07-13
    Description: Axis and stride alignment is an important optimization in compiling data-parallel programs for distributed-memory machines. We previously developed an optimal algorithm for aligning array expressions. Here, we examine alignment for more general program graphs. We show that optimal alignment is NP-complete in this setting, so we study heuristic methods. This paper makes two contributions. First, we show how local graph transformations can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. Second, we give a heuristic that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. Our algorithms have been implemented; we present experimental results showing their effect on the performance of some example programs running on the CM-5.
    Keywords: COMPUTER SYSTEMS
    Type: NASA-CR-196377 , NAS 1.26:196377 , RIACS-TP-94-10 , Frontiers 1995; Feb 06, 1995 - Feb 09, 1995; McLean, VA; United States
    Format: application/pdf
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...