Skip to main content
Log in

Optimal Modulo Scheduling Through Enumeration

  • Published:
International Journal of Parallel Programming Aims and scope Submit manuscript

Abstract

Resource-constrained software-pipelining has played an increasingly significant role in exploiting instruction-level parallelism and has drawn intensive academic and industrial interest. The challenge is to find a schedule which is optimal : i.e., given the data dependence graph (DDG) for a loop, find the fastest possible schedule under given resource constraints while keeping register usage minimal. This paper proposes a novel enumeration based modulo scheduling approach to solve this problem. The proposed approach does not require any awkward reworking of constraints into linear form and employs a realistic register model. The set of schedules enumerated also allows us to characterize the schedule space and address questions such as whether schedules using a small number of registers tend to require a large number of function units. The proposed approach has been implemented under the MOST testbed at McGill University. Experimental results on more than 1000 loops from popular benchmark programs show that enumeration is generally faster at obtaining optimal schedules than integer linear programming approaches. Compared to Huff's Slack Scheduling , enumeration found a faster schedule for almost 15% of loops, with a mean improvement of 18%. 10% of the remaining loops required fewer registers under enumeration, with a mean reduction of 16%.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

REFERENCES

  1. James C. Dehnert and Ross A. Towle, Compiling for Cydra 5. J. of Supercomputing 7:181–227 (May 1993).

    Google Scholar 

  2. Richard A. Huff, Lifetime-sensitive Modulo Scheduling, Proc. ACM SIGPLAN Conf. Progr. Lang. Design and Implementation, Albuquerque, New Mexico (June 23–25, 1993); SIGPLAN Notices 28(6):258–267 (June 1993).

  3. Monica Lam, Software Pipelining: An Effective Scheduling Technique For VLIW Machines, Proc. SIGPLAN Conf. Progr. Lang. Design and Implementation, Atlanta, Georgia (June 22–24, 1988); SIGPLAN Notices 23(7):318–328 (July 1988).

  4. Qi Ning and Guang R. Gao, A Novel Framework of Register Allocation for Software Pipelining, Conf. Rec. of the 20th Ann. ACM SIGPLAN-SIGACT Symp. on Principles of Progr. Lang., Charleston, South Carolina, pp. 29–42 (January 10–13, 1993).

  5. B. R. Rau and C. D. Glaeser, Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing, Proc. 14 th Ann. Microprogramming Workshop, Chatham, Massachusetts (October 12–15, 1981); ACM SIGMICRO and IEEE-CS TC-MICRO, pp. 183–198.

  6. B. R. Rau, M. Lee, P. P. Tirumalai, and M. S. Schlansker, Register Allocation for Software Pipelined Loops, Proc. ACM SIGPLAN Conf. on Progr. Lang. Design and Implementation, San Francisco, California (June 17–19, 1992); SIGPLAN Notices 27(7):283–299 (July 1992).

  7. Roy F. Touzeau, A FORTRAN Compiler for the FPS-164 Scientific Computer, Proc. SIGPLAN Symp. on Compiler Construction, Montréal, Québec (June 17–22, 1984); ACM SIGPLAN. SIGPLAN Notices 19(6):48–57 (June 1984).

  8. Alexander Aiken and Alexandru Nicolau, A Realistic Resource-Constrained Software Pipelining Algorithm. In Alexandru Nicolau, David Gelernter, Thomas Gross, and David Padua, (eds.), Advances in Languages and Compilers for Parallel Processing, Research Monographs in Parallel and Distribution Computing, Chap. 14, pp. 274–290. Pitman Pub. and the MIT Press, London, England, and Cambridge, Massachusetts (1991) Selected papers from the Third Workshop on Languages and Compilers for Parallel Computing, Irvine, California (August 1–3, 1990).

    Google Scholar 

  9. Kemal Ebcioğlu and Toshio Nakatani, A New Compilation Technique for Parallelizing Loops with Unpredictable Branches on a VLIW Architecture. In David Gelernter (ed.), Languages and Compilers for Parallel Computing, MIT Press, Cambridge, Massachusetts, pp. 213–229 (1990).

    Google Scholar 

  10. Soo-Mook Moon and Kemal Ebcioğlu, An Efficient Resource-Constrained Global Scheduling Technique for Superscalar and VLIW Processors, Proc. 25th Ann. Int'l. Symp. on Microarchitecture, Portland, Oregon (December 1–4, 1992); ACM SIGMICRO and IEEE-CS TC-MICRO. SIG MICRO Newsletter 23(1,2):55–71 (December 1992).

  11. Steven R. Vegdahl, A Dynamic-Programming Technique for Compacting Loops, Proc. 25th Ann. Int'l. Symp. on Microarchitecture, Portland, Oregon (December 1–4, 1992); ACM SIGMICRO and IEEE-CS TC-MICRO. SIG MICRO Newsletter 23(1,2):180–188 (December 1992).

  12. John Ruttenberg, Guang R. Gao, Artour Stoutchinin, and W. Lichtenstein, Software Pipelining Showdown: Optimal vs Heuristic Methods in a Production Compiler, Proc. SIGPLAN PLDI, Philadelphia (May 1996).

  13. Erik R. Altman, Optimal Software Pipelining with Function Unit and Register Constraints, Ph.D. thesis, McGill University, Montréal, Québec (October 1995).

    Google Scholar 

  14. Erik R. Altman, R. Govindarajan, and Guang R. Gao, Scheduling and Mapping: Software Pipelining in the Presence of Structural Hazards. ACAPS Tech. Memo 91, Sch. of Comp. Sci., McGill University, Montréal, Québec (January 1995) In ftp://ftp-acaps.cs.mcgill.ca/pub/doc/memos.

    Google Scholar 

  15. Alexandre E. Eichenberger and Edward S. Davidson, Efficient Formulation for Optimal Modulo Schedulers, Proc. SIGPLAN PLDI, Las Vegas, Nevada, pp. 194–205 (June 1997).

  16. P. Feautrier, Fine-grain Scheduling under Resource Constraints, Seventh Ann. Workshop on Lang. and Compilers for Parallel Computing, Ithaca (August 1994).

  17. R. Govindarajan, Erik R. Altman, and Guang R. Gao, Minimizing Register Requirements under Resource-Constrained Rate-Optimal Software Pipelining, Proc. of the 27th Ann. Int'l. Symp. on Microarchitecture, San Jose, California, pp. 85–94 (November 30–December 2, 1994); ACM SIGMICRO and IEEE-CS TC-MICRO.

  18. J. R. Allen, Ken Kennedy, Carrie Portefield, and Joe Warren, Conversion of Control Dependence to Data Dependence, Conf. Rec. of the Tenth Ann. ACM Symp. Principles of Progr. Lang. Austin, Texas, pp. 177–189 ( January 24–26, 1983).

  19. Nancy J. Warter, Scott A. Mahlke, Wen-mei W. Hwu, and B. Ramakrishna Rau, Reverse If-Conversion, Proc. SIGPLAN PLDI, Albuquerque, New Mexico, pp. 290–299 (June 1993).

  20. Alexandre E. Eichenberger, Edward S. Davidson, and Santosh G. Abraham, Optimum Modulo Schedules for Minimum Register Requirements, Conf. Proc. Int'l. Conf. on Supercomputing, Barcelona, Spain, pp. 31–40 (July 3–7, 1995); ACM SIGARCH.

  21. James E. Smith and Shlomo Weiss, PowerPC 601 and Alpha 21064: A Tale of Two RISCs, Computer 27(6):46–58 (June 1994).

    Google Scholar 

  22. Peter Song and Marvin Denman, The PowerPC 604 RISC Microprocessor, Motorola, IBM Corporation (1994).

  23. Alexander Aiken and Alexandru Nicolau, Optimal Loop Parallelization, Proc. SIGPLAN Conf. on Progr. Lang. Design and Implementation, Atlanta, Georgia, June 22–24, 19988. SIGPLAN Notices 23(7):308–317 (July 1988).

    Google Scholar 

  24. Alexandre E. Eichenberger, Edward S. Davidson, and Santosh G. Abraham, Minimum Register Requirements for a Modulo Schedule, Proc. of the 27th Ann. Int' l. Symp. on Microarchitecture, San Jose, California, pp. 75–84 (November 30–December 2, 1994); ACM SIGMICRO and IEEE-CS TC-MICRO.

  25. F. Gasperoni and U. Schwiegelshohn, Efficient Algorithms for Cyclic Scheduling, Res. Rep. RC 17068, IBM T. J. Watson Res. Center, Yorktown Heights, New York (1991).

    Google Scholar 

  26. B. Ramakrishna Rau, Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops, Proc. 27th Ann. Int'l. Symp. on Microarchitecture, San Jose, California, pp. 63–74 (November 30–December 2, 1994); ACM SIGMICRO and IEEE-CS TC-MICRO.

  27. V. Van Dongen, G. R. Gao, and Q. Ning, A Polynomial Time Method for Optimal Software Pipelining, Proc. Conf. on Vector and Parallel Processing, CONPAR-92, LNCS Springer-Verlag, Lyon, France 634:613–624 (September 1–4, 1992).

    Google Scholar 

  28. J. Wang and E. Eisenbeis, A New Approach to Software Pipelining of Complicated Loops with Branches. Res. Rep. No., Institut Nat. de Recherche en Informatique et en Automatique (INRIA), Rocquencourt, France ( January 1993).

    Google Scholar 

  29. Nancy J. Warter, Grant E. Haab, John W. Bockhaus, and Krishna Subramanian, Enhanced Modulo Scheduling for Loops with Conditional Brances, Proc. 25th Ann. Int'l. Symp. on Microarchitecture, Portland, Oregon (December 1–4, 1992); ACM SIGMICRO and IEEE-CS TC-MICRO. SIG MICRO Newsletter 23(1,2):170–179 (December 1992).

  30. B. R. Rau and J. A. Fisher, Instruction-level Parallel Processing: History, Overview and Perspective, J. Supercomputing 7:9–50 (May 1993).

    Google Scholar 

  31. Vicki H. Allan, Reese B. Jones, Randall M. Lee, and Stephen J. Allan, Software Pipelining, ACM Computing Surveys 27(3):367–432 (September 1995).

    Google Scholar 

  32. P. Y. T. Hsu, Highly Concurrent Scalar Processing. Technical Report, University of Illinois at Urbana-Champagne, Urbana, Illinois, Ph.D. thesis (1986).

    Google Scholar 

  33. Alexander Aiken, Alexandru Nicolau, and Steven Novack, Resource-Constrained Software Pipelining, IEEE Trans. Parallel and Distributed Syst. 6(12):1248–1270 (December 1995).

    Google Scholar 

  34. Erik R. Altman and Guang R. Gao, Optimal Software Pipelining Through Enumeration of Schedules, In Luc Bougé, Pierre Fraigniaud, Anne Mignotte, and Yves Robert, (eds.), Proc. Euro-Par '96 Conf. Parallel Processing, Springer-Verlag, Lyon, pp. 833–840 (August 1996).

    Google Scholar 

  35. T. C. Hu, Integer Programming and Network Flows, Addison-Wesley Pub. Co. (1969).

Download references

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Altman, E.R., Gao, G.R. Optimal Modulo Scheduling Through Enumeration. International Journal of Parallel Programming 26, 313–344 (1998). https://doi.org/10.1023/A:1018742213548

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1018742213548

Navigation