Skip to main content
Log in

Reliable broadcasts and communication models: tradeoffs and lower bounds

  • Published:
Distributed Computing Aims and scope Submit manuscript

Abstract

Reliable Broadcast is a mechanism by which a processor in a distributed system disseminates a value to all other processors in the presence of both communication and processor failures. Protocols to achieve Reliable Broadcast are at the heart of most fault-tolerant applications. We characterize the execution time of Reliable Broadcast protocols as a function of the communication model. This model includes familiar communication structures such as fully-connected point-to-point graphs, linear chains, rings, broadcast networks (such as Ethernet) and buses. We derive a parameterized protocol that implements Reliable Broadcast for any member within this class. We obtain lower bound results that show the optimality of our protocols. The lower bound results identify a time complexity gap between systems where processors may only fail to send messages, and systems where processors may fail both to send and to receive messages. The tradeoffs that our results reveal between performance, resiliency and network cost offer many new alternatives previously not considered in designing fault-tolerant systems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Babaoglu Ö (1987) On the reliability of consensus-based fault-tolerant distributed computing systems. ACM Trans Comput Syst 5(4)

  2. Babaoglu Ö, Drummond R (1985) Streets of Byzantium: network architectures for fast reliable broadcast. IEEE Trans Software Eng SE-11(6):546–554

    Google Scholar 

  3. Babaoglu Ö, Drummond R (1987) (Almost) no cost clock synchronization. Proc 17th Symp Fault Tolerant Comput, Pittsburgh, Pennsylvania (July 1987) pp 42–47

  4. Broder A, Dolev D, Fischer M, Simons B (1984) Efficient fault tolerant routings in networks. Proc 16th ACM Symp Theory Comput, pp 536–541

  5. Birman KP, Joseph T (1987) Reliable communication in the presence of failures. ACM Trans Comput Syst 5(1):47–76

    Google Scholar 

  6. Bracha G (1985) AnO(lgn) expected rounds randomized Byzantine Generals protocol. Proc 17th ACM Symp Theory Comput, Providence, Rhode Island (May 1985) pp 316–326

  7. Chang JM, Maxemchuck NF (1984) Reliable broadcast protocols. ACM Trans Comput Syst 2(3):251–273

    Google Scholar 

  8. Cristian F, Aghili H, Strong R, Dolev D (1985) Atomic broadcasts: From simple message diffusion to Byzantine Agreement. Proc 15th Symp Fault Tolerant Comput, Ann Arbor, Michigan (June 1985) pp 200–206

  9. Dolev D, Reischuck R, Strong HR (1982) ‘Eventual’ is earlier than ‘immediate’. Proc 23rd Symp Foundat Comput Sci, Chicago, Illinois (November 1982) pp 196–203

  10. Dolev D, Strong HR (193) Authenticated algorithms for Byzantine Agreement. SIAM J Comput 12(4):656–666

    Google Scholar 

  11. Fischer M (1983) The consensus problem in unreliable distributed systems (A Brief Survey). Tech Rep YALEU/DCS/RR-273, Dept Comput Sci, Yale University, New Haven, Connecticut (June 1983)

    Google Scholar 

  12. Fischer M, Lynch N (1982) A lower bound for the time to assure interactive consistency. Inf Proc Lett 14(4): 183–186

    Google Scholar 

  13. Garcia-Molina H, Pittelli F, Davidson S (1984) Applications of Byzantine Agreement in database systems. Tech Rep TR 316, Princeton University, Princeton, New Jersey (June 1984)

    Google Scholar 

  14. Hadzilacos V (1984) Issues of fault tolerance in concurrent computations. Ph.D Thesis, Tech Rep TR-11-84, Aiken Computation Laboratory, harvard University, Cambridge, Mass (June 1984)

    Google Scholar 

  15. Halpern JY, Simons B, Strong HR, Dolev D (1984) Fault-tolerant clock synchronization. Proc 3rd ACM Symp Principles Distributed Comput, Vancouver, B.C., Canada (August 1984) pp 89–102

  16. Kim W (1984) Highly available systems for database applications. ACM Comput Surv 16(1):71–98

    Google Scholar 

  17. Lamport L (1984) Using time instead of timeout for fault-tolerant distributed systems. ACM Trans Prog Lang Syst 6(2):254–280

    Google Scholar 

  18. Lamport L, Melliar-Smith PM (1984) Byzantine clock synchronization. Proc 3rd ACM Symp Principles Distributed Comput, Vancouver, B.C., Canada (August 1984) pp 68–74

  19. Lamport L, Shostak R, Pease M (1982) The Byzantine Generals problem. ACM Trans Prog Lang Syst 4(3):382–401

    Google Scholar 

  20. Metcalfe R, Boggs DR (1976) Ethernet: Distributed packet switching for local computer networks. Commun ACM 19(7):396–403

    Google Scholar 

  21. Moses Y, Tuttle M (1986) Programming simultaneous actions using common knowledge (Algorithmica, to appear). Preliminary version available in Proc 27th Ann IEEE Symp Foundat Comput Sci (October 1986) pp 208–221

  22. Rabin M (1983) Randomized Byzantine generals. Proc 24th Symp Foundat Comput Sci, Tucson, Arizona (November 1983) pp 403–409

  23. Strong HR, Dolev D (1983) Byzantine agreement. Digest of Papers, Spring Compcon 83, San Francisco, California (March 1983) pp 77–81

  24. Schneider FB, Lamport L (1982) Paradigms for distributed programs. In: Paul M, Siegert HJ (eds) Distributed Systems: Methods and Tools for Specification. Lect Notes Comput Sci, vol 190

  25. Spector AZ (1984) Computer software for process control. Scientific American 251(3):174–187

    Google Scholar 

  26. Srikanth TK, Toueg S (1985) Optimal clock synchronization. Proc 4th Symp Principles Distributed Comput, Minaki, Canada (August 1985) pp 71–86

  27. Stallings W (1984) Local networks. ACM Comput Surv 16(1):3–41

    Google Scholar 

  28. Svoboda L (1984) Resilient distributed computing. IEEE Trans Software Eng SE-10(3):257–268

    Google Scholar 

  29. Tanenbaum A (1981) Computer Networks. Prentice Hall, Englewood Cliffs, NJ

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

Özalp Babaoglu is Associate Professor in the Department of Computer Science at Cornell University, Ithaca, New York. His research interests include distributed systems, fault tolerance, performance evaluation and modeling. He received a BS in electrical engineering from George Washington University, Washington, D.C. in 1976. From the University of California, Berkeley, he received a MS in 1977 and a PhD in 1981, both in computer science. While at Berkeley, he designed and implemented the virtual memory extensions to VAX Unix that came to be known as 3. Obsd.

Pat Stephenson is a Doctoral Candidate in the Computer Science Department at Cornell University, Ithaca, New York. His research interests include distributed systems and fault tolerance. In 1983, he received a B.A. (Mod.) in computer science from Trinity College, Dublin, Ireland. He received his MS in computer science from Cornell in 1986. He is currently working on new tradeoffs in the design of fault-tolerant algorithms.

Rogério Drummond is Assistant Professor in the Computer Science Department at the Universidade Estadual de Campinas (UNICAMP), São Paulo, Brazil. His interests include distributed computing, fault tolerance and operating systems. He received a BS and a MS in computer science from the Universidade Estadual de Campinas in 1978 and 1980, respectively. In 1986 he received a PhD in computer science from Cornell University. He is currently working on integrated environments for the development of software and hardware.

Partial support for this work was provided by the National Science Foundation under Grants DCR-86-01864 and MCS-82-10356 and AT&T under a Foundation Grant

Supported partially by the Defense Advanced Research Projects Agency (DoD) under ARPA order 5378, Contract MDA 903-85-C-0124, and partially by an IBM graduate fellowship. The views, opinions and findings contained in this report are solely those of the authors and should not be construed as an official Department of Defense position, policy, or decision

Supported partially by the CAPES and CNPq agencies of the Ministry of Education of Brazil

Rights and permissions

Reprints and permissions

About this article

Cite this article

Babaoglu, Ö., Stephenson, P. & Drummond, R. Reliable broadcasts and communication models: tradeoffs and lower bounds. Distrib Comput 2, 177–189 (1988). https://doi.org/10.1007/BF01872844

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF01872844

Key words

Navigation