Abstract
Reliable Broadcast is a mechanism by which a processor in a distributed system disseminates a value to all other processors in the presence of both communication and processor failures. Protocols to achieve Reliable Broadcast are at the heart of most fault-tolerant applications. We characterize the execution time of Reliable Broadcast protocols as a function of the communication model. This model includes familiar communication structures such as fully-connected point-to-point graphs, linear chains, rings, broadcast networks (such as Ethernet) and buses. We derive a parameterized protocol that implements Reliable Broadcast for any member within this class. We obtain lower bound results that show the optimality of our protocols. The lower bound results identify a time complexity gap between systems where processors may only fail to send messages, and systems where processors may fail both to send and to receive messages. The tradeoffs that our results reveal between performance, resiliency and network cost offer many new alternatives previously not considered in designing fault-tolerant systems.
Similar content being viewed by others
References
Babaoglu Ö (1987) On the reliability of consensus-based fault-tolerant distributed computing systems. ACM Trans Comput Syst 5(4)
Babaoglu Ö, Drummond R (1985) Streets of Byzantium: network architectures for fast reliable broadcast. IEEE Trans Software Eng SE-11(6):546–554
Babaoglu Ö, Drummond R (1987) (Almost) no cost clock synchronization. Proc 17th Symp Fault Tolerant Comput, Pittsburgh, Pennsylvania (July 1987) pp 42–47
Broder A, Dolev D, Fischer M, Simons B (1984) Efficient fault tolerant routings in networks. Proc 16th ACM Symp Theory Comput, pp 536–541
Birman KP, Joseph T (1987) Reliable communication in the presence of failures. ACM Trans Comput Syst 5(1):47–76
Bracha G (1985) AnO(lgn) expected rounds randomized Byzantine Generals protocol. Proc 17th ACM Symp Theory Comput, Providence, Rhode Island (May 1985) pp 316–326
Chang JM, Maxemchuck NF (1984) Reliable broadcast protocols. ACM Trans Comput Syst 2(3):251–273
Cristian F, Aghili H, Strong R, Dolev D (1985) Atomic broadcasts: From simple message diffusion to Byzantine Agreement. Proc 15th Symp Fault Tolerant Comput, Ann Arbor, Michigan (June 1985) pp 200–206
Dolev D, Reischuck R, Strong HR (1982) ‘Eventual’ is earlier than ‘immediate’. Proc 23rd Symp Foundat Comput Sci, Chicago, Illinois (November 1982) pp 196–203
Dolev D, Strong HR (193) Authenticated algorithms for Byzantine Agreement. SIAM J Comput 12(4):656–666
Fischer M (1983) The consensus problem in unreliable distributed systems (A Brief Survey). Tech Rep YALEU/DCS/RR-273, Dept Comput Sci, Yale University, New Haven, Connecticut (June 1983)
Fischer M, Lynch N (1982) A lower bound for the time to assure interactive consistency. Inf Proc Lett 14(4): 183–186
Garcia-Molina H, Pittelli F, Davidson S (1984) Applications of Byzantine Agreement in database systems. Tech Rep TR 316, Princeton University, Princeton, New Jersey (June 1984)
Hadzilacos V (1984) Issues of fault tolerance in concurrent computations. Ph.D Thesis, Tech Rep TR-11-84, Aiken Computation Laboratory, harvard University, Cambridge, Mass (June 1984)
Halpern JY, Simons B, Strong HR, Dolev D (1984) Fault-tolerant clock synchronization. Proc 3rd ACM Symp Principles Distributed Comput, Vancouver, B.C., Canada (August 1984) pp 89–102
Kim W (1984) Highly available systems for database applications. ACM Comput Surv 16(1):71–98
Lamport L (1984) Using time instead of timeout for fault-tolerant distributed systems. ACM Trans Prog Lang Syst 6(2):254–280
Lamport L, Melliar-Smith PM (1984) Byzantine clock synchronization. Proc 3rd ACM Symp Principles Distributed Comput, Vancouver, B.C., Canada (August 1984) pp 68–74
Lamport L, Shostak R, Pease M (1982) The Byzantine Generals problem. ACM Trans Prog Lang Syst 4(3):382–401
Metcalfe R, Boggs DR (1976) Ethernet: Distributed packet switching for local computer networks. Commun ACM 19(7):396–403
Moses Y, Tuttle M (1986) Programming simultaneous actions using common knowledge (Algorithmica, to appear). Preliminary version available in Proc 27th Ann IEEE Symp Foundat Comput Sci (October 1986) pp 208–221
Rabin M (1983) Randomized Byzantine generals. Proc 24th Symp Foundat Comput Sci, Tucson, Arizona (November 1983) pp 403–409
Strong HR, Dolev D (1983) Byzantine agreement. Digest of Papers, Spring Compcon 83, San Francisco, California (March 1983) pp 77–81
Schneider FB, Lamport L (1982) Paradigms for distributed programs. In: Paul M, Siegert HJ (eds) Distributed Systems: Methods and Tools for Specification. Lect Notes Comput Sci, vol 190
Spector AZ (1984) Computer software for process control. Scientific American 251(3):174–187
Srikanth TK, Toueg S (1985) Optimal clock synchronization. Proc 4th Symp Principles Distributed Comput, Minaki, Canada (August 1985) pp 71–86
Stallings W (1984) Local networks. ACM Comput Surv 16(1):3–41
Svoboda L (1984) Resilient distributed computing. IEEE Trans Software Eng SE-10(3):257–268
Tanenbaum A (1981) Computer Networks. Prentice Hall, Englewood Cliffs, NJ
Author information
Authors and Affiliations
Additional information
Özalp Babaoglu is Associate Professor in the Department of Computer Science at Cornell University, Ithaca, New York. His research interests include distributed systems, fault tolerance, performance evaluation and modeling. He received a BS in electrical engineering from George Washington University, Washington, D.C. in 1976. From the University of California, Berkeley, he received a MS in 1977 and a PhD in 1981, both in computer science. While at Berkeley, he designed and implemented the virtual memory extensions to VAX Unix that came to be known as 3. Obsd.
Pat Stephenson is a Doctoral Candidate in the Computer Science Department at Cornell University, Ithaca, New York. His research interests include distributed systems and fault tolerance. In 1983, he received a B.A. (Mod.) in computer science from Trinity College, Dublin, Ireland. He received his MS in computer science from Cornell in 1986. He is currently working on new tradeoffs in the design of fault-tolerant algorithms.
Rogério Drummond is Assistant Professor in the Computer Science Department at the Universidade Estadual de Campinas (UNICAMP), São Paulo, Brazil. His interests include distributed computing, fault tolerance and operating systems. He received a BS and a MS in computer science from the Universidade Estadual de Campinas in 1978 and 1980, respectively. In 1986 he received a PhD in computer science from Cornell University. He is currently working on integrated environments for the development of software and hardware.
Partial support for this work was provided by the National Science Foundation under Grants DCR-86-01864 and MCS-82-10356 and AT&T under a Foundation Grant
Supported partially by the Defense Advanced Research Projects Agency (DoD) under ARPA order 5378, Contract MDA 903-85-C-0124, and partially by an IBM graduate fellowship. The views, opinions and findings contained in this report are solely those of the authors and should not be construed as an official Department of Defense position, policy, or decision
Supported partially by the CAPES and CNPq agencies of the Ministry of Education of Brazil
Rights and permissions
About this article
Cite this article
Babaoglu, Ö., Stephenson, P. & Drummond, R. Reliable broadcasts and communication models: tradeoffs and lower bounds. Distrib Comput 2, 177–189 (1988). https://doi.org/10.1007/BF01872844
Issue Date:
DOI: https://doi.org/10.1007/BF01872844