Skip to main content
Log in

General methods of sequence comparison

  • Published:
Bulletin of Mathematical Biology Aims and scope Submit manuscript

Abstract

Mathematical methods for comparison of nucleic acid sequences are reviewed. There are two major methods of sequence comparison: dynamic programming and a method referred to here as the regions method. The problem types discussed are comparison of two sequences, location of long matching segments, efficient database searches and comparison of several sequences.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

Literature

  • Aho, V. A., J. E. Hopcroft and J. D. Ullman. 1974.The Design and Analysis of Computer Algorithms. Addison-Wesley, Menlo Park, California.

    Google Scholar 

  • Arratia, R. A. and M. S. Waterman. 1984. “An Erdos-Renyi Law with Shifts.”Adv. appl. Math. (in press).

  • Arratia, R. A., L. Gordon and M. S. Waterman. 1984. “An Extreme Value Distribution for Sequence Matching.” Manuscript.

  • Beyer, W. A., C. Burks and W. B. Goad. 1983. “Quantitative Comparison of DNA Sequences.”Los Alamos Sci. 9, 62–63.

    Google Scholar 

  • —, M. L. Stein, T. F. Smith and S. M. Ulam. 1974. “A Molecular-sequence Metric and Evolutionary Trees.”Math Biosci. 19, 9–25.

    Article  MATH  Google Scholar 

  • Boswell, D. R. and A. D. MacLachlan. 1984. “Sequence Comparison by Exponentially-damped Alignment.”Nucleic Acids Res.12, 457–464.

    Google Scholar 

  • Byers, T. H. and M. S. Waterman. 1984. “Determining All Optimal and Near-optimal Solutions When Solving Shortest Path Problems by Dynamic Programming.”Operat. Res. (in press).

  • Cohen, D. N., T. A. Reichert and A. K. C. Wong. 1975. “Matching Code Sequences Utilizing Context Free Quality Measures.”Math. Biosci. 24, 25–30.

    Article  MATH  MathSciNet  Google Scholar 

  • Collins, J. F. and A. F. W. Coulson. 1984. “Applications of Parallel Processing Algrithms for DNA Sequence Analysis.”Nucleic Acids Res.12, 181–192.

    Google Scholar 

  • Delcoigne, A. and P. Hansen. 1975. “Sequence Comparison by Dynamic Programming.”Biometrika 62, 661–664.

    Article  MATH  Google Scholar 

  • Doolittle, R. F., M. W. Hunkapiller, L. E. Hood, S. G. Devare, K. C. Robbins, S. A. Aaronson and H. M. Antoniades. 1983. “Simian Sarcoma Viruses Onc Gene v-sis is Derived from the Gene (or Genes) Encoding a Platelet-derived Growth Factor.”Science 221, 275–276.

    Google Scholar 

  • Dumas, J. P. and J. Ninio. 1982. “Efficient Algorithms for Folding and Comparing Nucleic Acid Sequences.”Nucleic Acids Res.80, 197–206.

    Google Scholar 

  • Dumey, A. I. 1956. “Indexing for Rapid Random-access Memory.”Comput. Automat. 5, 6–8.

    Google Scholar 

  • Erickson, B. W. and P. H. Sellers. 1983. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Ed. D. Sankoff and J. B. Kruskal, pp. 55–90, Addison-Wesley, London.

    Google Scholar 

  • Fickett, J. W. 1984. “Fast Optimal Alignment.”Nucleic Acids Res.12, 175–180.

    Google Scholar 

  • Fitch, W. M. 1969. “Locating Gaps in Amino Acid Sequences to Optimize the Homology Between Two Proteins.”Biochem. Genet. 3, 99.

    Article  Google Scholar 

  • —. 1971. “Towards Defining the Course of Evolution: Miniumum Change for a Specific Tree Topology.”Syst. Zool. 20, 406–416.

    Article  Google Scholar 

  • — and E. Margoliash. 1967. “Construction of Polygenetic Trees.”Science 155, 279–284.

    Google Scholar 

  • — and T. F. Smith. 1983. “Optimal Sequence Alignments.”Proc. natn. Acad. Sci. U.S.A. 80, 1382–1386.

    Article  Google Scholar 

  • Gibbs, A. J. and G. A. McIntyre. 1970. “The Diagram, a Method for Comparing Sequences.”Euro. J. Biochem. 16, 1–11.

    Article  Google Scholar 

  • Goad, W. B., M. I. Kanehisa. 1982. “Pattern Recognition in Nucleic Acid Sequences. I. A General Method for Finding Local Homologies and Symmetries.”Nucleic Acids Res.10, 247–263.

    Google Scholar 

  • Gordon, A. D. 1973. “A Sequence-comparison Statistic and Algorithm.”Biometrika 60, 197–200.

    Article  MATH  MathSciNet  Google Scholar 

  • Gotoh, O. 1982. “An Improved Algorithm for Matching Biological Sequences.”J. Mol. Biol. 162, 705–708.

    Article  Google Scholar 

  • Harr, R., P. Hagblom and P. Gustafsson. 1982. “Two-dimensional Graphic Analysis of DNA Sequence Homologies.”Nucleic Acids Res.10, 365–374.

    Google Scholar 

  • Jagadeeswaran, P. and P. M. McGuire. 1982. “Interactive Computer Programs in Sequence Data Analysis.”Nucleic Acids Res.10, 433–447.

    Google Scholar 

  • Kanehisa, M. I. and W. B. Goad. 1982. “Pattern Recognition in Nucleic Acid Sequences II. An Efficient Method for Finding Locally Stable Secondary Structures.”Nucleic Acids Res.10, 265–277.

    Google Scholar 

  • Karlin, S., G. Ghandour and D. E. Foulser. 1984. “Comparative Analysis of Human and Bovine Papallimaviruses.”Mol. Biol. Evol. 1, 357–370.

    Google Scholar 

  • ——, F. Ost, S. Tavare and L. J. Korn. 1983. “New Approaches for Computer Analysis of Nucleic Acid Sequences.”Proc. natn. Acad. Sci U.S.A. 80, 5660–5664.

    Article  MATH  Google Scholar 

  • Korn, L. J., C. L. Queen and M. N. Wegman. 1977. “Computer Analysis of Nucleic Acid Regulatory Sequences.”Proc. natn. Acad. Sci. U.S.A. 74, 4401–4405.

    Article  Google Scholar 

  • Kruskal, J. B. 1983. “An Overview of Sequence Comparison.” InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Eds D. Sankoff and J. B. Kruskal, pp. 1–40. Addison-Wesley, London.

    Google Scholar 

  • — and D. Sankoff. 1983. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Eds D. Sankoff and J. B. Kruskal, pp. 265–310. Addison-Wesley, London.

    Google Scholar 

  • Laquer, T. H. 1981. “Asymptotic Limits for a Two-dimensional Recursion.”Stud. appl. Math. 64, 271–277.

    MATH  MathSciNet  Google Scholar 

  • Levenshtein, V. I. 1966. “Binary Codes Capable of Correcting Deletions, Insertions, and Reversals.”Cybernet. Control Theor. 10, 707–710;Dolklady Akademii nauk SSSR 163, 845–848.

    MathSciNet  Google Scholar 

  • Maizel, J. and R. Lenk. 1981. “Enhanced Graphic Matrix Analysis of Nucleic Acid and Protein Sequences.”Proc. natn. Acad. Sci. U.S.A. 78, 7665–7669.

    Article  MathSciNet  Google Scholar 

  • Martinez, H. M. 1980. “A New Algorithm for Calculating RNA Secondary Structure.” Manuscript.

  • — 1983. “An Efficient Method for Finding Repeats in Molecular Sequences.”Nucleic Acids Res.11, 4629–4634.

    Google Scholar 

  • Naharro, G., K. C. Robbins and E. P. Reddy. 1984. “Gene Product of v-fgr Onc: Hybrid Protein Containing a Portion of Actin and Tyrosin-Specific Protein Kinase.”Science 223, 63–66.

    Google Scholar 

  • Needleman, S. B. and C. D. Wunsch. 1970. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequences of Two Proteins.”J. Mol. Biol. 48, 444–453.

    Article  Google Scholar 

  • Novotny, J. 1982. “Matrix Program to Analyze Primary Structure Homology.”Nucleic Acids Res.10, 127–131.

    Google Scholar 

  • Queen, C. M., N. Wegman and L. J. Korn. 1982. “Improvements to a Program for DNA Analysis: a Procedure to find Homologies Among Many Sequences.”Nucleic Acids Res.10, 449–456.

    Google Scholar 

  • Reichert, T. A., D. N. Cohen and A. K. C. Wong. 1973. “An Application of Informaton Theory to Genetic Mutations and Matching of Polypeptide Sequences.”J. Theor. Biol. 42, 245–261.

    Article  Google Scholar 

  • Sankoff, D. 1972. “Matching Sequences Under Deletion-Insertion Constraints.”Proc. natn. Acad. Sci. U.S.A. 68, 4–6.

    Article  MathSciNet  Google Scholar 

  • — 1975. “Minimal Mutation Trees of Sequences.”SIAM J. appl. Math. 78, 35–42.

    Article  MathSciNet  Google Scholar 

  • — and R. J. Cedergren. 1983. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Eds D. Sankoff and J. Kruskal, pp. 253–263. Addison-Wesley, London.

    Google Scholar 

  • — and J. B. Kruskal (eds). 1983.Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Addison-Wesley: London.

    Google Scholar 

  • — and P. H. Sellers. 1973. “Shortcuts, Diversions, and Maximal Chains in Partially Ordered Sets.”Discrete Math. 4, 287–293.

    Article  MATH  MathSciNet  Google Scholar 

  • Sellers, P. 1974a. “An Algorithm for the Distance Between Two Finite Sequences.”Comb. Theory 16, 253–258.

    Article  MATH  MathSciNet  Google Scholar 

  • — 1974b. “On the Theory and Computation of Evolutionary Distances.”SIAM J. appl. Math. 26, 787–793.

    Article  MATH  MathSciNet  Google Scholar 

  • — 1979. “Pattern Recognition in Genetic Sequences.”Proc. natn. Acad. Sci. U.S.A. 76, 3041.

    Article  MATH  MathSciNet  Google Scholar 

  • — 1980. “The Theory and Computation of Evolutionary Distances: Pattern Recoginition.”J. Algorithms 1, 359–373.

    Article  MATH  MathSciNet  Google Scholar 

  • Shepard, R. N. 1980. “Multidimentional Scaling, Tree-Fitting, and Clustering.”Science 210, 390–398.

    MathSciNet  Google Scholar 

  • Smith, T. F. and M. S. Waterman. 1981a. “Identification of Common Molecular Subsequences.”J. Mol. Biol. 147, 195–197.

    Article  Google Scholar 

  • — and—. 1981b. “Comparison of Biosequences.”Adv. appl. Math. 2, 482–489.

    Article  MATH  MathSciNet  Google Scholar 

  • Smith, T. F. and M. S. Waterman and C. Burks. 1984. “The Statistical Distribution of Nucleic Acids Similarities.” In prep.

  • — and— and W. M. Fitch. 1981. “Comparative Biosequence Metrics.”J. Mol. Evol. 18, 38–46.

    Article  Google Scholar 

  • Söll, D. and R. J. Roberts (Eds). 1982.The Application of Computers to Research on Nucleic Acids I. IRI Press, Oxford and Washington, D.C.

    Google Scholar 

  • — and—. 1984.The Application of Computers to Research on Nucleic Acids II. IRL Press, Oxford and Washington, D.C.

    Google Scholar 

  • Stanton, R. G. and D. D. Cowan. 1970. “Note on a ‘Square Functional’ Equation.”SIAM Rev,12, 277–279.

    Article  MATH  MathSciNet  Google Scholar 

  • Studnicka, G., G. Rahn, I. Cummings and W. Salser. 1978. “Computer Method for Predicting the Secondary Structure of Single-Stranded RNA.”Nucleic Acids Res.5, 3365–3387.

    Google Scholar 

  • Taylor, P. 1984. “A Fast Homology Program for Aligning Biological Sequences.”Nucleic Acids Res.12, 447–455.

    Google Scholar 

  • Ukkonen, E. 1983. “On Approximate String Matching.”Proc. Int. Conf. Found. Comp. Theor. Lectures Notes in Comp. Sci. 158, 487–496.

    MATH  MathSciNet  Google Scholar 

  • Ukkonen, E. 1984. “Algorithms for Approximate String Matching.”Informat. Control (in press).

  • Ulam, S. M. 1972. InApplications of Number Theory to Numerical Analysis, Ed. S. K. Zaremba, pp. 1–3. Academic Press, New York.

    Google Scholar 

  • Wagner, R. H. 1983. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Eds. D. Sankoff and J. B. Kruskal, pp. 215–235. Addison-Wesley, London.

    Google Scholar 

  • Waterman, M. S. 1976. “Secondary Structure of Single-stranded Nucleic Acids.”Adv. Math, Suppl. Stud.1, 167–212.

    MathSciNet  Google Scholar 

  • — 1983. “Sequence Alignment in the Neighborhood of the Optimum with General Applications to Dynamic Programming.”Proc. nant. Acad. Sci. U.S.A. 80, 3123–3124.

    Article  MATH  Google Scholar 

  • Waterman, M. S. 1984. “Efficient Sequence Alignment Algorithms.”J. Theor. Biol. (in press).

  • —, T. F. Smith and W. A. Beyer. 1976. “Some Biological Sequence Metrics.”Ad Math. 20, 367–387.

    Article  MATH  MathSciNet  Google Scholar 

  • Weiss, R. 1983. “Oncogenes and Growth Factors.”Nature 304, 12.

    Article  Google Scholar 

  • Wilbur, W. J. and D. J. Lipman. 1983. “Rapid Similarity Searches of Nucleic Acid and Protein Data Banks.”Proc. natn. Acad. Sci. U.S.A. 80, 726–730.

    Article  Google Scholar 

  • Wilbur, W. J. and D. J. Lipman. 1984. “The Context Dependent Comparison of Biological Sequences”.SIAM J. appl. Math. (in press).

  • Wong, A. K. C., T. A. Reichert, D. N. Cohen and B. O. Aygun. 1974. “A Generalized Method for Matching Informational Macromolecular Code Sequences.”Comput. Biol. Med. 4, 43–57.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

This work was supported by a grant from the System Development Foundation.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Waterman, M.S. General methods of sequence comparison. Bltn Mathcal Biology 46, 473–500 (1984). https://doi.org/10.1007/BF02459498

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02459498

Keywords

Navigation