Abstract
Mathematical methods for comparison of nucleic acid sequences are reviewed. There are two major methods of sequence comparison: dynamic programming and a method referred to here as the regions method. The problem types discussed are comparison of two sequences, location of long matching segments, efficient database searches and comparison of several sequences.
Similar content being viewed by others
Literature
Aho, V. A., J. E. Hopcroft and J. D. Ullman. 1974.The Design and Analysis of Computer Algorithms. Addison-Wesley, Menlo Park, California.
Arratia, R. A. and M. S. Waterman. 1984. “An Erdos-Renyi Law with Shifts.”Adv. appl. Math. (in press).
Arratia, R. A., L. Gordon and M. S. Waterman. 1984. “An Extreme Value Distribution for Sequence Matching.” Manuscript.
Beyer, W. A., C. Burks and W. B. Goad. 1983. “Quantitative Comparison of DNA Sequences.”Los Alamos Sci. 9, 62–63.
—, M. L. Stein, T. F. Smith and S. M. Ulam. 1974. “A Molecular-sequence Metric and Evolutionary Trees.”Math Biosci. 19, 9–25.
Boswell, D. R. and A. D. MacLachlan. 1984. “Sequence Comparison by Exponentially-damped Alignment.”Nucleic Acids Res.12, 457–464.
Byers, T. H. and M. S. Waterman. 1984. “Determining All Optimal and Near-optimal Solutions When Solving Shortest Path Problems by Dynamic Programming.”Operat. Res. (in press).
Cohen, D. N., T. A. Reichert and A. K. C. Wong. 1975. “Matching Code Sequences Utilizing Context Free Quality Measures.”Math. Biosci. 24, 25–30.
Collins, J. F. and A. F. W. Coulson. 1984. “Applications of Parallel Processing Algrithms for DNA Sequence Analysis.”Nucleic Acids Res.12, 181–192.
Delcoigne, A. and P. Hansen. 1975. “Sequence Comparison by Dynamic Programming.”Biometrika 62, 661–664.
Doolittle, R. F., M. W. Hunkapiller, L. E. Hood, S. G. Devare, K. C. Robbins, S. A. Aaronson and H. M. Antoniades. 1983. “Simian Sarcoma Viruses Onc Gene v-sis is Derived from the Gene (or Genes) Encoding a Platelet-derived Growth Factor.”Science 221, 275–276.
Dumas, J. P. and J. Ninio. 1982. “Efficient Algorithms for Folding and Comparing Nucleic Acid Sequences.”Nucleic Acids Res.80, 197–206.
Dumey, A. I. 1956. “Indexing for Rapid Random-access Memory.”Comput. Automat. 5, 6–8.
Erickson, B. W. and P. H. Sellers. 1983. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Ed. D. Sankoff and J. B. Kruskal, pp. 55–90, Addison-Wesley, London.
Fickett, J. W. 1984. “Fast Optimal Alignment.”Nucleic Acids Res.12, 175–180.
Fitch, W. M. 1969. “Locating Gaps in Amino Acid Sequences to Optimize the Homology Between Two Proteins.”Biochem. Genet. 3, 99.
—. 1971. “Towards Defining the Course of Evolution: Miniumum Change for a Specific Tree Topology.”Syst. Zool. 20, 406–416.
— and E. Margoliash. 1967. “Construction of Polygenetic Trees.”Science 155, 279–284.
— and T. F. Smith. 1983. “Optimal Sequence Alignments.”Proc. natn. Acad. Sci. U.S.A. 80, 1382–1386.
Gibbs, A. J. and G. A. McIntyre. 1970. “The Diagram, a Method for Comparing Sequences.”Euro. J. Biochem. 16, 1–11.
Goad, W. B., M. I. Kanehisa. 1982. “Pattern Recognition in Nucleic Acid Sequences. I. A General Method for Finding Local Homologies and Symmetries.”Nucleic Acids Res.10, 247–263.
Gordon, A. D. 1973. “A Sequence-comparison Statistic and Algorithm.”Biometrika 60, 197–200.
Gotoh, O. 1982. “An Improved Algorithm for Matching Biological Sequences.”J. Mol. Biol. 162, 705–708.
Harr, R., P. Hagblom and P. Gustafsson. 1982. “Two-dimensional Graphic Analysis of DNA Sequence Homologies.”Nucleic Acids Res.10, 365–374.
Jagadeeswaran, P. and P. M. McGuire. 1982. “Interactive Computer Programs in Sequence Data Analysis.”Nucleic Acids Res.10, 433–447.
Kanehisa, M. I. and W. B. Goad. 1982. “Pattern Recognition in Nucleic Acid Sequences II. An Efficient Method for Finding Locally Stable Secondary Structures.”Nucleic Acids Res.10, 265–277.
Karlin, S., G. Ghandour and D. E. Foulser. 1984. “Comparative Analysis of Human and Bovine Papallimaviruses.”Mol. Biol. Evol. 1, 357–370.
——, F. Ost, S. Tavare and L. J. Korn. 1983. “New Approaches for Computer Analysis of Nucleic Acid Sequences.”Proc. natn. Acad. Sci U.S.A. 80, 5660–5664.
Korn, L. J., C. L. Queen and M. N. Wegman. 1977. “Computer Analysis of Nucleic Acid Regulatory Sequences.”Proc. natn. Acad. Sci. U.S.A. 74, 4401–4405.
Kruskal, J. B. 1983. “An Overview of Sequence Comparison.” InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Eds D. Sankoff and J. B. Kruskal, pp. 1–40. Addison-Wesley, London.
— and D. Sankoff. 1983. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Eds D. Sankoff and J. B. Kruskal, pp. 265–310. Addison-Wesley, London.
Laquer, T. H. 1981. “Asymptotic Limits for a Two-dimensional Recursion.”Stud. appl. Math. 64, 271–277.
Levenshtein, V. I. 1966. “Binary Codes Capable of Correcting Deletions, Insertions, and Reversals.”Cybernet. Control Theor. 10, 707–710;Dolklady Akademii nauk SSSR 163, 845–848.
Maizel, J. and R. Lenk. 1981. “Enhanced Graphic Matrix Analysis of Nucleic Acid and Protein Sequences.”Proc. natn. Acad. Sci. U.S.A. 78, 7665–7669.
Martinez, H. M. 1980. “A New Algorithm for Calculating RNA Secondary Structure.” Manuscript.
— 1983. “An Efficient Method for Finding Repeats in Molecular Sequences.”Nucleic Acids Res.11, 4629–4634.
Naharro, G., K. C. Robbins and E. P. Reddy. 1984. “Gene Product of v-fgr Onc: Hybrid Protein Containing a Portion of Actin and Tyrosin-Specific Protein Kinase.”Science 223, 63–66.
Needleman, S. B. and C. D. Wunsch. 1970. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequences of Two Proteins.”J. Mol. Biol. 48, 444–453.
Novotny, J. 1982. “Matrix Program to Analyze Primary Structure Homology.”Nucleic Acids Res.10, 127–131.
Queen, C. M., N. Wegman and L. J. Korn. 1982. “Improvements to a Program for DNA Analysis: a Procedure to find Homologies Among Many Sequences.”Nucleic Acids Res.10, 449–456.
Reichert, T. A., D. N. Cohen and A. K. C. Wong. 1973. “An Application of Informaton Theory to Genetic Mutations and Matching of Polypeptide Sequences.”J. Theor. Biol. 42, 245–261.
Sankoff, D. 1972. “Matching Sequences Under Deletion-Insertion Constraints.”Proc. natn. Acad. Sci. U.S.A. 68, 4–6.
— 1975. “Minimal Mutation Trees of Sequences.”SIAM J. appl. Math. 78, 35–42.
— and R. J. Cedergren. 1983. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Eds D. Sankoff and J. Kruskal, pp. 253–263. Addison-Wesley, London.
— and J. B. Kruskal (eds). 1983.Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Addison-Wesley: London.
— and P. H. Sellers. 1973. “Shortcuts, Diversions, and Maximal Chains in Partially Ordered Sets.”Discrete Math. 4, 287–293.
Sellers, P. 1974a. “An Algorithm for the Distance Between Two Finite Sequences.”Comb. Theory 16, 253–258.
— 1974b. “On the Theory and Computation of Evolutionary Distances.”SIAM J. appl. Math. 26, 787–793.
— 1979. “Pattern Recognition in Genetic Sequences.”Proc. natn. Acad. Sci. U.S.A. 76, 3041.
— 1980. “The Theory and Computation of Evolutionary Distances: Pattern Recoginition.”J. Algorithms 1, 359–373.
Shepard, R. N. 1980. “Multidimentional Scaling, Tree-Fitting, and Clustering.”Science 210, 390–398.
Smith, T. F. and M. S. Waterman. 1981a. “Identification of Common Molecular Subsequences.”J. Mol. Biol. 147, 195–197.
— and—. 1981b. “Comparison of Biosequences.”Adv. appl. Math. 2, 482–489.
Smith, T. F. and M. S. Waterman and C. Burks. 1984. “The Statistical Distribution of Nucleic Acids Similarities.” In prep.
— and— and W. M. Fitch. 1981. “Comparative Biosequence Metrics.”J. Mol. Evol. 18, 38–46.
Söll, D. and R. J. Roberts (Eds). 1982.The Application of Computers to Research on Nucleic Acids I. IRI Press, Oxford and Washington, D.C.
— and—. 1984.The Application of Computers to Research on Nucleic Acids II. IRL Press, Oxford and Washington, D.C.
Stanton, R. G. and D. D. Cowan. 1970. “Note on a ‘Square Functional’ Equation.”SIAM Rev,12, 277–279.
Studnicka, G., G. Rahn, I. Cummings and W. Salser. 1978. “Computer Method for Predicting the Secondary Structure of Single-Stranded RNA.”Nucleic Acids Res.5, 3365–3387.
Taylor, P. 1984. “A Fast Homology Program for Aligning Biological Sequences.”Nucleic Acids Res.12, 447–455.
Ukkonen, E. 1983. “On Approximate String Matching.”Proc. Int. Conf. Found. Comp. Theor. Lectures Notes in Comp. Sci. 158, 487–496.
Ukkonen, E. 1984. “Algorithms for Approximate String Matching.”Informat. Control (in press).
Ulam, S. M. 1972. InApplications of Number Theory to Numerical Analysis, Ed. S. K. Zaremba, pp. 1–3. Academic Press, New York.
Wagner, R. H. 1983. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Eds. D. Sankoff and J. B. Kruskal, pp. 215–235. Addison-Wesley, London.
Waterman, M. S. 1976. “Secondary Structure of Single-stranded Nucleic Acids.”Adv. Math, Suppl. Stud.1, 167–212.
— 1983. “Sequence Alignment in the Neighborhood of the Optimum with General Applications to Dynamic Programming.”Proc. nant. Acad. Sci. U.S.A. 80, 3123–3124.
Waterman, M. S. 1984. “Efficient Sequence Alignment Algorithms.”J. Theor. Biol. (in press).
—, T. F. Smith and W. A. Beyer. 1976. “Some Biological Sequence Metrics.”Ad Math. 20, 367–387.
Weiss, R. 1983. “Oncogenes and Growth Factors.”Nature 304, 12.
Wilbur, W. J. and D. J. Lipman. 1983. “Rapid Similarity Searches of Nucleic Acid and Protein Data Banks.”Proc. natn. Acad. Sci. U.S.A. 80, 726–730.
Wilbur, W. J. and D. J. Lipman. 1984. “The Context Dependent Comparison of Biological Sequences”.SIAM J. appl. Math. (in press).
Wong, A. K. C., T. A. Reichert, D. N. Cohen and B. O. Aygun. 1974. “A Generalized Method for Matching Informational Macromolecular Code Sequences.”Comput. Biol. Med. 4, 43–57.
Author information
Authors and Affiliations
Additional information
This work was supported by a grant from the System Development Foundation.
Rights and permissions
About this article
Cite this article
Waterman, M.S. General methods of sequence comparison. Bltn Mathcal Biology 46, 473–500 (1984). https://doi.org/10.1007/BF02459498
Issue Date:
DOI: https://doi.org/10.1007/BF02459498