A theory for memory-based learning

Lin, Jyh-Han; Vitter, Jeffrey Scott

doi:10.1007/BF00993469

Published: November 1994

Machine Learning, Volume 17, pages 143–167 (1994)

Abstract

A memory-based learning system is an extended memory management system that decomposes the input space either statically or dynamically into subregions for the purpose of storing and retrieving functional information. The main generalization techniques employed by memory-based learning systems are nearest-neighbor search, space decomposition techniques, and clustering. Research on memory-based learning is still in its early stages. In particular, there are very few rigorous theoretical results regarding memory requirements, sample size, expected performance, and computational complexity. In this paper, we propose a model for memory-based learning and use it to analyze several methods (ε-covering, hashing, clustering, tree-structured clustering, and receptive fields) for learning smooth functions. The sample size and system complexity are derived for each method. Our model is built upon the generalized PAC learning model of Haussler (1989) and is closely related to the method of vector quantization in data compression. Our main result is that we can build memory-based learning systems using new clustering algorithms (Lin & Vitter, 1992a) to PAC-learn in polynomial time using only polynomial storage in typical situations.
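As an informal illustration of the general idea, and not the construction analyzed in the paper, the sketch below stores a smooth function by covering the sample inputs with centers spaced more than ε apart and answering queries from the nearest stored center. The names `greedy_epsilon_cover` and `MemoryBasedLearner`, the greedy covering rule, and the choice to average the outputs in each cell are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_epsilon_cover(points: Sequence[Point], eps: float) -> List[Point]:
    """Greedily pick centers so that every point lies within eps of a center.

    Hypothetical helper: a simple one-pass heuristic, not an optimal cover.
    """
    centers: List[Point] = []
    for p in points:
        # Open a new cell only if no existing center already covers p.
        if all(math.dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers


class MemoryBasedLearner:
    """Illustrative learner: one stored output value per cover center."""

    def __init__(self, eps: float) -> None:
        self.eps = eps
        self.centers: List[Point] = []
        self.values: List[float] = []

    def _nearest(self, x: Point) -> int:
        # Linear-scan nearest-neighbor search over the stored centers.
        return min(range(len(self.centers)),
                   key=lambda i: math.dist(x, self.centers[i]))

    def fit(self, xs: Sequence[Point], ys: Sequence[float]) -> "MemoryBasedLearner":
        self.centers = greedy_epsilon_cover(xs, self.eps)
        sums = [0.0] * len(self.centers)
        counts = [0] * len(self.centers)
        for x, y in zip(xs, ys):
            i = self._nearest(x)
            sums[i] += y
            counts[i] += 1
        # Each center is itself a sample point, so every count is at least 1.
        self.values = [s / n for s, n in zip(sums, counts)]
        return self

    def predict(self, x: Point) -> float:
        # Return the value stored for the cell whose center is closest to x.
        return self.values[self._nearest(x)]


if __name__ == "__main__":
    # Toy target: sin on [0, 2*pi], sampled on a uniform grid.
    xs = [(2 * math.pi * i / 199,) for i in range(200)]
    ys = [math.sin(x[0]) for x in xs]
    model = MemoryBasedLearner(eps=0.1).fit(xs, ys)
    print(len(model.centers), "centers stored for", len(xs), "samples")
    print("prediction at 1.0:", model.predict((1.0,)))
```

For a target with Lipschitz constant 1, such as sin, the stored value and the true value at a sample point can differ by at most 2ε in this sketch, which is the kind of trade-off between storage and accuracy that the abstract refers to.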

References

Albus, J. S. (1975a). Data storage in the cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 228–233.
Albus, J. S. (1975b). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 220–227.
Albus, J. S. (1981). Brains, Behaviour, and Robotics. Byte Books, Peterborough, NH.
Carter, J. L., & Wegman, M. N. (1979). Universal classes of hash functions. Journal of Computer and System Sciences, 18(2):143–154.
Chvátal, V. (1979). A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4(3):233–235.
Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13:21–27.
Dantzig, G. (1951). Programming of interdependent activities, II, mathematical models. In Activity Analysis of Production and Allocation, 19–32. John Wiley & Sons, Inc., New York.
Dean, T. L., & Wellman, M. P. (1991). Planning and Control. Morgan Kaufmann Publishers.
Devroye, L. (1988). Automatic pattern recognition: A study of the probability of error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(4):530–543.
Duda, R. O., & Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley.
Dudley, R. M. (1978). Central limit theorems for empirical measures. Annals of Probability, 6(6):899–929.
Dudley, R. M. (1984). A course on empirical processes. In Lecture Notes in Mathematics 1097. Springer-Verlag.
Friedman, J. H. (1988). Multivariate Adaptive Regression Splines. Technical Report 102, Stanford University, Lab for Computational Statistics.
Garey, M. R., & Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., San Francisco, CA.
Gersho, A. (1982). On the structure of vector quantizers. IEEE Transactions on Information Theory, 28(2):157–166.
Gersho, A., & Gray, R. M. (1991). Vector Quantization and Signal Compression. Kluwer Academic Press, Massachusetts.
Gray, R. M. (1984). Vector quantization. IEEE ASSP Magazine, 4–29.
Haussler, D. (1989). Generalizing the PAC model: Sample size bounds from metric dimension-based uniform convergence results. In Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, 40–45.
Haussler, D., Kearns, M., Littlestone, N., & Warmuth, M. K. (1991). Equivalence of models for polynomial learnability. Information and Computation, 95:129–161.
Haussler, D., & Long, P. (1990). A generalization of Sauer's lemma. UCSC-CRL-90-15, Dept. of Computer Science, UCSC.
Johnson, D. S. (1974). Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9:256–278.
Kariv, O., & Hakimi, S. L. (1979). An algorithmic approach to network location problems. II: The p-medians. SIAM Journal on Applied Mathematics, 539–560.
Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395.
Khachiyan, L. G. (1979). A polynomial algorithm in linear programming. Soviet Math. Doklady, 20:191–194.
Lin, J.-H., & Vitter, J. S. (1992a). ε-approximations with minimum packing constraint violation. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, 771–782, Victoria, BC, Canada.
Lin, J.-H., & Vitter, J. S. (1992b). Nearly optimal vector quantization via linear programming. In Proceedings of the IEEE Data Compression Conference, 22–31, Snowbird, Utah.
Lovász, L. (1975). On the ratio of optimal integral and fractional covers. Discrete Mathematics, 13:383–390.
Megiddo, N., & Supowit, K. J. (1984). On the complexity of some common geometric location problems. SIAM Journal on Computing, 13(1):182–196.
Miller, W. T. (1987). Sensor-based control of robotic manipulators using a general learning algorithm. IEEE Journal of Robotics and Automation, 3(2):157–165.
Miller, W. T., Glanz, F. H., & Kraft, L. G. (1987a). Application of a general learning algorithm to the control of robotic manipulators. International Journal of Robotics Research, 6(2):84–98.
Moody, J. (1989). Fast learning in multi-resolution hierarchies. In Advances in Neural Information Processing Systems I, 29–39. Morgan Kaufmann Publishers.
Moody, J., & Darken, C. (1988). Learning with localized receptive fields. In Proceedings of the 1988 Connectionist Models Summer School, 133–143. Morgan Kaufmann Publishers.
Moore, A. W. (1989). Acquisition of Dynamic Control Knowledge for Robotic Manipulator. Manuscript.
Papadimitriou, C. H. (1981). Worst-case and probabilistic analysis of a geometric location problem. SIAM Journal on Computing, 10:542–557.
Poggio, T., & Girosi, F. (1989). A theory of networks for approximation and learning. A. I. Memo No. 1140, MIT Artificial Intelligence Laboratory, Boston, MA.
Poggio, T., & Girosi, F. (1990). Extensions of a theory of networks for approximation and learning: Dimensionality reduction and clustering. A. I. Memo No. 1167, MIT Artificial Intelligence Laboratory, Boston, MA.
Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag New York Inc.
Pollard, D. (1990). Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Volume 2.
Ramakrishna, M. V., & Awasthi, V. (1991). A Survey of Perfect Hashing. Manuscript.
Riskin, E. A. (1990). Variable Rate Vector Quantization of Images. Ph.D. Dissertation, Stanford University.
Sauer, N. (1972). On the density of families of sets. Journal of Combinatorial Theory (A), 13:145–147.
Siegel, A. (1991). Coalesced Hashing is Computably Good. Manuscript.
Vapnik, V. N. (1982). Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York.
Vapnik, V. N., & Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 264–280.
Vitter, J. S., & Chen, W.-C. (1987). Design and Analysis of Coalesced Hashing. Oxford University Press.
Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 2(3):408–421.

Author information

Authors and Affiliations

Jyh-Han Lin: Motorola Inc., Applied Research/Communications Lab., Paging Products Group, Boynton Beach, FL 33426
Jeffrey Scott Vitter: Department of Computer Science, Duke University, Durham, NC 27708

Additional information

This research was done while the authors were at Brown University.

About this article

Cite this article

Lin, J.-H., Vitter, J.S. A theory for memory-based learning. Mach Learn 17, 143–167 (1994). https://doi.org/10.1007/BF00993469

Received: 16 November 1992
Accepted: 14 December 1993
Issue Date: November 1994
DOI: https://doi.org/10.1007/BF00993469
