Locally Weighted Learning

Atkeson, Christopher G.; Moore, Andrew W.; Schaal, Stefan

doi:10.1023/A:1006559212014

Published: February 1997

Volume 11, pages 11–73 (1997)

Artificial Intelligence Review

Abstract

This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
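
As an illustration of the technique the survey focuses on, the following minimal sketch (not taken from the paper) shows locally weighted linear regression for one-dimensional inputs. The Gaussian weighting function, the absolute-difference distance, the function names gaussian_kernel and lwr_predict, and the bandwidth and ridge parameters are all assumptions chosen for illustration; the survey discusses many alternatives for each.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Weighting function (assumed Gaussian): near 1 for close points, near 0 for far ones."""
    return math.exp(-((distance / bandwidth) ** 2))


def lwr_predict(xs, ys, query, bandwidth=1.0, ridge=1e-9):
    """Predict y at `query` by fitting a weighted straight line around the query.

    Nothing is fitted in advance: the stored data are consulted only when a
    query arrives, which is the "lazy" or memory-based aspect of the method.
    """
    # Center inputs on the query so the fitted intercept is the prediction.
    offsets = [x - query for x in xs]
    # Distance function: absolute difference; the bandwidth is the smoothing parameter.
    weights = [gaussian_kernel(abs(u), bandwidth) for u in offsets]

    # Weighted sums that make up the 2x2 normal equations.
    s_w = sum(weights)
    s_wu = sum(w * u for w, u in zip(weights, offsets))
    s_wuu = sum(w * u * u for w, u in zip(weights, offsets))
    s_wy = sum(w * y for w, y in zip(weights, ys))
    s_wuy = sum(w * u * y for w, u, y in zip(weights, offsets, ys))

    # A small ridge term regularizes the estimate when few points carry weight.
    a11 = s_w + ridge
    a22 = s_wuu + ridge
    det = a11 * a22 - s_wu * s_wu

    # Cramer's rule for the intercept of the local line.
    return (s_wy * a22 - s_wu * s_wuy) / det


# Usage: noise-free samples of y = x^2, queried at x = 2.
xs = [0.1 * i for i in range(41)]
ys = [x * x for x in xs]
print(lwr_predict(xs, ys, query=2.0, bandwidth=0.3))
```

A smaller bandwidth makes the fit more local, which lowers bias and raises sensitivity to noise; a larger bandwidth moves the prediction toward an ordinary global linear fit.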

References

AAAI-91 (1991). Ninth National Conference on Artificial Intelligence. AAAI Press/The MIT Press, Cambridge, MA.
Aha, D. W. (1989). Incremental, instance-based learning of independent and graded concept descriptions. In Sixth International Machine Learning Workshop, pp. 387–391. Morgan Kaufmann, San Mateo, CA.
Aha, D. W. (1990). A Study of Instance-Based Algorithms for Supervised Learning Tasks: Mathematical, Empirical, and Psychological Observations. PhD dissertation, University of California, Irvine, Department of Information and Computer Science.
Aha, D. W. (1991). Incremental constructive induction: An instance-based approach. In Eighth International Machine Learning Workshop, pp. 117–121. Morgan Kaufmann, San Mateo, CA.
Aha, D. W. & Goldstone, R. L. (1990). Learning attribute relevance in context in instance-based learning algorithms. In 12th Annual Conference of the Cognitive Science Society, pp. 141–148. Lawrence Erlbaum, Cambridge, MA.
Aha, D. W. & Goldstone, R. L. (1992). Concept learning and flexible weighting. In 14th Annual Conference of the Cognitive Science Society, pp. 534–539, Bloomington, IL. Lawrence Erlbaum Associates, Mahwah, NJ.
Aha, D. W. & Kibler, D. (1989). Noise-tolerant instance-based learning algorithms. In Eleventh International Joint Conference on Artificial Intelligence, pp. 794–799. Morgan Kaufmann, San Mateo, CA.
Aha, D. W. & McNulty, D. M. (1989). Learning relative attribute weights for instance-based concept descriptions. In 11th Annual Conference of the Cognitive Science Society, pp. 530–537. Lawrence Erlbaum Associates, Mahwah, NJ.
Aha, D. W. & Salzberg, S. L. (1993). Learning to catch: Applying nearest neighbor algorithms to dynamic control tasks. In Proceedings of the Fourth International Workshop on Artificial Intelligence and Statistics, pp. 363–368, Ft. Lauderdale, FL.
Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician 46(3): 175–185.
Atkeson, C. G. (1990). Using local models to control movement. In Touretzky, D. S., editor, Advances In Neural Information Processing Systems 2, pp. 316–323. Morgan Kaufmann, San Mateo, CA.
Atkeson, C. G. (1992). Memory-based approaches to approximating continuous functions. In Casdagli and Eubank (1992), pp. 503–521. Proceedings of a Workshop on Nonlinear Modeling and Forecasting September 17–21, 1990, Santa Fe, New Mexico.
Atkeson, C. G. (1996). Local learning. http://www.cc.gatech.edu/fac/Chris.Atkeson/local-learning/.
Atkeson, C. G., Moore, A. W. & Schaal, S. (1997). Locally weighted learning for control. Artificial Intelligence Review, this issue.
Atkeson, C. G. & Reinkensmeyer, D. J. (1988). Using associative content-addressable memories to control robots. In Proceedings of the 27th IEEE Conference on Decision and Control, volume 1, pp. 792–797, Austin, Texas. IEEE Cat. No.88CH2531–2.
Atkeson, C. G. & Reinkensmeyer, D. J. (1989). Using associative content-addressable memories to control robots. In Proceedings, IEEE International Conference on Robotics and Automation, Scottsdale, Arizona.
Atkeson, C. G. & Schaal, S. (1995). Memory-based neural networks for robot learning. Neurocomputing 9: 243–269.
Baird, L. C. & Klopf, A. H. (1993). Reinforcement learning with high-dimensional, continuous actions. Technical Report WL-TR–93–1147, Wright Laboratory, Wright-Patterson Air Force Base Ohio. http://kirk.usafa.af.mil/∼baird/papers/index.html.
Barnhill, R. E. (1977). Representation and approximation of surfaces. In Rice, J. R., editor, Mathematical Software III, pp. 69–120. Academic Press, New York, NY.
Batchelor, B. G. (1974). Practical Approach To Pattern Classification. Plenum Press, New York, NY.
Benedetti, J. K. (1977). On the nonparametric estimation of regression functions. Journal of the Royal Statistical Society, Series B 39: 248–253.
Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM 18(9): 509–517.
Bentley, J. L. & Friedman, J. H. (1979). Data structures for range searching. ACM Comput. Surv. 11(4): 397–409.
Bentley, J. L., Weide, B. & Yao, A. (1980). Optimal expected time algorithms for closest point problems. ACM Transactions on Mathematical Software 6: 563–580.
Blyth, S. (1993). Optimal kernel weights under a power criterion. Journal of the American Statistical Association 88(424): 1284–1286.
Bottou, L. & Vapnik, V. (1992). Local learning algorithms. Neural Computation 4(6): 888–900.
Bregler, C. & Omohundro, S. M. (1994). Surface learning with applications to lipreading. In Cowan et al. (1994), pp. 43–50.
Brockmann, M., Gasser, T. & Herrmann, E. (1993). Locally adaptive bandwidth choice for kernel regression estimators. Journal of the American Statistical Association, 88(424): 1302–1309.
Broder, A. J. (1990). Strategies for efficient incremental nearest neighbor search. Pattern Recognition 23: 171–178.
Callan, J. P., Fawcett, T. E. & Rissland, E. L. (1991). CABOT: An adaptive approach to case based search. In IJCAI 12 (1991), pp. 803–808.
Casdagli, M. & Eubank, S. (eds.) (1992). Nonlinear Modeling and Forecasting. Proceedings Volume XII in the Santa Fe Institute Studies in the Sciences of Complexity. Addison Wesley, New York, NY. Proceedings of a Workshop on Nonlinear Modeling and Forecasting September 17–21, 1990, Santa Fe, New Mexico.
Cheng, P. E. (1984). Strong consistency of nearest neighbor regression function estimators. Journal of Multivariate Analysis 15: 63–72.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74: 829–836.
Cleveland, W. S. (1993a). Coplots, nonparametric regression, and conditionally parametric fits. Technical Report 19, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. http://netlib.att.com/netlib/att/stat/doc/.
Cleveland, W. S. (1993b). Visualizing Data. Hobart Press, Summit, NJ. books@hobart.com.
Cleveland, W. S. & Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association 83: 596–610.
Cleveland, W. S., Devlin, S. J. & Grosse, E. (1988). Regression by local fitting: Methods, properties, and computational algorithms. Journal of Econometrics 37: 87–114.
Cleveland, W. S. & Grosse, E. (1991). Computational methods for local regression. Statistics and Computing 1(1): 47–62. ftp://cm.bell-labs.com/cm/cs/doc/91/4–04.ps.gz.
Cleveland, W. S., Grosse, E. & Shyu, W. M. (1992). Local regression models. In Chambers, J. M. & Hastie, T. J. (eds.), Statistical Models in S, pp. 309–376. Wadsworth, Pacific Grove, CA. http://netlib.att.com/netlib/a/cloess.ps.Z.
Cleveland, W. S. & Loader, C. (1994a). Computational methods for local regression. Technical Report 11, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. http://netlib.att.com/netlib/att/stat/doc/.
Cleveland, W. S. & Loader, C. (1994b). Local fitting for semiparametric (nonparametric) regression: Comments on a paper of Fan and Marron. Technical Report 8, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. http://netlib.att.com/netlib/att/stat/doc/, 94.8.ps, earlier version is 94.3.ps.
Cleveland, W. S. & Loader, C. (1994c). Smoothing by local regression: Principles and methods. Technical Report 95.3, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. http://netlib.att.com/netlib/att/stat/doc/.
Cleveland, W. S., Mallows, C. L. & McRae, J. E. (1993). ATS methods: Nonparametric regression for non-Gaussian data. Journal of the American Statistical Association 88(423): 821–835.
Connell, M. E. & Utgoff, P. E. (1987). Learning to control a dynamic physical system. In Sixth National Conference on Artificial Intelligence, pp. 456–460, Seattle, WA. Morgan Kaufmann, San Mateo, CA.
Cost, S. & Salzberg, S. (1993). A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 10(1): 57–78.
Coughran, Jr., W. M. & Grosse, E. (1991). Seeing and hearing dynamic loess surfaces. In Interface '91 Proceedings, pp. 224–228. Springer-Verlag. ftp://cm.bell-labs.com/cm/cs/doc/91/4–07.ps.gz or 4–07long.ps.gz.
Cowan, J. D., Tesauro, G. & Alspector, J. (eds.) (1994). Advances In Neural Information Processing Systems 6. Morgan Kaufmann, San Mateo, CA.
Crain, I. K. & Bhattacharyya, B. K. (1967). Treatment of nonequispaced two dimensional data with a digital computer. Geoexploration 5: 173–194.
Deheuvels, P. (1977). Estimation non-paramétrique de la densité par histogrammes généralisés. Revue de Statistique Appliquée 25: 5–42.
Deng, K. & Moore, A. W. (1995). Multiresolution instance-based learning. In Fourteenth International Joint Conference on Artificial Intelligence, pp. 1233–1239. Morgan Kaufmann, San Mateo, CA.
Dennis, J. E., Gay, D. M. & Welsch, R. E. (1981). An adaptive nonlinear least-squares algorithm. ACM Transactions on Mathematical Software 7(3): 369–383.
Devroye, L. (1981). On the almost everywhere convergence of nonparametric regression function estimates. The Annals of Statistics 9(6): 1310–1319.
Diebold, F. X. & Nason, J. A. (1990). Nonparametric exchange rate prediction? Journal of International Economics 28: 315–332.
Dietterich, T. G., Wettschereck, D., Atkeson, C. G. & Moore, A. W. (1994). Memory-based methods for regression and classification. In Cowan et al. (1994), pp. 1165–1166.
Draper, N. R. & Smith, H. (1981). Applied Regression Analysis. John Wiley, New York, NY, 2nd edition.
Elliot, T. & Scott, P. D. (1991). Instance-based and generalization-based learning procedures applied to solving integration problems. In Proceedings of the Eighth Conference of the Society for the Study of Artificial Intelligence, pp. 256–265, Leeds, England. Springer Verlag.
Epanechnikov, V. A. (1969). Nonparametric estimation of a multivariate probability density. Theory of Probability and Its Applications 14: 153–158.
Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York, NY.
Falconer, K. J. (1971). A general purpose algorithm for contouring over scattered data points. Technical Report NAC 6, National Physical Laboratory, Teddington, Middlesex, United Kingdom, TW11 0LW.
Fan, J. (1992). Design-adaptive nonparametric regression. Journal of the American Statistical Association 87(420): 998–1004.
Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Annals of Statistics 21: 196–216.
Fan, J. (1995). Local modeling. EES Update: written for the Encyclopedia of Statistics Science, http://www.stat.unc.edu/faculty/fan/papers.html.
Fan, J. & Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. The Annals of Statistics 20(4): 2008–2036.
Fan, J. & Gijbels, I. (1994). Censored regression: Local linear approximations and their applications. Journal of the American Statistical Association 89: 560–570.
Fan, J. & Gijbels, I. (1995a). Adaptive order polynomial fitting: Bandwidth robustification and bias reduction. J. Comp. Graph. Statist. 4: 213–227.
Fan, J. & Gijbels, I. (1995b). Data-driven bandwidth selection in local polynomial fitting: Variable bandwidth and spatial adaptation. Journal of the Royal Statistical Society B 57: 371–394.
Fan, J. & Gijbels, I. (1996). Local Polynomial Modeling and its Applications. Chapman and Hall, London.
Fan, J. & Hall, P. (1994). On curve estimation by minimizing mean absolute deviation and its implications. The Annals of Statistics 22(2): 867–885.
Fan, J. & Kreutzberger, E. (1995). Automatic local smoothing for spectral density estimation. ftp://stat.unc.edu/pub/fan/spec.ps.
Fan, J. & Marron, J. S. (1993). Comment on [Hastie and Loader, 1993]. Statistical Science 8(2): 129–134.
Fan, J. & Marron, J. S. (1994a). Fast implementations of nonparametric curve estimators. Journal of Computational and Graphical Statistics 3: 35–56.
Fan, J. & Marron, J. S. (1994b). Rejoinder to discussion of Cleveland and Loader.
Farmer, J. D. & Sidorowich, J. J. (1987). Predicting chaotic time series. Physical Review Letters 59(8): 845–848.
Farmer, J. D. & Sidorowich, J. J. (1988a). Exploiting chaos to predict the future and reduce noise. In Lee, Y. C. (ed.), Evolution, Learning, and Cognition, pp. 277 ff. World Scientific Press, NJ. Also available as Technical Report LA-UR–88–901, Los Alamos National Laboratory, Los Alamos, New Mexico.
Farmer, J. D. & Sidorowich, J. J. (1988b). Predicting chaotic dynamics. In Kelso, J. A. S., Mandell, A. J. & Schlesinger, M. F. (eds.), Dynamic Patterns in Complex Systems, pp. 265–292. World Scientific, NJ.
Farwig, R. (1987). Multivariate interpolation of scattered data by moving least squares methods. In Mason, J. C. & Cox, M. G. (eds.), Algorithms for Approximation, pp. 193–211. Clarendon Press, Oxford.
Fedorov, V. V., Hackl, P. & Müller, W. G. (1993). Moving local regression: The weight function. Nonparametric Statistics 2(4): 355–368.
Franke, R. & Nielson, G. (1980). Smooth interpolation of large sets of scattered data. International Journal for Numerical Methods in Engineering 15: 1691–1704.
Friedman, J. H. (1984). A variable span smoother. Technical Report LCS 5, Stanford University, Statistics Department, Stanford, CA.
Friedman, J. H. (1994). Flexible metric nearest neighbor classification. http://playfair.stanford.edu/reports/friedman/.
Friedman, J. H., Bentley, J. L. & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software 3(3): 209–226.
Fritzke, B. (1995). Incremental learning of local linear mappings. In Proceedings of the International Conference on Artificial Neural Networks ICANN '95, pp. 217–222, Paris, France.
Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. Academic Press, New York, NY, second edition.
Gasser, T. & Müller, H. G. (1979). Kernel estimation of regression functions. In Gasser, T. & Rosenblatt, M. (eds.), Smoothing Techniques for Curve Estimation, number 757 in Lecture Notes in Mathematics, pp. 23–67. Springer-Verlag, Heidelberg.
Gasser, T. & Müller, H. G. (1984). Estimating regression functions and their derivatives by the kernel method. Scandinavian Journal of Statistics 11: 171–185.
Gasser, T., Müller, H. G. & Mammitzsch, V. (1985). Kernels for nonparametric regression. Journal of the Royal Statistical Society, Series B 47: 238–252.
Ge, Z., Cavinato, A. G. & Callis, J. B. (1994). Noninvasive spectroscopy for monitoring cell density in a fermentation process. Analytical Chemistry 66: 1354–1362.
Goldberg, K. Y. & Pearlmutter, B. (1988). Using a neural network to learn the dynamics of the CMU Direct-Drive Arm II. Technical Report CMU-CS–88–160, Carnegie-Mellon University, Pittsburgh, PA.
Gorinevsky, D. & Connolly, T. H. (1994). Comparison of some neural network and scattered data approximations: The inverse manipulator kinematics example. Neural Computation 6: 521–542.
Goshtasby, A. (1988). Image registration by local approximation methods. Image and Vision Computing 6(4): 255–261.
Grosse, E. (1989). LOESS: Multivariate smoothing by moving least squares. In Chui, C. K., Schumaker, L. L. & Ward, J. D. (eds.), Approximation Theory VI, pp. 1–4. Academic Press, Boston, MA.
Hammond, S. V. (1991). NIR analysis of antibiotic fermentations. In Murray, I. & Cowe, I. A. (eds.), Making Light Work: Advances in Near Infrared Spectroscopy, pp. 584–589. VCH: New York, NY. Developed from the 4th International Conference on Near Infrared Spectroscopy, Aberdeen, Scotland, August 19–23, 1991.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. & Stahel, W. A. (1986). Robust Statistics: The Approach Based On Influence Functions. John Wiley, New York, NY.
Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, New York, NY.
Hastie, T. & Loader, C. (1993). Local regression: Automatic kernel carpentry. Statistical Science 8(2): 120–143.
Hastie, T. J. & Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London.
Hastie, T. J. & Tibshirani, R. J. (1994). Discriminant adaptive nearest neighbor classification. ftp://playfair.Stanford.EDU/pub/hastie/dann.ps.Z.
Higuchi, T., Kitano, H., Furuya, T., Handa, K., Takahashi, N. & Kokubu, A. (1991). IXM2: A parallel associative processor for knowledge processing. In AAAI-91 (1991), pp. 296–303.
Hillis, D. (1985). The Connection Machine. MIT Press, Cambridge, MA.
Huang, P. S. (1996). Planning For Dynamic Motions Using A Search Tree. MS thesis, University of Toronto, Graduate Department of Computer Science. http://www.dgp.utoronto.ca/people/psh/home.html.
IJCAI 12 (1991). Twelfth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA.
IJCAI 13 (1993). Thirteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA.
Jabbour, K., Riveros, J. F. W., Landsbergen, D. & Meyer, W. (1987). ALFA: Automated load forecasting assistant. In Proceedings of the 1987 IEEE Power Engineering Society Summer Meeting, San Francisco, CA.
James, M. (1985). Classification Algorithms. John Wiley and Sons, New York, NY.
Jones, M. C., Davies, S. J. & Park, B. U. (1994). Versions of kernel-type regression estimators. Journal of the American Statistical Association 89(427): 825–832.
Karalič, A. (1992). Employing linear regression in regression tree leaves. In Neumann, B. (ed.), ECAI 92: 10th European Conference on Artificial Intelligence, pp. 440–441, Vienna, Austria. John Wiley and Sons.
Katkovnik, V. Y. (1979). Linear and nonlinear methods of nonparametric regression analysis. Soviet Automatic Control 5: 25–34.
Kazmierczak, H. & Steinbuch, K. (1963). Adaptive systems in pattern recognition. IEEE Transactions on Electronic Computers EC-12: 822–835.
Kibler, D., Aha, D. W. & Albert, M. (1989). Instance-based prediction of real-valued attributes. Computational Intelligence 5: 51–57.
Kitano, H. (1993a). Challenges of massive parallelism. In IJCAI 13 (1993), pp. 813–834.
Kitano, H. (1993b). A comprehensive and practical model of memory-based machine translation. In IJCAI 13 (1993), pp. 1276–1282.
Kitano, H. & Higuchi, T. (1991a). High performance memory-based translation on IXM2 massively parallel associative memory processor. In AAAI-91 (1991), pp. 149–154.
Kitano, H. & Higuchi, T. (1991b). Massively parallel memory-based parsing. In IJCAI 12 (1991), pp. 918–924.
Kitano, H., Moldovan, D. & Cha, S. (1991). High performance natural language processing on semantic network array processor. In IJCAI 12 (1991), pp. 911–917.
Kozek, A. S. (1992). A new nonparametric estimation method: Local and nonlinear. Interface 24: 389–393.
Lancaster, P. (1979). Moving weighted least-squares methods. In Sahney, B. N. (ed.), Polynomial and Spline Approximation, pp. 103–120. D. Reidel Publishing, Boston, MA.
Lancaster, P. & Šalkauskas, K. (1981). Surfaces generated by moving least squares methods. Mathematics of Computation 37(155): 141–158.
Lancaster, P. & Šalkauskas, K. (1986). Curve And Surface Fitting. Academic Press, New York, NY.
Lawrence, S., Tsoi, A. C. & Black, A. D. (1996). Function approximation with neural networks and local methods: Bias, variance and smoothness. In Australian Conference on Neural Networks, Canberra, Australia. Available from http://www.neci.nj.nec.com/homepages/lawrence and http://www.elec.uq.edu.au/∼lawrence.
LeBaron, B. (1990). Forecast improvements using a volatility index. Unpublished.
LeBaron, B. (1992). Nonlinear forecasts for the S&P stock index. In Casdagli and Eubank (1992), pp. 381–393. Proceedings of a Workshop on Nonlinear Modeling and Forecasting September 17–21, 1990, Santa Fe, New Mexico.
Legg, M. P. C. & Brent, R. P. (1969). Automatic contouring. In 4th Australian Computer Conference, pp. 467–468.
Lejeune, M. (1984). Optimization in non-parametric regression. In COMPSTAT 1984: Proceedings in Computational Statistics, pp. 421–426, Prague. Physica-Verlag Wien.
Lejeune, M. (1985). Estimation non-paramétrique par noyaux: Régression polynomiale mobile. Revue de Statistique Appliquée 23(3): 43–67.
Lejeune, M. & Sarda, P. (1992). Smooth estimators of distribution and density functions. Computational Statistics & Data Analysis 14: 457–471.
Li, K. C. (1984). Consistency for cross-validated nearest neighbor estimates in nonparametric regression. The Annals of Statistics 12: 230–240.
Loader, C. (1994). Computing nonparametric function estimates. Technical Report 7, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. Available by anonymous FTP from netlib.att.com in /netlib/att/stat/doc/94/7.ps.
Lodwick, G. D. & Whittle, J. (1970). A technique for automatic contouring field survey data. Australian Computer Journal 2: 104–109.
Lowe, D. G. (1995). Similarity metric learning for a variable-kernel classifier. Neural Computation 7: 72–85.
Maron, O. & Moore, A. W. (1997). The racing algorithm: Model selection for lazy learners. Artificial Intelligence Review, this issue.
Marron, J. S. (1988). Automatic smoothing parameter selection: A survey. Empirical Economics 13: 187–208.
McCallum, R. A. (1995). Instance-based utile distinctions for reinforcement learning with hidden state. In Prieditis & Russell (eds.) (1995), pp. 387–395.
McIntyre, D. B., Pollard, D. D. & Smith, R. (1968). Computer programs for automatic contouring. Technical Report Kansas Geological Survey Computer Contributions 23, University of Kansas, Lawrence, KS.
McLain, D. H. (1974). Drawing contours from arbitrary data points. The Computer Journal 17(4): 318–324.
Medin, D. L. & Shoben, E. J. (1988). Context and structure in conceptual combination. Cognitive Psychology 20: 158–190.
Meese, R. & Wallace, N. (1991). Nonparametric estimation of dynamic hedonic price models and the construction of residential housing price indices. American Real Estate and Urban Economics Association Journal 19(3): 308–332.
Meese, R. A. & Rose, A. K. (1990). Nonlinear, nonparametric, nonessential exchange rate estimation. The American Economic Review May: 192–196.
Miller, A. J. (1990). Subset Selection in Regression. Chapman and Hall, London.
Miller, W. T., Glanz, F. H. & Kraft, L. G. (1987). Application of a general learning algorithm to the control of robotic manipulators. International Journal of Robotics Research 6: 84–98.
Mohri, T. & Tanaka, H. (1994). An optimal weighting criterion of case indexing for both numeric and symbolic attributes. In Aha, D. W. (ed.), AAAI-94 Workshop Program: Case-Based Reasoning, Working Notes, pp. 123–127. AAAI Press, Seattle, WA.
Moore, A. W. (1990a). Acquisition of Dynamic Control Knowledge for a Robotic Manipulator. In Seventh International Machine Learning Workshop. Morgan Kaufmann, San Mateo, CA.
Moore, A. W. (1990b). Efficient Memory-based Learning for Robot Control. PhD thesis; Technical Report No. 209, Computer Laboratory, University of Cambridge.
Moore, A. W., Hill, D. J. & Johnson, M. P. (1992). An empirical investigation of brute force to choose features, smoothers, and function approximators. In Hanson, S., Judd, S. & Petsche, T. (eds.), Computational Learning Theory and Natural Learning Systems, volume 3. MIT Press, Cambridge, MA.
Moore, A. W. & Schneider, J. (1995). Memory-based stochastic optimization. To appear in the proceedings of NIPS-95, Also available as Technical Report CMU-RI-TR–95–30, ftp://ftp.cs.cmu.edu/afs/cs.cmu.edu/project/reinforcement/papers/memstoch.ps.
More, J. J., Garbow, B. S. & Hillstrom, K. E. (1980). User guide for MINPACK-1. Technical Report ANL–80–74, Argonne National Laboratory, Argonne, Illinois.
Müller, H.-G. (1987). Weighted local regression and kernel methods for nonparametric curve fitting. Journal of the American Statistical Association 82: 231–238.
Müller, H.-G. (1993). Comment on [Hastie and Loader, 1993]. Statistical Science 8(2): 134–139.
Murphy, O. J. & Selkow, S. M. (1986). The efficiency of using k-d trees for finding nearest neighbors in discrete space. Information Processing Letters 23: 215–218.
Myers, R. H. (1990). Classical and Modern Regression With Applications. PWS-KENT, Boston, MA.
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and Its Applications 9: 141–142.
Næs, T. & Isaksson, T. (1992). Locally weighted regression in diffuse near-infrared transmittance spectroscopy. Applied Spectroscopy 46(1): 34–43.
Næs, T., Isaksson, T. & Kowalski, B. R. (1990). Locally weighted regression and scatter correction for near-infrared reflectance data. Analytical Chemistry 62(7): 664–673.
Nguyen, T., Czerwinski, M. & Lee, D. (1993). COMPAQ Quicksource: Providing the consumer with the power of artificial intelligence. In Proceedings of the Fifth Annual Conference on Innovative Applications of Artificial Intelligence, pp. 142–150, Washington, DC. AAAI Press.
Nosofsky, R. M., Clark, S. E. & Shin, H. J. (1989). Rules and exemplars in categorization, identification, and recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition 15: 282–304.
Omohundro, S. M. (1987). Efficient Algorithms with Neural Network Behaviour. Journal of Complex Systems 1(2): 273–347.
Omohundro, S. M. (1991). Bumptrees for Efficient Function, Constraint, and Classification Learning. In Lippmann, R. P., Moody, J. E. & Touretzky, D. S. (eds.), Advances in Neural Information Processing Systems 3. Morgan Kaufmann.
Palmer, J. A. B. (1969). Automatic mapping. In 4th Australian Computer Conference, pp. 463–466.
Pelto, C. R., Elkins, T. A. & Boyd, H. A. (1968). Automatic contouring of irregularly spaced data. Geophysics 33: 424–430.
Peng, J. (1995). Efficient memory-based dynamic programming. In Prieditis & Russell (eds.) (1995), pp. 438–446.
Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. (1988). Numerical Recipes in C. Cambridge University Press, New York, NY.
Prieditis, A. & Russell, S. (eds.) (1995). Twelfth International Conference on Machine Learning. Morgan Kaufmann, San Mateo, CA.
Rachlin, J., Kasif, S., Salzberg, S. & Aha, D. W. (1994). Towards a better understanding of memory-based reasoning systems. In Eleventh International Conference on Machine Learning, pp. 242–250. Morgan Kaufmann, San Mateo, CA.
Racine, J. (1993). An efficient cross-validation algorithm for window width selection for non-parametric kernel regression. Communications in Statistics: Simulation and Computation 22(4): 1107–1114.
Ramasubramanian, V. & Paliwal, K. K. (1989). A generalized optimization of the k-d tree for fast nearest-neighbour search. In International Conference on Acoustics, Speech, and Signal Processing.
Raz, J., Turetsky, B. I. & Fein, G. (1989). Selecting the smoothing parameter for estimation of smoothly changing evoked potential signals. Biometrics 45: 851–871.
Renka, R. J. (1988). Multivariate interpolation of large sets of scattered data. ACM Transactions on Mathematical Software 14(2): 139–152.
Ruppert, D. & Wand, M. P. (1994). Multivariate locally weighted least squares regression. The Annals of Statistics 22(3): 1346–1370.
Ruprecht, D. & Müller, H. (1992). Image warping with scattered data interpolation methods. Technical Report 443, Universität Dortmund, Fachbereich Informatik, D-44221 Dortmund, Germany. Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/ls7/rr-443.ps.Z.
Ruprecht, D. & Müller, H. (1993). Free form deformation with scattered data interpolation methods. In Farin, G., Hagen, H. & Noltemeier, H. (eds.), Geometric Modelling (Computing Suppl. 8), pp. 267–281. Springer Verlag. Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/iif/rr-41.ps.Z.
Ruprecht, D. & Müller, H. (1994a). Deformed cross-dissolves for image interpolation in scientific visualization. The Journal of Visualization and Computer Animation 5(3): 167–181. Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/ls7/rr-491.ps.Z.
Ruprecht, D. & Müller, H. (1994b). A framework for generalized scattered data interpolation. Technical Report 517, Universität Dortmund, Fachbereich Informatik, D-44221 Dortmund, Germany. Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/ls7/rr-517.ps.Z.
Ruprecht, D., Nagel, R. & Müller, H. (1994). Spatial free form deformation with scattered data interpolation methods. Technical Report 539, Fachbereich Informatik der Universität Dortmund, 44221 Dortmund, Germany. Accepted for publication by Computers & Graphics, Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/ls7/rr-539.ps.Z.
Rust, R. T. & Bornman, E. O. (1982). Distribution-free methods of approximating nonlinear marketing relationships. Journal of Marketing Research XIX: 372–374.
Sabin, M. A. (1980). Contouring — a review of methods for scattered data. In Brodlie, K. (ed.), Mathematical Methods in Computer Graphics and Design, pp. 63–86. Academic Press, New York, NY.
Saitta, L. (ed.) (1996). Thirteenth International Conference on Machine Learning. Morgan Kaufmann, San Mateo, CA.
Samet, H. (1990). The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA.
Schaal, S. & Atkeson, C. G. (1994). Assessing the quality of learned local models. In Cowan et al. (1994), pp. 160–167.
Schaal, S. & Atkeson, C. G. (1995). From isolation to cooperation: An alternative view of a system of experts. NIPS95 proceedings, in press.
Scott, D. W. (1992). Multivariate Density Estimation. Wiley, New York, NY.
Seber, G. A. F. (1977). Linear Regression Analysis. John Wiley, New York, NY.
Seifert, B., Brockmann, M., Engel, J. & Gasser, T. (1994). Fast algorithms for nonparametric curve estimation. Journal of Computational and Graphical Statistics 3(2): 192–213.
Seifert, B. & Gasser, T. (1994). Variance properties of local polynomials. http://www.unizh.ch/biostat/manuscripts.html.
Shepard, D. (1968). A two-dimensional function for irregularly spaced data. In 23rd ACM National Conference, pp. 517–524.
Solow, A. R. (1988). Detecting changes through time in the variance of a long-term hemispheric temperature record: An application of robust locally weighted regression. Journal of Climate 1: 290–296.
Specht, D. F. (1991). A general regression neural network. IEEE Transactions on Neural Networks 2(6): 568–576.
Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-d trees. Algorithmica 6: 579–589.
Stanfill, C. (1987). Memory-based reasoning applied to English pronunciation. In Sixth National Conference on Artificial Intelligence, pp. 577–581.
Stanfill, C. & Waltz, D. (1986). Toward memory-based reasoning. Communications of the ACM 29(12): 1213–1228.
Steinbuch, K. (1961). Die Lernmatrix. Kybernetik 1: 36–45.
Steinbuch, K. & Piske, U. A. W. (1963). Learning matrices and their applications. IEEE Transactions on Electronic Computers EC-12: 846–862.
Stone, C. J. (1975). Nearest neighbor estimators of a nonlinear regression function. In Computer Science and Statistics: 8th Annual Symposium on the Interface, pp. 413–418.
Stone, C. J. (1977). Consistent nonparametric regression. The Annals of Statistics 5: 595–645.
Stone, C. J. (1980). Optimal rates of convergence for nonparametric estimators. The Annals of Statistics 8: 1348–1360.
Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. The Annals of Statistics 10(4): 1040–1053.
Sumita, E., Oi, K., Furuse, O., Iida, H., Higuchi, T., Takahashi, N. & Kitano, H. (1993). Example-based machine translation on massively parallel processors. In IJCAI 13 (1993), pp. 1283–1288.
Tadepalli, P. & Ok, D. (1996). Scaling up average reward reinforcement learning by approximating the domain models and the value function. In Saitta (1996). http://www.cs.orst.edu:80/~tadepall/research/publications.html.
Tamada, T., Maruyama, M., Nakamura, Y., Abe, S. & Maeda, K. (1993). Water demand forecasting by memory based learning. Water Science and Technology 28(11–12): 133–140.
Taylor, W. K. (1959). Pattern recognition by means of automatic analogue apparatus. Proceedings of The Institution of Electrical Engineers 106B: 198–209.
Taylor, W. K. (1960). A parallel analogue reading machine. Control 3: 95–99.
Thorpe, S. (1995). Localized versus distributed representations. In Arbib, M. A. (ed.), The Handbook of Brain Theory and Neural Networks, pp. 549–552. The MIT Press, Cambridge, MA.
Thrun, S. (1996). Is learning the n-th thing any easier than learning the first? In Advances in Neural Information Processing Systems (NIPS) 8. http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/thrun/publications.html.
Thrun, S. & O'Sullivan, J. (1996). Discovering structure in multiple learning tasks: The TC algorithm. In Saitta (1996). http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/thrun/publications.html.
Tibshirani, R. & Hastie, T. (1987). Local likelihood estimation. Journal of the American Statistical Association 82: 559–567.
Ting, K. M. & Cameron-Jones, R. M. (1994). Exploring a framework for instance based learning and naive Bayesian classifiers. In Proceedings of the Seventh Australian Joint Conference on Artificial Intelligence, Armidale, Australia. World Scientific.
Tou, J. T. & Gonzalez, R. C. (1974). Pattern Recognition Principles. Addison-Wesley, Reading, MA.
Townshend, B. (1992). Nonlinear prediction of speech signals. In Casdagli and Eubank (1992), pp. 433–453. Proceedings of a Workshop on Nonlinear Modeling and Forecasting September 17–21, 1990, Santa Fe, New Mexico.
Tsybakov, A. B. (1986). Robust reconstruction of functions by the local approximation method. Problems of Information Transmission 22: 133–146.
Tukey, J. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
Turetsky, B. I., Raz, J. & Fein, G. (1989). Estimation of trial-to-trial variation in evoked potential signals by smoothing across trials. Psychophysiology 26(6): 700–712.
Turlach, B. A. & Wand, M. P. (1995). Fast computation of auxiliary quantities in local polynomial regression. http://netec.wustl.edu/~adnetec/WoPEc/agsmst/agsmst95009.html.
van der Smagt, P., Groen, F. & van het Groenewoud, F. (1994). The locally linear nested network for robot manipulation. In Proceedings of the IEEE International Conference on Neural Networks, pp. 2787–2792. ftp://ftp.fwi.uva.nl/pub/computer-systems/aut-sys/reports/SmaGroGro94b.ps.gz.
Vapnik, V. (1992). Principles of risk minimization for learning theory. In Moody, J. E., Hanson, S. J. & Lippmann, R. P. (eds.), Advances In Neural Information Processing Systems 4, pp. 831–838. Morgan Kaufmann, San Mateo, CA.
Vapnik, V. & Bottou, L. (1993). Local algorithms for pattern recognition and dependencies estimation. Neural Computation 5(6): 893–909.
Walden, A. T. & Prescott, P. (1983). Identification of trends in annual maximum sea levels using robust locally weighted regression. Estuarine, Coastal and Shelf Science 16: 17–26.
Walters, R. F. (1969). Contouring by machine: A user's guide. American Association of Petroleum Geologists Bulletin 53(11): 2324–2340.
Waltz, D. L. (1987). Applications of the Connection Machine. Computer 20(1): 85–97.
Wand, M. P. & Jones, M. C. (1993). Comparison of smoothing parameterizations in bivariate kernel density estimation. Journal of the American Statistical Association 88: 520–528.
Wand, M. P. & Jones, M. C. (1994). Kernel Smoothing. Chapman and Hall, London.
Wand, M. P. & Schucany, W. R. (1990). Gaussian-based kernels for curve estimation and window width selection. Canadian Journal of Statistics 18: 197–204.
Wang, Z., Isaksson, T. & Kowalski, B. R. (1994). New approach for distance measurement in locally weighted regression. Analytical Chemistry 66(2): 249–260.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26: 359–372.
Weisberg, S. (1985). Applied Linear Regression. John Wiley and Sons.
Wess, S., Althoff, K.-D. & Derwand, G. (1994). Using k-d trees to improve the retrieval step in case-based reasoning. In Wess, S., Althoff, K.-D. & Richter, M. M. (eds.), Topics in Case-Based Reasoning, pp. 167–181. Springer-Verlag, New York, NY. Proceedings of the First European Workshop, EWCBR-93.
Wettschereck, D. (1994). A Study Of Distance-Based Machine Learning Algorithms. PhD dissertation, Oregon State University, Department of Computer Science, Corvallis, OR.
Wijnberg, L. & Johnson, T. (1985). Estimation of missing values in lead air quality data sets. In Johnson, T. R. & Penkala, S. J. (eds.), Quality Assurance in Air Pollution Measurements. Air Pollution Control Association, Pittsburgh, PA. TR-3: Transactions: An APCA International Specialty Conference.
Wolberg, G. (1990). Digital Image Warping. IEEE Computer Society Press, Los Alamitos, CA.
Yasunaga, M. & Kitano, H. (1993). Robustness of the memory-based reasoning implemented by wafer scale integration. IEICE Transactions on Information and Systems E76-D(3): 336–344.
Zografski, Z. (1989). Neuromorphic, Algorithmic, and Logical Models for the Automatic Synthesis of Robot Action. PhD dissertation, University of Ljubljana, Ljubljana, Slovenia, Yugoslavia.
Zografski, Z. (1991). New methods of machine learning for the construction of integrated neuromorphic and associative-memory knowledge bases. In Zajc, B. & Solina, F. (eds.), Proceedings, 6th Mediterranean Electrotechnical Conference, volume II, pp. 1150–1153, Ljubljana, Slovenia, Yugoslavia. IEEE catalog number 91CH2964–5.
Zografski, Z. (1992). Geometric and neuromorphic learning for nonlinear modeling, control and forecasting. In Proceedings of the 1992 IEEE International Symposium on Intelligent Control, pp. 158–163, Glasgow, Scotland. IEEE catalog number 92CH3110–4.
Zografski, Z. & Durrani, T. (1995). Comparing predictions from neural networks and memory-based learning. In Proceedings, ICANN '95/NEURONIMES '95: International Conference on Artificial Neural Networks, pp. 221–226, Paris, France.

Author information

Authors and Affiliations

College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA, 30332-0280
Christopher G. Atkeson & Stefan Schaal
ATR Human Information Processing Research Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-02, Japan
Christopher G. Atkeson & Stefan Schaal
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213
Andrew W. Moore

About this article

Cite this article

Atkeson, C.G., Moore, A.W. & Schaal, S. Locally Weighted Learning. Artificial Intelligence Review 11, 11–73 (1997). https://doi.org/10.1023/A:1006559212014

Issue Date: February 1997
DOI: https://doi.org/10.1023/A:1006559212014
