Abstract
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
References
AAAI-91 (1991). Ninth National Conference on Artificial Intelligence. AAAI Press/The MIT Press, Cambridge, MA.
Aha, D. W. (1989). Incremental, instance-based learning of independent and graded concept descriptions. In Sixth International Machine Learning Workshop, pp. 387–391. Morgan Kaufmann, San Mateo, CA.
Aha, D. W. (1990). A Study of Instance-Based Algorithms for Supervised Learning Tasks: Mathematical, Empirical, and Psychological Observations. PhD dissertation, University of California, Irvine, Department of Information and Computer Science.
Aha, D. W. (1991). Incremental constructive induction: An instance-based approach. In Eighth International Machine Learning Workshop, pp. 117–121. Morgan Kaufmann, San Mateo, CA.
Aha, D. W. & Goldstone, R. L. (1990). Learning attribute relevance in context in instance-based learning algorithms. In 12th Annual Conference of the Cognitive Science Society, pp. 141–148. Lawrence Erlbaum, Cambridge, MA.
Aha, D. W. & Goldstone, R. L. (1992). Concept learning and flexible weighting. In 14th Annual Conference of the Cognitive Science Society, pp. 534–539, Bloomington, IL. Lawrence Erlbaum Associates, Mahwah, NJ.
Aha, D. W. & Kibler, D. (1989). Noise-tolerant instance-based learning algorithms. In Eleventh International Joint Conference on Artificial Intelligence, pp. 794–799. Morgan Kaufmann, San Mateo, CA.
Aha, D. W. & McNulty, D. M. (1989). Learning relative attribute weights for instance-based concept descriptions. In 11th Annual Conference of the Cognitive Science Society, pp. 530–537. Lawrence Erlbaum Associates, Mahwah, NJ.
Aha, D. W. & Salzberg, S. L. (1993). Learning to catch: Applying nearest neighbor algorithms to dynamic control tasks. In Proceedings of the Fourth International Workshop on Artificial Intelligence and Statistics, pp. 363–368, Ft. Lauderdale, FL.
Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician 46(3): 175–185.
Atkeson, C. G. (1990). Using local models to control movement. In Touretzky, D. S., editor, Advances In Neural Information Processing Systems 2, pp. 316–323. Morgan Kaufmann, San Mateo, CA.
Atkeson, C. G. (1992). Memory-based approaches to approximating continuous functions. In Casdagli and Eubank (1992), pp. 503–521. Proceedings of a Workshop on Nonlinear Modeling and Forecasting September 17–21, 1990, Santa Fe, New Mexico.
Atkeson, C. G. (1996). Local learning. http://www.cc.gatech.edu/fac/Chris.Atkeson/local-learning/.
Atkeson, C. G., Moore, A. W. & Schaal, S. (1997). Locally weighted learning for control. Artificial Intelligence Review, this issue.
Atkeson, C. G. & Reinkensmeyer, D. J. (1988). Using associative content-addressable memories to control robots. In Proceedings of the 27th IEEE Conference on Decision and Control, volume 1, pp. 792–797, Austin, Texas. IEEE Cat. No.88CH2531–2.
Atkeson, C. G. & Reinkensmeyer, D. J. (1989). Using associative content-addressable memories to control robots. In Proceedings, IEEE International Conference on Robotics and Automation, Scottsdale, Arizona.
Atkeson, C. G. & Schaal, S. (1995). Memory-based neural networks for robot learning. Neurocomputing 9: 243–269.
Baird, L. C. & Klopf, A. H. (1993). Reinforcement learning with high-dimensional, continuous actions. Technical Report WL-TR–93–1147, Wright Laboratory, Wright-Patterson Air Force Base, Ohio. http://kirk.usafa.af.mil/∼baird/papers/index.html.
Barnhill, R. E. (1977). Representation and approximation of surfaces. In Rice, J. R., editor, Mathematical Software III, pp. 69–120. Academic Press, New York, NY.
Batchelor, B. G. (1974). Practical Approach To Pattern Classification. Plenum Press, New York, NY.
Benedetti, J. K. (1977). On the nonparametric estimation of regression functions. Journal of the Royal Statistical Society, Series B 39: 248–253.
Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM 18(9): 509–517.
Bentley, J. L. & Friedman, J. H. (1979). Data structures for range searching. ACM Comput. Surv. 11(4): 397–409.
Bentley, J. L., Weide, B. & Yao, A. (1980). Optimal expected time algorithms for closest point problems. ACM Transactions on Mathematical Software 6: 563–580.
Blyth, S. (1993). Optimal kernel weights under a power criterion. Journal of the American Statistical Association 88(424): 1284–1286.
Bottou, L. & Vapnik, V. (1992). Local learning algorithms. Neural Computation 4(6): 888–900.
Bregler, C. & Omohundro, S. M. (1994). Surface learning with applications to lipreading. In Cowan et al. (1994), pp. 43–50.
Brockmann, M., Gasser, T. & Herrmann, E. (1993). Locally adaptive bandwidth choice for kernel regression estimators. Journal of the American Statistical Association 88(424): 1302–1309.
Broder, A. J. (1990). Strategies for efficient incremental nearest neighbor search. Pattern Recognition 23: 171–178.
Callan, J. P., Fawcett, T. E. & Rissland, E. L. (1991). CABOT: An adaptive approach to case based search. In IJCAI 12 (1991), pp. 803–808.
Casdagli, M. & Eubank, S. (eds.) (1992). Nonlinear Modeling and Forecasting. Proceedings Volume XII in the Santa Fe Institute Studies in the Sciences of Complexity. Addison Wesley, New York, NY. Proceedings of a Workshop on Nonlinear Modeling and Forecasting September 17–21, 1990, Santa Fe, New Mexico.
Cheng, P. E. (1984). Strong consistency of nearest neighbor regression function estimators. Journal of Multivariate Analysis 15: 63–72.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74: 829–836.
Cleveland, W. S. (1993a). Coplots, nonparametric regression, and conditionally parametric fits. Technical Report 19, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. http://netlib.att.com/netlib/att/stat/doc/.
Cleveland, W. S. (1993b). Visualizing Data. Hobart Press, Summit, NJ. books@hobart.com.
Cleveland, W. S. & Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association 83: 596–610.
Cleveland, W. S., Devlin, S. J. & Grosse, E. (1988). Regression by local fitting: Methods, properties, and computational algorithms. Journal of Econometrics 37: 87–114.
Cleveland, W. S. & Grosse, E. (1991). Computational methods for local regression. Statistics and Computing 1(1): 47–62. ftp://cm.bell-labs.com/cm/cs/doc/91/4–04.ps.gz.
Cleveland, W. S., Grosse, E. & Shyu, W. M. (1992). Local regression models. In Chambers, J. M. & Hastie, T. J. (eds.), Statistical Models in S, pp. 309–376. Wadsworth, Pacific Grove, CA. http://netlib.att.com/netlib/a/cloess.ps.Z.
Cleveland, W. S. & Loader, C. (1994a). Computational methods for local regression. Technical Report 11, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. http://netlib.att.com/netlib/att/stat/doc/.
Cleveland, W. S. & Loader, C. (1994b). Local fitting for semiparametric (nonparametric) regression: Comments on a paper of Fan and Marron. Technical Report 8, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. http://netlib.att.com/netlib/att/stat/doc/, 94.8.ps, earlier version is 94.3.ps.
Cleveland, W. S. & Loader, C. (1994c). Smoothing by local regression: Principles and methods. Technical Report 95.3, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. http://netlib.att.com/netlib/att/stat/doc/.
Cleveland, W. S., Mallows, C. L. & McRae, J. E. (1993). ATS methods: Nonparametric regression for non-Gaussian data. Journal of the American Statistical Association 88(423): 821–835.
Connell, M. E. & Utgoff, P. E. (1987). Learning to control a dynamic physical system. In Sixth National Conference on Artificial Intelligence, pp. 456–460, Seattle, WA. Morgan Kaufmann, San Mateo, CA.
Cost, S. & Salzberg, S. (1993). A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 10(1): 57–78.
Coughran, Jr., W. M. & Grosse, E. (1991). Seeing and hearing dynamic loess surfaces. In Interface '91 Proceedings, pp. 224–228. Springer-Verlag. ftp://cm.bell-labs.com/cm/cs/doc/91/4–07.ps.gz or 4–07long.ps.gz.
Cowan, J. D., Tesauro, G. & Alspector, J. (eds.) (1994). Advances In Neural Information Processing Systems 6. Morgan Kaufmann, San Mateo, CA.
Crain, I. K. & Bhattacharyya, B. K. (1967). Treatment of nonequispaced two dimensional data with a digital computer. Geoexploration 5: 173–194.
Deheuvels, P. (1977). Estimation non-paramétrique de la densité par histogrammes généralisés. Revue de Statistique Appliquée 25: 5–42.
Deng, K. & Moore, A. W. (1995). Multiresolution instance-based learning. In Fourteenth International Joint Conference on Artificial Intelligence, pp. 1233–1239. Morgan Kaufmann, San Mateo, CA.
Dennis, J. E., Gay, D. M. & Welsch, R. E. (1981). An adaptive nonlinear least-squares algorithm. ACM Transactions on Mathematical Software 7(3): 369–383.
Devroye, L. (1981). On the almost everywhere convergence of nonparametric regression function estimates. The Annals of Statistics 9(6): 1310–1319.
Diebold, F. X. & Nason, J. A. (1990). Nonparametric exchange rate prediction? Journal of International Economics 28: 315–332.
Dietterich, T. G., Wettschereck, D., Atkeson, C. G. & Moore, A. W. (1994). Memory-based methods for regression and classification. In Cowan et al. (1994), pp. 1165–1166.
Draper, N. R. & Smith, H. (1981). Applied Regression Analysis. John Wiley, New York, NY, 2nd edition.
Elliot, T. & Scott, P. D. (1991). Instance-based and generalization-based learning procedures applied to solving integration problems. In Proceedings of the Eighth Conference of the Society for the Study of Artificial Intelligence, pp. 256–265, Leeds, England. Springer Verlag.
Epanechnikov, V. A. (1969). Nonparametric estimation of a multivariate probability density. Theory of Probability and Its Applications 14: 153–158.
Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York, NY.
Falconer, K. J. (1971). A general purpose algorithm for contouring over scattered data points. Technical Report NAC 6, National Physical Laboratory, Teddington, Middlesex, United Kingdom, TW11 0LW.
Fan, J. (1992). Design-adaptive nonparametric regression. Journal of the American Statistical Association 87(420): 998–1004.
Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Annals of Statistics 21: 196–216.
Fan, J. (1995). Local modeling. EES Update: written for the Encyclopedia of Statistics Science, http://www.stat.unc.edu/faculty/fan/papers.html.
Fan, J. & Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. The Annals of Statistics 20(4): 2008–2036.
Fan, J. & Gijbels, I. (1994). Censored regression: Local linear approximations and their applications. Journal of the American Statistical Association 89: 560–570.
Fan, J. & Gijbels, I. (1995a). Adaptive order polynomial fitting: Bandwidth robustification and bias reduction. J. Comp. Graph. Statist. 4: 213–227.
Fan, J. & Gijbels, I. (1995b). Data-driven bandwidth selection in local polynomial fitting: Variable bandwidth and spatial adaptation. Journal of the Royal Statistical Society B 57: 371–394.
Fan, J. & Gijbels, I. (1996). Local Polynomial Modeling and its Applications. Chapman and Hall, London.
Fan, J. & Hall, P. (1994). On curve estimation by minimizing mean absolute deviation and its implications. The Annals of Statistics 22(2): 867–885.
Fan, J. & Kreutzberger, E. (1995). Automatic local smoothing for spectral density estimation. ftp://stat.unc.edu/pub/fan/spec.ps.
Fan, J. & Marron, J. S. (1993). Comment on [Hastie and Loader, 1993]. Statistical Science 8(2): 129–134.
Fan, J. & Marron, J. S. (1994a). Fast implementations of nonparametric curve estimators. Journal of Computational and Graphical Statistics 3: 35–56.
Fan, J. & Marron, J. S. (1994b). Rejoinder to discussion of Cleveland and Loader.
Farmer, J. D. & Sidorowich, J. J. (1987). Predicting chaotic time series. Physical Review Letters 59(8): 845–848.
Farmer, J. D. & Sidorowich, J. J. (1988a). Exploiting chaos to predict the future and reduce noise. In Lee, Y. C. (ed.), Evolution, Learning, and Cognition, pp. 277–. World Scientific Press, NJ. Also available as Technical Report LA-UR–88–901, Los Alamos National Laboratory, Los Alamos, New Mexico.
Farmer, J. D. & Sidorowich, J. J. (1988b). Predicting chaotic dynamics. In Kelso, J. A. S., Mandell, A. J. & Schlesinger, M. F. (eds.), Dynamic Patterns in Complex Systems, pp. 265–292. World Scientific, NJ.
Farwig, R. (1987). Multivariate interpolation of scattered data by moving least squares methods. In Mason, J. C. & Cox, M. G. (eds.), Algorithms for Approximation, pp. 193–211. Clarendon Press, Oxford.
Fedorov, V. V., Hackl, P. & Müller, W. G. (1993). Moving local regression: The weight function. Nonparametric Statistics 2(4): 355–368.
Franke, R. & Nielson, G. (1980). Smooth interpolation of large sets of scattered data. International Journal for Numerical Methods in Engineering 15: 1691–1704.
Friedman, J. H. (1984). A variable span smoother. Technical Report LCS 5, Stanford University, Statistics Department, Stanford, CA.
Friedman, J. H. (1994). Flexible metric nearest neighbor classification. http://playfair.stanford.edu/reports/friedman/.
Friedman, J. H., Bentley, J. L. & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software 3(3): 209–226.
Fritzke, B. (1995). Incremental learning of local linear mappings. In Proceedings of the International Conference on Artificial Neural Networks ICANN '95, pp. 217–222, Paris, France.
Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. Academic Press, New York, NY, second edition.
Gasser, T. & Müller, H. G. (1979). Kernel estimation of regression functions. In Gasser, T. & Rosenblatt, M. (eds.), Smoothing Techniques for Curve Estimation, number 757 in Lecture Notes in Mathematics, pp. 23–67. Springer-Verlag, Heidelberg.
Gasser, T. & Müller, H. G. (1984). Estimating regression functions and their derivatives by the kernel method. Scandinavian Journal of Statistics 11: 171–185.
Gasser, T., Müller, H. G. & Mammitzsch, V. (1985). Kernels for nonparametric regression. Journal of the Royal Statistical Society, Series B 47: 238–252.
Ge, Z., Cavinato, A. G. & Callis, J. B. (1994). Noninvasive spectroscopy for monitoring cell density in a fermentation process. Analytical Chemistry 66: 1354–1362.
Goldberg, K. Y. & Pearlmutter, B. (1988). Using a neural network to learn the dynamics of the CMU Direct-Drive Arm II. Technical Report CMU-CS–88–160, Carnegie-Mellon University, Pittsburgh, PA.
Gorinevsky, D. & Connolly, T. H. (1994). Comparison of some neural network and scattered data approximations: The inverse manipulator kinematics example. Neural Computation 6: 521–542.
Goshtasby, A. (1988). Image registration by local approximation methods. Image and Vision Computing 6(4): 255–261.
Grosse, E. (1989). LOESS: Multivariate smoothing by moving least squares. In Chui, C. K., Schumaker, L. L. & Ward, J. D. (eds.), Approximation Theory VI, pp. 1–4. Academic Press, Boston, MA.
Hammond, S. V. (1991). NIR analysis of antibiotic fermentations. In Murray, I. & Cowe, I. A. (eds.), Making Light Work: Advances in Near Infrared Spectroscopy, pp. 584–589. VCH: New York, NY. Developed from the 4th International Conference on Near Infrared Spectroscopy, Aberdeen, Scotland, August 19–23, 1991.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. & Stahel, W. A. (1986). Robust Statistics: The Approach Based On Influence Functions. John Wiley, New York, NY.
Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, New York, NY.
Hastie, T. & Loader, C. (1993). Local regression: Automatic kernel carpentry. Statistical Science 8(2): 120–143.
Hastie, T. J. & Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London.
Hastie, T. J. & Tibshirani, R. J. (1994). Discriminant adaptive nearest neighbor classification. ftp://playfair.Stanford.EDU/pub/hastie/dann.ps.Z.
Higuchi, T., Kitano, H., Furuya, T., Handa, K.-I., Takahashi, N. & Kokubu, A. (1991). IXM2: A parallel associative processor for knowledge processing. In AAAI-91 (1991), pp. 296–303.
Hillis, D. (1985). The Connection Machine. MIT Press, Cambridge, MA.
Huang, P. S. (1996). Planning For Dynamic Motions Using A Search Tree. MS thesis, University of Toronto, Graduate Department of Computer Science. http://www.dgp.utoronto.ca/people/psh/home.html.
IJCAI 12 (1991). Twelfth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA.
IJCAI 13 (1993). Thirteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA.
Jabbour, K., Riveros, J. F. W., Landsbergen, D. & Meyer, W. (1987). ALFA: Automated load forecasting assistant. In Proceedings of the 1987 IEEE Power Engineering Society Summer Meeting, San Francisco, CA.
James, M. (1985). Classification Algorithms. John Wiley and Sons, New York, NY.
Jones, M. C., Davies, S. J. & Park, B. U. (1994). Versions of kernel-type regression estimators. Journal of the American Statistical Association 89(427): 825–832.
Karalič, A. (1992). Employing linear regression in regression tree leaves. In Neumann, B. (ed.), ECAI 92: 10th European Conference on Artificial Intelligence, pp. 440–441, Vienna, Austria. John Wiley and Sons.
Katkovnik, V. Y. (1979). Linear and nonlinear methods of nonparametric regression analysis. Soviet Automatic Control 5: 25–34.
Kazmierczak, H. & Steinbuch, K. (1963). Adaptive systems in pattern recognition. IEEE Transactions on Electronic Computers EC-12: 822–835.
Kibler, D., Aha, D. W. & Albert, M. (1989). Instance-based prediction of real-valued attributes. Computational Intelligence 5: 51–57.
Kitano, H. (1993a). Challenges of massive parallelism. In IJCAI 13 (1993), pp. 813–834.
Kitano, H. (1993b). A comprehensive and practical model of memory-based machine translation. In IJCAI 13 (1993), pp. 1276–1282.
Kitano, H. & Higuchi, T. (1991a). High performance memory-based translation on IXM2 massively parallel associative memory processor. In AAAI-91 (1991), pp. 149–154.
Kitano, H. & Higuchi, T. (1991b). Massively parallel memory-based parsing. In IJCAI 12 (1991), pp. 918–924.
Kitano, H., Moldovan, D. & Cha, S. (1991). High performance natural language processing on semantic network array processor. In IJCAI 12 (1991), pp. 911–917.
Kozek, A. S. (1992). A new nonparametric estimation method: Local and nonlinear. Interface 24: 389–393.
Lancaster, P. (1979). Moving weighted least-squares methods. In Sahney, B. N. (ed.), Polynomial and Spline Approximation, pp. 103–120. D. Reidel Publishing, Boston, MA.
Lancaster, P. & Šalkauskas, K. (1981). Surfaces generated by moving least squares methods. Mathematics of Computation 37(155): 141–158.
Lancaster, P. & Šalkauskas, K. (1986). Curve And Surface Fitting. Academic Press, New York, NY.
Lawrence, S., Tsoi, A. C. & Black, A. D. (1996). Function approximation with neural networks and local methods: Bias, variance and smoothness. In Australian Conference on Neural Networks, Canberra, Australia. Available from http://www.neci.nj.nec.com/homepages/lawrence and http://www.elec.uq.edu.au/∼lawrence.
LeBaron, B. (1990). Forecast improvements using a volatility index. Unpublished.
LeBaron, B. (1992). Nonlinear forecasts for the S&P stock index. In Casdagli and Eubank (1992), pp. 381–393. Proceedings of a Workshop on Nonlinear Modeling and Forecasting September 17–21, 1990, Santa Fe, New Mexico.
Legg, M. P. C. & Brent, R. P. (1969). Automatic contouring. In 4th Australian Computer Conference, pp. 467–468.
Lejeune, M. (1984). Optimization in non-parametric regression. In COMPSTAT 1984: Proceedings in Computational Statistics, pp. 421–426, Prague. Physica-Verlag Wien.
Lejeune, M. (1985). Estimation non-paramétrique par noyaux: Régression polynomiale mobile. Revue de Statistique Appliquée 23(3): 43–67.
Lejeune, M. & Sarda, P. (1992). Smooth estimators of distribution and density functions. Computational Statistics & Data Analysis 14: 457–471.
Li, K. C. (1984). Consistency for cross-validated nearest neighbor estimates in nonparametric regression. The Annals of Statistics 12: 230–240.
Loader, C. (1994). Computing nonparametric function estimates. Technical Report 7, AT&T Bell Laboratories, Statistics Department, Murray Hill, NJ. Available by anonymous FTP from netlib.att.com in /netlib/att/stat/doc/94/7.ps.
Lodwick, G. D. & Whittle, J. (1970). A technique for automatic contouring field survey data. Australian Computer Journal 2: 104–109.
Lowe, D. G. (1995). Similarity metric learning for a variable-kernel classifier. Neural Computation 7: 72–85.
Maron, O. & Moore, A. W. (1997). The racing algorithm: Model selection for lazy learners. Artificial Intelligence Review, this issue.
Marron, J. S. (1988). Automatic smoothing parameter selection: A survey. Empirical Economics 13: 187–208.
McCallum, R. A. (1995). Instance-based utile distinctions for reinforcement learning with hidden state. In Prieditis & Russell (eds.) (1995), pp. 387–395.
McIntyre, D. B., Pollard, D. D. & Smith, R. (1968). Computer programs for automatic contouring. Technical Report Kansas Geological Survey Computer Contributions 23, University of Kansas, Lawrence, KS.
McLain, D. H. (1974). Drawing contours from arbitrary data points. The Computer Journal 17(4): 318–324.
Medin, D. L. & Shoben, E. J. (1988). Context and structure in conceptual combination. Cognitive Psychology 20: 158–190.
Meese, R. & Wallace, N. (1991). Nonparametric estimation of dynamic hedonic price models and the construction of residential housing price indices. American Real Estate and Urban Economics Association Journal 19(3): 308–332.
Meese, R. A. & Rose, A. K. (1990). Nonlinear, nonparametric, nonessential exchange rate estimation. The American Economic Review May: 192–196.
Miller, A. J. (1990). Subset Selection in Regression. Chapman and Hall, London.
Miller, W. T., Glanz, F. H. & Kraft, L. G. (1987). Application of a general learning algorithm to the control of robotic manipulators. International Journal of Robotics Research 6: 84–98.
Mohri, T. & Tanaka, H. (1994). An optimal weighting criterion of case indexing for both numeric and symbolic attributes. In Aha, D. W. (ed.), AAAI-94 Workshop Program: Case-Based Reasoning, Working Notes, pp. 123–127. AAAI Press, Seattle, WA.
Moore, A. W. (1990a). Acquisition of Dynamic Control Knowledge for a Robotic Manipulator. In Seventh International Machine Learning Workshop. Morgan Kaufmann, San Mateo, CA.
Moore, A. W. (1990b). Efficient Memory-based Learning for Robot Control. PhD. Thesis; Technical Report No. 209, Computer Laboratory, University of Cambridge.
Moore, A. W., Hill, D. J. & Johnson, M. P. (1992). An empirical investigation of brute force to choose features, smoothers, and function approximators. In Hanson, S., Judd, S. & Petsche, T. (eds.), Computational Learning Theory and Natural Learning Systems, volume 3. MIT Press, Cambridge, MA.
Moore, A. W. & Schneider, J. (1995). Memory-based stochastic optimization. To appear in the proceedings of NIPS-95. Also available as Technical Report CMU-RI-TR–95–30, ftp://ftp.cs.cmu.edu/afs/cs.cmu.edu/project/reinforcement/papers/memstoch.ps.
More, J. J., Garbow, B. S. & Hillstrom, K. E. (1980). User guide for MINPACK-1. Technical Report ANL–80–74, Argonne National Laboratory, Argonne, Illinois.
Müller, H.-G. (1987). Weighted local regression and kernel methods for nonparametric curve fitting. Journal of the American Statistical Association 82: 231–238.
Müller, H.-G. (1993). Comment on [Hastie and Loader, 1993]. Statistical Science 8(2): 134–139.
Murphy, O. J. & Selkow, S. M. (1986). The efficiency of using k-d trees for finding nearest neighbors in discrete space. Information Processing Letters 23: 215–218.
Myers, R. H. (1990). Classical and Modern Regression With Applications. PWS-KENT, Boston, MA.
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and Its Applications 9: 141–142.
Næs, T. & Isaksson, T. (1992). Locally weighted regression in diffuse near-infrared transmittance spectroscopy. Applied Spectroscopy 46(1): 34–43.
Næs, T., Isaksson, T. & Kowalski, B. R. (1990). Locally weighted regression and scatter correction for near-infrared reflectance data. Analytical Chemistry 62(7): 664–673.
Nguyen, T., Czerwinski, M. & Lee, D. (1993). COMPAQ Quicksource: Providing the consumer with the power of artificial intelligence. In Proceedings of the Fifth Annual Conference on Innovative Applications of Artificial Intelligence, pp. 142–150, Washington, DC. AAAI Press.
Nosofsky, R. M., Clark, S. E. & Shin, H. J. (1989). Rules and exemplars in categorization, identification, and recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition 15: 282–304.
Omohundro, S. M. (1987). Efficient Algorithms with Neural Network Behaviour. Journal of Complex Systems 1(2): 273–347.
Omohundro, S. M. (1991). Bumptrees for Efficient Function, Constraint, and Classification Learning. In Lippmann, R. P., Moody, J. E. & Touretzky, D. S. (eds.), Advances in Neural Information Processing Systems 3. Morgan Kaufmann.
Palmer, J. A. B. (1969). Automatic mapping. In 4th Australian Computer Conference, pp. 463–466.
Pelto, C. R., Elkins, T. A. & Boyd, H. A. (1968). Automatic contouring of irregularly spaced data. Geophysics 33: 424–430.
Peng, J. (1995). Efficient memory-based dynamic programming. In Prieditis & Russell (eds.) (1995), pp. 438–446.
Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. (1988). Numerical Recipes in C. Cambridge University Press, New York, NY.
Prieditis, A. & Russell, S. (eds.) (1995). Twelfth International Conference on Machine Learning. Morgan Kaufmann, San Mateo, CA.
Rachlin, J., Kasif, S., Salzberg, S. & Aha, D. W. (1994). Towards a better understanding of memory-based reasoning systems. In Eleventh International Conference on Machine Learning, pp. 242–250. Morgan Kaufmann, San Mateo, CA.
Racine, J. (1993). An efficient cross-validation algorithm for window width selection for non-parametric kernel regression. Communications in Statistics: Simulation and Computation 22(4): 1107–1114.
Ramasubramanian, V. & Paliwal, K. K. (1989). A generalized optimization of the k-d tree for fast nearest-neighbour search. In International Conference on Acoustics, Speech, and Signal Processing.
Raz, J., Turetsky, B. I. & Fein, G. (1989). Selecting the smoothing parameter for estimation of smoothly changing evoked potential signals. Biometrics 45: 851–871.
Renka, R. J. (1988). Multivariate interpolation of large sets of scattered data. ACM Transactions on Mathematical Software 14(2): 139–152.
Ruppert, D. & Wand, M. P. (1994). Multivariate locally weighted least squares regression. The Annals of Statistics 22(3): 1346–1370.
Ruprecht, D. & Müller, H. (1992). Image warping with scattered data interpolation methods. Technical Report 443, Universität Dortmund, Fachbereich Informatik, D-44221 Dortmund, Germany. Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/ls7/rr-443.ps.Z.
Ruprecht, D. & Müller, H. (1993). Free form deformation with scattered data interpolation methods. In Farin, G., Hagen, H. & Noltemeier, H. (eds.), Geometric Modelling (Computing Suppl. 8), pp. 267–281. Springer Verlag. Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/iif/rr-41.ps.Z.
Ruprecht, D. & Müller, H. (1994a). Deformed cross-dissolves for image interpolation in scientific visualization. The Journal of Visualization and Computer Animation 5(3): 167–181. Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/ls7/rr-491.ps.Z.
Ruprecht, D. & Müller, H. (1994b). A framework for generalized scattered data interpolation. Technical Report 517, Universität Dortmund, Fachbereich Informatik, D-44221 Dortmund, Germany. Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/ls7/rr-517.ps.Z.
Ruprecht, D., Nagel, R. & Müller, H. (1994). Spatial free form deformation with scattered data interpolation methods. Technical Report 539, Fachbereich Informatik der Universität Dortmund, 44221 Dortmund, Germany. Accepted for publication by Computers & Graphics, Available for anonymous FTP from ftp-ls7.informatik.uni-dortmund.de in pub/reports/ls7/rr-539.ps.Z.
Rust, R. T. & Bornman, E. O. (1982). Distribution-free methods of approximating nonlinear marketing relationships. Journal of Marketing Research XIX: 372–374.
Sabin, M. A. (1980). Contouring — a review of methods for scattered data. In Brodlie, K. (ed.), Mathematical Methods in Computer Graphics and Design, pp. 63–86. Academic Press, New York, NY.
Saitta, L. (ed.) (1996). Thirteenth International Conference on Machine Learning. Morgan Kaufmann, San Mateo, CA.
Samet, H. (1990). The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA.
Schaal, S. & Atkeson, C. G. (1994). Assessing the quality of learned local models. In Cowan et al. (1994), pp. 160–167.
Schaal, S. & Atkeson, C. G. (1995). From isolation to cooperation: An alternative view of a system of experts. NIPS95 proceedings, in press.
Scott, D. W. (1992). Multivariate Density Estimation. Wiley, New York, NY.
Seber, G. A. F. (1977). Linear Regression Analysis. John Wiley, New York, NY.
Seifert, B., Brockmann, M., Engel, J. & Gasser, T. (1994). Fast algorithms for nonparametric curve estimation. Journal of Computational and Graphical Statistics 3(2): 192–213.
Seifert, B. & Gasser, T. (1994). Variance properties of local polynomials. http://www.unizh.ch/biostat/manuscripts.html.
Shepard, D. (1968). A two-dimensional function for irregularly spaced data. In 23rd ACM National Conference, pp. 517–524.
Solow, A. R. (1988). Detecting changes through time in the variance of a long-term hemispheric temperature record: An application of robust locally weighted regression. Journal of Climate 1: 290–296.
Specht, D. F. (1991). A general regression neural network. IEEE Transactions on Neural Networks 2(6): 568–576.
Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-d trees. Algorithmica 6: 579–589.
Stanfill, C. (1987). Memory-based reasoning applied to English pronunciation. In Sixth National Conference on Artificial Intelligence, pp. 577–581.
Stanfill, C. & Waltz, D. (1986). Toward memory-based reasoning. Communications of the ACM 29(12): 1213–1228.
Steinbuch, K. (1961). Die Lernmatrix. Kybernetik 1: 36–45.
Steinbuch, K. & Piske, U. A. W. (1963). Learning matrices and their applications. IEEE Transactions on Electronic Computers EC-12: 846–862.
Stone, C. J. (1975). Nearest neighbor estimators of a nonlinear regression function. In Computer Science and Statistics: 8th Annual Symposium on the Interface, pp. 413–418.
Stone, C. J. (1977). Consistent nonparametric regression. The Annals of Statistics 5: 595–645.
Stone, C. J. (1980). Optimal rates of convergence for nonparametric estimators. The Annals of Statistics 8: 1348–1360.
Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. The Annals of Statistics 10(4): 1040–1053.
Sumita, E., Oi, K., Furuse, O., Iida, H., Higuchi, T., Takahashi, N. & Kitano, H. (1993). Example-based machine translation on massively parallel processors. In IJCAI 13 (1993), pp. 1283–1288.
Tadepalli, P. & Ok, D. (1996). Scaling up average reward reinforcement learning by approximating the domain models and the value function. In Saitta (1996). http://www.cs.orst.edu:80/~tadepall/research/publications.html.
Tamada, T., Maruyama, M., Nakamura, Y., Abe, S. & Maeda, K. (1993). Water demand forecasting by memory based learning. Water Science and Technology 28(11–12): 133–140.
Taylor, W. K. (1959). Pattern recognition by means of automatic analogue apparatus. Proceedings of The Institution of Electrical Engineers 106B: 198–209.
Taylor, W. K. (1960). A parallel analogue reading machine. Control 3: 95–99.
Thorpe, S. (1995). Localized versus distributed representations. In Arbib, M. A. (ed.), The Handbook of Brain Theory and Neural Networks, pp. 549–552. The MIT Press, Cambridge, MA.
Thrun, S. (1996). Is learning the n-th thing any easier than learning the first? In Advances in Neural Information Processing Systems (NIPS) 8. http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/thrun/publications.html.
Thrun, S. & O'Sullivan, J. (1996). Discovering structure in multiple learning tasks: The TC algorithm. In Saitta (1996). http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/thrun/publications.html.
Tibshirani, R. & Hastie, T. (1987). Local likelihood estimation. Journal of the American Statistical Association 82: 559–567.
Ting, K. M. & Cameron-Jones, R. M. (1994). Exploring a framework for instance based learning and naive Bayesian classifiers. In Proceedings of the Seventh Australian Joint Conference on Artificial Intelligence, Armidale, Australia. World Scientific.
Tou, J. T. & Gonzalez, R. C. (1974). Pattern Recognition Principles. Addison-Wesley, Reading, MA.
Townshend, B. (1992). Nonlinear prediction of speech signals. In Casdagli and Eubank (1992), pp. 433–453. Proceedings of a Workshop on Nonlinear Modeling and Forecasting September 17–21, 1990, Santa Fe, New Mexico.
Tsybakov, A. B. (1986). Robust reconstruction of functions by the local approximation method. Problems of Information Transmission 22: 133–146.
Tukey, J. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
Turetsky, B. I., Raz, J. & Fein, G. (1989). Estimation of trial-to-trial variation in evoked potential signals by smoothing across trials. Psychophysiology 26(6): 700–712.
Turlach, B. A. & Wand, M. P. (1995). Fast computation of auxiliary quantities in local polynomial regression. http://netec.wustl.edu/~adnetec/WoPEc/agsmst/agsmst95009.html.
van der Smagt, P., Groen, F. & van het Groenewoud, F. (1994). The locally linear nested network for robot manipulation. In Proceedings of the IEEE International Conference on Neural Networks, pp. 2787–2792. ftp://ftp.fwi.uva.nl/pub/computer-systems/aut-sys/reports/SmaGroGro94b.ps.gz.
Vapnik, V. (1992). Principles of risk minimization for learning theory. In Moody, J. E., Hanson, S. J. & Lippmann, R. P. (eds.), Advances in Neural Information Processing Systems 4, pp. 831–838. Morgan Kaufmann, San Mateo, CA.
Vapnik, V. & Bottou, L. (1993). Local algorithms for pattern recognition and dependencies estimation. Neural Computation 5(6): 893–909.
Walden, A. T. & Prescott, P. (1983). Identification of trends in annual maximum sea levels using robust locally weighted regression. Estuarine, Coastal and Shelf Science 16: 17–26.
Walters, R. F. (1969). Contouring by machine: A user's guide. American Association of Petroleum Geologists Bulletin 53(11): 2324–2340.
Waltz, D. L. (1987). Applications of the Connection Machine. Computer 20(1): 85–97.
Wand, M. P. & Jones, M. C. (1993). Comparison of smoothing parameterizations in bivariate kernel density estimation. Journal of the American Statistical Association 88: 520–528.
Wand, M. P. & Jones, M. C. (1994). Kernel Smoothing. Chapman and Hall, London.
Wand, M. P. & Schucany, W. R. (1990). Gaussian-based kernels for curve estimation and window width selection. Canadian Journal of Statistics 18: 197–204.
Wang, Z., Isaksson, T. & Kowalski, B. R. (1994). New approach for distance measurement in locally weighted regression. Analytical Chemistry 66(2): 249–260.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26: 359–372.
Weisberg, S. (1985). Applied Linear Regression. John Wiley and Sons.
Wess, S., Althoff, K.-D. & Derwand, G. (1994). Using k-d trees to improve the retrieval step in case-based reasoning. In Wess, S., Althoff, K.-D. & Richter, M. M. (eds.), Topics in Case-Based Reasoning, pp. 167–181. Springer-Verlag, New York, NY. Proceedings of the First European Workshop, EWCBR-93.
Wettschereck, D. (1994). A Study of Distance-Based Machine Learning Algorithms. PhD dissertation, Oregon State University, Department of Computer Science, Corvallis, OR.
Wijnberg, L. & Johnson, T. (1985). Estimation of missing values in lead air quality data sets. In Johnson, T. R. & Penkala, S. J. (eds.), Quality Assurance in Air Pollution Measurements. Air Pollution Control Association, Pittsburgh, PA. TR-3: Transactions: An APCA International Specialty Conference.
Wolberg, G. (1990). Digital Image Warping. IEEE Computer Society Press, Los Alamitos, CA.
Yasunaga, M. & Kitano, H. (1993). Robustness of the memory-based reasoning implemented by wafer scale integration. IEICE Transactions on Information and Systems E76-D(3): 336–344.
Zografski, Z. (1989). Neuromorphic, Algorithmic, and Logical Models for the Automatic Synthesis of Robot Action. PhD dissertation, University of Ljubljana, Ljubljana, Slovenia, Yugoslavia.
Zografski, Z. (1991). New methods of machine learning for the construction of integrated neuromorphic and associative-memory knowledge bases. In Zajc, B. & Solina, F. (eds.), Proceedings, 6th Mediterranean Electrotechnical Conference, volume II, pp. 1150–1153, Ljubljana, Slovenia, Yugoslavia. IEEE catalog number 91CH2964-5.
Zografski, Z. (1992). Geometric and neuromorphic learning for nonlinear modeling, control and forecasting. In Proceedings of the 1992 IEEE International Symposium on Intelligent Control, pp. 158–163, Glasgow, Scotland. IEEE catalog number 92CH3110-4.
Zografski, Z. & Durrani, T. (1995). Comparing predictions from neural networks and memory-based learning. In Proceedings, ICANN '95/NEURONIMES '95: International Conference on Artificial Neural Networks, pp. 221–226, Paris, France.
Cite this article
Atkeson, C.G., Moore, A.W. & Schaal, S. Locally Weighted Learning. Artificial Intelligence Review 11, 11–73 (1997). https://doi.org/10.1023/A:1006559212014