A reinforcement connectionist approach to robot path finding in non-maze-like environments

Millán, José Del R.; Torras, Carme

doi:10.1007/BF00992702

Published: May 1992

Machine Learning, Volume 8, pages 363–395 (1992)

Abstract

This paper presents a reinforcement connectionist system which finds and learns the suitable situation-action rules so as to generate feasible paths for a point robot in a 2D environment with circular obstacles. The basic reinforcement algorithm is extended with a strategy for discovering stable solution paths. Equipped with this strategy and a powerful codification scheme, the path-finder (i) learns quickly, (ii) deals with continuous-valued inputs and outputs, (iii) exhibits good noise-tolerance and generalization capabilities, (iv) copes with dynamic environments, and (v) solves an instance of the path finding problem with strong performance demands.
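
As a small illustration of what "feasible path" means in the setting the abstract describes (a point robot among circular obstacles in 2D), the sketch below checks whether a piecewise-linear path avoids every obstacle. It is not the paper's reinforcement connectionist method, only a geometric feasibility test. The function names, the waypoint representation, and the `(center, radius)` obstacle format are assumptions made for this example.

```python
import math


def segment_hits_circle(p, q, center, radius):
    """Return True if the segment from p to q passes strictly inside the circle."""
    px, py = p
    qx, qy = q
    cx, cy = center
    dx, dy = qx - px, qy - py
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: p and q coincide.
        t = 0.0
    else:
        # Project the circle center onto the segment and clamp to [0, 1].
        t = ((cx - px) * dx + (cy - py) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    nearest_x, nearest_y = px + t * dx, py + t * dy
    return math.hypot(cx - nearest_x, cy - nearest_y) < radius


def path_is_feasible(waypoints, obstacles):
    """Return True if no segment of the path enters any circular obstacle.

    waypoints: list of (x, y) tuples (hypothetical representation).
    obstacles: list of ((cx, cy), radius) tuples (hypothetical representation).
    """
    for p, q in zip(waypoints, waypoints[1:]):
        for center, radius in obstacles:
            if segment_hits_circle(p, q, center, radius):
                return False
    return True


if __name__ == "__main__":
    obstacles = [((0.5, 0.5), 0.2)]
    straight = [(0.0, 0.0), (1.0, 1.0)]            # cuts through the obstacle
    detour = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # goes around it
    print(path_is_feasible(straight, obstacles))
    print(path_is_feasible(detour, obstacles))
```

A learned controller of the kind the abstract describes would produce the waypoints step by step; a check like this one could serve only as an external test of the resulting path.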

Author information

Authors and Affiliations

José Del R. Millán: Institute for System Engineering and Informatics, Commission of the European Communities, Joint Research Centre, TP 361, 21020 Ispra (VA), Italy
Carme Torras: Institut de Cibernetica (CSIC-UPC), Diagonal 647, 08028 Barcelona, Spain

About this article

Cite this article

Millán, J.D.R., Torras, C. A reinforcement connectionist approach to robot path finding in non-maze-like environments. Mach Learn 8, 363–395 (1992). https://doi.org/10.1007/BF00992702
