ALBERT

All Library Books, Journals and Electronic Records Telegrafenberg


Filter
  • Articles  (38)
  • reinforcement learning  (38)
  • Springer  (38)
  • 1995-1999  (38)
  • Computer Science  (38)
  • 1
    Electronic Resource
    Springer
    Machine learning 22 (1996), pp. 123-158
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference learning ; eligibility trace ; Monte Carlo method ; Markov chain ; CMAC
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator. (A minimal sketch contrasting the two trace updates appears after this list.)
    Type of Medium: Electronic Resource
  • 2
    Electronic Resource
    Springer
    Machine learning 28 (1997), pp. 169-210
    ISSN: 0885-6125
    Keywords: Explanation-based learning ; reinforcement learning ; dynamic programming ; goal regression ; speedup learning ; incomplete theory problem ; intractable theory problem
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In speedup-learning problems, where full descriptions of operators are known, both explanation-based learning (EBL) and reinforcement learning (RL) methods can be applied. This paper shows that both methods involve fundamentally the same process of propagating information backward from the goal toward the starting state. Most RL methods perform this propagation on a state-by-state basis, while EBL methods compute the weakest preconditions of operators, and hence, perform this propagation on a region-by-region basis. Barto, Bradtke, and Singh (1995) have observed that many algorithms for reinforcement learning can be viewed as asynchronous dynamic programming. Based on this observation, this paper shows how to develop dynamic programming versions of EBL, which we call region-based dynamic programming or Explanation-Based Reinforcement Learning (EBRL). The paper compares batch and online versions of EBRL to batch and online versions of point-based dynamic programming and to standard EBL. The results show that region-based dynamic programming combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of reinforcement learning algorithms (learning of optimal policies). Results are shown in chess endgames and in synthetic maze tasks.
    Type of Medium: Electronic Resource
  • 3
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 23 (1998), pp. 165-182
    ISSN: 1573-0409
    Keywords: compliance tasks ; reinforcement learning ; robust control
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract The complexity in planning and control of robot compliance tasks mainly results from simultaneous control of both position and force and inevitable contact with environments. It is quite difficult to achieve accurate modeling of the interaction between the robot and the environment during contact. In addition, the interaction with the environment varies even for compliance tasks of the same kind. To deal with these phenomena, in this paper, we propose a reinforcement learning and robust control scheme for robot compliance tasks. A reinforcement learning mechanism is used to tackle variations among compliance tasks of the same kind. A robust compliance controller that guarantees system stability in the presence of modeling uncertainties and external disturbances is used to execute control commands sent from the reinforcement learning mechanism. Simulations based on deburring compliance tasks demonstrate the effectiveness of the proposed scheme.
    Type of Medium: Electronic Resource
  • 4
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 21 (1998), pp. 221-238
    ISSN: 1573-0409
    Keywords: artificial neural network ; dynamic control ; reinforcement learning ; robot control
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract Conventional robot control schemes are basically model-based methods. However, exact modeling of robot dynamics poses considerable problems and faces various uncertainties in task execution. This paper proposes a reinforcement learning control approach for overcoming such drawbacks. An artificial neural network (ANN) serves as the learning structure, and a stochastic real-valued (SRV) unit as the learning method. Initially, force tracking control of a two-link robot arm is simulated to verify the control design. The simulation results confirm that even without information related to the robot dynamic model and environment states, operation rules for simultaneously controlling force and velocity are achievable by repetitive exploration. Until now, however, acceptable performance has demanded many learning iterations, and the learning speed has proved too slow for practical applications. The approach herein, therefore, improves the tracking performance by combining a conventional controller with a reinforcement learning strategy. Experimental results demonstrate improved trajectory tracking performance of a two-link direct-drive robot manipulator using the proposed method.
    Type of Medium: Electronic Resource
  • 5
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 12 (1995), pp. 103-125
    ISSN: 1573-0409
    Keywords: Machine learning ; reinforcement learning ; intelligent control ; machine tool ; tool monitoring ; metal cutting ; manufacturing
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract This paper deals with the issue of automatic learning and recognition of various conditions of a machine tool. The ultimate goal of the research discussed in this paper is to develop a comprehensive monitor and control (M&C) system that can substitute for the expert machinist and perform certain critical in-process tasks to assure quality production. The M&C system must reliably recognize and respond to qualitatively different behaviors of the machine tool, learn new behaviors, respond faster than its human counterpart to quality-threatening circumstances, and interface with an existing controller. The research considers a series of face-milling anomalies that were subsequently simulated and used as a first step towards establishing the feasibility of employing machine learning as an integral component of the intelligent controller. We address the question of feasibility in two steps. First, it is important to know if the process models (dull tool, broken tool, etc.) can be learned (model learning). And second, if the models are learned, can an algorithm reliably select an appropriate model (distinguish between dull and broken tools) based on input from the model learner and from the sensors (model selection). The results of the simulation-based tests demonstrate that the milling-process anomalies can be learned, and the appropriate model can be reliably selected. Such a model can be subsequently utilized to make compensating in-process machine-tool adjustments. In addition, we observed that the learning curve need not approach the 100% level to be functional.
    Type of Medium: Electronic Resource
  • 6
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 21 (1998), pp. 51-71
    ISSN: 1573-0409
    Keywords: Q-learning algorithm ; reinforcement learning ; experience generalisation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract In recent years, temporal-difference methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle – learning through experimentation – that causes these excessive demands on the learning agent. Additionally, one must consider that the agent is very rarely a tabula rasa: some rough knowledge about characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are kept. Extensive experimental results show that the resulting variants may lead to good performance, provided a sensible balance between risky use of prior imprecise knowledge and cautious use of learning experience is adopted.
    Type of Medium: Electronic Resource
  • 7
    Electronic Resource
    Springer
    Machine learning 31 (1998), pp. 55-85
    ISSN: 0885-6125
    Keywords: reinforcement learning ; module-based RL ; robot learning ; problem decomposition ; Markovian Decision Problems ; feature space ; subgoals ; local control ; switching control
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solve such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: (i) decompose the task into subtasks using the qualitative knowledge at hand; (ii) design local controllers to solve the subtasks using the available quantitative knowledge; and (iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over non-adaptive ones in complex environments.
    Type of Medium: Electronic Resource
  • 8
    Electronic Resource
    Springer
    Machine learning 33 (1998), pp. 105-115
    ISSN: 0885-6125
    Keywords: reinforcement learning ; Q-learning ; TD(λ) ; online Q(λ) ; lazy learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
    Type of Medium: Electronic Resource
  • 9
    Electronic Resource
    Springer
    Machine learning 33 (1998), pp. 201-233
    ISSN: 0885-6125
    Keywords: Markov games ; differential games ; pursuit games ; multiagent learning ; reinforcement learning ; Q-learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Game playing has been a popular problem area for research in artificial intelligence and machine learning for many years. In almost every study of game playing and machine learning, the focus has been on games with a finite set of states and a finite set of actions. Further, most of this research has focused on a single player or team learning how to play against another player or team that is applying a fixed strategy for playing the game. In this paper, we explore multiagent learning in the context of game playing and develop algorithms for “co-learning” in which all players attempt to learn their optimal strategies simultaneously. Specifically, we address two approaches to co-learning, demonstrating strong performance by a memory-based reinforcement learner and comparable but faster performance with a tree-based reinforcement learner.
    Type of Medium: Electronic Resource
  • 10
    ISSN: 0885-6125
    Keywords: reinforcement learning ; vision ; learning from easy mission ; state-action deviation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying the reinforcement learning method to a real robot with a vision sensor by which the robot can obtain information about the changes in an environment. First, we construct a state space in terms of size, position, and orientation of a ball and a goal in an image, and an action space is designed in terms of the action commands to be sent to the left and right motors of a mobile robot. This causes a “state-action deviation” problem in constructing the state and action spaces that reflect the outputs from physical sensors and actuators, respectively. To deal with this issue, the action set is constructed so that each action consists of a series of the same action primitive, executed repeatedly until the current state changes. Next, to speed up the learning time, a mechanism of Learning from Easy Missions (or LEM) is implemented. LEM reduces the learning time from exponential to almost linear order in the size of the state space. The results of computer simulations and real robot experiments are given.
    Type of Medium: Electronic Resource
  • 11
    Electronic Resource
    Springer
    Machine learning 28 (1997), pp. 77-104
    ISSN: 0885-6125
    Keywords: Continual learning ; transfer ; reinforcement learning ; sequence learning ; hierarchical neural networks
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Continual learning is the constant development of increasingly complex behaviors; the process of building more complicated skills on top of those already developed. A continual-learning agent should therefore learn incrementally and hierarchically. This paper describes CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development. CHILD can quickly solve complicated non-Markovian reinforcement-learning tasks and can then transfer its skills to similar but even more complicated tasks, learning these faster still.
    Type of Medium: Electronic Resource
  • 12
    Electronic Resource
    Springer
    Machine learning 22 (1996), pp. 123-158
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference learning ; eligibility trace ; Monte Carlo method ; Markov chain ; CMAC
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator.
    Type of Medium: Electronic Resource
  • 13
    ISSN: 0885-6125
    Keywords: inductive bias ; reinforcement learning ; reward acceleration ; Levin search ; success-story algorithm ; incremental self-improvement
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate long-term effects of bias shifts setting the stage for later bias shifts we use the “success-story algorithm” (SSA). SSA is occasionally called at times that may depend on the policy itself. It uses backtracking to undo those bias shifts that have not been empirically observed to trigger long-term reward accelerations (measured up until the current SSA call). Bias shifts that survive SSA represent a lifelong success history. Until the next SSA call, they are considered useful and build the basis for additional bias shifts. SSA allows for plugging in a wide variety of learning algorithms. We plug in (1) a novel, adaptive extension of Levin search and (2) a method for embedding the learner's policy modification strategy within the policy itself (incremental self-improvement). Our inductive transfer case studies involve complex, partially observable environments where traditional reinforcement learning fails.
    Type of Medium: Electronic Resource
  • 14
    Electronic Resource
    Springer
    Machine learning 22 (1996), pp. 283-290
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
    Type of Medium: Electronic Resource
  • 15
    Electronic Resource
    Springer
    Machine learning 32 (1998), pp. 5-40
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference ; Monte Carlo ; MSE ; bias ; variance ; eligibility trace ; Markov reward process
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good. (A standard definition of the λ-return estimated by these methods appears after this list.)
    Type of Medium: Electronic Resource
  • 16
    Electronic Resource
    Springer
    Machine learning 35 (1999), pp. 155-185
    ISSN: 0885-6125
    Keywords: reinforcement learning ; multi-agent systems ; planning ; evolutionary economics ; tragedy of the commons ; classifier systems ; agoric systems ; autonomous programming ; cognition ; artificial intelligence ; Hayek ; complex adaptive systems ; temporal difference learning ; evolutionary computation ; economic models of mind ; economic models of computation ; Blocks World ; reasoning ; learning ; computational learning theory ; learning to reason ; meta-reasoning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract A market-based algorithm is presented which autonomously apportions complex tasks to multiple cooperating agents, giving each agent the motivation of improving performance of the whole system. A specific model, called “The Hayek Machine,” is proposed and tested on a simulated Blocks World (BW) planning problem. Hayek learns to solve more complex BW problems than any previous learning algorithm. Given intermediate reward and simple features, it has learned to efficiently solve arbitrary BW problems. The Hayek Machine can also be seen as a model of evolutionary economics.
    Type of Medium: Electronic Resource
  • 17
    ISSN: 0885-6125
    Keywords: reinforcement learning ; vision ; learning from easy mission ; state-action deviation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying the reinforcement learning method to a real robot with a vision sensor by which the robot can obtain information about the changes in an environment. First, we construct a state space in terms of size, position, and orientation of a ball and a goal in an image, and an action space is designed in terms of the action commands to be sent to the left and right motors of a mobile robot. This causes a state-action deviation problem in constructing the state and action spaces that reflect the outputs from physical sensors and actuators, respectively. To deal with this issue, the action set is constructed so that each action consists of a series of the same action primitive, executed repeatedly until the current state changes. Next, to speed up the learning time, a mechanism of Learning from Easy Missions (or LEM) is implemented. LEM reduces the learning time from exponential to almost linear order in the size of the state space. The results of computer simulations and real robot experiments are given.
    Type of Medium: Electronic Resource
  • 18
    Electronic Resource
    Springer
    Machine learning 19 (1995), pp. 209-240
    ISSN: 0885-6125
    Keywords: learning classifier systems ; reinforcement learning ; genetic algorithms ; animat problem
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constraints which are addressed by three main tools: parallelism, distributed architecture, and training. Parallelism is useful to speed up computation and to increase the flexibility of the learning system design. Distributed architecture helps in making it possible to decompose the overall task into a set of simpler learning tasks. Finally, training provides guidance to the system while learning, shortening the number of cycles required to learn. These tools and the issues they raise are first studied in simulation, and then the experience gained with simulations is used to implement the learning system on the real robot. Results have shown that with this approach it is possible to let the AutonoMouse, a small real robot, learn to approach a light source under a number of different noise and lesion conditions.
    Type of Medium: Electronic Resource
  • 19
    Electronic Resource
    Springer
    Machine learning 22 (1996), pp. 59-94
    ISSN: 0885-6125
    Keywords: Compact representation ; curse of dimensionality ; dynamic programming ; features ; function approximation ; neuro-dynamic programming ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
    Type of Medium: Electronic Resource
  • 20
    Electronic Resource
    Springer
    Machine learning 19 (1995), pp. 209-240
    ISSN: 0885-6125
    Keywords: learning classifier systems ; reinforcement learning ; genetic algorithms ; animat problem
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constraints which are addressed by three main tools: parallelism, distributed architecture, and training. Parallelism is useful to speed up computation and to increase the flexibility of the learning system design. Distributed architecture helps in making it possible to decompose the overall task into a set of simpler learning tasks. Finally, training provides guidance to the system while learning, shortening the number of cycles required to learn. These tools and the issues they raise are first studied in simulation, and then the experience gained with simulations is used to implement the learning system on the real robot. Results have shown that with this approach it is possible to let the AutonoMouse, a small real robot, learn to approach a light source under a number of different noise and lesion conditions.
    Type of Medium: Electronic Resource
  • 21
    Electronic Resource
    Springer
    Machine learning 35 (1999), pp. 117-154
    ISSN: 0885-6125
    Keywords: reinforcement learning ; exploration vs. exploitation dilemma ; Markov decision processes ; bandit problems
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. This technique may be used in direct algorithms such as Q-learning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The first is to define a local measure of the uncertainty using the theory of bandit problems. We show that such a measure suffers from several drawbacks. In particular, a direct application of it leads to algorithms of low quality that can be easily misled by particular configurations of the environment. The second basic principle was introduced to eliminate this drawback. It consists of assimilating the local measures of uncertainty to rewards, and back-propagating them with the dynamic programming or temporal difference mechanisms. This allows reproducing global-scale reasoning about the uncertainty, using only local measures of it. Numerical simulations clearly show the efficiency of these propositions.
    Type of Medium: Electronic Resource
  • 22
    Electronic Resource
    Springer
    Machine learning 22 (1996), pp. 283-290
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations. (A Watkins-style Q(λ) sketch appears after this list.)
    Type of Medium: Electronic Resource
  • 23
    Electronic Resource
    Springer
    Machine learning 22 (1996), pp. 59-94
    ISSN: 0885-6125
    Keywords: Compact representation ; curse of dimensionality ; dynamic programming ; features ; function approximation ; neuro-dynamic programming ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
    Type of Medium: Electronic Resource
  • 24
    Electronic Resource
    Springer
    Applied intelligence 11 (1999), pp. 109-127
    ISSN: 1573-7497
    Keywords: hybrid models ; sequential decision making ; neural networks ; reinforcement learning ; cognitive modeling
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In developing autonomous agents, one usually emphasizes only (situated) procedural knowledge, ignoring more explicit declarative knowledge. On the other hand, in developing symbolic reasoning models, one usually emphasizes only declarative knowledge, ignoring procedural knowledge. In contrast, we have developed a learning model CLARION, which is a hybrid connectionist model consisting of both localist and distributed representations, based on the two-level approach proposed in [40]. CLARION learns and utilizes both procedural and declarative knowledge, tapping into the synergy of the two types of processes, and enables an agent to learn in situated contexts and generalize resulting knowledge to different scenarios. It unifies connectionist, reinforcement, and symbolic learning in a synergistic way, to perform on-line, bottom-up learning. This summary paper presents one version of the architecture and some results of the experiments.
    Type of Medium: Electronic Resource
  • 25
    Electronic Resource
    Springer
    Applied intelligence 11 (1999), pp. 241-258
    ISSN: 1573-7497
    Keywords: analysis strategies ; limited resources ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Intelligent data analysis implies the reasoned application of autonomous or semi-autonomous tools to data sets drawn from problem domains. Automation of this process of reasoning about analysis (based on factors such as available computational resources, cost of analysis, risk of failure, lessons learned from past errors, and tentative structural models of problem domains) is highly non-trivial. By casting the problem of reasoning about analysis (MetaReasoning) as yet another data analysis problem domain, we have previously [R. Levinson and J. Wilkinson, in Advances in Intelligent Data Analysis, edited by X. Liu, P. Cohen, and M. Berthold, volume LNCS 1280, Springer-Verlag, Berlin, pp. 89–100, 1997] presented a design framework, MetaReasoning for Data Analysis Tool Allocation (MRDATA). Crucial to this framework is the ability of a Tool Allocator to track resource consumption (i.e. processor time and memory usage) by the Tools it employs, as well as the ability to allocate measured quantities of resources to these Tools. In order to test implementations of the MRDATA design, we now implement a Runtime Environment for Data Analysis Tool Allocation, RE:DATA. Tool Allocators run as processes under RE:DATA, are allotted system resources, and may use these resources to run their Tools as spawned sub-processes. We also present designs of native RE:DATA implementations of analysis tools used by MRDATA: K-Nearest Neighbor Tables, Regression Trees, Interruptible (“Any-Time”) Regression Trees, and “Hierarchy Diffusion” Temporal Difference Learners. Preliminary results are discussed and techniques for integration with non-native tools are explored.
    Type of Medium: Electronic Resource
  • 26
    Electronic Resource
    Springer
    Autonomous robots 5 (1998), pp. 273-295
    ISSN: 1573-7527
    Keywords: reinforcement learning ; module-based RL ; robot learning ; problem decomposition ; Markovian decision problems ; feature space ; subgoals ; local control ; switching control
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solve such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: (i) decompose the task into subtasks using the qualitative knowledge at hand; (ii) design local controllers to solve the subtasks using the available quantitative knowledge; and (iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over non-adaptive ones in complex environments.
    Type of Medium: Electronic Resource
  • 27
    Electronic Resource
    Springer
    Autonomous robots 6 (1999), pp. 281-292
    ISSN: 1573-7527
    Keywords: mobile robotics ; reinforcement learning ; artificial neural networks ; simulation ; real world
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract We present a case study of reinforcement learning on a real robot that learns how to back up a trailer and discuss the lessons learned about the importance of proper experimental procedure and design. We identify areas of particular concern to the experimental robotics community at large. In particular, we address concerns pertinent to robotics simulation research, implementing learning algorithms on real robotic hardware, and the difficulties involved with transferring research between the two.
    Type of Medium: Electronic Resource
  • 28
    Electronic Resource
    Springer
    Neural processing letters 3 (1996), pp. 11-15
    ISSN: 1573-773X
    Keywords: hardware realisation ; RAM-based nodes ; reinforcement learning ; reward-penalty
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract RAM-based neural networks are designed to be efficiently implemented in hardware. The desire to retain this property influences the training algorithms used, and has led to the use of reinforcement (reward-penalty) learning. An analysis of the reinforcement algorithm applied to RAM-based nodes has shown the ease with which unlearning can occur. An amended algorithm is proposed which demonstrates improved learning performance compared to previously published reinforcement regimes.
    Type of Medium: Electronic Resource
  • 29
    Electronic Resource
    Springer
    Neural processing letters 9 (1999), pp. 119-127
    ISSN: 1573-773X
    Keywords: reinforcement learning ; neurocontrol ; optimization ; polytope algorithm ; pole balancing ; genetic reinforcement
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract A new training algorithm is presented for delayed reinforcement learning problems that does not assume the existence of a critic model and employs the polytope optimization algorithm to adjust the weights of the action network so that a simple direct measure of the training performance is maximized. Experimental results from the application of the method to the pole balancing problem indicate improved training performance compared with critic-based and genetic reinforcement approaches.
    Type of Medium: Electronic Resource
  • 30
    Electronic Resource
    Springer
    Neural processing letters 4 (1996), pp. 167-172
    ISSN: 1573-773X
    Keywords: fuzzy min-max neural network ; reinforcement learning ; autonomous vehicle navigation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The fuzzy min-max neural network constitutes a neural architecture that is based on hyperbox fuzzy sets and can be incrementally trained by appropriately adjusting the number of hyperboxes and their corresponding volumes. Two versions have been proposed: for supervised and unsupervised learning. In this paper a modified approach is presented that is appropriate for reinforcement learning problems with discrete action space and is applied to the difficult task of autonomous vehicle navigation when no a priori knowledge of the environment is available. Experimental results indicate that the proposed reinforcement learning network exhibits superior learning behavior compared to conventional reinforcement schemes.
    Type of Medium: Electronic Resource
  • 31
    Electronic Resource
    Springer
    Journal of computational neuroscience 6 (1999), pp. 191-214
    ISSN: 1573-6873
    Keywords: Neural network ; prefrontal ; reinforcement learning ; striatum ; timing
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Medicine, Physics
    Notes: Abstract A neural network model of how dopamine and prefrontal cortex activity guides short- and long-term information processing within the cortico-striatal circuits during reward-related learning of approach behavior is proposed. The model predicts two types of reward-related neuronal responses generated during learning: (1) cell activity signaling errors in the prediction of the expected time of reward delivery and (2) neural activations coding for errors in the prediction of the amount and type of reward or stimulus expectancies. The former type of signal is consistent with the responses of dopaminergic neurons, while the latter signal is consistent with reward expectancy responses reported in the prefrontal cortex. It is shown that a neural network architecture that satisfies the design principles of the adaptive resonance theory of Carpenter and Grossberg (1987) can account for the dopamine responses to novelty, generalization, and discrimination of appetitive and aversive stimuli. These hypotheses are scrutinized via simulations of the model in relation to the delivery of free food outside a task, the timed contingent delivery of appetitive and aversive stimuli, and an asymmetric, instructed delay response task.
    Type of Medium: Electronic Resource
  • 32
    Electronic Resource
    Springer
    Artificial intelligence review 12 (1998), pp. 445-468
    ISSN: 1573-7462
    Keywords: architectures for autonomous robots ; artificial neural networks ; behaviour-based robots ; emergent properties ; reinforcement learning ; supervised learning ; unsupervised learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper is a survey of some recent connectionist approaches to the design and development of behaviour-based mobile robots. The research is analysed in terms of principal connectionist learning methods and neurological modeling trends. Possible advantages over conventionally programmed methods are considered and the connectionist achievements to date are assessed. A realistic view is taken of the prospects for medium-term progress and some observations are made concerning the direction this might profitably take.
    Type of Medium: Electronic Resource
  • 33
    Electronic Resource
    Springer
    Artificial intelligence review 11 (1997), pp. 343-370
    ISSN: 1573-7462
    Keywords: lazy learning ; nearest neighbor ; genetic algorithms ; differential games ; pursuit games ; teaching ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic algorithms can be used in conjunction with lazy learning to solve examples of a difficult class of delayed reinforcement learning problems better than either method alone. This class, the class of differential games, includes numerous important control problems that arise in robotics, planning, game playing, and other areas, and solutions for differential games suggest solution strategies for the general class of planning and control problems. We conducted a series of experiments applying three learning approaches – lazy Q-learning, k-nearest neighbor (k-NN), and a genetic algorithm – to a particular differential game called a pursuit game. Our experiments demonstrate that k-NN had great difficulty solving the problem, while a lazy version of Q-learning performed moderately well and the genetic algorithm performed even better. These results motivated the next step in the experiments, where we hypothesized k-NN was having difficulty because it did not have good examples – a common source of difficulty for lazy learning. Therefore, we used the genetic algorithm as a bootstrapping method for k-NN to create a system to provide these examples. Our experiments demonstrate that the resulting joint system learned to solve the pursuit games with a high degree of accuracy – outperforming either method alone – and with relatively small memory requirements.
    Type of Medium: Electronic Resource
  • 34
    Electronic Resource
    Springer
    Autonomous robots 4 (1997), pp. 73-83
    ISSN: 1573-7527
    Keywords: robotics ; robot learning ; group behavior ; multi-agent systems ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.
    Type of Medium: Electronic Resource
  • 35
    Electronic Resource
    Springer
    Autonomous robots 6 (1999), pp. 187-201
    ISSN: 1573-7527
    Keywords: landmark recognition ; learning in computer vision ; local learning ; recognition feedback ; reinforcement learning ; traffic sign recognition
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract Current machine perception techniques that typically use segmentation followed by object recognition lack the required robustness to cope with the large variety of situations encountered in real-world navigation. Many existing techniques are brittle in the sense that even minor changes in the expected task environment (e.g., different lighting conditions, geometrical distortion, etc.) can severely degrade the performance of the system or even make it fail completely. In this paper we present a system that achieves robust performance by using local reinforcement learning to induce a highly adaptive mapping from input images to segmentation strategies for successful recognition. This is accomplished by using the confidence level of model matching as reinforcement to drive learning. Local reinforcement learning gives rise to greater improvement in recognition performance. The system is verified through experiments on a large set of real images of traffic signs.
    Type of Medium: Electronic Resource
  • 36
    Electronic Resource
    Springer
    Autonomous robots 7 (1999), pp. 41-56
    ISSN: 1573-7527
    Keywords: classical conditioning ; reinforcement learning ; biological models
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract Classical conditioning is a basic learning mechanism in animals and can be found in almost all organisms. If we want to construct robots with abilities matching those of their biological counterparts, this is one of the learning mechanisms that needs to be implemented first. This article describes a computational model of classical conditioning where the goal of learning is assumed to be the prediction of a temporally discounted reward or punishment based on the current stimulus situation. The model is well suited for robotic implementation as it models a number of classical conditioning paradigms and learning in the model is guaranteed to converge with arbitrarily complex stimulus sequences. This is an essential feature once the step is taken beyond the simple laboratory experiment with two or three stimuli to the real world where no such limitations exist. It is also demonstrated how the model can be included in a more complex system that includes various forms of sensory pre-processing and how it can handle reinforcement learning, timing of responses and function as an adaptive world model.
    Type of Medium: Electronic Resource
  • 37
    Electronic Resource
    Springer
    Autonomous robots 7 (1999), pp. 77-88
    ISSN: 1573-7527
    Keywords: reinforcement learning ; CMAC ; world models ; simulated soccer ; Q(λ) ; evolutionary computation ; PIPE
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract We use reinforcement learning (RL) to compute strategies for multiagent soccer teams. RL may profit significantly from world models (WMs) estimating state transition probabilities and rewards. In high-dimensional, continuous input spaces, however, learning accurate WMs is intractable. Here we show that incomplete WMs can help to quickly find good action selection policies. Our approach is based on a novel combination of CMACs and prioritized sweeping-like algorithms. Variants thereof outperform both Q(λ)-learning with CMACs and the evolutionary method Probabilistic Incremental Program Evolution (PIPE) which performed best in previous comparisons.
    Type of Medium: Electronic Resource
  • 38
    Electronic Resource
    Springer
    Autonomous robots 7 (1999), pp. 57-75
    ISSN: 1573-7527
    Keywords: sensor-based manipulators ; multi-goal reaching tasks ; reinforcement learning ; neural networks ; differential inverse kinematics
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science, Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract Our work focuses on making an autonomous robot manipulator learn suitable collision-free motions from local sensory data while executing high-level descriptions of tasks. The robot arm must reach a sequence of targets where it undertakes some manipulation. The robot manipulator has a sonar sensing skin covering its links to perceive the obstacles in its surroundings. We use reinforcement learning for that purpose, and the neural controller acquires appropriate reaction strategies in acceptable time provided it has some a priori knowledge. This knowledge is specified in two main ways: an appropriate codification of the signals of the neural controller—inputs, outputs and reinforcement—and decomposition of the learning task. The codification facilitates the generalization capabilities of the network as it takes advantage of inherent symmetries and is quite goal-independent. On the other hand, the task of reaching a certain goal position is decomposed into two sequential subtasks: negotiate obstacles and move to goal. Experimental results show that the controller achieves a good performance incrementally in a reasonable time and exhibits high tolerance to failing sensors.
    Type of Medium: Electronic Resource
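
The short sketches below are editorial illustrations appended to this listing; they are not part of the catalog records. Items 1 and 12 contrast conventional (accumulating) and replacing eligibility traces. This minimal Python sketch shows the two trace updates in online tabular TD(λ) on a toy five-state random walk; the chain size, step size, and episode count are invented for illustration, and the paper itself analyzes the offline TD(1) case.

```python
# Editorial sketch for items 1 and 12: conventional (accumulating) versus
# replacing eligibility traces in online tabular TD(lambda) on a toy
# five-state random walk. All constants here are invented for illustration.
import random

N_STATES = 5                          # non-terminal states 0..4
ALPHA, GAMMA, LAM = 0.1, 1.0, 1.0     # undiscounted TD(1), as in the abstract

def run_episode(values, trace_kind):
    traces = [0.0] * N_STATES
    state = N_STATES // 2             # start in the middle
    while True:
        next_state = state + random.choice([-1, 1])
        done = next_state < 0 or next_state == N_STATES
        reward = 1.0 if next_state == N_STATES else 0.0  # exiting right pays 1
        target = 0.0 if done else values[next_state]
        delta = reward + GAMMA * target - values[state]
        for s in range(N_STATES):     # decay every trace first
            traces[s] *= GAMMA * LAM
        if trace_kind == "accumulating":
            traces[state] += 1.0      # conventional trace: revisits add up
        else:
            traces[state] = 1.0       # replacing trace: revisits reset to 1
        for s in range(N_STATES):
            values[s] += ALPHA * delta * traces[s]
        if done:
            return
        state = next_state

for kind in ("accumulating", "replacing"):
    random.seed(0)
    v = [0.0] * N_STATES
    for _ in range(2000):
        run_episode(v, kind)
    print(kind, [round(x, 2) for x in v])
```

For this symmetric walk the true value of state i is (i + 1)/6, so both variants should approach roughly [0.17, 0.33, 0.5, 0.67, 0.83]; the paper's analysis predicts lower long-run error for the replacing variant.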
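
Items 14 and 22 (and, with a faster update rule, item 8) concern Q(λ)-learning, which attaches TD(λ)-style eligibility traces to Q-learning so that a single temporal-difference error updates the whole recent state-action history. The sketch below is a Watkins-style tabular variant that cuts traces after exploratory actions; it is not Peng and Williams' exact algorithm, and the one-dimensional grid world and all constants are invented for the example.

```python
# Editorial sketch for items 14 and 22: a Watkins-style tabular Q(lambda).
# Traces are cut after exploratory actions; this is not Peng & Williams'
# exact variant, and the toy grid world is invented for illustration.
import random
from collections import defaultdict

ALPHA, GAMMA, LAM, EPS = 0.1, 0.95, 0.9, 0.1
GOAL = 6                              # walk right from state 0 to state 6
Q = defaultdict(float)                # Q[(state, action)], zero-initialized

def greedy_action(state):
    # greedy action with random tie-breaking
    best = max(Q[(state, -1)], Q[(state, 1)])
    return random.choice([a for a in (-1, 1) if Q[(state, a)] == best])

def step(state, action):              # action: -1 = left, +1 = right
    nxt = max(0, state + action)      # reflecting barrier at 0
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

random.seed(0)
for _ in range(500):
    traces = defaultdict(float)
    state, done = 0, False
    while not done:
        greedy = greedy_action(state)
        action = random.choice((-1, 1)) if random.random() < EPS else greedy
        nxt, reward, done = step(state, action)
        best_next = 0.0 if done else max(Q[(nxt, -1)], Q[(nxt, 1)])
        delta = reward + GAMMA * best_next - Q[(state, action)]
        traces[(state, action)] += 1.0        # mark the pair just taken
        exploratory = action != greedy        # trace-cutting test
        for key in list(traces):
            Q[key] += ALPHA * delta * traces[key]
            # decay traces, or drop them after an exploratory move, since
            # the remaining trajectory no longer follows the greedy policy
            traces[key] = 0.0 if exploratory else traces[key] * GAMMA * LAM
        state = nxt

print(round(max(Q[(0, -1)], Q[(0, 1)]), 3))   # learned start-state value
```

The trace lets one temporal-difference error back up along the whole recent greedy trajectory at once, which is the acceleration over plain Q-learning that these abstracts describe.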
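
Item 15 analyzes the bias and variance of Monte Carlo and TD(λ) value estimators. For reference, a standard statement of the n-step return and the λ-return that these estimators target, in notation assumed here rather than taken from the paper:

```latex
G_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^n V(s_{t+n}),
\qquad
G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}
```

Setting λ = 0 recovers the one-step TD target and λ = 1 the Monte Carlo return, which is why the eligibility-trace parameter interpolates between the two families of estimators the paper compares.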