ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Export
  • 1
    Electronic Resource
    Palo Alto, Calif. : Annual Reviews
    Annual Review of Neuroscience 26 (2003), pp. 381-410
    ISSN: 0147-006X
    Source: Annual Reviews Electronic Back Volume Collection 1932-2001ff
    Topics: Biology , Medicine
    Notes: Abstract In the vertebrate nervous system, sensory stimuli are typically encoded through the concerted activity of large populations of neurons. Classically, these patterns of activity have been treated as encoding the value of the stimulus (e.g., the orientation of a contour), and computation has been formalized in terms of function approximation. More recently, there have been several suggestions that neural computation is akin to a Bayesian inference process, with population activity patterns representing uncertainty about stimuli in the form of probability distributions (e.g., the probability density function over the orientation of a contour). This paper reviews both approaches, with a particular emphasis on the latter, which we see as a very promising framework for future modeling and experimental work. (A minimal population-decoding sketch in this spirit appears after this list.)
    Type of Medium: Electronic Resource
  • 2
    Electronic Resource
    [s.l.] : Nature Publishing Group
    Nature 377 (1995), pp. 725-728
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Topics: Biology , Chemistry and Pharmacology , Medicine , Natural Sciences in General , Physics
    Notes: [Excerpt] Real and colleagues [8] performed a series of experiments on bumblebees foraging on artificial blue and yellow flowers whose colours were the only predictor of the nectar delivery. They examined how bees respond to the mean and variability of this delivery in a foraging version of a ...
    Type of Medium: Electronic Resource
  • 3
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Topics: Biology , Chemistry and Pharmacology , Medicine , Natural Sciences in General , Physics
    Notes: [Excerpt] The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In Pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have ...
    Type of Medium: Electronic Resource
  • 4
    Electronic Resource
    [s.l.] : Nature Publishing Group
    Nature 441 (2006), pp. 876-879
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Topics: Biology , Chemistry and Pharmacology , Medicine , Natural Sciences in General , Physics
    Notes: [Excerpt] Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this ‘exploration–exploitation’ dilemma, a gambler choosing between multiple slot machines balances the desire to select what seems, on ... (A bandit sketch of this dilemma appears after this list.)
    Type of Medium: Electronic Resource
  • 5
    Electronic Resource
    Springer
    Machine Learning 14 (1994), pp. 295-301
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal differences ; Q-learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values, as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence. (A minimal TD(0) sketch appears after this list.)
    Type of Medium: Electronic Resource
  • 6
    Electronic Resource
    Springer
    Machine Learning 32 (1998), pp. 5-40
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference ; Monte Carlo ; MSE ; bias ; variance ; eligibility trace ; Markov reward process
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good. (An empirical sketch of such mean-square-error curves appears after this list.)
    Type of Medium: Electronic Resource
  • 7
    Electronic Resource
    Springer
    Machine Learning 8 (1992), pp. 341-362
    ISSN: 0885-6125
    Keywords: Reinforcement learning ; temporal differences ; asynchronous dynamic programming
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones. It also considers how this version of TD behaves in the face of linearly dependent representations for states—demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD.
    Type of Medium: Electronic Resource
  • 8
    Electronic Resource
    Springer
    Machine Learning 25 (1996), pp. 5-22
    ISSN: 0885-6125
    Keywords: Reinforcement learning ; dynamic programming ; exploration bonuses ; certainty equivalence ; non-stationary environment
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton's DYNA system. (An exploration-bonus sketch appears after this list.)
    Type of Medium: Electronic Resource
  • 9
    Electronic Resource
    Springer
    Machine Learning 25 (1996), pp. 5-22
    ISSN: 0885-6125
    Keywords: Reinforcement learning ; dynamic programming ; exploration bonuses ; certainty equivalence ; non-stationary environment
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton's DYNA system.
    Type of Medium: Electronic Resource
  • 10
    Electronic Resource
    Springer
    Machine Learning 8 (1992), pp. 279-292
    ISSN: 0885-6125
    Keywords: Q-learning ; reinforcement learning ; temporal differences ; asynchronous dynamic programming
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one. (A minimal Q-learning sketch appears after this list.)
    Type of Medium: Electronic Resource
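
The Bayesian reading of population codes in entry 1 can be made concrete with a minimal decoding sketch: assume a bank of orientation-tuned cells with independent Poisson spike counts and a flat prior, and recover a full posterior distribution over orientation from one observed response. The tuning-curve parameters below are illustrative assumptions, not values from the review.

# Minimal sketch: reading a probability distribution out of a population code.
# Orientation-tuned cells, independent Poisson noise, flat prior; posterior by Bayes' rule.
import numpy as np

rng = np.random.default_rng(0)

thetas = np.linspace(0.0, 180.0, 181)          # candidate orientations (degrees)
preferred = np.linspace(0.0, 180.0, 32)        # preferred orientations of 32 cells
width, peak_rate, baseline = 20.0, 40.0, 2.0   # assumed tuning-curve parameters

def tuning(theta):
    """Mean spike counts of all cells for orientation theta (circular, period 180)."""
    d = np.minimum(np.abs(theta - preferred), 180.0 - np.abs(theta - preferred))
    return baseline + peak_rate * np.exp(-0.5 * (d / width) ** 2)

# Simulate one population response to a true orientation of 60 degrees.
true_theta = 60.0
counts = rng.poisson(tuning(true_theta))

# Posterior over orientation: p(theta | counts) proportional to prod_i Poisson(counts_i; f_i(theta)).
log_post = np.array([np.sum(counts * np.log(tuning(t)) - tuning(t)) for t in thetas])
post = np.exp(log_post - log_post.max())
post /= post.sum()

mean = np.sum(thetas * post)
print("posterior mean:", round(float(mean), 1))
print("posterior std :", round(float(np.sqrt(np.sum((thetas - mean) ** 2 * post))), 1))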
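
The exploration-exploitation dilemma in entry 4 is commonly illustrated with a multi-armed bandit. The sketch below uses a softmax over incrementally estimated payoffs, so apparently inferior machines are still sampled occasionally. It is a generic illustration under assumed payoff values, not the model fitted in the paper.

# Softmax bandit: higher beta means more exploitation, lower beta more exploration.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([1.0, 1.5, 0.5, 1.2])   # assumed slot-machine payoffs
n_arms = len(true_means)

q = np.zeros(n_arms)        # estimated mean payoff per arm
n = np.zeros(n_arms)        # pull counts
beta = 2.0                  # inverse temperature

total = 0.0
for t in range(2000):
    p = np.exp(beta * (q - q.max()))
    p /= p.sum()
    arm = rng.choice(n_arms, p=p)             # explore vs. exploit via softmax
    reward = rng.normal(true_means[arm], 1.0)
    n[arm] += 1
    q[arm] += (reward - q[arm]) / n[arm]      # incremental sample average
    total += reward

print("estimated means:", np.round(q, 2), " total reward:", round(total, 1))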
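
The temporal-difference prediction setting analysed in entries 5 and 7 reduces, in its simplest tabular form, to the TD(0) update V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)). A minimal sketch on an assumed 5-state random walk, whose true values are known in closed form, is:

# Tabular TD(0) on a 5-state absorbing random walk; reward 1 on exiting to the right.
import numpy as np

rng = np.random.default_rng(2)
n_states = 5
V = np.zeros(n_states)
alpha, gamma = 0.1, 1.0

for episode in range(5000):
    s = n_states // 2             # start in the middle
    while True:
        s_next = s + rng.choice([-1, 1])
        if s_next < 0:            # absorbed on the left, reward 0
            V[s] += alpha * (0.0 - V[s]); break
        if s_next >= n_states:    # absorbed on the right, reward 1
            V[s] += alpha * (1.0 - V[s]); break
        # TD(0) update: move V(s) toward r + gamma * V(s')
        V[s] += alpha * (0.0 + gamma * V[s_next] - V[s])
        s = s_next

print("TD estimates:", np.round(V, 2))
print("true values :", np.round(np.arange(1, n_states + 1) / (n_states + 1), 2))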
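
Entry 6 derives exact mean-square-error curves analytically. The sketch below only estimates such curves empirically, by averaging squared value errors over many independent runs of constant-step-size Monte Carlo and TD(0) with offline-style updates on the same 5-state random walk used in the TD(0) sketch above; it is an assumption-laden stand-in for the paper's exact analysis tool.

import numpy as np

rng = np.random.default_rng(5)
n_states, n_trials, n_episodes, alpha = 5, 200, 100, 0.1
true_v = np.arange(1, n_states + 1) / (n_states + 1)   # exact values of the walk

def mse_curve(method):
    """Average squared value error per episode over many independent runs."""
    mse = np.zeros(n_episodes)
    for trial in range(n_trials):
        V = np.full(n_states, 0.5)
        for ep in range(n_episodes):
            s, visited = n_states // 2, []
            while 0 <= s < n_states:
                visited.append(s)
                s += rng.choice([-1, 1])
            ret = 1.0 if s >= n_states else 0.0   # only a terminal reward, gamma = 1
            V_old = V.copy()                      # offline updates: targets use start-of-episode values
            for i, st in enumerate(visited):
                if method == "mc":
                    target = ret                  # Monte Carlo target: the observed return
                else:
                    target = ret if i + 1 == len(visited) else V_old[visited[i + 1]]
                V[st] += alpha * (target - V_old[st])
            mse[ep] += np.mean((V - true_v) ** 2)
    return mse / n_trials

print("MC    MSE after 100 episodes:", round(float(mse_curve("mc")[-1]), 4))
print("TD(0) MSE after 100 episodes:", round(float(mse_curve("td")[-1]), 4))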
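
Entries 8 and 9 turn a statistical model of uncertainty into exploratory behaviour via exploration bonuses. The sketch below uses the simpler Dyna-Q+-style bonus (kappa times the square root of the time since an action was last tried) on an assumed non-stationary bandit; it illustrates the bonus idea only, not the papers' certainty-equivalent computation.

# Greedy choice on estimated value plus a bonus that grows with time since last trial,
# so the agent keeps re-checking a changing world.
import numpy as np

rng = np.random.default_rng(3)
n_arms, kappa, alpha = 3, 0.3, 0.2
means = np.array([1.0, 0.0, 0.0])     # arm 0 starts best ...
q = np.zeros(n_arms)
last_tried = np.zeros(n_arms)

for t in range(1, 3001):
    if t == 1500:
        means = np.array([0.0, 2.0, 0.0])   # ... then the world changes
    bonus = kappa * np.sqrt(t - last_tried)
    arm = int(np.argmax(q + bonus))         # greedy with respect to value + bonus
    reward = rng.normal(means[arm], 0.5)
    q[arm] += alpha * (reward - q[arm])     # constant step size tracks the change
    last_tried[arm] = t

print("final estimates:", np.round(q, 2))   # should favour arm 1 after the switch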
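
Entry 10 concerns Q-learning, whose one-step backup is Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). A minimal tabular sketch on an assumed deterministic corridor, where the greedy values are easy to check by hand, is:

# Tabular Q-learning with epsilon-greedy exploration on a 6-state corridor; reward 1 at the goal.
import numpy as np

rng = np.random.default_rng(4)
n_states, goal = 6, 5                 # states 0..5
Q = np.zeros((n_states, 2))           # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != goal:
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s_next == goal else 0.0
        # Q-learning backup: bootstrap from the greedy value of the next state
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.round(Q, 2))   # right-action values should approach gamma ** (goal - s - 1)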