ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Export
  • 1
    Electronic Resource
    Palo Alto, Calif. : Annual Reviews
    Annual Review of Neuroscience 26 (2003), pp. 381-410
    ISSN: 0147-006X
    Source: Annual Reviews Electronic Back Volume Collection 1932-2001ff
    Topics: Biology , Medicine
    Notes: Abstract In the vertebrate nervous system, sensory stimuli are typically encoded through the concerted activity of large populations of neurons. Classically, these patterns of activity have been treated as encoding the value of the stimulus (e.g., the orientation of a contour), and computation has been formalized in terms of function approximation. More recently, there have been several suggestions that neural computation is akin to a Bayesian inference process, with population activity patterns representing uncertainty about stimuli in the form of probability distributions (e.g., the probability density function over the orientation of a contour). This paper reviews both approaches, with a particular emphasis on the latter, which we see as a very promising framework for future modeling and experimental work. (A minimal population-decoding sketch in this spirit appears after this list.)
    Type of Medium: Electronic Resource
  • 2
    Electronic Resource
    [s.l.] : Nature Publishing Group
    Nature 377 (1995), pp. 725-728
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Topics: Biology , Chemistry and Pharmacology , Medicine , Natural Sciences in General , Physics
    Notes: [Excerpt] Real and colleagues [8] performed a series of experiments on bumblebees foraging on artificial blue and yellow flowers whose colours were the only predictor of the nectar delivery. They examined how bees respond to the mean and variability of this delivery in a foraging version of a ...
    Type of Medium: Electronic Resource
  • 3
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Topics: Biology , Chemistry and Pharmacology , Medicine , Natural Sciences in General , Physics
    Notes: [Excerpt] The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In Pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have ...
    Type of Medium: Electronic Resource
  • 4
    Electronic Resource
    [s.l.] : Nature Publishing Group
    Nature 441 (2006), pp. 876-879
    ISSN: 1476-4687
    Source: Nature Archives 1869 - 2009
    Topics: Biology , Chemistry and Pharmacology , Medicine , Natural Sciences in General , Physics
    Notes: [Excerpt] Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this ‘exploration–exploitation’ dilemma, a gambler choosing between multiple slot machines balances the desire to select what seems, on ... (A bandit sketch of this dilemma appears after this list.)
    Type of Medium: Electronic Resource
  • 5
    Electronic Resource
    Springer
    Machine Learning 14 (1994), pp. 295-301
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal differences ; Q-learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values, as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence. (A minimal TD(0) sketch appears after this list.)
    Type of Medium: Electronic Resource
  • 6
    Electronic Resource
    Springer
    Machine Learning 32 (1998), pp. 5-40
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference ; Monte Carlo ; MSE ; bias ; variance ; eligibility trace ; Markov reward process
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good. (An empirical sketch of such mean-square-error curves appears after this list.)
    Type of Medium: Electronic Resource
  • 7
    Electronic Resource
    Springer
    Machine Learning 8 (1992), pp. 341-362
    ISSN: 0885-6125
    Keywords: Reinforcement learning ; temporal differences ; asynchronous dynamic programming
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones. It also considers how this version of TD behaves in the face of linearly dependent representations for states—demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD.
    Type of Medium: Electronic Resource
  • 8
    Electronic Resource
    Springer
    Machine Learning 25 (1996), pp. 5-22
    ISSN: 0885-6125
    Keywords: Reinforcement learning ; dynamic programming ; exploration bonuses ; certainty equivalence ; non-stationary environment
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton's DYNA system. (An exploration-bonus sketch appears after this list.)
    Type of Medium: Electronic Resource
  • 9
    Electronic Resource
    Springer
    Machine Learning 25 (1996), pp. 5-22
    ISSN: 0885-6125
    Keywords: Reinforcement learning ; dynamic programming ; exploration bonuses ; certainty equivalence ; non-stationary environment
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton's DYNA system.
    Type of Medium: Electronic Resource
  • 10
    Electronic Resource
    Springer
    Machine Learning 8 (1992), pp. 279-292
    ISSN: 0885-6125
    Keywords: Q-learning ; reinforcement learning ; temporal differences ; asynchronous dynamic programming
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one. (A minimal Q-learning sketch appears after this list.)
    Type of Medium: Electronic Resource
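
The Bayesian reading of population codes in entry 1 can be made concrete with a minimal decoding sketch: assume a bank of orientation-tuned cells with independent Poisson spike counts and a flat prior, and recover a full posterior distribution over orientation from one observed response. The tuning-curve parameters below are illustrative assumptions, not values from the review.

# Minimal sketch: reading a probability distribution out of a population code.
# Orientation-tuned cells, independent Poisson noise, flat prior; posterior by Bayes' rule.
import numpy as np

rng = np.random.default_rng(0)

thetas = np.linspace(0.0, 180.0, 181)          # candidate orientations (degrees)
preferred = np.linspace(0.0, 180.0, 32)        # preferred orientations of 32 cells
width, peak_rate, baseline = 20.0, 40.0, 2.0   # assumed tuning-curve parameters

def tuning(theta):
    """Mean spike counts of all cells for orientation theta (circular, period 180)."""
    d = np.minimum(np.abs(theta - preferred), 180.0 - np.abs(theta - preferred))
    return baseline + peak_rate * np.exp(-0.5 * (d / width) ** 2)

# Simulate one population response to a true orientation of 60 degrees.
true_theta = 60.0
counts = rng.poisson(tuning(true_theta))

# Posterior over orientation: p(theta | counts) proportional to prod_i Poisson(counts_i; f_i(theta)).
log_post = np.array([np.sum(counts * np.log(tuning(t)) - tuning(t)) for t in thetas])
post = np.exp(log_post - log_post.max())
post /= post.sum()

mean = np.sum(thetas * post)
print("posterior mean:", round(float(mean), 1))
print("posterior std :", round(float(np.sqrt(np.sum((thetas - mean) ** 2 * post))), 1))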
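
The exploration-exploitation dilemma in entry 4 is commonly illustrated with a multi-armed bandit. The sketch below uses a softmax over incrementally estimated payoffs, so apparently inferior machines are still sampled occasionally. It is a generic illustration under assumed payoff values, not the model fitted in the paper.

# Softmax bandit: higher beta means more exploitation, lower beta more exploration.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([1.0, 1.5, 0.5, 1.2])   # assumed slot-machine payoffs
n_arms = len(true_means)

q = np.zeros(n_arms)        # estimated mean payoff per arm
n = np.zeros(n_arms)        # pull counts
beta = 2.0                  # inverse temperature

total = 0.0
for t in range(2000):
    p = np.exp(beta * (q - q.max()))
    p /= p.sum()
    arm = rng.choice(n_arms, p=p)             # explore vs. exploit via softmax
    reward = rng.normal(true_means[arm], 1.0)
    n[arm] += 1
    q[arm] += (reward - q[arm]) / n[arm]      # incremental sample average
    total += reward

print("estimated means:", np.round(q, 2), " total reward:", round(total, 1))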
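
The temporal-difference prediction setting analysed in entries 5 and 7 reduces, in its simplest tabular form, to the TD(0) update V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)). A minimal sketch on an assumed 5-state random walk, whose true values are known in closed form, is:

# Tabular TD(0) on a 5-state absorbing random walk; reward 1 on exiting to the right.
import numpy as np

rng = np.random.default_rng(2)
n_states = 5
V = np.zeros(n_states)
alpha, gamma = 0.1, 1.0

for episode in range(5000):
    s = n_states // 2             # start in the middle
    while True:
        s_next = s + rng.choice([-1, 1])
        if s_next < 0:            # absorbed on the left, reward 0
            V[s] += alpha * (0.0 - V[s]); break
        if s_next >= n_states:    # absorbed on the right, reward 1
            V[s] += alpha * (1.0 - V[s]); break
        # TD(0) update: move V(s) toward r + gamma * V(s')
        V[s] += alpha * (0.0 + gamma * V[s_next] - V[s])
        s = s_next

print("TD estimates:", np.round(V, 2))
print("true values :", np.round(np.arange(1, n_states + 1) / (n_states + 1), 2))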
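
Entry 6 derives exact mean-square-error curves analytically. The sketch below only estimates such curves empirically, by averaging squared value errors over many independent runs of constant-step-size Monte Carlo and TD(0) with offline-style updates on the same 5-state random walk used in the TD(0) sketch above; it is an assumption-laden stand-in for the paper's exact analysis tool.

import numpy as np

rng = np.random.default_rng(5)
n_states, n_trials, n_episodes, alpha = 5, 200, 100, 0.1
true_v = np.arange(1, n_states + 1) / (n_states + 1)   # exact values of the walk

def mse_curve(method):
    """Average squared value error per episode over many independent runs."""
    mse = np.zeros(n_episodes)
    for trial in range(n_trials):
        V = np.full(n_states, 0.5)
        for ep in range(n_episodes):
            s, visited = n_states // 2, []
            while 0 <= s < n_states:
                visited.append(s)
                s += rng.choice([-1, 1])
            ret = 1.0 if s >= n_states else 0.0   # only a terminal reward, gamma = 1
            V_old = V.copy()                      # offline updates: targets use start-of-episode values
            for i, st in enumerate(visited):
                if method == "mc":
                    target = ret                  # Monte Carlo target: the observed return
                else:
                    target = ret if i + 1 == len(visited) else V_old[visited[i + 1]]
                V[st] += alpha * (target - V_old[st])
            mse[ep] += np.mean((V - true_v) ** 2)
    return mse / n_trials

print("MC    MSE after 100 episodes:", round(float(mse_curve("mc")[-1]), 4))
print("TD(0) MSE after 100 episodes:", round(float(mse_curve("td")[-1]), 4))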
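
Entries 8 and 9 turn a statistical model of uncertainty into exploratory behaviour via exploration bonuses. The sketch below uses the simpler Dyna-Q+-style bonus (kappa times the square root of the time since an action was last tried) on an assumed non-stationary bandit; it illustrates the bonus idea only, not the papers' certainty-equivalent computation.

# Greedy choice on estimated value plus a bonus that grows with time since last trial,
# so the agent keeps re-checking a changing world.
import numpy as np

rng = np.random.default_rng(3)
n_arms, kappa, alpha = 3, 0.3, 0.2
means = np.array([1.0, 0.0, 0.0])     # arm 0 starts best ...
q = np.zeros(n_arms)
last_tried = np.zeros(n_arms)

for t in range(1, 3001):
    if t == 1500:
        means = np.array([0.0, 2.0, 0.0])   # ... then the world changes
    bonus = kappa * np.sqrt(t - last_tried)
    arm = int(np.argmax(q + bonus))         # greedy with respect to value + bonus
    reward = rng.normal(means[arm], 0.5)
    q[arm] += alpha * (reward - q[arm])     # constant step size tracks the change
    last_tried[arm] = t

print("final estimates:", np.round(q, 2))   # should favour arm 1 after the switch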
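
Entry 10 concerns Q-learning, whose one-step backup is Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). A minimal tabular sketch on an assumed deterministic corridor, where the greedy values are easy to check by hand, is:

# Tabular Q-learning with epsilon-greedy exploration on a 6-state corridor; reward 1 at the goal.
import numpy as np

rng = np.random.default_rng(4)
n_states, goal = 6, 5                 # states 0..5
Q = np.zeros((n_states, 2))           # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != goal:
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s_next == goal else 0.0
        # Q-learning backup: bootstrap from the greedy value of the next state
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.round(Q, 2))   # right-action values should approach gamma ** (goal - s - 1)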