ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Collection: Articles (77)
  • Keywords: stability (39); reinforcement learning (38)
  • Publisher: Springer (77); Blackwell Publishing Ltd
  • Years: 1995-1999 (73); 1980-1984 (4); 1925-1929
  • Topics: Computer Science (77)
  • 1
    Electronic Resource
    Springer
    Machine learning 28 (1997), S. 169-210 
    ISSN: 0885-6125
    Keywords: Explanation-based learning ; reinforcement learning ; dynamic programming ; goal regression ; speedup learning ; incomplete theory problem ; intractable theory problem
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In speedup-learning problems, where full descriptions of operators are known, both explanation-based learning (EBL) and reinforcement learning (RL) methods can be applied. This paper shows that both methods involve fundamentally the same process of propagating information backward from the goal toward the starting state. Most RL methods perform this propagation on a state-by-state basis, while EBL methods compute the weakest preconditions of operators, and hence, perform this propagation on a region-by-region basis. Barto, Bradtke, and Singh (1995) have observed that many algorithms for reinforcement learning can be viewed as asynchronous dynamic programming. Based on this observation, this paper shows how to develop dynamic programming versions of EBL, which we call region-based dynamic programming or Explanation-Based Reinforcement Learning (EBRL). The paper compares batch and online versions of EBRL to batch and online versions of point-based dynamic programming and to standard EBL. The results show that region-based dynamic programming combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of reinforcement learning algorithms (learning of optimal policies). Results are shown in chess endgames and in synthetic maze tasks.
    Type of Medium: Electronic Resource
  • 2
    ISSN: 0885-6125
    Keywords: reinforcement learning ; vision ; learning from easy mission ; state-action deviation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying the reinforcement learning method to a real robot with a vision sensor, by which the robot can obtain information about the changes in an environment. First, we construct a state space in terms of size, position, and orientation of a ball and a goal in an image, and an action space is designed in terms of the action commands to be sent to the left and right motors of a mobile robot. This causes a state-action deviation problem in constructing the state and action spaces that reflect the outputs from physical sensors and actuators, respectively. To deal with this issue, an action set is constructed in such a way that one action consists of a series of the same action primitive which is successively executed until the current state changes. Next, to speed up the learning time, a mechanism of Learning from Easy Missions (or LEM) is implemented. LEM reduces the learning time from exponential to almost linear order in the size of the state space. The results of computer simulations and real robot experiments are given.
    Type of Medium: Electronic Resource
  • 3
    Electronic Resource
    Springer
    Machine learning 32 (1998), S. 5-40 
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference ; Monte Carlo ; MSE ; bias ; variance ; eligibility trace ; Markov reward process
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good.
    Type of Medium: Electronic Resource
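    The comparison in the record above is analytical, but the two estimator families it studies are easy to exercise empirically. A minimal sketch on a toy two-state absorbing Markov reward process (the chain and all names here are illustrative, not from the paper):

    ```python
    import random

    def run_episode(rng):
        """One episode of a toy absorbing Markov reward process:
        state 0 yields reward 1 and moves to state 1 with prob 0.5
        (else terminates); state 1 yields reward 2 and terminates."""
        episode, state = [], 0
        while state is not None:
            if state == 0:
                nxt = 1 if rng.random() < 0.5 else None
                episode.append((0, 1.0))
            else:
                nxt = None
                episode.append((1, 2.0))
            state = nxt
        return episode

    def mc_estimate(episodes):
        """First-visit Monte Carlo estimate of V(0) (undiscounted)."""
        return sum(sum(r for _, r in ep) for ep in episodes) / len(episodes)

    def td0_estimates(episodes, alpha=0.1, sweeps=200):
        """TD(0) with offline updates: replay the fixed batch repeatedly."""
        v = {0: 0.0, 1: 0.0}
        for _ in range(sweeps):
            for ep in episodes:
                for i, (s, r) in enumerate(ep):
                    v_next = v[ep[i + 1][0]] if i + 1 < len(ep) else 0.0
                    v[s] += alpha * (r + v_next - v[s])
        return v

    rng = random.Random(0)
    episodes = [run_episode(rng) for _ in range(1000)]
    print(mc_estimate(episodes), td0_estimates(episodes)[0])  # both near V(0) = 2
    ```

    Averaging such runs over many sampled batches is what produces the mean-square-error curves the paper derives in closed form.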
  • 4
    ISSN: 0885-6125
    Keywords: inductive bias ; reinforcement learning ; reward acceleration ; Levin search ; success-story algorithm ; incremental self-improvement
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate long-term effects of bias shifts setting the stage for later bias shifts, we use the “success-story algorithm” (SSA). SSA is occasionally called at times that may depend on the policy itself. It uses backtracking to undo those bias shifts that have not been empirically observed to trigger long-term reward accelerations (measured up until the current SSA call). Bias shifts that survive SSA represent a lifelong success history. Until the next SSA call, they are considered useful and build the basis for additional bias shifts. SSA allows for plugging in a wide variety of learning algorithms. We plug in (1) a novel, adaptive extension of Levin search and (2) a method for embedding the learner's policy modification strategy within the policy itself (incremental self-improvement). Our inductive transfer case studies involve complex, partially observable environments where traditional reinforcement learning fails.
    Type of Medium: Electronic Resource
  • 5
    Electronic Resource
    Springer
    Machine learning 22 (1996), S. 123-158 
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference learning ; eligibility trace ; Monte Carlo method ; Markov chain ; CMAC
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator.
    Type of Medium: Electronic Resource
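    The replacing/accumulating distinction described above comes down to a one-line difference in code. A minimal sketch under illustrative names:

    ```python
    def update_traces(trace, visited_state, decay, kind="replacing"):
        """Decay all eligibility traces, then credit the visited state.

        With accumulating (conventional) traces repeated visits add up;
        with replacing traces the visited state's trace is reset to 1.
        """
        for s in trace:
            trace[s] *= decay  # decay = gamma * lambda
        if kind == "accumulating":
            trace[visited_state] += 1.0
        else:  # replacing
            trace[visited_state] = 1.0
        return trace

    # Visiting the same state twice in a row:
    acc = {0: 0.0}
    rep = {0: 0.0}
    for _ in range(2):
        update_traces(acc, 0, decay=0.9, kind="accumulating")
        update_traces(rep, 0, decay=0.9, kind="replacing")
    print(acc[0], rep[0])  # only the accumulating trace exceeds 1.0
    ```

    This is exactly the extra credit to repeated events that the analysis above identifies as the source of the conventional trace's bias.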
  • 6
    Electronic Resource
    Springer
    Machine learning 22 (1996), S. 283-290 
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
    Type of Medium: Electronic Resource
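    The combination described above can be sketched as a tabular update rule. This is a common Watkins-style Q(λ) variant, not necessarily the exact rule of the paper; the two actions and all names are illustrative:

    ```python
    def q_lambda_step(Q, E, s, a, r, s_next, alpha=0.1, gamma=0.95, lam=0.8):
        """One tabular Q(lambda) update (Watkins-style sketch).

        Q and E map (state, action) pairs to values and eligibilities;
        the action set here is just the two illustrative actions 0 and 1.
        """
        best_next = max(Q.get((s_next, b), 0.0) for b in (0, 1))
        delta = r + gamma * best_next - Q.get((s, a), 0.0)
        E[(s, a)] = E.get((s, a), 0.0) + 1.0   # mark the visited pair eligible
        for key in list(E):
            # lambda distributes the TD error backward along the trace
            Q[key] = Q.get(key, 0.0) + alpha * delta * E[key]
            E[key] *= gamma * lam
        return Q, E

    Q, E = {}, {}
    Q, E = q_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=2)
    print(Q[(0, 1)])  # 0.1 = alpha * delta, with delta = r = 1.0
    ```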
  • 7
    Electronic Resource
    Springer
    Machine learning 22 (1996), S. 123-158 
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference learning ; eligibility trace ; Monte Carlo method ; Markov chain ; CMAC
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis is for conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge under repeated presentations of the training set to the same predictions as two well known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that the method corresponding to replacing traces is closely related to the maximum likelihood solution for these tasks, and that its mean squared error is always lower in the long run. Computational results confirm these analyses and show that they are applicable more generally. In particular, we show that replacing traces significantly improve performance and reduce parameter sensitivity on the "Mountain-Car" task, a full reinforcement-learning problem with a continuous state space, when using a feature-based function approximator.
    Type of Medium: Electronic Resource
  • 8
    ISSN: 0885-6125
    Keywords: reinforcement learning ; vision ; learning from easy mission ; state-action deviation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents a method of vision-based reinforcement learning by which a robot learns to shoot a ball into a goal. We discuss several issues in applying the reinforcement learning method to a real robot with a vision sensor, by which the robot can obtain information about the changes in an environment. First, we construct a state space in terms of size, position, and orientation of a ball and a goal in an image, and an action space is designed in terms of the action commands to be sent to the left and right motors of a mobile robot. This causes a “state-action deviation” problem in constructing the state and action spaces that reflect the outputs from physical sensors and actuators, respectively. To deal with this issue, an action set is constructed in such a way that one action consists of a series of the same action primitive which is successively executed until the current state changes. Next, to speed up the learning time, a mechanism of Learning from Easy Missions (or LEM) is implemented. LEM reduces the learning time from exponential to almost linear order in the size of the state space. The results of computer simulations and real robot experiments are given.
    Type of Medium: Electronic Resource
  • 9
    Electronic Resource
    Springer
    Machine learning 19 (1995), S. 209-240 
    ISSN: 0885-6125
    Keywords: learning classifier systems ; reinforcement learning ; genetic algorithms ; animat problem
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constraints which are addressed by three main tools: parallelism, distributed architecture, and training. Parallelism is useful to speed up computation and to increase the flexibility of the learning system design. Distributed architecture helps in making it possible to decompose the overall task into a set of simpler learning tasks. Finally, training provides guidance to the system while learning, shortening the number of cycles required to learn. These tools and the issues they raise are first studied in simulation, and then the experience gained with simulations is used to implement the learning system on the real robot. Results have shown that with this approach it is possible to let the AutonoMouse, a small real robot, learn to approach a light source under a number of different noise and lesion conditions.
    Type of Medium: Electronic Resource
  • 10
    Electronic Resource
    Springer
    Machine learning 35 (1999), S. 155-185 
    ISSN: 0885-6125
    Keywords: reinforcement learning ; multi-agent systems ; planning ; evolutionary economics ; tragedy of the commons ; classifier systems ; agoric systems ; autonomous programming ; cognition ; artificial intelligence ; Hayek ; complex adaptive systems ; temporal difference learning ; evolutionary computation ; economic models of mind ; economic models of computation ; Blocks World ; reasoning ; learning ; computational learning theory ; learning to reason ; meta-reasoning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract A market-based algorithm is presented which autonomously apportions complex tasks to multiple cooperating agents, giving each agent the motivation of improving performance of the whole system. A specific model, called “The Hayek Machine”, is proposed and tested on a simulated Blocks World (BW) planning problem. Hayek learns to solve more complex BW problems than any previous learning algorithm. Given intermediate reward and simple features, it has learned to efficiently solve arbitrary BW problems. The Hayek Machine can also be seen as a model of evolutionary economics.
    Type of Medium: Electronic Resource
  • 11
    Electronic Resource
    Springer
    Machine learning 20 (1995), S. 23-33 
    ISSN: 0885-6125
    Keywords: stability ; bias ; accuracy ; repeatability ; agreement ; similarity
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Research on bias in machine learning algorithms has generally been concerned with the impact of bias on predictive accuracy. We believe that there are other factors that should also play a role in the evaluation of bias. One such factor is the stability of the algorithm; in other words, the repeatability of the results. If we obtain two sets of data from the same phenomenon, with the same underlying probability distribution, then we would like our learning algorithm to induce approximately the same concepts from both sets of data. This paper introduces a method for quantifying stability, based on a measure of the agreement between concepts. We also discuss the relationships among stability, predictive accuracy, and bias.
    Type of Medium: Electronic Resource
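    The agreement-based notion of stability above can be illustrated with any simple learner: induce a concept from each of two samples drawn from the same distribution, then measure how often the two concepts agree. A hedged sketch with a toy threshold learner (the learner and all names are illustrative, not the paper's measure):

    ```python
    import random

    def learn_threshold(sample):
        """Toy learner: classify x as positive iff x >= threshold, where
        the threshold is the midpoint between the two class means."""
        pos = [x for x, y in sample if y == 1]
        neg = [x for x, y in sample if y == 0]
        return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def agreement(t1, t2, points):
        """Fraction of points on which the two induced concepts agree."""
        same = sum((x >= t1) == (x >= t2) for x in points)
        return same / len(points)

    rng = random.Random(1)

    def draw(n):
        # Two samples from the same underlying distribution:
        # class y has inputs distributed around y with noise.
        return [(rng.gauss(y, 0.5), y) for y in [0, 1] * n]

    t1, t2 = learn_threshold(draw(50)), learn_threshold(draw(50))
    points = [rng.uniform(-1, 2) for _ in range(1000)]
    print(agreement(t1, t2, points))  # close to 1.0 for a stable learner
    ```

    An unstable learner would induce very different thresholds from the two samples, driving the agreement score down even if both thresholds predict equally accurately.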
  • 12
    Electronic Resource
    Springer
    Machine learning 22 (1996), S. 59-94 
    ISSN: 0885-6125
    Keywords: Compact representation ; curse of dimensionality ; dynamic programming ; features ; function approximation ; neuro-dynamic programming ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
    Type of Medium: Electronic Resource
  • 13
    Electronic Resource
    Springer
    Machine learning 19 (1995), S. 209-240 
    ISSN: 0885-6125
    Keywords: learning classifier systems ; reinforcement learning ; genetic algorithms ; animat problem
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constraints which are addressed by three main tools: parallelism, distributed architecture, and training. Parallelism is useful to speed up computation and to increase the flexibility of the learning system design. Distributed architecture helps in making it possible to decompose the overall task into a set of simpler learning tasks. Finally, training provides guidance to the system while learning, shortening the number of cycles required to learn. These tools and the issues they raise are first studied in simulation, and then the experience gained with simulations is used to implement the learning system on the real robot. Results have shown that with this approach it is possible to let the AutonoMouse, a small real robot, learn to approach a light source under a number of different noise and lesion conditions.
    Type of Medium: Electronic Resource
  • 14
    Electronic Resource
    Springer
    Machine learning 22 (1996), S. 59-94 
    ISSN: 0885-6125
    Keywords: Compact representation ; curse of dimensionality ; dynamic programming ; features ; function approximation ; neuro-dynamic programming ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
    Type of Medium: Electronic Resource
  • 15
    Electronic Resource
    Springer
    Machine learning 22 (1996), S. 283-290 
    ISSN: 0885-6125
    Keywords: reinforcement learning ; temporal difference learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
    Type of Medium: Electronic Resource
  • 16
    Electronic Resource
    Springer
    Machine learning 20 (1995), S. 23-33 
    ISSN: 0885-6125
    Keywords: stability ; bias ; accuracy ; repeatability ; agreement ; similarity
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Research on bias in machine learning algorithms has generally been concerned with the impact of bias on predictive accuracy. We believe that there are other factors that should also play a role in the evaluation of bias. One such factor is the stability of the algorithm; in other words, the repeatability of the results. If we obtain two sets of data from the same phenomenon, with the same underlying probability distribution, then we would like our learning algorithm to induce approximately the same concepts from both sets of data. This paper introduces a method for quantifying stability, based on a measure of the agreement between concepts. We also discuss the relationships among stability, predictive accuracy, and bias.
    Type of Medium: Electronic Resource
  • 17
    Electronic Resource
    Springer
    Machine learning 28 (1997), S. 77-104 
    ISSN: 0885-6125
    Keywords: Continual learning ; transfer ; reinforcement learning ; sequence learning ; hierarchical neural networks
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Continual learning is the constant development of increasingly complex behaviors; the process of building more complicated skills on top of those already developed. A continual-learning agent should therefore learn incrementally and hierarchically. This paper describes CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development. CHILD can quickly solve complicated non-Markovian reinforcement-learning tasks and can then transfer its skills to similar but even more complicated tasks, learning these faster still.
    Type of Medium: Electronic Resource
  • 18
    Electronic Resource
    Springer
    Machine learning 35 (1999), S. 117-154 
    ISSN: 0885-6125
    Keywords: reinforcement learning ; exploration vs. exploitation dilemma ; Markov decision processes ; bandit problems
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. This technique may be used in direct algorithms such as Q-learning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The first is to define a local measure of the uncertainty using the theory of bandit problems. We show that such a measure suffers from several drawbacks. In particular, a direct application of it leads to algorithms of low quality that can be easily misled by particular configurations of the environment. The second basic principle was introduced to eliminate this drawback. It consists of assimilating the local measures of uncertainty to rewards, and back-propagating them with the dynamic programming or temporal difference mechanisms. This allows reproducing global-scale reasoning about the uncertainty, using only local measures of it. Numerical simulations clearly show the efficiency of these propositions.
    Type of Medium: Electronic Resource
  • 19
    Electronic Resource
    Springer
    Machine learning 31 (1998), S. 55-85 
    ISSN: 0885-6125
    Keywords: reinforcement learning ; module-based RL ; robot learning ; problem decomposition ; Markovian Decision Problems ; feature space ; subgoals ; local control ; switching control
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solve such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: i) decompose the task into subtasks using the qualitative knowledge at hand; ii) design local controllers to solve the subtasks using the available quantitative knowledge; and iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over non-adaptive ones in complex environments.
    Type of Medium: Electronic Resource
  • 20
    Electronic Resource
    Springer
    Machine learning 33 (1998), S. 105-115 
    ISSN: 0885-6125
    Keywords: reinforcement learning ; Q-learning ; TD(λ) ; online Q(λ) ; lazy learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
    Type of Medium: Electronic Resource
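    The "postpone updates until needed" observation above can be sketched with a lazily decayed table: instead of decaying every eligibility entry each step, keep one global decay product and rescale an entry only when it is touched. This is a much-simplified illustration of the principle, not the paper's algorithm; all names are illustrative:

    ```python
    class LazyTraceTable:
        """Sketch of lazy eligibility-trace decay.

        step_decay() is O(1) regardless of how many entries exist; each
        entry remembers the global decay product at its last touch and
        catches up on accumulated decay only when read or written.
        """
        def __init__(self):
            self.global_decay = 1.0   # product of all per-step decays so far
            self.trace = {}           # entry -> (value, decay_at_last_touch)

        def step_decay(self, decay):
            # Conceptually decays every trace at once, in O(1).
            self.global_decay *= decay

        def read(self, key):
            value, seen = self.trace.get(key, (0.0, self.global_decay))
            # Apply the decay accumulated since this entry was last touched.
            value *= self.global_decay / seen
            self.trace[key] = (value, self.global_decay)
            return value

        def bump(self, key, amount=1.0):
            self.trace[key] = (self.read(key) + amount, self.global_decay)

    t = LazyTraceTable()
    t.bump("s0,a0")        # trace becomes 1.0
    t.step_decay(0.9)      # O(1) instead of touching every entry
    t.step_decay(0.9)
    print(t.read("s0,a0"))  # ~0.81 after two decays
    ```

    The same bookkeeping applied to Q-value updates is what lets the per-step cost depend on the number of actions rather than on the size of the state/action space.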
  • 21
    Electronic Resource
    Springer
    Machine learning 33 (1998), S. 201-233 
    ISSN: 0885-6125
    Keywords: Markov games ; differential games ; pursuit games ; multiagent learning ; reinforcement learning ; Q-learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Game playing has been a popular problem area for research in artificial intelligence and machine learning for many years. In almost every study of game playing and machine learning, the focus has been on games with a finite set of states and a finite set of actions. Further, most of this research has focused on a single player or team learning how to play against another player or team that is applying a fixed strategy for playing the game. In this paper, we explore multiagent learning in the context of game playing and develop algorithms for “co-learning” in which all players attempt to learn their optimal strategies simultaneously. Specifically, we address two approaches to co-learning, demonstrating strong performance by a memory-based reinforcement learner and comparable but faster performance with a tree-based reinforcement learner.
    Type of Medium: Electronic Resource
  • 22
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 19 (1997), S. 411-436 
    ISSN: 1573-0409
    Keywords: assembly planning ; stability ; robot ; forward ; operations
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract The paper presents an approach to sequence planning that determines assembly sequences, defined in terms of mating and non-mating operations, based on a dynamic expansion of the assembly tree obtained using a knowledge base management system. The planner considers the case of a single-robot assembly workcell. The use of stability and the detailed definition of sequences, also by means of several non-mating operations, are shown to be powerful instruments in the control of the tree expansion. Forward assembly planning has been chosen in order to minimize the number of stability checks. Backtracking is avoided by combining precedence relations and stability analysis. Hard and soft constraints are introduced to drive the tree expansion. Hard constraints are precedence relations and stability analysis. All operations are associated with costs, which are used as soft constraints. The operation-based approach enables one to manage even non-mating operations and to easily overcome the linearity constraint. Costs enable the planner to manage the association among tools and components. The first section of the paper concerns Stability Analysis, which is subdivided into Static and Dynamic Stability Analysis. The former is mainly concerned with analyzing gravity effects; the latter with evaluating inertia effects due to manipulation. Stability Analysis is implemented in a simplified form. Fundamental assumptions are: no rotational equilibrium condition is considered; for each reaction force only direction and sense, but not magnitude, are considered; and friction is neglected. The second section discusses the structure of the planner and its implementation. The planner is a rule-based system. Forward chaining and hypothetical reasoning are the inference strategies used. The knowledge base and the data base of the system are presented, and the advantages obtained using a rule-based system are discussed. The third section shows two planning examples, demonstrating the performance of the system in a simple case and in an industrial test case: the assembly of a microwave branching filter composed of 26 components.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 23
    Electronic Resource
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 23 (1998), S. 165-182 
    ISSN: 1573-0409
    Keywords: compliance tasks ; reinforcement learning ; robust control
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract The complexity in planning and control of robot compliance tasks mainly results from simultaneous control of both position and force and inevitable contact with environments. It is quite difficult to achieve accurate modeling of the interaction between the robot and the environment during contact. In addition, the interaction with the environment varies even for compliance tasks of the same kind. To deal with these phenomena, in this paper, we propose a reinforcement learning and robust control scheme for robot compliance tasks. A reinforcement learning mechanism is used to tackle variations among compliance tasks of the same kind. A robust compliance controller that guarantees system stability in the presence of modeling uncertainties and external disturbances is used to execute control commands sent from the reinforcement learning mechanism. Simulations based on deburring compliance tasks demonstrate the effectiveness of the proposed scheme.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 24
    Electronic Resource
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 22 (1998), S. 23-38 
    ISSN: 1573-0409
    Keywords: robot dynamic model ; stiffness matrix ; constant disturbance ; integrator backstepping ; Liapunov functions ; Barbalat lemma ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract A robust regulator for flexible-joint robots is proposed, which rejects constant torque disturbances acting on the links. The design uses the integrator backstepping technique [4,5] to cancel nonlinearities and disturbances not in the range space of the control. Stability of the closed loop system is shown using iterative Liapunov functions.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 25
    Electronic Resource
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 20 (1997), S. 131-155 
    ISSN: 1573-0409
    Keywords: robot adaptive control ; basis function-like networks ; stability ; discrete variable structure
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract Stable neural network-based sampled-data indirect and direct adaptive control approaches, which are the integration of a neural network (NN) approach and the adaptive implementation of the discrete variable structure control, are developed in this paper for the trajectory tracking control of a robot arm with unknown nonlinear dynamics. The robot arm is assumed to have an upper and lower bound of its inertia matrix norm, and its states are available for measurement. The discrete variable structure control serves two purposes, i.e., one is to force the system states to be within the state region in which neural networks are used when the system goes out of neural control; and the other is to improve the tracking performance within the NN approximation region. Main theoretical results for designing stable neural network-based sampled-data indirect and direct adaptive controllers are given, and the extension of the proposed control approaches to the composite adaptive control of a flexible-link robot is discussed. Finally, the effectiveness of the proposed control approaches is illustrated through simulation studies.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 26
    Electronic Resource
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 26 (1999), S. 91-100 
    ISSN: 1573-0409
    Keywords: robots ; neural networks ; adaptiveness ; stability ; approximation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract An indirect adaptive control approach is developed in this paper for robots with unknown nonlinear dynamics using neural networks (NNs). A key property of the proposed approach is that the actual joint angle values in the control law are replaced by the desired joint angles, angular velocities and accelerations, and the bound on the NN reconstruction errors is assumed to be unknown. Main theoretical results for designing such a neuro-controller are given, and the control performance of the proposed controller is verified with simulation studies.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 27
    Electronic Resource
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 21 (1998), S. 51-71 
    ISSN: 1573-0409
    Keywords: Q-learning algorithm ; reinforcement learning ; experience generalisation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract In recent years, temporal-difference methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle – learning through experimentation – that causes these excessive demands on the learning agent. Additionally, one must consider that the agent is very rarely a tabula rasa: some rough knowledge about the characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are kept. Extensive experimental results show that the resulting variants may lead to good performance, provided a sensible balance between risky use of prior imprecise knowledge and cautious use of learning experience is adopted.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 28
    Electronic Resource
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 21 (1998), S. 221-238 
    ISSN: 1573-0409
    Keywords: artificial neural network ; dynamic control ; reinforcement learning ; robot control
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract Conventional robot control schemes are basically model-based methods. However, exact modeling of robot dynamics poses considerable problems and faces various uncertainties in task execution. This paper proposes a reinforcement learning control approach for overcoming such drawbacks. An artificial neural network (ANN) serves as the learning structure, and a stochastic real-valued (SRV) unit as the learning method. Initially, force tracking control of a two-link robot arm is simulated to verify the control design. The simulation results confirm that even without information related to the robot dynamic model and environment states, operation rules for simultaneously controlling force and velocity are achievable by repetitive exploration. Hitherto, however, acceptable performance has demanded many learning iterations, and the learning speed proved too slow for practical applications. The approach herein, therefore, improves the tracking performance by combining a conventional controller with a reinforcement learning strategy. Experimental results demonstrate improved trajectory tracking performance of a two-link direct-drive robot manipulator using the proposed method.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
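A stochastic real-valued (SRV) unit of the kind mentioned above can be sketched roughly as follows (a minimal single-unit illustration with an adaptive reward baseline; the class name, learning rates and reward shaping are assumptions, not the authors' implementation):

```python
import random

class SRVUnit:
    """Sketch of a stochastic real-valued unit: Gaussian exploration around a
    learned mean output; exploration width shrinks as reinforcement improves."""
    def __init__(self, lr=0.1, seed=3):
        self.mean, self.lr = 0.0, lr
        self.baseline = 0.0          # running estimate of expected reward
        self.rng = random.Random(seed)

    def act(self, sigma_max=1.0):
        # explore less once the baseline reward approaches its maximum (1.0)
        self.sigma = sigma_max * max(0.0, 1.0 - self.baseline)
        self.noise = self.rng.gauss(0.0, self.sigma) if self.sigma > 0 else 0.0
        return self.mean + self.noise

    def learn(self, reward):
        # move the mean toward actions that beat the reward baseline
        if self.sigma > 0:
            self.mean += self.lr * (reward - self.baseline) * (self.noise / self.sigma)
        self.baseline += 0.1 * (reward - self.baseline)

# Toy task: learn to output 0.5, with reward peaking at the target value.
unit = SRVUnit()
for _ in range(2000):
    a = unit.act()
    unit.learn(max(0.0, 1.0 - abs(a - 0.5)))
```

The repetitive exploration described in the abstract corresponds to the act/learn loop; combining such a unit with a conventional controller means adding its output as a corrective term to the controller's command.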
  • 29
    Electronic Resource
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 23 (1998), S. 27-43 
    ISSN: 1573-0409
    Keywords: autonomous control ; actuator delays ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract In this paper, we consider the control design problem of vehicle following systems with actuator delays. An upper bound for the time delays is first constructed to guarantee the vehicle stability. Second, sufficient conditions are presented to avoid slinky-effects in the vehicle following. Next, it is proven that the proposed controller achieves zero steady-state error. Finally, simulations are given to examine our claims.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 30
    Electronic Resource
    Electronic Resource
    Springer
    Journal of intelligent and robotic systems 12 (1995), S. 103-125 
    ISSN: 1573-0409
    Keywords: Machine learning ; reinforcement learning ; intelligent control ; machine tool ; tool monitoring ; metal cutting ; manufacturing
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract This paper deals with the issue of automatic learning and recognition of various conditions of a machine tool. The ultimate goal of the research discussed in this paper is to develop a comprehensive monitor and control (M&C) system that can substitute for the expert machinist and perform certain critical in-process tasks to assure quality production. The M&C system must reliably recognize and respond to qualitatively different behaviors of the machine tool, learn new behaviors, respond faster than its human counterpart to quality threatening circumstances, and interface with an existing controller. The research considers a series of face-milling anomalies that were subsequently simulated and used as a first step towards establishing the feasibility of employing machine learning as an integral component of the intelligent controller. We address the question of feasibility in two steps. First, it is important to know if the process models (dull tool, broken tool, etc.) can be learned (model learning). And second, if the models are learned, can an algorithm reliably select an appropriate model (distinguish between dull and broken tools) based on input from the model learner and from the sensors (model selection). The results of the simulation-based tests demonstrate that the milling-process anomalies can be learned, and the appropriate model can be reliably selected. Such a model can be subsequently utilized to make compensating in-process machine-tool adjustments. In addition, we observed that the learning curve need not approach the 100% level to be functional.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 31
    Electronic Resource
    Electronic Resource
    Springer
    Computing 31 (1983), S. 261-267 
    ISSN: 1436-5057
    Keywords: 65M10 ; Dispersive equation ; finite difference ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Description / Table of Contents: Abstract This article presents a compilation of difference schemes for the dispersive equation u_t = a u_xxx. Criteria for deriving stability conditions for difference schemes are stated and applied to the schemes listed.
    Notes: Abstract In this paper a table of difference schemes for the dispersive equation u_t = a u_xxx is presented. A collection of criteria for deriving stability conditions of difference schemes is given and applied to these difference schemes.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
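As one concrete illustration of the kind of scheme such a table contains (this particular discretization and its analysis route are generic, not necessarily taken from the paper):

```latex
% Central-difference approximation of the third derivative in u_t = a\,u_{xxx}:
\frac{\partial^{3} u}{\partial x^{3}}\bigg|_{x_j}
  \approx \frac{u_{j+2} - 2u_{j+1} + 2u_{j-1} - u_{j-2}}{2\,\Delta x^{3}} .
% Von Neumann stability analysis: substitute u_j^n = g^n e^{\mathrm{i} j\theta}
% into the full scheme and require |g(\theta)| \le 1 for all \theta,
% which yields a restriction on a\,\Delta t / \Delta x^{3}.
```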
  • 32
    Electronic Resource
    Electronic Resource
    Springer
    Computing 32 (1984), S. 229-237 
    ISSN: 1436-5057
    Keywords: 65L05 ; 65L07 ; Stiff system ; Rosenbrock method ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Description / Table of Contents: Abstract This paper analyzes the stability of the Kaps-Rentrop method in the presence of nonlinear stiffness. To this end, two quantities are introduced by means of a simple model. The values of these quantities reflect, to some extent, the behavior of a Kaps-Rentrop method in the presence of a particular coupling between the two components of the stiff system of ordinary differential equations. Several numerical examples illustrate the analysis.
    Notes: Abstract In this paper we give an analysis of the effect of stiff nonlinearities on the behavior of a Kaps-Rentrop method. To that end we introduce two quantities related to a simple model. The values of these quantities determine to some extent the behavior of a Kaps-Rentrop method in case of a strong coupling between the smooth component and the transient one. Numerical examples illustrate the theoretical results.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 33
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 26 (1997), S. 343-363 
    ISSN: 1572-9443
    Keywords: retrial queues ; stability ; ergodicity ; renovation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We consider the following type of problem. Calls arrive at a queue of capacity K (which is called the primary queue) and attempt to get served by a single server. If, upon arrival, the queue is full and the server is busy, the new arriving call moves into an infinite-capacity orbit, from which it makes new attempts to reach the primary queue until it finds it non-full (or finds the server idle). If the queue is not full upon arrival, then the call (customer) waits in line and will be served in FIFO order. If λ is the arrival rate (average number per time unit) of calls and μ is one over the expected service time in the facility, it is well known that μ > λ is not always sufficient for stability. The aim of this paper is to provide general conditions under which it is a sufficient condition. In particular, (i) we derive conditions for Harris ergodicity and obtain bounds for the rate of convergence to the steady state and large deviations results, in the case that the inter-arrival times, retrial times and service times are mutually independent i.i.d. sequences and the retrial times are exponentially distributed; (ii) we establish conditions for strong coupling convergence to a stationary regime when either the service times are general stationary ergodic (no independence assumption) and the inter-arrival and retrial times are i.i.d. exponentially distributed, or the inter-arrival times are general stationary ergodic and the service and retrial times are i.i.d. exponentially distributed; (iii) we obtain conditions for the existence of uniform exponential bounds on the queue length process under some rather broad conditions on the retrial process. We finally present conditions for boundedness in distribution for the case of non-patient (or non-persistent) customers.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
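The model can be made concrete with a small Markovian simulation (exponential inter-arrival, service and retrial times; the function and parameter names are illustrative, and this sketch only estimates the mean orbit size rather than proving stability):

```python
import random

def simulate_retrial(lam, mu, theta, K, T, seed=1):
    """Primary queue of capacity K with one server; blocked arrivals join an
    infinite orbit and retry at total rate n*theta (n = orbit size)."""
    rng = random.Random(seed)
    t, q, n, orbit_area = 0.0, 0, 0, 0.0
    while t < T:
        service = mu if q > 0 else 0.0
        rate = lam + service + n * theta        # total event rate
        dt = rng.expovariate(rate)
        orbit_area += n * dt
        t += dt
        u = rng.random() * rate                 # pick which event fired
        if u < lam:                             # fresh arrival
            if q < K:
                q += 1
            else:
                n += 1                          # queue full: join the orbit
        elif u < lam + service:                 # service completion
            q -= 1
        elif q < K:                             # retrial finds room in the queue
            q, n = q + 1, n - 1
    return orbit_area / t                       # time-average orbit size

avg_orbit = simulate_retrial(lam=0.5, mu=1.0, theta=1.0, K=3, T=20000.0)
```

With μ > λ and exponential retrials, as in case (i) of the abstract, the orbit stays small; the paper's point is that without such conditions μ > λ alone does not guarantee this.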
  • 34
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 29 (1998), S. 129-159 
    ISSN: 1572-9443
    Keywords: rate-based feedback control ; ATM networks ; stability ; optimal algorithms
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Motivated by the ABR class of service in ATM networks, we study a continuous time queueing system with a feedback control of the arrival rate of some of the sources. The feedback about the queue length or the total workload is provided at regular intervals (variations on it, especially the traffic management specification TM 4.0, are also considered). The propagation delays can be nonnegligible. For a general class of feedback algorithms, we obtain the stability of the system in the presence of one or more bottleneck nodes in the virtual circuit. Our system is general enough that it can be useful to study feedback control in other network protocols. We also obtain rates of convergence to the stationary distributions and finiteness of moments. For the single bottleneck case, we provide algorithms to compute the stationary distributions and the moments of the sojourn times in different sets of states. We also show analytically (by showing continuity of stationary distributions and moments) that for small propagation delays, we can provide feedback algorithms which have higher mean throughput, lower probability of overflow and lower delay jitter than any open loop policy. Finally these results are supplemented by some computational results.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 35
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 32 (1999), S. 99-130 
    ISSN: 1572-9443
    Keywords: neural network ; inhibition ; stability ; Markov process ; fluid limit ; Harris-recurrence ; transience
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The subject of the paper is the stability analysis of some neural networks consisting of a finite number of interacting neurons. Following the approach of Dai [5] we use the fluid limit model of the network to derive a sufficient condition for positive Harris-recurrence of the associated Markov process. This improves the main result in Karpelevich et al. [11] and, at the same time, sheds some new light on it. We further derive two different conditions that are sufficient for transience of the state process and illustrate our results by classifying some examples according to positive recurrence or transience.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 36
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 33 (1999), S. 293-325 
    ISSN: 1572-9443
    Keywords: stability ; fluid models ; multiclass queueing networks ; piecewise linear Lyapunov functions ; linear Lyapunov functions ; monotone global stability ; static buffer priority disciplines
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper studies the stability of a three‐station fluid network. We show that, unlike the two‐station networks in Dai and Vande Vate [18], the global stability region of our three‐station network is not the intersection of its stability regions under static buffer priority disciplines. Thus, the “worst” or extremal disciplines are not static buffer priority disciplines. We also prove that the global stability region of our three‐station network is not monotone in the service times and so, we may move a service time vector out of the global stability region by reducing the service time for a class. We introduce the monotone global stability region and show that a linear program (LP) related to a piecewise linear Lyapunov function characterizes this largest monotone subset of the global stability region for our three‐station network. We also show that the LP proposed by Bertsimas et al. [1] does not characterize either the global stability region or even the monotone global stability region of our three‐station network. Further, we demonstrate that the LP related to the linear Lyapunov function proposed by Chen and Zhang [11] does not characterize the stability region of our three‐station network under a static buffer priority discipline.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 37
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 21 (1995), S. 67-95 
    ISSN: 1572-9443
    Keywords: Polling systems ; stability ; stationary regime
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract A stationary regime for polling systems with general ergodic (G/G) arrival processes at each station is constructed. Mutual independence of the arrival processes is not required. It is shown that the stationary workload so constructed is minimal in the stochastic ordering sense. In the model considered the server switches from station to station in a Markovian fashion, and a specific service policy is applied to each queue. Our hypotheses cover the purely gated, the a-limited, the binomial-gated and other policies. As a by-product we obtain sufficient conditions for the stationary regime of a G/G/1/∞ queue with multiple server vacations (see Doshi [11]) to be ergodic.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 38
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 22 (1996), S. 47-63 
    ISSN: 1572-9443
    Keywords: Sample-path analysis ; stability ; rate stability ; ω-rate stability ; input-output process ; queueing ; infinite-server queues
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract An input-output process Z = {Z(t), t ⩾ 0} is said to be ω-rate stable if Z(t) = o(ω(t)) for some non-negative function ω(t). We prove that the process Z is ω-rate stable under weak conditions that include the assumption that the input satisfies a linear burstiness condition and Z is asymptotically average stable. In many cases of interest, the conditions for ω-rate stability can be verified from input data. For example, using input information, we establish ω-rate stability of the workload for multiserver queues and an ATM multiplexer, and ω-rate stability of queue-length processes for infinite-server queues.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 39
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 22 (1996), S. 345-366 
    ISSN: 1572-9443
    Keywords: State-dependent service and interarrival times ; Lindley equation ; recursive stochastic equations ; stability ; normal approximations
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We consider a modification of the standard G/G/1 queueing system with infinite waiting space and the first-in-first-out discipline in which the service times and interarrival times depend linearly and randomly on the waiting times. In this model the waiting times satisfy a modified version of the classical Lindley recursion. When the waiting-time distributions converge to a proper limit, Whitt [10] proposed a normal approximation for this steady-state limit. In this paper we prove a limit theorem for the steady-state limit of the system. Thus, our result provides a solid foundation for Whitt's normal approximation of the steady-state distribution of the system.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
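The classical Lindley recursion underlying this model is easy to simulate (a simplified rendering: the optional parameter c adds a linear dependence of the service time on the waiting time, loosely mimicking the state-dependent model; the distributions below are arbitrary choices, not the paper's assumptions):

```python
import random

def waiting_times(n, service, interarrival, c=0.0, seed=1):
    """Lindley recursion W_{k+1} = max(0, W_k + S_k - T_k), with an optional
    linear dependence of the service time on the current waiting time:
    S_k = service() + c * W_k."""
    rng = random.Random(seed)
    W = [0.0]
    for _ in range(n):
        S = service(rng) + c * W[-1]
        T = interarrival(rng)
        W.append(max(0.0, W[-1] + S - T))
    return W

# Example: exponential service (rate 1.2) and inter-arrival (rate 1.0) times,
# i.e. the classical state-independent case c = 0.
W = waiting_times(10000, lambda r: r.expovariate(1.2),
                  lambda r: r.expovariate(1.0))
```

For nonzero c the recursion is no longer the standard one, which is exactly why the paper needs new conditions for the waiting-time distributions to converge.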
  • 40
    ISSN: 1572-9443
    Keywords: dam ; storage process ; saturation rule ; intermittent production ; state dependent rates ; state dependent jumps ; stability ; positive Harris recurrence
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We consider a dam process with a general (state dependent) release rule and a pure jump input process, where the jump sizes are state dependent. We give sufficient conditions under which the process has a stationary version in the case where the jump times and sizes are governed by a marked point process which is point (Palm) stationary and ergodic. We give special attention to the Markov and Markov regenerative cases for which the main stability condition is weakened. We then study an intermittent production process with state dependent rates. We provide sufficient conditions for stability for this process and show that if these conditions are satisfied, then an interesting new relationship exists between the stationary distribution of this process and a dam process of the type we explore here.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 27 (1997), S. 205-226 
    ISSN: 1572-9443
    Keywords: multiclass queueing networks ; ergodicity ; stability ; performance analysis
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We develop the use of piecewise linear test functions for the analysis of stability of multiclass queueing networks and their associated fluid limit models. It is found that if an associated LP admits a positive solution, then a Lyapunov function exists. This implies that the fluid limit model is stable and hence that the network model is positive Harris recurrent with a finite polynomial moment. Also, it is found that if a particular LP admits a solution, then the network model is transient.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
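The role of the piecewise linear test function can be summarized schematically (a generic drift condition of this type, not the paper's exact LP formulation):

```latex
% Piecewise linear Lyapunov function built from vectors c_1,\dots,c_m \ge 0:
L(x) = \max_{1 \le i \le m} \langle c_i, x \rangle ,
% Drift condition along fluid limit trajectories:
\frac{d}{dt} L\bigl(x(t)\bigr) \le -\epsilon < 0
  \quad \text{whenever } L\bigl(x(t)\bigr) > 0 .
% If an LP produces such c_i, then L(x(t)) reaches zero in finite time,
% the fluid model is stable, and positive Harris recurrence follows.
```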
  • 42
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 28 (1998), S. 33-54 
    ISSN: 1572-9443
    Keywords: queueing networks ; throughput ; closed networks ; efficiency ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract A closed network is said to be “guaranteed efficient” if the throughput converges under all non-idling policies to the capacity of the bottlenecks in the network, as the number of trapped customers increases to infinity. We obtain a necessary condition for guaranteed efficiency of closed re-entrant lines. For balanced two-station systems, this necessary condition is almost sufficient, differing from it only by the strictness of an inequality. This near characterization is obtained by studying a special type of virtual station called “alternating visit virtual station”. These special virtual stations allow us to relate the necessary condition to certain indices arising in heavy traffic studies using a Brownian network approximation, as well as to certain policies proposed as being extremal with respect to the asymptotic loss in the throughput. Using the near characterization of guaranteed efficiency we also answer the often pondered question of whether an open network or its closed counterpart has greater throughput - the answer is that neither can assure a greater guaranteed throughput.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 43
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 29 (1998), S. 55-73 
    ISSN: 1572-9443
    Keywords: multi‐server queue ; customer class ; state‐dependent routing ; stability ; Markov chain ; fluid limit
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We consider a multi‐station queue with a multi‐class input process when any station is available for the service of only some (not all) customer classes. Upon arrival, any customer may choose one of its accessible stations according to some state‐dependent policy. We obtain simple stability criteria for this model in two particular cases when service rates are either station‐ or class‐independent. Then, we study a two‐station queue under general assumptions on service rates. Our proofs are based on the fluid approximation approach.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 44
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 32 (1999), S. 131-168 
    ISSN: 1572-9443
    Keywords: stability ; positive recurrence ; fluid limit ; polling system ; exhaustive service policy
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We introduce a generalized criterion for the stability of Markovian queueing systems in terms of stochastic fluid limits. We consider an example in which this criterion may be applied: a polling system with two stations and two heterogeneous servers.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 45
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 31 (1999), S. 171-206 
    ISSN: 1572-9443
    Keywords: scheduling ; open multiclass queueing networks ; discrete-review policies ; fluid models ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract This paper describes a family of discrete-review policies for scheduling open multiclass queueing networks. Each of the policies in the family is derived from what we call a dynamic reward function: such a function associates with each queue length vector q and each job class k a positive value r k (q), which is treated as a reward rate for time devoted to processing class k jobs. Assuming that each station has a traffic intensity parameter less than one, all policies in the family considered are shown to be stable. In such a policy, system status is reviewed at discrete points in time, and at each such point the controller formulates a processing plan for the next review period, based on the queue length vector observed. Stability is proved by combining elementary large deviations theory with an analysis of an associated fluid control problem. These results are extended to systems with class dependent setup times as well as systems with alternate routing and admission control capabilities.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
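The review-period planning described in this abstract can be sketched as follows. This is an illustrative sketch only, not the paper's policy: the reward function `reward_rate` below is an invented choice (the paper only requires r_k(q) > 0), and time is split among classes in proportion to reward rates.

```python
# Illustrative sketch of one review period of a discrete-review scheduler.
# The reward function is hypothetical; the paper only requires r_k(q) > 0.

def reward_rate(q, k):
    """Hypothetical dynamic reward rate: favor longer queues."""
    return 1.0 + q[k]

def plan_review_period(q, period):
    """Split the review period's processing time among job classes
    in proportion to their reward rates, given queue lengths q."""
    rates = [reward_rate(q, k) for k in range(len(q))]
    total = sum(rates)
    return [period * r / total for r in rates]

# Observe queue lengths at a review point and form the processing plan.
alloc = plan_review_period([4, 1, 0], period=10.0)
```

The whole review period is always allocated, so the station never idles while work is present, which is one ingredient of the stability argument.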
  • 46
    Electronic Resource
    Electronic Resource
    Springer
    Queueing systems 32 (1999), S. 195-231 
    ISSN: 1572-9443
    Keywords: window flow control ; TCP ; stability ; multiclass networks ; stationary ergodic point processes ; (max,+)-linear system
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract We focus on window flow control as used in packet-switched communication networks. The approach consists in studying the stability of a system where each node on the path followed by the packets of the controlled connection is modeled by a FIFO (First-In-First-Out) queue of infinite capacity which receives in addition some cross traffic represented by an exogenous flow. Under general stochastic assumptions, namely for stationary and ergodic input processes, we show the existence of a maximum throughput allowed by the flow control. Then we establish bounds on the value of this maximum throughput. These bounds, which do not coincide in general, are reached by time-space scalings of the exogenous flows. Therefore, the performance of the window flow control depends not only on the traffic intensity of the cross flows, but also on fine statistical characteristics such as the burstiness of these flows. These results are illustrated by several examples, including the case of a nonmonotone, nonconvex and fractal stability region.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 47
    ISSN: 1432-0770
Keywords: Hebbian learning rule ; attractor dynamics ; symmetric connections ; multiplicative normalization ; self-organization ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Biology , Computer Science , Physics
    Notes: Abstract. While learning and development are well characterized in feedforward networks, these features are more difficult to analyze in recurrent networks due to the increased complexity of dual dynamics – the rapid dynamics arising from activation states and the slow dynamics arising from learning or developmental plasticity. We present analytical and numerical results that consider dual dynamics in a recurrent network undergoing Hebbian learning with either constant weight decay or weight normalization. Starting from initially random connections, the recurrent network develops symmetric or near-symmetric connections through Hebbian learning. Reciprocity and modularity arise naturally through correlations in the activation states. Additionally, weight normalization may be better than constant weight decay for the development of multiple attractor states that allow a diverse representation of the inputs. These results suggest a natural mechanism by which synaptic plasticity in recurrent networks such as cortical and brainstem premotor circuits could enhance neural computation and the generation of motor programs.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 48
    Electronic Resource
    Electronic Resource
    Springer
    Artificial intelligence review 11 (1997), S. 343-370 
    ISSN: 1573-7462
    Keywords: lazy learning ; nearest neighbor ; genetic algorithms ; differential games ; pursuit games ; teaching ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic algorithms can be used in conjunction with lazy learning to solve examples of a difficult class of delayed reinforcement learning problems better than either method alone. This class, the class of differential games, includes numerous important control problems that arise in robotics, planning, game playing, and other areas, and solutions for differential games suggest solution strategies for the general class of planning and control problems. We conducted a series of experiments applying three learning approaches – lazy Q-learning, k-nearest neighbor (k-NN), and a genetic algorithm – to a particular differential game called a pursuit game. Our experiments demonstrate that k-NN had great difficulty solving the problem, while a lazy version of Q-learning performed moderately well and the genetic algorithm performed even better. These results motivated the next step in the experiments, where we hypothesized k-NN was having difficulty because it did not have good examples – a common source of difficulty for lazy learning. Therefore, we used the genetic algorithm as a bootstrapping method for k-NN to create a system to provide these examples. Our experiments demonstrate that the resulting joint system learned to solve the pursuit games with a high degree of accuracy – outperforming either method alone – and with relatively small memory requirements.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 49
    Electronic Resource
    Electronic Resource
    Springer
    Applied intelligence 11 (1999), S. 241-258 
    ISSN: 1573-7497
    Keywords: analysis strategies ; limited resources ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Intelligent data analysis implies the reasoned application of autonomous or semi-autonomous tools to data sets drawn from problem domains. Automation of this process of reasoning about analysis (based on factors such as available computational resources, cost of analysis, risk of failure, lessons learned from past errors, and tentative structural models of problem domains) is highly non-trivial. By casting the problem of reasoning about analysis (MetaReasoning) as yet another data analysis problem domain, we have previously [R. Levinson and J. Wilkinson, in Advances in Intelligent Data Analysis, edited by X. Liu, P. Cohen, and M. Berthold, volume LNCS 1280, Springer-Verlag, Berlin, pp. 89–100, 1997] presented a design framework, MetaReasoning for Data Analysis Tool Allocation (MRDATA). Crucial to this framework is the ability of a Tool Allocator to track resource consumption (i.e. processor time and memory usage) by the Tools it employs, as well as the ability to allocate measured quantities of resources to these Tools. In order to test implementations of the MRDATA design, we now implement a Runtime Environment for Data Analysis Tool Allocation, RE:DATA. Tool Allocators run as processes under RE:DATA, are allotted system resources, and may use these resources to run their Tools as spawned sub-processes. We also present designs of native RE:DATA implementations of analysis tools used by MRDATA: K-Nearest Neighbor Tables, Regression Trees, Interruptible (“Any-Time”) Regression Trees, and “Hierarchy Diffusion” Temporal Difference Learners. Preliminary results are discussed and techniques for integration with non-native tools are explored.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    Electronic Resource
    Electronic Resource
    Springer
    Autonomous robots 6 (1999), S. 187-201 
    ISSN: 1573-7527
    Keywords: landmark recognition ; learning in computer vision ; local learning ; recognition feedback ; reinforcement learning ; traffic sign recognition
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
Notes: Abstract Current machine perception techniques that typically use segmentation followed by object recognition lack the required robustness to cope with the large variety of situations encountered in real-world navigation. Many existing techniques are brittle in the sense that even minor changes in the expected task environment (e.g., different lighting conditions, geometrical distortion, etc.) can severely degrade the performance of the system or even make it fail completely. In this paper we present a system that achieves robust performance by using local reinforcement learning to induce a highly adaptive mapping from input images to segmentation strategies for successful recognition. This is accomplished by using the confidence level of model matching as reinforcement to drive learning. Local reinforcement learning gives rise to greater improvement in recognition performance. The system is verified through experiments on a large set of real images of traffic signs.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 51
    Electronic Resource
    Electronic Resource
    Springer
    Autonomous robots 7 (1999), S. 77-88 
    ISSN: 1573-7527
    Keywords: reinforcement learning ; CMAC ; world models ; simulated soccer ; Q(λ) ; evolutionary computation ; PIPE
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract We use reinforcement learning (RL) to compute strategies for multiagent soccer teams. RL may profit significantly from world models (WMs) estimating state transition probabilities and rewards. In high-dimensional, continuous input spaces, however, learning accurate WMs is intractable. Here we show that incomplete WMs can help to quickly find good action selection policies. Our approach is based on a novel combination of CMACs and prioritized sweeping-like algorithms. Variants thereof outperform both Q(λ)-learning with CMACs and the evolutionary method Probabilistic Incremental Program Evolution (PIPE) which performed best in previous comparisons.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
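The CMAC function approximators mentioned in this abstract can be sketched as follows. This is a minimal tile-coding illustration, not the authors' implementation; the number of tilings, tile width, learning rate, and the one-dimensional input are all arbitrary choices made here.

```python
# A minimal CMAC (tile-coding) function approximator: several offset
# tilings coarse-code the input, and a value is the sum of the weights
# of the tiles the input falls into.

class CMAC:
    def __init__(self, n_tilings=4, tile_width=1.0, lr=0.1):
        self.n_tilings = n_tilings
        self.tile_width = tile_width
        self.lr = lr
        self.weights = {}          # sparse weight table

    def _tiles(self, x):
        # Each tiling is shifted by a fraction of the tile width.
        for t in range(self.n_tilings):
            offset = t * self.tile_width / self.n_tilings
            yield (t, int((x + offset) // self.tile_width))

    def predict(self, x):
        return sum(self.weights.get(tile, 0.0) for tile in self._tiles(x))

    def update(self, x, target):
        # Distribute the prediction error across the active tiles.
        err = target - self.predict(x)
        for tile in self._tiles(x):
            self.weights[tile] = (self.weights.get(tile, 0.0)
                                  + self.lr * err / self.n_tilings)

cmac = CMAC()
for _ in range(200):
    cmac.update(2.5, 1.0)   # train a single point toward value 1.0
```

Nearby inputs share tiles and therefore generalize, while distant inputs leave the trained weights untouched; that locality is what makes CMACs attractive for continuous RL state spaces.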
  • 52
    Electronic Resource
    Electronic Resource
    Springer
    Journal of scientific computing 12 (1997), S. 361-369 
    ISSN: 1573-7691
    Keywords: Alternating-direction implicit ; difference scheme ; stability ; convergence
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
Notes: Abstract A new alternating-direction implicit (ADI) scheme for solving three-dimensional parabolic differential equations has been developed based on the idea of the regularized difference scheme. It is unconditionally stable and second-order accurate. Further, it overcomes the drawback of the Douglas scheme and is well suited to simulating fast transient phenomena and to efficiently capturing steady-state solutions of parabolic differential equations. A numerical example is presented.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 53
    ISSN: 1573-773X
    Keywords: constrained learning ; factorization ; feedforward networks ; IIR filters ; polynomials ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Adaptive artificial neural network techniques are introduced and applied to the factorization of 2-D second order polynomials. The proposed neural network is trained using a constrained learning algorithm that achieves minimization of the usual mean square error criterion along with simultaneous satisfaction of multiple equality and inequality constraints between the polynomial coefficients. Using this method, we are able to obtain good approximate solutions for non-factorable polynomials. By incorporating stability constraints into the formalism, our method can be successfully used for the realization of stable 2-D second order IIR filters in cascade form.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 54
    Electronic Resource
    Electronic Resource
    Springer
    Artificial intelligence review 12 (1998), S. 445-468 
    ISSN: 1573-7462
    Keywords: architectures for autonomous robots ; artificial neural networks ; behaviour-based robots ; emergent properties ; reinforcement learning ; supervised learning ; unsupervised learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
Notes: Abstract This paper is a survey of some recent connectionist approaches to the design and development of behaviour-based mobile robots. The research is analysed in terms of principal connectionist learning methods and neurological modeling trends. Possible advantages over conventionally programmed methods are considered and the connectionist achievements to date are assessed. A realistic view is taken of the prospects for medium term progress and some observations are made concerning the direction this might profitably take.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 55
    Electronic Resource
    Electronic Resource
    Springer
    Applied intelligence 11 (1999), S. 109-127 
    ISSN: 1573-7497
    Keywords: hybrid models ; sequential decision making ; neural networks ; reinforcement learning ; cognitive modeling
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In developing autonomous agents, one usually emphasizes only (situated) procedural knowledge, ignoring more explicit declarative knowledge. On the other hand, in developing symbolic reasoning models, one usually emphasizes only declarative knowledge, ignoring procedural knowledge. In contrast, we have developed a learning model CLARION, which is a hybrid connectionist model consisting of both localist and distributed representations, based on the two-level approach proposed in [40]. CLARION learns and utilizes both procedural and declarative knowledge, tapping into the synergy of the two types of processes, and enables an agent to learn in situated contexts and generalize resulting knowledge to different scenarios. It unifies connectionist, reinforcement, and symbolic learning in a synergistic way, to perform on-line, bottom-up learning. This summary paper presents one version of the architecture and some results of the experiments.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 56
    Electronic Resource
    Electronic Resource
    Springer
    Autonomous robots 4 (1997), S. 73-83 
    ISSN: 1573-7527
    Keywords: robotics ; robot learning ; group behavior ; multi-agent systems ; reinforcement learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    Electronic Resource
    Electronic Resource
    Springer
    Autonomous robots 5 (1998), S. 273-295 
    ISSN: 1573-7527
    Keywords: reinforcement learning ; module-based RL ; robot learning ; problem decomposition ; Markovian decision problems ; feature space ; subgoals ; local control ; switching control
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
Notes: Abstract The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solve such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: (i) decompose the task into subtasks using the qualitative knowledge at hand; (ii) design local controllers to solve the subtasks using the available quantitative knowledge, and (iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high quality robot-control as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over nonadaptive ones in complex environments.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    Electronic Resource
    Electronic Resource
    Springer
    Autonomous robots 7 (1999), S. 41-56 
    ISSN: 1573-7527
    Keywords: classical conditioning ; reinforcement learning ; biological models
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract Classical conditioning is a basic learning mechanism in animals and can be found in almost all organisms. If we want to construct robots with abilities matching those of their biological counterparts, this is one of the learning mechanisms that needs to be implemented first. This article describes a computational model of classical conditioning where the goal of learning is assumed to be the prediction of a temporally discounted reward or punishment based on the current stimulus situation. The model is well suited for robotic implementation as it models a number of classical conditioning paradigms and learning in the model is guaranteed to converge with arbitrarily complex stimulus sequences. This is an essential feature once the step is taken beyond the simple laboratory experiment with two or three stimuli to the real world where no such limitations exist. It is also demonstrated how the model can be included in a more complex system that includes various forms of sensory pre-processing and how it can handle reinforcement learning, timing of responses and function as an adaptive world model.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
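The prediction of a temporally discounted reward described in this abstract is the core computation of temporal-difference models of conditioning, and can be sketched as follows. This is a generic TD(0) illustration, not the authors' model; the two-stimulus "tone then food" episode, discount factor, and learning rate are invented here.

```python
# TD(0) prediction of a discounted reward: each state's value is nudged
# toward the immediate reward plus the discounted value of the next state.

GAMMA = 0.9     # temporal discount factor
ALPHA = 0.1     # learning rate
V = {"tone": 0.0, "food": 0.0, "end": 0.0}

def td_episode():
    # One conditioning trial: tone -> food (reward 1.0) -> end.
    transitions = [("tone", "food", 0.0), ("food", "end", 1.0)]
    for s, s_next, r in transitions:
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])

for _ in range(1000):
    td_episode()
```

After training, the value of the tone approaches GAMMA times the value of the food state, i.e. the earlier stimulus comes to predict the discounted upcoming reward, mirroring second-order conditioning.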
  • 59
    Electronic Resource
    Electronic Resource
    Springer
    Autonomous robots 6 (1999), S. 281-292 
    ISSN: 1573-7527
    Keywords: mobile robotics ; reinforcement learning ; artificial neural networks ; simulation ; real world
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract We present a case study of reinforcement learning on a real robot that learns how to back up a trailer and discuss the lessons learned about the importance of proper experimental procedure and design. We identify areas of particular concern to the experimental robotics community at large. In particular, we address concerns pertinent to robotics simulation research, implementing learning algorithms on real robotic hardware, and the difficulties involved with transferring research between the two.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Electronic Resource
    Electronic Resource
    Springer
    Autonomous robots 7 (1999), S. 57-75 
    ISSN: 1573-7527
    Keywords: sensor-based manipulators ; multi-goal reaching tasks ; reinforcement learning ; neural networks ; differential inverse kinematics
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
    Notes: Abstract Our work focuses on making an autonomous robot manipulator learn suitable collision-free motions from local sensory data while executing high-level descriptions of tasks. The robot arm must reach a sequence of targets where it undertakes some manipulation. The robot manipulator has a sonar sensing skin covering its links to perceive the obstacles in its surroundings. We use reinforcement learning for that purpose, and the neural controller acquires appropriate reaction strategies in acceptable time provided it has some a priori knowledge. This knowledge is specified in two main ways: an appropriate codification of the signals of the neural controller—inputs, outputs and reinforcement—and decomposition of the learning task. The codification facilitates the generalization capabilities of the network as it takes advantage of inherent symmetries and is quite goal-independent. On the other hand, the task of reaching a certain goal position is decomposed into two sequential subtasks: negotiate obstacles and move to goal. Experimental results show that the controller achieves a good performance incrementally in a reasonable time and exhibits high tolerance to failing sensors.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 61
    Electronic Resource
    Electronic Resource
    Springer
    Neural processing letters 3 (1996), S. 11-15 
    ISSN: 1573-773X
    Keywords: hardware realisation ; RAM-based nodes ; reinforcement learning ; reward-penalty
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract RAM-based neural networks are designed to be efficiently implemented in hardware. The desire to retain this property influences the training algorithms used, and has led to the use of reinforcement (reward-penalty) learning. An analysis of the reinforcement algorithm applied to RAM-based nodes has shown the ease with which unlearning can occur. An amended algorithm is proposed which demonstrates improved learning performance compared to previously published reinforcement regimes.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    Electronic Resource
    Electronic Resource
    Springer
    Neural processing letters 9 (1999), S. 119-127 
    ISSN: 1573-773X
    Keywords: reinforcement learning ; neurocontrol ; optimization ; polytope algorithm ; pole balancing ; genetic reinforcement
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract A new training algorithm is presented for delayed reinforcement learning problems that does not assume the existence of a critic model and employs the polytope optimization algorithm to adjust the weights of the action network so that a simple direct measure of the training performance is maximized. Experimental results from the application of the method to the pole balancing problem indicate improved training performance compared with critic-based and genetic reinforcement approaches.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
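The critic-free polytope search described in this abstract can be sketched as follows. This is a stripped-down Nelder–Mead-style illustration (reflection and shrink steps only, no expansion or contraction), not the authors' algorithm; the two-parameter quadratic `score` stands in for the direct measure of training performance and is an invented example.

```python
# A simplified polytope (Nelder-Mead-style) search that directly
# maximizes a performance measure over the weights of an action network.

def score(w):
    # Hypothetical performance measure, maximized at w = (1, -2).
    return -((w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2)

def polytope_maximize(f, simplex, iters=200):
    for _ in range(iters):
        simplex.sort(key=f, reverse=True)      # best vertex first
        best, worst = simplex[0], simplex[-1]
        # Centroid of all vertices except the worst.
        cen = [sum(p[i] for p in simplex[:-1]) / (len(simplex) - 1)
               for i in range(len(best))]
        refl = [c + (c - w) for c, w in zip(cen, worst)]   # reflect worst
        if f(refl) > f(worst):
            simplex[-1] = refl
        else:                                   # shrink toward the best vertex
            simplex = [best] + [[(b + p) / 2 for b, p in zip(best, q)]
                                for q in simplex[1:]]
    return max(simplex, key=f)

w_star = polytope_maximize(score, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

No gradient and no critic model is needed: only evaluations of the performance measure drive the weight updates, which is the point of the approach.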
  • 63
    Electronic Resource
    Electronic Resource
    Springer
    Neural processing letters 4 (1996), S. 167-172 
    ISSN: 1573-773X
    Keywords: fuzzy min-max neural network ; reinforcement learning ; autonomous vehicle navigation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
Notes: Abstract The fuzzy min-max neural network constitutes a neural architecture that is based on hyperbox fuzzy sets and can be incrementally trained by appropriately adjusting the number of hyperboxes and their corresponding volumes. Two versions have been proposed: for supervised and unsupervised learning. In this paper a modified approach is presented that is appropriate for reinforcement learning problems with discrete action space and is applied to the difficult task of autonomous vehicle navigation when no a priori knowledge of the environment is available. Experimental results indicate that the proposed reinforcement learning network exhibits superior learning behavior compared to conventional reinforcement schemes.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    Electronic Resource
    Electronic Resource
    Springer
    Journal of scientific computing 12 (1997), S. 215-231 
    ISSN: 1573-7691
    Keywords: Transport models ; shallow water ; splitting methods ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
Notes: Abstract We investigate the use of splitting methods for the numerical integration of three-dimensional transport-chemistry models. In particular, we investigate various possibilities for the time discretization that can take advantage of the parallelization and vectorization facilities offered by multi-processor vector computers. To suppress wiggles in the numerical solution, we use third-order, upwind-biased discretization of the advection terms, resulting in a five-point coupling in each direction. As an alternative to the usual splitting functions, such as co-ordinate splitting or operator splitting, we consider a splitting function that is based on a three-coloured hopscotch-type splitting in the horizontal direction, whereas full coupling is retained in the vertical direction. Advantages of this splitting function are the easy application of domain decomposition techniques and unconditional stability in the vertical, which is an important property for transport in shallow water. The splitting method is obtained by combining the hopscotch-type splitting function with various second-order splitting formulae from the literature. Although some of the resulting methods are highly accurate, their stability behaviour (due to horizontal advection) is quite poor. Therefore we also discuss several new splitting formulae with the aim of improving the stability characteristics. It turns out that this is indeed possible, but the price to pay is a reduction in accuracy. Therefore, such methods are to be preferred if accuracy is less crucial than stability; such a situation is frequently encountered in solving transport problems. As part of the project TRUST (Transport and Reactions Unified by Splitting Techniques), preliminary versions of the schemes are implemented on the Cray C98 4256 computer and are available for benchmarking.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    Electronic Resource
    Electronic Resource
    Springer
    Journal of scientific computing 12 (1997), S. 353-360 
    ISSN: 1573-7691
    Keywords: Alternating-direction implicit ; difference scheme ; stability ; convergence
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
Notes: Abstract A generalized Peaceman–Rachford alternating-direction implicit (ADI) scheme for solving two-dimensional parabolic differential equations has been developed based on the idea of the regularized difference scheme. It is well suited to simulating fast transient phenomena and to efficiently capturing steady-state solutions of parabolic differential equations. A numerical example is presented.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 66
    Electronic Resource
    Electronic Resource
    Springer
    Journal of scientific computing 13 (1998), S. 173-183 
    ISSN: 1573-7691
    Keywords: Modified conjugate gradient method ; conjugate gradient method ; Krylov space ; convergence rate ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
Notes: Abstract In this note, we examine a modified conjugate gradient procedure for solving Ax = b in which the approximation space is based upon the Krylov space K^k(√A, b) associated with √A and b. We show that, given initial vectors b and √A b (possibly computed at some expense), the best fit solution in K^k(√A, b) can be computed using a finite-term recurrence requiring only one multiplication by A per iteration. The initial convergence rate appears, as expected, to be twice as fast as that of the standard conjugate gradient method, but stability problems cause the convergence to be degraded.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
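For contrast with the modified procedure in this note, the standard conjugate gradient method (whose Krylov space is built from A and b, one matrix-vector product per iteration) can be sketched as follows. The 2x2 symmetric positive definite system is an invented example.

```python
# Standard conjugate gradient for A x = b with A symmetric positive
# definite, in plain Python; one multiplication by A per iteration.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(A, b, iters=25):
    x = [0.0] * len(b)
    r = b[:]                 # residual b - A x for x = 0
    p = r[:]                 # initial search direction
    for _ in range(iters):
        if dot(r, r) < 1e-12:            # converged
            break
        Ap = matvec(A, p)
        alpha = dot(r, r) / dot(p, Ap)   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        beta = dot(r_new, r_new) / dot(r, r)
        p = [ri + beta * pi for ri, pi in zip(r_new, p)]
        r = r_new
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

In exact arithmetic CG terminates in at most n iterations for an n-by-n system; the note's interest is in how finite-precision effects degrade such short recurrences.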
  • 67
    Electronic Resource
    Electronic Resource
    Springer
    Journal of network and systems management 3 (1995), S. 371-380 
    ISSN: 1573-7705
    Keywords: Telephone traffic ; network management ; control theory ; dynamic flows ; stability ; routing algorithms ; broadband networks ; simulation
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The control of telephony traffic is the task of network management and routing algorithms. In this paper, a study of two trunk groups carrying telephony traffic is used to show that instabilities can arise if there is a delay in getting feedback information for a network controller. The network controller seeks to balance the traffic in the two trunk groups, which may represent two paths from a source to a destination. An analysis shows how factors such as holding time, controller gain and feedback delay influence stability. Simulation of a two service case is also carried out to show that the same instabilities can arise.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
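The destabilizing effect of feedback delay described in this abstract can be illustrated with a toy simulation. This linear model is invented here, not the paper's: `x` stands for the traffic imbalance between the two trunk groups, calls decay with the holding time, and the controller corrects the imbalance it measured `delay` steps ago.

```python
# Toy discrete-time model of a controller balancing two trunk groups
# from a delayed measurement of the imbalance x (all parameters are
# illustrative choices, not taken from the paper).

def simulate(delay, steps=100, decay=0.9, gain=0.5):
    x = [1.0] + [0.0] * steps        # x[0] is the initial imbalance
    for t in range(steps):
        measured = x[t - delay] if t >= delay else 0.0
        x[t + 1] = decay * x[t] - gain * measured
    return x

no_delay = simulate(delay=0)     # imbalance dies out quickly
with_delay = simulate(delay=5)   # delayed feedback causes oscillation
```

With immediate feedback the imbalance decays monotonically; with the same gain and a feedback delay the controller overcorrects against stale measurements and the imbalance oscillates, which is the qualitative mechanism analyzed in the paper.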
  • 68
    Electronic Resource
    Electronic Resource
    Springer
    Neural processing letters 10 (1999), S. 267-271 
    ISSN: 1573-773X
    Keywords: recurrent neural networks ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract In this paper, we point out that the conditions given in [1] are sufficient but not necessary for the globally asymptotically stable equilibrium of a class of delay differential equations. Instead, we prove that under weaker conditions, the equilibrium is still globally asymptotically stable.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 69
    Electronic Resource
    Electronic Resource
    Springer
    Numerical algorithms 10 (1995), S. 225-244 
    ISSN: 1572-9265
    Keywords: Cholesky factorization error analysis ; Hankel matrix ; least squares ; normal equations ; orthogonal factorization ; QR factorization ; semi-normal equations ; stability ; Toeplitz matrix ; weak stability ; Primary 65F25 ; Secondary 47B35 ; 65F05 ; 65F30 ; 65Y05 ; 65Y10
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mathematics
    Notes: Abstract We show that a fast algorithm for the QR factorization of a Toeplitz or Hankel matrix A is weakly stable in the sense that R^T R is close to A^T A. Thus, when the algorithm is used to solve the semi-normal equations R^T Rx = A^T b, we obtain a weakly stable method for the solution of a nonsingular Toeplitz or Hankel linear system Ax = b. The algorithm also applies to the solution of the full-rank Toeplitz or Hankel least squares problem min ||Ax - b||_2.
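    The semi-normal-equations solve that the fast factorization feeds can be sketched generically (using a dense QR in place of the paper's fast Toeplitz algorithm; only R is needed, and Q is never formed):

    ```python
    import numpy as np

    def solve_semi_normal(A, b):
        """Solve A x = b via the semi-normal equations R^T R x = A^T b,
        where R is the triangular factor of A = QR; Q is never formed."""
        R = np.linalg.qr(A, mode='r')              # triangular factor only
        y = np.linalg.solve(R.T, A.T @ b)          # forward substitution
        return np.linalg.solve(R, y)               # back substitution

    # A small nonsingular Toeplitz matrix (constant along each diagonal).
    A = np.array([[4.0, 2.0, 1.0],
                  [1.0, 4.0, 2.0],
                  [0.5, 1.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0])
    x = solve_semi_normal(A, b)
    print(np.allclose(A @ x, b))                   # True
    ```

    Since A^T A = R^T R, the two triangular solves recover the solution of the normal equations without ever forming A^T A explicitly; here plain `np.linalg.solve` stands in for the triangular substitutions.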
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 70
    Electronic Resource
    Electronic Resource
    Springer
    Numerical algorithms 14 (1997), S. 343-359 
    ISSN: 1572-9265
    Keywords: progressive interpolation ; stability ; spline ; shape parameters ; geometric continuity ; 41A05 ; 41A15 ; 65D05 ; 65D07
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mathematics
    Notes: Abstract In this paper, we study several interpolating and smoothing methods for data which are known “progressively”. The algorithms proposed are governed by recurrence relations, and our principal goal is to study their stability. A recurrence relation is said to be stable if the spectral radius of the associated matrix is less than one. The iteration matrices depend on shape parameters which come either from the connection at the knots or from the nature of the interpolant between two knots. We obtain various stability domains. Moving the parameters inside these domains leads to interesting shape effects.
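    The stability criterion used here, spectral radius of the iteration matrix below one, is straightforward to check numerically; a small sketch with hypothetical 2×2 iteration matrices (the paper's matrices depend on the shape parameters):

    ```python
    import numpy as np

    def is_stable(M):
        """A linear recurrence x_{k+1} = M x_k is stable iff the spectral
        radius of M (the largest eigenvalue modulus) is less than one."""
        return bool(np.max(np.abs(np.linalg.eigvals(M))) < 1.0)

    # Hypothetical iteration matrices for two shape-parameter choices.
    print(is_stable(np.array([[0.5, 0.2], [0.1, 0.4]])))    # True
    print(is_stable(np.array([[1.1, 0.0], [0.0, 0.3]])))    # False
    ```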
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    Electronic Resource
    Electronic Resource
    Springer
    Numerical algorithms 10 (1995), S. 245-260 
    ISSN: 1572-9265
    Keywords: Multistep methods ; differential-algebraic equations ; stability ; existence and uniqueness ; convergence of iterative method ; 65L06 ; 65L20 ; 65N22
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Mathematics
    Notes: Abstract Multistep methods for differential/algebraic equations (DAEs) of the form $$F_1 (x) = 0, F_2 (x,x',z) = 0$$ are presented, where F_1 maps from ℝ^n to ℝ^r, F_2 from ℝ^n × ℝ^n × ℝ^m to ℝ^s, and r < n ≤ r + s = n + m. Building on the available existence theories, a new form of the multistep method for solutions of (1) is developed. Furthermore, it is shown that this method exhibits none of the typical instabilities that may occur when multistep methods are applied to DAEs in the traditional manner. A proof of the solvability of the multistep system is provided, and an iterative method is developed for solving the resulting nonlinear algebraic equations. Moreover, a proof of the convergence of this iterative method is presented.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 72
    Electronic Resource
    Electronic Resource
    Springer
    Journal of logic, language and information 7 (1998), S. 143-163 
    ISSN: 1572-9583
    Keywords: Belief revision ; consolidation ; coherence ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Linguistics and Literary Studies , Computer Science
    Notes: Abstract The notion of epistemic coherence is interpreted as involving not only consistency but also stability. The problem how to consolidate a belief system, i.e., revise it so that it becomes coherent, is studied axiomatically as well as in terms of set-theoretical constructions. Representation theorems are given for subtractive consolidation (where coherence is obtained by deleting beliefs) and additive consolidation (where coherence is obtained by adding beliefs).
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 73
    Electronic Resource
    Electronic Resource
    Springer
    Journal of computational neuroscience 6 (1999), S. 191-214 
    ISSN: 1573-6873
    Keywords: Neural network ; prefrontal ; reinforcement learning ; striatum ; timing
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science , Medicine , Physics
    Notes: Abstract A neural network model of how dopamine and prefrontal cortex activity guides short- and long-term information processing within the cortico-striatal circuits during reward-related learning of approach behavior is proposed. The model predicts two types of reward-related neuronal responses generated during learning: (1) cell activity signaling errors in the prediction of the expected time of reward delivery and (2) neural activations coding for errors in the prediction of the amount and type of reward or stimulus expectancies. The former type of signal is consistent with the responses of dopaminergic neurons, while the latter signal is consistent with reward expectancy responses reported in the prefrontal cortex. It is shown that a neural network architecture that satisfies the design principles of the adaptive resonance theory of Carpenter and Grossberg (1987) can account for the dopamine responses to novelty, generalization, and discrimination of appetitive and aversive stimuli. These hypotheses are scrutinized via simulations of the model in relation to the delivery of free food outside a task, the timed contingent delivery of appetitive and aversive stimuli, and an asymmetric, instructed delay response task.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    Electronic Resource
    Electronic Resource
    Springer
    International journal of parallel programming 12 (1983), S. 193-209 
    ISSN: 1573-7640
    Keywords: Database ; characteristic frequency ; aggregated model ; decomposition ; stability ; dynamic distribution
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract A time decomposition technique is suggested for large-database (DB) models. The problem of network aggregation is studied and the results used to create a meaningful decomposed model. Decomposition conditions and assumptions are discussed and illustrated by examples. A practical operating schedule is presented for the time-separated DB model. The schedule uses a sequence of decomposed models, which are to be constructed recursively. The application of the time separation technique for large-DB models is presented in the form of a closed-loop algorithm. The problem of decomposition stability with respect to variations in time constants is considered as well. Two alternative approaches to the problem are suggested. For a probabilistic approach, practical approximate formulas are obtained for subsystem time constants and recommendations are made with respect to the decomposition structure. An approximate performance analysis is done for both standard and time-decomposed models. A comparison of the results is given.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    ISSN: 1436-5057
    Keywords: 65 L 05 ; Rosenbrock-type methods ; quasilinear-implicit differential equations ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Description / Table of Contents: Summary When solving quasilinear-implicit ODEs using Rosenbrock-type methods, stability problems can occur despite good stability properties (A- or L-stability) of the underlying method. These difficulties are due to inaccuracies in the computation of artificially introduced components (transformation into DAEs). The paper investigates the causes of these effects and shows ways to overcome them.
    Notes: Abstract The solution of quasilinear-implicit ODEs using Rosenbrock-type methods may suffer from stability problems despite good stability properties (A- or L-stability) of the underlying method. These problems are caused by the inexact computation of artificially introduced components (transformation to a DAE system). The paper investigates the source of the numerical difficulties and shows modifications to overcome them.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 76
    Electronic Resource
    Electronic Resource
    Springer
    Computing 24 (1980), S. 341-347 
    ISSN: 1436-5057
    Keywords: Numerical analysis ; Volterra integral equations of the second kind ; stability
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Description / Table of Contents: Summary The aim of this paper is to analyse the stability properties of a class of Volterra integral equations of the second kind. Our treatment follows the usual stability analysis, in which the kernel functions belong to a class of test functions restricted in advance. We consider the class of “finitely decomposable” kernels. Stability conditions are derived and compared with the conditions for the simple test equation. It turns out that the new criteria are more restrictive than the conventional conditions. The practical value is tested by numerical experiments with the trapezoidal rule.
    Notes: Abstract The purpose of this paper is to analyse the stability properties of a class of multistep methods for second kind Volterra integral equations. Our approach follows the usual analysis in which the kernel function is a priori restricted to a special class of test functions. We consider the class of finitely decomposable kernels. Stability conditions will be derived and compared with those obtained with the simple test equation. It turns out that the new criteria are more severe than the conventional conditions. The practical value is tested by numerical experiments with the trapezoidal rule.
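    The trapezoidal rule used in the experiments, applied to a second-kind Volterra equation y(t) = g(t) + ∫₀ᵗ K(t,s) y(s) ds, can be sketched as follows (a generic illustration with a simple test kernel, not the paper's test functions):

    ```python
    import numpy as np

    def volterra_trapezoid(g, K, T, n):
        """Trapezoidal rule for y(t) = g(t) + integral_0^t K(t,s) y(s) ds
        on a uniform grid; at each step the implicit value y_i is solved
        for directly since it enters linearly."""
        h = T / n
        t = np.linspace(0.0, T, n + 1)
        y = np.empty(n + 1)
        y[0] = g(t[0])
        for i in range(1, n + 1):
            w = np.full(i + 1, h)            # trapezoid weights on [0, t_i]
            w[0] = w[-1] = h / 2
            s = g(t[i]) + np.dot(w[:-1], K(t[i], t[:i]) * y[:i])
            y[i] = s / (1.0 - w[-1] * K(t[i], t[i]))
        return t, y

    # Test problem with K = 1: y(t) = 1 + integral_0^t y(s) ds, so y = e^t.
    t, y = volterra_trapezoid(lambda t: 1.0,
                              lambda t, s: np.ones_like(s), 1.0, 200)
    print(abs(y[-1] - np.e) < 1e-3)          # True
    ```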
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    Electronic Resource
    Electronic Resource
    Springer
    Computing 57 (1996), S. 281-299 
    ISSN: 1436-5057
    Keywords: 65N15 ; 65N99 ; 35A40 ; Finite volume method ; box scheme ; stability ; error estimates
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Description / Table of Contents: Summary A box method with quadratic basis functions for the discretisation of elliptic boundary value problems is presented. The resulting discretisation matrix is non-symmetric. The stability analysis is based on an elementwise estimate of the scalar product 〈A_h u_h, u_h〉. Sufficient conditions on the geometry of the triangles of the triangulation lead to discrete ellipticity. Under these assumptions an O(h²) error estimate is proved.
    Notes: Abstract The paper presents a box scheme with quadratic basis functions for the discretisation of elliptic boundary value problems. The resulting discretisation matrix is non-symmetric (and also not an M-matrix). The stability analysis is based on an elementwise estimation of the scalar product 〈A_h u_h, u_h〉. Sufficient conditions placed on the triangles of the triangulation lead to discrete ellipticity. Proof of an O(h²) error estimate is given under these conditions.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...