ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Publication Date: 2008-10-01
    Print ISSN: 0167-8655
    Electronic ISSN: 1872-7344
    Topics: Computer Science
    Published by Elsevier
  • 2
    Publication Date: 2018-06-06
    Description: A collective is a set of self-interested agents, each trying to maximize its own utility, together with a well-defined, time-extended world utility function that rates the performance of the entire system. In this paper, we use the theory of collectives to design time-extended payoff utilities for agents that are both aligned with the world utility and "learnable", i.e., the agents can readily see how their behavior affects their utility. We show that in systems where each agent aims to optimize such payoff functions, coordination arises as a byproduct of the agents selfishly pursuing their own goals. A game-theoretic analysis shows that such payoff functions have the net effect of aligning the Nash equilibrium, the Pareto optimal solution and the world utility optimum, thus eliminating undesirable behavior such as agents working at cross-purposes. We then apply collective-based payoff functions to a token-collection gridworld problem where agents need to optimize the aggregate value of tokens collected across an episode of finite duration (i.e., an abstracted version of rovers on Mars collecting scientifically interesting rock samples, subject to power limitations). We show that, regardless of the initial token distribution, reinforcement learning agents using collective-based payoff functions significantly outperform both natural extensions of single-agent algorithms and global reinforcement learning solutions based on "team games". (An illustrative sketch of such a payoff function follows this record.)
    Keywords: Cybernetics, Artificial Intelligence and Robotics
    Format: application/pdf
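The record above describes "collective-based" payoff utilities that are aligned with the world utility and learnable. As a minimal, hedged sketch of that general idea: a difference-style payoff in which an agent is rewarded by the world utility minus the world utility recomputed without that agent's contribution. The toy token-collection setup and all names below are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: a difference-style payoff. Each agent is rewarded by the world
# utility minus the world utility recomputed with that agent's contribution removed.
# The toy token-collection setup and names are assumptions for this example.

def world_utility(tokens_collected):
    """Aggregate value of all tokens collected by the collective in one episode."""
    return sum(sum(values) for values in tokens_collected.values())

def difference_payoff(agent, tokens_collected):
    """Agent payoff: world utility minus the counterfactual world utility obtained
    by removing this agent's collected tokens."""
    without_agent = {a: v for a, v in tokens_collected.items() if a != agent}
    return world_utility(tokens_collected) - world_utility(without_agent)

# Toy episode: two rovers, each with the values of the tokens it picked up.
episode = {"rover_1": [3.0, 1.0], "rover_2": [2.0]}
for agent in episode:
    print(agent, difference_payoff(agent, episode))  # rover_1 -> 4.0, rover_2 -> 2.0
```

Subtracting the counterfactual term removes the part of the world utility an agent cannot influence, which is what keeps such a payoff aligned with the system-level goal while remaining sensitive (learnable) to the agent's own actions.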
  • 3
    Publication Date: 2019-07-13
    Description: A collective of agents often needs to maximize a "world utility" function which rates the performance of an entire system, while subject to communication restrictions among the agents. Such communication restrictions make it difficult for agents that try to pursue their own "private" utilities to take actions that also help optimize the world utility. Team formation presents a solution to this problem: by joining other agents, an agent can significantly increase its knowledge about the environment and improve its chances both of optimizing its own utility and of having that optimization contribute to the world utility. In this article we show how utilities that have previously been shown to be effective in collectives can be modified to be more effective in domains with moderate communication restrictions, resulting in performance improvements of up to 75%. Additionally, we show that even severe communication constraints can be overcome by forming teams in which each agent of a team shares the same utility, increasing performance by an additional 25%. We show that utilities and team sizes can be manipulated to form the best compromise between how "aligned" an agent's utility is with the world utility and how easily an agent can learn that utility.
    Keywords: Cybernetics, Artificial Intelligence and Robotics
    Type: AAMAS 2003; Jul 14, 2003 - Jul 18, 2003; Melbourne; Australia
    Format: text
  • 4
    Publication Date: 2019-07-13
    Description: In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resultant sequence of joint actions achieves a predetermined global objective. We are particularly interested in instances of this problem where centralized control is either impossible or impractical. For single-agent systems in similar domains, machine learning methods (e.g., reinforcement learners) have been used successfully. However, applying such solutions directly to multi-agent systems often proves problematic, as agents may work at cross-purposes, or have difficulty evaluating their contribution to the achievement of the global objective, or both. Accordingly, the crucial design step in multi-agent systems centers on determining the private objectives of each agent so that, as the agents strive for those objectives, the system reaches a good global solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence to design goals for the agents that are 'aligned' with the global goal, and are 'learnable' in that agents can readily see how their behavior affects their utility. We show that reinforcement learning agents using those goals outperform both 'natural' extensions of single-agent algorithms and global reinforcement learning solutions based on 'team games'.
    Keywords: Cybernetics, Artificial Intelligence and Robotics
    Type: Paper-439, Autonomous Agents and Multi-Agent Systems; Jan 01, 2002; Unknown
    Format: application/pdf
  • 5
    Publication Date: 2019-07-13
    Description: Many large distributed systems are characterized by having a large number of components (e.g., agents, neurons) whose actions and interactions determine a world utility which rates the performance of the overall system. Such collectives are often subject to communication restrictions, making it difficult for components that try to optimize their own private utilities to take actions that also help optimize the world utility. In this article we address that coordination problem and derive four utility functions which present different compromises between how aligned a component's private utility is with the world utility and how readily that component can determine the actions that optimize its utility. The results show that the utility functions specifically derived to operate under communication restrictions outperform both traditional methods and previous collective-based methods by up to 75%.
    Keywords: Cybernetics, Artificial Intelligence and Robotics
    Type: International Joint Conference on Neural Networks; Jul 26, 2004 - Jul 30, 2004; Budapest; Hungary
    Format: application/pdf
  • 6
    Publication Date: 2019-07-13
    Description: Single-agent reinforcement learners in time-extended domains and multi-agent systems share a common dilemma known as the credit assignment problem. Multi-agent systems have the structural credit assignment problem of determining the contribution of a particular agent to a common task. Time-extended single-agent systems, in turn, have the temporal credit assignment problem of determining the contribution of a particular action to the quality of the full sequence of actions. Traditionally these two problems are considered distinct and are handled in separate ways. In this article we show how these two forms of the credit assignment problem are equivalent. In this unified framework, a single-agent Markov decision process can be broken down into a single-time-step multi-agent process. Furthermore, we show that Monte-Carlo estimation or Q-learning (depending on whether the values of the resulting actions in the episode are known at the time of learning) are equivalent to different agent utility functions in a multi-agent system. This equivalence shows how an often-neglected issue in multi-agent systems is equivalent to a well-known deficiency in multi-time-step learning, and it lays the basis for solving time-extended multi-agent problems, where both credit assignment problems are present. (A small sketch contrasting the two value targets follows this record.)
    Keywords: Mathematical and Computer Sciences (General)
    Type: Autonomous Agents and Multi-Agent Systems Conference; Jul 19, 2004 - Jul 23, 2004; New York, NY; United States
    Format: application/pdf
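To make the equivalence described in the record above more concrete, here is a small hedged sketch contrasting the two value targets the abstract mentions: a Monte-Carlo return, usable when the values of the episode's later actions are already known, and a one-step Q-learning target, used when they are not. The toy rewards, discount factor and function names are illustrative assumptions only.

```python
# Illustrative sketch (assumed names, not the paper's code): the same episode viewed
# two ways. The Monte-Carlo target credits the action at step t with the full observed
# return of the rest of the episode; the Q-learning target instead bootstraps from an
# estimate of the next state's value.

GAMMA = 0.9  # assumed discount factor

def monte_carlo_target(rewards, t):
    """Discounted return from step t to the end of the episode."""
    return sum(GAMMA ** k * r for k, r in enumerate(rewards[t:]))

def q_learning_target(rewards, t, next_state_value):
    """One-step bootstrapped target using an estimate of the next state's value."""
    return rewards[t] + GAMMA * next_state_value

rewards = [0.0, 1.0, 0.0, 5.0]             # toy episode
print(monte_carlo_target(rewards, 1))       # 1.0 + 0.9*0.0 + 0.81*5.0 = 5.05
print(q_learning_target(rewards, 1, 3.0))   # 1.0 + 0.9*3.0 = 3.7
```

In the unified view the abstract describes, each time step of the episode plays the role of an agent, and choosing one of these targets amounts to choosing that "agent's" utility function.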
  • 7
    Publication Date: 2019-07-13
    Description: Reinforcement learning methods perform well in many domains where a single agent needs to take a sequence of actions to perform a task. These methods use sequences of single-time-step rewards to create a policy that tries to maximize a time-extended utility, which is a (possibly discounted) sum of these rewards. In this paper we build on our previous work showing how these methods can be extended to a multi-agent environment where each agent creates its own policy that works towards maximizing a time-extended global utility over all agents' actions. We show improved methods for creating time-extended utilities for the agents that are both "aligned" with the global utility and "learnable". We then show how to create single-time-step rewards while avoiding the pitfall of rewards that are aligned with the global reward yet lead to utilities not aligned with the global utility. Finally, we apply these reward functions to the multi-agent Gridworld problem. We explicitly quantify a utility's learnability and alignment, and show that reinforcement learning agents using the prescribed reward functions successfully trade off learnability and alignment (illustrative proxies for these two quantities are sketched after this record). As a result, they outperform both global (e.g., "team games") and local (e.g., "perfectly learnable") reinforcement learning solutions by as much as an order of magnitude.
    Keywords: Cybernetics, Artificial Intelligence and Robotics
    Type: International Conference on Autonomous Agents and Multi-Agent Systems; Jul 19, 2004 - Jul 23, 2004; New York, NY; United States
    Format: application/pdf
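The record above states that the authors explicitly quantify a utility's learnability and alignment, but the metrics themselves are not given here. The sketch below therefore uses assumed, illustrative proxies only: alignment as sign-agreement between changes in an agent's reward and changes in the global reward, and learnability as the agent's own influence on its reward relative to the influence of the other agents.

```python
# Illustrative, assumed proxies (not necessarily the metrics used in the paper).
import random

def alignment(samples):
    """samples: (delta_agent_reward, delta_global_reward) pairs from perturbed states.
    Returns the fraction of samples where both rewards move in the same direction."""
    agree = sum(1 for da, dg in samples if da * dg > 0)
    return agree / len(samples)

def learnability(own_effects, others_effects):
    """Average reward change caused by the agent's own action, divided by the average
    change caused by varying the other agents' actions (higher = easier to learn)."""
    own = sum(abs(x) for x in own_effects) / len(own_effects)
    others = sum(abs(x) for x in others_effects) / len(others_effects)
    return own / (others + 1e-9)

random.seed(0)
unrelated = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]
print(alignment(unrelated))                          # ~0.5 when rewards are unrelated
print(learnability([0.5, -0.4, 0.6], [0.05, 0.02]))  # >> 1: agent sees its own effect clearly
```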
  • 8
    Publication Date: 2019-07-13
    Description: Sets of multi-agent teams often need to maximize a global utility that rates the performance of the entire system, in settings where a team cannot fully observe the other teams' agents. Such limited observability makes it difficult for team members pursuing their team utilities to take actions that also help maximize the global utility. In this article, we show how team utilities can be used in partially observable systems. Furthermore, we show how team sizes can be manipulated to provide the best compromise between having easy-to-learn team utilities and having them aligned with the global utility. The results show that optimally sized teams in a partially observable environment outperform one team in a fully observable environment by up to 30%.
    Keywords: Mathematical and Computer Sciences (General)
    Type: International Joint Conference on Neural Networks; Jul 01, 2004; Budapest; Hungary
    Format: application/pdf
  • 9
    Publication Date: 2019-07-13
    Description: In many multi-agent learning problems, it is difficult to determine, a priori, the agent reward structure that will lead to good performance. This problem is particularly pronounced in continuous, noisy domains ill-suited to the simple table backup schemes commonly used in TD(lambda)/Q-learning. In this paper, we present a new reward evaluation method that allows the tradeoff between coordination among the agents and the difficulty of the learning problem each agent faces to be visualized. This method is independent of the learning algorithm and is only a function of the problem domain and the agents' reward structure. We then use this reward efficiency visualization method to determine an effective reward without performing extensive simulations. We test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and where their actions are noisy (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two-order-of-magnitude speedup in selecting a good reward. Most importantly, it allows one to quickly create and verify rewards tailored to the observational limitations of the domain.
    Keywords: Numerical Analysis
    Type: Autonomous Agents and Multi-agent Systems; Jul 25, 2005 - Jul 29, 2005; Utrecht; Netherlands
    Format: application/pdf
  • 10
    Publication Date: 2019-07-13
    Description: Multi-agent learning in Markov Decision Problems is challenging because of the presence of two credit assignment problems: 1) how to credit an action taken at time step t for rewards received at t' greater than t; and 2) how to credit an action taken by agent i, considering that the system reward is a function of the actions of all the agents. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning or TD(lambda). The second credit assignment problem is typically addressed either by hand-crafting reward functions that assign proper credit to an agent, or by making certain independence assumptions about an agent's state space and reward function. To address both credit assignment problems simultaneously, we propose Q Updates with Immediate Counterfactual Rewards learning (QUICR-learning), designed to improve both the convergence properties and the performance of Q-learning in large multi-agent problems. Instead of assuming that an agent's value function can be made independent of other agents, this method suppresses the impact of other agents using counterfactual rewards (a hedged sketch of this idea follows this record). Results on multi-agent grid-world problems over multiple topologies show that QUICR-learning can achieve up to thirty-fold improvements in performance over both conventional and local Q-learning in the largest tested systems.
    Keywords: Computer Programming and Software
    Type: International Joint Conference on Artificial Intelligence; Jul 30, 2005 - Aug 05, 2005; Edinburgh, Scotland; United Kingdom
    Format: application/pdf
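The abstract above describes QUICR-learning only at a high level. The sketch below shows one plausible reading of the counterfactual-reward idea, assuming a difference-style reward dropped into a standard tabular Q-learning update on a toy coverage task; the toy global reward, the way the counterfactual is formed and all names are assumptions made for illustration, not the paper's definitions.

```python
# Illustrative sketch (assumed toy setup): a counterfactual (difference) reward used
# inside an ordinary tabular Q-learning update. The global reward counts distinct grid
# cells occupied by the collective; the counterfactual removes agent i's position, so
# the resulting reward reflects mostly that agent's own contribution.
import collections

ALPHA, GAMMA = 0.1, 0.95
ACTIONS = ["up", "down", "left", "right", "stay"]
Q = collections.defaultdict(float)  # keyed by (agent, state, action)

def global_reward(positions):
    """World-level reward: number of distinct cells covered by all agents."""
    return len(set(positions.values()))

def counterfactual_reward(agent, positions):
    """Global reward minus the global reward with agent i's contribution removed."""
    without = {a: p for a, p in positions.items() if a != agent}
    return global_reward(positions) - global_reward(without)

def q_update(agent, state, action, next_state, next_positions):
    """One tabular Q-learning step driven by the counterfactual reward."""
    r = counterfactual_reward(agent, next_positions)
    best_next = max(Q[(agent, next_state, a)] for a in ACTIONS)
    Q[(agent, state, action)] += ALPHA * (r + GAMMA * best_next - Q[(agent, state, action)])

# Toy step: agent_1 moves to a cell not covered by anyone else, so its counterfactual
# reward is positive and its Q-value for that move increases.
positions = {"agent_1": (1, 2), "agent_2": (0, 0)}
q_update("agent_1", state=(0, 0), action="right", next_state=(1, 2), next_positions=positions)
print(Q[("agent_1", (0, 0), "right")])  # 0.1 after one update with this toy setup
```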