Abstract
A class of discounted Markov decision processes (MDPs) is formed by bringing together individual MDPs sharing the same discount rate. These are in competition in the sense that at each decision epoch a single action is chosen from the union of the action sets of the individual MDPs. Such families of competing MDPs have been used to model a variety of problems in stochastic resource allocation and in the sequential design of experiments. Suppose that S is a stationary strategy for such a family, that S* is an optimal strategy and that R(S), R(S*) denote the respective rewards earned. The paper extends (and explains) existing theory based on the Gittins index to give bounds on R(S*) − R(S) for this important class of processes. The procedures are illustrated by examples taken from the fields of stochastic scheduling and research planning.
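To make the quantities concrete, the following sketch (not taken from the paper) computes Gittins indices for the simplest competing family: bandit arms with deterministic, finite reward sequences under a common discount rate beta. The arm names, reward sequences and helper functions here are illustrative assumptions; for an arm whose remaining rewards are r_0, r_1, …, the index is the maximum over stopping times tau ≥ 1 of (Σ_{t<tau} β^t r_t) / (Σ_{t<tau} β^t), and the index policy (always continue an arm of largest index) is optimal for this class, so R(S*) − R(S) ≥ 0 for any other stationary strategy S.

```python
def gittins_index(rewards, beta):
    """Gittins index of an arm whose remaining rewards are `rewards`."""
    best = float("-inf")
    num = den = 0.0
    for t, r in enumerate(rewards):
        num += beta**t * r       # discounted reward up to stopping time t+1
        den += beta**t           # discounted time up to stopping time t+1
        best = max(best, num / den)
    return best

def total_discounted_reward(order, arms, beta):
    """Discounted reward earned by pulling arms in the fixed sequence `order`."""
    pos = {i: 0 for i in arms}   # next reward position within each arm
    total, disc = 0.0, 1.0
    for i in order:
        total += disc * arms[i][pos[i]]
        pos[i] += 1
        disc *= beta
    return total

def index_policy_order(arms, beta):
    """At each epoch, pull the arm of highest Gittins index (an optimal S*)."""
    pos = {i: 0 for i in arms}
    order = []
    for _ in range(sum(len(r) for r in arms.values())):
        live = [i for i in arms if pos[i] < len(arms[i])]
        i = max(live, key=lambda j: gittins_index(arms[j][pos[j]:], beta))
        order.append(i)
        pos[i] += 1
    return order
```

For instance, with arms {"A": [5, 1], "B": [4, 3]} and beta = 0.5, the index policy pulls A, B, B, A and earns R(S*) = 7.875, while the stationary strategy that exhausts B first earns R(S) = 6.875, a gap R(S*) − R(S) = 1.0 of the kind the paper's bounds control.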
Glazebrook, K.D. Competing Markov decision processes. Ann Oper Res 29, 537–563 (1991). https://doi.org/10.1007/BF02283613