Skip to main content
Log in

Competing Markov decision processes

  • Computational Issues
  • Published:
Annals of Operations Research Aims and scope Submit manuscript

Abstract

A class of discounted Markov decision processes (MDPs) is formed by bringing together individual MDPs sharing the same discount rate. These are in competition in the sense that at each decision epoch a single action is chosen from the union of the action sets of the individual MDPs. Such families of competing MDPs have been used to model a variety of problems in stochastic resource allocation and in the sequential design of experiments. Suppose thatS is a stationary strategy for such a family, thatS* is an optimal strategy and thatR(S),R(S*) denote the respective rewards earned. The paper extends (and explains) existing theory based on the Gittins index to give bounds onR(S*)-R(S) for this important class of processes. The procedures are illustrated by examples taken from the fields of stochastic scheduling and research planning.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. L. Benkherouf, K.D. Glazebrook and R.W. Owen, Gittins' indices and oil exploration (1990), submitted for publication.

  2. J. Bruno and M. Hofri, On scheduling chains of jobs on one processor with limited preemption, SIAM J. Comput. 4 (1975) 478–490.

    Google Scholar 

  3. R.W. Cranston, First experiences with a ranking method for portfolio selection, R & D. Mgmt. 5 (1975) 88–92.

    Google Scholar 

  4. J.C. Gittins, Bandit processes and dynamic allocation indices (with discussion), J. Roy. Statist. Soc. B41 (1979) 148–177.

    Google Scholar 

  5. J.C. Gittins,Multi-Armed Bandit Allocation Indices (Wiley, New York, 1989).

    Google Scholar 

  6. K.D. Glazebrook, Stochastic scheduling with order constraints, Int. J. Syst. Sci. 7 (1976) 657–666.

    Google Scholar 

  7. K.D. Glazebrook, Some ranking formulae for alternative research projects, OMEGA 4 (1978) 79–83.

    Google Scholar 

  8. K.D. Glazebrook, Stoppable families of alternative bandit processes, J. Appl. Prob. 16 (1979) 843–854.

    Google Scholar 

  9. K.D. Glazebrook, On a sufficient condition for superprocesses due to Whittle, J. Appl. Prob. 19 (1982) 99–110.

    Google Scholar 

  10. K.D. Glazebrook, On a reduction principle in dynamic programming, Adv. Appl. Prob. 20 (1988) 836–851.

    Google Scholar 

  11. K.D. Glazebrook, Procedures for the evaluation of strategies for resource allocation in a stochastic environment, J. Appl. Prob. 27 (1990) 215–220.

    Google Scholar 

  12. K.D. Glazebrook, A comparative study of procedures for policy evaluation in stochastic scheduling (1990), submitted for publication.

  13. K.D. Glazebrook and N.A. Fay, Evaluating strategies for Markov decision processes in parallel, Math. Oper. Res. 15 (1990) 17–32.

    Google Scholar 

  14. K.D. Glazebrook and N.A. Fay, Stochastic scheduling — do we need to model arrivals? (1989), submitted for publication.

  15. K.D. Glazebrook and N.A. Fay, A general model for the scheduling of alternative tasks on a single machine, Prob. Eng. Inf. Sci. 3 (1989) 199–221.

    Google Scholar 

  16. M.N. Katehakis and A.F. Veinott, The multi-armed bandit problem: decomposition and computation, Math. Oper. Res. 12 (1987) 262–268.

    Google Scholar 

  17. P. Nash, Optimal allocation of resources between research projects, Ph.D. thesis, Cambridge University (1973).

  18. P. Varaiya, J. Walrand and C. Buyukkoc, Extensions of the multi-armed bandit problem: the discounted case, IEEE Trans. Auto. Control AC-30 (1985) 426–439.

    Google Scholar 

  19. P. Whittle, Multi-armed bandits and the Gittins index, J. Roy. Statist. Soc. B42 (1980) 143–149.

    Google Scholar 

  20. P. Whittle, Arm-acquiring bandits, Ann. Prob. 9 (1981) 284–292.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Glazebrook, K.D. Competing Markov decision processes. Ann Oper Res 29, 537–563 (1991). https://doi.org/10.1007/BF02283613

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02283613

Keywords

Navigation