Summary
This paper considers a random walk type Markov decision process in which the state spaceI is an integer subset of IRm, and the action spaceK is independent ofi εI. The natural order
, overI, and a quasi order,
′, overK, is assumed, together with aconditional convexity assumption on the returns {r ki }, and certain other assumptions about these rewards and the transition probabilities in relationship to the orders
and
′.A negatively isotone policy is one for whichi
i′→δ(i)⊁′)δ(i′) (i.e.δ(i)
′δ(i)′ orδ(i′)
′δi)). It is shown that, under specified conditions, a negatively isotone optimal policy exists. Some consideration is given to computational implications in particular relationship to Howard's policy space method.
Zusammenfassung
Wir betrachten einen Markovschen Entscheidungsprozeß vom random walk Typ. Der ZustandsraumI sei eine Teilmenge des IRm, wobeii εI ganzzahlige Komponenten habe. Die MengeK der zulässigen Aktionen ini εI sei unabhängig voni εI. Sei
die natürliche Ordnung aufI und
′ sei eine Quasiordnung aufK. Die Erträge {r ki }seienbedingt konvex, darüberhinaus seien weitere Voraussetzungen über diese Erträge und die Übergangswahrscheinlichkeiten in Bezug auf die Ordnungen
und
′ erfüllt. Eine Politik δ heißt negativ isoton, falls ausi
i′ folgtδi⊁′δ(i′) (d. h.δ(i)
′δ(i)′ oderδ(i)′
′δ(i)′). Wir zeigen, daß unter gewissen Voraussetzungen einenegativ isotone optimale Politik existiert: Auch diskutieren wir einige Folgerungen für die Numerik, insbesondere hinsichtlich Howards Politikiteration.
Similar content being viewed by others
References
White DJ (1977) Kernels of preference structures. Econometrica 45:91–100
Porteus EL (1975) On the optimality of structured policies in countable stage decision processes. Manag Sci 22:148–157
White CC (1980) The optimality of isotone strategies for Markov decision processes with utility criterion. In: Hartley R, Thomas LC, White DJ (eds) Recent developments in Markov decision processes. Academic Press, London, pp 261–275
White DJ (1981) Isotone optimal policies for structured Markov decision processes. Eur J Oper Res 7:392–402
Kalin D (1978) A note on monotone optimal policies for Markov decision processes. Math Prog 15:220–222
Serfozo RF (1976) Monotone optimal policies for Markov decision processes. Math Prog Study 6:202–215
Howard RA (1960) Dynamic programming and Markov processes. J Wiley, New York
Tijms H (1980) An algorithm for average costs denumerable state semi-Markov decision problems with applications to controlled production and queueing systems. In: Hartley R, Thomas LC, White DJ (eds) Recent developments in Markov decision processes. Academic Press, London, pp 143–179
Bellman R (1957) Dynamic programming. Princeton University Press, New Jersey
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
White, D.J. Negatively isotone optimal policies for random walk type Markov decision processes. OR Spektrum 4, 41–45 (1982). https://doi.org/10.1007/BF01837023
Received:
Accepted:
Issue Date:
DOI: https://doi.org/10.1007/BF01837023