MDPs with Energy-Parity Objectives

Energy-parity objectives combine $\omega$-regular with quantitative objectives of reward MDPs. The controller needs to avoid running out of energy while satisfying a parity objective. We refute the common belief that, if an energy-parity objective holds almost-surely, then this can be realised by some finite memory strategy. We provide a surprisingly simple counterexample that only uses coBüchi conditions. We introduce the new class of bounded (energy) storage objectives that, when combined with parity objectives, preserve the finite memory property. Based on these, we show that almost-sure and limit-sure energy-parity objectives, as well as almost-sure and limit-sure storage parity objectives, are in $\mathit{NP}\cap \mathit{coNP}$ and can be solved in pseudo-polynomial time for energy-parity MDPs.


I. INTRODUCTION
Context. Markov decision processes (MDPs) are a standard model for dynamic systems that exhibit both stochastic and controlled behaviour [1]. Such a process starts in an initial state and makes a sequence of transitions between states. Depending on the type of the current state, either the controller gets to choose an enabled transition (or a distribution over transitions), or the next transition is chosen randomly according to a predefined distribution. By fixing a strategy for the controller, one obtains a Markov chain. The goal of the controller is to optimise the (expected) value of some objective function on runs of such an induced Markov chain.
Our Focus and Motivation. In this paper we study MDPs with a finite number of states, where numeric rewards (which can be negative) are assigned to transitions. We consider quantitative objectives, e.g. the total expected reward or the limit-average expected reward [1], [2]. Note that the total reward is not bounded a priori. We also consider ω-regular objectives that can be expressed by parity conditions on the sequence of visited states (subsuming many simpler objectives like Büchi and coBüchi).
When reasoning about controllers for mechanical and electrical systems, one may need to consider quantitative objectives such as the remaining stored energy of the system (which must not fall below zero, or else the system fails), and, at the same time, parity objectives that describe the correct behaviour based on the temporal specification. Thus one needs to study the combined energy-parity objective.
Status Quo. Previous work in [2] (Sec. 3) considered the decidability and complexity of the question whether the energy-parity objective can be satisfied almost-surely, i.e. whether there exists a strategy (or: a controller) that satisfies the objective with probability 1. They first show that in the restricted case of energy-Büchi objectives, finite memory optimal strategies exist, and that almost-sure satisfiability is in NP ∩ coNP and can be solved in pseudo-polynomial time.
They then describe a direct reduction from almost-sure energy-parity to almost-sure energy-Büchi. This reduction claimed that the winning strategy could be chosen among a certain subclass of strategies, which we call colour-committing. Such a strategy eventually commits to a particular winning even colour, where this colour must be seen infinitely often almost-surely and no smaller colour may ever be seen after committing.
However, this reduction from almost-sure energy-parity to almost-sure energy-Büchi in [2] (Sec. 3) contains a subtle error (which also appears in the survey in [3] (Theorem 4)). In fact, we show that strategies for almost-sure energy-parity may require infinite memory. Our contributions can be summarised as follows.
1) We provide a simple counterexample that shows that, even for almost-sure energy-coBüchi objectives, the winning strategy requires infinite memory and cannot be chosen among the colour-committing strategies.
2) We introduce an energy storage objective, which requires that the energy objective is met using a finite energy store. The size of the store can be fixed by the controller, but it cannot be changed. We argue that the almost-sure winning sets for energy-Büchi and storage-Büchi objectives coincide. Moreover, we show that the reduction in [2] actually works for storage-parity instead of for energy-parity conditions. I.e. [2] shows that almost-sure storage parity objectives require just finite memory, are in NP ∩ coNP, and can be solved in pseudo-polynomial time.
3) We develop a solution for the original almost-sure energy-parity objective. It requires a more involved argument and infinite-memory strategies that are obtained by composing three other strategies. We show that almost-sure energy-parity objectives are in NP ∩ coNP and can be solved in pseudo-polynomial time.
4) We then study the limit-sure problem. Here one asks whether, for every ε > 0, there exists a strategy that satisfies the objective with probability ≥ 1 − ε. This is a question about the existence of a family of ε-optimal strategies, not about a single strategy as in the almost-sure problem. The limit-sure problem is equivalent to the question whether the value of a given state and initial energy level (w.r.t. the objective) is 1. For the storage-parity objective, the limit-sure condition coincides with the almost-sure condition, and thus the complexity results from [2] apply. In contrast, for the energy-parity objective the limit-sure condition does not coincide with the almost-sure condition. While almost-sure energy-parity implies limit-sure energy-parity, we give examples that show that the reverse implication does not hold. We develop an algorithm to decide the limit-sure energy-parity objective and show that the problem is in NP ∩ coNP and can be solved in pseudo-polynomial time. Moreover, each member in the family of ε-optimal strategies that witnesses the limit-sure energy-parity condition can be chosen as a finite-memory strategy (unlike winning strategies for almost-sure energy-parity that may require infinite memory).
Related work. Energy games were introduced in [4] to reason about systems with multiple components and bounded resources. Energy objectives were later also considered in the context of timed systems [5], synthesis of robust systems [6], and gambling [7] (where they translate to "not going bankrupt"). The first analysis of a combined qualitative-quantitative objective was done in [8] for mean-payoff parity games. Almost-sure winning in energy-parity MDPs was considered in [2] (cf. [3] as a survey). However, it was shown in [2] that almost-sure winning in energy-parity MDPs is at least as hard as two-player energy-parity games [9]. A recent paper [10] considers a different combined objective: maximising the expected mean-payoff while satisfying the energy objective. The proof of Lemma 3 in [10] uses a reduction to the (generally incorrect) result on energy-parity MDPs of [2], but their result still holds because it only uses the correct part about energy-Büchi MDPs.
Closely related to energy MDPs and games are one-counter MDPs and games, where the counter value can be seen as the current energy level. One-counter MDPs and games with a termination objective (i.e. reaching counter value 0) were studied in [11] and [12], respectively.
Outline of the Paper. The following section introduces the necessary notations. Section III discusses combined energy-parity objectives and formally states our results. In Section IV we explain the error in the construction of [2] (Sec. 3), define the bounded energy storage condition and derive the results on combined storage-parity objectives. Section V discusses the almost-sure problem for energy-parity conditions and provides our new proof of their decidability. The limit-sure problem for energy-parity is discussed in Sections VI and VII. Section VIII discusses lower bounds and the relation between almost/limit-sure parity MDPs and mean-payoff games.

II. NOTATIONS
A probability distribution over a set $X$ is a function $f : X \to [0, 1]$ such that $\sum_{x \in X} f(x) = 1$. We write $D(X)$ for the set of distributions over $X$.

Markov Chains.
A Markov chain is an edge-labeled, directed graph $C \overset{\text{def}}{=} (V, E, \lambda)$, where the elements of $V$ are called states, such that the labelling $\lambda : E \to [0, 1]$ provides a probability distribution over the set of outgoing transitions of any state $s \in V$. A path is a finite or infinite sequence $\rho \overset{\text{def}}{=} s_1 s_2 \ldots$ of states such that $(s_i, s_{i+1}) \in E$ holds for all indices $i$; an infinite path is called a run. We use $w \in V^*$ to denote a finite path. We write $s \xrightarrow{x} t$ instead of $(s, t) \in E \wedge \lambda(s, t) = x$ and omit superscripts whenever clear from the context. We write $\mathit{Runs}^C_w$ for the cone set $wV^\omega$, i.e., the set of runs with finite prefix $w \in V^*$, and assign to it the probability space $(\mathit{Runs}^C_w, F^C_w, P^C_w)$, where $F^C_w$ is the σ-algebra generated by all cone sets $\mathit{Runs}^C_{wx} \subseteq \mathit{Runs}^C_w$, for […] for cone sets. By Carathéodory's extension theorem [13], this defines a unique probability measure on all measurable subsets of runs.

Markov Decision Processes. A Markov Decision Process […] and probabilistic states ($V_P$). The set of edges is $E \subseteq V \times V$ and $\lambda : V_P \to D(E)$ assigns each probabilistic state a probability distribution over its outgoing edges.
A strategy is a function $\sigma : V^* V_C \to D(E)$ that assigns each word $ws \in V^* V_C$ a probability distribution over the outgoing edges of $s$, that is, $\sigma(ws)(e) > 0$ implies $e = (s, t) \in E$ for some $t \in V$. A strategy is called memoryless if $\sigma(xs) = \sigma(ys)$ for all $x, y \in V^*$ and $s \in V_C$, and deterministic if $\sigma(w)$ is Dirac for all $w \in V^* V_C$. Each strategy induces a Markov chain $M(\sigma)$ with states $V^*$ and where $ws$ […]. We write $\mathit{Runs}^M_w$ for the set of runs in $M$ (with prefix $w$), consisting of all runs in $\mathit{Runs}^{M(\sigma)}_w$ for some strategy σ, and $\mathit{Runs}^M$ for the set of all such paths.

Objective Functions. An objective is a subset $\mathit{Obj} \subseteq \mathit{Runs}^M$. We write $\overline{\mathit{Obj}} \overset{\text{def}}{=} \mathit{Runs}^M \setminus \mathit{Obj}$ for its complement. It is satisfied surely if there is a strategy σ such that $\mathit{Runs}^{M(\sigma)} \subseteq \mathit{Obj}$, almost-surely if there exists a σ such that $P^{M(\sigma)}(\mathit{Obj}) = 1$, and limit-surely if $\sup_\sigma P^{M(\sigma)}(\mathit{Obj}) = 1$. In other words, the limit-sure condition asks that there exists some infinite sequence $\sigma_1, \sigma_2, \ldots$ of strategies such that $\lim_{n\to\infty} P^{M(\sigma_n)}(\mathit{Obj}) = 1$. We call a strategy ε-safe if $P^{M(\sigma)}(\mathit{Obj}) \ge 1 - \varepsilon$. Relative to a given MDP $M$ and some finite path $w$, we define the value of $\mathit{Obj}$ as $\mathit{Val}^M_w(\mathit{Obj}) \overset{\text{def}}{=} \sup_\sigma P^{M(\sigma)}_w(\mathit{Obj})$. We use the following objectives, defined by conditions on individual runs.
A reachability condition is defined by a set of target states $T \subseteq V$. A run $s_0 s_1 \ldots$ satisfies the reachability condition iff there exists an $i \in \mathbb{N}$ s.t. $s_i \in T$. We write $\Diamond T \subseteq \mathit{Runs}$ for the set of runs that satisfy the reachability condition.
A parity condition is given by a function parity : V → N, that assigns a priority (non-negative integer) to each state. A run ρ ∈ Runs satisfies the parity condition if the minimal priority that appears infinitely often on the run is even. The parity objective is the subset PAR ⊆ Runs of runs that satisfy the parity condition.
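For ultimately periodic runs the parity condition reduces to a check on the repeated part: the priorities seen infinitely often are exactly those on the cycle. The following is a minimal sketch under that restriction; the function name and the dictionary representation of the parity function are illustrative assumptions, not notation from this paper.

```python
from typing import Dict, Hashable, Sequence


def lasso_satisfies_parity(cycle: Sequence[Hashable],
                           parity: Dict[Hashable, int]) -> bool:
    """Parity check for a run of the shape prefix . cycle^omega.

    Only the cycle matters: the states visited infinitely often are
    exactly the states on the cycle, so the finite prefix is ignored.
    """
    if not cycle:
        raise ValueError("the repeated cycle must be non-empty")
    # The run is accepted iff the minimal priority on the cycle is even.
    return min(parity[state] for state in cycle) % 2 == 0
```

In a coBüchi setting (colours 1 and 2 only), the check succeeds exactly when the cycle avoids every state of colour 1.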
Energy conditions are given by a function $\mathit{cost} : E \to \mathbb{Z}$ that assigns a cost value to each edge. For a given initial energy value $k \in \mathbb{N}$, a run $s_0 s_1 \ldots$ satisfies the k-energy condition if, for every finite prefix, the energy level $k + \sum_{i=0}^{l} \mathit{cost}(s_i, s_{i+1})$ stays greater than or equal to 0. Let $\mathsf{EN}(k) \subseteq \mathit{Runs}$ denote the k-energy objective, consisting of those runs that satisfy the k-energy condition.
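On a finite prefix, the k-energy condition is a statement about running sums of edge costs. The sketch below checks it for a given list of costs and computes the least initial energy that keeps the prefix non-negative; the function names and the plain list-of-integers input are assumptions made for illustration.

```python
from itertools import accumulate
from typing import Sequence


def satisfies_energy_prefix(k: int, costs: Sequence[int]) -> bool:
    """True iff k plus every running sum of `costs` stays >= 0."""
    return all(k + total >= 0 for total in accumulate(costs))


def minimal_initial_energy(costs: Sequence[int]) -> int:
    """Least k in N for which the finite prefix satisfies the k-energy condition."""
    lowest = min(accumulate(costs), default=0)
    return max(0, -lowest)
```

For the costs `[1, -2, -1, 3]` the running sums are 1, -1, -2, 1, so an initial energy of 2 suffices and 1 does not. This also illustrates the monotonicity in k: any prefix that is safe for k is safe for k + 1.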
Mean-payoff conditions are defined w.r.t. the same cost function $\mathit{cost} : E \to \mathbb{Z}$ as the energy conditions. A run $s_0 s_1 \ldots$ satisfies the positive mean-payoff condition iff $\liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \mathit{cost}(s_i, s_{i+1}) > 0$. We write $\mathsf{PosMP} \subseteq \mathit{Runs}$ for the positive mean-payoff objective, consisting of those runs that satisfy the positive mean-payoff condition.

III. PARITY CONDITIONS UNDER ENERGY CONSTRAINTS
We study the combination of energy and parity objectives for finite MDPs. That is, given an MDP and both cost and parity functions, we consider objectives of the form EN(k) ∩ PAR for integers k ∈ N. We are interested in identifying those control states and values k ∈ N for which the combined k-energy-parity objective is satisfied almost-surely and limit-surely, respectively.

From states other than $s$ there is only one strategy. It holds that $\mathit{Val}_l(\mathsf{PAR}) = 1$ but $\mathit{Val}_l(\mathsf{EN}(k)) = 0$ for any $k \in \mathbb{N}$ and so $\mathit{Val}_l(\mathsf{EN}(k) \cap \mathsf{PAR}) = 0$. For state $r$ we have that $\mathit{Val}_r(\mathsf{EN}(k) \cap \mathsf{PAR}) = \mathit{Val}_r(\mathsf{EN}(k)) = 1 - (1/2)^k$, due to the positive drift. For all $k \in \mathbb{N}$ the state $s$ does not satisfy the k-energy-parity objective almost-surely but limit-surely: $\mathit{Val}_s(\mathsf{EN}(k) \cap \mathsf{PAR}) = 1$ (by going ever higher and then right).
Notice that these energy-parity objectives are trivially monotone in the parameter k because EN(k) ⊆ EN(k + 1) holds for all k ∈ N. Consequently, for every fixed state p, if there exists some k ∈ N such that the k-energy-parity objective holds almost-surely (resp. limit-surely), then there is a minimal such value k. By solving the almost-sure/limit-sure problems for these monotone objectives we mean to compute these minimal sufficient values for all initial states.
We now state our two main technical results. We fix a finite MDP $M \overset{\text{def}}{=} (V_C, V_P, E, \lambda)$, a parity function $\mathit{parity} : V \to \mathbb{N}$ with maximal colour $d \in \mathbb{N}$ and a cost-function $\mathit{cost} : E \to \mathbb{Z}$ with maximal absolute value $W \overset{\text{def}}{=} \max_{e \in E} |\mathit{cost}(e)|$. Let $|\lambda|$ and $|\mathit{cost}|$ be the size of the transition table λ and the cost function cost, written as tables with valuations in binary. We use $\tilde{O}(f(n))$ as a shorthand for $O(f(n) \log^k f(n))$ for some constant $k$.
(2) If the k-energy-parity objective holds limit-surely then, for each ε > 0, there exists a finite memory ε-safe strategy.
Remark 4. The claimed algorithms are pseudo-polynomial in the sense that they depend (linearly) on the value W. If the cost-deltas are −1, 0, or 1 only, and not arbitrary binary encoded numbers, this provides a polynomial time algorithm.
Part (2) of Theorem 2 was already claimed in [2], Theorem 1. However, the proof there relies on a particular finiteness assumption that is not true in general. In the next section we discuss this subtle error and describe the class of (bounded) storage objectives, for which this assumption holds and the original proof goes through. Our new proof of Theorem 2 is presented in Section V.
The proof of Theorem 3 is deferred to Sections VI and VII. It is based on a reduction to checking almost-sure satisfiability of storage-parity objectives, which can be done in pseudo-polynomial time (cf. Theorem 8). We first establish in Section VI that certain limit values are computable for each state. In Section VII we then provide the actual reduction, which is based on precomputing these limit values and produces an MDP which is only linearly larger and has no new priorities.

IV. ENERGY STORAGE CONSTRAINTS
The argument of [2] to show computability of almost-sure energy-parity objectives relies on the claim that the controller, if it has a winning strategy, can eventually commit to visiting an even colour infinitely often and never visiting smaller colours. We show that this claim already fails for coBüchi conditions (i.e. for MDPs that only use colours 1 and 2). We then identify a stronger kind of energy condition (the storage energy condition we introduce below) that satisfies the above claim and for which the original proof of [2] goes through.
Let us call a strategy colour-committing if, for some colour 2i, almost all runs eventually visit a position such that almost all possible continuations visit colour 2i infinitely often and no continuation (as this is a safety constraint) visits a colour smaller than 2i.

Claim 5.
If there exists some strategy that almost-surely satisfies EN(k) ∩ PAR then there is also a colour-committing strategy that does.
Proof (that Claim 5 is false). Consider the following example, where the controller owns states A, B, C (with colour 2) and tries to avoid state B (with colour 1) while maintaining the energy condition. First notice that all states almost-surely satisfy the 0-energy-coBüchi condition EN(0) ∩ PAR. One winning strategy is to This strategy is not colour-committing but clearly energy safe: the only decreasing step is avoided if the energy level is 0.
To see why this strategy also almost-surely satisfies the parity (in this case coBüchi) objective, first observe that it guarantees a positive updrift: from state D with positive energy level, the play returns to D in two steps with expected energy gain +1/3; from state D with energy level 0, the play returns to D in either two or three steps, in both cases with energy gain +1. The chance to visit state C with energy level 0 when starting at state D with energy level $k \in \mathbb{N}$ is $(1/2)^{k+1}$. This is the same likelihood with which state B is eventually visited. However, every time state B is visited, the system restarts from state D with energy level 1. Therefore, the chance of revisiting B from B is merely 1/4. More generally, the chance of seeing state B at least $n$ further times is $(1/4)^n$. The chance of visiting B infinitely often is therefore $\lim_{n\to\infty} (1/4)^n = 0$. This strategy thus satisfies the parity (in this case coBüchi) objective almost-surely. Consequently, the combined 0-energy-parity objective is almost-surely met from all states.
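The numbers in this argument can be reproduced with the classical formula for a biased ±1 random walk. The sketch below assumes that each round trip from D changes the energy by +1 with probability 2/3 and by −1 with probability 1/3; this is inferred from the stated expected gain of +1/3 and is not read off the (missing) figure. It further assumes that visiting C with energy 0 from level k corresponds to the walk falling k + 1 levels. All function names are illustrative.

```python
from fractions import Fraction


def expected_gain(p_up: Fraction) -> Fraction:
    """Expected energy change of one +1/-1 round trip."""
    return p_up * 1 + (1 - p_up) * (-1)


def drop_probability(p_up: Fraction, levels: int) -> Fraction:
    """Chance that a +1/-1 walk with up-probability p_up > 1/2
    ever falls `levels` below its starting point (gambler's ruin)."""
    if not Fraction(1, 2) < p_up < 1:
        raise ValueError("the formula requires 1/2 < p_up < 1")
    return ((1 - p_up) / p_up) ** levels


def at_least_n_more_visits(per_visit: Fraction, n: int) -> Fraction:
    """Chance of n further visits when each revisit has chance `per_visit`."""
    return per_visit ** n
```

Under these assumptions, `drop_probability(Fraction(2, 3), k + 1)` equals $(1/2)^{k+1}$, the restart at energy level 1 gives a revisit chance of 1/4, and `at_least_n_more_visits(Fraction(1, 4), n)` tends to 0 as n grows.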
To contradict Claim 5, we contradict the existence of an initial state and a colour-committing strategy that almost-surely satisfies the 0-energy-parity objective. By definition, such a strategy will, on almost all runs, eventually avoid state B completely.
As every run will surely visit state D infinitely often, we can w.l.o.g. pick a finite possible prefix $s_1 s_2 \ldots s_j$ (i.e. a prefix that can occur with a likelihood δ > 0) of a run that ends in state $s_j = D$ and assume that none (or only a 0 set, but these two conditions coincide for safety objectives) of its continuations visits state B again. Let $l \overset{\text{def}}{=} \sum_{i=1}^{j-1} \mathit{cost}(s_i, s_{i+1})$ denote the sum of rewards collected on this prefix. Note that there is a $(1/3)^{l+1} > 0$ chance that some continuation alternates between states D and C for $l + 1$ times and thus violates the $l$-energy condition. Consequently, the chance of violating the 0-energy-parity condition from the initial state is at least $\delta \cdot (1/3)^{l+1} > 0$.
Notice that every finite memory winning strategy for the PAR objective must also be colour-committing. The system above therefore also proves part (1) of Theorem 2, that infinite memory is required for k-energy-parity objectives.
In the rest of this section we consider a stronger kind of energy condition, for which Claim 5 does hold and the original proof of [2] goes through. The requirement is that the strategy achieves the energy condition without being able to store an infinite amount of energy. Instead, it has a finite energy store, say s, and cannot store more energy than the size of this storage. Thus, when a transition would lead to an energy level s ′ > s, then it would result in an available energy of s. These are typical behaviours of realistic energy stores, e.g. a rechargeable battery or a storage lake. An alternative view (and a consequence) is that the representation of the system becomes finite-state once the bound s is fixed, and only finite memory is needed to remember the current energy level.
For the definition of a storage objective, we keep the infinite storage capacity, but instead require that no subsequence loses more than s energy units. The definitions are interchangeable, and we chose this one in order not to change the transitions of the system.
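The interchangeability of the two views can be tested on finite cost sequences. The sketch below assumes that ST(k, s) combines the k-energy condition with the requirement that no infix loses more than s units, and that the capped store starts at level min(k, s); both readings are assumptions, since the formal definition is not reproduced here. Function names are illustrative.

```python
from typing import Sequence


def storage_by_infixes(k: int, s: int, costs: Sequence[int]) -> bool:
    """Unbounded store: k-energy on prefixes, and no infix loses more than s."""
    total = 0
    for c in costs:
        total += c
        if k + total < 0:          # k-energy condition violated
            return False
    for start in range(len(costs)):
        total = 0
        for c in costs[start:]:
            total += c
            if s + total < 0:      # this infix loses more than s units
                return False
    return True


def storage_by_capping(k: int, s: int, costs: Sequence[int]) -> bool:
    """Finite store of size s: surplus energy above s is discarded."""
    level = min(k, s)
    for c in costs:
        level += c
        if level < 0:
            return False
        level = min(level, s)      # cannot store more than s
    return True
```

The capped variant needs only the current level in {0, ..., s}, which is the finite-state view mentioned above; the infix variant leaves the transitions untouched.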

From state q in the middle, one can win with an initial energy level 0 by always going left, provided that one has an energy store of size at least 2. With an energy store of size 1, however, going left is not an option, as one would not be able to return from the state on the left. But with an initial energy level of 1, one can follow the strategy to always turn to the right. So the ST(0, 2) and ST(1, 1) objectives hold almost-surely but the ST(0, 1) objective does not.
We sometimes want to leave the size of the energy store open. For this purpose, we define ST(k) as the objective that says "there is an s, such that ST(k, s) holds" and ST for "there is an s such that ST(s, s) holds". Note that this is not a path property; we rather require that the s is fixed globally. In order to meet an ST(k) property almost-surely, there must be a strategy σ and an s ∈ N such that almost all runs satisfy ST(k, s): ∃σ, s s.t. $P^{M(\sigma)}(\mathsf{ST}(k, s)) = 1$. Likewise, for limit-sure satisfaction of ST, we require ∃s ∀ε > 0 ∃σ s.t. $P^{M(\sigma)}(\mathsf{ST}(s, s)) \ge 1 - \varepsilon$.
We now look at combined storage-parity and storage-mean-payoff objectives.
1) The almost-sure problem for storage-parity objectives is in NP ∩ coNP, and there is an algorithm to solve it in […] winning strategies. This also bounds the minimal values k, s ∈ N such that ST(k, s) ∩ PAR holds almost-surely.
The proof is provided by Chatterjee and Doyen [2]: they first show the claim for energy-Büchi objectives EN(k) ∩ PAR (where d = 1) by reduction to two-player energy-Büchi games ([2], Lemma 2). Therefore, almost-sure winning strategies come from first-cycle games and operate in a bounded energy radius. As a result, almost-sure satisfiability for energy-Büchi and storage-Büchi coincide. They then ([2], Lemma 3) provide a reduction for general parity conditions to the Büchi case, assuming Claim 5. Although this fails for energy-parity objectives, as we have shown above, almost-sure winning strategies for storage-parity can be assumed to be finite memory and therefore colour-committing. The construction of [2] then goes through without alteration. The complexity bound follows from improvements for energy-parity games [9].

Theorem 9 (Storage-Mean-payoff). For finite MDPs with combined storage and positive mean-payoff objectives:
1) The almost-sure problem is in NP ∩ coNP and can be […] winning strategies. This also bounds the minimal values k, s ∈ N such that ST(k, s) ∩ PosMP holds almost-surely.
Proof. We show that, for every MDP $M$ with associated cost function, there is a linearly larger system $M'$ with associated $\mathit{cost}'$ and parity function (where the parity function is Büchi, i.e. has image {0, 1}) such that, for every $k \in \mathbb{N}$, PosMP ∩ ST(k) holds almost-surely in $M$ iff PAR ∩ ST(k) holds almost-surely in $M'$. For every state $q$ of $M$, the new system $M'$ contains two new states, $q'$ and $q''$, and edges $(q, q')$ and $(q, q'')$ with costs 0 and −1, respectively. Each original edge $(q, r)$ is replaced by two edges, $(q', r)$ and $(q'', r)$. All original states become controlled, and the primed and double primed copies of a state $q$ are controlled if, and only if, $q$ was controlled in $M$. The double primed states have colour 0, while all original and primed states have colour 1. See Figure 2 (on the left) for an illustration.
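The construction just described can be written down directly on a graph representation. This is a minimal sketch, assuming string-named states, a dictionary mapping edges to costs, and that the copied edges keep the cost of the original edge; transition probabilities are omitted and would carry over to the primed copies of probabilistic states. The function name and data layout are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def sell_energy_for_buchi(states: Set[str], controlled: Set[str],
                          cost: Dict[Edge, int]):
    """Build (controlled', cost', colour) of the system M' from M.

    Assumes no original state name already ends in a prime.
    """
    new_cost: Dict[Edge, int] = {}
    colour: Dict[str, int] = {}
    new_controlled: Set[str] = set(states)   # all original states become controlled
    for q in states:
        q1, q2 = q + "'", q + "''"
        new_cost[(q, q1)] = 0                # keep the energy unit
        new_cost[(q, q2)] = -1               # sell one unit for an accepting visit
        colour[q], colour[q1], colour[q2] = 1, 1, 0
        if q in controlled:                  # copies inherit the owner of q
            new_controlled.update((q1, q2))
    for (q, r), c in cost.items():           # each original edge is duplicated
        new_cost[(q + "'", r)] = c           # assumption: original cost is kept
        new_cost[(q + "''", r)] = c
    return new_controlled, new_cost, colour
```

The result has three states per original state and two edges per original state plus two per original edge, which matches the claim that $M'$ is only linearly larger.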
To give the idea of this construction in a nutshell, the Büchi condition in $M'$ intuitively sells one energy unit for visiting an accepting state (or: for visiting a state with colour 0, the double primed copy). ST(k) implies that, as soon as s + 1 energy is available, one can sell off one energy unit for a visit of an intermediate accepting state. PosMP implies that this can almost-surely be done infinitely often. Vice-versa, ST(k) implies non-negative mean payoff. ST(k) plus Büchi can always be realised with finite memory by Theorem 8 (2), and such a strategy then implies that PosMP ∩ ST(k) holds almost-surely in $M$. Now the claim holds by Theorem 8.

Remark 10. Note that the order of quantification in the limit-sure problems for storage objectives (∃s.∀ε . . .) means that limit-sure and almost-sure winning coincide for storage-parity objectives: if there is an s such that ST(s, s) ∩ PAR holds limit-surely then one can get rid of the storage condition by hardcoding energy-values up to s into the states. The same is true for mean-payoff-storage objectives. The claims in Theorems 8 and 9 thus also hold for the limit-sure problems.
Finally, we restate the result from [2], Theorem 2 (1) on positive mean-payoff-parity objectives and add to it an explicit computational complexity bound that we will need later.

Proof. The computational complexity bound follows from the analysis of Algorithm 1 in [2]. It executes d/2 iterations of a loop, in which Step 3.3 of computing the mean-payoff of maximal end components dominates the cost. This can be formulated as a linear program (LP) that uses two variables, called gain and bias, for each state [1]. This LP can be solved using Karmarkar's algorithm [14] in time $O(|V|^{3.5} \cdot (|\lambda| + |\mathit{cost}|)^2)$. Note that the complexity refers to all (not each) maximal end-components.
As we do not need to obtain a maximal payoff τ > 0 but can use any smaller value, like τ /2, finite memory suffices.

V. ALMOST-SURE ENERGY-PARITY
In this section we prove Theorem 2. Our proof can be explained in terms of the three basic objectives: storage (ST), positive mean-payoff (PosMP), and parity (PAR). It is based on the intuition provided by the counterexample in the previous section. Namely, in order to almost-surely satisfy the energy-parity condition one needs to combine two strategies:
1) One that guarantees the parity condition and, at the same time, a positive expected mean-payoff. Using this strategy one can achieve the energy-parity objective with some non-zero chance.
2) A bailout strategy that guarantees positive expected mean-payoff together with a storage condition. This allows one to (almost-surely) set the accumulated energy level to some arbitrarily high value.
We show that, unless there exists some safe strategy that satisfies storage-parity, it is sufficient (and necessary) that such two strategies exist and that the controller can freely switch between them. I.e. they do not leave the combined almost-sure winning set unless a state that satisfies storage-parity is reached.
Recall that the combined positive mean-payoff-parity objective (for case 1 above) is independent of an initial energy level and its almost-sure problem is decidable in polynomial time due to Theorem 11. The mean-payoff-storage objective ST(k) ∩ PosMP (for case 2 above), as well as the storage-parity objective, are computable by Theorems 9 and 8, respectively. See Figure 1.
To establish Theorem 2, we encode the almost-sure winning sets of the storage-parity objective directly into the system (Definition 12 and Lemma 13), in order to focus on the two interesting conditions from above. We then show (Definition 14 and Lemma 15) that the existence of the two strategies for bailout and ST ∩ PosMP, and the minimal safe energy levels can be computed in the claimed bounds. In Lemma 16 we show that these values coincide with the minimal energy levels of the energy-parity objective for the original system, which concludes the proof.
For every state q of M there are two states, q and q ′ in V ′ such that both have the same colour as q in M, every original incoming edge now only goes to q ′ , and every original outgoing edge now only goes from q. Moreover, q ′ is controlled and has an edge to q with cost(q ′ , q) = 0.
Finally, $M'$ contains a single winning sink state $w$ with colour 0 and a positive self-loop, and every state $q'$ gets an edge to $w$ with cost $-k_q$, where $k_q \in \mathbb{N}$ is the minimal value such that, for some $s \in \mathbb{N}$, the storage-parity objective $\mathsf{ST}(k_q, s) \cap \mathsf{PAR}$ holds almost-surely. See Figure 2 […]

The relevance of these numbers for us is, intuitively, that if safe(q) is finite, then there exists a pair of strategies, one for the PosMP ∩ PAR and one for the PosMP ∩ ST(k) objective, between which the controller can switch as often as she wants.

The set R is in fact the result of a refinement procedure that starts with all states of $M'$. In each round, it removes states that fail either of the two conditions. For every projection $M'|S$, checking Condition 1 takes $O(d \cdot |V|^{3.5} \cdot (|\lambda| + |\mathit{cost}|)^2)$ time by Theorem 11 and Condition 2 can be checked in $O(|E| \cdot d \cdot |V|^4 \cdot W)$ time by Theorem 9. All in all, this provides a pseudo-polynomial time algorithm to compute R. By another application of Theorem 9, we can compute the (pseudo-polynomially bounded) values safe(q). In order to verify candidates for values safe(q) in NP, and also coNP, one can guess a witness, the sequence of sets $R_0 \supset R_1 \supset \ldots \supset R_j = R$, together with certificates for all $i \le j$ that $R_{i+1}$ is the correct set following $R_i$ in the refinement procedure. This can be checked all at once by considering the disjoint union of all $M'|R_i$.

Lemma 16. For every $k \in \mathbb{N}$ and state $q$ in $M'$, the energy-parity objective EN(k) ∩ PAR holds almost-surely from $q$ if, and only if, safe(q) ≤ k.
Proof. (⇒). First observe that the winning sink $w$ in $M'$ is contained in R, and has safe(w) = 0 since the only strategy from that state satisfies ST(0, 0) ∩ PAR ∩ PosMP.
For all other states there are two cases: either there is an s ∈ N such that ST(k, s) ∩ PAR holds almost-surely, or there is no such s. If there is, then the strategy that goes to the sink guarantees the objective ST(k, s) ∩ PAR ∩ PosMP, which implies the claim.
For the second case (there is no s such that ST(k, s) ∩ PAR holds almost-surely) we see that every almost-surely winning strategy for EN(k) ∩ PAR must also almost-surely satisfy PosMP. To see this, note that the energy condition implies a non-negative expected mean-payoff, and that an expected mean-payoff of 0 would imply that the storage condition ST(k, s) is satisfied for some s, which contradicts our assumption. Consequently the PosMP ∩ PAR objective holds almost-surely.
We now show that the ST(k, s) ∩ PosMP objective holds almost-surely in state $q$, where $s > \mathit{safe}(r)$ for all states $r$ with $\mathit{safe}(r) < \infty$. To this end, we define a strategy that achieves ST(k, s) ∩ PosMP. We first fix a strategy $\sigma_q$ that achieves $\mathsf{EN}(h_q) \cap \mathsf{PAR}$ with $h_q = \mathit{safe}(q)$ for each state $q$ with $\mathit{safe}(q) < \infty$.
When starting in $q$, we follow $\sigma_q$ until one of the following three events happens. (1) We have sufficient energy to move to the winning sink $w$; in this case we do so. (2) Otherwise, we have reached a state $r$ and, since starting to follow $\sigma_q$, the energy balance is strictly greater than $h_r - h_q$; we then abandon $\sigma_q$ and follow $\sigma_r$ as if we were starting the game.
Before we turn to the third event, we observe that, for each strategy $\sigma_q$, there is a minimal distance $d_q \in \mathbb{N}$ to (1) or (2) and a positive probability $p_q > 0$ that either event is reached in $d_q$ steps. The third event is now simply that (3) $d_q$ steps have lapsed. When in state $r$ we then also continue with $\sigma_r$ as if we were starting the game.
It is obvious that no path has negative mean payoff. Moreover, as long as the game does not proceed to the winning sink, a partial run starting at a state $q$ and ending at a state $r$ has energy balance $\ge h_r - h_q$, such that the resulting strategy surely satisfies ST. The expected mean payoff is $\ge p_q / d_q$, and PosMP is obviously satisfied almost-surely. Consequently, $\mathsf{ST}(h_q, s) \cap \mathsf{PosMP}$ holds almost-surely from $q$.
We conclude that every state for which the EN(k) ∩ PAR objective holds almost-surely must satisfy both criteria of Definition 14 and thus be a member of R. Since almost-sure winning strategies cannot leave the respective winning sets, this means that every winning strategy for the above objective also applies in M ′ |R and thus justifies that safe(q) ≤ k.
(⇐). By definition of R, there are two finite memory strategies σ and β which almost-surely satisfy the PosMP ∩ PAR objective and the bailout objective PosMP ∩ ST(k), respectively, from every state q with safe(q) ≤ k. Moreover, those strategies will never visit any state outside of R.
We start with the bailout strategy β and run it until the energy level is high enough (see below). We then turn to σ and follow it until (if ever) it could happen in the next step that a state q is reached while the energy level falls below safe(q). We then switch back to β.
The "high enough" can be achieved by collecting enough energy that there is a positive probability that one does not change back from σ to β. For this, we can start with a sufficient energy level $e$ such that σ never hits an energy ≤ 0 with a positive probability. The sum $e + s + W$ consisting of this energy, the sufficient storage level for PosMP ∩ ST(k), and the maximal change $W$ of the energy level obtained in a single step suffices.
The constructed strategy then almost-surely satisfies the EN(k q ) ∩ PosMP ∩ PAR objective from every state q and k q def = safe(q). In particular, this ensures that the k-energyparity objective holds almost-surely from q in M ′ |R and therefore also in M ′ .
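The alternation between the bailout strategy β and the parity strategy σ can be sketched as a small mode-switching rule. This is only an illustration under stated assumptions: the function name `next_mode`, the mode labels, and the dictionary `safe` are hypothetical, the values safe(q) are assumed to be precomputed, and the check "it could happen in the next step" is over-approximated by assuming the next step costs the maximal change W.

```python
from typing import Dict, Iterable


def next_mode(mode: str, energy: int, successors: Iterable[str],
              safe: Dict[str, int], max_step: int, threshold: int) -> str:
    """Return the mode ('bailout' or 'parity') to use for the next step.

    mode       -- current mode: 'bailout' plays beta, 'parity' plays sigma
    energy     -- current energy level
    successors -- states that may be reached in the next step
    safe       -- safe(q) for every state q (assumed to be precomputed)
    max_step   -- W, the maximal absolute energy change of a single step
    threshold  -- the "high enough" level, e.g. e + s + W
    """
    if mode == "bailout":
        # Stay with beta until enough energy has been collected.
        return "parity" if energy >= threshold else "bailout"
    # In parity mode, be conservative (assumption): the next step may cost W.
    worst_energy = energy - max_step
    if any(worst_energy < safe[q] for q in successors):
        return "bailout"
    return "parity"
```

The rule itself needs only the current mode and energy level; the unbounded memory of the overall strategy comes from tracking the energy level.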

Proof of Theorem 2.
(1) The fact that infinite memory is necessary follows from our counterexample to Claim 5, and the observation that every finite memory winning strategy for the PAR objective must also be colour-committing.
For parts (2) and (3), it suffices, by Lemma 13(1) and Lemma 16, to construct M ′ and compute the values safe(q) for every state q of M ′ . The claims then follow from Lemma 15.

VI. LIMIT VALUES
Since EN(k) ⊆ EN(k+1) holds for all k ∈ N, the chance of satisfying the k-energy-parity objective depends (in a monotone fashion) on the initial energy level: for every state p we have that Val M p (EN(k) ∩ PAR) ≤ Val M p (EN(k + 1) ∩ PAR). We can therefore consider the respective limit values as the limits of these values for growing k:
$\mathit{LVal}^{M}_{p} = \lim_{k \to \infty} \mathit{Val}^{M}_{p}(EN(k) \cap PAR)$.
Note that this is not the same as the value of PAR alone. For instance, the state l from Example 1 has limit value LVal l = 0 ≠ Val l (PAR) = 1.
The states r and s from Example 1 have LVal r = 1 and LVal s = 1. In fact, for any M, w, k and parity objective PAR it
Limit values are an important ingredient in our proof of Theorem 3. This is due to the following property, which directly follows from the definition.
Lemma 17. Let M be an MDP and p be a state with LVal M p = 1. Then, for all ε > 0, there exist a k ∈ N and a strategy σ such that $P^{M(\sigma)}_{p}(EN(k) \cap PAR) \ge 1 - \varepsilon$.
We now show how to compute limit values, based on the following two sets.
The first set, A, contains those states that satisfy the k-energy-parity condition almost-surely for some energy level k ∈ N. The second set, B, contains those states that almost-surely satisfy the combined positive mean-payoff-parity objective. Our argument for computability of the limit values is based on the following theorem, which claims that limit values correspond to the values of a reachability objective with target A ∪ B.
Theorem 18. For every MDP M and state p, $\mathit{LVal}^{M}_{p} = \sup_{\sigma} P^{M(\sigma)}_{p}(\Diamond (A \cup B))$.
Before we prove this claim by Lemmas 22 and 25 in the remainder of this section, we remark that we can compute A ∪ B without constructing A. Let us consider the set A ′ of states that almost-surely satisfy the storage-parity objective, and observe that A ′ ⊆ A holds by definition and that the construction of A from Theorem 2 establishes A ⊆ A ′ ∪ B. Thus, A ∪ B = A ′ ∪ B holds, and it suffices to construct A ′ and B, which is cheaper than constructing A and B. We now start with some notation.
Proof. (sketch) We consider a strategy σ that follows the optimal (w.r.t. the mean-payoff) strategy most of the time and "moves to" a fixed state p with the minimal even parity 2i only sparsely. Such a strategy keeps the mean-payoff value positive while satisfying the parity condition. We show that σ can be defined to use finite memory or no memory, but randomisation. Either way, σ induces a probabilistic one-counter automaton [15], whose probability of ever decreasing the counter by some finite k can be analysed, based on the mean-payoff value, using the results in [16]. The rest of the details can be found in the appendix.

Lemma 22. For every MDP M and state p, $\mathit{LVal}^{M}_{p} \ge \sup_{\sigma} P^{M(\sigma)}_{p}(\Diamond (A \cup B))$.
We show that LVal M p is at least τ − 2ε for all ε > 0 as follows. We start by choosing k ∈ N big enough so that for every state q ∈ A ∪ B, some strategy satisfies the k-energy-parity objective with probability > 1 − ε. We then consider a memoryless strategy (e.g. from solving the associated linear program), which guarantees that the set A ∪ B is reached with likelihood τ , and then determine a natural number l such that it is reached within l steps with probability > τ − ε. This reachability strategy σ can now be combined with an ε-safe strategy for states in A ∪ B: until a state in A ∪ B is reached, the controller plays according to σ and then swaps to a strategy that guarantees the k-energy-parity objective with likelihood > (1 − ε). Such a strategy exists by our assumption on k. This combined strategy will satisfy the (k + l)-energy-parity objective with probability > (τ − ε)(1 − ε) ≥ τ − 2ε, where the last inequality uses τ ≤ 1.
If the mean-payoff value is 0, then there exists a bias function b : I → Z that satisfies the following constraints: holds for all controlled states v ∈ I ∩ V C , When adjusting b to b ′ by adding the same constant to all valuations, b ′ is obviously a bias function, too.
We call a transition if it is strongly connected and contains only controlled states with an invariant transition into G and only probabilistic states with only invariant outgoing transitions, which all go to G. We now make the following case distinction.
Case 1: there is a nonempty, invariant set G ⊆ I, such that the state p of G with minimal priority has even priority. First notice that G ⊆ A: if the minimal value of the bias function is b min , then the bias of a state in p minus b min serves as sufficient energy when starting in p: it then holds that P and σ is a memoryless randomised strategy that assigns a positive probability to all transitions into G. Since I is an end-component, it is contained in the attractor of G, which implies the claim, as Att(G) ⊆ Att(A) ⊆ Att(A ∪ B).
Case 2: there is no non-empty invariant set G ⊆ I with even minimal priority. We show that this is a contradiction with the assumption that I is a non-losing set, in particular with condition 3 of Definition 23. We assume for contradiction that there is a strategy σ and an energy level k such that we can satisfy the energy parity condition with a positive probability while staying in I and starting at some state p ∈ I. We also assume w.l.o.g. that all bias values are non-negative, and m is the maximal value among them. We set k ′ = k + m.
The 'interesting' events that can happen during a run are selecting a non-invariant transition from a controlled state or reaching a probabilistic state (and making a random decision from this state), where at least one outgoing transition is non-invariant.
We capture both by random variables. Random variables that refer to taking a non-invariant transition from a controlled state are deterministic and have a negative expected value, while random variables that refer to taking a transition from a probabilistic state with at least one non-invariant outgoing transition are drawn from a finite weight function with expected value 0 and positive variance. Note that random variables corresponding to probabilistic non-invariant transitions are independent and drawn from a finite set of distributions.
Let α be any infinite sequence of such random variables. From the results on finitely inhomogeneous controlled random walks [17], we can show that almost-surely the sum of some prefix of α will be lower than −k ′ (and in fact lower than any finite number). The proof follows the same reasoning as in Proposition 4.1 of [11], where a sufficient and necessary condition was given for not going bankrupt with a positive probability in solvency games [7].
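To illustrate why a sum of zero-mean steps with positive variance eventually drops below any bound, the following sketch computes exact ruin probabilities for the simplest special case: a fair walk with steps +1 and −1. This homogeneous walk is an assumption made for illustration only; it is not the finitely inhomogeneous controlled random walk of [17], and the function name `ruin_probability` is hypothetical.

```python
from fractions import Fraction


def ruin_probability(k: int, steps: int) -> Fraction:
    """Probability that a fair +1/-1 walk started at 0 reaches -k within `steps` steps."""
    half = Fraction(1, 2)
    # dist[x] = probability of being at position x without having reached -k so far
    dist = {0: Fraction(1)}
    ruined = Fraction(0)
    for _ in range(steps):
        nxt = {}
        for x, pr in dist.items():
            for y in (x - 1, x + 1):
                if y <= -k:
                    # The walk has dropped to -k: count it as ruined and stop tracking it.
                    ruined += pr * half
                else:
                    nxt[y] = nxt.get(y, Fraction(0)) + pr * half
        dist = nxt
    return ruined
```

For a fixed k, the returned value grows with the number of steps, which matches the qualitative statement above for this special case.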
We now consider the set of runs induced by σ. As we just showed, almost all runs that have infinitely many interesting events (as described above) will not satisfy the k ′ -energy condition. Almost all runs that have finitely many interesting events will have an odd dominating priority, and therefore will not satisfy the parity condition. Thus, the probability that the energy parity condition is satisfied by σ is 0.

Lemma 25. For every MDP M and state p, $\mathit{LVal}^{M}_{p} \le \sup_{\sigma} P^{M(\sigma)}_{p}(\Diamond (A \cup B))$.
Proof. Fix p and σ. Every run from p will, with probability 1, eventually reach an end-component and visit all states of the end-component infinitely often [18].
Let C be an end-component such that C forms the infinity set of the runs from p under σ with a positive probability τ > 0. If C does not satisfy the conditions of non-losing end-components, then the probability P M(σ) q (EN(k) ∩ PAR) that the k-energy-parity objective is satisfied from some state q ∈ C is 0, independent of the value k. Thus, the probability of satisfying the k-energy-parity objective from an initial state p is bounded by the chance of reaching a state in some non-losing end-component. These observations hold for every strategy σ and therefore we can bound
$\mathit{LVal}^{M}_{p} \le \sup_{\sigma} P^{M(\sigma)}_{p}(\Diamond \mathit{NLE})$,
where NLE ⊆ V denotes the union of all non-losing end-components. Now Lemma 24 implies that $\sup_{\sigma} P^{M(\sigma)}_{p}(\Diamond \mathit{NLE}) \le \sup_{\sigma} P^{M(\sigma)}_{p}(\Diamond (A \cup B))$, which completes the proof.

Lemma 26. Determining the limit value of a state p can be done in O(|E| · d · |V|^4 · W + d · |V|^3.5 · (|λ| + |cost|)^2) deterministic time. It can also be determined in NP and coNP in the input size when W is given in binary.

Proof. Recall that $\mathit{LVal}^{M}_{p} = \sup_{\sigma} P^{M(\sigma)}_{p}(\Diamond (A \cup B))$ by Theorem 18, that A ∪ B = A ′ ∪ B, and that A ′ and B are the sets of control states that almost-surely satisfy the storage-parity and mean-payoff-parity objective, respectively. Using the results of Section VI, the algorithm proceeds as follows.
1) Compute A ′ , which can be done in time O(|E| · d · |V|^4 · W) by Theorem 8.
2) Compute, for each occurring even priority 2i, the following:
a) the set of 2i maximal end-components, which can be computed in O(|E|); and
b) the mean payoff value for the 2i maximal end-components, which can be computed using Karmarkar's algorithm [14] for linear programming in time O(|V|^3.5 · (|λ| + |cost|)^2) (note that the complexity refers to all, not each, 2i maximal end-components).
3) Consider the union of A ′ with all the 2i maximal end-components with positive mean payoff computed in Step 2, and compute the maximal achievable probability of reaching this set. (By the results of Section VI, this yields the probability sup σ P M(σ) p (♦ (A ∪ B)).)
The last step costs O(|V|^3.5 · |λ|^2) [14] for solving the respective linear program [1], which is dominated by the cost of solving the linear programs from (2b). Likewise, the cost of Step (2a) is dominated by the cost of Step (2b). This leaves us with one times the complexity of (1) and d times the complexity of (2b), resulting in the claimed complexity. Note that the complexity depends on the size of the (binary) representation of λ and on W (in unary), and the bigger of these values dominates.
Finally, all steps are in NP and in coNP.
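The last step of the algorithm, computing the maximal probability of reaching a target set, can be sketched as follows. Note that this sketch swaps in plain value iteration, which only approximates the value from below, whereas the complexity bound above relies on linear programming. The encoding of the MDP (two dictionaries) and the name `max_reach` are hypothetical, and every listed state is assumed to have at least one successor.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: controlled states map to a list of successor states,
# probabilistic states map to a list of (successor, probability) pairs.
Controlled = Dict[str, List[str]]
Probabilistic = Dict[str, List[Tuple[str, float]]]


def max_reach(controlled: Controlled, probabilistic: Probabilistic,
              target: Set[str], iterations: int = 1000) -> Dict[str, float]:
    """Approximate, for every state, the maximal probability of reaching `target`."""
    states = set(target) | set(controlled) | set(probabilistic)
    for succs in controlled.values():
        states.update(succs)
    for dist in probabilistic.values():
        states.update(t for t, _ in dist)
    # Start from 1 on the target and 0 elsewhere; states without outgoing
    # transitions in either dictionary are treated as sinks.
    value = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iterations):
        new = dict(value)
        for s, succs in controlled.items():
            if s not in target:
                new[s] = max(value[t] for t in succs)  # the controller picks the best successor
        for s, dist in probabilistic.items():
            if s not in target:
                new[s] = sum(p * value[t] for t, p in dist)  # expectation over random choices
        value = new
    return value
```

In the setting of the proof, `target` would be the union of A ′ with the 2i maximal end-components of positive mean payoff from Step 2.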

VII. LIMIT-SURE ENERGY-PARITY
In this section we provide the reduction from checking if an energy-parity objective holds limit-surely, to checking if such an objective holds almost-surely. The reduction basically extends the MDP so that the controller may "buy" a visit to a good priority (at the expense of energy) if currently in a state p with limit value LVal p = 1.
In the remainder of this section we prove this claim. For brevity, let us write ∆(w) for the cumulative cost of a path w.
Lemma 29. Let M be an MDP with extension M ′ , p be a state of M, k ∈ N and σ ′ a strategy for M ′ such that $P^{M'(\sigma')}_{p}(EN(k) \cap PAR) = 1$. Then, for every ε > 0, there is a strategy σ for M such that $P^{M(\sigma)}_{p}(EN(k) \cap PAR) \ge 1 - \varepsilon$.
Proof. Recall Lemma 17, that states with LVal M s = 1 have the property that, for every ε > 0, there exists n s,ε ∈ N and a strategy σ s,ε such that $P^{M(\sigma_{s,\varepsilon})}_{s}(EN(n_{s,\varepsilon}) \cap PAR) \ge 1 - \varepsilon$.
Consider now a fixed ε > 0 and let n ε := max{n s,ε | LVal s = 1}. We show the existence of a strategy σ for M that satisfies $P^{M(\sigma)}_{p}(EN(k) \cap PAR) \ge 1 - \varepsilon$. We propose the strategy σ which proceeds in M just as σ ′ does in M ′ but skips over "buying" loops (s, s ′ ) followed by (s ′ , s) in M ′ . This goes on indefinitely unless the observed path ρ = s 0 s 1 . . . s l reaches a tipping point: the last state s l has LVal M s l = 1 and the accumulated cost is ∆(ρ) ≥ n ε . At this point σ continues as σ s l ,ε .
We claim that P M(σ) p (EN(k) ∩ PAR) ≥ 1 − ε. Indeed, first notice that for any prefix τ ∈ V * of a run ρ ∈ Runs M(σ) p until the tipping point, there is a unique corresponding path τ ′ in M ′ . Moreover, the strategy σ maintains the invariant that the accumulated cost of such a prefix τ is the accumulated cost of the corresponding path τ ′ plus the number of times τ ′ visited a new state in V ′ \ V . In particular this means that the path τ can only violate the energy condition if also τ ′ does.
To show the claim, first notice that the error introduced by the runs in Runs M(σ) p that eventually reach a tipping point cannot exceed ε. This is because from the tipping point onwards, σ proceeds as some σ s,ε and thus achieves the energy-parity condition with chance ≥ 1 − ε. So the error introduced by the runs in Runs M(σ) p is a weighted average of values ≤ ε, and thus itself at most ε. Now suppose a run ρ ∈ Runs M(σ) p never reaches a tipping point. Then the corresponding run ρ ′ ∈ Runs M ′ (σ ′ ) p cannot visit new states in V ′ \ V more than n ε times. Since with chance 1, ρ ′ and therefore also ρ satisfies the k-energy condition it remains to show that ρ also satisfies the parity condition. To see this, just notice that ρ ′ satisfies this condition almost-surely and since it visits new states only finitely often, ρ and ρ ′ share an infinite suffix.
The "only if" direction of Theorem 28 is slightly more complicated. We go via an intermediate finite system B k defined below. The idea is that if EN(k) ∩ PAR holds limit-surely in M then PAR holds limit-surely in B k and since B k is finite this means that PAR also holds almost-surely in B k . Based on an optimal strategy in B k we then derive a strategy in the extension M ′ which satisfies EN(k) ∩ PAR a.s. The two steps of the argument are shown individually as Lemmas 31 and 32. Together with Lemma 29 these complete the proof of Theorem 28.
Definition 30. Let B k be the finite MDP that mimics M but hardcodes the accumulated costs as long as they remain between −k and |V |. That is, the states of B k are pairs (s, n) where s ∈ V and −k ≤ n ≤ |V |. Moreover, a state (s, n)
• is a (losing) sink with maximal odd parity if n = −k or LVal M s < 1,
• is a (winning) sink with parity 0 if n = |V |.
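The state space of B k can be enumerated directly from this definition. The sketch below is illustrative only: the name `classify_states`, the labels, and the input `limit_value_is_one` (a precomputed flag per state saying whether its limit value is 1) are hypothetical, and where both bullet points could apply the losing case is checked first, which is an assumption about the intended reading.

```python
from typing import Dict, List, Tuple


def classify_states(states: List[str], k: int,
                    limit_value_is_one: Dict[str, bool]) -> Dict[Tuple[str, int], str]:
    """Label every state (s, n) of B_k as 'losing', 'winning', or 'inner'."""
    bound = len(states)  # |V|, the upper bound on the hardcoded accumulated cost
    labels: Dict[Tuple[str, int], str] = {}
    for s in states:
        for n in range(-k, bound + 1):
            if n == -k or not limit_value_is_one[s]:
                labels[(s, n)] = "losing"    # sink with maximal odd parity
            elif n == bound:
                labels[(s, n)] = "winning"   # sink with parity 0
            else:
                labels[(s, n)] = "inner"     # mimics M and tracks the accumulated cost
    return labels
```

The result has |V| · (k + |V| + 1) entries, which makes explicit that B k is finite for every fixed k.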
We reuse strategies for M in B k and write B k (σ) for the Markov chain that is the result of basing decisions on σ until a sink is reached.
We show that, for every ε > 0, there is a strategy σ such that P B k (σ) s (PAR) ≥ 1 − ε. This would be trivial (by reusing strategies from M) if not for the extra sinks for states with LVal M s < 1. Let's call these states small here and let S be the set of all small states. We aim to show that the k-energy-parity condition can be satisfied and at the same time, the chance of visiting a small state with accumulated cost below |V | can be made arbitrarily small. More precisely, define D ⊆ Runs M as the set of runs which never visit a small state with accumulated cost below |V |: We claim that holds. We show this by contradicting the converse that, for Equivalently, we contradict that, for every strategy σ,
To do this, we define δ < 1 as the maximum of , that is, the maximal value Val M s (EN(n) ∩ PAR) < 1 for any s ∈ S and n ≤ k + |V |, and 0 if no such value exists. Notice that this is well defined due to the finiteness of V . This value δ estimates the chance that a run that is not in D fails the k-energy-parity condition. In other words, for any strategy σ and value 0 ≤ β ≤ 1, This is because P M(σ) s (D) is the chance of a run reaching a state s with accumulated cost n < |V | and because
We pick an ε ′ > 0 that is smaller than (γ/2) · (1 − δ). By assumption of the lemma, there is some strategy σ such that P M(σ) s (EN(k) ∩ PAR) < ε ′ < γ/2. Then by Equation (3), which is a contradiction. We conclude that Equation (2) holds.
To get the conclusion of the lemma just observe that for any strategy σ it holds that P
Proof. Finite MDPs have pure optimal strategies for the PAR objective [19]. Thus by assumption and because B k is finite, we can pick an optimal strategy σ satisfying P B k (σ) s (PAR) = 1. Notice that all runs in Runs B k (σ) s according to this optimal strategy must never see a small state (one with LVal M p < 1). Based on σ, we construct the strategy σ ′ for M ′ as follows.
The new strategy just mimics σ until the observed path s 1 s 2 . . . s n visits the first controlled state after a cycle with positive cost: it holds that s n ∈ V C and there are i, j ≤ n with s i = s j and ∆(s i . . . s j ) > 0. When this happens, σ ′ uses the new edges to visit a 0-parity state, forgets about the cycle and continues just as from s 1 s 2 . . . s i s j+1 . . . s n .
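The bookkeeping behind this step, detecting a cycle with positive cost in the observed path and then forgetting it, can be sketched as follows. The names `find_positive_cycle` and `forget_cycle` and the cost callback are hypothetical; indices are 0-based here, and the additional requirement that the last state is controlled is not checked.

```python
from typing import Callable, List, Optional, Tuple


def find_positive_cycle(path: List[str],
                        cost: Callable[[str, str], int]) -> Optional[Tuple[int, int]]:
    """Return indices (i, j), i < j, with path[i] == path[j] and positive cost in between."""
    # prefix[m] = accumulated cost of the path up to position m
    prefix = [0]
    for a, b in zip(path, path[1:]):
        prefix.append(prefix[-1] + cost(a, b))
    for j in range(len(path)):
        for i in range(j):
            if path[i] == path[j] and prefix[j] - prefix[i] > 0:
                return i, j
    return None


def forget_cycle(path: List[str], i: int, j: int) -> List[str]:
    """Drop the cycle between positions i and j, keeping s_1 ... s_i s_{j+1} ... s_n."""
    return path[:i + 1] + path[j + 1:]
```

After `forget_cycle`, the strategy σ ′ continues as σ would on the shortened history; the energy gained on the removed cycle is what pays for the visit to the 0-parity state.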
We claim that $P^{M'(\sigma')}_{s}(EN(k) \cap PAR) = 1$. To see this, just observe that a run of M ′ (σ ′ ) that infinitely often uses new states in V ′ \ V must satisfy the PAR objective as those states have parity 0. Those runs which visit new states only finitely often have a suffix that directly corresponds to a run of B k (σ), and therefore also satisfy the parity objective.
where |V C ′ | = 2 · |V C |, |E ′ | = |E| + 2 · |V C | and the rest is as in M. By Theorem 28, a state p ∈ V C ∪ V P satisfies the k-energy-parity objective limit-surely in M iff it satisfies it almost-surely in M ′ . The claim then follows from Theorem 2.
(3) To see that there are finite memory ε-safe strategies we observe that the strategies we have constructed in Lemma 29 work in phases and in each phase follow some finite memory strategy. In the first phase, these strategies follow some almost-surely optimal strategy in the extension M ′ , but only as long as the energy level remains below some threshold that depends on ε. If this level is exceeded it means that a "tipping point" is reached and the strategy switches to a second phase.
The second phase starts from a state with limit value 1, and our strategy just tries to reach a control state in the set A ′ ∪ B from Section VI. For almost-sure reachability, memoryless deterministic strategies suffice. Finally, when ending up in a state of A ′ , the strategy follows an almost-sure optimal strategy for storage-parity (with finite memory by Theorem 8).
Similarly, when ending up in a state of B, the strategy follows an almost-surely optimal strategy for the combined positive mean-payoff-parity objective (with finite memory by [2]). A more detailed analysis can be found in Appendix II.

VIII. LOWER BOUNDS
Polynomial time hardness of all our problems follows, e.g., by reduction from REACHABILITY IN AND-OR GRAPHS [20], where non-target leaf nodes are energy decreasing sinks. This works even if the energy deltas are encoded in unary.
If we allow binary encoded energy deltas, i.e. W ≫ 1, then solving two-player energy games is logspace equivalent to solving two-player mean-payoff games ([5], Prop. 12), a well-studied problem in NP ∩ coNP that is not known to be polynomial [21]. Two-player energy games reduce directly to both almost-sure and limit-sure energy objectives for MDPs, where adversarial states are replaced by (uniformly distributed) probabilistic ones: a player max strategy that avoids ruin in the game directly provides a strategy for the controller in the MDP, which means that the energy objective holds almost-surely (thus also limit-surely). Conversely, a winning strategy for the opponent ensures ruin after a fixed number r of game rounds. Therefore the error introduced by any controller strategy in the MDP is at least (1/d)^r , where d is the maximal out-degree of the probabilistic states, which means that the energy objective cannot be satisfied even limit-surely (thus not almost-surely). It follows that almost-sure and limit-sure energy objectives for MDPs are at least as hard as mean-payoff games. The same holds for almost-sure and limit-sure storage objectives for MDPs, since in the absence of parity conditions, storage objectives coincide with energy objectives. Finally we obtain that all the more general almost-sure and limit-sure energy-parity and storage-parity objectives for MDPs are at least as hard as mean-payoff games.

IX. CONCLUSIONS AND FUTURE WORK
We have shown that even though strategies for almost-sure energy parity objectives in MDPs require infinite memory, the problem is still in NP ∩ coNP. Moreover, we have shown that the limit-sure problem (i.e. the problem of checking whether a given configuration (state and energy level) in energy-parity MDPs has value 1) is also in NP ∩ coNP. However, the fact that a state has value 1 can always be witnessed by a family of strategies attaining values 1 − ε (for every ε > 0) where each member of this family uses only finite memory.
We leave open the decidability status of quantitative questions, e.g. whether Val M p (EN(k) ∩ PAR) ≥ 0.5 holds. Energy-parity objectives on finite MDPs correspond to parity objectives on certain types of infinite MDPs where the current energy value is part of the state. More exactly, these infinite MDPs can be described by single-sided vector addition systems with states [22], [23], where the probabilistic transitions cannot change the counter values but only the control-states (thus yielding an upward-closed winning set). That is, single-sidedness corresponds to energy objectives. For those systems, almost-sure Büchi objectives are decidable (even for multiple energy dimensions) [23], but the decidability of the limit-sure problem was left open. This problem is solved here, even for parity objectives, but only for dimension one. However, decidability for multiple energy dimensions remains open.
If one considers the more general case of MDPs induced by counter machines, i.e. with zero-testing transitions, then even for single-sided systems as described above all problems become undecidable from dimension 2 onwards. However, decidability of almost-sure and limit-sure parity conditions for MDPs induced by one-counter machines (with only one dimension of energy) remains open.

Proof. If the expected mean-payoff value of C is positive, then we can assume w.l.o.g. a pure memoryless strategy σ that achieves this value for all states in C. This is because finite MDPs allow pure and memoryless optimal strategies for the mean-payoff objective (see e.g. [24], Thm. 1). This strategy does not necessarily satisfy the parity objective. However, we can mix it with a pure memoryless reachability strategy ρ that moves to a fixed state p with the minimal even parity 2i among the states in C. Broadly speaking, if we follow the optimal (w.r.t. the mean-payoff) strategy most of the time and "move to p" only sparsely, the mean-payoff of such a combined strategy would be affected only slightly. This can be done by using memory to ensure that the mean-payoff value remains positive (resulting in a pure finite memory strategy), or it can be done by always following ρ with a tiny likelihood ε > 0, while following σ with a likelihood of 1 − ε (resulting in a randomised memoryless strategy).
For the pure finite memory strategy, we can simply follow ρ for |C| steps (or until p is reached, whatever happens earlier) followed by n steps of following σ. When n goes to infinity, the expected mean payoff converges to the mean payoff of σ. Since the mean-payoff of σ is strictly positive, the combined strategy achieves a strictly positive mean-payoff already for some fixed finite n, and thus finite memory suffices.
Note that using either of the just defined strategies would result in a finite-state Markov chain with integer costs on the transitions. We can simulate such a model using probabilistic one-counter automata [15], where the energy level is allowed to change by at most 1 in each step, just by modelling an increase of k by k increases of one. Now we can use a result by Brázdil, Kiefer, and Kučera [16] for such a model for the case where it consists of a single SCC (which is the case here, because of the way σ is defined). In particular, Lemma 5.13 in [16] established an upper bound on the probability of termination (i.e. reaching energy level 0) in a probabilistic one-counter automaton with a positive mean-payoff (referred to as 'trend' there) when starting with energy level k. This upper bound can be explicitly computed for any given probabilistic one-counter automaton and energy level k. However, for our purposes, it suffices to note that this bound converges to 0 as k increases. This shows that the probability of winning can be made arbitrarily close to 1 by choosing a sufficiently high initial energy level and using the strategy defined in the previous paragraph. Thus the states in C indeed have limit value 1.
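The simulation of integer costs by unit steps mentioned at the start of this argument can be sketched as a transformation on a list of weighted edges. The edge encoding, the function name `to_unit_steps`, and the naming scheme for the fresh intermediate states are all hypothetical; the sketch only illustrates the idea of replacing a change of k by k changes of one.

```python
from typing import List, Tuple

Edge = Tuple[str, str, float, int]  # (source, target, probability, integer cost)


def to_unit_steps(edges: List[Edge]) -> List[Edge]:
    """Replace every edge of cost c with |c| > 1 by a chain of |c| unit-cost edges."""
    result: List[Edge] = []
    for idx, (src, dst, prob, cost) in enumerate(edges):
        if abs(cost) <= 1:
            result.append((src, dst, prob, cost))
            continue
        step = 1 if cost > 0 else -1
        current = src
        for m in range(abs(cost) - 1):
            fresh = f"{src}->{dst}#{idx}.{m}"  # hypothetical name of an intermediate state
            # Only the first edge of the chain is probabilistic; the rest is deterministic.
            result.append((current, fresh, prob if m == 0 else 1.0, step))
            current = fresh
        result.append((current, dst, 1.0, step))
    return result
```

The transformation preserves the total cost along each original edge and adds |c| − 1 intermediate states per expanded edge, so the blow-up is linear in W.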

II. MEMORY REQUIREMENTS FOR ε-SAFE STRATEGIES
In this appendix, we discuss the complexity of the strategies needed. First, we show that the strategies for determining the limit values of states are quite simple: they can either be chosen to be finite memory and pure, or randomised and memoryless. For winning limit-surely from a state-energy pair, finite memory pure strategies suffice, but not necessarily memoryless ones, not even if we allow for randomisation.
We start by showing the negative results on examples. Figure 4 shows an MDP, where it is quite easy to see that both states have limit value 1. However, when looking at the two memoryless pure strategies, it is equally clear that either (if the choice is to move from s to r) the energy condition is violated almost-surely, or (if the choice is to remain in s), the parity condition is violated on the only run. Nevertheless, the state s satisfies the k-energy-parity objective limit-surely, but not almost-surely, for any fixed initial energy level k.
[Figure caption: Energy-parity MDP where a randomised memoryless strategy does not suffice for limit-sure winning for any initial energy level. States s and r have priority 2 and state p has priority 1.]
Figure 5 shows an energy-parity MDP, where all states have limit value 1, and the two left states have limit value 1 even from zero energy. (They can simply boost their energy level long enough.) Only in the middle state do we need to make choices. For all memoryless randomised strategies that move to the left with a probability > 0, the minimal priority on all runs is almost-surely 1, such that these strategies are almost-surely losing from all states and energy levels. The only remaining candidate strategy is to always move to the right. But, for all energy levels and starting states, there is a positive probability that the energy objective is violated. (E.g. when starting with energy k in the middle state, it will violate the energy condition in k + 1 steps with a chance 4^{−⌈k/2⌉}.)
To see that finite memory always suffices, we can simply note that the strategies we have constructed work in stages. The 'energy boost' part from Section VII does not require memory on the extended arena (and thus finite memory on the original arena). Further memory can be used to determine when there is sufficient energy to progress to the strategy from Section VI.
The strategy for Section VI consists of reaching A ′ or a positive 2i maximal set almost-surely and then winning limit-surely there. For almost-sure reachability, memoryless deterministic strategies suffice. The same holds for winning in A ′ . For winning in a positive 2i maximal set, the proof of Lemma 21 also establishes that pure finite memory and randomised memoryless strategies suffice.