Edinburgh Research Explorer

Dynamic relational contracts under complete information

This paper considers a long-term relationship between two agents who both undertake an action or investment that produces a joint benefit. Agents have an opportunity to expropriate some of the joint benefit for their own use. Agents have quasi-linear preferences. Two cases are considered: where agents are risk averse but limited liability constraints do not bind, and where agents are risk neutral and subject to limited liability constraints. We ask how to structure the investments and division of the surplus over time to avoid expropriation. In the risk-averse case, the dynamics of actions and surplus may or may not be monotonic depending on whether or not a first-best allocation can be sustained. Agents may underinvest but never overinvest. If the first-best allocation is not sustainable, there is a trade-off between risk sharing and surplus maximization; surplus may not be at its constrained maximum even in the long run and the "amnesia" property of pure risk-sharing models fails to hold. In contrast, in the risk-neutral case there may be an initial phase in which one agent overinvests and the other underinvests. Both actions and surplus converge monotonically to a stationary state, where surplus is maximized subject to the self-enforcing constraints.


Introduction
This paper considers a situation where two agents repeatedly engage in joint production. In each period, both agents simultaneously undertake an action or investment that produces a joint output. Agents must also decide how to share the joint output each period. We assume there is a hold-up problem, that is, contracts on actions or the division of the joint output are not enforceable and in addition the outside option of each agent is increasing in the investment of the other agent. We allow joint output and the outside options of the agents to depend on an exogenous state. We consider cases where the agents are risk averse and where they are risk neutral. The only link between periods is a Markov process determining states. There is complete information: apart from the fact that the agents choose their actions simultaneously each period, everything is observable. The only friction is that contracts cannot be enforced. We consider allocations or contracts from which no agent has an incentive to renege by imposing self-enforcing constraints at each date and state. We refer to feasible contracts that satisfy these constraints as dynamic relational contracts. We characterize the Pareto-efficient dynamic relational contracts; we refer to such contracts as optimal contracts.
We impose two simplifying assumptions on our model. First, we assume that agents' preferences are quasi-linear in consumption and actions. This simplifies the problem because with quasi-linear preferences efficient actions (and hence, surplus) are determined independently of the distribution of resources (the marginal rate of substitution between consumption and the action is equal to unity). Second, we impose sufficient conditions such that the constrained Pareto-frontier is concave. This simplifies our problem because it allows us to focus on nonrandom contracts. 1 We examine two main cases: where agents are risk averse but preferences are such that non-negativity constraints on consumption can be ignored, and where agents are risk neutral but consumption is constrained to be non-negative (limited liability).
If agents are risk averse, results depend on whether or not it is possible to sustain a first-best allocation for some division of the surplus. If it is possible, convergence to the first best is monotone. Otherwise, there might be an initial monotone phase, but in the long run, when there are two or more states, monotonicity does not generally obtain: when the same state recurs, surplus will sometimes be higher at the later date and sometimes lower. There is also a trade-off between achieving efficient risk sharing and maximizing current surplus, even in the long run. In particular, and in contrast to the risk-neutral case, current surplus is not maximized. Better risk sharing is achieved by holding the action of one agent inefficiently low because this reduces the outside option of the other agent, that is, it relaxes the latter's self-enforcing constraint. We show that the optimal contract depends on the past history of states, and so the "amnesia" property of the risk-sharing limited commitment model does not hold.
When agents are risk neutral, we consider the implications of limited liability constraints and show that optimal contracts involve two phases. In the first phase there is backloading with zero consumption for the constrained agent, who overinvests up to the last period of the backloading phase; the terms of the contract move monotonically in his/her favor. This overinvestment arises because it allows a further transfer of utility to the other agent, who consumes the extra output. It occurs despite the hold-up problem, which in a static model would lead to underinvestment. Nevertheless, we demonstrate that, because of backloading, it is never the case that both agents overinvest (even at different dates) in any optimal contract. The second phase is stationary and independent of the initial conditions. Consumption and investment depend on the state but not on the time period. Each agent has positive consumption and, for a given state, either both invest efficiently or both underinvest. In either case, current surplus is maximized subject to the self-enforcing constraints. Convergence to the stationary phase is monotone in the sense that whenever the same state recurs in the backloading phase, surplus is higher at the later date.

Related literature
A number of results for special or limiting cases of this model are known. First, one-sided-action versions of this model, or variations on it, have been studied by a number of authors (see, e.g., Thomas and Worrall, 1994; Sigouin, 2003; Albuquerque and Hopenhayn, 2004; Kovrijnykh, 2013). Typically, this literature has considered the case where both agents are risk neutral, there is limited liability and the agent taking the action can commit. To prevent the uncommitted agent from taking his/her outside option, actions may be kept low initially. A key insight of this literature is that incentives are improved when payments to the uncommitted agent are backloaded into the future. This provides a growing carrot for adhering to the contract. Consequently, the action or investment of the other agent can be increased in the future. This generates dynamics in the agent's actions as well as in monetary payments. In the long run, actions and transfers converge to a stationary distribution that maximizes the surplus, output less action costs, given the self-enforcing constraints. The speed of backloading is restricted by the limited liability constraints. Ray (2002) has established the most general backloading result of this type. He considers a general, but non-stochastic, principal-agent model in which both parties may take actions. The principal can commit within each period, so the self-enforcing constraint only applies to the agent. He shows that an efficient contract has terms that move in favor of the agent, converging in finite time to the efficient self-enforcing continuation that maximizes the agent's payoff. Our results generalize this backloading result to the case where both agents undertake an action and neither agent can commit. Furthermore, we demonstrate that there may be overinvestment in the risk-neutral case, and in the risk-averse case we show that there may be a trade-off between productive efficiency and risk sharing even in the long run. Neither of these properties occurs in models where only one agent takes an action.
Second, consider the case where agents have no action to take, or where there is no hold-up problem. In this case, the model involves sharing a stochastic endowment. The case in which agents have their own stochastic endowment and can share risk subject to limited commitment constraints has been widely studied (see, e.g., Kocherlakota, 1996; Ligon et al., 2002; Thomas and Worrall, 1988). A result of this pure risk-sharing case is that a constrained Pareto-efficient allocation evolves toward a stationary distribution, and that, for some parameter values, the distribution of future expected utilities is non-degenerate. Although the distribution is non-degenerate, the solution exhibits an "amnesia" property: once an agent is constrained, the contract from then on is independent of the past history of shocks. With hold-up, the optimal contract depends on the past history of states and does not in general exhibit the amnesia property of the pure risk-sharing model. 2 Furthermore, the pure risk-sharing literature only considers distributional issues and has no implications for the efficiency and dynamics of actions that are the focus of this paper. Nevertheless, we are able to demonstrate a limit result that as our hold-up problem vanishes, the optimal contract converges to the standard pure risk-sharing contract.

2 Ábrahám and Lacsó (2018) establish a similar result in a model of risk sharing and storage. The absence of the amnesia property is more consistent with the empirical evidence (see Broer, 2013).
Third, there are very few papers in this limited commitment literature that examine the situation where two or more agents take actions. The most relevant paper to ours is Acemoglu et al. (2011), which considers a model of changes in political power. In Acemoglu et al. (2011) a Markov process determines which risk-averse political party is in power. Political parties take actions that contribute to a common pool of resources whether in power or not, but only the party in power gets to determine the allocation of resources across agents. Therefore, states are identified by the agent in power. It is shown that in a constrained Pareto-efficient allocation, the action of one of the agents (the one in power) is always chosen efficiently and the actions of the other agents (those not in power) are distorted downward. Furthermore, they establish a convergence result that depends on whether a first-best allocation is sustainable: if a first-best allocation is sustainable, then the actions and the division of resources converge to a degenerate (first-best) distribution; otherwise, allocations converge to a non-degenerate distribution (which need not be unique). The two-agent model with quasi-linear utility considered in their paper corresponds to the limiting case of our model where in each state one agent has all the property rights. We also establish convergence results, but our results apply for a general distribution of property rights and an arbitrary number of states and may result in the actions of both agents being inefficiently low, even in the long run. Their convergence result, when a first-best allocation is sustainable, corresponds to our Theorem 1(a). In Theorem 1(b), when a first-best allocation is not sustainable, we establish convergence to a unique limiting distribution that is independent of initial conditions.
Fourth, our model is related to the broader literature on relational contracting (see, e.g., Levin, 2003; Doornik, 2006; Rayo, 2007) that builds on the work of MacLeod and Malcomson (1989). This literature has studied models with more general ingredients (including many-sided actions, enforceable payments, moral hazard, hidden information, and endogenous property rights), but has restricted attention to stationary equilibria, thus eliminating any interesting dynamics in investments and transfers. The restriction to stationary equilibria is either derived, because stationary contracts are optimal (when agents are risk neutral and in the absence of limited liability), or imposed, because the focus is on organizational structures under which full efficiency can be achieved. Most of this literature is therefore silent on the dynamics of relational contracts that are the main concern of this paper. 3

Illustrative example
To illustrate the model we have in mind, we present a simple example with no uncertainty and risk-averse agents. 4 There are two agents with common discount factor δ ∈ (0, 1) and an infinite horizon. In each discrete period the action or effort of agent i is a_i and joint output is additive: y(a_1, a_2) = f_1(a_1) + f_2(a_2), with f_i(a_i) = 2√a_i.

3 One exception to the focus on stationary contracts is Fong and Li (2017), who introduce limited liability and moral hazard into a risk-neutral model of firms and workers based on Levin (2003). They show that if the principal extracts most of the surplus, the backloading of the agent's utility can lead to a probationary contract in which the agent's wage is initially at the lower bound, and incentives are provided by the threat of termination; at some point this threat is removed and the wage increases to a higher level. 4 For the purpose of constructing a simple example that illustrates the solution, we here ignore the non-negativity constraints on consumption. We use parameter values such that the Pareto frontier is concave. In the Supplementary Material, we show how to fully solve this example using our characterization results and without having to use value function iteration.
Both agents have common preferences satisfying constant absolute risk aversion with coefficient 1/2: u_i(x_i) = 2(1 − e^{−x_i/2}), where x_i := c_i − a_i is consumption less effort (so u_i(0) = 0 and u_i′(0) = 1). Actions take place simultaneously at the beginning of each period. At the end of the period, output is realized and it is divided between the two agents. Suppose that, irrespective of how output is divided, agent i can unilaterally get a breakdown consumption of φ_i(a_1, a_2) = θ_i1 f_1(a_1) + θ_i2 f_2(a_2), which depends on the action of the other agent. For parameters θ_11 = θ_22 = 0 and θ_12 = θ_21 = 1, this means that either agent can expropriate all of the other agent's output, but if they do so they lose their own output. A relational contract is just an agreed sequence of actions and divisions of the output from these actions from which no agent has an incentive to deviate. We assume that if a deviation occurs, in each period thereafter the agents revert to short-run Nash equilibrium anticipating the breakdown payoffs φ_i(a_1, a_2).
With the specification for θ_ij just given, the short-run Nash equilibrium has a_i = c_i = 0 (and hence, u_i = 0), and the discounted payoff from a deviation, the deviation utility, is D_i(a_j) = u_i(f_j(a_j)) = 2(1 − e^{−√a_j}). We characterize constrained Pareto-optimal contracts, that is, contracts that are Pareto optimal within the set from which no agent would deviate. At the first best, a*_1 = a*_2 = 1 and surplus z := y(a*_1, a*_2) − a*_1 − a*_2 = 2 is maximal. This is sustainable provided an equal split of the surplus (x*_i = 1) is an equilibrium: u_i(1)/(1 − δ) ≥ D_i(1) = u_i(2), that is, δ ≥ (1 + √e)^{−1}, so that the first-best allocation is sustainable. Let V_i denote the lifetime utility of agent i. For δ > (1 + √e)^{−1}, surplus will be constant at its efficient level for a range of values of V_1. Consider starting from a feasible value of V_1 below D_1(1) = u_1(2), i.e., worse for agent 1 than the deviation utility at the first-best allocation. If a_2 = a*_2, then agent 1 would deviate. Therefore, a_2 < a*_2; the best contract has a_2 as high as possible such that agent 1 does not wish to deviate, i.e., V_1 = D_1(a_2). Since we are assuming δ ≥ (1 + √e)^{−1}, for the corresponding value of V_2 on the Pareto frontier, V_2 > u_2(2). That is, agent 2 is unconstrained and a_1 = 1 (is efficient). Since a_2, and hence surplus z, is determined by the binding constraint V_1 = D_1(a_2), both can be expressed as functions of V_1. Hence, in this example, a_2 = [ln(1 − V_1/2)]² and z(V_1) = 1 + 2√a_2 − a_2. It is easily checked that z(V_1) is increasing and concave in this region with z(0) = z′(0) = 1, z(u_1(2)) = 2 and z′(u_1(2)) = 0. We show below (equation (4.1) in Section 4) that for values of V_1 where the surplus is increasing in V_1, as here for V_1 < u_1(2), V_1 will be higher next period. It follows straightforwardly that V_1 is an increasing sequence converging to u_1(2). So, the contract converges to the surplus-maximizing actions, here the first best.
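The example's closed forms can be checked numerically. The sketch below assumes the parameterization u(x) = 2(1 − e^{−x/2}) and y(a_1, a_2) = 2√a_1 + 2√a_2, which is consistent with the values reported above (z(0) = z′(0) = 1, z(u_1(2)) = 2, and the threshold δ = (1 + √e)^{−1}); treat these functional forms as an illustrative reconstruction rather than the paper's exact specification.

```python
import math

# Assumed parameterization (consistent with the example's reported values):
# u(x) = 2*(1 - exp(-x/2))  -- CARA, coefficient 1/2, u(0) = 0, u'(0) = 1
# y(a1, a2) = 2*sqrt(a1) + 2*sqrt(a2)  -- so a* = (1, 1) and z* = 2
def u(x):
    return 2.0 * (1.0 - math.exp(-x / 2.0))

def surplus(a1, a2):
    return 2.0 * math.sqrt(a1) + 2.0 * math.sqrt(a2) - a1 - a2

# Deviation utility: grab the other agent's output f(a_j) = 2*sqrt(a_j) once;
# Nash reversion then yields 0 forever, so D(a_j) = u(2*sqrt(a_j)).
def D(a_j):
    return u(2.0 * math.sqrt(a_j))

# First best sustainable iff u(1)/(1-delta) >= D(1) = u(2),
# i.e. delta >= 1/(1 + sqrt(e)).
delta_bar = 1.0 / (1.0 + math.sqrt(math.e))

# Constrained surplus z(V1) for V1 < u(2): a1 = 1 (agent 2 unconstrained),
# a2 pinned down by the binding constraint V1 = D(a2).
def z_of_V1(V1):
    a2 = math.log(1.0 - V1 / 2.0) ** 2   # invert V1 = 2*(1 - exp(-sqrt(a2)))
    return surplus(1.0, a2)
```

Evaluating `z_of_V1` at V_1 = 0 and V_1 = u(2) reproduces the boundary values z = 1 and z = 2 stated in the text.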
We also show below (see equation (3.2b) in Section 3) that surplus is divided so that the ratio of marginal utilities u′_2/u′_1 is equal to the absolute value of the slope of the (strictly concave) Pareto frontier in the following period. Since V_1 is increasing over time, so too is u′_2/u′_1. 5 That is, the way the surplus is distributed (as well as V_1) moves monotonically in favor of agent 1. Thus, backloading of agent 1's utility occurs, and in such a way as to guarantee efficiency in the long run. The case where δ < (1 + √e)^{−1} is similar. There is convergence to a stationary value of V_1. However, in this case, convergence of V_1 is to a point where the actions maximize the joint surplus subject to the no-deviation constraints. At this value of V_1, and for some neighborhood around it, both constraints V_i = D_i(a_j) bind.
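The closed form in footnote 5 for the limit of u′_2/u′_1 can likewise be checked under the same assumed CARA specification, u(x) = 2(1 − e^{−x/2}) (an illustrative reconstruction): at the limit, V_1 = u_1(2), so the stationary x_1 solves u(x_1)/(1 − δ) = u(2), and x_2 = 2 − x_1 because the first-best surplus is 2.

```python
import math

# Assumed CARA utility from the example: u(x) = 2*(1 - exp(-x/2)),
# so marginal utility is u'(x) = exp(-x/2).
def u(x):
    return 2.0 * (1.0 - math.exp(-x / 2.0))

def mu(x):
    return math.exp(-x / 2.0)

def long_run_ratio(delta):
    """u2'/u1' at the limit: V1 -> u(2), so the stationary x1 solves
    u(x1)/(1 - delta) = u(2), and x2 = 2 - x1 (first-best surplus z = 2)."""
    x1 = -2.0 * math.log(1.0 - (1.0 - delta) * u(2.0) / 2.0)
    x2 = 2.0 - x1
    return mu(x2) / mu(x1)

# Footnote 5's closed form: e * (1 - delta*(1 - e))**(-2) <= 1.
def footnote_ratio(delta):
    return math.e * (1.0 - delta * (1.0 - math.e)) ** (-2)
```

The two functions agree for any δ above the sustainability threshold, and the ratio equals 1 exactly at δ = (1 + √e)^{−1}.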
Taking both cases together, it can be concluded that with no uncertainty there is convergence to the constrained surplus maximizing actions for any δ.

Plan of paper
The paper extends this example to consider a more general production function and breakdown payoffs. We consider multiple states and quasi-linear preferences, including risk neutrality and non-negativity constraints on consumption. We show that the convergence result of the example generalizes to the case with multiple states when the first best is sustainable (and also when agents are risk neutral), but otherwise, with risk aversion and multiple states, there is a trade-off between risk sharing and efficiency and convergence to surplus maximization does not occur. In this case we show that u′_2/u′_1 converges to a non-degenerate limiting distribution independent of the initial distribution of the surplus.
The paper proceeds as follows. Section 2 describes the model. Section 3 provides some general results that apply to both the risk-neutral and risk-averse cases. Section 4 analyzes the risk-averse case and Section 5 the risk-neutral case. Section 6 concludes. Statements of lemmas and the proofs of theorems are found in Appendix A. Proofs of Propositions and Lemmas are relegated to the Supplementary Material.

Model
We consider a dynamic model of joint production where agents repeatedly undertake an action or investment that generates a joint output. There is no asset accumulation and full depreciation of the investment in each period. Once output is produced, agents have the opportunity to unilaterally expropriate some of the joint output for their own benefit. In this section, we describe the economic environment and the set of dynamic relational contracts. We define a game played by the two agents and identify dynamic relational contracts with the subgame perfect equilibria of that game. Our interest is in optimal contracts, which correspond to the set of Pareto-efficient subgame perfect equilibria. 6

5 Convergence of u′_2/u′_1 is to e(1 − δ(1 − e))^{−2} ≤ 1; convergence is to 1 for δ = (1 + √e)^{−1}. 6 More precisely, we focus on efficient pure subgame-perfect equilibria relative to specified "Nash reversion" punishments, although our characterization also applies mutatis mutandis to optimal punishments, should they be different, and hence, to efficient equilibria among the set of all pure strategy equilibria.

Economic environment
Time is discrete and indexed by t = 0, 1, 2, . . .. At the start of each period, a state of nature s is realized from a finite state space S with n ≥ 1 states. The state evolves according to an irreducible, time-homogeneous Markov chain with transition matrix [π_sr], where Σ_{r∈S} π_sr = 1 for all s ∈ S. The chain starts from an initial state s_0 at date t = 0. We denote the state at date t by s_t and the history of states by s^t = {s_0, s_1, . . . , s_t}.
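As a concrete illustration of this state process, here is a minimal simulation sketch with a hypothetical two-state transition matrix (the matrix entries are assumptions for illustration, not values from the paper):

```python
import random

# Hypothetical two-state irreducible chain: row s gives the distribution
# over next period's state, and each row sums to 1.
pi = [[0.9, 0.1],
      [0.4, 0.6]]

def simulate(s0, T, seed=0):
    """Draw a history s^T = {s_0, ..., s_T} from the transition matrix pi."""
    rng = random.Random(seed)
    history = [s0]
    for _ in range(T):
        row = pi[history[-1]]
        draw, acc = rng.random(), 0.0
        nxt = len(row) - 1            # fallback guards against rounding at the top
        for r, p in enumerate(row):
            acc += p
            if draw < acc:
                nxt = r
                break
        history.append(nxt)
    return history
```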
There are two agents, i = 1, 2. At every date t, and after the state at that date is observed, both agents simultaneously choose an action/investment a_i ∈ R_+. Actions produce an output y_s(a) ≥ 0 that depends on the state s and the action pair a := (a_1, a_2) (details are given below in Assumption 2). Having observed actions and output, the agents agree to split output and each consumes a non-negative consumption c_i, c := (c_1, c_2) ∈ R²_+. We impose that consumption is non-negative as a simple way to reflect a limited liability constraint on the transfers one agent can make to the other. Consumption c is feasible if c_1 + c_2 ≤ y_s(a). Agent i derives per-period utility u_i from net consumption x_i := c_i − a_i. We make the following assumptions on u_i and y_s:

Assumption 1. For i = 1, 2, u_i is a twice continuously differentiable, strictly increasing and concave function of net consumption, defined for x_i ≥ x̲_i, where x̲_i ≤ 0.
Assumption 2 imposes fairly standard conditions on the production function. The last part of Assumption 2 is a simple way to restrict actions to a compact set A(s). Denote surplus in state s by z_s(a) := y_s(a) − a_1 − a_2. Define the first-best action pair a*(s) as the actions that maximize surplus in state s. Given Assumption 2, the first-best action pair exists and is unique. We refer to the surplus z_s(a*(s)) as the first-best surplus. Since actions are chosen simultaneously and independently, we also define the conditionally efficient actions a*_i(a_j, s), i, j = 1, 2, i ≠ j, such that a*_i(a_j, s) maximizes the surplus z_s(a_i, a_j) over a_i, taking a_j as given. The conditionally efficient actions are single-valued, continuous functions of the other agent's action. 7 The weak complementarity assumption is slightly restrictive but reflects our view that the relational contracting framework is most natural when there are complementarities in production. Given the weak complementarity assumption, the conditionally efficient action functions are weakly upward sloping.

We now specify what an agent can get if there is no agreement on how to divide up output. If no agreement is reached, agent i gets a breakdown consumption of φ^s_i(a), and hence, a breakdown utility of u_i(φ^s_i(a) − a_i). An agent can always take the option of receiving her breakdown utility. More formally, we suppose the agents play a Nash demand game to divide output. 8 In this Nash demand game, both agents simultaneously announce consumption claims (c̃_1, c̃_2), c̃_i ≥ 0. If c̃_1 + c̃_2 = y_s(a), then this determines the division of output: consumption c_i = c̃_i. Otherwise, agents receive their breakdown consumption: c_i = φ^s_i(a). The specific assumptions on φ^s_i(a) are given below, but a simple example with proportional defaults captures what we have in mind. Suppose that each agent can, by defaulting, capture a fraction θ_i of the available output y_s(a). Here, φ^s_i(a) = θ_i y_s(a).
We assume that agents cannot obtain more than the available output, so θ_1 + θ_2 ≤ 1. We do not require that the sum exhausts available output. For example, disagreement may incur a cost, such as lawyers' fees or bargaining costs, so that some of the output is lost when there is default. In such cases, θ_1 + θ_2 < 1. We assume θ_i > 0, so that what an agent gets in the breakdown is increasing in the action of the other agent. This assumption captures the hold-up feature of joint production we wish to model.
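A small sketch of the proportional-default breakdown, with hypothetical shares θ = (0.3, 0.5) and an illustrative output function (both are assumptions for illustration only); it exhibits the hold-up feature that agent i's breakdown consumption rises with the other agent's action:

```python
import math

# Hypothetical proportional defaults: by defaulting, agent i captures a share
# theta_i of output; theta1 + theta2 = 0.8 < 1, so 20% is lost on default.
theta = (0.3, 0.5)

def y(a1, a2):
    # illustrative joint output with weak complementarity (positive cross-partial)
    return 2.0 * math.sqrt(a1) + 2.0 * math.sqrt(a2) + math.sqrt(a1 * a2)

def phi(i, a1, a2):
    """Breakdown consumption phi_i(a) = theta_i * y(a)."""
    return theta[i] * y(a1, a2)

# Hold-up: what agent 1 grabs in a breakdown rises with agent 2's action.
assert phi(0, 1.0, 2.0) > phi(0, 1.0, 1.0)
```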
As another example, consider the special case with additive production: y_s(a) = f^s_1(a_1) + f^s_2(a_2) and φ^s_i(a) = θ^s_i1 f^s_1(a_1) + θ^s_i2 f^s_2(a_2), i = 1, 2 (this is very similar to the formulation used by Halonen (2002)). Our hold-up assumption requires θ^s_ij > 0, i, j = 1, 2, i ≠ j. With this parameterization, assuming Σ²_{i=1} θ^s_ij = 1 and taking the limit as θ^s_ij → 0, for i, j = 1, 2, i ≠ j and for all s ∈ S, produces the pure risk-sharing model that has been studied by Kocherlakota (1996), Ligon et al. (2002) and others. This is discussed in Section 4.
Analogous to Assumption 2, we shall assume that φ^s_i(a) satisfies:

Assumption 3. For each s ∈ S and i = 1, 2, the function φ^s_i : R²_+ → R_+ is continuous, twice continuously differentiable, strictly increasing in both arguments and strictly concave. Moreover, ∂²φ^s_i(a)/∂a_1∂a_2 ≥ 0 (complementarity) and ∂φ^s_i(0, a_j)/∂a_i > 1 for all a_j ∈ R_+, i, j = 1, 2, i ≠ j. In addition, φ^s_i(0, 0) = 0 for i = 1, 2 and

∂φ^s_1(a)/∂a_i + ∂φ^s_2(a)/∂a_i ≤ ∂y_s(a)/∂a_i for all s and i = 1, 2. (2.1)

In the case of proportional defaults, these conditions (apart from ∂φ^s_i(0, a_j)/∂a_i > 1) follow directly from Assumption 2. Complementarity in Assumption 3 implies that the reaction functions in the breakdown game are weakly upward sloping, and this simplifies the arguments below. Condition (2.1) requires that the marginal change in the total breakdown consumption from a change in the action of one of the agents cannot exceed the corresponding marginal product. Together with φ^s_i(0, 0) = 0, this implies that φ^s_1(a) + φ^s_2(a) ≤ y_s(a) for each a and s. Condition (2.1), together with ∂φ^s_i(0, a_j)/∂a_i > 1, i = 1, 2, implies that the first-best action pair is strictly positive. The assumption that φ^s_i is strictly increasing in both its arguments, in particular that ∂φ^s_i(a)/∂a_j > 0 for i ≠ j, captures the hold-up property of the model. Denote the Nash best-response functions (functions because φ^s_i(a_1, a_2) is strictly concave in a_i) in the breakdown game by a^N_i(a_j, s) := argmax_{a_i} u_i(φ^s_i(a_i, a_j) − a_i). The Nash best-response function a^N_i(a_j, s) is continuous and weakly increasing in a_j. Moreover, we have 0 < a^N_i(a_j, s) < a*_i(a_j, s) for each a_j and every state s ∈ S. It is strictly positive because ∂φ^s_i(0, a_j)/∂a_i > 1 and is less than the conditionally efficient action because of the hold-up assumption that ∂φ^s_i(a)/∂a_j > 0. The best-response breakdown utility is u^N_i(a_j, s) := u_i(φ^s_i(a^N_i(a_j, s), a_j) − a^N_i(a_j, s)). A Nash equilibrium of the breakdown game occurs where the best-response functions intersect (existence follows by standard arguments).
Without further assumptions, the Nash equilibrium need not be unique (though it is unique if the defaults are proportional). However, the potential non-uniqueness is not critical because the Nash equilibria can be Pareto-ranked (because the best-response functions are non-decreasing and all Nash equilibria lie below the first-best action pair a*(s)). Henceforth, we let (a^NE_1(s), a^NE_2(s)) denote the dominant Nash equilibrium, and all our results apply relative to this dominant Nash equilibrium.
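Because the best-response functions are non-decreasing and all equilibria lie below a*(s), the dominant Nash equilibrium can be computed by iterating best responses downward from a point above all equilibria; the iterates then decrease monotonically to the largest fixed point. The sketch below assumes risk neutrality, proportional defaults with hypothetical shares, and an illustrative output function with complementarity, so the best response has a closed form:

```python
import math

# Illustrative risk-neutral breakdown game with proportional defaults:
# y(a) = 2*sqrt(a1) + 2*sqrt(a2) + sqrt(a1*a2), phi_i(a) = theta_i * y(a).
theta = (0.3, 0.5)

def best_response(i, a_other):
    """a^N_i(a_j): maximize theta_i * y(a) - a_i. The first-order condition
    theta_i * (1 + sqrt(a_j)/2) / sqrt(a_i) = 1 gives a closed form."""
    return (theta[i] * (1.0 + math.sqrt(a_other) / 2.0)) ** 2

def dominant_nash(a_start=(10.0, 10.0), tol=1e-12):
    """Iterate best responses from a point above all equilibria; with monotone
    best responses the iterates fall monotonically to the dominant equilibrium."""
    a1, a2 = a_start
    while True:
        n1, n2 = best_response(0, a2), best_response(1, a1)
        if abs(n1 - a1) + abs(n2 - a2) < tol:
            return n1, n2
        a1, a2 = n1, n2
```

The design choice mirrors the Pareto-ranking argument in the text: starting above every fixed point of a monotone map and iterating downward selects the largest, i.e. dominant, equilibrium.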

Dynamic relational contracts
We refer to a non-negative action and consumption sequence {a(s^t), c(s^t)}_{t≥0} as a plan. Corresponding to a plan, agent i's lifetime utility is

V_i(s_0) = E[ Σ_{t≥0} δ^t u_i(c_i(s^t) − a_i(s^t)) ],

where δ is a common discount factor, 0 < δ < 1, and E denotes expectation conditional on the initial state s_0. A plan is feasible if Σ_i c_i(s^t) ≤ y_{s_t}(a(s^t)) for every history s^t, and c_i(s^t) − a_i(s^t) ≥ x̲_i for i = 1, 2 and every history s^t.
A dynamic relational contract, or simply contract, is a feasible plan from which neither agent has an incentive to deviate. The incentive to deviate depends on the punishment for deviation. This is given by the breakdown payoffs in the current period (subsequent to the deviation), and by play of the (dominant) equilibrium of the static breakdown game in all future periods. In particular, suppose that a is the current recommended action pair. If agent i is to deviate at t, then the best she can do is to choose a^N_i(a_j(s^t), s_t), which yields a current payoff u^N_i(a_j(s^t), s_t). 9 She is punished from t + 1 by "Nash reversion" in which both agents choose their best responses in the breakdown game, that is, both will thereafter play the (dominant) Nash equilibrium of the breakdown game described above. 10 Let D^s_i(a_j) denote the deviation utility: the best discounted payoff that agent i can get by deviating, given agent j's putative action a_j in state s. It is defined recursively by

D^s_i(a_j) = u^N_i(a_j, s) + δ Σ_{r∈S} π_sr D^r_i(a^NE_j(r)),

where D^r_i(a^NE_j(r)) is the deviation utility from the play of the Nash equilibrium in state r. Given our hold-up assumption (see Assumption 3), it follows that the deviation utility is continuous, differentiable, strictly increasing and strictly concave in the action of the other agent.

9 Deviation at the output division stage cannot be preferable since breakdown is triggered in either case, and a_i may not be optimal in the breakdown. 10 A dynamic relational contract is equivalent to a pure strategy subgame perfect equilibrium relative to future reversion to this Nash equilibrium. Here, strategies are infinite sequences of history-dependent actions and consumption claims.
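The recursive definition of the deviation utility can be computed numerically once the Nash-reversion continuation values are known. A sketch under hypothetical placeholder numbers (the one-period best-response payoffs, transition matrix and discount factor below are assumptions, not values from the paper) finds those continuation values by value iteration:

```python
# Sketch of the deviation-utility recursion for one agent. The values
# uN[s] stand in for the one-period best-response payoffs u^N_i evaluated
# at the breakdown Nash equilibrium of state s (hypothetical numbers).
delta = 0.9
pi = [[0.8, 0.2],
      [0.3, 0.7]]
uN = [1.0, 0.4]

def nash_reversion_values(uN, pi, delta, tol=1e-12):
    """Solve d[s] = uN[s] + delta * sum_r pi[s][r] * d[r] by value iteration."""
    d = [0.0] * len(uN)
    while True:
        nd = [uN[s] + delta * sum(pi[s][r] * d[r] for r in range(len(d)))
              for s in range(len(d))]
        if max(abs(nd[s] - d[s]) for s in range(len(d))) < tol:
            return nd
        d = nd

# With these continuation values d[r], the deviation utility against any
# putative action a_j in state s would be
# D_i^s(a_j) = u^N_i(a_j, s) + delta * sum_r pi[s][r] * d[r].
```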
Punishment consisting of immediate triggering of the breakdown, and repeated play of the (dominant) Nash equilibrium of the breakdown game thereafter, is subgame perfect: after any deviation (i.e., off the equilibrium path), each agent simply demands the whole output, triggering the breakdown game each period.
We stress that replacing the Nash reversion punishments by any state dependent continuation utilities that are no greater than the Nash reversion punishments leaves all the characterization results we derive intact. In particular, optimal punishments satisfy this property. Equally, if agents can take state-dependent outside options at the start of any period, then, provided these outside options satisfy the condition that they are no greater than the Nash reversion punishments, all our results apply. For example, if in periods after a default the breakdown consumptions/utilities were lower than they are in an on-going relationship, then our results still hold.
Since an agent can always take the option of receiving her breakdown utility, the deviation utility provides a lower bound (as a function of the other agent's action) on the discounted utility an agent gets in any dynamic relational contract. Hence, {a(s^t), c(s^t)}^∞_{t=0} is a dynamic relational contract if it is feasible and if, for every s^t and i, j = 1, 2, i ≠ j,

V_i(s^t) ≥ D^{s_t}_i(a_j(s^t)). (2.2)

The continuation utility V_i(s^t) is the discounted utility that agent i anticipates from the contract after the history s^t. The right-hand side of (2.2) is the deviation utility agent i gets from deviating from the recommended action after the history s^t. We refer to the inequalities (2.2) as the self-enforcing constraints. Whenever (2.2) holds with equality, we say that agent i is constrained.
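Operationally, checking the self-enforcing constraints at a node and labeling each agent constrained or unconstrained amounts to comparing continuation utilities with deviation utilities; a minimal sketch (with made-up numbers) is:

```python
# Sketch: classify agents at a node given continuation utilities V and
# deviation utilities D (hypothetical numbers, not from the paper).
def classify(V, D, tol=1e-9):
    """V[i] is agent i's continuation utility; D[i] is i's deviation utility
    against the other agent's recommended action. Returns per-agent status."""
    status = []
    for i in range(2):
        if V[i] < D[i] - tol:
            status.append("violates (2.2): not self-enforcing")
        elif V[i] <= D[i] + tol:
            status.append("constrained")       # (2.2) holds with equality
        else:
            status.append("unconstrained")
    return status

print(classify(V=[5.0, 3.0], D=[5.0, 2.0]))   # ['constrained', 'unconstrained']
```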
Otherwise, we say that agent i is unconstrained. Dynamic relational contracts exist. For example, the trivial contract that has a_i(s^t) = a^NE_i(s_t) and c_i(s^t) = φ^{s_t}_i(a^NE(s_t)) for all s^t is both feasible and self-enforcing and is therefore a dynamic relational contract. We show below (see Proposition 2) that there exist other, non-trivial dynamic relational contracts. 11 Corresponding to any dynamic relational contract {a(s^t), c(s^t)}^∞_{t=0} and initial state s_0 is a pair of lifetime utilities (V_1(s_0), V_2(s_0)). Given the set of dynamic relational contracts, let V_{s_0} denote the set of the corresponding lifetime utilities. Our objective is to characterize contracts corresponding to the Pareto frontier of the set V_{s_0}. We refer to dynamic relational contracts that correspond to this Pareto frontier as optimal contracts and refer to the corresponding actions as optimal actions. We say that agent i underinvests (or that the action is inefficiently low) at some date t in an optimal contract if the optimal actions are such that a_i(s^t) < a*_i(a_j(s^t), s_t), and say the agent overinvests (or the action is inefficiently high) if a_i(s^t) > a*_i(a_j(s^t), s_t). Given the stochastic history s^t, we can treat an optimal contract as a stochastic process for (a, c). We will be interested in the long-run behavior of this process and whether it converges, and if so, whether convergence is dependent on s_0 or V_1(s_0).

Preliminary results
This section establishes some preliminary results on the Pareto frontier of the set of dynamic relational contracts and on optimal actions. Section 4 considers the case where agents are risk averse and Section 5 considers the case where agents are risk neutral.

11 Intuitively, hold-up creates an inefficiency and, provided δ > 0, repeated game arguments allow cooperation to improve on the breakdown Nash equilibrium.

Relationship to the Nash actions
Proposition 1. In any optimal contract: (i) actions are never below the Nash reaction functions, a_i(s^t) ≥ a^N_i(a_j(s^t), s_t), and a(s^t) ≥ a^NE(s_t) > 0 for all s^t; (ii) an agent who is allocated all current output and who is not overinvesting (i.e., a_i(s^t) ≤ a*_i(a_j(s^t), s_t)) is unconstrained.
The intuition for (i) is that if the action of one of the agents, say agent 1, were below the Nash reaction function, a Pareto improvement could be found by increasing the action of agent 1 by a small amount. Although the deviation utility of agent 2 increases (because of hold-up), his consumption can be increased to prevent a violation of his self-enforcing constraint, and there is sufficient extra output remaining to more than compensate agent 1 for the increase in her action. This property then implies that actions can never be below the Nash equilibrium actions, a(s^t) ≥ a^NE(s_t).
Since it can be shown that the Nash equilibrium actions are strictly positive, a^N_i(a_j, s) > 0, it follows that optimal actions are always positive too. Although (ii) is not trivial, it is unsurprising. Suppose that agent 1 is allocated all of the current output. Then agent 1 receives more output than she would obtain in the breakdown game if she held her action constant (because, by Assumption 3, agent 2 can claim a positive share of output in the breakdown game). In a deviation, agent 1 will optimize her action, but since she is not overinvesting, reducing her action toward the Nash reaction function will only reduce output net of her effort. Hence, she would be worse off than when receiving all output at the higher action. Moreover, the equilibrium continuation utility cannot be lower than the deviation continuation utility, since a deviation leads to output being shared and to a punishment continuation worse than the equilibrium path. Thus agent 1 cannot be constrained. In fact, we shall show later (see the discussion after the first-order conditions (3.2a)-(3.2c)) that any agent with positive consumption will not overinvest, and therefore the caveat in Proposition 1(ii) about an agent who is not overinvesting can be dispensed with.
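To make the underinvestment in the breakdown game concrete, here is a small numerical sketch under assumed functional forms (output 2√a_1 + 2√a_2 and an equal breakdown share θ = 1/2; the functional forms and numbers are illustrative assumptions, not taken from the paper):

```python
import math

# Hypothetical symmetric example: joint output y(a) = 2*sqrt(a1) + 2*sqrt(a2),
# and in the breakdown game each agent can expropriate a share theta = 0.5.
theta = 0.5

# Efficient (first-best) action: maximize 2*sqrt(a_i) - a_i,
# i.e. first-order condition 1/sqrt(a_i) = 1, so a_i = 1.
a_star = 1.0

# Nash reaction in the breakdown game: maximize theta*2*sqrt(a_i) - a_i,
# i.e. theta/sqrt(a_i) = 1, so a_i = theta**2.
a_nash = theta ** 2

print(a_nash, a_star)  # 0.25 1.0 -- breakdown Nash actions are positive but inefficiently low
```

Consistent with Proposition 1, the breakdown Nash action is strictly positive but below the efficient level, which is the inefficiency that cooperation in a dynamic relational contract can improve upon.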

Concavity, continuity and differentiability
We define V^s_2(V_1) to be the Pareto-frontier of the set V^s. It is not necessarily concave; in particular, the concavity of the deviation utility D_j(a_i) in the action of the other agent implies that the self-enforcing constraints (2.2) may not be satisfied at average actions, and hence the constraint set need not be convex. Nevertheless, the Pareto-frontier can be shown to be concave under some additional restrictions. We state and discuss two alternative sufficient conditions for concavity in Appendix A, Assumption A4 and Assumption A5. We use Assumption A4 in Section 4, which considers the case where agents are risk averse. It requires two things: first, essentially, that the curvature of the deviation utility is less than the curvature of surplus as a function of actions; second, that an optimal contract has x_i > 0, i = 1, 2, at every date. The latter follows, for example, for utility functions (such as those with constant relative risk aversion with coefficient of risk aversion greater than or equal to one) where lim_{x→0} u(x) = −∞. We use Assumption A5 in Section 5, which considers the case where agents are risk neutral. It requires that the production function is more concave than the corresponding deviation utility. It is satisfied in many reasonable examples, and Assumption A5 is a generalized version of the condition given in Thomas and Worrall (1994).
In the one-sided action case where only one agent undertakes an action, it is known that the value function can fail to be differentiable (Thomas and Worrall, 1994). It is perhaps surprising, then, that in this two-sided case we are able to establish differentiability. The key observation is that since optimal actions are positive, it is possible to vary both actions simultaneously, holding the future utilities constant, so as to vary V 1 whilst satisfying the self-enforcing and feasibility constraints.

Recursive formulation
We now use a recursive programming approach to examine optimal contracts. It is useful to work with net consumption x_i as a choice variable instead of consumption c_i. The Markov assumption on the evolution of states and the infinite time horizon, together with the observation that all the self-enforcing constraints are forward looking, mean that the set of continuation utilities corresponding to a dynamic relational contract depends only on the current state and is independent of the past history. V^s_2(V_1) is characterized as follows: (a, x, (V^{s,r}_1)_{r∈S}) is a solution to the program [P1]

V^s_2(V_1) = max u_2(x_2) + δ Σ_{r∈S} π_sr V^r_2(V^{s,r}_1)

subject to:

u_1(x_1) + δ Σ_{r∈S} π_sr V^{s,r}_1 ≥ V_1,  [λ]  (3.1a)

the self-enforcing constraints for each agent corresponding to (2.2),  [μ_1, μ_2]  (3.1b)-(3.1c)

V^{s,r}_1 ≥ V_1^r and V^{s,r}_1 ≤ V̄_1^r for each r ∈ S,  [ν^r_1, ν^r_2]  (3.1d)-(3.1e)

and the feasibility constraints,  [γ_1, γ_2]  (3.1f)-(3.1g)

where the non-negative Lagrangian multipliers are indicated in brackets after each constraint. The expected discounted utility V_1 of agent 1 (in state s) is the state variable in this programming problem. The value function V^s_2(V_1) represents the Pareto-frontier of the set of dynamic relational contracts in the space of continuation utilities. It describes how the maximum continuation utility of agent 2 changes as the continuation utility of agent 1 is changed. The inequality (3.1a) is the promise-keeping constraint, which requires that the contract deliver at least the current discounted utility. The inequalities (3.1b) and (3.1c) are the self-enforcing constraints corresponding to the inequalities given in (2.2). The constraints (3.1d) and (3.1e) reflect that the continuation utility for agent 1 in state r must lie in the interval [V_1^r, V̄_1^r]. Inequalities (3.1f) and (3.1g) are the feasibility constraints.
We denote a solution to [P1] by (a^s(V_1), x^s(V_1)) and continuation utilities (V^{s,r}_1(V_1)). It can be shown that a^s(V_1) is unique; however, x^s(V_1) and V^{s,r}_1(V_1) need not be. Corresponding to this solution, and abusing notation, we define the surplus z_s(V_1) := z_s(a^s_1(V_1), a^s_2(V_1)). We discuss the properties of z_s(V_1) below, but we refer to the maximal value of z_s(V_1) for V_1 ∈ [V_1^s, V̄_1^s] as the constrained maximal surplus and to the actions that maximize this surplus as the constrained surplus-maximizing (CSM) actions. Let ā(s) denote the CSM actions in state s.12 If the CSM actions are equal to the first-best actions, ā(s) = a*(s) (and hence the constrained maximal surplus equals the first-best surplus), then we say that the first-best is sustainable in state s. We denote the set of states in which the first-best actions are sustainable by S_* ⊆ S and its complement by S^c_* (it is possible that S_* = ∅ or S^c_* = ∅). A first-best allocation (FBA) involves the first-best actions a*(s) at each state and date and complete risk-sharing (that is, net consumption x*(s) with x*_1(s) + x*_2(s) = z_s(a*(s)) such that u'_2(x*_2(s))/u'_1(x*_1(s)) is constant over all states and dates).
An optimal contract is computed recursively. Start from some given initial value for agent 1's lifetime utility, V_1(s_0), in state s_0. Setting V_1 = V_1(s_0) in [P1], the solution to the programming problem provides optimal values for a(s_0) and x(s_0) in state s_0. The solution also determines the continuation utilities V^{s_0,r}_1(V_1(s_0)) for each possible subsequent state r. At date t = 1 and history s^1 = (s_0, s_1), the value for V_1 is given by the date-0 solution for the continuation utility in the realized state, and the solution to the date-1 programme determines a(s^1) and x(s^1). The process is repeated to determine {a(s^t), x(s^t)}_{t=0}^∞. Doing this for each V_1(s_0) ∈ [V_1^{s_0}, V̄_1^{s_0}] determines the set of optimal contracts.
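The recursive computation just described can be sketched as follows; solve_P1 is a hypothetical stand-in for a numerical solver of [P1] (hard-coded here with a toy rule purely so the recursion runs, not the paper's solution):

```python
# Hypothetical stand-in for the solution of programme [P1]: given the current
# state s and promised utility V1, return actions a, net consumptions x, and
# the continuation utility V1' for each possible next state r.
def solve_P1(s, V1):
    a = (1.0, 1.0)                      # placeholder optimal actions a^s(V1)
    x = (0.5 * V1, 1.0 - 0.5 * V1)      # placeholder net consumptions x^s(V1)
    cont = {r: 0.9 * V1 + 0.1 * r for r in (0, 1)}  # placeholder V1^{s,r}(V1)
    return a, x, cont

def simulate_contract(s0, V1_0, state_path):
    """Roll the recursion forward: each period, solve [P1] at (s_t, V1_t),
    record (s, a, x), and update V1 to the continuation utility chosen
    today for the realized next state (promise-keeping)."""
    V1, s = V1_0, s0
    history = []
    for s_next in state_path:
        a, x, cont = solve_P1(s, V1)
        history.append((s, a, x))
        V1, s = cont[s_next], s_next
    return history

path = simulate_contract(s0=0, V1_0=1.0, state_path=[1, 0, 1])
print(len(path))  # one record per period: 3
```

The point of the sketch is only the control flow: today's programme pins down today's actions and net consumptions together with next period's state variable V_1 in every possible state.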

First-order conditions
From Proposition 2, the Pareto-frontier is continuously differentiable and the range of absolute slopes of the frontier is R_+ ∪ {∞}. Let σ_s(V_1) := −V^s_2'(V_1) and σ^+_{s,r}(V_1) := −V^r_2'(V^{s,r}_1(V_1)) be the (absolute) slopes of the Pareto-frontiers, where σ_s : [V_1^s, V̄_1^s] → R_+ ∪ {∞} is strictly increasing since the Pareto-frontier is strictly concave. The envelope condition for [P1] is −σ_s(V_1) = −λ + μ_1. Using this condition, differentiating with respect to x_i, a_i and V^{s,r}_1 in [P1], and rearranging gives the first-order conditions (3.2a)-(3.2c).

Footnote 12: In principle, there may be dynamic relational contracts in which there are actions that achieve a higher surplus, but at the cost of lower future surplus. Our definition considers only optimal contracts. However, in both the risk-neutral and risk-averse cases that we consider below, the two concepts coincide and the CSM actions do maximize z_s(a_1, a_2) subject to the self-enforcing constraints. It will also be shown below that in the cases we consider, the CSM actions are unique.
Since the range of absolute slopes of the frontier is R_+ ∪ {∞}, it is intuitive that σ^+_{s,r}(V_1) is the same for each future state r ∈ S. To see this, first suppose that ν^r_1 > 0. In this case V^{s,r}_1(V_1) = V_1^r, σ^+_{s,r}(V_1) = 0 and, by a complementary slackness condition, ν^r_2 = 0. Then, using equation (3.2a), −σ_s(V_1) − μ_1 = ν^r_1 > 0, which gives a contradiction since σ_s(V_1) and μ_1 are non-negative. A similar argument shows that ν^r_2 = 0. Since ν^r_i = 0, it follows from (3.2a) that σ^+_{s,r}(V_1) is independent of r, and we write σ^+_s(V_1) for this common future value. This property greatly simplifies the dynamics of the contracting problem.
It follows directly from the first-order conditions (3.2c) that in an optimal contract (i) underinvestment, a_i(s^t) < a*_i(a_j(s^t), s_t), can occur only if at least one of the agents is constrained; and (ii) if agent i has positive consumption, then he or she does not overinvest: a_i(s^t) ≤ a*_i(a_j(s^t), s_t). To see the intuition for the first part, suppose that agent 1 is unconstrained. If agent 2 were underinvesting, he could increase investment and generate more surplus. The surplus would be enough to compensate him for the extra investment, and agent 1 would not default because she is unconstrained. Thus, it would be possible to find a better contract, yielding a contradiction. Similarly, to see the second part, suppose that agent 1 is overinvesting. Then she could reduce her investment. This relaxes agent 2's self-enforcing constraint (keeping consumptions now and future promises the same). However, output has fallen, so aggregate consumption must fall. If agent 1 has positive consumption, it is possible to keep the consumption of agent 2 the same, while the utility of agent 1 increases because she has cut her investment from above the conditionally efficient level.
There is also a simple corollary to these results: (a) both agents cannot be overinvesting (because one agent must have positive consumption); (b) an agent cannot be permanently overinvesting, because consumption must be positive at some future date (otherwise the self-enforcing constraint would not be satisfied).

Risk aversion
For this section we assume that agents are risk averse: we strengthen Assumption 1 to assume that u_i is strictly concave for i = 1, 2, and we use Assumption A4 from Appendix A. In particular, it is assumed that net consumption, and hence consumption, is strictly positive in an optimal contract. It will follow from this that overinvestment is not a feature of an optimal contract. The allocation of net consumption between agents may vary, potentially considerably, across states even in the long run. Thus, it is important to examine how allowing for risk aversion affects optimal contracts.

Characterization of optimal contracts
In this sub-section, we consider some properties of the optimal contract and surplus as V_1 varies in a given state, and how the contract is updated period-by-period: in particular, how the ratio of marginal utilities changes from one period to the next. In the following sub-section, we consider the long-run properties of the optimal contract, showing that it evolves towards a stationary distribution, and study when this stationary distribution does or does not depend on the value of agent 1's lifetime utility V_1(s_0).

Proposition 4. With risk-averse agents and under Assumption A4: (i) there is no overinvestment, ∂z_s(a(s^t))/∂a_i ≥ 0 for i = 1, 2; (ii)-(iii) there is a (possibly trivial) interval of values of V_1 at which the surplus z_s(V_1) is maximized; if s ∈ S^c_*, both constraints bind there and a^s(V_1) < a*(s), and if s ∈ S_*, efficient actions a*(s) are sustainable by definition; (iv) σ^+_s(V_1), the (absolute value of the) common slope of the Pareto-frontiers next period, and σ_s(V_1), the slope of the current Pareto-frontier, satisfy

σ_s(V_1) = σ^+_s(V_1) − u'_2(x^s_2(V_1)) (dz_s(V_1)/dV_1).   (4.1)

The intuition for (i) was discussed above in Section 3.4 for the case c_i > 0 for i = 1, 2. Properties (ii) and (iii) are illustrated in Fig. 1.13 Equation (4.1) in part (iv) is fundamental to understanding the dynamics of an optimal contract, and it is easy to interpret. Consider a (small) unit increase in V_1. The effect on agent 2's discounted utility is to change it by approximately V^s_2'(V_1) = −σ_s(V_1) units. One way to effect this change (as good as any other at the optimum) is to hold the current utility of agent 1 constant (giving any change in the current surplus to agent 2) and increase V^{s,r}_1, the next-period continuation utility of agent 1 in each state r, by 1/δ. The effect on agent 2's current utility is u'_2(x^s_2(V_1)) × (dz_s(V_1)/dV_1). The effect on the discounted continuation utility of agent 2 is to decrease it by σ^+_s(V_1), the same for all future states. The combined effect for agent 2 is u'_2(x^s_2(V_1)) × (dz_s(V_1)/dV_1) − σ^+_s(V_1). Since the overall change in utility for agent 2 is −σ_s(V_1), equating the two gives equation (4.1).
The implication for the dynamics of optimal contracts is illustrated in Fig. 1. Consider starting from a value of V_1 below the level that maximizes surplus. In this region, agent 1's constraint binds (D^s_1(a^s_2(V_1)) = V_1) and a_2 is kept inefficiently low (a^s_2(V_1) < a*_2(a^s_1(V_1), s)) to prevent agent 1 from deviating. In this region, V_2 may be high enough to allow a_1 to be conditionally efficient (a^s_1(V_1) = a*_1(a^s_2(V_1), s)) without violating agent 2's constraint; but if s ∈ S^c_*, then, closer to the surplus-maximizing value of V_1, both constraints will bind and a_1 will be inefficiently low. Also, in this region, dz_s(V_1)/dV_1 > 0, so equation (4.1) implies that σ^+_s(V_1) > σ_s(V_1). In particular, if there is a single state or if the same state recurs, the change in V_1 is as indicated by the arrows in Fig. 1. In this case, surplus will be higher next period, as the increase in V_1 allows the extent of agent 2's underinvestment to be reduced, and by enough to offset any increase in underinvestment by agent 1. (We discuss the implications when states switch below.) A symmetric argument applies to the dynamics for high values of V_1.

Footnote 13: χ_1^s and χ̄_1^s are the values of V_1 such that agent 1's constraint binds for V_1 ≤ χ̄_1^s, while agent 2's constraint binds for V_1 ≥ χ_1^s; surplus is maximized at χ̄_1^s in case (b). See the Supplementary Material for further details. Note that both constraints bind for values of V_1 ∈ (χ_1^s, χ̄_1^s) in Fig. 1b. This contrasts with pure risk-sharing models with limited commitment, for example, Kocherlakota (1996) or Thomas and Worrall (1988), where at most one self-enforcing constraint binds at any one time in any non-trivial optimum.

Long-run dynamics
To examine long-run convergence, we treat choices at date t as random variables and write x_1(t) for the random value of net consumption of agent 1 at date t after history s^t, etc. Define ρ(t) := u'_2(x_2(t))/u'_1(x_1(t)) to be the ratio of marginal utilities at date t (note that ρ(t) = σ(t+1)). In this subsection we focus on the long-run properties of ρ(t).
With more than one state, convergence to constrained surplus maximization may not occur because there is a conflict between risk sharing and surplus maximization. To achieve surplus maximization in state s, the distribution of consumption may have to differ from that in another state s' ≠ s; therefore, an optimal contract must (dynamically) trade off risk sharing against surplus maximization.
As already described, there is a (possibly trivial) interval of marginal utility ratios corresponding to maximum surplus in any state s. Let [ρ_s, ρ̄_s] denote this interval in state s.14 By equation (4.1), the marginal utility ratio is unchanged from the previous period if (and only if) surplus is maximized today (i.e., ρ(t) ∈ [ρ_{s_t}, ρ̄_{s_t}]). Thus, a constant marginal utility ratio requires that the intersection ∩_{s∈S} [ρ_s, ρ̄_s] be non-empty. This intersection is non-empty when an FBA is sustainable, in which case the ratio is constant. If it is not only non-empty but a non-trivial interval, then there are multiple FBAs. Moreover, if an FBA is sustainable, then monotone convergence to an FBA occurs. If, however, the intersection is empty, or if CSM actions are not always efficient, an FBA is not sustainable and the marginal utility ratio may not converge to a single value. Nevertheless, under a weak regularity condition, it does converge to a unique long-run invariant distribution, independent of the initial conditions.
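The non-emptiness condition on the intersection of the per-state intervals can be illustrated with hypothetical numbers:

```python
# A constant marginal-utility ratio requires a ratio lying in [rho_s, rho_bar_s]
# for every state s, i.e. the intersection of the state intervals is non-empty.
def intersection(intervals):
    lo = max(l for l, _ in intervals)
    hi = min(h for _, h in intervals)
    return (lo, hi) if lo <= hi else None

# Two states whose surplus-maximizing ratio intervals overlap: a constant
# ratio is possible anywhere in the overlap.
print(intersection([(0.8, 1.2), (1.0, 1.5)]))   # (1.0, 1.2)

# Disjoint intervals: no single ratio maximizes surplus in both states, so
# the ratio must move when the state switches (the case of Theorem 1(b)(ii)).
print(intersection([(0.8, 0.9), (1.1, 1.5)]))   # None
```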
To describe the evolution of the marginal utility ratio, let F_t^{V_1(s_0)} : R_+ → [0, 1] denote the distribution function of ρ(t) at date t given the initial value V_1(s_0). This leads us to the following general convergence theorem.15

Theorem 1. (a) Suppose an FBA is sustainable. Then an optimal contract converges with probability one to an FBA: ||a(t) − a*(s_t)|| → 0 and the random sequence {ρ(t)} is (weakly) monotone, with probability one. If there exist multiple FBAs, then the limit FBA depends upon V_1(s_0).
(b) Suppose instead that an FBA is not sustainable. Then, provided π_ss > 0 for all s, F_t^{V_1(s_0)} converges weakly to a unique distribution independent of V_1(s_0). Either (i) this distribution is degenerate, in which case the dynamics are as in part (a), with a stationary limit contract with CSM actions ā(s) in each state, or otherwise (ii) this distribution is non-degenerate and current surplus is not maximized in the long run: ||a(t) − ā(s_t)|| → 0 with probability zero.
In part (a) of Theorem 1, there is convergence to an FBA. There is a (possibly trivial) interval of ratios of marginal utilities, given by the intersection ∩_{s∈S} [ρ_s, ρ̄_s], that are compatible with efficient actions and a constant marginal utility ratio.14 Convergence will be to the lower endpoint of this interval if the initial marginal utility ratio is below it; to the upper endpoint if the initial marginal utility ratio is above it; and the sequence of marginal utility ratios will be constant if the initial marginal utility ratio belongs to it. The dynamics are similar in part (b)(i), which considers the case where there is a marginal utility ratio consistent with CSM actions in each state. Convergence is to the CSM actions and to this (unique marginal utility) ratio. This case arises if there is a single state but CSM actions are not efficient.16 If there are multiple states and CSM actions in each state are inefficient, then this case is possible but not generic, in the sense that a small perturbation of either φ^s_i or y_s in any state s will lead to the case of part (b)(ii). Part (b)(ii) of Theorem 1 describes what happens when there is a conflict between surplus maximization and risk sharing. The optimal contract exhibits a second-best property. The marginal utility ratio ρ(t) does not settle down to a single value, and whenever it differs across two dates t−1 and t, actions at date t will not be CSM.17 By contrast, in the risk-neutral case, we show that once the stationary phase is reached, surplus is maximized in each state by varying the continuation utility to allow the constrained maximal surplus to be achieved (Theorem 2).

Footnote 14: For s ∈ S_*, ρ_s = σ_s(χ_1^s) and ρ̄_s = σ_s(χ̄_1^s), and for s ∈ S^c_*, ρ_s = ρ̄_s = σ_s(χ̄_1^s) (see Fig. 1).

Footnote 15: We use ||·|| to denote the Euclidean norm.
For example, if the state changes from one in which agent 1 can claim most of output to one in which the roles are reversed, sufficient surplus and future utility is reallocated to agent 2 to satisfy his self-enforcing constraint at the CSM actions for that state. However, in the risk-averse setting of part (b)(ii) of Theorem 1, risk-sharing considerations make such an immediate step change undesirable. It is better to hold agent 1's action at the later date inefficiently low, keeping agent 2's default payoff from rising too much and thereby relaxing the latter's self-enforcing constraint, so that the share going to agent 2 does not rise to that consistent with the CSM actions.
To better understand this dynamic trade-off between surplus maximization and risk sharing, suppose to the contrary that the ratio of marginal utilities differs across two dates t−1 and t, but that actions at date t are CSM. Then a simple change in the contract at t−1 and t can produce a Pareto-improvement. Consider the case where ρ(t−1) > ρ(t). Initially hold actions fixed at both dates and increase x_1(t) by a small amount, but reduce x_1(t−1) so as to leave V_1(t−1) unchanged. If surplus were unchanged at t, this would improve risk sharing and lead to a Pareto-improvement because V_2(t−1) would increase. However, because x_2(t), and hence V_2(t), have fallen, agent 2's self-enforcing constraint may be violated at the initial actions (and will be, if the CSM actions are below the first-best). In order not to violate agent 2's self-enforcing constraint, agent 1's action at date t can be reduced. Correspondingly, agent 2's action can be increased, because V_1(t) has risen. Critically, although this change may reduce surplus at date t, it does so only by a second-order amount since, by assumption, the original actions at date t were CSM.18 Consequently, a Pareto-improvement results, contradicting the supposed optimality of the original contract.

Footnote 16: For the single-state case, irrespective of whether s ∈ S_* or not, {ρ(t)} monotone implies that {z_s(t)} is monotone increasing, converging to the constrained maximal surplus, as indicated by the arrows in Fig. 1.

Footnote 17: Formally, ρ(t−1) ≠ ρ(t) corresponds to σ(t) ≠ σ(t+1), and thus, from (4.1), dz_{s_t}(V_1(t))/dV_1 ≠ 0. Hence, actions at date t are not CSM, as claimed.

Footnote 18: The change in surplus is second order when V_1 and V_2 are varied along the Pareto-frontier at t starting from maximum surplus; because the frontier's slope is −ρ(t) at maximum surplus, the change we construct also has only a second-order effect.
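The second-order nature of the surplus loss is the usual envelope argument; in the notation of the text, for a feasible perturbation a(ε) of the CSM actions a(0):

```latex
% A Taylor expansion around the CSM actions a(0):
z_{s_t}\!\left(a(\epsilon)\right)
  = z_{s_t}\!\left(a(0)\right)
  + \epsilon \left.\frac{d\, z_{s_t}\!\left(a(\epsilon)\right)}{d\epsilon}\right|_{\epsilon=0}
  + O(\epsilon^{2})
  = z_{s_t}\!\left(a(0)\right) + O(\epsilon^{2}),
% since the first-order term vanishes at a constrained maximum for
% variations that respect the binding self-enforcing constraints.
```

The risk-sharing gain from the transfer, by contrast, is first order in ε, which is why the construction yields a Pareto-improvement.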
Also, note that, by construction, the self-enforcing constraints hold at t , and since V 1 (t −1) is unchanged and V 2 (t −1) is increased, they also hold at t −1.

Pure risk-sharing
We now compare our results to the standard limited-commitment, two-agent, pure risk-sharing model of Thomas and Worrall (1988), Kocherlakota (1996) and Ligon et al. (2002). To do this, for simplicity we consider a special case of our hold-up model with additive production, y_s(a) = f^s_1(a_1) + f^s_2(a_2), and proportional defaults, φ^s_i(a) = θ^s_{i1} f^s_1(a_1) + θ^s_{i2} f^s_2(a_2), where θ^s_{ij} ≥ 0, i, j = 1, 2, and Σ_{i=1}^2 θ^s_{ij} = 1, j = 1, 2. Our hold-up assumption requires θ^s_{ij} > 0 for i ≠ j and all s. Holding technology and preferences fixed, consider the limit case where hold-up vanishes: θ^s_{ij} = 0 for i ≠ j and all s. This corresponds to the pure risk-sharing model. In any optimal contract of this limit model actions are clearly efficient, as are actions in the breakdown, so only efficient levels play any role. Agent i's "endowment" in state s is f^s_i(a*_i(s)) − a*_i(s) and breakdown utility is u_i(f^s_i(a*_i(s)) − a*_i(s)). We establish that the dynamics of the hold-up model converge to those of the risk-sharing model. In the latter, as is well known, the dynamics are summarized in a simple updating rule for ρ(t) (which fixes the division of surplus, given that surplus depends only on s). We characterize how the corresponding updating rule in the hold-up model converges to the risk-sharing one as hold-up disappears. One application of this is that it allows us to characterize general properties of the hold-up dynamics for cases where hold-up is low.
From Ligon et al. (2002), the updating rule in the pure risk-sharing case, which we write ρ(t) = h^RS(ρ(t−1), s_t), keeps the ratio of marginal utilities constant whenever possible: ρ(t) = ρ(t−1) if ρ(t−1) ∈ [ρ^RS_{s_t}, ρ̄^RS_{s_t}], and otherwise ρ(t) moves to the nearest endpoint of this interval. Moreover, whenever optimal contracts that improve on autarky exist (if there is more than one distinct state, and δ is close enough to 1), each [ρ^RS_s, ρ̄^RS_s] is non-degenerate (Proposition 2(iv) in Ligon et al., 2002). Likewise, in the hold-up model we can also use (ρ(t−1), s_t) as the state variable. (By ρ(t−1) = σ(t), this is equivalent to (σ(t), s_t).) Thus, the evolution of the contract can be represented by ρ(t) = h(ρ(t−1), s_t), where h : R_+ ∪ {∞} × S → R_+ (see Appendix A for details and characterization). The updating functions h(ρ, s) converge to those of the pure risk-sharing model as the hold-up problem diminishes. Moreover, for ρ(t−1) within the interior of the interval [ρ^RS_{s_t}, ρ̄^RS_{s_t}], when hold-up is small enough, optimal actions at t are at the first-best levels and so ρ(t) = ρ(t−1). An illustration of this convergence for two states is depicted in Fig. 2.19

Proposition 5. For each state s ∈ S: (i) for all ρ ∈ R_+, h(ρ, s) → h^RS(ρ, s) as θ^s_{ij} → 0, i ≠ j, all s. (ii) If optimal contracts in the risk-sharing problem improve upon autarky, then for any η satisfying [...]

One well-known feature of the pure risk-sharing model is the "amnesia" property: once one of the agents is constrained, the previous history is irrelevant to the future evolution of the optimal contract. This property no longer applies in our model of risk-averse agents with actions. Suppose that agent 2's self-enforcing constraint binds at date t. In the risk-sharing problem, this fixes his continuation utility and there is a unique optimal way of delivering this continuation utility, independently of past history and, in particular, of the previous ratio of marginal utilities. This can be seen in the flat sections of the functions h^RS(ρ, s) in Fig. 2.
In the hold-up problem, by contrast, agent 2's self-enforcing constraint can be relaxed by cutting agent 1's action. Although this change may reduce surplus, the sacrifice of surplus can be offset by improved risk sharing, and the incentive to do this varies with the lagged marginal utility ratio. The logic of trading off surplus to improve risk sharing is similar to the explanation given above for why the partial-insurance case involves optimal actions that are not CSM, even in the long run. This result is illustrated in Fig. 2 by the fact that the functions h(ρ, s) are upward sloping even away from the 45° line. Thus, even when an agent is constrained, past history affects the current actions and consumption and the future evolution of the optimal contract. The amnesia property fails.
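For illustration, the pure risk-sharing updating rule h^RS amounts to clamping the lagged ratio to the state's interval (the interval values below are hypothetical numbers, not from the paper); the flat sections in Fig. 2 correspond to the clamp binding:

```python
def h_RS(rho_prev, interval):
    """Pure risk-sharing updating rule (Ligon et al., 2002): keep the
    marginal-utility ratio unchanged if it lies in the state's interval
    [rho_lo, rho_hi]; otherwise move it to the nearest endpoint."""
    lo, hi = interval
    return min(max(rho_prev, lo), hi)

# Hypothetical intervals for two states s = 0, 1.
intervals = {0: (0.8, 1.1), 1: (1.0, 1.4)}

rho = 0.5                      # initial ratio, below both intervals
path = []
for s in [0, 1, 1, 0]:         # an arbitrary state path
    rho = h_RS(rho, intervals[s])
    path.append(rho)

print(path)  # [0.8, 1.0, 1.0, 1.0]
```

The "amnesia" property is visible here: once a constraint binds, the updated ratio is the interval endpoint regardless of the earlier history. In the hold-up model, by contrast, h(ρ, s) remains strictly increasing in ρ even where a constraint binds, so the lagged ratio, and hence history, is never fully forgotten.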

Risk neutrality
For this section, we use Assumption A5 and suppose that both agents are risk neutral; in particular, u_i(x) = x and x_i = −∞ for i = 1, 2. In this case, the non-negativity constraint on consumption (limited liability) plays a key role. We show that an optimal contract exhibits a two-phase property. It starts with a backloading phase in which one of the agents consumes all of the output. This agent never overinvests, while the other agent overinvests. The second phase is stationary and actions are CSM. Therefore, if s ∈ S_*, actions are at the first-best for both agents; if s ∈ S^c_*, both agents underinvest and have positive consumption. Depending on the initial division of surplus, however, the optimal contract might start in the stationary phase, in which case the backloading phase does not exist.
The lower bound for the deviation utility is strictly positive. Therefore, the Pareto-frontier is defined on the interval [V_1^s, V̄_1^s] ⊂ R_++. It can be shown that the frontier is strictly concave where at least one of the self-enforcing constraints binds. If V_1 lies in an interval where the efficient actions are sustainable (such values may not exist), then the frontier is linear with slope −1 on this interval. In either case, the CSM actions are unique.
Consider three (not necessarily disjoint) subsets of [V_1^s, V̄_1^s]: A^s, the set of values of V_1 at which c_1 = 0 represents an optimal value for consumption at V_1; C^s, the set at which c_2 = 0 is optimal; and B^s, the set at which strictly positive consumption for both agents is optimal. Note that A^s ∪ B^s ∪ C^s = [V_1^s, V̄_1^s]. Also note that A^s can be non-empty and C^s empty, or vice-versa (examples of this type can be constructed). We know from our previous discussion that if agent 1 overinvests, this can occur only for V_1 ∈ A^s, and if agent 2 overinvests, this occurs only for V_1 ∈ C^s. Also, since optimal actions are positive, output and aggregate consumption are positive, and consequently it is not possible that both γ_i > 0 for the same V_1. Equally, for V_1 ∈ A^s, c_2 > 0, and hence the multiplier γ_2 = 0.20 We also know from Proposition 1 that if c_1 = 0, so that agent 2 gets all the consumption, then agent 2 is unconstrained, and hence μ_2 = 0. Likewise, for V_1 ∈ C^s, γ_1 = μ_1 = 0. Consumption for both agents is positive for V_1 ∈ B^s, so that γ_1 = γ_2 = 0.
Consider the subset A^s. Using γ_2 = μ_2 = 0, we obtain conditions (5.1a) and (5.1b) from the first-order conditions (3.2a)-(3.2c). From equation (5.1a) it follows that if σ^+_{s,r}(V_1) < 1, then γ_1 > 0 and ∂y_s(a_1, a_2)/∂a_1 < 1, so that agent 1 is overinvesting. From equation (5.1b) it follows that agent 2 does not overinvest and may underinvest. A similar set of conditions applies for V_1 ∈ C^s and implies 1 ≤ σ^+_s(V_1) ≤ σ_s(V_1), so that agent 1 does not overinvest and, if σ^+_s(V_1) > 1, agent 2 overinvests. For V_1 ∈ B^s, the first-order conditions show that σ^+_s(V_1) = 1, so there is no overinvestment. As a measure of the extent of overinvestment, let ζ^s_i := max{0, −ln(∂y_s(a_1, a_2)/∂a_i)} and ζ^s := max{ζ^s_1, ζ^s_2}. Hence, ζ^s > 0 if there is overinvestment, and ζ^s measures the distortion of the marginal product below its efficient level.21 We now state our two-phase characterization theorem. Here, for convenience, we also treat contracts as sequences of random variables, writing a_i(t) rather than a_i(s^t), etc.

Footnote 21: In subset A^s, ζ^s = −ln σ^+_s(V_1), and in subset C^s, ζ^s = ln σ^+_s(V_1).
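The overinvestment measure ζ can be computed directly from the marginal product; a sketch with an assumed production function (2√a_1 + 2√a_2, illustrative only, not from the paper):

```python
import math

def zeta_i(marginal_product):
    """zeta_i = max{0, -ln(dy/da_i)}: positive exactly when the marginal
    product has been pushed below its efficient level of 1 (overinvestment);
    underinvestment (marginal product above 1) is ignored by the measure."""
    return max(0.0, -math.log(marginal_product))

# Example: y(a1, a2) = 2*sqrt(a1) + 2*sqrt(a2), so dy/da_i = 1/sqrt(a_i)
# and the efficient action level is a_i = 1.
def marginal_product(a_i):
    return 1.0 / math.sqrt(a_i)

print(zeta_i(marginal_product(1.0)))            # efficient action:   0.0
print(zeta_i(marginal_product(0.25)))           # underinvestment:    0.0
print(round(zeta_i(marginal_product(4.0)), 4))  # overinvestment: ln 2 ≈ 0.6931
```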

Theorem 2.
In an optimal contract, there is a random time t̂, 0 ≤ t̂ < ∞ with probability one, such that:

Stationary phase (t ≥ t̂): Optimal actions maximize the surplus z_s(a_1, a_2) subject to the self-enforcing constraints, and hence are CSM. The optimal actions depend only on the state s_t and are therefore independent of the initial conditions. There is no overinvestment. For s_t ∈ S_*, optimal actions and the corresponding surplus are first-best: a(t) = a*(s_t) and z_{s_t}(a(t)) = z_{s_t}(a*(s_t)). For s_t ∈ S^c_*, the self-enforcing constraints bind for both agents, c_i > 0 for i = 1, 2, and there is underinvestment: a_i(t) < a*_i(a_j(t), s_t) ≤ a*_i(s_t) for i, j = 1, 2, i ≠ j.

Backloading phase (t < t̂): Overinvestment declines during the backloading phase: in particular, ζ(t) is weakly decreasing, with ζ(t̂−1) = 0. Backloading applies to only one agent, i, whose identity depends on the initial surplus split: this agent overinvests and has zero consumption20 at each t < t̂ − 1. In the final period of backloading, at date t̂ − 1, there is no overinvestment: a_i(t̂−1) ≤ a*_i(s_{t̂−1}), but a_j(t̂−1) < a*_j(s_{t̂−1}) for j ≠ i. Moreover, if the same state s occurs at any two dates t and t' > t, then underinvestment diminishes and surplus increases: ∂y_s(a(t))/∂a_j ≥ ∂y_s(a(t'))/∂a_j ≥ 1 for t̂ − 1 ≥ t' > t, and z_s(a(t')) ≥ z_s(a(t)) for t̂ ≥ t' > t.

Footnote 20: Since the multiplier is unique, the conclusion that γ_2 = 0 is valid even if V_1 also belongs to B^s or to C^s. The same argument can be made for the other subsets and multipliers.
For a given value of agent 1's lifetime utility V_1(s_0), there corresponds a value σ_0. From Theorem 2, we can describe a typical path as follows. Suppose σ_0 < 1 (a symmetric argument applies if σ_0 > 1). Then one of two possible scenarios applies: either V_1(s_0) ∈ B^{s_0} or V_1(s_0) ∈ A^{s_0}. In the former case, t̂ = 1 and the contract moves to the stationary phase in each state at the next period. There is no overinvestment in this case. In the latter case, either ζ_1(0) = 0 and t̂ = 1, as in the previous case, or ζ_1(0) > 0, in which case t̂ > 1 and there is a backloading phase in which c_1(t) = 0 and agent 1 overinvests. Correspondingly, V_1 is sufficiently low that agent 1's self-enforcing constraint binds and agent 2 underinvests to avoid violating agent 1's self-enforcing constraint; by contrast, V_2 is high enough that agent 2's self-enforcing constraint is slack.22 The basic intuition for the backloading result is familiar from other dynamic contracting models. The claim is that if agent 2 is unconstrained and underinvesting, then agent 1 has zero consumption at all previous dates: her payments are optimally backloaded into the future. The idea is that if agent 1 has positive consumption, then backloading her consumption allows her later constraints to be relaxed, which in turn means agent 2 can increase his future investment level without violating agent 1's constraint. Since agents are risk neutral, they do not care about the timing of consumption flows (keeping the action plans fixed) as long as the expected discounted value is the same, but the backloading permits future surplus to be increased, leading to a Pareto-improvement. Consumption is backloaded to the maximum extent possible, c_1(t) = 0 throughout the phase, allowing maximum surplus to be achieved as quickly as possible.
Furthermore, by increasing $a_1(t)$ above $a_1^*(a_2(t), s)$, with the extra output allocated to agent 2, additional backloading can be achieved, and for a small amount of overinvestment the reduction in surplus is second-order.[23] Two novel results concerning the backloading phase in the two-sided environment are the overinvestment by the agent whose utility is backloaded (although overinvestment does not persist into the stationary phase), and the fact that, despite the possibility that property rights might vary radically and persistently between states, only one of the agents will ever be subject to backloading.
That there is overinvestment in the backloading phase is perhaps surprising, given the hold-up problem and given that the literature mentioned in the Introduction, which considers the case where only one agent takes an action, finds that there is never any overinvestment. Consider the one-sided case with only agent 1 taking an action. If agent 2 receives enough of the surplus to allow $a_1$ to exceed $a_1^*$ without agent 2 wanting to deviate, then the optimal contract is stationary with $a_1 = a_1^*$. In the two-sided case, the benefit from overinvestment is that it allows more backloading of agent 1's utility when $c_1 = 0$. In the one-sided case, however, there is no such benefit, only an efficiency cost, and backloading can only increase agent 2's incentive to renege in the future, potentially necessitating lower (inefficient) future actions by agent 1. Thus, there is no overinvestment.

[22] This characterization applies so long as $\zeta_1(t) > 0$ and agent 1's self-enforcing constraint binds with a positive multiplier. With more than one state, we cannot rule out the possibility that in some states deviation utilities are so low that the self-enforcing constraints do not bind even when $\sigma(t) < 1$. In this latter case, from (3.2c) and (3.2a), $a_2(t) = a_2^*(a_1(t), s)$ and $\sigma^+(t) = \sigma(t)$.

[23] The incentive to overinvest diminishes over time (as can be seen from (5.1a), $\sigma^+(t)$ approaches 1). Equally, if the same state recurs along the path, underinvestment diminishes as the self-enforcing constraint is relaxed. The combined effect is that surplus $z_s(a(t))$ increases, reaching a maximum when $\sigma(t) = 1$.

Conclusion
In this paper, we have analyzed the dynamic properties of a relational contract between two agents, both of whom undertake a costly investment or action that yields joint benefits. We have shown that optimal contracts exhibit different properties depending on whether agents are risk neutral or risk averse. In the risk-neutral case, actions may be either above or below the efficient level, and actions and the division of the surplus converge monotonically to a stationary solution at which actions are constrained surplus-maximizing (either both are first-best or both are below the first-best level). In the risk-averse case, we also establish a convergence result, but convergence may or may not be monotonic depending on whether it is possible to sustain a first-best allocation. We have demonstrated that the optimal contract converges to the pure risk-sharing results of Kocherlakota (1996) as the hold-up problem vanishes.
In the risk-averse case there is an interesting trade-off between hold-up and risk-sharing. The hold-up problem creates an opportunity to relax the default constraint by lowering actions. This in turn allows more risk-sharing to be achieved without leading to default. It would be interesting to evaluate whether the gain in risk-sharing could ever be sufficient to offset the loss in surplus created by the original hold-up problem. This is a difficult question because, without additional structure on the model, little can be said about the long-run distribution of the optimal contract.

Appendix

It is convenient in analyzing the recursive problem to change variables and use the deviation utilities of the two agents instead of actions. Let $d_j := D_i^s(a_j)$; by Lemma 3, $dD_i^s(a_j)/da_j > 0$, and we let $g_i^s(d_j) := (D_i^s)^{-1}(d_j)$. Abusing notation, surplus is $z_s(d_1, d_2) := z_s(g_2^s(d_1), g_1^s(d_2))$, with output $y_s(d_1, d_2)$ defined similarly. Given the properties of $D_i^s(a_j)$ (Lemma 3), the functions $g_j^s(d_i)$ are continuously differentiable, strictly increasing and strictly convex. Let $\mathcal{D}(s) := \{(d_1, d_2) = (D_2^s(a_1), D_1^s(a_2)) \mid (a_1, a_2) \in \mathbb{R}_+^2\}$. The contract $\{d(s^t), x(s^t)\}_{t=0}^{\infty}$ is feasible if $\sum_i x_i(s^t) \le z_{s_t}(d(s^t))$ (total consumption does not exceed output) for every history $s^t$; and, for actions and consumption to be non-negative, it must also satisfy $d(s^t) \in \mathcal{D}(s_t)$ for every history $s^t$ and $x_i(s^t) + g_j^{s_t}(d_i(s^t)) \ge 0$ for $i, j = 1, 2$, $i \ne j$, and every history $s^t$. We define $d_i^*(s) = D_j^s(a_i^*(s))$, $i \ne j$, $d_i^*(d_j, s) = D_j^s(a_i^*(g_i^s(d_j), s))$, etc. Problem [P1] can be reformulated with $d \in \mathcal{D}(s)$ replacing $a \ge 0$ as a choice variable, the right-hand sides of (3.1b) and (3.1c) being $d_2$ and $d_1$ respectively, $a_i$ in (3.1f) being $g_j^s(d_i)$, and the right-hand side of (3.1g) being $z_s(d_1, d_2)$, with the solution denoted $(d^s(V_1), x^s(V_1))$.
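The change of variables $d_j = D_i^s(a_j)$ with inverse $g_i^s = (D_i^s)^{-1}$ can be sketched numerically. The functional form below is purely illustrative (the paper does not specify one here): we take a deviation utility $D(a) = \theta a^\beta/\beta$, which is strictly increasing and concave, so that its inverse $g$ is strictly increasing and convex, as Lemma 3 requires.

```python
# Illustrative change of variables: d = D(a), a = g(d) = D^{-1}(d).
# The parametric form D(a) = theta * a**beta / beta is an assumption
# for illustration, not taken from the paper.
theta, beta = 0.4, 0.5

def D(a):
    """Deviation utility as a function of the other agent's action (assumed form)."""
    return theta * a**beta / beta

def g(d):
    """Inverse of D: recovers the action from the deviation utility."""
    return (beta * d / theta) ** (1.0 / beta)

a = 1.7
assert abs(g(D(a)) - a) < 1e-9  # g inverts D

# D is strictly increasing, so g is too (on the image of D).
grid = [0.5, 1.0, 1.5, 2.0]
dvals = [D(x) for x in grid]
assert all(x < y for x, y in zip(dvals, dvals[1:]))
```

With $\beta = 1/2$, $g(d) = (\beta d/\theta)^2$ is strictly convex, matching the stated properties of $g_j^s$.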
With the change in variables, the first-order condition (3.2c) becomes[24]
\[
\frac{\mu_j}{1+\mu_2} = \frac{\partial z_s}{\partial d_i}\,u_2'(\cdot) + \frac{\gamma_2}{1+\mu_2} + g_j^{s\,\prime}(d_i)\,\frac{\gamma_i}{1+\mu_2}, \qquad i, j = 1, 2,\ i \ne j. \tag{A.1}
\]
To establish concavity of $V_2^r(\cdot)$ we give two alternative assumptions. Under either Assumption A4 or Assumption A5, the Pareto frontier is concave on $[\underline V_1^s, \bar V_1^s]$ (Proposition 2(i)). Under Assumption A4, it is easily checked that the constraint set is also convex.[25] Although Assumptions A4 and A5 are not directly on primitives of the model (because they are specified in terms of the deviation utility and, for Assumption A4, an endogenous variable), it is easily checked that there are natural parameterizations of the model in which these assumptions are satisfied. For example, Assumption A4 is satisfied provided that agents are not too risk averse. Consider the case where preferences exhibit constant absolute risk aversion with coefficient $\alpha > 0$, the same for both agents, and the production function is separable and given by $y_s(a_1, a_2) = \beta^{-1}((a_1)^\beta + (a_2)^\beta)$, where $\beta \in (0, 1)$. Furthermore, suppose each agent can expropriate a proportion $\theta$ of output in the case of default. Then a sufficient condition for the assumption to be satisfied is $\alpha < -e\theta(1-\theta)^{-1}\log\theta$ for $\theta \in (1/e, 1/2]$, and $\alpha < (1-\theta)^{-1}$ for $\theta \in (0, 1/e]$. Equally, suppose that agents are risk neutral with $u_i(x) = x$, production is additive, and the breakdown consumption in each state is $\phi_i(a) = \theta_{i1} f_1(a_1) + \theta_{i2} f_2(a_2)$, where for notational simplicity the dependence of $\theta$, $f$, etc. on $s$ is suppressed. With this specification for $\phi_i(a)$, $D_j'/D_j = f_i'/f_i$, and it can be checked that Assumption A5 is satisfied.

[24] The linear independence constraint qualification holds unless the constraints (3.1f) are inactive and $u_2'(\cdot)(\partial z_s/\partial a_1)(dg_2^s(d_1)/dd_1) = 1$. This constraint qualification can fail, but only at $V_1 = \bar V_1^s$, where the slope of the Pareto frontier is infinite (examples where it fails at this point can be constructed). Thus, apart from $V_1 = \bar V_1^s$, the linear independence constraint qualification holds and the Lagrange multipliers in the first-order conditions (reported in subsection 3.4) exist and are unique. We can also ignore points $V_1 = \bar V_1^s$ without loss of generality: if $V_1(s_0) < \bar V_1^{s_0}$, then we will show that $V_1 \ne \bar V_1^s$ for any state $s$; if $V_1(s_0) = \bar V_1^{s_0}$, then it is possible to reformulate the problem, maximizing the utility of agent 1 for a given value $V_2$ for agent 2, and the relevant constraint qualification is satisfied.

[25] It can also be checked that if [P1] is written with $c$ and $d$ as choice variables, then a sufficient condition for convexity of the constraint set is that $y_s(d)$ is concave in $d$. This condition is more stringent than concavity of $z_s(d)$ and fails in a number of natural cases.
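As a quick sanity check on the CARA example's sufficient condition as stated above, the bound $-e\theta(1-\theta)^{-1}\log\theta$ is strictly positive on the relevant range $\theta \in (1/e, 1/2]$ (since $\log\theta < 0$ there), so the condition admits a nonempty set of risk-aversion coefficients $\alpha$. The snippet below just evaluates the bound at a few illustrative values of $\theta$.

```python
# Evaluate the sufficient bound on the CARA coefficient alpha for
# theta in (1/e, 1/2]: alpha < -e * theta/(1-theta) * log(theta).
# The values of theta below are illustrative.
import math

def alpha_bound(theta):
    """Upper bound on alpha for Assumption A4, as stated in the text."""
    return -math.e * theta / (1.0 - theta) * math.log(theta)

for theta in (0.40, 0.45, 0.50):
    assert 1.0 / math.e < theta <= 0.5   # theta in the stated range
    assert alpha_bound(theta) > 0        # a nonempty set of admissible alpha
```

Since $\log\theta \to 0$ as $\theta \to 1$, the bound is a genuine restriction only for moderate $\theta$; on the stated range it comfortably exceeds zero.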
Lemma 5. Under Assumptions 1-3 and under either Assumption A4 or Assumption A5, $d_i^s(V_1)$ is a continuous function of $V_1$ for each $s \in S$ and $i = 1, 2$.
Lemma 6. Under Assumptions 1-3, and for $i, j = 1, 2$, $i \ne j$, for any history $s^t$: (i) if $V_i(s^t) > d_j(s^t)$, then $a_j(s^t) \ge a_j^*(a_i(s^t), s_t)$; (ii) if $c_i(s^t) > 0$, then $a_i(s^t) \le a_i^*(a_j(s^t), s_t)$.