Abstract

This article considers a stationary economy populated with overlapping generations that reproduce identically in continuous time. Each dynasty has a productivity and an opportunity cost of going to work that vary with age. Labor supply is extensive: at each date, the typical agent can either work full time or not work at all. The decision to work is based on a comparison between after-tax income and the privately known opportunity cost of work. We assume that the utilitarian government, which aims at redistributing lifetime utility across dynasties, has a single policy instrument: a stationary income tax schedule that is a function of current income. The article develops a method to study the government problem. This technique is applied to derive the properties of the optimal income tax schedule in a number of examples.

1. Introduction

We study income taxation in a dynamic stationary economy with overlapping generations, made of dynasties with fixed lifetime that reproduce identically. Time is continuous. Labor supply is extensive: at each date, one can either work full time or not work at all. The typical agent is characterized by an invariant instantaneous utility function for consumption and by age profiles of productivity (that is, production when working full time) and of the pecuniary cost of going to work. For simplicity, we assume that these profiles are deterministic. The government aims at redistributing lifetime welfare across dynasties, using the income tax. We put a number of restrictions on the government instrument, based on casual observation of developed economies: we suppose that the tax schedule is invariant over time and that the tax depends only on current labor income. It cannot depend on the pecuniary cost of going to work, which is implicitly not verifiable by the tax authority. All non-workers are treated identically (they pay the same amount, or receive the absolute value of the transfer if the tax is negative). This transfer is independent of (potential) productivity, which is supposed not to be verifiable for non-working individuals. We derive the properties of the optimal tax system in this setup.

Its dynamic structure makes the article unlike the bulk of the optimal taxation literature that takes place in a static model; see, for example, the survey of Piketty and Saez (2013). Also, it is different from the intensive labor supply models that follow Mirrlees, where the agents only differ by their productivities along a single dimension of heterogeneity. It shares features of the extensive model, notably the heterogeneity in the cost of going to work, which may generate an optimal subsidy of the labor supply of the low-skilled agents as in Choné and Laroque (2005).

The dynamic setup is similar (overlapping generations, deterministic trajectories,…) to that used by a number of recent works, Rogerson (2011), Shourideh and Troshkin (2012), or Weinzierl (2011). Our contribution with respect to these studies is in our focus on the extensive margin and the treatment of the two-dimensional heterogeneity on productivity and the cost of going to work.

To be able to derive the properties of the optimal tax system, we rule out tax schemes that depend on the history of earnings, an assumption which we feel justified by casual observation. This makes our analysis very different from that of Brito et al. (1991) and more recently the dynamic public finance literature; see, for example, Kocherlakota (2005, 2010). These works are interested in the dynamic revelation of information. We stick to a simpler redistributive problem where the nonlinear tax is constrained to be a function of current income. We also do not allow income tax to depend on age, contrary to Weinzierl (2011). Again our justification is casual observation. A natural way to have transfers conditional on age would be to introduce pensions. This is outside the scope of the current article and should be the subject of future work.

The main contribution of the article is our characterization of the optimal tax scheme in this dynamic extensive labor supply environment. The optimal tax can be described as the result of a balance between two forces: a redistributive force, holding labor supply constant, and an efficiency force, which comes from changes in labor supply and production. This generalizes the first-order conditions from Laroque (2011). There is a lot of bunching at the optimum: the tax scheme often is piecewise constant, and there are regions where the marginal tax rate is 100%. This is due to the extensive character of labor supply: what matters is whether to work or not to work, and hence the average rather than the marginal tax rate. Otherwise the model is too general to lead by itself to definite policy implications. Given the trajectories of the productivity and opportunity cost of work of the dynasties in the economy, and given the tastes for redistribution of the government, our analysis makes it possible to compute the optimal tax schedule. Without further restrictions on the parameters of the economy, however, the tax schedule is largely unconstrained. An interesting feature, for policy purposes, is the treatment of low-skilled agents, whose labor supply may be distorted downward, as in the traditional Mirrleesian intensive models, or distorted upward, justifying the American Earned Income Tax Credit (EITC) and the introduction of negative taxes.

We investigate in more depth economies with two types of agents. We show that the optimal allocation changes dramatically depending on the source of the heterogeneity. When the agents primarily differ by their opportunity costs of work, the income tax schedule solves a standard equity/efficiency trade-off, with each of the two types of agents having their labor supply distorted downward. By contrast, when the agents primarily differ through their productivity, the tax schedule is used to subsidize low-skilled work, and hence generates upward labor supply distortions. The logic at work is reminiscent of Choné and Laroque (2011). Here, however, the mechanism comes from the low-skilled agent being the only one working at low productivities, due to the form of her life cycle trajectory, rather than from the shape of the distribution of the social weights in the population.

The article is organized as follows. Section 2 presents the model. Section 3 deals with optimal taxation, introducing the efficiency and redistributive forces. Finally, Section 4 studies in detail the properties of optimal tax in the case where there are two types of agents in the economy.

2. Model

We consider an economy in continuous time. All agents have the same life length, normalized to one. The types i of the agents belong to a set I. An agent of type i has a lifetime utility function of the form
\[ \int_0^1 u_i(c_i(a))\,da, \]
where ui is an increasing concave function and ci(a) denotes consumption at age a. Agent i, if she works at age a, produces at most wi(a) units of a single homogeneous good. Going to work on the market, and therefore producing wi(a), has an opportunity cost for the agent, for instance because it takes time otherwise devoted to family gardening or to childcare. This cost varies with age along the life of agent i. We assume that this dependence with respect to age is deterministic and known. Formally, the opportunity cost of work of agent i is pecuniary, measured in units of the good, and represented by the function δi(a).

The type of an agent is thus characterized by a pair of exogenous, nonnegative functions (wi(·),δi(·)) defined on [0,1] and by the instantaneous utility index ui(·). As the age a varies, the pair (wi(a),δi(a)) traces a curve in the (w,δ) space, which we call a ‘trajectory’. We assume that the functions wi, δi, and ui are differentiable. The economy potentially exhibits a lot of heterogeneity.

At each date t, for each i in I, the economy contains a continuum of agents of type i of all ages a in [0,1]; over time the older agents die and are replaced by newborns of the same type. Cohort i has size ni, with $\sum_{i \in I} n_i$ normalized to 1, and the economy is stationary. An ‘allocation’ specifies the nonnegative consumption ci(a) and the labor supply ℓi(a) in {0, 1} of all types i along their lives.

Furthermore, we assume that there are perfect financial markets for transferring wealth across time, with a zero interest rate. The agents use these markets to smooth their consumption over time: ci(a)=ci, independent of age. From now on, we restrict our attention to allocations where consumption is constant and equal to its aggregate value over the lifetime, ci.

2.1 Feasibility

An allocation is ‘feasible’ if and only if total consumption does not exceed total output net of production cost:
\[ \sum_{i \in I} n_i c_i \le \sum_{i \in I} n_i \int_0^1 [w_i(a) - \delta_i(a)]\,\ell_i(a)\,da. \tag{1} \]
An allocation is ‘efficient’ whenever output net of production costs is maximized, that is, any agent works whenever her opportunity cost of work is lower than or equal to her productivity: ℓi(a)=1 if δi(a)<wi(a) and ℓi(a)=0 if δi(a)>wi(a).

2.2 Utilitarian optimum (first best)

The utilitarian optimum is the allocation that maximizes $\sum_{i \in I} n_i u_i(c_i)$ subject to the feasibility constraint (1). It is the feasible efficient allocation such that marginal utilities are equal:
\[ u_i'(c_i) = u_j'(c_j) \]
for all i and j in I.

2.3 Laissez-faire

The agents maximize their lifetime consumption
\[ c_i = \int_0^1 [w_i(a) - \delta_i(a)]\,\ell_i(a)\,da. \]
They decide to work whenever their productivity is larger than their opportunity cost of work,1 so the laissez-faire equilibrium is efficient. In general, laissez-faire yields an allocation that differs from the utilitarian optimum.
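
As a rough numerical illustration of the laissez-faire rule, the following sketch discretizes ages on a midpoint grid and computes the labor supply and lifetime consumption of one agent. The linear profiles `w` and `delta` and the grid size `N` are hypothetical choices made for the example; they are not taken from the article.

```python
# Hypothetical profiles (assumptions for illustration): productivity falls
# and the pecuniary cost of work rises linearly with age a in [0, 1].
def w(a):
    return 2.0 - a

def delta(a):
    return 0.5 + 2.0 * a

N = 1000                                  # number of age cells
ages = [(k + 0.5) / N for k in range(N)]  # midpoints of the age cells

# Laissez-faire: work exactly when productivity exceeds the cost of work.
labor = [1 if w(a) > delta(a) else 0 for a in ages]

# Lifetime consumption = integral of (w - delta) over working ages.
consumption = sum((w(a) - delta(a)) * l for a, l in zip(ages, labor)) / N

working_time = sum(labor) / N
print(working_time, consumption)
```

With these profiles the agent stops working at the age where the two lines cross, so the computed working time approximates that crossing age.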

‘In all of the article we suppose that the utilitarian government observes the employment status of the agents and, when they work, their productivity w. It never observes the pecuniary cost δ, which is private information.’

2.4 Income tax

We study redistributive taxation in a setup where the tax schedule is assumed to be age independent and time invariant. The tax schedule is made of a function R(w), the after-tax income of a worker with before tax wage w, and of a scalar s equal to the subsistence income of the non-workers.

2.5 Second-best program

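Facing the tax schedule (R(·),s), the consumer chooses her labor supply ℓ(a), so as to maximize her lifetime utility, that is,
\[ \max_{\ell(\cdot)} \; u_i\!\left( \int_0^1 \bigl\{ [R(w_i(a)) - \delta_i(a)]\,\ell(a) + s\,[1 - \ell(a)] \bigr\}\,da \right), \]
where ℓ(a) belongs to {0, 1}.
Feasibility then can be written in two equivalent ways, either as a balanced government budget:
\[ \sum_{i \in I} n_i \int_0^1 \bigl\{ [w_i(a) - R(w_i(a))]\,\ell_i(a) - s\,[1 - \ell_i(a)] \bigr\}\,da = 0, \tag{2} \]
or as the equality of aggregate production and aggregate consumption:
\[ \sum_{i \in I} n_i \int_0^1 [w_i(a) - \delta_i(a)]\,\ell_i(a)\,da = \sum_{i \in I} n_i c_i. \tag{3} \]
The second-best allocation maximizes the sum of utilities under the above constraints.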
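
The equivalence between the two ways of writing feasibility can be checked with a small accounting exercise. The sketch below assumes that a worker with productivity w consumes R(w) net of her pecuniary cost δ while a non-worker consumes s; the two types, their profiles, the schedule `R`, and the value of `s` are hypothetical. The schedule is arbitrary, so the budget need not balance: the point is only that the budget surplus always equals net output minus consumption.

```python
# Hypothetical two-type economy (all numbers are illustrative assumptions).
types = {
    "H": {"n": 0.5, "w": lambda a: 3.0 - a, "delta": lambda a: 0.5 + a},
    "L": {"n": 0.5, "w": lambda a: 2.0 - a, "delta": lambda a: 1.0 + a},
}
s = 0.4                        # subsistence income of non-workers

def R(w):                      # after-tax income of a worker, nondecreasing
    return 0.6 + 0.7 * w

N = 2000
ages = [(k + 0.5) / N for k in range(N)]

budget = 0.0        # taxes collected from workers minus transfers s
consumption = 0.0   # aggregate lifetime consumption
net_output = 0.0    # aggregate output net of the cost of work

for t in types.values():
    for a in ages:
        wa, da = t["w"](a), t["delta"](a)
        works = R(wa) - s > da          # financial incentive vs. cost of work
        weight = t["n"] / N
        if works:
            budget += weight * (wa - R(wa))
            consumption += weight * (R(wa) - da)
            net_output += weight * (wa - da)
        else:
            budget -= weight * s
            consumption += weight * s

# Accounting identity: budget surplus = net output - consumption.
gap = (net_output - consumption) - budget
print(budget, gap)
```

Because the identity holds cell by cell, a balanced budget and the equality of aggregate production and consumption are the same condition.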

3. Optimal Income Tax

When an agent has productivity w at some date, her financial incentive to work is equal to R(w) − s, which is to be compared with the opportunity cost of work δ. It is useful to represent the financial incentive to work in the same plane as the individual trajectories (w(a),δ(a)). Hereafter, the ‘incentive schedule’ is the curve (w,R(w) − s) as productivity varies. An agent works in regions where her trajectory is located below the incentive schedule, that is, her opportunity cost of work δ is smaller than the financial incentive to work R(w) − s. Her work status changes at points where her trajectory crosses the incentive schedule.

Assuming that the agents can choose occupations requiring skills below their own ability, no one would choose an occupation whose required productivity belongs to a decreasing part of the function R, preferring to produce less and to earn a higher after-tax income. Formally, we can replace any function R with $\tilde{R}(w) = \max_{w' \le w} R(w')$. It follows that, without loss of generality, we limit our attention to functions R that are nondecreasing and assume that workers work at full productivity. Lifetime consumption of agent i is therefore given by
\[ c_i = s + \int_0^1 [R(w_i(a)) - s - \delta_i(a)]\,\ell_i(a)\,da, \tag{4} \]
with
\[ \ell_i(a) = 1 \ \text{if } R(w_i(a)) - s > \delta_i(a), \qquad \ell_i(a) = 0 \ \text{if } R(w_i(a)) - s < \delta_i(a). \]
For notational simplicity, we do not mention the policy instruments R and s in the arguments of the labor supply functions ℓi. The Lagrangian of the problem reduces to
\[ \mathcal{L} = \sum_{i \in I} n_i \bigl[ u_i(c_i) + \lambda\,\bigl( Y_i(\ell_i) - c_i \bigr) \bigr], \]
where λ is the multiplier of the government budget and Yi(ℓi) is agent i’s lifetime net output:
\[ Y_i(\ell_i) = \int_0^1 [w_i(a) - \delta_i(a)]\,\ell_i(a)\,da. \]
The problem is to find the tax instruments (R(·),s) which maximize the Lagrangian subject to the constraint that R(·) be nondecreasing. An equal translation of R(·) and of the subsistence income s, which does not alter labor supply, yields the first-order necessary condition: $\sum_{i \in I} n_i u_i'(c_i) = \lambda$.
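
The monotonicity argument above, in which an agent may take an occupation below her own ability, amounts to replacing the schedule by its running maximum over lower productivities. A minimal sketch on a hypothetical productivity grid (the grid `w_grid` and the values in `R_raw` are illustrative assumptions):

```python
from itertools import accumulate

# Hypothetical after-tax schedule sampled on an increasing productivity grid.
w_grid = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
R_raw = [0.2, 0.6, 0.5, 0.4, 0.9, 0.8]

# Running maximum: R_mono[k] = max(R_raw[0], ..., R_raw[k]), i.e. the best
# after-tax income available at any productivity not exceeding w_grid[k].
R_mono = list(accumulate(R_raw, max))

for wk, rk in zip(w_grid, R_mono):
    print(wk, rk)
```

The resulting schedule is nondecreasing and never below the original one, so no agent loses from the replacement.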

The Lagrangian depends on the tax schedule through two channels: consumption levels ci and labor supplies ℓi. Hereafter, we label ‘redistribution force’ and ‘efficiency force’ the effect of R through these respective channels. The first force is present at all productivity levels, while the second is active only at points w where an agent is indifferent between working and not working. Formally, we compute the Fréchet derivatives of the Lagrangian of the government problem, seen as a functional that maps the set of functions R into ℝ. To this aim, we evaluate the Lagrangian at a slightly perturbed function R+ɛh, compute the ratio [ℒ(R+ɛh) − ℒ(R)]/ɛ, and let ɛ tend to zero. A mathematical derivation of the limit can be found in Appendix A. Here we present a heuristic approach to the differentiation.

3.1 The redistribution force

This force comes from the dependence of lifetime consumptions on the after-tax schedule. Suppose we replace the after-tax income R with R + dR on the interval [w,w + dw], with dw>0. This change in after-tax income translates into a change in consumption for the agents who work at productivity levels in [w,w + dw]. The change in agent i’s lifetime consumption is given by
\[ dc_i = dR\,dT_i(w;\ell_i), \tag{5} \]
where Ti(w;ℓi) denotes the time spent by agent i with worktime profile ℓi working at a productivity lower than or equal to w,
\[ T_i(w;\ell_i) = \int_0^1 \ell_i(a)\,\mathbf{1}\{w_i(a) \le w\}\,da, \tag{6} \]
and, accordingly, its derivative dTi(w;ℓi) represents the time spent by agent i working at a productivity between w and w + dw.

By construction, Ti(w;ℓi) is a nondecreasing function of w. The limit of Ti(w;ℓi) as w goes to infinity is the total time agent i works over her life cycle, hereafter denoted Li.

The derivative of Ti(w;ℓi) with respect to w, dTi(w;ℓi), is a positive measure which is almost everywhere continuous, possibly having mass points at productivity levels where agent i spends non-infinitesimal periods of time. If we think of agent i’s productivity when she works as a random variable, the probability measure dTi(w;ℓi)/Li can be thought of as the distribution of that random variable. Suppose agent i’s trajectory crosses the incentive schedule from below at w0, that is, the agent works for w ≤ w0 and does not work for w ≥ w0 along the trajectory. Then dTi has a downward discontinuity at w0, and Ti has a concave kink at w0. If the trajectory crosses the schedule from above, then the kink of Ti is convex.
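
To make the working-time function concrete, the sketch below computes, on an age grid, the time an agent spends working at a productivity lower than or equal to a threshold. The profiles are hypothetical, and for simplicity the agent is assumed to work whenever her productivity exceeds her cost of work (no taxes).

```python
# Hypothetical agent: productivity falls and cost of work rises with age,
# and she works whenever w(a) > delta(a) (no taxes, for simplicity).
def w(a):
    return 2.0 - a

def delta(a):
    return 0.5 + 2.0 * a

N = 1000
ages = [(k + 0.5) / N for k in range(N)]
labor = [1 if w(a) > delta(a) else 0 for a in ages]

def T(threshold):
    """Time spent working at a productivity lower than or equal to threshold."""
    return sum(l for a, l in zip(ages, labor) if w(a) <= threshold) / N

L_total = sum(labor) / N      # total working time, the limit of T at infinity

for threshold in (1.4, 1.75, 2.0):
    print(threshold, T(threshold))
```

Here the agent works while her productivity lies between 1.5 and 2, so `T` is zero below 1.5, rises linearly, and reaches `L_total` at the top of that range.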

By the chain rule, the variation of the Lagrangian coming from the changes in lifetime consumptions is given by dℒ = dΦ(w;ℓ), where Φ(w;ℓ) is the social marginal utility of income (net of the cost of public funds) for workers with productivity below w:
\[ \Phi(w;\ell) = \sum_{i \in I} n_i\,[u_i'(c_i) - \lambda]\,T_i(w;\ell_i). \tag{7} \]
The term dΦ(w;ℓ) reflects the redistributive force. Redistribution induces the government to raise (lower) after-tax income in regions where dΦ(w;ℓ)>0 (dΦ(w;ℓ)<0). The observation that λ is the average of marginal utilities yields the following result.

Lemma 1. The net social marginal utility of income of workers with productivity below w, Φ(w;ℓ), has the same sign as the correlation between marginal utilities u′i(ci) and working times Ti(w;ℓi).
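
Lemma 1 can be checked numerically. The sketch below assumes that Φ takes the form of a population-weighted sum of [u′i(ci) − λ] Ti, with λ the population average of marginal utilities, and compares it with the covariance between marginal utilities and working times. The three types, their consumption levels, their working times, and the logarithmic utility are hypothetical.

```python
# Hypothetical three-type economy; the shares n sum to one.
n = [0.3, 0.5, 0.2]
c = [0.8, 1.0, 1.6]            # lifetime consumption levels
T = [0.10, 0.25, 0.40]         # time spent working at productivity <= w

mu = [1.0 / ci for ci in c]    # marginal utilities under u(c) = log(c) (assumption)
lam = sum(ni * mi for ni, mi in zip(n, mu))      # average marginal utility

# Net social marginal utility of income of workers with productivity below w.
phi = sum(ni * (mi - lam) * Ti for ni, mi, Ti in zip(n, mu, T))

# Population covariance between marginal utilities and working times.
mean_T = sum(ni * Ti for ni, Ti in zip(n, T))
cov = sum(ni * (mi - lam) * (Ti - mean_T) for ni, mi, Ti in zip(n, mu, T))

print(phi, cov)
```

In this example the richer types work more at productivities below w, so marginal utilities and working times are negatively correlated and `phi` is negative.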

3.2 Labor supply elasticity

A change in the tax schedule may also affect labor supply. We say that there is ‘indifference’ at w if there exists an agent i, having productivity w at some age ai, w=wi(ai), who is indifferent between working and not working at this age, that is, R(w) − δi(ai) = s. A ‘switch point’ is an indifference point such that the work status of the indifferent agent changes in a neighborhood of w, that is, the trajectory of agent i crosses the incentive schedule at w. When the slopes of the tax schedule and of the trajectory are different, R′(w) ≠ δ′i/w′i, the quantity
\[ \eta_i = \frac{1}{\lvert \delta_i'(a_i) - R'(w)\,w_i'(a_i) \rvert} \tag{8} \]
is positive and finite.
Consider a switch point w and replace R with R + dR on the interval [w,w + dw], with dR = (δ′i/w′i − R′) dw, as shown on Figure 1. (In the represented example, the trajectory is decreasing in the (w,δ) space; specifically the agent’s productivity and cost of work, respectively, decline and rise with age.) The perturbation changes the status of the agent on the interval from working to non-working. The time spent in the interval is
\[ \frac{dw}{\lvert w_i'(a_i) \rvert} = \eta_i\,\lvert dR \rvert; \]
hence, ηi is the absolute value of the derivative of labor supply with respect to the tax schedule R. When R − s increases by 1%, the time agent i spends working at a productivity below w is increased by ɛi(w;R) percent, where ɛi(w;R) denotes the elasticity of agent i’s labor supply, Ti(w;ℓi), with respect to financial incentives to work:
\[ \varepsilon_i(w;R) = \frac{[R(w) - s]\,\eta_i}{T_i(w;\ell_i)}. \tag{9} \]
The labor supply elasticity depends on both the gradient of the trajectory and the slope of the after-tax schedule at the switch point. In particular, the steeper the tax schedule at w, the lower the elasticity, because the agent spends less time in the region affected by the perturbation.
Figure 1. Labor supply elasticity. Original (perturbed) schedule: solid (dashed) line.
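
The labor supply response at a switch point can be illustrated numerically. The sketch below uses hypothetical linear profiles and a hypothetical linear incentive schedule, and takes the reciprocal of |δ′ − R′ w′| as the derivative of working time with respect to a uniform shift of the schedule, which is what the perturbation argument above suggests. It compares this expression with a finite-difference estimate obtained by bisection.

```python
def w(a):            # hypothetical productivity profile, decreasing with age
    return 2.0 - a

def delta(a):        # hypothetical cost of work, increasing with age
    return 0.5 + 2.0 * a

def incentive(x):    # hypothetical incentive schedule R(w) - s
    return 0.2 + 0.5 * x

def switch_age(shift=0.0):
    """Age at which incentive(w(a)) + shift = delta(a), found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if incentive(w(mid)) + shift > delta(mid):
            lo = mid          # the agent still works at age mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a_switch = switch_age()       # the agent works before this age

# Slopes at the switch point (constant here because everything is linear).
w_prime, delta_prime, R_prime = -1.0, 2.0, 0.5
eta_formula = 1.0 / abs(delta_prime - R_prime * w_prime)

# Finite-difference response of working time to a small uniform shift.
d = 1e-6
eta_numeric = (switch_age(d) - a_switch) / d

print(a_switch, eta_formula, eta_numeric)
```

Making the schedule steeper (a larger `R_prime` together with the matching `incentive`) lowers the response, in line with the remark that a steeper schedule reduces the elasticity.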

The above formula is readily adapted if agent i’s trajectory crosses the tax schedule more than once. Formally, the Fréchet derivative of Ti(w;ℓi) with respect to the tax schedule R is a positive measure made of mass points at agent i’s switch points below w; see equation (16) in Appendix A.2. Similarly, the elasticity of the aggregate labor supply, $T(w;\ell) = \sum_{i \in I} n_i T_i(w;\ell_i)$, is given by
\[ \varepsilon(w;R) = \frac{R(w) - s}{T(w;\ell)} \sum_{i \in S(w)} n_i\,\eta_i, \tag{10} \]
where S(w) is the set of agents who switch at w.

3.3 The efficiency force

A marginal change of the incentives to work, d(R − s), on a small interval around a switch point w of agent i has only a second-order effect on her permanent income because she is indifferent between working and not working at this point. Such a change, however, affects the net output she produces over her life cycle:
\[ dY_i = (w - \delta)\,\eta_i\,d(R - s), \tag{11} \]
where δ = R(w) − s is agent i’s cost of work at the switch point. At the same time, the change affects the government revenue. For instance, if dR>0 and w > R − s, the variation of the tax schedule induces the agent to switch from not working to working on a short period of her life, which raises the government revenue. We define the efficiency force as
We show formally in Appendix A that this force is a discrete measure concentrated on the set of all switch points
(12)
where S is the set of all agents’ switch points, wσ is the productivity level at σ, and ɛ(w;R) is the total labor supply elasticity given by (10). The previous analysis is summarized in the following proposition. 
Proposition 1. The Lagrangian is differentiable at any point (w,R(w) − s) where no trajectory is tangent to the incentive schedule. Its derivative can be written as the sum
\[ d\mathcal{L} = d\Phi(w;\ell) + d\Psi(w;\ell), \tag{13} \]
where the almost everywhere continuous measure dΦ(w;ℓ) given by (7) and the discrete measure dΨ(w;ℓ) given by (12) represent the redistribution and efficiency forces.

Raising R at an indifference point increases labor supply, which alleviates the government budget constraint if w > R − s and makes it more stringent if w < R − s. Hence, income maximizing pushes the government to raise (lower) after-tax income in regions where w > R − s (w < R − s). This force translates into mass points in the derivative of the Lagrangian or even into discontinuity points in the Lagrangian function.

On the other hand, the redistributive force, expressed in the term (7), is absolutely continuous (except at productivity levels where some workers spend a finite time): the redistributive effect of an increase in the after-tax income on an interval of productivities is the integral on the interval of the net social marginal utility of income  dΦ.

3.4 Finite number of types

The above analysis allows us to concentrate attention on a particular class of tax schedules when I is finite:

‘When the number of types is finite, the second-best optimum may be implemented with an incentive schedule that is piecewise either constant or coincident with an increasing trajectory.’

The proof is in Appendix B. When the tax schedule coincides with an increasing trajectory, the government faces a particularly strong efficiency force.2 Otherwise, the monotonicity constraint binds. Putting the signs of dΦ(w;ℓ) and dΨ(w;ℓ) on the diagram of trajectories makes it possible to separate qualitatively the intervals of productivities where the redistribution and efficiency forces tend to push R up from those where these forces push it down.

Since we expect bunching to be the norm, it is worthwhile to spell out the form of first-order conditions under bunching. Consider a bunching interval [w0,w1]. We can raise or lower R on the whole bunching interval, raise it on right subintervals [w,w1], and lower it on left subintervals [w0,w]. None of these variations should increase the Lagrangian, which yields the first-order conditions:
\[ \int_{[w,\,w_1]} d\mathcal{L} \le 0 \tag{14} \]
for all w in the interval, with equality for w = w0. This implies in particular that dℒ is nonnegative at w0 and non-positive at w1.

4. An Example: Two Types and Decreasing Trajectories

The results of the previous analysis greatly simplify the computation of the optimal tax schedule in economies covered by our assumptions. We can derive stronger analytical results when the environment is simpler. In practice, in this last section, we consider economies with two types of agents, a high type H and a low type L, endowed with the same utility function u. To adapt arguments based on incentives, we need the trajectories to be single crossing, which pushes us to focus on the second parts of lives, when productivity decreases with age while the opportunity cost of work increases with age. We also need the two types to be unambiguously ranked, the high type being more productive on the market and with a smaller opportunity cost of work than the low type, at all ages: wH(a)>wL(a) and δH(a)<δL(a) for all a in [0,1]. We also suppose that there is a natural retirement age: the trajectories intersect the 45 degree line.3
Figure 2. The three cases of Proposition 2.

This set of assumptions is consistent with many different patterns. If the agents’ productivities are very close while their opportunity costs of work are very different, agent L’s trajectory is above agent H’s in the (w,δ) plane; see Figure 3. In the opposite case, agent H’s trajectory lies to the right of that of agent L; see Figure 4. The trajectories may very well cross, possibly many times, meaning that the same characteristics (productivity, cost) are reached by the two agents at different ages. Formally, the following properties hold.

Assumption 4.1 (Decreasing trajectories). The two agents have the same utility functions u. Their productivities, wH(a)>wL(a), decrease with age and their pecuniary costs of work, δH(a)<δL(a), increase with age. There exist ages aL* and aH* in (0, 1) such that wL(aL*)=δL(aL*) and wH(aH*)=δH(aH*).

Figure 3. Two types, same productivities.

Figure 4. Two types, same opportunity costs of work.

The fact that type H dominates type L pointwise implies that her consumption and welfare are at least as large, whatever the tax schedule: cH ≥ cL. Any nondecreasing tax schedule crosses each trajectory only once, at ages aH and aL respectively, with aH ≥ aL, associated wages wH(aH) and wL(aL), and opportunity costs of work δH(aH) and δL(aL). The wages wH(aH) and wL(aL) represent the lowest productivities at which the agents work. The following proposition, whose proof is in Appendix C, provides the list of all possible configurations at the second-best optimum. Then we present two examples that illustrate how unobserved heterogeneity affects the labor supply distortions.
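
The ranking of retirement ages and consumption levels can be illustrated on a grid. In the sketch below, the two linear trajectories, the two-valued incentive schedule, and the non-worker income `s` are all hypothetical; they are chosen only so that the high type is more productive and has a lower cost of work at all ages.

```python
# Hypothetical two-type economy satisfying the ranking and monotonicity
# assumptions: w_H > w_L and delta_H < delta_L at all ages.
profiles = {
    "H": (lambda a: 3.0 - 2.0 * a, lambda a: 0.2 + 2.0 * a),
    "L": (lambda a: 2.0 - 1.5 * a, lambda a: 0.5 + 2.0 * a),
}
s = 0.3                                   # income of non-workers

def incentive(w):                         # R(w) - s, nondecreasing, two values
    return 1.4 if w >= 1.5 else 0.6

N = 3000
ages = [(k + 0.5) / N for k in range(N)]

retirement_age, consumption = {}, {}
for name, (w, delta) in profiles.items():
    # Net gain from working at each age: incentive minus cost of work.
    gains = [incentive(w(a)) - delta(a) for a in ages]
    working = [g > 0 for g in gains]
    # With decreasing trajectories the agent works up to a single age.
    retirement_age[name] = sum(working) / N
    consumption[name] = s + sum(g for g in gains if g > 0) / N

print(retirement_age, consumption)
```

In this example agent L stops working exactly where the schedule jumps, while agent H stops on the upper flat part of the schedule.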

Proposition 2. Under Assumption 4.1, the following properties hold:

  (i) There exists an optimal tax schedule with at most two values;

  (ii) Agent H has her labor supply distorted downward;

  (iii) Agent L’s labor supply can be distorted in any direction or undistorted;

  (iv) Agent H retires later and enjoys higher lifetime consumption than agent L:

(15)
We now illustrate the impact of the heterogeneity on labor supply distortions; see points (ii) and (iii) of Proposition 2. We use two examples where the agents differ only in one dimension, either productivity or opportunity cost of work. These examples, therefore, are at the limit of what is permitted by Assumption 4.1. Again, we focus on the second part of the agents’ lives where productivity decreases and opportunity cost of work increases, as in the solid lines of the left panels of Figures 3 and 4.

Example 1 (Same productivities, different opportunity costs of work). In addition to Assumption 4.1, suppose that the agents are equally productive, wH(a)=wL(a) for all a, while agent H has a lower pecuniary cost of work: δH(a)<δL(a) for all a. Then at the optimum, both agents have their labor supply distorted downward.

In Example 1, we must be in case (1) of the proof of Proposition 2. Moreover, the configuration with δL(aL)=δH(aL) is not possible here. Indeed, as the two agents would work exactly the same time at productivities above any threshold w, an increase of the tax schedule above w(aL) − ɛ for a small ɛ>0 would have no redistributive effect and a positive efficiency effect at w(aL), a contradiction. The optimal schedule is therefore discontinuous at w(aL) as shown on the left panel of Figure 5.

Example 2 (Different productivities, same opportunity costs of work). In addition to Assumption 4.1, suppose that the agents have the same pecuniary costs of work, δH(a)=δL(a) for all a, while agent H is more productive: wL(a)<wH(a). Then at the optimum, agent L has her labor supply distorted upward.

Figure 5. The optima with same productivities (left), same opportunity costs of work (right).

In Example 2, we must be in case (2) of the proof of Proposition 2. Moreover, the configuration where the tax schedule is flat is not possible here. Indeed, the equality δL(aL)=δH(aL) would imply aL = aH, meaning that the two agents would have the same total working time: a uniform increase of R − s would thus have no redistributive effect and a positive efficiency effect at wH(aH)—a contradiction. The optimal schedule is therefore discontinuous at wL(aL) as shown on the right panel of Figure 5.

In Example 1, the heterogeneity primarily comes from the opportunity cost of work, while in Example 2, it comes from the productivity. In both cases, the government cannot implement the first best in these two-type economies. The direction of the distortions, however, is sensitive to the source of the heterogeneity.

1 Suppose that δi(a) is a disutility cost instead of a pecuniary one, i.e. agent i, when working, produces wi(a) and has instantaneous utility u(ci(a)) − δi(a), while she has instantaneous utility u(ci(a)) when not working. Then agent i works at age a under laissez-faire if and only if u′i(ci)wi(a) > δi(a), where ci is her constant, instantaneous consumption level. Hence, this specification entails an ‘income effect’ in labor supply: participation decreases with ci. Using Pareto-optimality conditions, it can be checked that laissez-faire is efficient. The pecuniary model adopted in this article avoids these complications.

2 See the last paragraph of Appendix B.

3 A referee has asked us to what extent the results below can be generalized. We certainly can have more than two types. But the monotonicity and ranking of the dynasties are an essential element of the argument.

4 In other words, the Lagrangian is locally discontinuous in productivity regions where the tax schedule is locally tangent to a trajectory, see Appendix A.2.

5 The first-order conditions imply that the net social marginal utility of income, dΦ, is identically zero on [w̲,w̄] and that R(w) − s = w at any switch point in this region.

Appendix

A. Proof of Proposition 1
A.1 Derivative of lifetime consumption and redistribution force
We first compute the Fréchet derivative of lifetime consumption levels with respect to the tax schedule R. We consider a perturbation R+ɛh of the tax schedule, where h is a nonnegative test function with compact support. Using the expression of ci, equation (4), and the change of variables w=wi(a), we find that the ratio [ci(R+ɛh) − ci(R)]/ɛ tends to
\[ \int h(w)\,dT_i(w;\ell_i) \]
as ɛ goes to zero, meaning that the positive measure dTi(w;ℓi) is the Fréchet derivative of ci. This is the formal statement corresponding to equation (5). Ti(w)/Li is the cumulative distribution function of wi seen as a random variable. If we think of agent i’s productivity when she works as a random variable and denote that variable by Wi, the above integral can be seen as the expectation of h(Wi), multiplied by Li.
The chain rule then yields the redistribution force. Keeping labor supply constant, the ratio [ℒ(R+ɛh) − ℒ(R)]/ɛ tends to
\[ \sum_{i \in I} n_i\,[u_i'(c_i) - \lambda] \int h(w)\,dT_i(w;\ell_i) = \int h(w)\,d\Phi(w;\ell) \]
as ɛ goes to zero, which yields (7).
A.2 Labor supply elasticity and efficiency force

Labor supply is changed under the perturbed schedule R+ɛh only if the support of h contains switching points. For ease of exposition, we assume that the support contains only one switching point, which we denote by w̄. We denote by i the switching agent and by ai the age at which agent i switches at w̄. We have: wi(ai)=w̄ and R(w̄) − s = δi(ai). To fix ideas, we suppose that both δ′i(ai) and w′i(ai) are positive and that the slope of the indifferent agent’s trajectory is larger than the slope of the schedule: δ′i(ai)/w′i(ai) > R′(w̄).

The perturbed schedule R+ɛh crosses agent i’s trajectory at points w such that there exists a with w=wi(a) and I(a,ɛ)=0, where
\[ I(a,\varepsilon) = R(w_i(a)) + \varepsilon\,h(w_i(a)) - s - \delta_i(a). \]
As ∂I/∂ɛ(ai,0) = h(w̄) and ∂I/∂a(ai,0) = −δ′i(ai) + R′(w̄)w′i(ai), the ratio [Ti(w;R+ɛh) − Ti(w;R)]/ɛ tends to
as ɛ goes to zero. If the slope of the tax schedule is larger than that of the trajectory, δ′i(ai)/w′i(ai) < R′(w̄), replacing R with R+ɛh changes labor supply on the left of w̄ and the ratio [Ti(w;R+ɛh) − Ti(w;R)]/ɛ tends to
as ɛ goes to zero. This yields expression (9) for the elasticity of agent i’s labor supply. The Fréchet derivative of Ti(w;R) is thus given by
(16)
where Si(w) is the set of agent i’s switch points σ located below w, wσ ≤ w is the agent’s productivity at σ, and ζ(wσ) denotes the mass point at wσ. The Fréchet derivative of the total labor supply T has the same expression as above, replacing Si(w) with S(w), the set of ‘all’ agents’ switch points located below w.

We use the same method to compute the Fréchet derivative of the term $\int_0^1 [w_i(a) - \delta_i(a)]\,\ell_i(a)\,da$. The only difference with the above analysis is the presence of the multiplicative term wi(a) − δi(a), which, at a=ai, is equal to w̄ − R(w̄) + s, given that w̄ is a switch point. This yields (12) and (13).

Discontinuous Lagrangian

Consider an indifference point w such that the incentive schedule is locally tangent to the indifferent agent’s trajectory. (In other words, we have: δ′i/w′i = R′.) Then the Lagrangian is discontinuous at w, as an infinitesimally small increase in R implies a non-infinitesimal change in the Lagrangian. In other words, the efficiency force is particularly strong, creating a discontinuity in the Lagrangian, whose sign is the same as that of w − R + s. This is in particular the case where the tax schedule locally coincides with an agent’s trajectory.

B. From Increasing to Piecewise Constant Schedules

 

Lemma B.1 Let R be any nondecreasing tax schedule. Let w̲ < w̄ be such that none of the agents’ trajectories (wi(a),δi(a)), a ∈ [0,1], i ∈ I, intersects the rectangle [w̲,w̄]×[R(w̲),R(w̄)]. Assume that the functions Ti have at most finitely many discontinuity points.

Then there exists a nondecreasing tax schedule R̄, such that R̄ is piecewise constant, with finitely many pieces, on [w̲,w̄], R̄ takes its values in [R(w̲),R(w̄)] on this interval, and
\[ \int_{\underline{w}}^{\bar{w}} \bar{R}(w)\,dT_i(w,\bar{R}) = \int_{\underline{w}}^{\bar{w}} R(w)\,dT_i(w,R) \]
for i ∈ I. If all the functions Ti are continuous on [w̲,w̄], then the schedule R̄ has at most #I+1 pieces on [w̲,w̄].
Proof. By assumption, labor supply is not affected as long as the schedule remains between R(w̲) and R(w¯). We can therefore drop the second argument in the functions Ti, writing Ti(w) rather than Ti(w,R). Let w1,,wN be the discontinuity points of the functions Ti. Let w0=w̲ and wN+1=w¯. We have:
It is sufficient to prove the result on each interval [wj,wj+1]. Integrating by parts yields:
We now apply Lemma A.1 (p. 1260) of Ghosal and Van der Vaart (2001) with the compact set K=[wj,wj+1], the probability measure F0= dR(w)/[R(wj+1)R(wj)] and the functions Ψi(w)=Ti(w)Ti(wj+),i=1,,I, which are continuous on K. The Lemma yields a discrete probability measure ν on K with at most I + 1 support points such that
for all i=1,,I. Integrating again by parts yields
with R¯(w)=R(wj)+[R(wj+1)R(wj)]ν(w). The schedule R¯ is nondecreasing and piecewise constant, with at most I + 1 pieces. It takes its values in [R(wj),R(wj+1)]. □ 
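The support-reduction step borrowed from Ghosal and Van der Vaart (2001) can be illustrated numerically. The sketch below is not the authors' construction: it assumes that the measure $F_0$ has already been discretized on a grid, and it uses a standard Carathéodory-type elimination to reach at most $I + 1$ atoms while preserving the integrals of the test functions. The names `reduce_support`, `step_schedule` and `psis`, and the two test functions themselves, are hypothetical stand-ins for the functions $\Psi_i$.

```python
import numpy as np

def reduce_support(points, weights, funcs, tol=1e-12):
    """Shrink a discrete probability measure to at most len(funcs) + 1 atoms
    while preserving total mass and the integral of every function in funcs."""
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    m = len(funcs) + 1                          # number of linear constraints
    active = np.flatnonzero(w > tol)
    while active.size > m:
        # Constraint matrix: one row for total mass, one row per test function.
        A = np.vstack([np.ones(active.size)] + [f(pts[active]) for f in funcs])
        # More columns than rows, so the last right-singular vector d solves A @ d = 0.
        d = np.linalg.svd(A)[2][-1]
        # The mass row forces sum(d) = 0, so d has strictly positive entries.
        pos = d > 0
        ratios = w[active][pos] / d[pos]
        k = np.argmin(ratios)
        new_w = w[active] - ratios[k] * d       # move along d until one weight hits zero
        new_w[np.flatnonzero(pos)[k]] = 0.0     # drop that atom exactly
        w[active] = np.clip(new_w, 0.0, None)
        active = np.flatnonzero(w > tol)
    return pts[active], w[active]

def step_schedule(atoms, masses, r_lo, r_hi):
    """Step function r_lo + (r_hi - r_lo) * nu([w_lo, w]), right-continuous in w."""
    order = np.argsort(atoms)
    atoms, cum = atoms[order], np.cumsum(masses[order])
    def r_bar(w):
        idx = np.searchsorted(atoms, w, side="right")   # number of atoms <= w
        return r_lo + (r_hi - r_lo) * (cum[idx - 1] if idx > 0 else 0.0)
    return r_bar

# Hypothetical example: a linear schedule on [0, 1], so F_0 is uniform on the grid.
grid = np.linspace(0.0, 1.0, 200)
f0 = np.full(grid.size, 1.0 / grid.size)
psis = [lambda x: x, lambda x: np.sqrt(x)]              # stand-ins for Psi_1, Psi_2
atoms, masses = reduce_support(grid, f0, psis)
r_bar = step_schedule(atoms, masses, r_lo=0.0, r_hi=1.0)
```

With two test functions the reduced measure keeps at most three atoms, so `r_bar` jumps at no more than three productivity levels while reproducing the two integrals of the original measure up to rounding error.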

Properties of the optimum when there is a finite number of types

Consider an interval where the schedule is increasing. The schedule can locally coincide with an increasing trajectory, in which case efficiency and redistribution play in opposite directions: the schedule is slightly below the trajectory if $\mathrm{d}\Phi > 0$ and $w < R - s$, slightly above if $\mathrm{d}\Phi < 0$ and $w > R - s$. For instance, in the former case, lowering (raising) $R$ entails an infinitesimal (a non-infinitesimal) fall in the Lagrangian through the redistribution (efficiency) effect.4

Now consider an interval $[w', w'']$ where the schedule is increasing and does not coincide with an increasing trajectory.5 By compactness of $[w', w'']$, there exists a finite sequence $w' = w_1 < \dots < w_h = w''$ such that no trajectory crosses the rectangles $[w_j, w_{j+1}] \times [R(w_j), R(w_{j+1})]$. On each interval $[w_j, w_{j+1}]$, we apply Lemma B.1 and replace $R$ with a piecewise constant schedule that takes its values in $[R(w_j), R(w_{j+1})]$ and leaves the government revenue and the agents’ lifetime consumption and labor supply unchanged. □

C. Proof of Proposition 2

 

Proof. The inequality $a_H > a_L$ follows from the ordering and the monotonicity of the functions $R(w_i(a)) - s - \delta_i(a)$, $i = H, L$. This inequality, in turn, implies $c_H > c_L$ and $u'(c_H) < u'(c_L)$.

Consider any productivity threshold $w$ greater than $\bar w = \max(w_H(a_H), w_L(a_L))$. Only the redistribution force is present above $w$ and its integral over the set $[w, w_H(0)]$,
is strictly negative because $u'(c_H) < u'(c_L)$ and agent $H$ spends strictly more time working in that region than agent $L$, $w_H^{-1}(w) > w_L^{-1}(w)$. The function $r(w) = \min(0, R(\bar w) - R(w))$ is non-increasing, and is an admissible variation of the tax schedule, as $R + \varepsilon r$ is nondecreasing for small values of $\varepsilon$. We must therefore have:
Since the bracketed term in the above inequality is negative and $r \le 0$, we get $r = 0$ and thus $R(\bar w) = R(w)$. The after-tax schedule, therefore, is flat above $\max(w_H(a_H), w_L(a_L))$.
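The admissibility of the variation $r$ can be checked directly. The short computation below is a sketch that uses only the definition of $r$ and the monotonicity of $R$, and assumes $0 \le \varepsilon \le 1$:

```latex
\begin{aligned}
R(w) + \varepsilon\, r(w)
  &= R(w) + \varepsilon \bigl[\min\bigl(R(w), R(\bar w)\bigr) - R(w)\bigr] \\
  &= (1 - \varepsilon)\, R(w) + \varepsilon\, \min\bigl(R(w), R(\bar w)\bigr).
\end{aligned}
```

Both $R$ and $\min(R, R(\bar w))$ are nondecreasing in $w$, so their convex combination is nondecreasing as well. The perturbed schedule coincides with $R$ wherever $R(w) \le R(\bar w)$ and is pulled toward the constant $R(\bar w)$ elsewhere.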

We now consider three cases in turn:

  1. Agent $H$ is the only one working at low productivities, $w_H(a_H) < w_L(a_L)$;

  2. Agent $L$ is the only one working at low productivities, $w_L(a_L) < w_H(a_H)$;

  3. Both agents work at low productivities, $w_H(a_H) = w_L(a_L)$.

In case (1), only agent $H$ works at productivities lying between $w_H(a_H)$ and $w_L(a_L)$. Since $u'(c_L) > u'(c_H)$, the redistribution force in that interval pushes downward, so the financial incentive to work $R - s$ equals $\delta_H(a_H)$ in that interval. From the bunching conditions, it must be the case that the efficiency force is active and upward at $w_H(a_H)$, hence $w_H(a_H) > \delta_H(a_H)$: agent $H$’s labor supply is distorted downward.

To examine agent $L$’s labor supply, we suppose first the tax schedule is continuous at $w_L(a_L)$, that is, $\delta_L(a_L) = \delta_H(a_H)$. In this case, the after-tax schedule takes only one value for $w \ge w_H(a_H)$, namely the common value of $\delta_L(a_L) = \delta_H(a_H)$, and agent $L$’s labor supply is distorted downward because $w_L(a_L) - \delta_L(a_L)$ is larger than $w_H(a_H) - \delta_H(a_H) > 0$. We consider now the case where the schedule is discontinuous at $w_L(a_L)$. We know that $R - s$ is flat and equal to $\delta_L(a_L)$ above that point. We can therefore consider a transformation that pushes $R - s$ down from $\delta_L(a_L)$ to $\delta_H(a_H)$ just above $w_L(a_L)$. The redistribution and efficiency effects of the transformation respectively bear on type $H$ and type $L$. The former is positive and of the sign of $-(u'_H - \lambda)(\delta_L - \delta_H)$ since it takes $\delta_L - \delta_H$ from type $H$ while leaving $L$’s lifetime consumption level unaffected. The latter is of the sign of $-(w_L(a_L) - \delta_L)$. For them to sum up to zero, we must have $w_L(a_L) > \delta_L(a_L)$: agent $L$’s labor supply, again, is distorted downward.

In case (2), we have $\delta_H(a_H) = R(w_H(a_H)) - s \ge R(w_L(a_L)) - s = \delta_L(a_L)$. We deal separately with the situation where the two agents have the same opportunity costs when they stop working, and when that of $H$ is larger than that of $L$. Suppose first that $\delta_L(a_L) = \delta_H(a_H)$. Then the financial incentive to work $R(w) - s$ is equal to that common opportunity cost for all $w \ge w_L(a_L)$. The efficiency force $w_L(a_L) - \delta_L(a_L)$ cannot be downward at $w_L(a_L)$ as this would violate the first-order condition on the bunching interval starting at $w_L(a_L)$ (in practice, the government would slightly decrease the after-tax income at $w_L(a_L)$), hence $w_H(a_H) > w_L(a_L) \ge \delta_L(a_L) = \delta_H(a_H)$: agent $H$’s labor supply is distorted downward.

Suppose now that $\delta_L(a_L) < \delta_H(a_H)$. Since $u'(c_L) > u'(c_H)$ and only agent $L$ works at productivities lying between $w_L(a_L)$ and $w_H(a_H)$, the redistribution force pushes upward and the financial incentive to work $R - s$ equals $\delta_H(a_H)$ in that interval. The tax schedule, therefore, is discontinuous at $w_L(a_L)$ and equal to $\delta_H(a_H)$ above that point. Consider the perturbation that moves the discontinuity point $w_L(a_L)$ in the tax schedule slightly to the left while maintaining $R - s = \delta_H(a_H)$. This perturbation, which does not affect agent $H$, increases agent $L$’s labor supply and consumption. Consumption is increased by a first-order quantity because the agent receives positive extra income $\delta_H(a_H) - \delta_L(a_L) > 0$ during a small time interval, hence a positive redistributive effect. The efficiency part of the perturbation is a change in the Lagrangian of the sign of $w_L(a_L) - \delta_L(a_L)$. Expressing that the latter must outweigh the former, the first-order condition on $R$ in the bunching interval yields $w_L(a_L) < \delta_L(a_L)$, an upward distortion in $L$’s labor supply.

Finally, we consider case (3), denoting by $\underline w$ the common value of $w_H(a_H)$ and $w_L(a_L)$. We first show that the tax schedule necessarily intersects the two trajectories at the same point: $\delta_H(a_H)$ and $\delta_L(a_L)$ must be equal. Suppose for instance that $\delta_H(a_H) < \delta_L(a_L)$. A small increase $\mathrm{d}R_H$ in after-tax income below $\underline w$ would put agent $H$ to work on a small time interval of length $\mathrm{d}T_H = \eta_H\,\mathrm{d}R_H$. Similarly, a small decrease $\mathrm{d}R_L$ in after-tax income above $\underline w$ would put agent $L$ out of work on a small time interval of length $\mathrm{d}T_L = \eta_L\,\mathrm{d}R_L$. These transformations have redistribution effects that are of the second order. Choosing $\mathrm{d}R_H$ and $\mathrm{d}R_L$ such that $\mathrm{d}T_H = \mathrm{d}T_L = \mathrm{d}T$, we find by (12) that the associated changes in the Lagrangian would be respectively $\lambda(\underline w - \delta_H(a_H))\,\mathrm{d}T$ and $-\lambda(\underline w - \delta_L(a_L))\,\mathrm{d}T$. The sum of these two quantities would be of the sign of $\delta_L(a_L) - \delta_H(a_H)$, therefore positive, implying that one of the above changes would increase the Lagrangian through the efficiency force, a contradiction. A similar contradiction is found if $\delta_H(a_H) > \delta_L(a_L)$, hence the announced equality.

The tax schedule is flat above w̲. A slight decrease of its constant level has a positive redistribution effect, and must therefore have a negative efficiency effect, implying that both agents have their labor supply distorted downward.

Collecting the results obtained in the three cases, we directly get parts (1), (2), and (3) of the proposition. We have also seen that $a_H > a_L$ and can therefore compute
In each of the three cases studied above, the tax schedule is flat over $[w_L(a_L), w_H(0)]$, and hence $R(w_L(a)) = R(w_H(a))$ for all $a \le a_L$, which yields (15).

Acknowledgements

This work has benefited from the financial support of the European Research Council under grant WSCWTBDS. It has been presented at numerous conferences, in Amsterdam, Koln, London, Lugano, Marseille, Munich, Nashville, and Uppsala. We are most grateful for the comments of Felix Bierbrauer, Sören Blomquist, Monika Buetler, Mikhail Golosov, Jean-Baptiste Michau, Panu Poutvaara, Nicola Pavoni, Richard Rogerson, Tom Sargent, Laurent Simula, and Eytan Sheshinski.

References

Brito D., Hamilton J., Slutsky S., Stiglitz J. (1991), “Dynamic Optimal Income Taxation with Government Commitment”, Journal of Public Economics 44, 15–35.

Choné P., Laroque G. (2005), “Optimal Incentives for Labor Force Participation”, Journal of Public Economics 89, 395–425.

Choné P., Laroque G. (2011), “Optimal Taxation in the Extensive Model”, Journal of Economic Theory 146, 425–53.

Ghosal S., Van der Vaart A. (2001), “Entropies and Rates of Convergence for Maximum Likelihood and Bayes Estimation for Mixtures of Normal Densities”, The Annals of Statistics 29, 1233–63.

Kocherlakota N. (2005), “Zero Expected Wealth Taxes: A Mirrlees Approach to Dynamic Optimal Taxation”, Econometrica 73, 1587–621.

Kocherlakota N. (2010), The New Dynamic Public Finance, Princeton University Press, Princeton.

Laroque G. (2011), “On Income and Wealth Taxation in a Life-Cycle Model with Extensive Labour Supply”, Economic Journal 121, F144–61.

Piketty T., Saez E. (2013), “Optimal Labor Income Taxation”, in Handbook of Public Economics, vol. 5, Elsevier, pp. 391–474.

Rogerson R. (2011), “Individual and Aggregate Labor Supply with Coordinated Working Times”, Journal of Money, Credit and Banking 43, 7–37.

Shourideh A., Troshkin M. (2012), “Providing Efficient Incentives to Work: Retirement Ages and the Pension System”, Discussion paper, Yale University, New Haven.

Weinzierl M. (2011), “The Surprising Power of Age-Dependent Taxes”, Review of Economic Studies 78, 1490–518.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.