Fabian Dunker, Stefan Hoderlein, Hiroaki Kaido, Nonparametric identification of random coefficients in aggregate demand models for differentiated products, The Econometrics Journal, Volume 26, Issue 2, May 2023, Pages 279–306, https://doi.org/10.1093/ectj/utad002
Summary
This paper studies nonparametric identification in market-level demand models for differentiated products with heterogeneous consumers. We consider a general class of models that allows for the individual-specific coefficients to vary continuously across the population and give conditions under which the density of these coefficients, and hence also functionals such as the fractions of individuals who benefit from a counterfactual intervention, is identified.
1. INTRODUCTION
Modelling consumer demand for products that are bought in single or discrete units has a long and colourful history in applied economics, dating back to at least the foundational work of McFadden (1974; 1981). While allowing for heterogeneity, much of the earlier work on this topic, however, was not able to deal with the fact that in particular the own price is endogenous. In a seminal paper that provides the foundation for much of contemporaneous work on discrete choice consumer demand, Berry et al. (1995; BLP) have proposed a solution to the endogeneity problem. Indeed, this work is so appealing that it is not just applied in discrete choice demand and empirical industrial organization (IO), but also increasingly in many adjacent fields, such as health, urban or education economics, and many others. From a methodological perspective, this line of work is quite different from traditional multivariate choice, as it uses data on the aggregate level and integrates out individual characteristics1 to obtain a system of nonseparable equations. This system is then inverted for unobservables for which in turn a moment condition is then supposed to hold.
Descending in parts from the parametric work of McFadden (1974; 1981), market-level demand models share many of its features, in particular (parametric) distributional assumptions, but also a linear random coefficients structure for the latent utility. Not surprisingly, there is increasing interest in the properties of the model, in particular which features of the model are nonparametrically point identified, and how the structural assumptions affect identification of the parameters of interest. Why is the answer to these questions important? Because an empiricist working with this model wants to understand whether the results they obtained are a consequence of the specific parametric assumptions they invoked, or whether they are at least qualitatively robust. In addition, nonparametric identification provides some guidance on essential model structure and on data requirements, in particular about instruments. Finally, understanding the basic structure of the model makes it easier to understand how the model can be extended. Extensions of the BLP framework that are desirable are, in particular, to allow for consumption of bundles and multiple units of a product without modelling every choice as a new, separate alternative.
We are not the first to ask the nonparametric identification question for market demand models. In a series of elegant papers, Berry and Haile (2014; 2020; BH henceforth) provide important answers to many of the identification questions. In particular, they establish conditions under which the demand function, structural errors (or unobserved product characteristics), and the distribution of a heterogeneous utility index are nonparametrically identified.
Our work complements this line of work in that we follow more closely the original BLP specification, and assume in addition that the utility index has a random coefficients structure. More specifically, we show how to nonparametrically identify the distribution of random coefficients in this framework. This result addresses an open problem of identification of preference distributions in the nonparametric version of the aggregate BLP model. Furthermore, it is also important for applications because the distribution of random coefficients allows us to characterize the distribution of the changes in welfare due to a change in observable characteristics, in particular the own price (to borrow an analogy from the treatment effect literature, if we think of a price as a treatment, BH recover the treatment effect on the distribution, while we recover the distribution of treatment effects). For example, consider a change in the characteristics of a good. The change may be due to a new regulation, an improvement of the quality of a product, or an introduction of a new product. Knowledge of the random coefficient density allows the researcher to calculate the distribution of the welfare effects. This allows one to answer various questions. For example, one may investigate whether the change gives rise to a Pareto improvement. Another example is to compute the fraction of individuals who would benefit from a change in a product characteristic.2 These analyses are possible because, with the distribution of the random coefficients being identified, one can track each individual’s welfare before and after the change. Identification of the random coefficient distribution allows one to conduct various types of welfare analysis that are not possible by only identifying the demand function. Our focus therefore will be on the set of conditions under which one can uniquely identify the random coefficient distribution from the observed demand.
Naturally, identification will depend crucially on the specific model at hand. As it turns out, there are important differences between the classical BLP and the pure characteristics model (see Berry and Pakes, 2007; PCM henceforth) that stem from the presence of an alternative, individual, and market specific error, typically assumed to be logistically distributed, and hence called ‘logit error’ in the following. A lucid discussion about the pros and cons of both approaches can be found in Berry and Pakes (2007). One advantage of the PCM we would like to emphasize at this point is that it is well-suited for the analysis of welfare changes when a new product with a particular characteristic is introduced to the market. Moreover, the pure characteristics model also predicts a reasonable substitution pattern when the number of products is large, while the BLP-type model may give counter-intuitive predictions. In addition to these important economic differences, the identification strategies, including the sufficient conditions, also differ significantly across the two models. In particular, in the BLP model we rely on an identification at infinity argument to isolate the unobservable for each product. In contrast, in the PCM one can achieve identification without relying on such an argument (and therefore does not have to employ some restrictive assumptions). Instead, in the PCM, we demonstrate that one may combine demand on products across different markets to construct a function that depends on the random coefficients through a single index, so that we can recover the distribution of unobserved heterogeneity without relying on identification at infinity. We call this construction marginalization (or aggregation) of demand. This is possible due to the unique structure of the PCM in which only the product characteristics (but not the tastes for products) determine the demand. To our knowledge, this identification strategy is novel.
The arguments in establishing nonparametric identification of these changes are constructive and permit the construction of sample counterparts estimators, using the theory in Hoderlein et al. (2010) or Dunker et al. (2021). This theory reveals that the random coefficients density is only weakly identified.
Another contribution in this paper is that we use the insights obtained from the identification results to extend the market demand framework to cover bundle choice (i.e., consume complementary goods together). Note that bundles can in principle be accommodated within the BLP framework by treating them as separate alternatives. However, this is not parsimonious, as the number of alternatives increases rapidly, and with it the number of unobserved product characteristics, making the system quickly intractable. To fix ideas, suppose there were two goods, say goods A and B. First, we allow for the joint consumption of goods A and B, and, second, we allow for the consumption of several units of either A and/or B, without labelling it a separate alternative. We model the utility of each bundle as a combination of the utilities for each good and an extra utility from consuming the bundle. This structure in turn implies that the dimension of the unobservable product characteristic equals the number of goods |$J$| instead of the number of bundles. There are three conclusions we draw from this contribution: first, depending on the type of model, the data requirements vary. In particular, to identify all structural parts of the model, in, say, the model on bundle choice, market shares are not the correct dependent variable any more. Second, depending on the object of interest, the data requirements and assumptions may vary depending on whether we want to just recover demand elasticities, or the entire distribution of random coefficients. Third, the parsimonious features of the structural model result in significant overidentification of the model, which opens up the way for specification testing, and efficient estimation. As in the classical BLP setup, in all setups we may use the identification argument to propose a nonparametric sample counterpart estimator.3
As discussed above, this paper is closely related to both the original BLP line of work (Berry et al., 1995; 2004), as well as to the recent identification analysis of Berry and Haile (2014; 2020). Because of its generality, our approach also provides identification analysis for the ‘pure characteristics’ model of Berry and Pakes (2007), see also Ackerberg et al. (2007) for an overview. Other important work in this literature that is completely or partially covered by the identification results in this paper includes Petrin (2002) and Nevo (2001). Moreover, from a methodological perspective, we note that BLP continues a line of work that emanates from a broader literature, which in turn was pioneered by McFadden (1974; 1981); some of our identification results therefore extend beyond the specific market demand model at hand. Other important recent contributions in discrete choice demand include Gowrisankaran and Rysman (2012), Armstrong (2016), and Moon et al. (2018). Less closely related is the literature on hedonic models, see Heckman et al. (2010) and references therein.
In addition to this line of work, we also share some commonalities with the work on bundle choice in IO, most notably Gentzkow (2007) and Fox and Lazzati (2017). For some of the examples discussed in this paper, we use Gale and Nikaido (1965) inversion results, which are related to arguments in Berry et al. (2013). Because of the endogeneity, our approach also relates to nonparametric instrumental variables (IV), in particular to Newey and Powell (2003), Dunker et al. (2014a), and Andrews (2017). Our arguments are also related to the literature on random coefficients in discrete choice models, see Ichimura and Thompson (1998), Matzkin (2012), Gautier and Kitamura (2013), Fox and Gandhi (2016), and Dunker et al. (2018). Since we use the Radon transform introduced by Hoderlein et al. (2010; HKM) into econometrics, this work is particularly close to the literature that uses the Radon transform, in particular HKM and Gautier and Hoderlein (2015). Moreover, the class of models we consider is related to, but differs from, the mixed logit model (without endogeneity) analysed by Fox et al. (2012), who established the identification of the distribution of the random coefficients from micro-level data while maintaining the logit assumption on the tastes for products. Our focus here is on market-level models with endogeneity, with the main goal being the identification of the distribution of all random coefficients without any parametric assumption. As such, our identification strategy differs significantly from theirs. Finally, after the original version of this paper, there have been recent developments on the nonparametric identification of aggregate demand models. Allen and Rehbeck (2022) study partial identification of latent complementarity in an aggregate demand model of bundles. Their focus is on what can be learned about latent complementarity when the variation of demand shifters is limited. Lu et al. (2019) study identification and semiparametric estimation of random coefficient logit demand models in a related but different environment, in which consumers face a growing number of products.
Structure of the paper: the second section lays out preliminaries we require for our main result: we first introduce the class of models and detail the structure of our two main setups. In the same section, for completeness, we quickly recapitulate the results of Berry and Haile (2014) concerning the identification of structural demands, adapted to our setup. The third section contains the key novel result in this paper, the nonparametric (point-)identification of the distribution of random coefficients in the class of discrete choice demand model with endogeneity, which includes the BLP and PCM models. In the fourth section we discuss the identification in the bundles case, including how the structural demand identification results of Berry and Haile (2014) have to be adapted, but again focusing on the random coefficients density. We then end with an outlook.
2. PRELIMINARIES
2.1. Model
We begin with a setting where a consumer faces |$J\in \mathbb {N}$| products and an outside good which is labelled good 0. Throughout, we index individuals by |$i$|, products by |$j$|, and markets by |$t$|. We use upper-case letters, e.g., |$X_{\textit {jt}}$|, for random variables (or vectors) that vary across markets and lower-case letters, e.g., |$x_{j}$|, for particular values the random variables (vectors) can take. In addition, we use letters without a subscript for products, e.g., |$X_{t}$| to represent vectors, e.g., |$(X_{1t},\cdots ,X_{\textit {Jt}})$|. For individual |$i$| in market |$t$|, the (indirect) utility from consuming good |$j$| depends on its (log) price |$P_{\textit {jt}}$|, a vector of observable characteristics |$X_{\textit {jt}}\in \mathbb {R}^{d_X}$|, and an unobservable scalar characteristic |$\Xi _{\textit {jt}}\in \mathbb {R}$|. We model the utility from consuming good |$j$| using the linear random coefficient specification:
$$\begin{eqnarray} U^{*}_{\textit {ijt}}\equiv X_{\textit {jt}}^{\prime }\beta _{\textit {it}}+\alpha _{\textit {it}}P_{\textit {jt}}+\Xi _{\textit {jt}}+\epsilon _{\textit {ijt}},\quad j=1,\cdots ,J, \end{eqnarray}$$ (2.1)
where |$(\alpha _{\textit {it}},\beta _{\textit {it}})^{\prime }\in \mathbb {R}^{d_X+1}$| is a vector of random coefficients representing the tastes for the product characteristics. For each |$j$|, |$\epsilon _{\textit {ijt}}$| represents the ‘taste for the product’ itself. The models with tastes for products include the random coefficient logit model used in BLP, in which case |$\epsilon _{\textit {ijt}},j=1,\cdots ,J$| are i.i.d. Type-I extreme value random variables.
Following Berry and Pakes (2007), we also consider a class of market-level demand models without tastes for products.
In this class, the utility from consuming good |$j$| is
$$\begin{eqnarray} U^{*}_{\textit {ijt}}\equiv X_{\textit {jt}}^{\prime }\beta _{\textit {it}}+\alpha _{\textit {it}}P_{\textit {jt}}+\Xi _{\textit {jt}},\quad j=1,\cdots ,J. \end{eqnarray}$$
The model is called the pure characteristics model (PCM). The two models are known to have different theoretical properties. For example, the BLP model predicts that even with a large number of products, the mark-up remains positive, implying there is always an incentive to develop a new product. As the number of new products grows, each individual’s utility tends to infinity. On the other hand, in the PCM, the model approaches competitive equilibrium, and the incentive to develop a new product diminishes as the number of products increases.4 As we will show below, the two models also differ in terms of empirical content.
Throughout, we assume that |$X_{\textit {jt}}$| is exogenous, while |$P_{\textit {jt}}$| can be correlated with the unobserved product characteristic |$\Xi _{\textit {jt}}$| in an arbitrary way. Without loss of generality, we normalize the utility from the outside good to 0. This mirrors the setup considered in Berry and Haile (2014).
We think of a large sample of individuals as |$iid$| copies of this population model. The random coefficients |$\theta _{\textit {it}}\equiv (\alpha _{\textit {it}} ,\beta _{\textit {it}},\epsilon _{i1t},\cdots ,\epsilon _{\textit {iJt}})^{\prime }$| vary across individuals in any given market (or, alternatively, have a distribution in any given market in the population), while the product characteristics vary solely across markets. These coefficients are assumed to follow a distribution with a density function |$f_{\theta }$| with respect to Lebesgue measure, i.e., be continuously distributed.5 This density is assumed to be common across markets, and is therefore not indexed by |$t.$| As we will show, an important aspect of our identification argument is that, once the demand function is identified, one may recover |$\Xi _{t}$| from the market shares and other product characteristics |$(X_{t},P_{t})$|. Then, by creating exogenous variations in the product characteristics and exploiting the linear random coefficients structure, one may trace out the distribution |$f_\theta$| of the preference that is common across markets. We note that we can allow for the coefficients |$(\alpha _{\textit {it}} ,\beta _{\textit {it}})$| to be alternative |$j$| specific, and will do so in the Online Appendix. However, parts of the analysis will subsequently change.
Having specified the model on the individual level, the outcomes of individual decisions are then aggregated in every market. The econometrician observes exactly these market-level outcomes |$S_{l,t}$|, where |$l$| belongs to some index set denoted by |$\mathbb {L}$|. Below, we give two examples. The first example is the setting of the BLP and pure characteristics models, where individuals choose a single good out of multiple products, while the second is about the demand for bundles.
For the PCM, the random coefficients vector is |$\theta _{\textit {it}}=(\beta _{\textit {it}},\alpha _{\textit {it}})$|, and the market-level demand |$\varphi _j,j=0,\dots , J$| is given as above, but without any |$e_j$|s.
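The aggregation step in the single-choice setting can be illustrated numerically. The following is a minimal sketch, not the paper's estimator: it approximates market shares in one market by simulating consumers, with and without tastes for products. The function name `simulate_shares`, the distributions of the random coefficients, and all numerical values are assumptions chosen purely for illustration.

```python
import numpy as np

def simulate_shares(x, p, xi, beta_draws, alpha_draws, eps_draws=None):
    """Monte Carlo market shares for a single market.

    x: (J, d) observed characteristics, p: (J,) log prices,
    xi: (J,) unobserved product characteristics.
    beta_draws: (n, d) and alpha_draws: (n,) are simulated consumers.
    eps_draws: (n, J) tastes for products (BLP-type) or None (PCM).
    Returns a vector of length J + 1; entry 0 is the outside good.
    """
    u = beta_draws @ x.T + np.outer(alpha_draws, p) + xi   # (n, J) utilities
    if eps_draws is not None:
        u = u + eps_draws
    # The utility of the outside good is normalized to 0.
    u = np.column_stack([np.zeros(len(u)), u])
    choice = u.argmax(axis=1)
    return np.bincount(choice, minlength=u.shape[1]) / len(u)

rng = np.random.default_rng(0)
J, d, n = 3, 2, 200_000
x = rng.normal(size=(J, d))
p = rng.normal(size=J)
xi = rng.normal(scale=0.5, size=J)
beta = rng.normal(loc=1.0, size=(n, d))      # illustrative taste distribution
alpha = -np.exp(rng.normal(size=n))          # illustrative negative price coefficients
eps = rng.gumbel(size=(n, J))                # Type-I extreme value tastes for products

shares_blp = simulate_shares(x, p, xi, beta, alpha, eps)
shares_pcm = simulate_shares(x, p, xi, beta, alpha)
print(shares_blp, shares_pcm)
```

The two share vectors are computed from the same simulated consumers, so any difference between them is due solely to the tastes for products.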
The second example considers discrete choice, but allows for the choice of bundles.
Bundles can be modelled in the PCM setup in the same way. The formulas are the same just without any |$\epsilon _{\textit {ijt}}$| or |$e_j$|.
In Example 2, we assume that the econometrician observes the aggregate demand for all the respective bundles. We emphasize this point as it changes the data requirement, and an interesting open question arises about what happens if these requirements are not met. Examples of data sets that would satisfy these requirements are: 1. individual observations collected through direct survey or scanner data on individual consumption (in every market); 2. aggregate variables (market shares), augmented with a survey that asks individuals whether they consume each good separately or as a bundle; 3. producers’ direct records of sales of bundles, provided bundles are recorded separately (e.g., when they are sold through promotional activities). When discussing Example 2, we henceforth tacitly assume that we have access to such data in principle.
2.2. Structural demand
The first step toward identification of |$f_\theta$| is to use a set of moment conditions generated by instrumental variables to identify the aggregate demand function |$\varphi$|. This section summarizes the identification result obtained by Berry and Haile (2014). Following Berry and Haile (2014), we partition the covariates as |$X_{\textit {jt}}=(X_{\textit {jt}}^{(1)},X_{\textit {jt}}^{(2)})\in \mathbb {R}\times \mathbb {R}^{d_X-1},$| and make the following assumption.
Assumption 2.1. The coefficient |$\beta ^{(1)}_{\textit {it}}$| is nonrandom and is normalized to 1.
Assumption 2.1 requires that at least one coefficient on the covariates is nonrandom. Since we may freely choose the scale of utility, we normalize the utility by setting |$\beta ^{(1)}_{\textit {it}}=1$| for all |$i$| and |$t$|. Under Assumption 2.1, we may write
$$\begin{eqnarray} U^{*}_{\textit {ijt}}=X_{\textit {jt}}^{(2)}{}^{\prime }\beta ^{(2)}_{\textit {it}}+\alpha _{\textit {it}}P_{\textit {jt}}+D_{\textit {jt}}+\epsilon _{\textit {ijt}}, \end{eqnarray}$$
where |$D_{\textit {jt}}\equiv X_{\textit {jt}}^{(1)}+\Xi _{\textit {jt}}$| is the part of the utility that is common across individuals. Assumption 2.1 is arguably strong, but will provide a way to obtain valid instruments required to identify the structural demand (see Section 7 in Berry and Haile (2014) for details). Under this assumption, |$U^{*}_{\textit {ijt}}$| is strictly increasing in |$D_{\textit {jt}}$|, but unaffected by |$D_{kt}$| for all |$k\ne j.$| In Example 1, together with a mild regularity condition, this is sufficient for inverting the demand system to obtain |$\Xi _t$| as a function of the market shares |$S_t$|, price |$P_t$|, and exogenous covariates |$X_t$| (Berry et al., 2013). In what follows, we redefine the aggregate demand as a function of |$(X_t^{(2)},P_t,D_t)$| instead of |$(X_t,P_t,\Xi _t)$| by
$$\begin{eqnarray} \phi (X_t^{(2)},P_t,D_t)\equiv \varphi (X_t,P_t,\Xi _t), \end{eqnarray}$$
where |$X_t=(X_t^{(1)},X_t^{(2)})$| and |$D_t=\Xi _t+X^{(1)}_t$|, and make the following assumption.
Assumption 2.2. For some subset |$\tilde{\mathbb {L}}$| of |$\mathbb {L}$| whose cardinality is |$J$|, there exists a unique function |$\psi :\mathbb {R}^{J\times (d_X-1)}\times \mathbb {R}^J\times \mathbb {R}^{J}\rightarrow \mathbb {R}^{J}$| such that |$ D_{\textit {jt}}=\psi _j(X_t^{(2)},P_t,\tilde{S}_t)$| for |$j=1,\cdots , J$|, where |$\tilde{S}_t$| is a subvector of |$S_t$|, which stacks the components of |$S_t$| whose indices belong to |$\tilde{\mathbb {L}}$|.
Under Assumption 2.2, we may write
$$\begin{eqnarray} \Xi _{\textit {jt}}=\psi _j(X_t^{(2)},P_t,\tilde{S}_t)-X_{\textit {jt}}^{(1)},\quad j=1,\cdots ,J. \end{eqnarray}$$ (2.8)
This can be used to generate moment conditions in order to identify the aggregate demand function.
In the BLP setting, the inversion discussed above is the standard Berry inversion. A key condition for the inversion is that the products are connected substitutes, see Berry et al. (2013). The linear random coefficient specification as in (2.1) is known to satisfy this condition. Then, Assumption 2.2 follows.
A similar result exists for the pure characteristics model. Sufficient conditions for Assumption 2.2 in this model were presented in Berry and Pakes (2007).
In Example 2, one may employ an alternative inversion strategy to obtain |$\psi$| in (2.8) using only subsystems of demand such as |$\tilde{\mathbb {L}}=\lbrace (1,0),(1,1)\rbrace$| or |$\tilde{\mathbb {L}} =\lbrace (0,0),(0,1)\rbrace$|. We defer details on this case to Section 3.3.
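For the BLP setting, the inversion from market shares to the common utility components can be sketched numerically with the contraction mapping of Berry et al. (1995). This is a sketch under stated assumptions rather than the construction used in the paper: it adopts the standard logit normalization in which the outside good also carries a Type-I extreme value error, so that choice probabilities conditional on the random coefficients have the closed logit form, and it treats a finite set of simulated consumers as the population. The function names `predicted_shares` and `invert_shares` and all numerical values are hypothetical.

```python
import numpy as np

def predicted_shares(delta, mu):
    """Inside-good shares for common components delta (J,) and
    individual-specific components mu (n, J), averaged over consumers."""
    v = delta + mu
    # Subtract a row-wise maximum (at least 0, the outside utility) for stability.
    v_max = np.maximum(v.max(axis=1, keepdims=True), 0.0)
    ev = np.exp(v - v_max)
    denom = np.exp(-v_max) + ev.sum(axis=1, keepdims=True)
    return (ev / denom).mean(axis=0)

def invert_shares(s_obs, mu, tol=1e-10, max_iter=10_000):
    """Contraction: delta <- delta + log(s_obs) - log(s_pred(delta))."""
    delta = np.log(s_obs) - np.log(1.0 - s_obs.sum())   # plain-logit starting value
    for _ in range(max_iter):
        step = np.log(s_obs) - np.log(predicted_shares(delta, mu))
        delta = delta + step
        if np.max(np.abs(step)) < tol:
            return delta
    raise RuntimeError("contraction did not converge")

rng = np.random.default_rng(1)
J, n = 3, 2_000
x2 = rng.normal(size=J)                               # scalar X^(2) per product
p = rng.normal(size=J)
beta2 = rng.normal(scale=0.5, size=n)                 # illustrative random coefficients
alpha = -np.exp(rng.normal(scale=0.3, size=n))
mu = np.outer(beta2, x2) + np.outer(alpha, p)         # x2 * beta2_i + alpha_i * p
d_true = np.array([0.5, -0.2, 0.1])                   # plays the role of D_t
s_obs = predicted_shares(d_true, mu)
d_hat = invert_shares(s_obs, mu)
print(np.max(np.abs(d_hat - d_true)))
```

In this sketch the distribution of the random coefficients is taken as known, which is what makes the inversion computable; the identification argument in the text instead works with the inverse function |$\psi$| as an unknown object.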
The inverted system in (2.8), together with the following assumption, yields a set of moment conditions the researcher can use to identify the structural demand.
Assumption 2.3 (i) is a mean independence assumption on |$\Xi _{\textit {jt}}$| given a set of instruments |$Z_{t}$|, which also normalizes the location of |$\Xi _{\textit {jt}}$|. Assumption 2.3 (ii) is a completeness condition, which is common in the nonparametric IV literature, see Berry and Haile (2014) for a detailed discussion. However, the role it plays here is slightly different, as the moment condition leads to an integral equation which is different from nonparametric IV (Newey and Powell, 2003), and more resembles the generalized method of moments (GMM). We use bounded completeness, which is sufficient when |$\psi _j$| is bounded. In applications where boundedness of |$\psi _j$| cannot be assumed, Assumption 2.3 (ii) can be replaced by regular completeness, i.e., it must hold for any function |$B$| instead of only for bounded |$B$|. Completeness is known to be a significantly stronger condition than bounded completeness, see D’Haultfoeuille (2011).
In the Online Appendix Section S2.2, we discuss an approach based on a strengthening of the mean independence condition to full independence. In case such a strengthening is economically palatable, we still retain the sum |$X_{\textit {jt}}^{(1)}+\Xi _{\textit {jt}}$|, where |$X_{\textit {jt}}^{(1)}$| is similar to a dependent variable in nonparametric IV.
Given Assumption 2.3 and (2.8), the unknown function |$\psi$| can be identified through the following conditional moment restrictions:
We state this result here as a lemma. It is essentially theorem 3.1 of Berry and Haile (2014).
Lemma 2.1 (Essentially Berry and Haile, 2014). Suppose Assumptions 2.1–2.3 hold. Then, |$\psi$| is identified.
Once |$\psi$| is identified, the structural demand |$\phi$| can be identified nonparametrically in Examples 1 and 2.
3. IDENTIFICATION OF THE RANDOM COEFFICIENT DENSITY
This section contains the main innovation in this paper: We establish that the density of random coefficients in the market-level demand models is nonparametrically identified. Our strategy for identification of the random coefficient density is to construct a function from the structural demand, which is related to the density through an integral transform known as the Radon transform. More precisely, we construct a function |$\Phi (w,u)$| such that
$$\begin{eqnarray} \Phi (w,u)=\mathcal {R}[f](w,u), \end{eqnarray}$$ (3.1)
where |$f$| is the density of interest, |$w$| is a vector in |$\mathbb {R}^{q}$| (with |$q$| the dimension of the random coefficients), normalized to have unit length, and |$u\in \mathbb {R}$| is a scalar. In what follows, we let |$\mathbb {S }^{q}\equiv \lbrace v\in \mathbb {R}^{q}:\Vert v\Vert =1\rbrace$| denote the unit sphere in |$\mathbb {R}^{q}$|. |$\mathcal {R}$| is the Radon transform defined point-wise by
$$\begin{eqnarray} \mathcal {R}[f](w,u)=\int _{P_{w,u}}f(v)\,d\mu _{w,u}(v), \end{eqnarray}$$
where |$P_{w,u}$| denotes the hyperplane |$\lbrace v\in \mathbb {R}^{q}:v^{\prime }w=u\rbrace ,$| and |$\mu _{{w,u}}$| is the Lebesgue measure on |$P_{w,u}$|. See, for example, Helgason (1999) for details on the properties of the Radon transform including its injectivity. Our identification strategy is constructive and will therefore suggest a natural nonparametric estimator. Applications of the Radon transform to random coefficients models have been studied in Beran and Hall (1992), Beran et al. (1996), Hoderlein et al. (2010), and Gautier and Hoderlein (2015).
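A small numerical check may help to fix ideas about the Radon transform. The sketch below, which is purely illustrative, takes |$q=2$| and a standard bivariate normal density (an assumption made only because its Radon transform is known in closed form: for any unit vector |$w$| it equals the standard normal density evaluated at |$u$|) and integrates the density along the line |$\lbrace v:v^{\prime }w=u\rbrace$| with a trapezoid rule. The function names are hypothetical.

```python
import numpy as np

def gaussian_density(v):
    """Standard bivariate normal density at the rows of v (shape (m, 2))."""
    return np.exp(-0.5 * np.sum(v**2, axis=1)) / (2.0 * np.pi)

def radon_2d(f, w, u, half_width=10.0, m=20_001):
    """Integrate f over the line {v : v'w = u} in R^2 (w must have unit length)."""
    w = np.asarray(w, dtype=float)
    w_perp = np.array([-w[1], w[0]])            # unit vector along the line
    t = np.linspace(-half_width, half_width, m)
    points = u * w + t[:, None] * w_perp        # arc-length parametrization
    values = f(points)
    dt = t[1] - t[0]
    # Trapezoid rule on the truncated line; the Gaussian tails are negligible.
    return dt * (values.sum() - 0.5 * (values[0] + values[-1]))

w = np.array([0.6, 0.8])                        # unit length: 0.36 + 0.64 = 1
u = 0.7
numeric = radon_2d(gaussian_density, w, u)
closed_form = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
print(numeric, closed_form)
```

Repeating the computation for other directions |$w$| gives the same value at a given |$u$|, reflecting the rotational symmetry of this particular density; for a general density the transform varies with both |$w$| and |$u$|.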
Throughout, we maintain the following assumption.
Assumption 3.1. (i) For all |$j\in \lbrace 1,\cdots , J\rbrace$|, |$( X^{(2)}_{\textit {jt}}, P_{\textit {jt}}, D_{\textit {jt}})$| are absolutely continuous with respect to Lebesgue measure on |$\mathbb {R}^{d_X-1}\times \mathbb {R}\times \mathbb {R}$|; (ii) the random coefficients |$\theta$| are independent of |$(X_{t},P_t,D_t)$|.
Assumption 3.1 (i) requires that |$(X_{\textit {jt}}^{(2)},P_{\textit {jt}},D_{\textit {jt}})$| are continuously distributed for all |$j$|. By Assumption 3.1 (ii), we assume that the covariates |$(X_{t},P_{t},D_{t})$| are exogenous to the individual heterogeneity. These conditions are used to invert the Radon transform.
Before proceeding further, we give an overview of our identification strategy in relation to the key differences between the BLP and pure characteristics models. Heuristically, for a given |$(w,u)\in \mathbb {S}^q\times \mathbb {R}$|, the Radon transform aggregates individuals whose coefficients are on the hyperplane |$P_{w,u}$|. For each |$(w,u)$|, we relate this aggregate value to a feature of the demand at specific values of the product characteristics. By varying |$(w,u)$| and inverting the map |$\mathcal {R}$| in (3.1), we may then recover the distribution of the random coefficients. A key step in this identification argument is the construction of a function |$\Phi$| satisfying (3.1). The two demand models suggest different strategies to construct |$\Phi .$| In the BLP model, we construct |$\Phi$| for each product |$j$| and recover the joint distribution of the coefficients |$(\beta ^{(2)}_{\textit {it}}, \alpha _{\textit {it}},\epsilon _{\textit {ijt}})$|. We take this approach because the presence of the tastes for products requires us to isolate the demand for each product from the rest. On the other hand, the pure characteristics model does not require such an approach. Furthermore, both models allow the researcher to combine demand across different markets to construct |$\Phi$|.
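The heuristic link between a choice probability and the Radon transform can be made concrete with a short worked derivation. This is only a sketch under a generic sign convention: the function |$G$| below is introduced purely for illustration and is not the function |$\Phi$| constructed in the following subsections, and |$f$| is assumed to be regular enough that |$s\mapsto \mathcal {R}[f](w,s)$| is continuous. For a unit vector |$w$|, write each |$v\in \mathbb {R}^{q}$| as |$v=sw+y$| with |$y^{\prime }w=0$|, so that integrating over the half-space |$\lbrace v:v^{\prime }w\le u\rbrace$| amounts to integrating over the hyperplanes |$P_{w,s}$| for |$s\le u$|:
$$\begin{eqnarray} G(w,u)\equiv \int 1\lbrace v^{\prime }w\le u\rbrace f(v)\,dv=\int _{-\infty }^{u}\int _{P_{w,s}}f(v)\,d\mu _{w,s}(v)\,ds=\int _{-\infty }^{u}\mathcal {R}[f](w,s)\,ds,\qquad \frac{\partial G(w,u)}{\partial u}=\mathcal {R}[f](w,u). \end{eqnarray}$$
In words, the share of individuals whose coefficients satisfy |$v^{\prime }w\le u$| is a cumulative quantity in |$u$|, and differentiating with respect to |$u$| isolates the mass on the hyperplane |$P_{w,u}$|.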
3.1. BLP model
Throughout this section, we assume the utility function contains the taste for products and is given as in (2.1). Here, the scale of the unobservable variables, including the taste for product |$\epsilon _{\textit {ijt}}$|, is normalized relative to the scale of |$X^{(1)}_{\textit {jt}}$| as we set the coefficient on |$X^{(1)}_{\textit {jt}}$| to 1 in Assumption 2.1. For example, suppose the original utility specification is |$X^{(1)}_{\textit {jt}}\tilde{\beta }^{(1)}+X_{\textit {jt}}^{(2)}{}^{\prime }\tilde{\beta }^{(2)} _{\textit {it}}+\tilde{\alpha }_{\textit {it}}P_{\textit {jt}}+\tilde{\Xi }_{\textit {jt}}+\tilde{\epsilon }_{\textit {ijt}}$|, for some constant |$\tilde{\beta }^{(1)}\gt 0$|. Then, the re-scaled utility is given by |$X^{(1)}_{\textit {jt}}+X_{\textit {jt}}^{(2)}{}^{\prime }\beta ^{(2)} _{\textit {it}}+\alpha _{\textit {it}}P_{\textit {jt}}+\Xi _{\textit {jt}}+\epsilon _{\textit {ijt}}$|, where |$\beta ^{(2)} _{\textit {it}}=\tilde{\beta }^{(2)} _{\textit {it}}/\tilde{\beta }^{(1)}$|, |$\alpha _{\textit {it}}=\tilde{\alpha }_{\textit {it}}/\tilde{\beta }^{(1)}$|, |$\Xi _{\textit {jt}}=\tilde{\Xi }_{\textit {jt}}/\tilde{\beta }^{(1)}$|, and |$\epsilon _{\textit {ijt}}=\tilde{\epsilon }_{\textit {ijt}}/\tilde{\beta }^{(1)}.$|
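The claim that this re-scaling is innocuous can be checked numerically: dividing every utility by the same positive constant leaves each consumer's choice unchanged. The following sketch uses simulated primitives; the value of `beta1_tilde`, the distributions, and the helper `choices` are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, J = 10_000, 3
beta1_tilde = 2.5                                  # hypothetical positive constant
x1 = rng.normal(size=J)
x2 = rng.normal(size=J)                            # scalar X^(2) per product
p = rng.normal(size=J)
xi_tilde = rng.normal(size=J)
beta2_tilde = rng.normal(size=n)
alpha_tilde = -np.exp(rng.normal(size=n))
eps_tilde = rng.gumbel(size=(n, J))

def choices(u_inside):
    """Index of the utility-maximizing option; 0 is the outside good (utility 0)."""
    u = np.column_stack([np.zeros(len(u_inside)), u_inside])
    return u.argmax(axis=1)

# Original specification with coefficient beta1_tilde on X^(1).
u_original = (x1 * beta1_tilde + np.outer(beta2_tilde, x2)
              + np.outer(alpha_tilde, p) + xi_tilde + eps_tilde)

# Re-scaled primitives: every term is divided by beta1_tilde.
beta2 = beta2_tilde / beta1_tilde
alpha = alpha_tilde / beta1_tilde
xi = xi_tilde / beta1_tilde
eps = eps_tilde / beta1_tilde
u_rescaled = x1 + np.outer(beta2, x2) + np.outer(alpha, p) + xi + eps

same_choices = np.array_equal(choices(u_original), choices(u_rescaled))
print(same_choices)
```

Because the choices coincide consumer by consumer, the implied market shares coincide as well, so the normalization in Assumption 2.1 only fixes the scale of the random coefficients.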
Recall that the demand for good |$j$| with the product characteristics |$(X_{t},P_{t},\Xi _{t})$| is as given in (2.4). Since |$D_{t}=X_{t}^{(1)}+\Xi _{t}$|, the demand in market |$t$| with |$(X_{t}^{(2)},P_{t},D_{t})=(x^{(2)},p,\delta )$| is given by:
Suppose the vertical characteristics |$\lbrace D_{kt},k\ne j\rbrace$| (for products other than |$j$|) have a large enough support so that |$(X_{\textit {jt}}^{(2)}-X_{kt}^{(2)})^{\prime }{\beta }^{ (2)}_{\textit {it}}+\alpha _{\textit {it}}(P_{\textit {jt}}-P_{kt})+(\epsilon _{\textit {ijt}}-\epsilon _{ikt})-D_{\textit {jt}}\gt D_{kt}$| for all |$k\ne j$| for some values of |$D_{kt},k\ne j$|. The demand for good |$j$| for such values of |$D_{kt},k\ne j$| is then
where |$f_{\vartheta _j }$| is the joint density of the subvector |$\vartheta _{\textit {ijt}}\equiv (\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}},\epsilon _{\textit {ijt}} )$| of the random coefficients. Let |$w\equiv (x_{j}^{(2)},p_{j},1)/\Vert (x_{j}^{(2)},p_{j},1)\Vert$| and |$u\equiv \delta _j/\Vert (x_{j}^{(2)},p_{j},1)\Vert$|. Define
where the second equality holds because normalizing the scale of |$(x_{j}^{(2)},p_{j},\delta _{j})$| does not change the value of |$\tilde{\Phi }_j$|. |$\Phi$| then satisfies
Hence, by taking a derivative with respect to |$u$|, we may relate |$\Phi$| to the random coefficient density through the Radon transform:
Note that, since the structural demand |$\phi$| is identified by Lemma 2.1, |$\Phi$| is nonparametrically identified as well. Hence, equation (3.7) gives an operator that maps the random coefficient density to an object identified by the moment condition studied in the previous section. To construct |$\Phi$| described above and to invert the Radon transform, we formally make the following assumptions. Below, for each |$1\le j,k\le J$|, we let |$V_{jk} =(X_{\textit {jt}}^{(2)}-X_{kt}^{(2)})^{\prime }{\beta } ^{ (2)}_{\textit {it}}+\alpha _{\textit {it}}(P_{\textit {jt}}-P_{kt})+(\epsilon _{\textit {ijt}}-\epsilon _{ikt})-D_{\textit {jt}}$| and make the following assumptions on the support of the product characteristics.6
Assumption 3.2. Let |$\mathcal {J}$| be a nonempty subset of |$\lbrace 1,\cdots ,J\rbrace .$| For each |$j\in \mathcal {J}$|, let |$\mathrm{supp}\, (V_{jk},k\ne j)\subset \mathrm{supp}\, (D_{kt},k\ne j)$|.
If Assumption 3.2 holds, one may vary the characteristics of the alternative products |$\lbrace D_{kt},k\ne j\rbrace$| on a large enough support, so that the demand for product |$j$| is determined through its choice between product |$j$| and the outside good. To simplify the argument, consider a case with |$\mathcal {J}=\lbrace j\rbrace$|. Recall that each |$D_{kt}$| is a combination of observed |$X^{(1)}_{kt}$| and unobserved factors |$\Xi _{kt}$| contributing to a characteristic that consumers as a whole like, but some consumers dislike. Berry and Haile (2014) take the acceleration capacity of a car as an example of such a characteristic. Assumption 3.2 states that, at least for some |$j$|, the variation of such a characteristic dominates the utility difference |$V_{jk}$| for all |$k\ne j$|, which is arguably a strong requirement. This identification argument therefore uses a ‘thin’ (lower-dimensional) subset of the support of the covariates, which is due to the presence of the tastes for products. This is in remarkable contrast with the identification of the random coefficients density in the PCM (analysed in the next section), which does not rely on thin sets.
Assumption 3.3. One of the following conditions holds:
(i) |$\bigcup _{j\in \mathcal {J}}\mathrm{supp}\, (X_{\textit {jt}}^{(2)},P_{\textit {jt}},D_{\textit {jt}})$| has full support in |$\mathbb {R}^{d_X-1}\times \mathbb {R}\times \mathbb {R}$|.
(ii) |$\bigcup _{j\in \mathcal {J}}\mathrm{supp}\, (X_{\textit {jt}}^{(2)},P_{\textit {jt}})$| contains an open ball |$B_\mathcal {J} \subset \mathbb {R}^{d_X-1}\times \mathbb {R}$|. For every |$(x,p) \in B_{\mathcal {J}}$| and every |$(b^{(2)},a,e_j) \in {\rm {supp}}\, (\vartheta _j)$| it holds that
$$\begin{eqnarray} \big (x,p,-x^{\prime } b^{(2)} - ap - e_j\big ) \in \bigcup _{j\in \mathcal {J}} \mathrm{supp}\, \left(X_{\textit {jt}}^{(2)},P_{\textit {jt}},D_{\textit {jt}}\right). \end{eqnarray}$$ (3.8)
Furthermore, all the absolute moments of each component of |$\theta _{\textit {it}}$| are finite, and for any fixed |$z\in \mathbb {R}_+$|, |$\lim _{l\rightarrow \infty }\frac{z^l }{l!} E[(|\theta _{\textit {it}}^{(1)}|+\cdots +|\theta ^{(d_\theta )}_{\textit {it}}|)^l]=0$|.
Assumption 3.3 (i) is our benchmark assumption. Under this assumption, no restrictions on |$\theta _{\textit {it}}$| are necessary for identification. In fact, the identification strategy would be valid for arbitrary Borel measures, and may also be applied to settings where |$\theta _{\textit {it}}$| does not have a density.7 However, this large support assumption is stringent and may be violated by various product characteristics and prices used in practice. Hence, it should be viewed as a benchmark to understand what the model requires to identify the distribution |$f_\theta$| of |$\theta _{\textit {it}}$| if one does not impose any restriction on it.
Assumption 3.3 (ii) is an alternative condition, which relaxes the support requirement significantly. Instead of a large support, it is enough for the product characteristics to have a properly combined support that contains a (possibly small) open ball |$B_{\mathcal {J}}$|. This includes, as a special case, the situation where the support of a single product’s characteristics |$(X_{\textit {jt}}^{(2)},P_{\textit {jt}})$| contains an open ball, which can be met in various applications. For example, suppose |$X_{\textit {jt}}^{(2)}$| is a scalar continuous variable, such as fuel efficiency of a car that varies across markets with different climate or road conditions, the average speed of an internet service, or computational performance of a cloud service. Then, the condition is satisfied as long as |$X_{\textit {jt}}^{(2)}$| and |$P_{\textit {jt}}$| vary over a two-dimensional open set. Even if such a product does not exist, identification of the random coefficient density is possible as long as the required support condition is met by combining the supports of multiple products belonging to |$\mathcal {J}$|. This means that our identification strategy may use variations of |$(X_{\textit {jt}}^{(2)},P_{\textit {jt}})$| across products. To illustrate, consider three products (|$J=3$|). If |$(D_{2t},D_{3t})$| have a large support in the sense of Assumption 3.2 (|$\mathcal {J}=\lbrace 1\rbrace$| in this case), identification of the random coefficient density is possible as long as the support of the characteristics of good 1 contains an open ball. If all |$\lbrace D_{\textit {jt}}\rbrace _{j=1}^3$| jointly have a large support (this implies |$\mathcal {J}=\lbrace 1,2,3\rbrace$|), our requirement on |$(X_{\textit {jt}}^{(2)},P_{\textit {jt}})$| becomes even milder as we only need to construct an open ball by combining the characteristics of all three products.
The condition in (3.8) allows for a bounded support of |$(X_{\textit {jt}}^{(2)},P_{\textit {jt}})$|. Further, if |$\vartheta _{j}$| has a bounded support, Assumption 3.3 (ii) will allow for a bounded support of |$D_{\textit {jt}}$|. The price to pay for this relaxation of the support requirement is a regularity assumption on the moments of |$\theta _{\textit {it}}$|. This rules out heavy-tailed distributions that are not determined by their moments. A sufficient, yet stronger than necessary, condition for this assumption is a compact support of |$f_{\theta }$|. Under Assumption 3.3 (ii), the characteristic function |$w\mapsto \varphi _{\vartheta _{j}}(tw)$| of |$\vartheta _{\textit {ijt}}$| (a key element of the Radon inversion) is analytic, and thereby uniquely determined by its restriction to a nonempty full-dimensional subset of its domain.8 Hence, |$f_{\vartheta _{j}}$| can be identified if one varies |$(X_{\textit {jt}}^{(2)},P_{\textit {jt}})$| on a full-dimensional subset.
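To make the moment condition in Assumption 3.3 (ii) concrete, the following minimal sketch evaluates the logarithm of |$\frac{z^l}{l!}E[|\theta |^l]$| for a scalar coefficient under two textbook distributions. The choice of a standard normal and a standard lognormal, the function names, and the values of |$z$| and |$l$| are illustrative assumptions and not part of the original analysis. The sketch uses the closed forms |$E|Z|^l=2^{l/2}\Gamma ((l+1)/2)/\sqrt{\pi }$| for a standard normal |$Z$| and |$E[X^l]=\exp (l^2/2)$| for a standard lognormal |$X$|.

```python
import math

def log_term_normal(z, l):
    """Log of z**l / l! * E|Z|**l for a standard normal Z (illustrative choice)."""
    log_abs_moment = (0.5 * l * math.log(2.0)
                      + math.lgamma(0.5 * (l + 1))
                      - 0.5 * math.log(math.pi))
    return l * math.log(z) - math.lgamma(l + 1) + log_abs_moment

def log_term_lognormal(z, l):
    """Log of z**l / l! * E[X**l] for a standard lognormal X (illustrative choice)."""
    log_moment = 0.5 * l * l
    return l * math.log(z) - math.lgamma(l + 1) + log_moment

# Hypothetical values of z and l, chosen only to display the trend in l.
for l in (10, 50, 200, 400):
    print(l, log_term_normal(5.0, l), log_term_lognormal(5.0, l))
```

Under these assumptions, the log term for the normal case eventually decreases without bound as |$l$| grows, so the limit condition holds, whereas for the lognormal case it grows without bound, which illustrates the kind of heavy-tailed distribution that the condition excludes.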
Under the conditions given in the theorem below, the Radon inversion identifies |$f_{\vartheta _j}$|. If one is interested in the joint density of the coefficients on the product characteristics |$(\beta ^{(2)}_{\textit {it}}, \alpha _{\textit {it}})$|, one may stop here as marginalizing |$f_{\vartheta _j}$| gives the desired density. The joint distribution of the coefficients, including the tastes for products, can be identified under an additional independence assumption. We state this result in the following theorem.
Theorem 3.1. Suppose Assumptions 2.1–3.3 hold. Suppose the conditional distribution of |$\epsilon _{\textit {ijt}}$| given |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$| is identical for all |$j\in \mathcal {J}$|. Then, (i) for each |$j\in \mathcal {J}$|, the density |$f_{\vartheta _j}$| is identified, where |$\vartheta _{\textit {ijt}}=(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}}, \epsilon _{\textit {ijt}})$|; (ii) if, in addition, |$\lbrace \epsilon _{\textit {ijt}},j\in \mathcal {J}\rbrace$| are independently distributed (across |$j$|) conditional on |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$|, the joint density |$f_{\theta _{\mathcal {J}}}$| of |$\theta _{\mathcal {J}}=(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}},\lbrace \epsilon _{\textit {ijt}}\rbrace _{j \in \mathcal {J}})$| is identified.
An immediate corollary is the following.
Corollary 3.1. Suppose Assumptions 2.1–3.3 hold. Let |$\lbrace \epsilon _{\textit {ijt}}\rbrace _{j=1}^J$| be i.i.d. (across |$j$|) conditional on |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$|. Then, the joint density |$f_\theta$| of all random coefficients |$\theta _{\textit {it}}=(\beta ^{(2)}_{\textit {it}}, \alpha _{\textit {it}},\lbrace \epsilon _{\textit {ijt}}\rbrace _{j=1}^J)$| is identified.
Lemma 2.1 and Theorem 3.1 shed light on the roles played by the key features of the BLP-type demand model: the invertibility of the demand system, instrumental variables, and the linear random coefficients specification. In Lemma 2.1, the invertibility and instrumental variables play key roles in identifying the demand. Once the demand is identified, one may ‘observe’ the vector |$(X^{(2)}_t,P_t,D_t)$| of product characteristics. This is possible because the invertibility of demand allows one to recover the unobserved product characteristics |$\Xi _t$| from the market shares |$S_t$| (together with other covariates). One may then vary |$(X^{(2)}_t,P_t,D_t)$| across markets in a manner that is exogenous to the individual heterogeneity |$\theta _{\textit {it}}$|. Theorem 3.1 and Corollary 3.1 show that this exogenous variation combined with the linear random coefficients specification allows us to trace out the distribution of |$\theta _{\textit {it}}$|.
If for each |$j$|, |$(X_{\textit {jt}}^{(2)},P_{\textit {jt}},D_{\textit {jt}})$| fulfills the support condition in Assumption 3.3 (i) or Assumption 3.3 (ii), one can drop the identical distribution assumption. This is because one can identify |$f_{\vartheta _{j}}$| for all |$j$| by inverting the Radon transform in (3.7) repeatedly. This in turn implies that the distribution of |$\epsilon_{\textit {ijt}}$| conditional on |$(\beta_{\textit {it}}^{(2)},\alpha_{\textit {it}})$| is identified for each |$j$|. If the tastes for products |$\lbrace \epsilon _{\textit {ijt}}\rbrace _{j=1}^J$| are mutually independent (conditional on |$(\beta _{\textit {it}}^{(2)},\alpha_{\textit {it}})$|), as is commonly assumed in BLP, the joint density |$f_{\theta }$| is identified.
Finally, we comment on what an additional parametric assumption may add to our result. If one assumes that the tastes for products are i.i.d. and follow a parametric distribution, equation (3.3) reduces to |$\phi_{j}(x^{(2)},p,\delta )=\int L(x^{(2)}{}^{\prime }b^{(2)}+ap+\delta )f_{(\beta ,\alpha )}(b,a)dbda,$| for some function |$L$|, e.g., |$L$| is the logit function when |$\lbrace \epsilon _{\textit {ijt}}\rbrace$| follows a Type-I extreme value distribution. This type of integral equation is considered in Fox et al. (2012) in the context of an individual-level demand model without endogeneity. Given that |$\phi$| is identified, we believe that it is possible to extend their framework to the market-level demand model with endogeneity and identify |$f_{(\beta ,\alpha )}$| semiparametrically. This approach may allow us to relax some of the support conditions. To keep a tight focus on nonparametric identification, we leave this extension for future work.
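The mixture integral above can be approximated by simulation. The sketch below is only an illustration under stated assumptions: it takes |$L$| to be the logistic function |$L(v)=1/(1+e^{-v})$|, which corresponds to a binary comparison with the outside good, and it replaces the integral over |$f_{(\beta ,\alpha )}$| with an average over draws from a hypothetical normal distribution. The function names and all numerical values are assumptions made for the example.

```python
import math
import random

def logistic(v):
    """Numerically stable logistic function L(v) = 1 / (1 + exp(-v))."""
    if v >= 0.0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def mixture_share(x2, p, delta, draws):
    """Average of L(x2'b + a*p + delta) over draws of (b, a).

    x2 and b are sequences of equal length; draws is a list of (b, a) pairs.
    The average approximates the integral against the density of (beta, alpha).
    """
    total = 0.0
    for b, a in draws:
        index = sum(xk * bk for xk, bk in zip(x2, b)) + a * p + delta
        total += logistic(index)
    return total / len(draws)

# Hypothetical random coefficients: one characteristic, negative mean price effect.
rng = random.Random(0)
draws = [((rng.gauss(1.0, 0.5),), rng.gauss(-1.0, 0.3)) for _ in range(5000)]
print(mixture_share((1.0,), 1.0, 0.0, draws))
```

With a degenerate distribution (a single draw) the average collapses to the plain logistic share, which gives a quick sanity check of the sketch.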
Our identification result reveals the nature of the BLP-type demand model. A positive aspect of our result is that the preference is nonparametrically identified if one observes full dimensional variations in the consumers’ choice sets (represented by |$(X^{(2)}_{\textit {jt}},P_{\textit {jt}},D_{\textit {jt}})$|) across markets. The identifying power is quite strong if the supports of the product characteristics jointly span the full space, i.e., |$\bigcup _{j\in \mathcal {J}}\mathrm{supp}\, (X^{(2)}_{\textit {jt}},P_{\textit {jt}},D_{\textit {jt}})=\mathbb {R}^{d_X-1}\times \mathbb {R}\times \mathbb {R}.$| On the other hand, if the product characteristics have limited variations, the identifying power of the model on the distribution of preferences may be limited. In particular, identification is not achieved with discrete covariates alone. Identification in such settings is an open problem. Here, we discuss a few possibilities to make progress. Hermann and Holzmann (2021) show that one can identify a finite number of moments with discrete covariates with finite support in a related, but distinct, class of linear models with random coefficients. Therefore, if a similar result holds for our setting, one way to achieve identification is to augment the model structure with a parametric specification and match the moments. Another interesting direction would be to conduct partial identification analysis on a functional |$\varphi (f_\theta )$| of the random coefficient density while imposing weak support restrictions. Suppose discrete covariates restrict a finite number of moments |$m(f_\theta )=0$|, where |$m$| maps the random coefficient density to a finite number of moments. Then, finding upper and lower bounds on the functional of interest is equivalent to maximizing (or minimizing) |$\varphi (f_\theta )$| subject to the identifying restrictions |$m(f_\theta )=0$| and a priori restrictions imposed through the parameter space for |$f_\theta$|.9 We leave this possibility for future research.
3.2. Pure characteristics demand models
Throughout this section, we consider the following utility specification, where each product’s utility is fully determined by the tastes for the product characteristics:
For this model, we employ a different, and arguably less restrictive, strategy from the one adopted in the previous section to construct |$\Phi$| in (3.1). Below, we maintain Assumptions 2.1–2.3, which ensure the identification of demand by Lemma 2.1. The demand for good |$j$| with the product characteristics |$(X_{t},P_{t},\Xi _{t})$| is as given in (2.4), but without any |$e_j$|. Since |$D_{t}=X_{t}^{(1)}+\Xi _{t}$|, the demand in market |$t$| with |$(X_{t}^{(2)},P_{t},D_{t})=(x^{(2)},p,\delta )$| is given by:
For any subset |$\mathcal {J}$| of |$\lbrace 1,\cdots ,J\rbrace \setminus \lbrace j\rbrace$|, let |$\mathcal {M}_{\mathcal {J}}$| denote the map |$(x^{(2)},p,\delta )\mapsto ( \acute{x}^{(2)},\acute{p},\acute{\delta })$| that is uniquely defined by the following properties:
In other words, for a given product |$j$| and product characteristics |$(x^{(2)},p,\delta )$|, this map finds another value |$( \acute{x}^{(2)},\acute{p},\acute{\delta })$| of product characteristics such that, for products |$i$| belonging to |$\mathcal {J}$|, the difference in the product characteristics (e.g., |$\acute{x}_{j}^{(2)}-\acute{x}_{i}^{(2)}$|) coincides with the original value (e.g., |$x_{j}^{(2)}-x_{i}^{(2)}$|) in terms of magnitude, but has an opposite sign. For products |$i$| not belonging to |$\mathcal {J}$|, the map sets their product characteristics to the original value |$(x^{(2)}_i,p_i,\delta _i)$|.
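The verbal description of |$\mathcal {M}_{\mathcal {J}}$| can be sketched in code. The sketch assumes, as the description suggests, that product |$j$| and all products outside |$\mathcal {J}$| keep their characteristics, so that flipping the sign of the difference amounts to reflecting each product in |$\mathcal {J}$| through product |$j$|, i.e., |$\acute{c}_i=2c_j-c_i$| for each characteristic |$c$|. The function name, the flat-tuple representation of |$(x^{(2)}_i,p_i,\delta _i)$|, and the numbers are hypothetical.

```python
def reflect_through_product(chars, j, subset):
    """Apply the sign-flipping map to a list of per-product characteristic tuples.

    chars  : list of tuples, one per product, e.g. (x2, p, delta)
    j      : index of the benchmark product (left unchanged)
    subset : indices of the products whose differences from j are sign-flipped
    """
    if j in subset:
        raise ValueError("the benchmark product j must not belong to the subset")
    base = chars[j]
    mapped = []
    for i, c in enumerate(chars):
        if i in subset:
            # New difference (base - new) equals -(base - c), so new = 2*base - c.
            mapped.append(tuple(2.0 * bj - ci for bj, ci in zip(base, c)))
        else:
            mapped.append(tuple(c))
    return mapped

# Hypothetical market with three products and characteristics (x2, p, delta).
market = [(1.0, 2.0, 0.5), (0.0, 1.0, 0.0), (3.0, 3.0, 1.0)]
print(reflect_through_product(market, j=0, subset={1}))
```

Applying the map twice with the same |$j$| and |$\mathcal {J}$| returns the original characteristics, which is a convenient check that the reflection is implemented as described.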
Consider the composition |$\phi _{j}\circ \mathcal {M}_{\mathcal {J}}(x^{(2)},p,\delta )$|. If |$( \acute{x}^{(2)},\acute{p},\acute{\delta })$| is in the support, this corresponds to the demand of product |$j$| in some market (say |$t^{\prime }$|) with |$(X_{t^{\prime }}^{(2)},P_{t^{\prime }},D_{t^{\prime }})=( \acute{x} ^{(2)},\acute{p},\acute{\delta })$|. We then define
Equation (3.12) aggregates the structural demand function for good |$j$| in different markets to define a function, which can be related to the random coefficient density in a simple way. This operation can be easily understood when |$J=2$|, where, for example, demand for product 1 is given by
Then, |$\tilde{\Phi }_{1}$| is given by
This shows that aggregating the demand in the two markets with |$(X_{t}^{(2)},P_{t},D_{t})=(x^{(2)},p,\delta )$| and |$(X_{t^{\prime }}^{(2)},P_{t^{\prime }},D_{t^{\prime }})=(\acute{x}^{(2)},\acute{p},\acute{ \delta })$| yields a function |$\tilde{\Phi }_{1}$| that depends only on product 1’s characteristic |$(x_{1}^{(2)},p_{1},\delta _{1})$| through a single index in (3.13). This then allows us to trace out the random coefficients density by varying product 1’s characteristic as done in the BLP model. Since the operation above yields a function that depends only on the characteristic of a single product, we call it marginalization of demand.10
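The marginalization argument for |$J=2$| can be checked by simulation. The sketch below rests on assumptions that are not taken from the displayed equations: utility of good |$j$| is the index |$x_j b+a p_j+\delta _j$| with a scalar characteristic, the outside good has utility zero, good 1 is chosen when its index is positive and exceeds that of good 2, and the random coefficients follow a hypothetical normal distribution. Under these assumptions, the share of good 1 in the original market plus its share in the market where good 2 is reflected through good 1 equals the probability that good 1's index is positive.

```python
import random

def index(chars, b, a):
    """Utility index x*b + a*p + delta for one product (scalar characteristic)."""
    x, p, d = chars
    return x * b + a * p + d

def reflect(chars_1, chars_2):
    """Reflect good 2's characteristics through good 1's: new = 2*c1 - c2."""
    return tuple(2.0 * c1 - c2 for c1, c2 in zip(chars_1, chars_2))

def count_good_1_chosen(chars_1, chars_2, draws):
    """Number of draws for which good 1 beats both the outside good and good 2."""
    count = 0
    for b, a in draws:
        u1 = index(chars_1, b, a)
        u2 = index(chars_2, b, a)
        if u1 > 0.0 and u1 > u2:
            count += 1
    return count

def count_index_positive(chars_1, draws):
    """Number of draws for which good 1 beats the outside good."""
    return sum(1 for b, a in draws if index(chars_1, b, a) > 0.0)

# Hypothetical coefficient draws and product characteristics (x, p, delta).
rng = random.Random(0)
draws = [(rng.gauss(1.0, 1.0), rng.gauss(-1.0, 0.5)) for _ in range(20000)]
good_1 = (1.0, 2.0, 0.5)
good_2 = (0.3, 1.5, -0.2)
good_2_reflected = reflect(good_1, good_2)

share_t = count_good_1_chosen(good_1, good_2, draws) / len(draws)
share_t_prime = count_good_1_chosen(good_1, good_2_reflected, draws) / len(draws)
marginal = count_index_positive(good_1, draws) / len(draws)
print(share_t + share_t_prime, marginal)
```

The two printed numbers agree because, for each draw, the reflected market reverses the sign of the utility difference between the two goods, so exactly one of the two markets has good 1 ahead of good 2 (ties have probability zero with continuous coefficients).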
Equation (3.12) generalizes this argument to settings with |$J\ge 2$|. For the marginalization of demand to work, the product characteristic |$(\acute{x}^{(2)},\acute{p},\acute{\delta })=\mathcal {M}_{\mathcal {J}}(x^{(2)}, p,\delta )$| needs to be an observable value, meaning it must be in the support. Formally, a value of the product characteristic |$(x^{(2)}, p,\delta )\in \text{supp}( X^{(2)}_t, P_{t}, D_{t})$| is said to permit marginalization of demand with respect to product |$j$| if
As done in the BLP setting, we will only require that a rich enough set to recover |$f_\theta$| can be constructed by combining the supports of multiple products’ characteristics. Toward this end, for each |$j\in \lbrace 1,\cdots ,J\rbrace$|, let |$\pi _j$| be the projection map such that |$(x^{(2)}_{j},p_j,\delta _j)= \pi _j(x^{(2)}, p,\delta )$|, and define the following sets:
In other words, |$\mathcal {H}_j$| is the set of the entire product characteristic vectors for which marginalization with respect to product |$j$| is permitted. |$\mathcal {S}_j$| is the coordinate projection of |$\mathcal {H}_j$| onto the space of product |$j$|’s characteristics. We then make the following assumption.
Assumption 3.4. One of the following conditions holds:
(i) |$\bigcup _{j=1}^J\mathcal {S}_j=\mathbb {R}^{d_X-1}\times \mathbb {R}\times \mathbb {R}$|;
(ii) |$\bigcup _{j=1}^J\mathcal {S}_j=\mathbb {E}\times \mathbb {D}$|, where |$\mathbb {E}$| contains an open ball |$B \subset \mathbb {R}^{d_X-1}\times \mathbb {R}$|, and |$\mathbb {D}\subseteq \mathbb {R}$|. For every |$(x,p) \in B$| and every |$(b^{(2)},a) \in \mathrm{supp}\, (\theta _{\textit {it}})$|, it holds that |$(x,p,-x^{\prime } b^{(2)} - ap) \in \bigcup _{j=1}^J\mathcal {S}_j$|. Furthermore, all the absolute moments of each component of |$\theta _{\textit {it}}$| are finite, and for any fixed |$z\in \mathbb {R}_+$|, |$0=\lim _{l\rightarrow \infty }\frac{z^l}{l!}(E[|\theta _{\textit {it}}^{(1)}|^l]+\cdots +E[|\theta ^{(d_\theta )}_{\textit {it}}|^l])$|.
The idea behind Assumption 3.4 is as follows. For the moment, suppose we do not impose any moment condition on the random coefficient density. Also, fix a benchmark product |$j$|. For any |$(x^{(2)}_{j},p_j,\delta _j)\in \mathcal {S}_j$|, one may find a vector |$(x^{(2)}, p,\delta )$| of all product characteristics for which marginalization of demand is allowed. Then, one would wish to vary |$(x^{(2)}_{j},p_j,\delta _j)$| to trace out the random coefficient density. This is possible, of course, if |$\mathcal {S}_j=\mathbb {R}^{d_X-1}\times \mathbb {R}\times \mathbb {R}$|, meaning that marginalization is possible everywhere with respect to product |$j$|. However, this assumption may be too strong in empirical applications. One may not be able to find any single product for which this condition is satisfied. Assumption 3.4 (i) relaxes this requirement substantially using the structure of the model. Observe that the identification argument is symmetric across products because only the characteristics matter. Hence, the argument is valid as long as, for each |$(\mathbf {x}^{(2)},\mathbf {p}, \mathbf {d})\in \mathbb {R}^{d_X-1}\times \mathbb {R}\times \mathbb {R}$|, one can find some product for which marginalization is permitted. This is the reason why it is enough to ‘patch’ the sets |$\mathcal {S}_j$| together to |$\mathbb {R}^{d_X-1}\times \mathbb {R}\times \mathbb {R}$| in Assumption 3.4 (i). This condition can be made even weaker with the help of an additional moment condition. In Assumption 3.4 (ii), we only require that the sets |$\mathcal {S}_j$| combined together contain an open ball (in terms of |$(\mathbf {x}^{(2)},\mathbf {p})$|). This support requirement is quite mild, and hence it can be satisfied even if each product’s characteristic has limited variation across markets. Note also that, if |$\mathrm{supp}\, (\theta _{\textit {it}})$| is compact, the support of |$D_{\textit {jt}}$| can be compact as well.
It is important to note that, unlike in the BLP model, we construct |$\tilde{\Phi }_j$| without relying on any ‘thin’ (lower-dimensional) subset of the support of the product characteristics. Instead, we construct |$\tilde{\Phi }_j$| in (3.12) by combining the demand in different markets. This is desirable, as estimators that rely on thin or irregular identification may have a slow rate of convergence (Khan and Tamer, 2010). In the pure characteristics model, the individuals have varying tastes (random coefficients) over the product characteristics, but not over the products themselves. This is the key feature of the model that allows us to identify the random coefficients through the variation of the product characteristics |$( X^{(2)}_t, P_{t}, D_{t})$|. In contrast, in the BLP model there was an additional taste for the product itself, which was the main reason for using the thin set to isolate the demand for each product.
Given Assumption 3.4, we now construct |$\Phi$| in equation (3.1). For each |$(\mathbf {x}^{(2)},\mathbf {p},\mathbf {d})\in \bigcup _{j=1}^J\mathcal {S}_j$|, let |$w\equiv (\mathbf {x}^{(2)},\mathbf {p} )/\Vert (\mathbf {x}^{(2)},\mathbf {p})\Vert$| and |$u\equiv \mathbf {d}/\Vert ( \mathbf {x}^{(2)},\mathbf {p})\Vert$|. Define
Here, for each |$(\mathbf {x}^{(2)},\mathbf {p},\mathbf {d})$|, any |$j$| can be used to construct |$\tilde{\Phi }_j$| through marginalization as long as |$\mathcal {S}_j$| contains |$(\mathbf {x}^{(2)},\mathbf {p},\mathbf {d})$|. Then |$\Phi$| is defined on a set that is rich enough to invert the Radon (or limited angle Radon) transform. The rest of the analysis parallels our analysis of the BLP model.11 We therefore obtain the following point identification result.
Theorem 3.2. Suppose Assumptions 2.1–3.1 and 3.4 hold. Then, |$f_\theta$| is identified in the pure characteristics demand model, where |$\theta _{\textit {it}}=(\beta ^{(2)}_{\textit {it}}, \alpha _{\textit {it}}).$|
3.3. Bundle choice (Example 2)
In this section, we consider a model with the taste for products.12 We consider an alternative procedure for inverting the demand in Example 2. This is because this example (and also the example in Section 3.4 in the Online Appendix) has a specific structure. We note that the inversion of Berry et al. (2013) can still be applied to bundles if one treats each bundle as a separate good, and recasts the bundle choice problem into a standard multinomial choice problem. However, as can be seen from (2.5), Example 2 has the additional structure that the utility of a bundle is the combination of the utilities for each good and extra utilities, and hence the model does not involve any bundle-specific unobserved characteristic. This structure in turn implies that the dimension of the unobservable product characteristic |$\Xi _t$| equals the number of goods |$J$|, while the econometrician in general observes |$\dim (S)= 2^J$| aggregate choice probabilities over bundles, which leads to a system of equations whose number of restrictions exceeds the number of unknown quantities. This suggests that (i) using only a part of the demand system is sufficient for obtaining an inversion, which can be used to identify |$f_\theta$|, and (ii) using additional subcomponents of |$S$|, one may potentially overidentify the parameter of interest. We therefore consider an inversion that exploits a monotonicity property of the demand system that follows from this structure.13 For this, we assume that the following condition is met.
Condition 3.1. The random coefficient density |$f_{\theta }$| is continuously differentiable. In addition, |$(\epsilon _{i1t},\epsilon _{i2t})$| and |$(D_{1t},D_{2t})$| each have full support in |$\mathbb {R}^2$|.
Let |$\tilde{\mathbb {L}}=\lbrace (1,0),(1,1)\rbrace .$| From (2.7) it is straightforward to show that |$\varphi _{(1,0)}$| is strictly increasing in |$D_{1t}$|, but is strictly decreasing in |$D_{2t}$|, while |$\varphi _{(1,1)}$| is strictly increasing both in |$D_{1t}$| and |$D_{2t}$|. Hence, the Jacobian matrix is nondegenerate. Together with a mild support condition on |$(D_{1t},D_{2t})$|, this allows us to invert the demand (sub)system and write |$\Xi _{\textit {jt}}=\psi _j(X^{(2)}_t,P_t,\tilde{S}_t)-X^{(1)}_{\textit {jt}},$| where |$\tilde{S}_t=(S_{(1,0),t},S_{(1,1),t})$|. This ensures Assumption 2.2 in this example (see Lemma S.1 given in the Online Appendix). By Lemma 2.1, one can then nonparametrically identify subcomponents |$(\varphi _{(1,0)},\varphi _{(1,1)})$| of the demand function |$\varphi$|.
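The step from the monotonicity pattern to the nondegenerate Jacobian can be spelled out as follows. This is a sketch that assumes the partial derivatives exist and that strict monotonicity translates into strict signs of these derivatives; under that assumption the determinant is a sum of two strictly positive terms.

```latex
\det
\begin{pmatrix}
\dfrac{\partial \varphi_{(1,0)}}{\partial D_{1t}} & \dfrac{\partial \varphi_{(1,0)}}{\partial D_{2t}} \\[2ex]
\dfrac{\partial \varphi_{(1,1)}}{\partial D_{1t}} & \dfrac{\partial \varphi_{(1,1)}}{\partial D_{2t}}
\end{pmatrix}
=
\underbrace{\frac{\partial \varphi_{(1,0)}}{\partial D_{1t}}}_{>0}\,
\underbrace{\frac{\partial \varphi_{(1,1)}}{\partial D_{2t}}}_{>0}
-
\underbrace{\frac{\partial \varphi_{(1,0)}}{\partial D_{2t}}}_{<0}\,
\underbrace{\frac{\partial \varphi_{(1,1)}}{\partial D_{1t}}}_{>0}
> 0 .
```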
One may alternatively choose |$\tilde{\mathbb {L}}=\lbrace (0,0),(0,1)\rbrace$|, and the argument is similar, which then identifies |$(\varphi _{(0,0)}, \varphi _{(0,1)})$|, and hence all components of the demand function |$\varphi$| are identified. This inversion is valid even if the two goods are complements. This is because the inversion uses the monotonicity property of the aggregate choice probabilities on bundles (e.g., |$\phi _{(1,0)}$| and |$\phi _{(1,1)}$|) with respect to |$(D_{1t}, D_{2t})$|. Hence, even if the aggregate share of each good (e.g., aggregate share on good 1: |$\sigma _1=\phi _{(1,0)}+\phi _{(1,1)}$|) is not invertible in the price |$P_t$| due to the presence of complementary goods, one can still obtain a useful inversion provided that aggregate choice probabilities on bundles are observed.
Given the demand for bundles, we now analyse identification of the random coefficient density. By (2.5), the demand for bundle (0, 0) is given by
Given product |$j\in \lbrace 1,2\rbrace$|, let |$-j$| denote the other product. We then define |$\tilde{\Phi }_l$| with |$l=(0,0)$| as in the BLP example by letting |$D_{-jt}$| take a large negative value. For each |$(x^{(2)},p,\delta )$|, let
We then define |$\Phi _{(0,0)}$| as in (3.5).14 Consider for the moment |$j=1$| in (3.17). Then, |$\Phi _{(0,0)}$| is related to the joint density |$f_{\vartheta _1}$| of |$\vartheta _{i1t}\equiv (\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}},\epsilon _{i1t})$| through a Radon transform.15 Arguing as in (3.6), it is straightforward to show that |$\partial \Phi _{(0,0)}(w,u)/\partial u=\mathcal {R}[f_{\vartheta _1}](w,u)$| with |$w\equiv (x^{(2)}_1,p_1,1)/\Vert (x^{(2)}_1,p_1,1)\Vert$| and |$u\equiv \delta _1/\Vert (x^{(2)}_1,p_1,1)\Vert$|. Hence, one may identify |$f_{\vartheta _1}$| by inverting the Radon transform under Assumptions 3.1 and 3.2 with |$J=2$|.
If the researcher is only interested in the distribution of |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}},\epsilon _{\textit {ijt}})$|, but not in the bundle effect, the demand for (0, 0) is enough for recovering their density. However, |$\Delta _{\textit {it}}$| is often of primary interest. The demand on (1, 1) can be used to recover its distribution by the following argument.
The demand for bundle (1, 1) is given by
Note that |$\Delta _{\textit {it}}$| can be viewed as an additional random coefficient on the constant whose sign is fixed. Hence, the set of covariates includes a constant. Again, conditioning on an event where |$D_{-jt}$| takes a large negative value and normalizing the arguments by the norm of |$( x^{(2)}_j,p_j,1)$| yield a function |$\Phi _{(1,1)}$| that is related to the density of |$\eta _{\textit {ijt}}\equiv (\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}},\Delta _{\textit {it}}+ \epsilon _{\textit {ijt}})$| through the Radon transform in (3.2). Note that the last components of |$\eta _j$| and |$\vartheta _j$| differ only in the bundle effect |$\Delta _{\textit {it}}$|. Hence, if |$\epsilon _{\textit {ijt}}$| is independent of |$\Delta _{\textit {it}}$| conditional on |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$|, the distribution of |$\Delta _{\textit {it}}$| can be identified via deconvolution. For this, let |$\Psi _{\epsilon _{j}|(\beta ^{(2)},\alpha )}$| denote the characteristic function of |$\epsilon _{\textit {ijt}}$| conditional on |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$|. We summarize these results in the following theorem.
Theorem 3.3. Suppose Assumptions 2.1–3.2, 3.4, and Condition 3.1 hold with |$J=2$| and |$\theta _{\textit {it}}=(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}},\Delta _{\textit {it}},\epsilon _{i1t}, \epsilon _{i2t})$|. Suppose the conditional distribution of |$\epsilon _{\textit {ijt}}$| given |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$| is identical for |$j=1,2$|. Then, (a) |$f_{\vartheta _j},f_{\eta _j}$| are nonparametrically identified in Example 2; (b) if, in addition, |$\Delta _{\textit {it}}\perp \epsilon _{\textit {ijt}}|(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$| and |$\Psi _{\epsilon _{j}|( \beta ^{(2)},\alpha )}(t)\ne 0$| for almost all |$t\in \mathbb {R}$| and for some |$j$|, and |$\epsilon _{\textit {ijt}},j=1,2$| are independently distributed (across |$j$|) conditional on |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$|, then |$f_\theta$| is nonparametrically identified in Example 2.
The identification of the distribution of the bundle effect requires the characteristic function of |$\epsilon _{\textit {ijt}}$| to have isolated zeros (see, e.g., Devroye, 1989; Carrasco and Florens, 2011). This condition can be satisfied by various distributions, including the Type-I extreme value distribution and normal distribution.
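The deconvolution step behind part (b) can be written out in terms of conditional characteristic functions. The display is a sketch: the symbols |$\Psi _{\Delta |(\beta ^{(2)},\alpha )}$| and |$\Psi _{\Delta +\epsilon _j|(\beta ^{(2)},\alpha )}$| are notation introduced here by analogy with |$\Psi _{\epsilon _j|(\beta ^{(2)},\alpha )}$|. The numerator is obtained from |$f_{\eta _j}$| and the denominator from |$f_{\vartheta _j}$|.

```latex
% Conditional independence of \Delta_{it} and \epsilon_{ijt} given (\beta^{(2)}_{it},\alpha_{it})
% makes the characteristic function of the sum factorize:
\Psi_{\Delta+\epsilon_j \mid (\beta^{(2)},\alpha)}(t)
  = \Psi_{\Delta \mid (\beta^{(2)},\alpha)}(t)\,
    \Psi_{\epsilon_j \mid (\beta^{(2)},\alpha)}(t),
\qquad\text{hence}\qquad
\Psi_{\Delta \mid (\beta^{(2)},\alpha)}(t)
  = \frac{\Psi_{\Delta+\epsilon_j \mid (\beta^{(2)},\alpha)}(t)}
         {\Psi_{\epsilon_j \mid (\beta^{(2)},\alpha)}(t)}
\quad\text{whenever } \Psi_{\epsilon_j \mid (\beta^{(2)},\alpha)}(t)\neq 0 .
```

Because a characteristic function is continuous, knowing the ratio for almost all |$t$| determines |$\Psi _{\Delta |(\beta ^{(2)},\alpha )}$| everywhere. For instance, the characteristic function of a standard normal variable is |$\exp (-t^2/2)$| and that of a standard Type-I extreme value variable is |$\Gamma (1-it)$|, neither of which vanishes on the real line.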
Note that the conditions of Theorem 3.3 do not impose any sign restriction on |$\Delta _{\textit {it}}$|. Hence, the two goods can be substitutes |$(\Delta _{\textit {it}}\lt 0)$| for some individuals and complements (|$\Delta _{\textit {it}}\gt 0$|) for others. This feature, therefore, can be useful for analyzing bundles of goods whose substitution pattern can significantly differ across individuals (e.g., e-books and print books).
We note that the utility specification adopted in the pure characteristics model can also be combined with the bundle choice and multiple units of consumption studied in Section 3.4 in the Online Appendix. The identification of the random coefficients can be achieved using arguments similar to the ones in Section 3.2.
3.4. Multiple units of consumption (Example 1)
We consider settings where multiple units of consumption are allowed. For simplicity, we consider the setup where |$J=2$|, |$Y_{1}\in \lbrace 0,1,2\rbrace$|, and |$Y_{2}\in \lbrace 0,1\rbrace .$| The utility from consuming |$y_{1}$| units of product 1 and |$y_{2}$| units of product 2 is specified as follows:
where |$\Delta _{i,(y_{1},y_{2}),t}$| is the additional utility (or disutility) from consuming the particular bundle |$(y_{1},y_{2})$|. This specification allows, e.g., for decreasing marginal utility (with the number of units), as well as interaction effects. We assume that |$\Delta _{(1,0)}=\Delta _{(0,1)}=0$| as |$U_{i1t}^{\ast }$| and |$U_{i2t}^{\ast }$| give the utility from consuming a single unit of each of the two goods. Throughout this example, we assume that |$U_{i,(y_{1},y_{2}),t}^{\ast }$| is concave in |$(y_{1},y_{2})$|. Then, a bundle is chosen if its utility exceeds those of the neighbouring alternatives. For example, bundle (2, 0) is chosen if it is preferred to bundles (1, 0), (1, 1), and (2, 1). That is,
The aggregate structural demand can be obtained as
The observed aggregate demand for the bundles is defined in a similar way as |$S_{l,t}=\varphi _{l}(X_{t},P_{t},\Xi _{t})$|, |$l\in \mathbb {L}$|, where |$\mathbb {L} \equiv \lbrace (0,0),(1,0),(0,1),(1,1),(2,0),(2,1)\rbrace .$|
Let |$\tilde{\mathbb {L}}=\lbrace (2,0),(2,1)\rbrace .$| From (3.20), |$\varphi _{(2,0)}$| is increasing in |$D_1$|, but is decreasing in |$D_2$|. Similarly, |$\varphi _{(2,1)}$| is increasing in both |$D_1$| and |$D_2$|. The rest of the argument is similar to Example 2. This ensures Assumption 2.2 in this example, and, by Lemma 2.1, one can then nonparametrically identify subcomponents |$\lbrace \varphi _l,l\in \tilde{\mathbb {L}} \rbrace$| of the demand function |$\varphi$|. One may alternatively take |$\tilde{ \mathbb {L}}=\lbrace (0,0),(0,1)\rbrace$| and use the same line of argument. Note, however, that (1, 0) or (1, 1) cannot be included in |$\tilde{\mathbb {L}}$| as |$\phi _{(1,0)}$| and |$\phi _{(1,1)}$| are not monotonic in one of |$(D_1,D_2)$|. This is because increasing |$D_1$| while fixing |$D_2$|, for example, makes good 1 more attractive and creates both an inflow of individuals who move from (0, 0) to (1, 0) and an outflow of individuals who move from (1, 0) to (2, 0). Hence, the demand for (1, 0) does not necessarily change monotonically.
The nonparametric IV step identifies |$\phi _{l}$| for |$l\in \lbrace (0,0),(0,1),(2,0),(2,1)\rbrace$|. Using them, we may first recover the joint density of some of the random coefficients: |$\theta _{\textit {it}}=(\beta _{\textit {it}}^{(2)}, \alpha _{\textit {it}},\epsilon _{i1t},\epsilon _{i2t}, \Delta _{i,(1,1),t}, \Delta _{i,(2,0),t},\Delta _{i,(2,1),t})^{\prime }.$| We begin with the demand for (0, 0), (0, 1), (2, 0), and (2, 1) given by
Hence, if |$D_{2t}$| has a large support, by taking |$\delta _2$| sufficiently small or sufficiently large, we may define
For each |$l\in \lbrace (0,0),(0,1),(2,0),(2,1)\rbrace$|, define |$\Phi _{l}$| as in (3.5). Arguing as in Example 2, |$\Phi _l$| is then related to the random coefficient densities by
where |$w\equiv -(x^{(2)}_1,p_1,1)/\Vert (x^{(2)}_1,p_1,1)\Vert$| and |$u\equiv \delta _1/\Vert (x^{(2)}_1,p_1,1)\Vert$|. Here, for each |$l$|, |$f_{\vartheta _l}$| is the joint density of a subvector |$\vartheta _{i,l,t}$| of |$\theta _{\textit {it}}$|, which is given by16
The joint density of |$\theta _{\textit {it}}$| is identified by making the following assumption.
Assumption 3.5. (i) |$(\Delta _{i,(1,1),t},\Delta _{i,(2,0),t}, \Delta _{i,(2,1),t})\perp \epsilon _{\textit {ijt}}|(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$| and |$\Psi _{\epsilon _{j}|(\beta ^{(2)},\alpha )}(t)\ne 0$| for almost all |$t\in \mathbb {R}$| and for some |$j\in \lbrace 1,2\rbrace$|; (ii) |$\epsilon _{\textit {ijt}},j=1,2$| are independently and identically distributed (across |$j$|) conditional on |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$|; (iii) |$(\Delta _{i,(1,1),t}, \Delta _{i,(2,0),t},\Delta _{i,(2,1),t})$| are independent of each other conditional on |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}})$| and |$\Psi _{\Delta _{(1,1)}|( \beta ^{(2)},\alpha )}(t)\ne 0$| for almost all |$t\in \mathbb {R}$|.
Assumption 3.5 (iii) means that, relative to the benchmark utility given as an index function of |$(X^{(2)}_t,P_t,D_t)$|, the additional utilities from the bundles are independent of each other. Assumption 3.5 (iii) also adds a regularity condition for recovering the distribution of |$\Delta _{i,(2,1),t}$| from those of |$\Delta _{i,(2,1),t}-\Delta _{i,(1,1),t}$| and |$\Delta _{i,(1,1),t}$| through deconvolution.
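The deconvolution step can be sketched numerically. If two variables are independent, the characteristic function of their difference |$W=\Delta _{(2,1)}-\Delta _{(1,1)}$| factors as |$\Psi _{W}(t)=\Psi _{\Delta _{(2,1)}}(t)\,\Psi _{\Delta _{(1,1)}}(-t)$|, so dividing by the second factor recovers |$\Psi _{\Delta _{(2,1)}}$| wherever that factor is nonzero. The sketch below is only an illustration under assumptions that are not in the paper: it drops the conditioning on |$(\beta ^{(2)},\alpha )$|, uses hypothetical Gaussian distributions, and replaces the identified distributions by simulated draws. The helper names `ecf` and `deconvolve_cf` are made up for this example.

```python
import cmath
import random


def ecf(sample, t):
    """Empirical characteristic function: sample mean of exp(i * t * x)."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


def deconvolve_cf(diff_sample, d11_sample, t):
    """Characteristic function of D21 at t from draws of W = D21 - D11 and of D11.

    Independence gives Psi_W(t) = Psi_D21(t) * Psi_D11(-t); the division is
    valid only where Psi_D11(-t) is nonzero (the regularity condition).
    """
    return ecf(diff_sample, t) / ecf(d11_sample, -t)


rng = random.Random(0)
n = 50_000
d11 = [rng.gauss(0.3, 1.0) for _ in range(n)]        # hypothetical Delta_(1,1)
d21 = [rng.gauss(-0.5, 0.8) for _ in range(n)]       # hypothetical Delta_(2,1), used only to build W
w = [a - b for a, b in zip(d21, d11)]                # W = Delta_(2,1) - Delta_(1,1)
d11_fresh = [rng.gauss(0.3, 1.0) for _ in range(n)]  # separate draws standing in for the known law of Delta_(1,1)


def true_cf(t):
    """Characteristic function of N(-0.5, 0.8^2), the target in this toy example."""
    return cmath.exp(1j * (-0.5) * t - 0.5 * (0.8 * t) ** 2)


for t in (0.5, 1.0):
    print(t, abs(deconvolve_cf(w, d11_fresh, t) - true_cf(t)))
```

The printed discrepancies reflect simulation error only. In practice, the division becomes unstable where the denominator is close to zero, which is one reason the non-vanishing condition on the characteristic function matters.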
Identification of the joint density |$f_{\theta }$| allows one to recover the demand for the middle alternative: (1, 0), which remained unidentified in our analysis in the nonparametric IV step. To see this, we note that the demand for this bundle is given by
Since the previously unknown density |$f_{\theta }$| is identified, this demand function is identified. This and |$\phi _{(1,1)}=1-\sum _{l\in \mathbb {L}~\setminus ~\lbrace (1,1)\rbrace }\phi _l$| further imply that all components of |$\phi$| are now identified. We summarize these results below as a theorem.17
Suppose |$U_{(y_1,y_2),t}$| is concave in |$(y_1,y_2)$|. Suppose Condition 3.1 and Assumptions 2.1, 2.3–3.1, and 3.4 hold with |$J=2$| and |$\theta _{\textit {it}}=(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}},\epsilon _{i1t},\epsilon _{i2t}, \Delta _{i,(1,1),t},\Delta _{i,(2,0),t},\Delta _{i,(2,1),t})$|. Suppose that |$(X_{1t},P_{1t},D_{1t})$| has full support. Then, (a) all densities |$f_{\vartheta _l}$| for |$l\in \lbrace (0,0),(0,1),(2,0),(2,1)\rbrace$| are nonparametrically identified in Example 1; (b) if, in addition, Assumption 3.5 holds, then |$f_{\theta }$| is identified in Example 1, and all components of the structural demand |$\phi$| are identified.
4. CONCLUSION
This paper is concerned with the nonparametric identification of models of market demand. It provides a general framework that nests several important models, including the workhorse BLP model, and provides conditions under which these models are point identified. An important conclusion is that the sufficient assumptions for recovering various objects differ; in particular, it is easier to identify demand elasticities and more difficult to identify the individual-specific random coefficient densities. Moreover, the data requirements are also shown to vary with the model considered. The identification analysis is constructive, extends the classical nonparametric BLP identification as analysed in BH to other models, and opens up the way for future research on sample counterpart estimation.
FUNDING
Hiroaki Kaido gratefully acknowledges funding by the NSF grants SES-1824344 and SES-2018498.
Footnotes
There are extensions of the BLP framework that allow for the use of microdata; see Berry et al. (2004, MicroBLP). In this paper, we focus on the aggregate demand version of BLP, and leave the analogous analysis for MicroBLP to future research.
Note that simultaneous changes in product characteristics and price are allowed. Hence, one can investigate how much price change is required to compensate for a change (e.g., downgrading of a feature) in one of the product characteristics to let a certain fraction of individuals receive a nonnegative utility change, i.e., |$P(\Delta U_{\textit {ijt}}\ge 0)\ge \tau$| for some pre-specified |$\tau \in [0,1]$|, where |$\Delta U_{\textit {ijt}}$| denotes the utility change.
In Dunker et al. (2014b), we also use the insights obtained to propose a parametric estimator for models where there had not been an estimator before.
See Berry and Pakes (2007) for more details.
|$V_{jk}$| is a random variable that varies across individuals and markets, and hence should be denoted as |$V_{ijkt}$| in principle. For conciseness, we drop subscripts |$i$| and |$t$| below.
More precisely, the Radon transform |$\mathcal {R}[f_{\vartheta _{j}}](w,u)$| gives |$f_{\vartheta _{j}}$|’s integral along each hyperplane |$P_{w,u}=\lbrace v\in \mathbb {R}^{d_{\theta }}:v^{\prime }w=u\rbrace$| defined by the angle |$w=(x_{j}^{(2)},p_{j},1)/\Vert (x_{j}^{(2)},p_{j},1)\Vert$| and offset |$u=\delta _{j}/\Vert (x_{j}^{(2)},p_{j},1)\Vert$|. For recovering |$f_{\vartheta _{j}}$| from its Radon transform, one needs exogenous variations in both. Our proof uses the fact that varying |$w$| over the hemisphere |$\mathbb {H}_{+}\equiv \lbrace w=(w_{1},w_{2},\cdots ,w_{d_{\vartheta _{j}}})\in \mathbb {S}^{d_{\vartheta _{j}}-1}:w_{d_{\vartheta _{j}}}\ge 0\rbrace$| and |$u$| over |$\mathbb {R}$| suffices to recover |$f_{\vartheta _{j}}$|.
Note however that this marginalized demand still depends on the joint distribution of the entire random coefficient vector.
Note that the additional independence (or i.i.d.) assumptions on |$(\epsilon _{i1t},\cdots ,\epsilon _{\textit {iJt}})$| are not needed in the pure characteristics model.
It is also possible to analyse the case without the tastes for products. We refer to Dunker et al. (2014b), an earlier version of the paper.
The additional structure can potentially be tested. In Example 2, one may identify the demand for bundles (1, 0) and (1, 1) using the inversion described below under the hypothesis that equation (2.5) holds. Further, treating (1, 0), (0, 1), and (1, 1) as three separate goods (and (0, 0) as an outside good) and applying the inversion of Berry et al. (2013), one may identify the demand for bundles (1, 0) and (1, 1) without imposing (2.5). The specification can then be tested by comparing the demand functions obtained from these distinct inversions. We are indebted to Phil Haile for this point.
Since the bundle effect |$\Delta _{\textit {it}}$| does not appear in (3.16), one may only identify the joint density of the subvector |$(\beta ^{(2)}_{\textit {it}},\alpha _{\textit {it}},\epsilon _{i1t})$| from the demand for bundle (0, 0).
Alternative assumptions can be made to identify the joint density of different components of the random coefficient vector. For example, a large support assumption on |$D_{1t}$| would allow one to recover the joint density of |$(\beta _{\textit {it}}^{(2)},\alpha _{\textit {it}},\epsilon _{i2t}+\Delta _{i,(2,1),t}- \Delta _{i,(2,0),t})$| from the demand for bundle (2, 0).
For simplicity, we only consider the case where |$\delta _{2}\rightarrow -\infty$| or |$\infty$| in (3.22)–(3.23). This requires a full support condition on |$D_{1t}$|. It is possible to replace this assumption with an analogue of Assumption 3.3 by also considering the case where |$\delta _{1}\rightarrow -\infty$| or |$\infty$| and imposing an additional restriction on the distribution of |$(\epsilon _{i1t},\epsilon _{i2t},\Delta _{i,(1,1),t},~\Delta _{i,(2,0),t},\Delta _{i,(2,1),t})$|.
REFERENCES
Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher’s website:
Online Appendix
Replication Package
Co-editor Victor Chernozhukov handled this manuscript.
APPENDIX A: PROOFS
In this Appendix we present the proofs of Lemma 2.1 and Theorem 3.1. All other proofs are in the Online Appendix.
It remains to show that |$\mathcal {F}f_\theta$| is known in some open neighbourhood. By the Fourier slice theorem for the Radon transform |$(\mathcal {F}f_\theta )(w\eta ) = \mathcal {F}_1(\mathcal {R}f_\theta [w,\cdot ])(\eta )$|. Here |$\mathcal {F}_1$| denotes the one-dimensional Fourier transform that acts on the free variable denoted by ‘|$~\cdot ~$|’. Note that |$\mathcal {R}f_\theta [w,\delta ] = 0$| if |$w \in \mathcal {U}$|, but |$(w,\delta ) \notin \big \lbrace (w,w^{\prime }t)|w\in \mathcal {U}, t \in \mathrm{supp}\, (\theta )\big \rbrace$|. Thus, if |$\mathcal {R}f_\theta [w,\delta ]$| is known for all |$(w,\delta ) \in \big \lbrace (w,w^{\prime }t)|w\in \mathcal {U}, t \in \mathrm{supp}\, (\theta )\big \rbrace$|, it is known for all |$w \in \mathcal {U}$| and all |$\delta \in \mathbb {R}$|. It follows that |$\mathcal {F}f_\theta$| is known on some open neighbourhood. This identifies |$f_\theta$|.
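The Fourier slice identity used above can be checked numerically in a simple case. The sketch below is an illustration under assumptions that are not part of the proof: it takes a hypothetical two-dimensional Gaussian density with independent components, adopts the convention |$(\mathcal {F}f)(\xi )=\int f(v)e^{-i\xi ^{\prime }v}dv$|, and approximates all integrals by Riemann sums on a truncated grid. It verifies only the slice identity for given |$(w,\eta )$|, not the extension from an open set of directions to the whole density.

```python
import cmath
import math

M = (0.5, -0.3)   # hypothetical mean of theta
S = (1.0, 0.7)    # hypothetical standard deviations (independent components)
H = 0.05          # grid step for the Riemann sums
GRID = [-10.0 + H * k for k in range(401)]  # truncated integration range [-10, 10]


def density(v1, v2):
    """Bivariate Gaussian density with independent components."""
    z1 = (v1 - M[0]) / S[0]
    z2 = (v2 - M[1]) / S[1]
    return math.exp(-0.5 * (z1 * z1 + z2 * z2)) / (2.0 * math.pi * S[0] * S[1])


def radon(angle, u):
    """Integral of the density over the line {v : v'w = u}, with w = (cos a, sin a)."""
    c, s = math.cos(angle), math.sin(angle)
    # Points on the line are u * w + r * w_perp, where w_perp = (-sin a, cos a).
    return H * sum(density(u * c - r * s, u * s + r * c) for r in GRID)


def slice_via_radon(angle, eta):
    """One-dimensional Fourier transform of u -> R f(w, u), evaluated at eta."""
    return H * sum(radon(angle, u) * cmath.exp(-1j * eta * u) for u in GRID)


def fourier_2d(angle, eta):
    """Closed-form Fourier transform of the Gaussian density at the point eta * w."""
    x1, x2 = eta * math.cos(angle), eta * math.sin(angle)
    return cmath.exp(-1j * (x1 * M[0] + x2 * M[1])
                     - 0.5 * ((x1 * S[0]) ** 2 + (x2 * S[1]) ** 2))


# Compare the two sides of the slice identity for one direction and frequency.
print(abs(slice_via_radon(0.4, 1.3) - fourier_2d(0.4, 1.3)))
```

Because the Radon transform of a compactly supported density vanishes outside the set of offsets generated by the support, knowing it on that set for directions in |$\mathcal {U}$| determines the left-hand side for every |$\eta$| along those directions, which is the step the proof exploits.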
(i) First, under the linear random coefficient specification, the connected substitutes assumption in Berry et al. (2013) is satisfied. By theorem 3.1 in Berry et al. (2013), Assumption 2.2 is satisfied. Then, by Assumptions 2.1–2.3 and Lemma 2.1, |$\psi$| is identified. Further, the aggregate demand |$\phi$| is identified by (2.10) and the identity |$\phi _0=1-\sum _{j=1}^J\phi _j$|.