Calogero Carletto, Better data, higher impact: improving agricultural data systems for societal change, European Review of Agricultural Economics, Volume 48, Issue 4, September 2021, Pages 719–740, https://doi.org/10.1093/erae/jbab030
Abstract
The agricultural sector is undergoing a period of rapid transformation, driven by the powerful and interconnected impacts of climate change, demographic transitions and uneven economic growth around the world. For governments and the international community, navigating this period of upheaval in a way that protects vulnerable populations and ensures positive societal change will require a similar degree of transformation within agricultural data systems. While technological innovation has resulted in substantive improvements in the availability, timeliness and overall quality of agricultural data, many technical and institutional challenges remain. This paper reviews recent developments in the agricultural data landscape, highlights existing constraints to further progress and argues for agricultural economists to take responsibility for building agricultural data systems equipped to respond to the diverse needs of a changing world.
1. Introduction
The world is experiencing a series of massive, interconnected changes of unprecedented proportion. Accelerating climate change is resulting in ever more frequent and extreme weather events. Major demographic shifts are occurring, with populations increasingly moving from rural to urban spaces and across borders. Meanwhile, rapid economic transformation has led to uneven impacts across countries, population groups and sectors of the economy. The agricultural sector is highly vulnerable to these significant changes, particularly the 500 million family farms who are most likely to bear the brunt of their impacts (Lowder et al., 2016; FAO, 2014). The stressors created by these large-scale changes are adding undue pressure to the fragile livelihoods of poor farmers and other vulnerable groups across the globe, as well as the ecosystems upon which they rely. One such stressor is the scale of rapid transformation in the food systems where farm households participate as producers, consumers, processors and traders. Furthermore, gaps in research on food systems limit our ability to quantify and link decisions related to production, consumption, processing and transportation to externalities on the environment, health and livelihoods more broadly. This shortcoming is partially attributable to the lack of high-quality data necessary for studying the evolving food system paradigm in an integrated and interdisciplinary manner.
The global community cannot meet its commitments to the 2030 Sustainable Development agenda without addressing the plights of the agricultural sector, which remains the ultimate reservoir of employment and economic growth for many low- and middle-income countries (Ivanic and Martin, 2018; Christiaensen and Martin, 2018). This will require global and national institutions, as well as individual actors such as academic researchers, NGOs and private citizens, to work together on establishing and maintaining data systems that can inform the design of interventions and policies needed to address societal challenges.
Agricultural data quality, including issues related to measurement error, coverage or relevance, is increasingly central to this discussion, as these problems significantly constrain the credibility and impact of agricultural research. Attention to data quality issues has too often been inconsistent, with demonstrably more effort dedicated to the use of ex-post adjustments based on econometric solutions than to investing in better ex-ante choices at the design stage. Constraints to external validity are equally important to address, as they limit the potential impact of research. Governments need scalable solutions to questions that cannot be addressed by narrowly focused studies. They also need evidence on the effectiveness of one intervention relative to others, taking cost considerations into account. Unfortunately, incentives for individual researchers are not always aligned with achieving these goals. It is often difficult for an individual researcher to realise the larger potential for their work, given skewed incentives and missing information. These systemic imperfections in data systems are unlikely to be resolved without adequately incentivising coordination and collaboration towards achieving net gains for all involved.
Furthermore, technological innovation and the allure of new data sources have diverted attention away from strengthening the foundational data critical for addressing policy questions as well as for validating and calibrating new modes of data collection. While the recent coronavirus disease 2019 (COVID-19) pandemic has exposed major weaknesses in data systems, it has also highlighted the adaptability and willingness of the data community to embrace change and innovate at an accelerated pace (UNSD and World Bank, 2020). This underscores the need to better exploit complementarities between traditional and alternative data sources and methods, which will require both technical solutions as well as creative institutional arrangements that foster collaboration and value addition.
This paper aims to address a number of key issues towards increasing the contribution of the agricultural economics profession to positive and lasting societal change. This includes addressing the role of agricultural economists in building better data systems and bridging the disconnect between agricultural data, agricultural research and agricultural policy impact. Fundamental to this process is incentivising individual researchers to create data public goods and data systems that are adaptable to the multiple purposes of a diverse and complex world.
Despite recent developments driven by rapid innovation and technological change in data production, we continue to lack the data needed to support vibrant and informed policy dialogues in the agricultural sector. Agricultural statistics remain inadequate in terms of their availability, quality and relevance, particularly in light of new and emerging demands. Agricultural economists could play a far larger role in shaping systems of agricultural data that could in turn vastly increase the impact and policy relevance of their work. Data systems are not predetermined; individual researchers and data producers have the power to influence and shape them through their own choices. Furthermore, advances in technology and new data sources offer a unique opportunity for the agricultural economics profession to harness the full power of data for positive societal change, particularly if existing disincentives are removed and linkages between traditional and new data sources are strengthened. This will require transforming the way we generate and improve agricultural data, select research topics and disseminate and use data.
The remainder of the paper is organised as follows. Section 2 details the constraints and limitations of existing agricultural data, both in terms of availability and quality. Section 3 presents recent developments on measuring key data constructs as well as the opportunities offered by technological innovation and new data sources. Section 4 explores how the agricultural economics profession can address key limitations and take advantage of new opportunities. Finally, the paper concludes with suggestions for moving the data agenda forward towards fostering positive change. Throughout, the paper focuses on low-income countries, where agriculture is most important, agricultural data are most lacking and improvements are most likely to result in greater impact.
2. What’s holding us back?
Increasing the impact of agricultural economics begins with improving the evidence base for the research needed to inform policies in the agricultural sector and beyond. The availability of accurate, timely and relevant data has often been the primary constraint to sound research and ultimately to effective policymaking in the sector. However, increasing the amount or even the quality of data alone will not intrinsically lead to greater impact unless the data are used, interpreted and mainstreamed into country policy dialogues. Equally detrimental has been our collective failure to properly quantify and communicate the value of data with simple and accessible metrics of impact. Lack of analytical capacity among key stakeholders, particularly in low-income countries, has been another crippling weakness in agricultural data systems. Improving the supply of data must go hand in hand with increasing local demand as well as capacity for data use. By demonstrating the value of high-quality data to sound agricultural policy research, agricultural economists can serve as catalysts for changes at both the technical and institutional levels.
Recent years have brought about a seeming abundance of new sources of data, including satellite data, call detail records, data from sensors, citizen-generated data, social media data and more. Many of these data sources offer significant advantages over more traditional forms of data collection, including greater ease of data collection, higher levels of spatial and temporal granularity, timeliness and perhaps above all, a tremendous expansion in the scope of what can be measured. Unfortunately, these sources are also often accompanied by a host of complications that interfere with their utility, including selectivity and coverage bias as well as unreliable or unvalidated methodologies. Additionally, they are often limited in their scope for widespread adoption in low-capacity contexts. Integrating these types of data sources with traditional data sources such as household surveys allows data producers to fully exploit the potential of both types of data, increasing the scope, timeliness and frequency of household survey data while simultaneously validating and calibrating newer data sources.
A number of constraints, both technical and institutional, can be identified. Due to the complexity of agricultural processes and of smallholder agriculture in particular—extending over long periods of time with significant seasonal and inter-annual variation, lacking proper record-keeping of non-salient, frequent events and so on—agricultural data are often plagued by poor quality and low credibility. The most common methods for collecting agricultural data, whether based on self-reported recall over long time periods or on expert opinions, suffer from large biases and measurement errors. Empirical evidence of these systematic biases abounds in recent literature, and new technologies and improved tools are increasingly being used to mitigate some of these shortcomings (Abay et al., 2019; Dillon et al., 2019; Desiere and Jolliffe, 2018; Gourlay, Kilic and Lobell, 2019; Kosmowski et al., 2019; Bevis and Barrett, 2019; Carletto, Savastano and Zezza, 2013; Carletto, Gourlay and Winters, 2015, 2017b; Kilic et al., 2021; Gaddis et al., 2019).
The limited integration and interoperability of agricultural data has contributed to making today’s agricultural data less relevant to tomorrow’s policy challenges. While countries need productivity data for major crops on a regular basis and at the requisite spatial resolution, it is equally important to have the right data to explain differences across farming systems and countries, towards understanding the role of agriculture in the broader development context and in the light of new and emerging challenges. Improving data integration and interoperability across data sources would greatly contribute to overcoming the limitations of individual data sources in achieving the temporal and spatial resolution needed for many applications. Linking censuses and surveys to create small area estimations is a well-known example of successful interoperability of data sources that can be achieved with few ex-ante investments at the design stage. Another example of the power of interoperability is the use of mixed-mode data collection, where a high-frequency phone survey is embedded into a less frequent but more detailed face-to-face survey, with the latter serving as both a representative sampling frame and a source of the information needed to adjust for potential non-response and under-coverage biases of phone surveys. Addressing this lack of integration involves both technical and institutional factors.
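The mixed-mode adjustment described above can be illustrated with a minimal sketch. It assumes the simplest possible correction, post-stratification on a single stratum variable observed in the face-to-face baseline; actual applications typically rely on richer response-propensity models. The function names, strata and data below are hypothetical and serve only as an illustration.

```python
from collections import Counter


def poststratification_weights(baseline_strata, phone_strata):
    """Weight phone respondents so stratum shares match the baseline sample.

    Assumption: the face-to-face baseline is representative, and every
    stratum seen in the phone sample also appears in the baseline.
    """
    base_counts = Counter(baseline_strata)
    phone_counts = Counter(phone_strata)
    n_base = len(baseline_strata)
    n_phone = len(phone_strata)
    weights = []
    for stratum in phone_strata:
        base_share = base_counts[stratum] / n_base
        phone_share = phone_counts[stratum] / n_phone
        weights.append(base_share / phone_share)
    return weights


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: rural households are under-covered by the phone survey.
baseline = ["rural"] * 70 + ["urban"] * 30
phone = ["rural"] * 4 + ["urban"] * 6
outcome = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # illustrative binary indicator

weights = poststratification_weights(baseline, phone)
naive = sum(outcome) / len(outcome)
adjusted = weighted_mean(outcome, weights)
print(f"naive mean: {naive:.3f}, adjusted mean: {adjusted:.3f}")
```

In this toy example the unweighted phone estimate understates the indicator because rural respondents, who report it more often, are under-represented relative to the baseline.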
The paucity of individual- and gender-disaggregated data has also been a limiting factor in agricultural research. Progress towards achieving many of the SDG indicators rests on our ability to collect disaggregated data on poverty, employment and agriculture, inter alia. Similarly, agricultural research must rely on data from both male and female farmers to inform policies that address gender differences in access and performance. The common practice of interviewing the head of the household or relying on proxy respondents has been shown to result in significant bias (Kilic and Moylan, 2016; United Nations, 2019; Kilic et al., 2020a; Kilic, Moylan and Koolwal, 2020b). While some progress has been made in recent years, it has been limited by the higher cost and logistical complexities of collecting individual-level data as well as by the lack of tested methodologies.
Given changes in weather patterns and the predominance of rain-fed agriculture in low-income countries, timely, accurate and high-resolution weather data are critical for smallholder farming. Lack of such data has been one of the constraining factors to widespread adoption of index-based weather insurance schemes. Climate and weather data come from a number of sources, including gridded data, weather stations and satellite data (Dell, Jones and Olken, 2014; Auffhammer, Hsiang and Schlenker, 2013). Data from weather stations in low- and middle-income countries are limited at best, if not entirely unavailable. Furthermore, weather station data lack the level of spatial variation needed for farm-level analyses, particularly within low-income countries (Di Falco, Veronesi and Yesuf, 2011). Given that satellite-based weather data may also lack the requisite spatial and temporal granularity, obtaining sufficiently comprehensive weather data is often the result of complex interpolation using multiple data sources (Macours, Premand and Vakis, 2012; Di Falco, Veronesi and Yesuf, 2011). While the use of in-situ sensors at the local, farm or plot levels still presents logistical and cost challenges, sensors can be used in combination with data from other sources to obtain a more accurate picture of weather trends at the local level.
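To give a sense of what spatial interpolation of station data involves, the sketch below uses inverse distance weighting, one of the simplest interpolation schemes. It is not the method used in the studies cited above, which combine multiple data sources; the station coordinates and rainfall values are hypothetical, and coordinates are assumed to be in a projected (planar) system.

```python
import math


def idw_estimate(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target` from station readings.

    target:   (x, y) location of the farm or plot
    stations: iterable of (x, y, value) tuples
    power:    how quickly a station's influence decays with distance
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in stations:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            # The plot coincides with a station, so use its reading directly.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical stations: (x_km, y_km, seasonal rainfall in mm).
stations = [(0.0, 0.0, 100.0), (2.0, 0.0, 200.0)]
midpoint_rain = idw_estimate((1.0, 0.0), stations)
at_station_rain = idw_estimate((0.0, 0.0), stations)
print(midpoint_rain, at_station_rain)
```

With sparse station networks, estimates of this kind are driven by a handful of distant readings, which is one reason station data alone rarely capture farm-level variation.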
Restricted access and dissemination have further contributed to the lack of integration and interoperability of agricultural data, as well as to its limited use and low overall quality. While many low- and middle-income countries lack functioning routine data systems and do not conduct regular agricultural censuses and surveys, data are seldom made available to users even when they exist. The need to maintain confidentiality and ensure the privacy of respondents is often cited as the reason for restricting access, as protected under the statistical laws of the countries. However, evidence in other sectors and for equally sensitive data sources suggests otherwise. For instance, data from population and housing censuses are now more widely accessible through portals such as the Integrated Public Use Microdata Series (IPUMS) from the University of Minnesota (available at http://ipums.org). Similarly, anonymised unit-record data from household surveys are routinely documented and disseminated through platforms such as the Microdata Library (available at http://microdata.worldbank.org). Even when administrative data are made available in aggregate form, lack of digitisation results in processing errors and delayed release. Finally, the common practice of having agricultural officers collect data on indicators against which their performance is measured creates perverse incentives affecting the accuracy and credibility of agricultural data.
Shocks such as the 2007–2008 food price crisis, increasingly frequent extreme weather events, and the most recent COVID-19 pandemic have revealed that existing data systems are insufficient to address the data requirements needed to guide effective and timely policy responses in the agricultural sector. The mobility and social distancing restrictions imposed by the pandemic also exposed some of the weaknesses of face-to-face data collection, the mainstay of current agricultural data systems. While most countries responded by rapidly shifting to alternative modes of data collection such as phone and web surveys, which also enable leaner and more frequent data collection, these changes also underscored the limitations of such solutions, due to the short length of the instruments as well as high rates of non-response and under-coverage resulting from inadequate sampling frames. Mode effects are also likely to affect the quality and comparability of phone and web surveys.
Understanding the role of agriculture for inclusive economic transformation inherently implies two critical features for any agricultural data system. First, inclusivity requires greater data disaggregation along the lines of income, gender and age, among others. Second, transformation requires longitudinal data constructed by following the same households and individuals, often over an extended period of time. Understanding other societal challenges such as climate change and demographic transition also requires data with these features. Despite increases in the availability of longitudinal surveys with a focus on agriculture that collect individual-level data on many relevant constructs, many countries still lack established data systems capable of collecting this kind of detailed data over time.
Finally, much of the existing data and evidence on the agricultural sector lack the needed scale and replicability. The proliferation of small-scale studies and impact evaluations of pilot interventions, while obviously fundamental for the advancement of knowledge, often lack the external validity required to achieve scale at the policy level. Agricultural economists and the broader academic community could fundamentally transform this state of affairs, if we collectively subscribe to a solution that balances individual and public good benefits.
3. What’s propelling us forward?
Over the past decade, the agricultural data landscape has been strengthened by a number of recent data initiatives and a renewed attention to measurement and data quality issues. The 2008 World Development Report (WDR) on ‘Agriculture for Development’, the first WDR on agriculture in decades, coincided with a major food price crisis and a global recession, aiming a spotlight on the absence of the agricultural data needed to guide the recovery. To address this gap, several like-minded development partners and donors pledged significant funds to improving agricultural data systems through a number of initiatives such as the Global Strategy to Improve Agricultural and Rural Statistics (see http://gsars.org/en/), the Living Standards Measurement Study—Integrated Surveys on Agriculture (see https://www.worldbank.org/en/programs/lsms/initiatives/lsms-isa) and most recently, the 50 × 2030 Data Smart Agriculture program (see https://www.50x2030.org/). For the first time since its inception, the forthcoming edition of the Handbook of Agricultural Economics includes a chapter dedicated to data, and it is becoming less of a rarity to see articles in top-tier journals addressing measurement issues in agricultural data (Abay et al., 2019; Bevis and Barrett, 2019; Gourlay, Kilic and Lobell, 2019; Desiere and Jolliffe, 2018; Carletto, Savastano and Zezza, 2013). 2021 also marks the release of the first-ever WDR dedicated to data and its potential to contribute to better lives (World Bank, 2021).
Furthermore, fast-evolving technologies in data collection and the emergence of new data sources and modes of data collection, even in the poorest countries, are contributing to dramatic improvements in the availability, timeliness, frequency and quality of agricultural data. Due to decreases in cost, the use of remote sensing, portable sensors and Global Positioning System (GPS) devices is becoming standard in the production of agricultural data, even in low-capacity, resource-constrained contexts. Machine learning algorithms and artificial intelligence applications offer significant potential for transforming the collection of agricultural data. Nonetheless, the complexity of collecting accurate information on smallholder agriculture persists, due to a number of intrinsic features such as the small size of farms, the remoteness of plots, the common practice of simultaneously cultivating multiple crops several times a year, thick canopy cover and poorly demarcated, irregularly shaped boundaries, making it difficult to measure even the most foundational agricultural data constructs such as crop yields or land area (Carletto et al., 2017b; Gourlay, Kilic and Lobell, 2019).
Finally, the broader-based application of advanced imputation techniques, combined with the collection of more targeted and effective information to improve model prediction, is creating new opportunities for using technologies and more objective measurements in combination with less accurate self-reported measurements to improve estimates while limiting costs. All of the above factors point to both the opportunities and limitations of newly available technologies and methods, highlighting the disruptive role that agricultural researchers and the broader academic community must play in advancing and diffusing improved standards. The remainder of this section is devoted to an overview of the major areas of technological and methodological improvements in agricultural data collection efforts.
3.1. From guesstimates to GPStimates to remote senstimates
The benchmark for land area measurement in agricultural surveys has traditionally been the compass-and-rope method. However, due to complex logistics and cost considerations, such methods have proven unsuitable for use in large-scale survey data collection. Consequently, most land area measurement efforts in surveys on smallholder agriculture have primarily relied on farmers’ self-reporting, a method notoriously fraught with measurement error. The past decade has seen a resurgence of studies aimed at assessing the impact of error in such self-reported measurements, with pervasive implications for many of the key constructs and theories in agricultural economics. These studies have incorporated the use of GPS devices to consistently document the presence of systematic error in farmers’ self-reporting, including large heaping effects (i.e. the rounding of reported measurements around discrete numbers) and edge effects (see Carletto, Savastano and Zezza, 2013; Carletto, Gourlay and Winters, 2015, 2017b; Gourlay, Kilic and Lobell, 2019; Bevis and Barrett, 2020; Desiere and Jolliffe, 2018; Abay et al., 2019). As a result, the use of affordable GPS devices in agricultural surveys is now becoming mainstream.
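A simple way to see heaping in practice is to compare the share of reported areas that fall exactly on round values with the corresponding share among GPS measurements. The sketch below is illustrative only; the rounding step, the function name and the plot areas are hypothetical and are not taken from the studies cited above.

```python
def heaping_share(areas, step=0.5, tol=1e-9):
    """Share of area values lying exactly on a multiple of `step`."""
    on_grid = 0
    for area in areas:
        nearest_multiple = round(area / step) * step
        if abs(area - nearest_multiple) <= tol:
            on_grid += 1
    return on_grid / len(areas)


# Hypothetical plot areas in hectares for the same eight plots.
self_reported = [0.5, 1.0, 1.0, 2.0, 0.5, 1.5, 0.25, 1.0]
gps_measured = [0.41, 0.87, 1.12, 1.64, 0.58, 1.33, 0.31, 0.93]

reported_share = heaping_share(self_reported)
gps_share = heaping_share(gps_measured)
print(f"on half-hectare multiples: self-reported {reported_share:.3f}, GPS {gps_share:.3f}")
```

A large gap between the two shares suggests that respondents are rounding to focal values rather than reporting measured areas.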
GPS measurement is not entirely free from error, due in part to the availability and positioning of satellites, and still requires in-person plot visits, which may be costly for distant plots. To address issues of accuracy, studies have compared the precision of GPS measurements with the ‘gold standard’ compass-and-rope measures taken in highly controlled field experiments, ultimately confirming the preferability of relying on GPS measurement (Carletto et al., 2016, 2017b). The high costs associated with physically visiting distant plots may also be responsible for missing GPS measurements. Multiple imputation methods have been used with some success to estimate the missing data, raising the empirical question of whether costs could be reduced through more systematic subsampling (Kilic et al., 2017a). To this end, imputation methods have been applied to assess potential gains in accuracy relative to farmers’ self-reporting, as well as potential reductions in cost relative to GPS measurements on the full sample. While initial results are promising for using a combination of self-reported measurement on the full sample with more objective GPS measurement on smaller subsamples stratified by distance, more research is needed (Kilic, Djima and Carletto, 2017b).
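The idea of combining self-reports on the full sample with GPS measurements on a subsample can be sketched as follows. For brevity, the example fits a single least-squares line and fills each gap with its prediction; this is a deliberate simplification of the multiple imputation methods referenced above, which draw several imputed values to reflect uncertainty and condition on more covariates. All names and values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_gps_area(self_reported, gps):
    """Fill missing GPS areas (None) using the fitted self-report relation."""
    observed = [(s, g) for s, g in zip(self_reported, gps) if g is not None]
    intercept, slope = fit_line(
        [s for s, _ in observed], [g for _, g in observed]
    )
    return [
        g if g is not None else intercept + slope * s
        for s, g in zip(self_reported, gps)
    ]


# Hypothetical plots: the last plot was too distant to visit with a GPS unit.
self_reported = [1.0, 2.0, 3.0, 4.0, 2.5]
gps = [0.9, 1.7, 2.5, 3.3, None]

completed = impute_gps_area(self_reported, gps)
print(completed)
```

Measured values are left untouched; only the missing entry is replaced by a prediction from the relationship observed on the measured subsample.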
Another approach currently being tested is the integration of maps and GPS devices into computer-assisted personal interviewing applications to delineate plot boundaries based on farmers’ knowledge (Masuda et al., 2020; Dillon and Rao, 2018). Finally, earth observation data are increasingly being used in smallholder agriculture in low-income countries to measure land area (Azzari et al., forthcoming; Lobell et al., 2019; Lobell, Deines and Tommaso, 2020; Gourlay, Kilic and Lobell, 2019). However, due to the complexity of common farming practices for smallholder agriculture, including the high level of inter-cropping, the solution has been elusive, often requiring expensive high-resolution maps that are at present out of reach for most applications at scale in low-income countries (Gourlay, Kilic and Lobell, 2017). Overall, this continues to be a fast-moving research agenda, where agricultural economists have made valuable contributions by embedding methodological research experiments into their studies.
3.2. Blue skies, muddy fields
The potential power of using remote sensing data to measure agricultural yields cannot be overemphasised. To this end, many researchers and consortia are feverishly working on devising improved algorithms to measure agricultural yields from space (Lobell et al., 2019a; Gourlay, Kilic and Lobell, 2019). Yield estimations based on remote sensing data and spatial modelling provide a unique avenue for countries and the global community to estimate and monitor crop yields at scale, at the requisite periodicity and timeliness. However, improving the precision of spatial algorithms based on earth observation data requires high-quality ground-truthing data for proper calibration. The limited availability of ground data on both plot boundaries and crop production, based on accurate, objective measurements, has been the constraining factor for accelerating progress in remote sensing research in the agricultural space. The main challenge has been to move from conducting small-scale experiments—which have been essential to demonstrate the potential of properly designed field data collection for validating yield estimation from space—to building global layers of ground data that can be used by the broader community for multiple purposes. In a recent publication, Azzari et al. (forthcoming) lay out the basis for such scale-up by providing a clear protocol for collecting ground-truthing data as part of planned survey operations and using them for model calibration purposes.
3.3. Sensors for all
The ubiquity of inexpensive sensors for a wide range of applications is revolutionising how we can collect data and address measurement issues that plague many agricultural constructs. For example, the diffusion of affordable, portable sensors has the potential to change the landscape for soil fertility measurement, enabling the production of reliable soil quality data at scale and in an integrated manner (Carletto et al., 2017a; Gourlay, Kilic and Lobell, 2017). Affordable portable spectrometers, when integrated with large-scale surveys or as a component of smaller methodological experiments, provide a useful validation tool for calibrating remote sensing data at the regional or continental level, as in the case of the Africa Soil Information Service (Hengl et al., 2015). The use of smartphone applications for measuring soil quality is also gradually taking hold, as in the case of the LandPKS project, which offers tools for farmers to collect and interpret data on features of their land such as soil type, soil health and vegetation cover, among others (Herrick et al., 2013).
Spectrometers have also been used for varietal identification. In a study in Ethiopia, Kosmowski and Worku (2018) show promising results for inexpensive spectral analysis to correctly identify the variety of several crops. Another important innovation in data collection using sensors relates to the use of wearable accelerometers for improving the accuracy in the measurement of labour inputs, critical for estimating SDG 2.3.2 on labour productivity in agriculture (Akogun et al., 2020). Data collected through wearable accelerometers can validate farmers’ self-reported number of hours worked as well as the allocation of time across activities, both variables that are notoriously difficult to measure.
3.4. Gender-sensitive agricultural policies need individual-level data
To fully exploit the potential of agricultural productivity growth for reducing poverty, promoting socioeconomic development and increasing economic growth, economists must understand gender differentials in smallholder agriculture and the opportunities offered by improving women’s productivity in farming. Data limitations in this area include issues related to availability as well as quality. Estimations of gender differentials in agricultural productivity have been shown to vary both across and within countries (World Bank and One Campaign, 2014; Kilic, Winters and Carletto, 2015). Long-standing practices to collect agricultural data from ‘the most knowledgeable respondents’ have too often resulted in asking the head of the household, who is usually male. Even when questions at the crop or plot level are addressed to the plot owner/manager, the interview setting is often such that respondents are easily influenced or may report inaccurate information. Respondent bias in gender-disaggregated data is widespread and well-documented. Recent literature is addressing these methodological challenges, providing rigorous empirical evidence of measurement error and proposing new methodologies for the collection of individual-level data in surveys (Doss and Kieran, 2014; Doss et al., 2015; Doss, Kieran and Kilic, 2020; Kilic et al., 2020a; Kilic, Moylan and Koolwal, 2020b; United Nations, 2019). A recent World Bank project called Living Standards Measurement Study-Plus (LSMS-Plus) is addressing shortcomings in gender data by applying validated methodologies for the collection of individual-disaggregated data on asset ownership and labour in six countries (see https://www.worldbank.org/en/programs/lsms/initiatives/lsms-plus). LSMS-Plus evidence to date suggests that collecting individual-level data in household surveys is both feasible and desirable, despite the higher cost.
However, achieving scale beyond the initial six countries and mainstreaming the collection of individual-level data in future surveys will require a collective and coordinated effort by key stakeholders. It will also necessitate generating sufficient demand from countries by demonstrating the full value of individual-disaggregated data to inform gender-sensitive policies.
3.5. DNA don’t lie
Accurately measuring the adoption of improved seed varieties continues to be challenging, as farmers regularly misclassify their seeds. This has clear repercussions for the application of other inputs and ultimately for productivity. Both farmers’ self-reporting and expert opinions have been shown to be highly inaccurate (Yigezu et al., 2019; Maredia et al., 2016; Wossen et al., 2019). The gold standard measure of varietal identification is the use of DNA fingerprinting, which has also proven to be useful in identifying the presence of counterfeit varieties in seed markets (Kosmowski et al., 2019; Wossen et al., 2019; Kletzschmar et al., 2018). While this method is virtually infallible for accurate seed variety identification, the cost and logistics of conducting such tests at scale are prohibitive, rendering it challenging to replicate in large-scale survey operations. One recent exception is a study by the Standing Panel on Impact Assessment of the CGIAR, in collaboration with the World Bank Living Standards Measurement Study team and the Central Statistical Agency of Ethiopia, to conduct DNA fingerprinting for several crops on a large sample of farmers (Kosmowski et al., 2019).
3.6. Transformation needs multi-purpose panel data
Addressing global challenges such as climate change and/or informing policies for achieving inclusive economic transformation requires longitudinal data, ideally at the individual level. The past decade has seen major investments in the collection of panel data, particularly in sub-Saharan Africa, where approximately a dozen countries have established systems of collecting individual-level longitudinal data with a strong focus on agriculture. Programmes such as the Living Standards Measurement Study–Integrated Surveys on Agriculture (LSMS-ISA) in several countries in sub-Saharan Africa, the Ghana longitudinal study by Yale and the Institute of Statistical, Social and Economic Research, the Tegemeo panel survey in Kenya, the National Income Dynamics Study (NIDS) in South Africa, and the MSU-supported surveys in Zambia and Mozambique have clearly demonstrated both the utility of such surveys and the high levels of demand for this type of data. For instance, in the past several years, the documented and anonymised unit-record LSMS-ISA datasets from six countries have been downloaded more than sixty-seven thousand times. At the time of this writing, approximately two thousand research papers and reports have been published using the LSMS-ISA data. Looking to the future, it will be important both to sustain existing panel efforts and to expand them to more countries. This would ideally involve greater coordination across initiatives as well as the inclusion of more academic researchers, particularly those from low-income countries, in both the design and analysis stages.
As we frame the policy conversation on agricultural development around more systemic concepts such as inclusive food systems or resilient recovery, the design of these panels must address these new data demands. Measuring the impact of (and responses to) the ever more frequent and extreme weather events disproportionately affecting agriculture also requires longitudinal data systems (McCarthy et al., 2018, 2021). Given their unpredictability, extreme weather events are best captured by establishing and maintaining long-term panels as an integral component of national data systems. These data system design changes must also include the broader use of integrated, multi-purpose instruments such as LSMS surveys, which collect rich multi-topic data on households as producers, consumers, processors and traders of agricultural products. Surveys must also capture information on highly diversified household livelihood strategies, where agriculture may represent just one among multiple sources of income.
Furthermore, as seen during the COVID-19 pandemic, high-frequency panel surveys by phone have been crucial for tracking the socioeconomic and health impacts of the pandemic to inform national responses (Amankwah and Gourlay, 2021a, 2021b; Furbush et al., 2021; Khamis, 2021; Egger et al., 2021). By providing both a representative sampling frame for high-frequency phone surveys and detailed pre-pandemic baseline information, surveys like the LSMS-ISA have been instrumental for maximising the usability and representativeness of phone surveys. Finally, systematic georeferencing of both dwellings and plots in surveys allows for greater interoperability with time series of earth observation data, expanding the analytical frontier of both types of data.
3.7. Dial A for agriculture and B for bias
The diffusion of mobile phones in low-income countries provides a unique opportunity to collect data from farmers sustainably and at scale. As documented by Aker (2011) in her paper ‘Dial A for Agriculture’, fast-growing telecommunications infrastructure and improved phone coverage can enhance the provision of agricultural extension services and foster the adoption of productivity-enhancing technologies. While Aker focuses on the use of mobile phones to support more effective extension services, phone surveys are increasingly being used to collect information at high frequency across a broad range of topics, due to the relatively low cost and expanding coverage of mobile phones, even in low-income countries. Phone surveys have also proven useful for the collection of agricultural data on non-salient and repeated events, such as labour inputs or the harvesting of continuous crops such as cassava, for which long recall periods yield highly inaccurate data (Kilic et al., 2021; Arthi et al., 2017; Beegle, Carletto and Himelein, 2012).
Furthermore, as mentioned above, phone surveys have been widely used to address data demands emerging from the pandemic, given the halting of face-to-face surveys due to the mobility and social distancing restrictions imposed by most countries. Nonetheless, it must be acknowledged that phone surveys suffer from several limitations, chief among them the potential biases associated with high levels of attrition and under-coverage (Kastelic et al., 2020; Brubaker, Kilic and Wollburg, 2021). Mode effects may be pervasive and can limit comparability with face-to-face surveys and other data sources (De Leeuw, 2004; De Leeuw and Van der Zouwen, 1988; Lyberg and Kasprzyk, 2004). More research is needed to fully exploit the potential of phone surveys, preferably as part of mixed-mode data systems.
3.8. Imputation needs the right data
The recognition of widespread measurement error in self-reported agricultural data has fuelled much of the recent innovation in data collection methods. However, researchers still face trade-offs between improving accuracy and reducing cost. Due to the availability of new technologies, agricultural data collection increasingly relies on more objective and real-time measurement methods, through the use of GPS devices, sensors and mobile phones. While these technologies are becoming increasingly affordable, using them for large-scale data collection operations nonetheless remains infeasible in poorer, low-capacity countries. In such settings, combining less accurate self-reported measurements with direct measurement on properly designed subsamples and the application of advanced imputation methods can result in significant gains in accuracy while keeping costs in check. Imputation as a method is most effective when the right decisions are made at the data collection design stage, both in terms of questionnaire and sampling design.
Building on a paper by Arthi et al. (2017) and relying on their data, a forthcoming paper by Dang and Carletto (2021) demonstrates how combining phone-based diaries on a relatively small subsample of individuals with an imputation-based approach can result in successfully estimating ‘true’ labour allocation for the entire distribution, as measured by frequent, closely supervised, in-person diaries. The analysis suggests that relatively parsimonious imputation models can offer estimates that lie within the 95 per cent confidence intervals—or in many cases, even within one standard error—of the ‘true’ value. Similar types of imputations are being used to estimate the optimal subsample size of more expensive crop-cut measures or DNA fingerprinting, which, when combined with less-accurate self-reported measures of production or seed adoption, can lead to considerable gains in accuracy at reasonable costs.
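The logic of combining a cheap, error-prone measure with an accurate measure collected on a subsample can be illustrated with a deliberately simple sketch. It is not the model used in the studies cited above: the data are synthetic, the function names (`fit_ols`, `impute_mean`) are hypothetical, and a single-variable linear regression stands in for whatever imputation model a real application would require. The sketch also targets only the sample mean, not the full distribution.

```python
import random
from statistics import fmean


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_mean(self_reported, measured_by_id):
    """Estimate the full-sample mean of the accurate measure.

    self_reported  -- recall-based values, one per respondent
    measured_by_id -- {respondent index: accurate value} for the subsample
    """
    ids = sorted(measured_by_id)
    intercept, slope = fit_ols(
        [self_reported[i] for i in ids],
        [measured_by_id[i] for i in ids],
    )
    # Keep the accurate value where it exists; otherwise use the prediction.
    estimates = [
        measured_by_id.get(i, intercept + slope * value)
        for i, value in enumerate(self_reported)
    ]
    return fmean(estimates)


# Synthetic illustration (assumed numbers): recall over-reports weekly hours.
rng = random.Random(0)
true_hours = [rng.uniform(10, 40) for _ in range(1000)]
recall_hours = [1.5 * t + 5 + rng.gauss(0, 3) for t in true_hours]

# Accurate (diary-style) measurement on a 10 per cent random subsample.
subsample = rng.sample(range(1000), 100)
measured = {i: true_hours[i] for i in subsample}

true_mean = fmean(true_hours)
naive_mean = fmean(recall_hours)
imputed_mean = impute_mean(recall_hours, measured)
print(f"true {true_mean:.2f}, recall only {naive_mean:.2f}, imputed {imputed_mean:.2f}")
```

In this stylised setting the relationship between the two measures is linear by construction, so the imputed mean tracks the benchmark closely. With real data, the choice of covariates and the design of the subsample would determine how well the approach performs.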
3.9. Data of the people, for the people, by the people
By relying on large numbers of volunteers, citizen-generated data offer the potential for filling some of the most pernicious data gaps and generating data at higher levels of spatial and temporal granularity (Aceves-Bueno et al., 2017). Based on a systematic review of 244 SDG indicators and past and on-going citizen science projects around the world, a recent paper by Fraisl et al. (2020) asserts that citizen-generated data could contribute to monitoring approximately one-third of the SDG indicators, particularly those related to SDG 15 (Life on Land), SDG 11 (Sustainable Cities and Communities), SDG 3 (Good Health and Wellbeing) and SDG 6 (Clean Water and Sanitation).
While reliance on crowdsourced and other citizen-generated data has been gaining considerable traction in many fields, particularly in the social and environmental sciences, evidence on their use remains scant in the agricultural sector, leaving their potential still largely unexplored. The collection of prices (Zeug et al., 2017; Ochieng and Baulch, 2020), rainfall data (Minet et al., 2017) and soil data (Herrick et al., 2013) represent a few exceptions. However, as with some of the other data sources mentioned above, for citizen-generated data to reach their full potential and make a significant contribution to filling existing knowledge gaps, their weaknesses in terms of data quality, representativity and potential biases due to self-selection of respondents of variable expertise and commitment must be addressed (Arbia et al., 2020; Buil-Gil et al., 2020). In this context, Wiggins et al. (2011) propose a framework to address data quality problems in citizen science data, noting two categories of errors related to protocols and participants and three entry points for possible intervention (that is, before, during and after participation in data generation efforts). Combining the untapped potential of citizen science and crowdsourced data with the rigour of statistical standards and the sectoral knowledge of agricultural economists may be one of the most consequential challenges for the evolution of agricultural data systems.
4. Increasing the impact of agricultural economics
Large gaps remain in the availability and quality of agricultural data. With few exceptions, agricultural economists have taken data systems as given, focusing predominantly on analysing existing data as opposed to considering how they may be improved. More recently, however, we have seen a burgeoning literature on measurement issues, including in top agricultural and development journals, with researchers and data users increasingly focusing on understanding data quality issues and data collection processes. These recent trends bode well for the future, fuelled by increased awareness within the global community and the diffusion of new and more affordable technologies. While each actor has their own role to play, real acceleration of progress can only be achieved through greater interdisciplinary coordination and collaboration, enabling the full exploitation of the strengths of different data sources and methods. For this to occur at the necessary scale, interdisciplinary fora and platforms must be supported, towards actualising what the recently released 2021 WDR calls a ‘social contract for data’. This will require course corrections for agricultural economists and the broader development research community in several directions, including:
Increasing opportunities for publishing methodological research in top journals. While there are many recent examples of papers addressing measurement issues in top development economics journals, a more concerted push is needed, beginning with the publication of special issues on the subjects of measurement and methodological advances in data production. Placing greater emphasis on published agricultural economics research on data quality would create the incentives for more rigorous research on data collection methods. For instance, recent methodological work on land area measurement has spurred a series of publications in top-tier journals and contributed to the resurgence of a vigorous debate on measurement issues in agricultural productivity data and beyond.
Reducing publication bias. While publishing papers in top journals will remain one of the main metrics for academic excellence and rigour, relying on this as the only valuation method for contributions to the agricultural economics profession creates the wrong incentives, favouring research that is ‘appealing’ or ‘promising’ over research that is relevant and useful. Shifting incentives could lead to a healthier balance between ‘blue sky research’ or impact studies with little external validity and analytical outputs that are both operationally relevant and worthy of publication.
Paying close attention to data collection design choices. When planning research, agricultural and development economists should take responsibility for making design choices that ultimately improve the quality of the data being collected, which in turn will lead to more credible results. It would also improve future data collection efforts if researchers commit to systematically collating and disseminating lessons learned during the data collection process.
Thinking beyond self-focused objectives. Researchers involved in new data collection efforts should consider how their instruments and research design could be used for purposes other than for their immediate object of interest, so as to generate possible economies of scale and scope through the increased use and re-use of data. At the design stage, where possible, researchers should also pay close attention to the external validity and replicability of their research, and referees should duly account for this in their reviews. In some instances, the evaluation criteria of research proposals with a primary data component could also give more weight to purported multiplicity of data use.
Integrating experiments systematically into research and data collection. Agricultural economists should make a greater effort to incorporate methodological experiments into their planned data collection efforts, towards both reducing costs and increasing the benefits of methodological research. Establishing a ‘marketplace’ for methodological experiments to help connect data producers with researchers and data scientists may create a supportive enabling environment for scaling up this practice.
Documenting and disseminating data for reuse. The full value of data is realised only when data are used by many for multiple purposes. All data collected should be duly documented, anonymised and made available to other researchers within a relatively short time to increase their potential for use. Rich metadata should be systematically collected and disseminated as part of the public access data. Even data from individual pieces of research will increase in value when shared and used in multiple contexts.
Communicating research findings to a non-technical audience. In research projects, communication efforts to share research findings are often underappreciated and underfunded. Adequate communications budgets are seldom included in research projects and, even when available, little effort is put into translating the technical complexities of research into non-technical terms for a wider audience. In order to transform research into policy interventions and societal change, researchers must invest in communicating their findings to a non-technical audience so that hard-earned knowledge can be used to realise improvements in the agricultural sector and beyond.
To amplify the impact of these proposed actions by individual researchers, international agencies and other development partners must create an enabling environment that fosters more effective collaboration and contribution to the generation of public goods. This may include:
Supporting the creation of platforms for data sharing. More resources must be invested into establishing platforms for data sharing and lowering the costs of dissemination by absorbing some of the fixed costs.
Fostering the development of tools and protocols for data anonymisation. To maintain the privacy of respondents and confidentiality of the data collected, we must invest in creating better anonymisation tools and fostering the adoption of standard protocols for data anonymisation. For instance, a recent paper by the Inter-Secretariat Working Group on Household Surveys of the UN Statistical Commission proposes practical ways to anonymise geo-referenced information from surveys through masking techniques and by assessing the risk of data disclosure (UNSC, 2021).
Incentivising the use and reuse of data. Donors supporting new data collection efforts should include an Open Data policy as a clause within their grants. While compliance with Open Data principles may be difficult to enforce on any specific grant, compliance could then be used as a criterion for endorsing future grants. To further facilitate their repeated use, besides being open access, data should also be made interoperable to the greatest extent possible.
Recognising and rewarding the production of high-quality data. Researchers embarking on primary data collection may lack the incentive to invest in the production of high-quality data and share them with other researchers for re-use. The right incentives may range from acknowledging the contribution of the data producer to offering co-authorship in research using the data. Furthermore, systematic tracking and citation of data downloads and use may also help foster the right visibility and recognition for producers of high-quality data. The citation catalogue by the Microdata Library is an example of that approach.² Maintaining such a citation catalogue is made difficult by poor referencing of datasets in publications and reports; to address this, a more systematic and standardised use of digital object identifiers should be promoted.
Encouraging the systematic inclusion of methodological experiments into data collection. To effectively incentivise the inclusion of methodological experiments into data collection efforts, donors must recognise and communicate the cost-effectiveness and value addition of such an approach, as well as provide funding to cover the marginal costs of these experiments.
Facilitating global collaboration among academic institutions. Many of the innovations discussed in this paper are already widely used around the world, mostly in high-income countries. Levelling the field and proposing a more equitable social contract around data will require stronger collaborative agreements among academics, ideally from countries across various income levels. Analytical capacity and the ability to use data effectively, particularly new data sources, are scarce in low- and middle-income countries. Creating a more equitable equilibrium is everyone’s responsibility.
Reducing transaction costs for data availability. Professional organisations such as the European Association of Agricultural Economists and economic journals have a catalytic role to play towards ensuring that data and syntax files produced by researchers are made available to as many interested users as possible, both to ensure transparency and facilitate the re-use of data. While a step in the right direction, the current system adopted by many journals of allowing open access to individual articles upon payment of a small fee may not be sufficient, as researchers from low-income countries may lack the resources to pay for such services.
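As a purely illustrative aside on the anonymisation of geo-referenced data mentioned above, one common family of masking techniques displaces each coordinate by a random distance and bearing before release. The sketch below is not the procedure proposed in UNSC (2021): the function names, the 5 km cap and the example coordinates are assumptions, and the flat-earth approximation is only reasonable for short distances away from the poles.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0088
KM_PER_DEGREE = math.pi * EARTH_RADIUS_KM / 180.0


def displace(lat, lon, max_km, rng):
    """Move a point by a random offset of at most max_km (assumed cap)."""
    # The square root makes points uniform over the disc, not bunched at the centre.
    distance = max_km * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    north_km = distance * math.cos(bearing)
    east_km = distance * math.sin(bearing)
    new_lat = lat + north_km / KM_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    new_lon = lon + east_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return new_lat, new_lon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical dwelling location, masked with an assumed 5 km cap.
rng = random.Random(42)
original = (9.0, 38.7)
masked = displace(*original, max_km=5.0, rng=rng)
shift_km = haversine_km(*original, *masked)
print(f"masked location is {shift_km:.2f} km from the original")
```

Random displacement alone does not guarantee confidentiality. The disclosure risk that remains depends on population density and on what other information is released, which is why masking is paired with an explicit assessment of that risk.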
While most of these recommendations are not novel and some may even come across as naïve, the encouraging conditions and prospects described above, including the operationalisation of the 2021 WDR on Data for Better Lives, offer a unique window of opportunity for accelerating improvements in agricultural data systems and realising a social contract for data centred around the pursuit of a shared vision and the joint production of global data public goods. Agricultural and development economists must take on a more active role in advancing the data agenda towards increasing their professional impact for positive societal change.
5. Conclusions
Agricultural economists must become more engaged in the process of generating more accurate and relevant data on agriculture, towards ultimately enhancing the credibility and impact of agricultural research. While the agricultural data sector faces numerous challenges, as described above, recent technical and institutional developments offer new opportunities to improve the way we collect agricultural data. Agricultural economists have a responsibility to define and constructively contribute to a new social contract on agricultural data, particularly in light of the fast-paced changes in the data landscape and the broad-based skillsets available to the profession.
The technological and methodological innovations described in this article allow for accelerating the modernisation of agricultural data systems to produce better data that are relevant and fit-for-purpose. Within the agricultural economics profession, each of us is responsible for increasing our contributions to this public good agenda, towards ultimately increasing the impact of the profession in addressing present and future societal challenges. This does not imply forgoing individual benefits but rather expanding fitness-of-use by considering the potential positive externalities of our personal data and research activities. Given the intrinsic costs of collaboration and the numerous disincentives for individuals to treat agricultural data as a public good, international institutions and donors have a critical role to play in actualising this vision and levelling the playing field by creating a more effective enabling environment for building the agricultural data systems needed to meet the diverse challenges of a changing world.
Footnotes
1 This section draws from the chapter ‘Agricultural Data Collection to Minimize Measurement Error and Maximize Coverage’ by C. Carletto, A. Dillon and A. Zezza, forthcoming in the Handbook of Agricultural Economics, Vol. 5, edited by Christopher B. Barrett and David R. Just.
2 For a central catalogue of citations, visit https://microdata.worldbank.org/index.php/citations/?collection=central. The full list of citations for each dataset can be found on the individual survey page.
Author notes
Carletto is the manager of the Data Production and Methods Unit, Development Data Group at the World Bank. The author would like to thank Andrew Dillon, Dean Jolliffe, Talip Kilic and Alberto Zezza for their comments, and Chris Barrett and Olivier Dupriez for useful insights on the recommendations.