Abstract

Uses of artificial intelligence (AI) are growing around the world. What will influence AI adoption in the international security realm? Research on automation bias suggests that humans can often be overconfident in AI, whereas research on algorithm aversion shows that, as the stakes of a decision rise, humans become more cautious about trusting algorithms. We theorize about the relationship between background knowledge about AI, trust in AI, and how these interact with other factors to influence the probability of automation bias in the international security context. We test these in a preregistered task identification experiment across a representative sample of 9,000 adults in nine countries with varying levels of AI industries. The results strongly support the theory, especially concerning AI background knowledge. A version of the Dunning–Kruger effect appears to be at play, whereby those with the lowest level of experience with AI are slightly more likely to be algorithm-averse, and automation bias then emerges at relatively low levels of knowledge before leveling off as a respondent’s AI background reaches the highest levels. Additional results show effects from the task’s difficulty, overall AI trust, and whether a human or AI decision aid is described as highly competent or less competent.

Introduction

The integration of advances in artificial intelligence (AI) by governments worldwide raises significant questions for politics and society. The delegation of high-level tasks to machines raises the prospect of accidents and could generate challenges for accountability, especially in high-stakes contexts such as international crises. Complicating these challenges is the potential for automation bias. Automation bias is the “tendency [for human operators] to over-rely on automation” (Goddard, Roudsari, and Wyatt 2012, 121) and automated systems, meaning these systems and their outputs become a “heuristic replacement of vigilant information seeking and processing” (Goddard, Roudsari, and Wyatt 2012, 121). Given the growing interest by militaries and other national security institutions in adopting AI and autonomous systems, human–machine relationships are increasingly life-and-death issues. What do these trends mean for international relations, since many international relations theories are predicated on the role of human agency in making decisions and on human-constructed politics—individual, domestic, and international?

How increasing reliance on AI will influence decision-making in international politics is undertheorized and undertested. Existing evidence about automation bias and algorithm aversion tends to feature relatively small sample sizes in the United States. Automation bias experiments rarely focus on a prominent use case for AI-enabled autonomous systems—military uses (Cummings 2004; Cummings et al. 2019). To fill this gap, we test our hypotheses using a novel, preregistered scenario-based survey experiment with 9,000 respondents across nine countries to evaluate the conditions in which automation bias is more likely, using national security scenarios that mirror current and likely future uses of AI algorithms.1 This large and novel sample, covering countries with varying levels of AI investments, especially in the military realm, including China and Russia, provides a unique vantage point for theory testing.

We theorize—and demonstrate—that key drivers of automation bias for AI systems in national security contexts are experiential and attitudinal. Factors such as task difficulty, background knowledge—familiarity, knowledge, and experience—with AI, trust and confidence in AI, and self-confidence determine how much humans rely on machines—or other humans—when making decisions. In particular, we theorize that a version of the Dunning–Kruger effect is at play, where those with the lowest level of experience with AI are slightly more likely to be algorithm-averse, and automation bias then emerges at relatively low levels of knowledge before leveling off as a respondent’s AI background reaches the highest levels. Additional theorizing focuses on the task’s difficulty, AI trust, and whether a human or AI decision aid is described as highly competent or less competent.

The experiment and our findings contribute to the international relations literature in three ways. First, the results show how AI will likely shape how people make decisions about critical areas of international politics, so it sheds light on decision-making processes at the core of international relations debates. Second, the focus on AI and automation bias bolsters growing research on how emerging technologies shape international politics (Kreps 2016; Sechser et al. 2019; Horowitz 2020; Kahn 2022a). Third, the article contributes to ongoing debates about the role of trust and confidence in political decision-making, which is particularly salient now in the context of growing interest in generative AI and large language models such as ChatGPT and GPT-4.

Moreover, the findings can also contribute to ongoing public policy debates about AI and automation. For example, in 2003, two separate accidents were caused, in part, by failures in a Patriot missile’s automated tracking and identification friend or foe (IFF) systems, resulting in three fatalities. These accidents represented a complicated cascade of simultaneous human and machine failures, precipitated by an error in an automated decision aid and facilitated by established organizational practice that defaulted toward the automated settings (Ministry of Defense 2004; Hawley 2017).

Understanding how humans interact with and respond to AI and autonomous systems, and determining where automation bias originates or is more likely to emerge, is essential to designing effective, safe systems. As states increasingly integrate AI into military and decision-making systems, the factors that exacerbate automation bias become directly relevant to the safety and effectiveness of these systems, and addressing them is increasingly a priority for states and militaries at every stage in the technology life cycle—from design to testing, adoption, and deployment (Konaev, Huang, and Chahal 2021).

It is also relevant for international politics. For example, the US-launched Political Declaration on Responsible Military Use of AI and Autonomy commits to ensuring relevant personnel exercise appropriate care and human judgment in deploying and using military AI and autonomy capabilities. It also mandates personnel be adequately trained to understand the capabilities and limitations of AI and autonomous systems, thereby making context-informed judgments on their use. These commitments necessitate understanding human–machine integration and teaming dynamics, particularly identifying and mitigating automation bias in systems—and systems of systems—that incorporate AI and increasing degrees of automation. Addressing automation bias is therefore directly relevant for international security, multilateral governance, and confidence-building efforts for military AI and autonomy (U.S. Department of State Bureau of Arms Control, Verification and Compliance 2023).

Thus, in addition to the importance for international relations, the results can inform the development of guidelines, AI education, and training programs that can improve decision-making in human–AI teams, mitigate the risk of accidents and failures, and ultimately enhance the safety and effectiveness of these systems. In what follows, we lay out our theory and hypotheses, describing the importance for international politics. We then introduce the survey experiment and research design, describe the results, and discuss limitations and next steps.

Theory

There is growing interest in how technological change surrounding robotics, autonomous systems, AI, and machine learning will influence international politics (Hudson 2019; Horowitz 2020; Jensen, Whyte, and Cuomo 2020; Johnson 2021a). Existing work tends to focus on questions surrounding drone strikes (Johnston and Sarbahi 2016; Kreps and Wallace 2016; Mir and Moore 2019; Lin-Greenberg 2022), uses of AI in nuclear command and control (Fitzpatrick 2019; Sechser et al. 2019; Hersman 2020; Cox and Williams 2021; Johnson 2021b; Kahn 2022b), and autonomous weapon systems (Scharre 2018; Horowitz 2019). How AI will shape decision-making in international relations remains understudied, especially outside the realm of crisis escalation (Horowitz and Lin-Greenberg 2022). How humans choose whether and how to use algorithms will be a critical part of that equation.

Automation bias refers to the tendency of humans, in some situations, to rely on AI decision aids above and beyond the extent to which they should, given the reliability of the algorithms (Mosier and Skitka 1996; Skitka, Mosier, and Burdick 1999). Algorithm aversion refers to the opposite—the tendency of humans, in some situations, to discard algorithms in favor of their own judgment despite evidence in favor of relying on an algorithm. Several factors may shape when and how automation bias or algorithm aversion manifests and the frequency with which it occurs. These include experiential factors, such as familiarity with and knowledge of the system and like technologies; attitudinal factors, such as trust and confidence; and environmental factors, such as task difficulty and time constraints (Southern and Arnstern 2009; Reichenbach, Onnasch, and Manzey 2010; Goddard, Roudsari, and Wyatt 2014; Massey, Simmons, and Dietvorst 2015; Alon-Barkat and Busuioc 2023). This paper focuses primarily on the first two factors: experiential and attitudinal. The effects of certain environmental factors are well theorized and researched, so we do not propose any novel hypotheses regarding those factors (Bailey and Scerbo 2007; Goddard, Roudsari, and Wyatt 2012; Povyakalo et al. 2013; Lyell and Coiera 2016).

Experiential: Familiarity With and Knowledge of AI

How knowledge and experience influence how individuals and countries behave is a long-running topic in international politics (Haas 1992; Reiter 1994). Existing research suggests experience generates greater trust and reduces aversion to using algorithms, while time-constraint elements increase the likelihood of automation bias (Bailey and Scerbo 2007; Bin 2009). For clarity and brevity, we will use the term “background” to encompass the impact of three distinct but related factors: knowledge, familiarity, and experience. Knowledge, familiarity, and experience with AI could shape how individuals react to algorithms designed for new situations and choose whether to use algorithms in general.

Behavioral science work shows that while having no knowledge of a technology can lead to fear and rejection, a limited background in a technology can lead to overconfidence in its capabilities. People with limited initial background in a wide variety of topical areas become subject to the “beginner’s bubble” (Sanchez and Dunning 2018, 10), an illustration of the Dunning–Kruger effect whereby those with surface-level knowledge become overconfident in that knowledge, leading to suboptimal decision-making (Kruger and Dunning 1999). As people gain more experience, familiarity, and knowledge, the degree of overconfidence declines.

This aligns with literature on hype cycles and technology adoption processes, such as the Gartner Hype Cycle. In the early stages of technology development, when visibility is high, expectations can peak and become inflated. Excitement then drops steeply when observed performance fails to match those expectations, before recovering and stabilizing as hype becomes proportionate to actual performance and individuals become more knowledgeable about the technology (Bahmanziari, Pearson, and Crosby 2016; Blosch and Fenn 2018).

As AI technology develops, consistent with the Dunning–Kruger effect and hype cycles, initial excitement based on limited knowledge should thus lead to an overestimation of the actual effectiveness of the system. As the technology develops further, the relationship should reverse, generating a trust gap once those early expectations have been shattered by reality: while the technology continues to improve, it has not yet earned back users’ trust. In the final stages, the technology has matured and stabilized at a high level of effectiveness, and familiarity, confidence, and trust in the system recover, again resulting in overconfidence (albeit a more stable form) in its abilities.

We therefore theorize that, when assessing applications of AI and susceptibility to automation bias, the relationship between background in AI and reliance on automation should be nonlinear. Those with no experience, familiarity, or knowledge should be skeptical of AI, meaning they are also unlikely to be prone to automation bias. Those with limited backgrounds should be the most susceptible to automation bias because they have just enough knowledge, familiarity, and experience to think they understand AI but not enough to recognize limits and issues with applications. Finally, on average, those with a substantial background in AI should be more evenly positioned between aversion to AI and automation bias—in theory, relying on the AI system proportionate to its expected performance and accuracy. In other words, they know enough to realize both the utility of algorithms in some cases and when to question or check algorithmic outputs.2 For example, large language models have shown a propensity to generate irrelevant, nonsensical, or even false content. This phenomenon, hallucination, is a concrete example of the need to recognize and rectify automation bias (Heikkilä 2023a,b). For instance, a large language model may list a citation for a nonexistent article yet attribute it to a legitimate author or journal (or both). Failure on the part of the human user to confirm this information has already led to errors—Stack Overflow, a public forum for programming help, banned answers generated by ChatGPT due to the high incidence of subtle errors and factual inaccuracies (Stack Overflow 2022).

Figure 1 illustrates a stylized take on this relationship.

Figure 1. Reliance on automation relative to prior background in AI

We theorize that background in AI (knowledge, familiarity, and experience) will influence respondents’ willingness to rely on input from AI-based systems and algorithms.

 
H1: Those with the lowest levels of experience, knowledge, and/or familiarity (background) are relatively more averse to AI; people with middle levels of background are relatively over-reliant on AI; and those with the highest levels of background are relatively appropriately reliant on AI.

Attitudinal: Trust and Confidence in AI

Trust is an essential topic in international relations, making it critical to understand in the AI context (Hoffman 2002; Kydd 2007). In addition to firsthand knowledge of, familiarity with, and experience with AI, attitudinal factors should influence the likelihood of automation bias or algorithm aversion. Attitudinal factors measure whether individuals using AI and AI-enabled systems trust the system or algorithm to work as expected. Additionally, an operator’s trust and confidence in the system as an aid for completing a task will be relative to how much trust and confidence they have in their own ability to complete it.

The distinction between trust and confidence matters. Luhmann argues trust and confidence influence how individuals make decisions under risk conditions (Luhmann 1979). Trust refers to a condition of individual responsibility and knowledge where someone believes in a specific actor, often due to knowledge or the perception of shared experiences (Seligman and Montgomery 2019). Trust is an active decision—someone chooses to trust someone else. Confidence refers to the ability to predict the behavior of others not due to individual knowledge or experience but because of laws, social norms, or established benchmarks for success or an acceptable margin of error; active choice is not necessarily required.

For AI, that confidence should come not from shared experience or familiarity with how algorithms work but from faith in the system—the coders programming algorithms, the testers determining whether they are reliable, and the evaluation processes surrounding algorithms’ design, deployment, and use. Confidence might also come from assessments of data, such as externally provided data on the reliability of an algorithm, system, or process that one does not personally understand.

The distinction between trust and confidence could play a critical role in helping explain individual and organizational decisions about AI adoption. Shared experiences and familiarity generate trust. However, shared experiences and familiarity do not necessarily translate into greater support for AI in a linear fashion, because experience and knowledge can also reveal the limitations of AI.

Studies show many people are hesitant to trust AI. One survey by the Pew Research Center found that 52 percent of US respondents were more concerned than excited about the increased use of AI (Tyson and Kikuchi 2023). Distrust of AI is likely heightened in cases where the system has already experienced a failure, even if it is usually highly reliable (Alvarado-Valencia and Barrero 2014). Distrust in AI greatly undermines the effectiveness of AI capabilities because it makes people less likely to use them even when they work.

 
H2: The more trusting of and open to AI technologies a respondent is in general, the more likely they are to rely on the recommendation of an AI-enabled system in a specific instance.

 
H3: The more testing and training a system is described as having, the more confidence the respondent will have in that system, and the more likely they will be to rely on its advice.

Finally, another dimension of confidence is the relationship between self-confidence in one’s ability to do a task and the willingness to update one’s view in response to new information from a system like a decision aid. Those who view themselves as more competent should be less likely to trust AI-enabled systems. Existing studies show the relationship between self-confidence and confidence in AI is such that “human self-confidence significantly contributes to their acceptance of AI decisions” and that “humans often misattribute blame to themselves and enter a vicious cycle of relying on a poorly performing AI” (Chong et al. 2022, 10718). For example, in Chong et al. (2022), self-confidence remained the predominant factor across all test groups: low self-confidence can increase people’s willingness to rely on AI systems, whereas high self-confidence can cause people to unfairly reject input from automated systems. Thus, accounting for self-confidence in the ability to complete a task is critical to understanding the biases people hold toward automated systems and AI, and to fully grasping the decision-making process that occurs when using AI input.

 
H4: The higher the level of respondent self-confidence in the ability to do a task, the lower the probability a decision aid will influence their views of that task.

In sum, we hypothesize that three attitudinal factors will affect rates of automation bias, two relating to the system and one to the respondents themselves: (i) trust in the system, (ii) confidence in the system, and (iii) the respondent’s self-confidence regarding task completion. We also expect these attitudinal factors to be directly affected by environmental factors, such as task difficulty and time constraints, which strain cognitive resources. However, as explained above, we do not hypothesize about these factors directly, as the literature already establishes these relationships.

Research Design

To test the above hypotheses, we designed a scenario-based survey experiment of the general adult public in nine countries: the United States, Russia, China, France, Australia, Japan, South Korea, Sweden, and the United Kingdom. The sample size for each country is 1,000 respondents, giving us a total sample size of 9,000. We obtained a representative sample of adults from each country except China and Russia, where we obtained a representative sample of urban adults.3

Surveys of the general public make sense for testing our hypotheses for several reasons. First, the public can often be a good proxy for elite preferences, especially when there is no theoretical reason to think that elites would have different preferences (Kertzer 2022; Kertzer and Renshon 2022). Since we are early in the age of AI and foreign policy, elites do not have decades of experience that would lead to different decision-making heuristics. Elites are more likely to hold different views when they have decades of experience with an issue not shared by the general public, because they are conditioned to evaluate it differently. With AI use still in its early stages—with discussions of potential regulation in the United States, for example, just underway—we should not expect elite viewpoints to differ widely from the public’s (Horowitz and Kahn 2021).

Second, we ask respondents to perform a surveillance identification task. For this task, we have no basis to believe the general public’s identification ability would differ from that of the people who would handle such identifications in a national security context. In the real world, this kind of surveillance identification would be completed by relatively junior military personnel, not elites working in the White House. Even if one thinks there are substantial differences between elites and the public regarding views on AI, those would be less applicable to interpreting the results of our specific scenario. Additionally, research into the efficacy of decision-making finds little difference in quality between decisions made by experts and those made by the general public, meaning a general population sample can provide insights about elites as well (Tetlock 2009). Therefore, the results of this survey can help us understand how decision-making in international security contexts occurs and enable cross-cultural comparisons that are not biased by prior training. Finally, because we survey different countries, elite attitudes would be challenging to gather and unhelpful for aggregation due to distinct country-specific effects and biases, possibly muddying the results.

Country Selection

Four main factors drove the country selection process: (i) the presence of an AI strategy, (ii) the level of national investment in AI, (iii) regional variety, and (iv) prominence within international security discussions. The nine countries span Asia, Europe, North America, and Oceania and include major powers and AI investors. Moreover, these countries’ investments span AI research, economic applications, and national security, meaning the selected countries also have varied national interests in the AI sphere. Finally, we also selected countries where we could reliably obtain an accurate sample of their populations.

Each of the nine countries has published a national AI strategy outlining their goals for using and investing in AI (OECD AI 2022). The publication of national AI strategies demonstrates a prioritization of AI as an industry and technology that will make questions such as those presented in our survey potentially more relevant and recognizable to the population.

Another element for country selection was the level of investment and research development each country has in AI-related fields. Stanford’s Human-Centered Artificial Intelligence (HAI) 2021 AI Index measured national AI investments across twenty-two different factors, including investment, research output, patents and intellectual property, and jobs and industries related to AI (HAI at Stanford University 2021). In this report, the United States, China, South Korea, Australia, and the United Kingdom ranked in the top 10, while Japan, France, and Sweden ranked within the top 20. The United States has been a prominent leader in AI investment; most major AI companies are headquartered there. While Russia did not rank as highly on the indexes or levels of investment as the other eight countries on the list, it has made substantial investments and has proclaimed its intention to become a leader in AI.

We also looked at regional variation and major power status. Regional variation refers to selecting countries worldwide and those with potentially different interests and political systems. The selected countries are distributed across various global regions, including North America, different parts of Europe, Asia, Eurasia, and Oceania. Additionally, they represent different regime types, political orientations, and countries associated with prominent roles within broader international institutions, including the European Union, the North Atlantic Treaty Organization, the Asian Infrastructure Investment Bank, the United Nations (UN), and the UN Security Council.

Survey Design

As outlined above, previous survey research on automation bias frequently did not encompass AI-enabled automation, was confined to specific fields, or had limited sample sizes or scope (Cummings 2004; Parasuraman and Manzey 2010; Goddard, Roudsari, and Wyatt 2012). Survey experiments conducted on the American public have been shown to increase our understanding of audience costs (Trager and Vavreck 2011), the democratic peace (Tomz and Weeks 2013), human rights (Wallace 2014; Zvobgo 2019), and support for or willingness to use autonomous technologies (Young and Carpenter 2018; Horowitz et al. 2022, 2023). In contrast, other extensive cross-national surveys highlight significant cultural differences in approaches to AI (Awad et al. 2018).

The experiment employs a within-participant, before-after design that evaluates how the same respondents alter their behavior based on different treatments. Participants first received instructions on the task—identifying whether an airplane belongs to their country’s military or an adversary’s military based on a set of defined characteristics. Participants then completed five practice rounds with no pressurizing constraints, time limits, or obscuring of the images. In each round, they determined whether an airplane was an enemy or a friendly one and received live feedback on the accuracy of their identifications. The intended effect of providing live feedback was to establish a baseline for respondents regarding their effectiveness at the task.

Following the practice rounds, participants proceeded to the experimental portion of the survey. Here, they encountered ten randomized hypothetical airplane identification scenarios. These scenarios varied in difficulty level, with the identification tasks becoming more challenging after the first five rounds.4 Respondents were then asked to identify the aircraft in each scenario, after which they received an experimental treatment in the form of a decision aid. The decision aid was described as a team member who would provide a recommendation and would take the form of either an AI algorithm or a human analyst, with varying degrees of testing and training. High-confidence language stated that the AI algorithm or human analyst “has undergone extensive testing and training to identify airplanes under these conditions.”

In contrast, the low-confidence language described the decision support aid as “still undergoing testing and being trained to identify airplanes under these conditions.” The respondent would then have the opportunity to change their answer or keep it the same as their initial identification. If the respondent was shown the control, their initial answer was maintained as their final answer without the opportunity to switch.

The treatments were randomized across a two-by-four experimental design plus a control condition, for nine possible treatment conditions. The decision aid could be a low-confidence human analyst or AI algorithm, a high-confidence human analyst or AI algorithm, or, for the control, no suggestion at all. We also randomized whether the recommendation of the decision aid was correct or incorrect. See the online appendix for specific wording. The descriptions for the human analyst and the AI algorithm were kept consistent, as were the expressions of varying confidence levels in the system. The only variation was whether the suggestion came from a human or an AI (a minimal sketch of the assignment logic follows the list below):

  • Human analyst low confidence—correct identification

  • Human analyst high confidence—correct identification

  • AI algorithm low confidence—correct identification

  • AI algorithm high confidence—correct identification

  • Human analyst low confidence—incorrect identification

  • Human analyst high confidence—incorrect identification

  • AI algorithm low confidence—incorrect identification

  • AI algorithm high confidence—incorrect identification

  • No identification suggestion
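
To make the design concrete, the following is a minimal sketch of how per-round treatment assignment could be implemented. The condition labels come from the list above; the equal-probability draw across the nine conditions is our assumption rather than a detail reported in the paper.

```python
import random

# Hypothetical labels for the eight decision-aid treatments (2 aid types x
# 2 confidence framings x 2 recommendation accuracies) plus the control.
AID_TYPES = ["human analyst", "AI algorithm"]
CONFIDENCE = ["high", "low"]
ACCURACY = ["correct", "incorrect"]
CONDITIONS = [
    {"aid": a, "confidence": c, "accuracy": acc}
    for a in AID_TYPES for c in CONFIDENCE for acc in ACCURACY
] + [{"aid": None, "confidence": None, "accuracy": None}]  # control: no suggestion

def draw_round_treatments(rng: random.Random, n_rounds: int = 10) -> list:
    """Draw one of the nine conditions independently for each identification round."""
    return [rng.choice(CONDITIONS) for _ in range(n_rounds)]

treatments = draw_round_treatments(random.Random(42))
```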

Dependent Variable

Unless otherwise noted, the dependent variable in the analyses below is a binary variable of whether a respondent “switched” their answer after being shown a treatment (1 if switched, 0 otherwise).
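
As a minimal sketch, assuming a long-format data frame with one row per identification round and hypothetical column names, the outcome could be coded as follows:

```python
import pandas as pd

def code_switching(df: pd.DataFrame) -> pd.Series:
    """1 if the final identification differs from the initial one, 0 otherwise.

    Rounds in the control condition, where no switch was possible, are set to
    missing so they drop out of switching analyses (column names are assumed).
    """
    switched = (df["final_id"] != df["initial_id"]).astype(int)
    return switched.where(df["received_decision_aid"] == 1)
```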

Independent Variables

We leverage the experimental design to test the hypotheses above, including demographic data on respondents and batteries of questions about their experience with and attitudes toward AI. We operationalize trust in AI, confidence in the treatment, self-confidence in the ability to complete the task, cognitive pressure, and AI background as follows:

To test hypothesis 1, we introduced questions to measure participants’ experience with, knowledge of, and familiarity with AI and automated systems. These include questions testing factual knowledge of AI and AI technologies, such as whether respondents have a background in AI or computer science more broadly, whether they can identify technologies that use AI, and whether they can correctly answer questions about AI technologies. Another set of questions assesses experience with AI, such as whether individuals had used AI-based systems in the past, in either work or home contexts. Finally, we gauge familiarity with AI with a battery of questions about general interest in, awareness of, and exposure to the concepts of AI in the news, from colleagues, friends, or other sources. Additional details are in the online appendix, where we connect each set of questions to the specific measure we use in the paper.

We theorize above that experience, familiarity, knowledge, trust, and confidence are closely related but distinct factors that may influence rates of automation bias. However, acknowledging there may be some degree of interdependence between these factors, we measure these factors both independently and together in various indices as outlined below.

We create a standardized measure of AI background with these questions, enabling us to understand how previous interactions with or knowledge of AI influence how people interact with automated systems and whether the depth of that exposure plays a role. The questions also reflect the commonly accepted styles and subject matters for measuring these indicators, making the results of this study more comparable with, and able to build on, previous studies (Zhang and Dafoe 2019).5

  • AI Knowledge: Measured using how well respondents performed on two AI knowledge quiz questions.6

  • AI Familiarity: Measured using a scale of how familiar respondents were with AI, based on whether they had heard about AI from friends or family, read about AI in the news, from other avenues, or not at all.

  • AI Experience: Measured using two indicators: (i) whether an individual had some degree of coding or programming experience, and (ii) whether a respondent had used a specific application and if they viewed the application to utilize AI technology.

  • AI Background Index: A normalized index aggregating knowledge, experience, and familiarity with AI. Experience and familiarity were each weighted at two-fifths. Knowledge received a smaller weight of one-fifth because the knowledge questions proved difficult for respondents, with only 25 respondents out of the total 9,016 getting both questions correct. A minimal construction sketch follows this list.
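
The sketch below illustrates the weighting described above; the component column names and the min–max scaling of each component to the unit interval are our assumptions.

```python
import pandas as pd

def scale01(s: pd.Series) -> pd.Series:
    """Min-max scale a component to the [0, 1] interval."""
    return (s - s.min()) / (s.max() - s.min())

def ai_background_index(df: pd.DataFrame) -> pd.Series:
    """Weighted index: 2/5 experience, 2/5 familiarity, 1/5 knowledge."""
    knowledge = scale01(df["ai_knowledge_correct"])    # number of quiz questions correct
    familiarity = scale01(df["ai_familiarity_scale"])  # exposure via news, friends, etc.
    experience = scale01(df["ai_experience_score"])    # coding background and AI app use
    return 0.2 * knowledge + 0.4 * familiarity + 0.4 * experience
```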

In the online appendix, figure A3 plots the mean values for each variable. We test hypotheses 2 through 4 with the following variables:

  • Trust in AI: To test hypothesis 2, we created an “AI Beliefs” battery of questions to serve as a baseline measure for how trusting an individual respondent is of AI and AI-enabled systems. The battery comes from the top three factor-loaded items from each positive and negative subscale of an AI-specific scale based on a validated index that maps attitudes toward technologies, the Technology Readiness Index 2.0 (Lam, Chiang, and Parasuraman 2008; Parasuraman and Colby 2014; Schepman and Rodway 2020).7

  • Treatment Confidence: To test hypothesis 3, confidence in the decision aid was set by the above language for each treatment, with high and low confidence language held constant across the human analysts and AI algorithm treatments.

  • Self-Confidence: To test hypothesis 4, we operationalize the relationship between self-confidence and answer switching by measuring the number of correct identifications made by participants in the practice rounds. The live feedback element of the practice section provided the respondents with a benchmark for how well they could complete the task, absent any pressurizing cognitive constraints.

In table 1 and the online appendix, figure A1, we show the summary statistics and correlation matrix for key variables. As respondents each completed ten identification rounds, in presenting the statistics, we show the data broken down by each identification round rather than grouped by respondent.

Table 1. Summary statistics

Statistic | N | Mean | Std. dev. | Min | Max
Sex (female = 1) | 89,810 | 0.52 | 0.50 | 0.00 | 1.00
Age | 90,160 | 46.51 | 16.56 | 18 | 93
Political ideology (right = 10) | 70,220 | 5.61 | 2.51 | 0.00 | 10.00
Highest level of education | 90,160 | 2.24 | 1.56 | 0 | 5
Total number of practice rounds correct | 90,160 | 2.65 | 1.21 | 0 | 5
Received a high confidence treatment | 90,160 | 0.45 | 0.50 | 0 | 1
Received a low confidence treatment | 90,160 | 0.44 | 0.50 | 0 | 1
Received an AI algorithm treatment | 90,160 | 0.44 | 0.50 | 0 | 1
Received a human analyst treatment | 90,160 | 0.45 | 0.50 | 0 | 1
Switched identification after treatment | 80,158 | 0.23 | 0.42 | 0.00 | 1.00
AI background index | 90,160 | 0.23 | 0.15 | 0.00 | 0.86
Trust in AI | 73,990 | 16.05 | 4.31 | 0.00 | 28.00

Results

We start by evaluating our primary dependent variable, the “rate of switching,” the rate at which respondents opted to change their initial identification after being presented with the treatment condition in each round. Figure 2 illustrates the mean switching rate for respondents that received each treatment: high- or low-confidence human analyst or AI algorithm decision support. Individuals tended to switch more often when presented with high-confidence treatments. At these high-confidence levels, the frequency of switching answers was highest when a human analyst treatment was involved. Conversely, at low-confidence levels, individuals were likelier to switch their answers if the treatment involved an AI algorithm. This dynamic suggests that the general public holds different thresholds for error tolerance and expected performance for humans and AI systems. When an AI or human analyst is described as having undergone “extensive testing and training,” humans elicit the most confidence. However, when described as “still undergoing testing and training,” individuals may view any AI, even an incompletely trained and tested one, as preferable to a human analyst, whom they perceive as more prone to error and mistakes.
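
As a minimal sketch, assuming a long-format data frame df with one row per identification round and hypothetical column names, the comparison of means shown in figure 2 amounts to:

```python
import pandas as pd

# Mean switching rate by decision-aid type and confidence framing,
# excluding control rounds (aid_type assumed missing for the control).
treated = df[df["aid_type"].notna()]
switch_rates = (
    treated.groupby(["aid_type", "confidence"])["switched"]
    .mean()
    .unstack("confidence")
)
print(switch_rates.round(3))
```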

Figure 2. Mean level of switching per treatment type

We now turn to evaluating our hypotheses. Table 2 presents the results of the initial regression analysis designed to further test our hypotheses in models one and two. The dependent variable is the same switching variable. For simplicity of display and interpretation, we use OLS even though the dependent variable is binary; the results are consistent using logit models. The universe of cases is one observation per respondent identification, so there are ten observations (reflecting the ten game rounds) per respondent. We therefore cluster standard errors on the respondent. We include country variables to account for country effects.
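
A minimal sketch of this type of specification, assuming variable names in a long-format, complete-case data frame df (so the cluster groups align with the estimation sample), might look like:

```python
import statsmodels.formula.api as smf

# Linear probability model of switching with country dummies and standard
# errors clustered on the respondent (column names are assumptions).
formula = (
    "switched ~ difficulty + practice_correct + age + gender + education "
    "+ treat_ai + treat_high_conf + C(country)"
)
ols_fit = smf.ols(formula, data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["respondent_id"]}
)
print(ols_fit.summary())
```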

Table 2. Analysis of respondent switching

Variable | (1) Overall binary | (2) AI condition binary | (3) Human condition binary | (4) Switching in AI condition: AI familiarity model | (5) Switching in AI condition: AI knowledge model | (6) Switching in AI condition: AI experience model | (7) Switching in AI condition: AI background model
Political ideology |  |  |  | 0.032*** (0.010) | 0.037*** (0.010) | 0.039*** (0.010) | 0.034*** (0.010)
Level of difficulty | 0.034*** (0.003) | 0.035*** (0.004) | 0.041*** (0.004) | 0.200*** (0.034) | 0.200*** (0.034) | 0.201*** (0.034) | 0.200*** (0.034)
Practice round accuracy | −0.008*** (0.002) | −0.011*** (0.002) | −0.008*** (0.002) | −0.042** (0.019) | −0.047** (0.019) | −0.047** (0.019) | −0.044** (0.019)
Age | −0.000* (0.000) | −0.000** (0.000) | −0.000 (0.000) | −0.000 (0.002) | −0.001 (0.002) | −0.001 (0.002) | −0.000 (0.002)
Gender | 0.030*** (0.004) | 0.035*** (0.006) | 0.034*** (0.005) | 0.225*** (0.049) | 0.211*** (0.049) | 0.206*** (0.049) | 0.218*** (0.050)
Level of education | 0.005*** (0.001) | 0.007*** (0.002) | 0.005** (0.002) | 0.025* (0.015) | 0.043*** (0.015) | 0.044*** (0.015) | 0.036** (0.015)
Treatment condition: AI | 0.045*** (0.003) |  |  |  |  |  |
Treatment condition: high confidence | 0.056*** (0.003) | 0.017*** (0.004) | 0.016*** (0.004) | 0.093*** (0.034) | 0.092*** (0.034) | 0.093*** (0.034) | 0.092*** (0.034)
AI familiarity |  |  |  | 1.928*** (0.388) |  |  |
AI familiarity squared |  |  |  | −1.788*** (0.463) |  |  |
AI knowledge |  |  |  |  | 0.954** (0.412) |  |
AI knowledge squared |  |  |  |  | −1.366*** (0.530) |  |
AI experience |  |  |  |  |  | 0.923*** (0.280) |
AI experience squared |  |  |  |  |  | −1.134*** (0.300) |
AI background index |  |  |  |  |  |  | 1.289*** (0.483)
AI background index squared |  |  |  |  |  |  | −1.547** (0.653)
Normalized AI sentiment |  |  |  | −0.349** (0.158) | −0.203 (0.159) | −0.223 (0.160) | −0.274* (0.161)
Constant | 0.147*** (0.011) | 0.219*** (0.015) | 0.196*** (0.015) | −1.588*** (0.167) | −1.554*** (0.174) | −1.579*** (0.171) | −1.615*** (0.180)
Observations | 90,160 | 39,903 | 40,255 | 26,214 | 26,214 | 26,214 | 26,214
R2 | 0.020 | 0.015 | 0.014 |  |  |  |
Pseudo R2 |  |  |  | 0.010 | 0.007 | 0.008 | 0.007
Log likelihood | −44,495.143 | −21,683.412 | −21,555.620 | −14,202.630 | −14,237.277 | −14,229.660 | −14,236.006
F | 88.738 | 26.239 | 27.421 |  |  |  |

Notes: Cells report coefficients (b) with standard errors clustered by respondent in parentheses. *p<0.10; **p<0.05; ***p<0.01.


The results support our theoretical expectations. There are significant relationships between the rate of switching and background in AI (hypothesis 1), level of trust in AI (hypothesis 2), treatment confidence level (hypothesis 3), and practice round accuracy (hypothesis 4), all significant at the p < 0.05 level or better. We explore each of these results further in turn.

We further test hypothesis 1, regarding how a background in AI should influence automation bias, in models 4–7 of table 2. We switch to a logit model here, in part, to highlight the consistency of the results across different model specifications; additional results in the online appendix show that the model specification does not affect the results. We restrict the universe of cases to just those respondents who received the AI treatment condition and evaluate the impact of our AI Background Index variable, which includes the knowledge, experience, and familiarity subindices we also examine below. Given hypothesis 1’s prediction that the relationship between background in AI and answer switching will be nonlinear, we also include the square of the AI Background Index variable.
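
A minimal sketch of a model-7-style specification, again under assumed variable names and a complete-case data frame, could be:

```python
import statsmodels.formula.api as smf

# Logit of switching on the AI Background Index and its square, restricted to
# rounds in which the respondent received an AI decision-aid treatment.
ai_rounds = df[df["aid_type"] == "AI algorithm"].dropna(subset=["switched"])
logit_fit = smf.logit(
    "switched ~ ai_background + I(ai_background ** 2) + ideology + difficulty "
    "+ practice_correct + age + gender + education + treat_high_conf "
    "+ ai_sentiment + C(country)",
    data=ai_rounds,
).fit(cov_type="cluster", cov_kwds={"groups": ai_rounds["respondent_id"]})
print(logit_fit.summary())
```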

The results show strong and statistically significant nonlinear effects. We begin with a comparison of means, evaluating the substantive effects by comparing mean values of answer switching after receiving an AI algorithm treatment as the value of the AI Background Index increases. Figure 3 plots the switching rate for the universe of cases in models 4–5. The predicted nonlinear relationship is evident, supporting the theory illustrated in figure 1. Essentially, when individuals lack a background in AI, similar to having no exposure to other types of technology, they tend to be skeptical and treat it as unreliable. Small amounts of prior exposure (like small amounts of knowledge from a Dunning–Kruger perspective) make people think they understand AI, so they become overconfident in its capabilities. With greater knowledge, though, comes a more tempered viewpoint: more supportive than the skepticism of those with no background in AI, but more skeptical than those with only limited background.

Figure 3. Rate of switching when treatment is an AI algorithm, given AI background.

More specifically, reliance on AI (in this case, the rate of switching when receiving an AI treatment) initially increases steadily from the lowest combined level of AI familiarity, knowledge, and experience, peaking around the mean (0.224) and median (0.196) values of the AI Background Index, before decreasing and leveling out as the AI Background Index increases past the third quartile of around 0.313 and trends toward the maximum of 0.86.

In figure 4, we evaluate the substantive effects of each of the subindices that together constitute the AI Background Index, as well as the background index itself, based on the regression models in table 2, allowing us to show the effects of each mechanism independently. We generate predicted probabilities of switching across the values of the AI Familiarity, AI Knowledge, AI Experience, and AI Background Index variables, holding other variables at their means, and plot the results. The resulting curves look slightly different from those in figure 3 due to the control variables and clustered standard errors in the regressions. Their consistency, however, demonstrates a clear pattern of results that supports our hypothesis.
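
Continuing the earlier sketch, and with the same caveat that the variable names are assumptions, the predicted probabilities for the background index could be generated as follows:

```python
import numpy as np
import pandas as pd

# Predicted switching probabilities across the observed range of the AI
# Background Index, holding other covariates at their means and country at
# its modal category; logit_fit and ai_rounds come from the previous sketch.
grid = pd.DataFrame({
    "ai_background": np.linspace(
        ai_rounds["ai_background"].min(), ai_rounds["ai_background"].max(), 50
    )
})
for col in ["ideology", "difficulty", "practice_correct", "age",
            "gender", "education", "treat_high_conf", "ai_sentiment"]:
    grid[col] = ai_rounds[col].mean()
grid["country"] = ai_rounds["country"].mode().iloc[0]
grid["p_switch"] = logit_fit.predict(grid)
```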

Figure 4. Predicted probability of switching: AI background and index components.

The probability of answer switching in the AI condition is 19 percent when AI Familiarity is lowest. Consistent with the Dunning–Kruger effect (Sanchez and Dunning 2018), and what research on technology hype cycles suggests, as perceived AI familiarity grows, respondents become more likely to switch their answers in response to information provided by an AI decision aid, peaking at 29 percent. As AI Familiarity gets to the highest level, though, the probability of switching goes down again to 22 percent.

We observe comparable effects for AI Knowledge (beginning at 22 percent, peaking at 25 percent, and terminating at 16 percent) and AI Experience (beginning at 20 percent, peaking at 25 percent, and terminating at 18 percent). The AI Knowledge curve is slightly steeper, an expected trend given the significantly higher difficulty level of the AI knowledge questions. The index starts at 20 percent when AI Background Index is lowest, peaks at 25 percent as background rises, and drops to 16 percent when AI Background Index is at its highest. The results provide strong support for hypothesis 1.

Attitudinal: Trust and Confidence

We now look at hypotheses 2–4, evaluating the impact of trust and confidence on the likelihood of switching. As outlined earlier, we measure trust in AI with a validated Trust in AI index variable and confidence in the system with our reliability treatment condition (Treatment Confidence). We also measure self-confidence as the percentage of practice rounds accurately identified (Total Number of Practice Rounds Correct).

Figure 5 displays the percentage of respondents who agreed or disagreed with each statement used to create the Trust in AI index. A significant portion of respondents exhibit some degree of openness to AI. A majority consider AI exciting and potentially beneficial, with 62 percent and 73 percent, respectively, of the sample responding with “agree” or “strongly agree.” Over half (53 percent) express an interest in using “artificially intelligent systems.” About 46 percent even conceded that AI is, on balance, more accurate than humans. However, there is notable hesitation and concern regarding AI use, with 43 percent believing AI to be “dangerous,” 39 percent thinking organizations use AI unethically, and 36 percent finding AI “sinister.” Collectively, these responses sketch a general picture of individuals’ openness and trust toward AI technologies, enabling the construction of the Trust in AI index described in the Research Design section.
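
As a minimal sketch of how such an index might be assembled from the battery (the item names, Likert coding, and reverse-scoring are our assumptions):

```python
import pandas as pd

POSITIVE = ["ai_exciting", "ai_beneficial", "ai_interested"]    # positively worded items
NEGATIVE = ["ai_dangerous", "ai_unethical_use", "ai_sinister"]  # negatively worded items
SCALE_MAX = 4  # assumed 0-4 agreement coding

def trust_in_ai(df: pd.DataFrame) -> pd.Series:
    """Sum agreement with positive items and reverse-coded negative items,
    so higher scores indicate greater trust in and openness to AI."""
    positive = df[POSITIVE].sum(axis=1)
    negative = (SCALE_MAX - df[NEGATIVE]).sum(axis=1)
    return positive + negative
```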

Figure 5. Responses to AI beliefs battery

To further examine hypothesis 2, we evaluate the connection between a respondent’s trust and confidence in AI and their propensity to follow the guidance of an AI decision aid or exhibit automation bias. Figure 6 plots respondents’ rate of switching their answers when shown an AI algorithm treatment, given how trusting they are in AI-based technology and systems. Generally, the switching rate for both high- and low-confidence AI algorithm treatments showed a similar, nonlinear trend as trust in AI increased but was overall higher for high-confidence treatments. Similar to the combined effects of knowledge, experience, and familiarity with AI, individuals are more prone to algorithm aversion at the lowest levels of trust.

Figure 6. Average switching given an AI treatment across AI trust levels

Confidence in the AI decision aid mattered less to respondents with lower general trust in AI. For respondents with a Trust in AI index score below the median (0.571) or mean (0.573), personal perceptions of the technology outweighed the experimental information about how much testing or training the AI algorithm had received. However, respondents with trust in AI above the mean or median were substantially more likely to change their answers if the AI algorithm was described as having a higher degree of accuracy through “extensive testing and training.” This finding implies that once the barrier of trust in the technology is overcome, confidence, reflected in the testing, evaluation, and expected accuracy of the system, becomes a more central factor in the willingness to depend on the technology.
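The pattern described above amounts to a simple grouped comparison: splitting respondents at the Trust in AI median and comparing switching rates across the high- and low-confidence AI treatments within each half. The sketch below illustrates that calculation on simulated data; the column names are placeholders rather than the variables in the replication files.

    # Minimal sketch of the trust-by-confidence comparison described above.
    # Assumes one row per AI-treatment round with a binary `switched` outcome,
    # a 0-1 `trust_in_ai` index, and a `high_confidence` flag for the
    # "extensive testing and training" treatment; names are illustrative.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 4000
    df = pd.DataFrame({
        "trust_in_ai": rng.uniform(0, 1, n),
        "high_confidence": rng.integers(0, 2, n).astype(bool),
    })
    # Simulated pattern: the confidence treatment only moves switching
    # among higher-trust respondents.
    base = 0.18 + 0.10 * df["trust_in_ai"]
    boost = np.where(df["high_confidence"] & (df["trust_in_ai"] > 0.571), 0.08, 0.0)
    df["switched"] = rng.binomial(1, base + boost)

    df["above_trust_median"] = df["trust_in_ai"] > df["trust_in_ai"].median()
    print(df.groupby(["above_trust_median", "high_confidence"])["switched"].mean())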

These results also provide evidence for hypothesis 3, as does the comparison of means at the beginning of the results section. In general, when the decision aid is described to the respondent in terms that elicit higher degrees of confidence, respondents are significantly more likely to change their answers. This finding holds whether respondents received the human or AI experimental treatments.

Finally, we find support for hypothesis 4. Figure 7 shows respondents’ rate of answer switching after each treatment type, given their practice round accuracy. As predicted, the higher a respondent scored on the practice round identification tasks, the less likely, on average, they were to switch their answers in the “real” identification rounds for most treatment scenarios. Self-confidence derived from the practice rounds translated into perceived task competence: the higher the level of self-confidence, the lower the probability that respondents would rely on decision aids.

Figure 7. Rate of switching given treatment type and accuracy in practice rounds.

Respondents who got no identifications correct in the practice round switched their answers most frequently when presented with an AI treatment, doing so over 25 percent of the time. In contrast, across the AI and human treatment conditions, respondents who achieved a perfect score in the practice section switched their answers only 18 percent of the time. In the AI treatment condition, respondents with low practice round accuracy switched their answers at similar rates regardless of the confidence treatment, whereas those with high practice round accuracy changed their answers about 20 percent of the time under the high-confidence AI treatment but only 16 percent of the time under the low-confidence AI treatment. In the human treatment condition, the low-confidence treatment produced a pattern resembling the AI condition, with more switching at low levels of practice round accuracy than at high levels. Under the high-confidence human treatment, in contrast, the switching rate remained roughly constant at around 24 percent regardless of practice round accuracy. The logic behind this divergence would be interesting to explore in future research.
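Computationally, the comparison in figure 7 is a cross-tabulation of switching rates by practice round accuracy and treatment cell. The sketch below shows that tabulation on simulated data; the treatment labels and the 0–8 practice score are illustrative assumptions, not the exact coding of the survey variables.

    # Minimal sketch of the figure 7 comparison: switching rate by practice
    # round accuracy and treatment cell, on simulated data.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)
    n = 4000
    df = pd.DataFrame({
        "practice_correct": rng.integers(0, 9, n),
        "treatment": rng.choice(
            ["ai_high_conf", "ai_low_conf", "human_high_conf", "human_low_conf"], n),
    })
    # Simulated pattern: more switching at low practice accuracy, except in
    # the human high-confidence cell, which stays roughly flat.
    slope = np.where(df["treatment"] == "human_high_conf", 0.0, -0.012)
    df["switched"] = rng.binomial(1, 0.26 + slope * df["practice_correct"])

    rates = df.pivot_table(index="practice_correct", columns="treatment",
                           values="switched", aggfunc="mean")
    print(rates.round(2))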

Conclusion

What influences the adoption of new technologies is a crucial question for international politics. Governments’ use of AI involves questions of trust, confidence, and human agency that are fundamental to our understanding of international relations, particularly as AI increasingly influences political decisions. As the integration of AI into militaries worldwide continues, how individuals and organizations make decisions about adopting AI will become even more critical. Research on automation bias suggests that humans can often be overconfident in AI, but much of that research is limited to the healthcare and aviation sectors. Research on algorithm aversion, in contrast, shows that, as the stakes of a decision rise, humans become more cautious about trusting algorithms.

We test theories about automation bias in the international security context using a large, unique sample of 9,000 respondents in nine countries. These countries have extensive—yet diverse—levels of AI industries and military investments. The respondents completed a military aircraft identification task as part of the overall survey experiment, whereby they received advice from either an AI or human decision aid before locking in their answer to the identification task. The results strongly support our hypotheses. A version of the Dunning–Kruger effect appears to be at play, whereby those with the lowest level of experience with AI are slightly more likely to be algorithm-averse, then automation bias occurs at lower levels of knowledge before leveling off as a respondent’s AI background reaches the highest levels.

We find that there is a nonlinear relationship between the probability that an individual will rely on an AI decision aid in the international security context and their overall exposure to AI, where exposure encompasses the extent to which an individual has a general familiarity and awareness of the technology, concrete knowledge of how the technology works, and firsthand experience using the technology. At the lowest levels of a prior background with AI, algorithm aversion is most likely, and at average levels, automation bias is most likely.

Only at the highest levels of overall exposure to AI are individuals more balanced in whether they rely on the AI decision aid for military aircraft identification, and we see greater accuracy in task completion, particularly among respondents who had low self-confidence in their ability to complete the task. Figure A2 in the online appendix details these findings. Respondents with the lowest levels of self-confidence in their ability to complete the task were more likely to switch initially incorrect answers when an AI decision aid suggested a correction, and they displayed greater accuracy as their background in AI increased, from around 50 percent accuracy at low levels to nearly 70 percent at high levels.

We also find that trust and confidence play a significant role in whether an individual will act on recommendations made by the AI decision aid. The more positive someone’s attitudes toward AI technology are, the more trusting they are in that technology in general, and the more likely the individual is to trust a recommendation of the AI decision aid in our specific scenario over that of a human analyst. Similarly, we find that the greater the described accuracy of the system, the more likely a respondent is to follow the decision aid’s suggestions. However, we find that this is frequently tempered by a respondent’s self-confidence. The more accurate a respondent is in the initial practice segment of the experiment, the less likely they are to second-guess their decisions and overly depend on a decision aid in subsequent task identification rounds. This holds whether the suggestion comes from a human analyst or an AI algorithm.

Several limitations of this study could inform future research. First, this study focused on the military domain because most previous automation bias experiments did not. Future research could include a nonmilitary scenario and a military scenario to allow for the direct comparability of results. Second, rather than focusing solely on public views, future research could explore the perspectives of elites or those directly involved in national security and military efforts in various countries. Third, future research could look at additional tasks in the military domain to see how automation bias and algorithm aversion vary across the type of task. Fourth, future research in other fields could also apply a similar methodological approach to answer related questions about human–machine teaming and cognitive biases. Finally, researchers could use this article’s novel sample and data to answer questions about cross-national differences in automation bias in the national security context.

Overall, these results demonstrate that how people think about AI, together with the features of any specific AI-enabled system, influences how that system is used in an international security context, including rates of automation bias and algorithm aversion. The results also build on existing behavioral science research on cognitive biases and receptivity to using automated and increasingly computerized tools. Automation bias is not merely an individual-level consideration; it is built into systems via protocols and operational procedures, and into the broader institutional approaches, policies, and norms that govern human–machine interaction, integration, and teaming.8 Therefore, understanding these issues in the international security context can guide public policy development for hardware and software. These findings contribute to new theory and illustrate the importance of understanding trust and confidence in AI adoption and how AI will broadly influence international relations.

Acknowledgements

The views expressed in this article do not reflect the views of the Department of Defense or the United States government. All errors are the responsibility of the authors. The authors would like to thank Julia Ciocca for her research and ideas. This article was made possible, in part, by a grant from the Air Force Office of Scientific Research and the Minerva Research Initiative under grant #FA9550-18-1-0194. The data underlying this article are available on the ISQ Dataverse at https://dataverse.harvard.edu/dataverse/isq.

Footnotes

1. Preregistration at https://osf.io/64p8g. Variations from the preregistration plan involved only technical clarifications of hypotheses in response to feedback and the deferral of some hypotheses to future papers, not substantive changes. As described below, we also chose a specific scenario in which there is no reason to expect the attitudes of those making judgments in the real world to differ substantially from those of the general public.

2. See the preanalysis plan available at https://osf.io/64p8g for additional information.

3. YouGov fielded the survey in the United States and Russia from January 31, 2022 to February 13, 2022, while Delta Poll fielded the survey in the remaining countries from February 17, 2022 to March 14, 2022.

4. In terms of difficulty, respondents had either 10 or 7 seconds to make an identification, and the airplane had either partially obscured or entirely obscured features. See the appendix for the images used and an example scenario.

5. A future study could ask about knowledge concerning AI applications in national security. We did not ask this question to avoid potentially biasing respondents, but as uses become more widespread, it would make sense to ask it in future experiments.

6. The questions used to create this index and the rest described below are viewable in the online appendix.

7. The questions used to create this index are viewable in the online appendix.

8. For example, Airbus and Boeing have distinct philosophies regarding automation and human–machine teaming (HMT) on their flight decks. Airbus emphasizes “hard” limits, allowing automation to take precedence except when necessary for safety, whereas Boeing adopts a “soft” limit approach that places the pilot more firmly as the final authority (Abbott 2000). When elements of these systems were changed on the Boeing 737 MAX in ways that complicated pilots’ ability to physically override the system, and those changes were insufficiently explained to pilots, the result contributed directly to two deadly crashes in 2018 and 2019 (Baker 2020). The investigations into the crashes reached similar conclusions: the human factor was not sufficiently considered in the integration of the new system, which was introduced without adequate training and instruction for operators on their role in, and the procedures for, overriding and exerting control over it.

Author Biography

Michael C. Horowitz is a Professor of Political Science at the University of Pennsylvania.

Lauren Kahn is a Senior Research Analyst at the Center for Security and Emerging Technology at Georgetown University.

References

Abbott, Kathy H. 2000. “Human Factors Engineering and Flight Deck Design.” In Digital Avionics Handbook, 1st ed., edited by Cary Spitzer, Uma Ferrell, and Thomas Ferrell, chapter 9, 9.1–3. Boca Raton, FL: CRC Press.

Alon-Barkat, Saar, and Madalina Busuioc. 2023. “Human-AI Interactions in Public Sector Decision-Making: ‘Automation Bias’ and ‘Selective Adherence’ to Algorithmic Advice.” Journal of Public Administration Research and Theory 33 (1): 153–69.

Alvarado-Valencia, Jorge A., and Lope H. Barrero. 2014. “Reliance, Trust and Heuristics in Judgemental Forecasting.” Computers in Human Behavior 36: 102–13.

Awad, Edmond, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan. 2018. “The Moral Machine Experiment.” Nature 563: 59–64.

Bahmanziari, Tammy, J. Michael Pearson, and Leon Crosby. 2016. “Is Trust Important in Technology Adoption? A Policy Capturing Approach.” Journal of Computer Information Systems 43 (4): 46–54.

Bailey, Nathan R., and Mark W. Scerbo. 2007. “Automation-Induced Complacency for Monitoring Highly Reliable Systems: The Role of Task Complexity, System Experience, and Operator Trust.” Theoretical Issues in Ergonomics Science 8 (4): 321–48.

Baker, Sinéad. 2020. “Boeing 737 Max: What’s Happened after the 2 Deadly Crashes.”

Bin, Guo. 2009. “Moderating Effects of Task Characteristics on Information Source Use: An Individual-Level Analysis of R&D Professionals in New Product Development.” Journal of Information Science 35 (5): 527–47.

Blosch, Marcus, and Jackie Fenn. 2018. Understanding Gartner’s Hype Cycles. Gartner Research. Accessed February 28, 2024. https://www.gartner.com/en/documents/3887767.

Chong, Leah, Guanglu Zhang, Kosa Goucher-Lambert, Kenneth Kotovsky, and Jonathan Cagan. 2022. “Human Confidence in Artificial Intelligence and in Themselves: The Evolution and Impact of Confidence on Adoption of AI Advice.” Computers in Human Behavior 127: 107018.

Cox, Jessica, and Heather Williams. 2021. “The Unavoidable Technology: How Artificial Intelligence Can Strengthen Nuclear Stability.” The Washington Quarterly 44: 69–85.

Cummings, Mary. 2004. “Automation Bias in Intelligent Time Critical Decision Support Systems.” In AIAA 1st Intelligent Systems Technical Conference, 557–62.

Cummings, Mary, Lixiao Huang, Haibei Zhu, Daniel Finkelstein, and Ran Wei. 2019. “The Impact of Increasing Autonomy on Training Requirements in a UAV Supervisory Control Task.” Journal of Cognitive Engineering and Decision Making 13 (4): 295–309.

Fitzpatrick, Mark. 2019. “Artificial Intelligence and Nuclear Command and Control.” Survival 61 (3): 81–92.

Goddard, Kate, Abdul Roudsari, and Jeremey C. Wyatt. 2012. “Automation Bias: A Systematic Review of Frequency, Effect Mediators, and Mitigators.” Journal of the American Medical Informatics Association 19 (1): 121–27.

Goddard, Kate, Abdul Roudsari, and Jeremey C. Wyatt. 2014. “Automation Bias: Empirical Results Assessing Influencing Factors.” International Journal of Medical Informatics 83 (5): 368–75.

Haas, Peter M. 1992. “Introduction: Epistemic Communities and International Policy Coordination.” International Organization 46 (1): 1–35.

HAI at Stanford University. 2021. Global AI Vibrancy Tool: Who’s Leading the Global AI Race? Stanford, CA: Stanford University. Accessed December 29, 2023. https://aiindex.stanford.edu/vibrancy/.

Hawley, John K. 2017. “Patriot Wars.” Center for a New American Security.

Heikkilä, Melissa. 2023a. “A Chatbot that Asks Questions Could Help You Spot When It Makes No Sense.”

Heikkilä, Melissa. 2023b. “We Know Remarkably Little about How AI Language Models Work.”

Hersman, Rebecca. 2020. “Wormhole Escalation in the New Nuclear Age.” Texas National Security Review 3: 90–109.

Hoffman, Aaron M. 2002. “A Conceptualization of Trust in International Relations.” European Journal of International Relations 8 (3): 375–401.

Horowitz, Michael C. 2019. “When Speed Kills: Lethal Autonomous Weapon Systems, Deterrence and Stability.” Journal of Strategic Studies 42 (6): 764–88.

Horowitz, Michael C. 2020. “Do Emerging Military Technologies Matter for International Politics?” Annual Review of Political Science 23: 385–400.

Horowitz, Michael C., and Lauren Kahn. 2021. “What Influences Attitudes about Artificial Intelligence Adoption: Evidence from US Local Officials.” PLoS One 16 (10): e0257732.

Horowitz, Michael C., Lauren Kahn, Julia Macdonald, and Jacquelyn Schneider. 2022. “COVID-19 and Public Support for Autonomous Technologies—Did the Pandemic Catalyze a World of Robots?” PLoS One 17 (9): e0273941.

Horowitz, Michael C., Lauren Kahn, Julia Macdonald, and Jacquelyn Schneider. 2023. “Adopting AI: How Familiarity Breeds both Trust and Contempt.” AI & Society, May: 1–15.

Horowitz, Michael C., and Erik Lin-Greenberg. 2022. “Algorithms and Influence: Artificial Intelligence and Crisis Decision-Making.” International Studies Quarterly 66 (4): sqac069.

Hudson, Valerie M. 2019. Artificial Intelligence and International Politics. New York, NY: Routledge.

Jensen, Benjamin M., Christopher Whyte, and Scott Cuomo. 2020. “Algorithms at War: The Promise, Peril, and Limits of Artificial Intelligence.” International Studies Review 22 (3): 526–50.

Johnson, James. 2021a. Artificial Intelligence and the Future of Warfare: The USA, China, and Strategic Stability. Manchester: Manchester University Press.

Johnson, James. 2021b. “‘Catalytic Nuclear War’ in the Age of Artificial Intelligence & Autonomy: Emerging Military Technology and Escalation Risk between Nuclear-Armed States.” Journal of Strategic Studies 44: 1–41.

Johnston, Patrick B., and Anoop K. Sarbahi. 2016. “The Impact of US Drone Strikes on Terrorism in Pakistan.” International Studies Quarterly 60 (2): 203–19.

Kahn, Lauren. 2022a. “How Ukraine is Remaking War.” Foreign Affairs. Accessed December 29, 2023. https://www.foreignaffairs.com/ukraine/how-ukraine-remaking-war.

Kahn, Lauren. 2022b. “Mending the ‘Broken Arrow’: Confidence Building Measures at the AI-Nuclear Nexus.”

Kertzer, Joshua D. 2022. “Re-Assessing Elite-Public Gaps in Political Behavior.” American Journal of Political Science 66 (3): 539–53.

Kertzer, Joshua D., and Jonathan Renshon. 2022. “Experiments and Surveys on Political Elites.” Annual Review of Political Science 25: 529–50.

Konaev, Margarita, Tina Huang, and Husanjot Chahal. 2021. “Trusted Partners: Human-Machine Teaming and the Future of Military AI.” Center for Security and Emerging Technology.

Kreps, Sarah E. 2016. Drones: What Everyone Needs to Know. New York, NY: Oxford University Press.

Kreps, Sarah E., and Geoffrey P.R. Wallace. 2016. “International Law, Military Effectiveness, and Public Support for Drone Strikes.” Journal of Peace Research 53 (6): 830–44.

Kruger, Justin, and David Dunning. 1999. “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments.” Journal of Personality and Social Psychology 77 (6): 1121–34.

Kydd, Andrew H. 2007. Trust and Mistrust in International Relations. Princeton, NJ: Princeton University Press.

Lam, Shun Yin, Jeongwen Chiang, and A. Parasuraman. 2008. “The Effects of the Dimensions of Technology Readiness on Technology Acceptance: An Empirical Analysis.” Journal of Interactive Marketing 22 (4): 19–39.

Lin-Greenberg, Erik. 2022. “Wargame of Drones: Remotely Piloted Aircraft and Crisis Escalation.” Journal of Conflict Resolution 66 (10): 1737–65.

Luhmann, Niklas. 1979. Trust and Power. New York, NY: John Wiley and Sons.

Lyell, David, and Enrico Coiera. 2016. “Automation Bias and Verification Complexity: A Systematic Review.” Journal of the American Medical Informatics Association 24 (2): 423–31.

Massey, Cade, Joseph P. Simmons, and Berkeley J. Dietvorst. 2015. “Algorithm Aversion: People Erroneously Avoid Algorithms after Seeing Them Err.” Journal of Experimental Psychology: General 144 (1): 114–26.

Ministry of Defense. 2004. “Aircraft Accident to Royal Air Force Tornado GR MK4A ZG710.”

Mir, Asfandyar, and Dylan Moore. 2019. “Drones, Surveillance, and Violence: Theory and Evidence from a US Drone Program.” International Studies Quarterly 63 (4): 846–62.

Mosier, Kathleen L., and Linda Skitka. 1996. “Human Decision Makers and Automated Decision Aids: Made for Each Other?” In Automation and Human Performance: Theory and Applications, edited by Raja Parasuraman and Mustapha Mouloua, 201–20. Mahwah, NJ: Lawrence Erlbaum Associates.

OECD AI. 2022. National AI Policies & Strategies. OECD AI Policy Observatory. Accessed December 29, 2023. https://oecd.ai/en/dashboards.

Parasuraman, A., and Charles L. Colby. 2014. “An Updated and Streamlined Technology Readiness Index: TRI 2.0.” Journal of Service Research 18 (1): 59–74.

Parasuraman, Raja, and Dietrich H. Manzey. 2010. “Complacency and Bias in Human Use of Automation: An Attentional Integration.” Human Factors: The Journal of the Human Factors and Ergonomics Society 52 (3): 381–410.

Povyakalo, Andrey A., Eugenio Alberdi, Lorenzo Strigini, and Peter Ayton. 2013. “How to Discriminate between Computer-Aided and Computer-Hindered Decisions.” Medical Decision Making 33 (1): 98–107.

Reichenbach, Juliane, Linda Onnasch, and Dietrich Manzey. 2010. “Misuse of Automation: The Impact of System Experience on Complacency and Automation Bias in Interaction with Automated Aids.” Proceedings of the Human Factors and Ergonomics Society Annual Meeting 54 (4): 374–78.

Reiter, Dan. 1994. “Learning, Realism, and Alliances: The Weight of the Shadow of the Past.” World Politics 46 (4): 490–526.

Sanchez, Carmen, and David Dunning. 2018. “Overconfidence among Beginners: Is a Little Learning a Dangerous Thing?” Journal of Personality and Social Psychology 114 (1): 10–28.

Scharre, Paul. 2018. Army of None: Autonomous Weapons and the Future of War. New York, NY: WW Norton & Company.

Schepman, Astrid, and Paul Rodway. 2020. “Initial Validation of the General Attitudes towards Artificial Intelligence Scale.” Computers in Human Behavior Reports 22 (4): 1–13.

Sechser, Todd S., Neil Narang, and Caitlin Talmadge. 2019. “Emerging Technologies and Strategic Stability in Peacetime, Crisis, and War.” Journal of Strategic Studies 24 (6): 727–35.

Seligman, Adam, and David W. Montgomery. 2019. “The Tragedy of Human Rights: Liberalism and the Loss of Belonging.” Society 56 (3): 203–9.

Skitka, Linda J., Kathleen L. Mosier, and Mark Burdick. 1999. “Does Automation Bias Decision-Making?” International Journal of Human-Computer Studies 51 (5): 991–1006.

Southern, William N., and Julia Hope Arnstern. 2009. “The Effect of Erroneous Computer Interpretation of ECGs on Resident Decision Making.” Society for Medical Decision Making 29 (3): 372–76.

Stack Overflow. 2022. “Temporary Policy: Generative AI (e.g., ChatGPT) is Banned.”

Tetlock, Philip E. 2009. Expert Political Judgment. Princeton, NJ: Princeton University Press.

Tomz, Michael R., and Jessica L.P. Weeks. 2013. “Public Opinion and the Democratic Peace.” American Political Science Review 107 (4): 849–65.

Trager, Robert F., and Lynn Vavreck. 2011. “The Political Costs of Crisis Bargaining: Presidential Rhetoric and the Role of Party.” American Journal of Political Science 55 (3): 526–45.

Tyson, Alec, and Emma Kikuchi. 2023. “Growing Public Concern about the Role of Artificial Intelligence in Daily Life.”

U.S. Department of State Bureau of Arms Control, Verification and Compliance. 2023. “Political Declaration on Responsible Military Use of Artificial Intelligence and Autonomy.”

Wallace, Geoffrey P.R. 2014. “Martial Law? Military Experience, International Law, and Support for Torture.” International Studies Quarterly 58 (3): 501–14.

Young, Kevin L., and Charli Carpenter. 2018. “Does Science Fiction Affect Political Fact? Yes and No: A Survey Experiment on ‘Killer Robots’.” International Studies Quarterly 62 (3): 562–76.

Zhang, Baobao, and Allan Dafoe. 2019. Artificial Intelligence: American Attitudes and Trends. Oxford: University of Oxford.

Zvobgo, Kelebogile. 2019. “Human Rights versus National Interests: Shifting US Public Attitudes on the International Criminal Court.” International Studies Quarterly 63 (4): 1065–78.
