Julian Kauk, Helene Kreysa, Stefan R Schweinberger, Large-scale analysis of fact-checked stories on Twitter reveals graded effects of ambiguity and falsehood on information reappearance, PNAS Nexus, Volume 4, Issue 2, February 2025, pgaf028, https://doi.org/10.1093/pnasnexus/pgaf028
Abstract
Misinformation disrupts our information ecosystem, adversely affecting individuals and straining social cohesion and democracy. Understanding what causes online (mis)information to (re)appear is crucial for fortifying our information ecosystem. We analyzed a large-scale Twitter (now “X”) dataset of about 2 million tweets across 123 fact-checked stories. Previous research suggested a falsehood effect (false information reappears more frequently) and an ambiguity effect (ambiguous information reappears more frequently). However, robust indicators for their existence remain elusive. Using polynomial statistical modeling, we compared a falsehood model, an ambiguity model, and a dual effect model. The data supported the dual effect model.
Significance Statement

Our study sheds light on the intricate dynamics of (mis)information reappearance on social media, demonstrating that both ambiguity and falsehood significantly contribute to the recurrence of information online. Through the analysis of a large-scale Twitter dataset, we discovered that ambiguity is a more potent predictor of reappearance than falsehood. Nonetheless, our models accounted for only a small fraction of the variance, highlighting the complex and multifaceted nature of online (mis)information dynamics. Despite specific limitations, these findings are pivotal for crafting strategies to strengthen the resilience of our information ecosystem, providing essential guidance for policymakers, journalists, and social media platforms in their efforts to combat misinformation.
Introduction
The spread of online misinformation has become a significant global concern, disrupting our information ecosystem (1, 2). We define misinformation as any false or misleading information (3, 4). The effects of misinformation are complex and not fully understood (5), ranging from adverse individual impacts (e.g. reduced adherence to health measures; see Ref. (6)) to broader societal issues, including the erosion of social cohesion and democracy, decreased trust in politicians and the press, and weakened support for climate change mitigation (1, 7–9).
Both academia and the public have made substantial efforts to understand and counteract misinformation (see, e.g. (10–15)). However, the field still faces significant challenges (see Refs. (1, 5, 16, 17)). These challenges include establishing a consistent conceptual framework, developing standardized methods for data collection, and determining the actual prevalence and impact of misinformation.
The key role of social media in amplifying misinformation is widely recognized among scholars (see, e.g. (1, 18–20)). Research focuses on how misinformation spreads on social media and how this differs from the spread of credible information. Understanding and predicting the dynamics of online (mis)information, particularly through the study of information cascades, has attracted significant academic attention (see, e.g. (21–24)); for a survey, see Ref. (25). Significant contributions have enhanced our understanding of the dynamics of false and true information cascades. Vosoughi et al. (15), using a large-scale Twitter dataset, found that false news cascades tend to exhibit greater “fitness” in terms of depth, size, and velocity of spread, likely due to perceived novelty. Zhang et al. (26) examined the diffusion of conspiracy- and science-related cascades on Reddit, finding similar patterns to Vosoughi et al. (15), indicating cross-platform validity.
Epidemic models, originally developed to model the spread of infectious diseases, have significantly influenced research on the spread of online (mis)information. These models are frequently used to characterize the overall time series of (mis)information spread and describe cascade diffusion using graph theory (see, e.g. (11, 27, 28)); see also Refs. (29–31). Generally, epidemic models (i) divide a population into different compartments (in the SIR model, individuals are either susceptible [S], infected [I], or recovered [R] from believing a certain piece of information) and (ii) define the transitions between compartments. Various complex models have been proposed to describe the spread of online (mis)information (see Refs. (31, 32)), including nonepidemic models like threshold models (33) and cascading models (34).
However, simple epidemic models, such as the classic SIR model, and other diffusion models often fail to capture the full lifecycle of (mis)information. This is evident when analyzing time series data from social media, which often show numerous erratic peaks over time (cf. (35–37)). These fluctuations are frequently due to exogenous influences affecting the diffusion of (mis)information. While existing models describe the endogenous topology of networks well, they often overlook exogenous variables that modulate (mis)information dynamics. This gap has been noted by Raponi et al. (31), who stated that “propagation seems to be characterized also by factors that are not endogenous to the graph (data) under analysis” (p. 29). Consequently, many studies neglect the important characteristic of how information evolves over longer periods.
A related research question with significant implications for modeling the spread of online (mis)information is the extent to which misinformation reappears after its initial emergence, and whether this tendency differs from that of more credible information. Answering this question is crucial, as cognitive psychology suggests that the repetitive presentation of (false) information can increase its believability, known as the illusory truth effect (see, e.g. (38, 39)). Moreover, a differential temporal pattern of (re)appearance could help in detecting misinformation.
Shin et al. (36) studied the temporal dynamics of false and true rumors on Twitter and found that false rumors tended to resurface, while true rumors did not. Specifically, they observed that true rumors appeared only once, while false rumors reemerged an average of
A further factor which may play a crucial role in the reappearance of information is ambiguity, as individuals persistently seek confirmation under uncertainty. We here define ambiguity in information as content whose truthfulness is uncertain or indeterminate, often involving mixed, partial, or conflicting truth claims (see also Refs. (40, 41)). In this regard, Mitra et al. (40) explored the temporal dynamics of highly and less credible events on Twitter using the CREDBANK dataset (see Ref. (35)). Although less credible events reappeared more frequently, the effect sizes were smaller than those reported by Shin et al. (36). The authors primarily attributed these findings to uncertainty, as their dataset virtually lacked false events (only one event was rated as inaccurate), but largely contained “uncertain,” “probably accurate,” or “certainly accurate” events. The absence of false events limits the generalizability of their findings, as they cannot attribute their findings to ambiguity or falsehood. More causal evidence for the role of ambiguous information is provided by Allen et al. (42), who investigated whether outright misinformation or factually accurate but misleading content was more effective in driving COVID-19 vaccine hesitancy. They found that “gray area” content—factually accurate yet deceptive—was 46 times more influential in fostering vaccine hesitancy than clear misinformation. This underscores the critical importance of addressing such ambiguous content, which often remains unflagged on social media, but may pose a substantial threat to public health and opinion formation.
Research objectives
This work aims to deepen our understanding of the long-term dynamics of online (mis)information through a large-scale analysis of fact-checked stories on Twitter. Data from fact-checking organizations such as Snopes and PolitiFact has been widely used in the literature (see, e.g. (15, 36, 43, 44)). While these organizations often show high agreement when assessing the same story (see Ref. (45)), this approach is not without limitations. Selection biases and resource constraints can influence which stories are fact-checked, raising concerns about the representativeness of the fact-checked dataset (see Refs. (46, 47)). Nevertheless, fact-checking remains a powerful tool for studying online (mis)information. Leveraging its strengths, this work aims to address the following research questions:
Is there an effect of falsehood on the reappearance of online information? Initial exploratory results suggest false information tends to reappear more frequently than true information (see Ref. (36)), but conclusive evidence is needed.
Is there an effect of ambiguity on the reappearance of online information? There is preliminary evidence that ambiguous information tends to reappear more frequently than unambiguous information (see Ref. (40)).
If both effects are replicable, which effect is more prominent? No prior research has simultaneously examined falsehood and ambiguity effects, leaving a gap in understanding their relative importance.
Methods
Data
We utilized the adaptive community-response (ACR) method, as presented by Kauk et al. (17), to compile our Twitter dataset. The ACR method leverages data from fact-checking sources to automatically create queries for retrieving tweets associated with specific (misinformation) stories. This method combines keywords from fact-checking sources and adapts to Twitter’s syntax by including Twitter-specific terms relevant to the story. The ACR method is designed to perform only queries that exhibit sufficient precision while maximizing recall for a particular story.
Each story in our dataset was verified by one of two major fact-checking sources, Snopes and PolitiFact (48, 49). These sources primarily validate stories related to the United States but also address global topics. Relying on fact-checkers’ verdicts is a standard practice in the field (cf. (15, 18, 36, 50–52)), allowing researchers to accurately assess information credibility. Fact-checking organizations typically use an ordinal scale to rate story credibility, usually ranging from “highly false” to “completely true.” Consistent with prior research (cf. (15, 35)), we employed a five-level Likert scale to represent the credibility of the stories (see Table 1). Basic information for each story (including story claim, fact-checking source, query, number of tweets, onset, and end) is available as a .csv file on OSF.
Table 1. Mapping of the textual ratings of the fact-checking pages to the five-level Likert scale.

| Fact-checking source | −2 (False) | −1 (Mostly false) | 0 (Mixed) | 1 (Mostly true) | 2 (True) |
|---|---|---|---|---|---|
| Snopes | “False,” “Scam” | “Mostly False” | “Mixture” | “Mostly True” | “True” |
| PolitiFact | “Pants on Fire,” “False” | “Mostly False” | “Half True” | “Mostly True” | “True” |
While we categorize “False” and “True” stories as misinformation and true information, respectively, we classify the intermediate categories as ambiguous information. Ambiguous information, in line with our definition provided in the introduction, refers to content rated by fact-checkers as “Mostly false,” “Mixed,” or “Mostly true,” indicating that the veracity is not clear-cut and allows for multiple interpretations.
Data cleaning and preparation
We conducted several steps to enhance data quality and prepare it for final analyses. A crucial step involved excluding tweets that were (i) irrelevant or (ii) contrary to the specific story. This was accomplished by combining basic natural language processing (NLP) tools with advanced transformer-based models, which have notably advanced NLP performance recently (cf. (53)).
Initially, we cleaned our tweets (we only considered original tweets and their retweets) by removing unwanted characters. Specifically, URLs and various irregular characters were eliminated, allowing only alphanumeric characters ([A-Za-z0-9]) and specific punctuation characters ([?!.,’]). All text was converted to lowercase. Mentions (@) and hashtags (#) were kept, though the symbols @ and # were removed, as mentions and hashtags often hold crucial context in tweets. To filter out irrelevant tweets, we utilized a sentence-transformers model (see Refs. (54, 55)) to measure semantic similarity between tweets and story claims. Specifically, we calculated the cosine similarity
The optimal threshold for balancing recall and precision can vary depending on the NLP tool, typically lying in an intermediate range due to the precision-recall tradeoff (17). Considering these factors, we predefined a text similarity threshold of
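The filtering step described above can be sketched as follows. The toy vectors and the threshold of 0.7 are illustrative stand-ins for the actual sentence-transformers embeddings and the paper's predefined threshold:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_tweets(tweet_embs, claim_emb, threshold):
    """Keep indices of tweets whose embedding is sufficiently similar to the claim."""
    return [i for i, e in enumerate(tweet_embs)
            if cosine_similarity(e, claim_emb) >= threshold]

# Toy 3-d "embeddings"; real ones would come from a sentence-transformers model.
claim = np.array([1.0, 0.0, 0.0])
tweets = [np.array([0.9, 0.1, 0.0]),   # near-duplicate of the claim -> kept
          np.array([0.0, 1.0, 0.0])]   # unrelated content -> filtered out
kept = filter_tweets(tweets, claim, threshold=0.7)  # threshold is illustrative
```

In practice the embeddings would be produced by the sentence-transformers model mentioned in the text; only the thresholding logic is shown here.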
Finally, to enhance the robustness of our response variables, we excluded
Response variables
We used the same measures as Shin et al. (36, p. 281) to characterize the temporal dynamics of the stories. We will introduce the two measures, peak frequency and burstiness, in the upcoming sections.
Peak frequency
This measure is based on a simple peak-finding algorithm capable of identifying local maxima by comparing neighboring data points. We applied two criteria to ensure that the peaks represented meaningful entities. A minimal height of
Formally, we consider a time series X = (x_1, x_2, …, x_T), where x_t denotes the number of tweets observed at time step t. A data point x_t is a local maximum if x_t > x_{t−1} and x_t > x_{t+1}. The peaks set P contains all local maxima that additionally satisfy the minimal-height and minimal-distance criteria, and the peak frequency of a story is given by |P|.
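A minimal sketch of this peak-finding step using scipy.signal.find_peaks, whose height and distance arguments mirror the two criteria described above (the concrete values 5 and 3 are illustrative, not the parameters used in the study):

```python
import numpy as np
from scipy.signal import find_peaks

# Toy daily tweet counts for one story (illustrative values).
counts = np.array([0, 2, 10, 3, 1, 1, 8, 2, 0, 0, 6, 1])

# `height` enforces a minimal peak height, `distance` a minimal spacing
# between neighboring peaks; both values here are assumptions.
peaks, _ = find_peaks(counts, height=5, distance=3)

peak_frequency = len(peaks)  # |P|, the story's peak frequency
```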
Burstiness
This measure reflects the concentration of tweets, calculated as the ratio of the maximum deflection in the time series to its total sum. Burstiness is therefore given by:

B = max_t x_t / Σ_t x_t

A burstiness value close to 1 indicates that most tweets are concentrated within a single burst, whereas values close to 0 indicate that tweets are distributed more evenly over time.
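The burstiness measure described above can be sketched in a few lines (the count series are illustrative):

```python
import numpy as np

def burstiness(counts: np.ndarray) -> float:
    """Ratio of the largest single-interval count to the total number of tweets."""
    return float(counts.max() / counts.sum())

concentrated = np.array([0, 0, 98, 1, 1])      # almost everything in one burst
spread_out   = np.array([20, 20, 20, 20, 20])  # evenly distributed over time

b_high = burstiness(concentrated)  # close to 1
b_low  = burstiness(spread_out)    # 1 / number of intervals
```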
Modeling
We used two types of regression models within the generalized linear model (GLM) framework to analyze our response variables. We modeled peak frequency using zero-truncated negative binomial regression (59), a statistical technique designed for count data without zero counts and capable of accounting for overdispersion (i.e. variance greater than the mean). We used the r package Countreg (version 0.3-0) for maximum likelihood estimation of the parameters. The link function is defined as:

log(μ_i) = η_i

where μ_i denotes the expected peak frequency of story i and η_i the linear predictor of the respective model structure.
We employed beta regression (60) to model burstiness, which is well-suited for continuous data constrained within the interval (0, 1). For this, we used the r package Betareg (version 3.1-4) for maximum likelihood estimation of the parameters. The link function used is:

logit(μ_i) = log(μ_i / (1 − μ_i)) = η_i

where μ_i denotes the expected burstiness of story i and η_i the linear predictor of the respective model structure.
In this study, we investigate three model structures: linear, quadratic without a linear term, and quadratic with a linear term. These structures are applied using both zero-truncated negative binomial regression and beta regression. The detailed model structures will be explained in the following paragraphs.
Falsehood model
We tested the relationship between the rating of the stories (see Table 1) and the response variable (peak frequency or burstiness) via a linear model structure. We assumed linear relationships between information credibility and our response variables, corresponding to the falsehood effect reported by Shin et al. (36). This model assumes no effect of ambiguity on the response variables, meaning that the variance of our outcome variables is solely explainable by the falsehood effect.
The model is given by:

g(μ_i) = β_0 + β_1 · rating_i

where g denotes the link function of the respective regression model (log for peak frequency, logit for burstiness) and the linear coefficient β_1 represents the falsehood effect.
Ambiguity model
We used a constrained quadratic model structure without a linear term to model the relationship between the rating of the stories and our outcome variables. This model resembles the proposed ambiguity effect reported by Mitra et al. (40), where mixed information (exhibiting maximal ambiguity) should have the highest number of peaks. This model assumes an effect of ambiguity only, with no effect of falsehood.
The model is given by:

g(μ_i) = β_0 + β_2 · rating_i²

where the quadratic coefficient β_2 represents the ambiguity effect.
Dual effect model
Finally, we employed an unconstrained quadratic model structure to examine the relationship between the rating of stories and our response variables, assuming modulation by both falsehood and ambiguity (see Refs. (36, 40)). This model incorporates both a linear term (falsehood effect) and a quadratic term (ambiguity effect).
The model is represented as:

g(μ_i) = β_0 + β_1 · rating_i + β_2 · rating_i²

where β_1 and β_2 represent the falsehood and ambiguity effects, respectively.
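The three model structures can be illustrated numerically. The coefficient values below are chosen only to mimic the reported directions of the effects; they are not the fitted estimates from the paper:

```python
import numpy as np

def eta(rating, b0=0.0, b1=0.0, b2=0.0):
    """Linear predictor: b0 + b1*rating + b2*rating**2."""
    return b0 + b1 * rating + b2 * rating ** 2

ratings = np.array([-2, -1, 0, 1, 2])  # False ... True

# Illustrative coefficients (signs mimic the reported effect directions).
falsehood = eta(ratings, b0=1.0, b1=-0.2)            # falls with credibility
ambiguity = eta(ratings, b0=1.0, b2=-0.15)           # inverted U, apex at Mixed
dual      = eta(ratings, b0=1.0, b1=-0.2, b2=-0.15)  # both effects combined

# Under the log link of the count model, expected peak frequency is exp(eta):
mu_dual = np.exp(dual)
```

Note how the dual structure shifts the apex of the inverted U toward the false end of the scale, which is exactly what combining a negative linear with a negative quadratic term produces.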
Model selection
We used (pseudo) coefficients of determination (R²) to quantify the proportion of variance explained by each model, and we compared models via information criteria (AIC, AICc, and BIC) and their relative likelihoods. Thus, lower information criterion values indicate a better model fit after penalizing model complexity.
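As a sketch of this model-comparison step, the following computes AIC, AICc, BIC, and AIC-based relative likelihoods from log-likelihoods. The log-likelihood and parameter-count values are hypothetical placeholders, not the paper's results:

```python
import math

def aic(log_lik: float, k: int) -> float:
    return 2 * k - 2 * log_lik

def aicc(log_lik: float, k: int, n: int) -> float:
    # Small-sample correction of the AIC.
    return aic(log_lik, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(log_lik: float, k: int, n: int) -> float:
    return k * math.log(n) - 2 * log_lik

def relative_likelihood(aic_model: float, aic_best: float) -> float:
    """exp((AIC_best - AIC_model)/2): plausibility relative to the best model."""
    return math.exp((aic_best - aic_model) / 2)

# Hypothetical log-likelihoods for n = 123 stories (placeholder values).
n  = 123
ll = {"null": -300.0, "falsehood": -299.0, "ambiguity": -295.0, "dual": -293.5}
k  = {"null": 2, "falsehood": 3, "ambiguity": 3, "dual": 4}

aics = {m: aic(ll[m], k[m]) for m in ll}
best = min(aics.values())
rel  = {m: relative_likelihood(aics[m], best) for m in aics}
```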
Table 2 summarizes the model structures considered in the present study, including a null model without falsehood and ambiguity effects.
Table 2. Model structures under consideration in the present study and the respective effects they test.

| Model | Equation | Effect |
|---|---|---|
| Intercept | g(μ_i) = β_0 | No effect |
| Linear | g(μ_i) = β_0 + β_1 · rating_i | Falsehood effect |
| Quadratic (constr.) | g(μ_i) = β_0 + β_2 · rating_i² | Ambiguity effect |
| Quadratic (unconstr.) | g(μ_i) = β_0 + β_1 · rating_i + β_2 · rating_i² | Dual effect of falsehood and ambiguity |
Results
Note that all model outputs and scripts are available on OSF. All models converged.
Tweet sample
Overall, we recorded
In total, 123 stories (false: 92, mixed: 9, and true: 22) surpassed the threshold of
Our dataset encompasses various domains and topics, including elections, climate change, COVID-19, economy, and Ukraine, among others (see the word cloud in the Supplementary file 1, Fig. 1). A summary table detailing all 123 stories is available on OSF.

Falsehood, ambiguity, and dual effect models for peak frequency. A) Relationship between rating and peak frequency, where dashed curves represent the fits of the inferior models, and the solid line represents the fit of the superior model. B) Effect estimates (rate ratios) for the falsehood and ambiguity effects, with error bars indicating standard errors.
Peak frequency
Over all stories, peak frequency ranged from 1 to 31 with a mean of
Main results of the zero-truncated negative binomial regression models for peak frequency.

(Table: coefficients, model diagnostics (AIC, AICc, and BIC), and relative likelihoods of the null, falsehood, ambiguity, and dual effect models.)
Falsehood model
Initially, we examined whether the rating linearly predicted peak frequency, resembling a falsehood effect. As anticipated, the model indicated a negative relationship between rating and peak frequency (see Fig. 1A). However, the linear coefficient was not significantly different from zero (
Ambiguity model
Subsequently, we tested whether the rating predicted peak frequency in a quadratic fashion, indicating an ambiguity effect. As expected, the model predicted an inverted U-shaped curve (see Fig. 1A), supported by a significant negative quadratic coefficient (
Dual effect model
Finally, we evaluated whether a dual effect model, incorporating both a linear term (falsehood effect) and a quadratic term (ambiguity effect), could predict peak frequency (see Fig. 1A). Consistent with the ambiguity model, we found a significant negative quadratic coefficient (
The AIC of the dual effect model (
The relatively greater impact of ambiguity is also illustrated in Fig. 1B, where each effect is treated as a dichotomous variable, reflecting differences between false (rating:
Burstiness
Over all stories, burstiness ranged from
(Table: main results of the beta regression models for burstiness — coefficients, model diagnostics (AIC, AICc, and BIC), and relative likelihoods of the null, falsehood, ambiguity, and dual effect models.)
Falsehood model
The falsehood model indicated a positive relationship between rating and burstiness (see Fig. 2A). However, similar to our findings for peak frequency, the linear coefficient was not significantly different from zero (

Falsehood, ambiguity, and dual effect models for burstiness. A) Relationship between rating and burstiness, where dashed curves represent the fits of the inferior models, and the solid line represents the fit of the superior model. B) Effect estimates (odds ratios) for the falsehood and ambiguity effects, with error bars indicating standard errors.
Ambiguity model
The ambiguity model predicted a U-shaped curve (see Fig. 2A), supported by a significant positive quadratic coefficient (
Dual effect model
In the dual effect model, we found a significant positive quadratic coefficient (
The AIC of the dual effect model (
The relatively greater effects of ambiguity are again expressed in Fig. 2B, where each effect is treated as a dichotomous variable. The effect estimates (odds ratios) were higher for ambiguity (
Control analyses
We conducted a set of control analyses to ensure the robustness of our main analysis. Here, we present the main results of the control analyses, with further information provided in the Supplementary material.
Poisson models for peak frequency
Zero-truncated negative binomial regression models address overdispersion in the data using an additional parameter. Poisson regression models do not include this parameter, making them more parsimonious. Model comparisons confirmed that zero-truncated negative binomial regression models were appropriate, as all Poisson models showed signs of overdispersion, confirmed by likelihood ratio tests (
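The overdispersion argument can be illustrated as follows. Negative binomial counts have variance exceeding their mean, which a Poisson model cannot accommodate; the log-likelihood values standing in for fitted Poisson and negative binomial models are hypothetical:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Overdispersed counts: a negative binomial has variance mu + mu**2/size,
# exceeding the Poisson's variance (which equals its mean).
counts = rng.negative_binomial(n=2, p=0.3, size=1000)
mean, var = counts.mean(), counts.var()

# Likelihood ratio test sketch: Poisson is nested in the negative binomial
# (one extra dispersion parameter -> 1 df). The log-likelihoods below are
# hypothetical stand-ins for fitted models.
ll_pois, ll_nb = -2600.0, -2400.0
lr = 2 * (ll_nb - ll_pois)
p_value = chi2.sf(lr, df=1)
```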
Control models: Cubic and saturated models
To validate the robustness of our superior models (the dual effect model for peak frequency and the ambiguity model for burstiness), we used two overparameterized models as controls: a cubic model (including linear, quadratic, and cubic terms) and a saturated model (treating the fact-checking rating as a categorical variable). Our superior models outperformed the cubic and saturated models for both response variables, as confirmed by likelihood ratio tests and AIC differences.
One-inflation check for peak frequency
We also checked for one-inflation of the peak frequency, which would indicate an overrepresentation of one counts in the data. We found no evidence of one-inflation, as confirmed by likelihood ratio tests.
Orthogonalization of linear and quadratic predictors
Predictors within a polynomial regression can be correlated, leading to biased estimates of the regression coefficients. We orthogonalized our polynomials, resulting in uncorrelated linear and quadratic predictors, which allowed us to better distinguish both effects. The orthogonal polynomial regression modeling confirmed the superiority of ambiguity effects over falsehood effects. While predicting more significant ambiguity effects, the significance of the linear coefficients declined. These findings confirm the prominence of ambiguity effects and demonstrate that falsehood effects are less stable than previously reported in the literature.
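A minimal sketch of such an orthogonalization (akin to R's poly(), here via a QR decomposition of the centered Vandermonde matrix). The skewed rating sample is illustrative, echoing the predominance of false stories in the dataset; with a skewed predictor, the raw linear and quadratic terms correlate while the orthogonalized columns do not:

```python
import numpy as np

def orthogonal_poly(x: np.ndarray, degree: int = 2) -> np.ndarray:
    """Orthogonal polynomial predictors from a QR decomposition
    of the centered Vandermonde matrix (cf. R's poly(x, degree))."""
    X = np.vander(x - x.mean(), degree + 1, increasing=True)
    Q, _ = np.linalg.qr(X)
    return Q[:, 1:]  # drop the intercept column

# Illustrative skewed sample of ratings (more false than true stories).
ratings = np.array([-2.0, -2.0, -2.0, -1.0, 0.0, 1.0, 2.0])

raw_lin, raw_quad = ratings, ratings ** 2
orth = orthogonal_poly(ratings)

raw_corr  = np.corrcoef(raw_lin, raw_quad)[0, 1]   # nonzero for skewed x
orth_corr = float(orth[:, 0] @ orth[:, 1])          # zero by construction
```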
Sensitivity analysis for the text similarity threshold
To ensure that the observed effects were not dependent on a specific threshold, we tested the models across a wide range of threshold values (from
Sensitivity analysis for the minimum tweet count per story
We performed a sensitivity analysis to assess whether our findings held across different thresholds for the minimum number of tweets per story. For peak frequency, the linear effect of story credibility was found to be rather weak and inconsistent, while the quadratic effect showed much greater stability, pointing to a stronger ambiguity effect. Similarly, for burstiness, the quadratic effect remained stable across different thresholds, suggesting that the ambiguity effect is robust and less influenced by the minimum tweet count per story.
Sensitivity analysis for the peak detection parameters
We employed the peak-finding algorithm proposed by Shin et al. (36), which detects local maxima by comparing neighboring data points. In line with Shin et al. (36), we applied a minimum peak height of
Control model incorporating tweet- and user-level covariates
We conducted a control analysis by adjusting the resolution of our model from the story level to the peak level and incorporating relevant covariates. Using a mixed ordinal regression, we examined the effects of tweet and user characteristics (tweet length, hashtags, mentions, text similarity with the original claim, negative sentiment, follower count, and account verification) on peak frequency. The results showed a robust ambiguity effect on story reappearance and significant effects for text similarity and negative sentiment, while the falsehood effect was not significant anymore.
Control analysis excluding “Scam” and “Pants on Fire” categories
To address potential inconsistencies with the 5-point Likert scale, we conducted a control analysis excluding the “Scam” and “Pants on Fire” categories. The results confirmed the robustness of the dual effect model for peak frequency and the ambiguity model for burstiness. For peak frequency, the linear and quadratic coefficients remained significant, and the dual effect model showed the best fit based on AIC values. Similarly, for burstiness, the ambiguity model retained its significance and exhibited the best model fit.
Reanalysis of Mitra et al. (40)
We performed a preliminary reanalysis of the data provided by Mitra et al. (35, 40) to determine whether an ambiguity model may also better predict their data. Despite the limitations of their dataset, particularly the virtual absence of inaccurate events, our modeling indicated a slight superiority of the ambiguity over the falsehood model for burstiness. The ambiguity model was
Generalizability of results across fact-checking sources
We tested the dual effect model for peak frequency and the ambiguity model for burstiness separately for Snopes and PolitiFact to evaluate the generalizability of our findings. Both sources showed consistent trends, supporting the presence of ambiguity and falsehood effects, despite differences in their story selection criteria and methods of analysis. These preliminary results suggest that the observed patterns are robust across diverse fact-checking approaches.
Temporal stability of false stories: exploring potential effects of Twitter policy changes
Twitter’s moderation policies may have influenced our results by suppressing certain types of misinformation, particularly following major political events. To assess this, we analyzed trends in the peak frequency and burstiness of false stories over time. Using both a temporal median split and generalized additive models (GAMs), we found no evidence of significant linear or nonlinear temporal effects on these variables. While our findings suggest minimal impact of policy changes on the temporal dynamics of false stories in this dataset, future studies with larger datasets could offer greater sensitivity to detect subtle effects.
Validating model significance with permutation tests
We employed permutation tests to validate the significance of our models, leveraging their robustness to violations of assumptions and suitability for small datasets. By comparing observed z-values to null distributions generated through resampling, we confirmed the robustness of the ambiguity effect across both response variables. However, the falsehood effect proved less stable, with its significance diminished under permutation testing.
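The permutation-test logic can be sketched as follows, using a plain OLS quadratic coefficient as a stand-in for the models' z-values. The data and effect sizes are simulated, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: ratings and a response carrying a genuine quadratic (ambiguity) signal.
rating = rng.choice([-2, -1, 0, 1, 2], size=123)
y = 5.0 - 1.0 * rating ** 2 + rng.normal(0.0, 1.0, size=123)

def quad_coef(rating, y):
    """OLS coefficient of the quadratic term (stand-in for a model z-value)."""
    X = np.column_stack([np.ones_like(rating, dtype=float), rating, rating ** 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[2]

observed = quad_coef(rating, y)

# Null distribution: shuffling the response breaks any rating-response link.
null = np.array([quad_coef(rating, rng.permutation(y)) for _ in range(999)])

# Two-sided permutation p-value (with the standard +1 correction).
p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (999 + 1)
```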
Discussion
In this study, we evaluated the long-term temporal dynamics of a collection of stories, each with a fact-checking rating. This addressed a significant research gap, as longitudinal studies of (mis)information dynamics are rare. We explored the relationship between the credibility of information and its temporal patterns. Consistent with previous research (see Ref. (36)), we observed that false information reappeared more frequently than true information. More importantly, however, this relatively small effect was accompanied by a pronounced ambiguity effect. Specifically, we found that ambiguous information is less concentrated but reappears more often than unambiguous information. Our study contributes to the literature by tracking misinformation stories over an extended period, highlighting important dynamics in the spread of (mis)information.
Theoretical considerations
The present study establishes the dual effect of ambiguity and falsehood, previously investigated only individually. This dual influence has profound implications for our understanding of the information ecosystem on social media, which will be discussed below. However, our study does not explain why both ambiguous and false information reoccur more frequently than unambiguously true information. Future work should explore the underlying mechanisms.
Regarding the falsehood effect, we suspect that two factors are involved. First, misinformation, when distributed as disinformation (i.e. with intent) (cf. (3)), may be redistributed by certain spreaders to generate additional public attention. Spreaders may be humans or bots, the latter possibly programmed to do so deliberately (see Ref. (64)). This study did not investigate the role of bots in redistributing misinformation; future research should explore which user groups tend to “reawaken” misinformation.
Another possible explanation for the falsehood effect is the emotional language of misinformation. Misinformation often carries higher emotional arousal than true information (cf. (65, 66)), which may help it persist longer in the receiver’s mind and generate sustained attention. Disinformation spreaders may intentionally increase emotional arousal to achieve better spread. However, the extent to which this contributes to the falsehood effect remains to be fully explored.
The prominent ambiguity effect could be explained by cognitive factors: Uncertainty about an event or piece of information elicits greater cognitive involvement than certainty, as it makes the environment less predictable (67). As a result, individuals may seek further information to increase predictability, which could explain the tendency of ambiguous information to reappear. This explanation aligns with Mitra et al. (40), who argued that “perhaps the underlying uncertainty of an event is what keeps people yearning for additional information…” (p. 11). In-depth analyses of content changes over time may provide a better understanding of why ambiguous information reappears.
The two effects contributed unequally to explaining the variance in our response variables. We observed a much more pronounced ambiguity effect, whereas the falsehood effect had a smaller impact. The comparatively limited importance of the falsehood effect may contrast with initial findings by Shin et al. (36), who reported that “most false rumors repeated periodically [11 out of 13], whereas true rumors did not [0 out of 4]” (p. 284). Our data, based on a larger and more heterogeneous dataset, suggest a weaker falsehood effect than their initial report.
Practical implications
The present study has significant implications for both the scientific community and the public. Politicians, journalists, and social media providers could all benefit from our findings. These groups should be aware of the illusory truth effect, which posits that repeated presentation of information increases its perceived plausibility. Because false and ambiguous information reappears more often, its plausibility may be enhanced more strongly than that of unambiguously true information. Malicious actors on social media may exploit this by designing “misinformation attacks” that mix false information with some true aspects, leveraging both falsehood and ambiguity. Our data support this, showing that “mostly false” information had more peaks compared to all other ratings.
These findings have specific implications for the aforementioned groups. Politicians should craft clear public statements to reduce the risk of their statements being repeatedly echoed. Journalists should strive to reduce ambiguity in their reporting and engage in peer-to-peer fact-checking to decrease the prevalence of false information. Social media providers should consider our findings when designing their platforms. They could use the reappearance of information as a signal to detect false and ambiguous content, allowing them to flag such information with warnings, potentially reducing its spread (see, e.g. (28)). Timing is crucial: while novel and unreviewed information may promise high outreach, it often comes at the expense of accuracy and clarity. In the public interest, all groups should act carefully and prudently, prioritizing the accuracy of public discourse over speed.
Limitations and future work
It is important to note that the effects of both falsehood and ambiguity accounted only for a low proportion of the variance in our response variables, with the best models explaining less than
Improved peak identification could help reduce noise in the data. The peak detection algorithm proposed by Shin et al. (36) is a good starting point but has weaknesses: its parameter values are chosen arbitrarily, and the results depend on these values. Burstiness, another response variable, may be more robust since it does not depend on parameter values; however, it is only an indirect measure of peak frequency. Future studies could employ more advanced temporal analyses to improve peak detection. Moreover, operationalizing ambiguity at the tweet level represents a promising avenue for future research: In the present study, ambiguity is assessed exclusively through the verdict of a fact-checking source. Incorporating tweet-level measures of ambiguity, such as perplexity or entropy, could enhance our understanding of the specific types of ambiguity that drive the reappearance of information.
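To make the two response variables concrete, the following sketch counts peaks in a daily tweet-count series and computes the widely used Goh-Barabasi burstiness coefficient B = (sigma - mu) / (sigma + mu) over inter-event times. The `height` and `distance` thresholds below are arbitrary placeholders, which is precisely the parameter dependence criticized above, and the paper's exact operationalizations may differ:

```python
import numpy as np
from scipy.signal import find_peaks

def count_peaks(daily_counts, height=10, distance=7):
    """Count local maxima; `height`/`distance` are arbitrary tuning parameters."""
    peaks, _ = find_peaks(daily_counts, height=height, distance=distance)
    return len(peaks)

def burstiness(inter_event_times):
    """Goh-Barabasi burstiness: -1 (perfectly regular) to +1 (maximally bursty)."""
    mu, sigma = np.mean(inter_event_times), np.std(inter_event_times)
    return (sigma - mu) / (sigma + mu)

# A synthetic story with five evenly spaced activity spikes over 100 days.
series = np.full(100, 2.0)
series[[10, 30, 50, 70, 90]] = 50.0
print(count_peaks(series))                         # 5 peaks
print(burstiness(np.diff([10, 30, 50, 70, 90])))   # -1.0: perfectly regular
```

Note how the burstiness coefficient needs no threshold at all, which is why it is described above as more robust, while the peak count changes with the chosen `height` and `distance` values.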
This study offers valuable insights into the behavior of online (mis)information while highlighting opportunities for future research to broaden its scope. Expanding beyond fact-checked stories, future work could integrate diverse data sources, such as content from different social media platforms or news domains (see, e.g. (46)). Combining these sources through a mixed-method approach could enable a deeper understanding of the dynamics between low- and high-credibility content. Fact-checked stories, although precise due to rigorous professional evaluation, are potentially subject to selection biases, as organizations may prioritize certain types of stories for verification (46, 47, 69). Meanwhile, analyzing content from broader news domains may capture a wider variety of stories, albeit with lower accuracy since not all items undergo individual scrutiny. Previous research has demonstrated parallels in the trends observed with both approaches (15), suggesting their complementarity. Future studies should delve further into the relationship between these methods to enhance our understanding of the complex ecosystem of online (mis)information.
Future studies should examine the impact of social networking service policy changes on response variables more closely. In this study, data were accessed post hoc via the Twitter API, meaning some tweets or accounts may have been deleted or suspended due to policy changes and were therefore inaccessible during data collection. This issue is particularly relevant to misinformation following major political events, such as the 2016 U.S. presidential election or the January 6, 2021, Capitol attack (see Refs. (70, 71)). Twitter is notably affected by tweet and account deletions, even within the first few days of publication (e.g. (72)). Zubiaga (73) demonstrated that Twitter data availability diminishes over time, with losses that can exceed
Nonetheless, recent limitations on accessing social media data, particularly from platforms like Twitter/X, present serious challenges for this area of research and raise concerns about the future of academic inquiry in this field. Since Elon Musk’s acquisition of Twitter, changes to API pricing and restrictions on data scraping have made it nearly impossible to obtain the necessary data (cf. (74)), hindering ongoing and future studies. In this study, a mixed-method approach could have strengthened the relevance of our conclusions. However, we were restricted to analyzing a limited number of 123 fact-checked stories, leaving the examination of news domain dynamics incomplete. Like many others, we lost access to the Twitter API following Musk’s takeover, which has limited our capacity to fully explore the dynamics of online (mis)information. We call on social media platforms, policymakers, and other stakeholders to facilitate greater public access to social media data for researchers. This is essential to addressing critical issues in the online space, such as misinformation, hate speech, polarization, and censorship.
Conclusion
In this study, we analyzed a large-scale Twitter dataset and uncovered a dual effect of ambiguity and falsehood on the tendency of information to reappear. While initial evidence existed for both effects, their simultaneous presence had not been confirmed before. Our work addressed this gap and provided insights into their relative importance, revealing that the ambiguity effect is considerably more pronounced than the falsehood effect. These findings can guide politicians, journalists, and social media providers in designing information architectures that promote a more resilient online information ecosystem.
Acknowledgments
We thank Ylva Stenberg for her suggestions regarding the figure styling. During the preparation of this work the authors used GPT-4o in order to improve the grammar, spelling, and conciseness of the text. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article. We acknowledge support by the German Research Foundation (project no. 512648189) and the Open Access Publication Fund of the Thueringer Universitaets- und Landesbibliothek Jena. We acknowledge support by the LIBERTY Connect Fund of the Friedrich Schiller University Jena.
Supplementary Material
Supplementary material is available at PNAS Nexus online.
Author Contributions
J.K.: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing—original draft, writing—review and editing; H.K.: project administration, validation, writing—review and editing; S.R.S.: project administration, resources, supervision, validation, writing—review and editing.
Data Availability
The data and code underlying this article are available in the Open Science Framework (OSF) at https://osf.io/863cu/?view_only=9e463738a64f431981236a9708f2dd45.
References
Footnotes
The other respective effect is still considered as a continuous variable in the regression model.
Author notes
Competing Interest: The authors declare no competing interests.