The Efficacy Potential of Cyber Security Advice as Presented in News Articles

TABLE 1

Basic functional requirements for the news-scraper.

Requirement	Detail
1	The ability to systematically search for news articles within a set time frame utilizing pre-set search queries.
2	The ability to extract the full content from news articles.
3	The ability to extract metadata, including publication date, author(s), titles, source names and country of origin, for further analysis.

FIGURE 1

An overview of the news-scraper tool.

We utilized a news-aggregation API to filter content from a variety of unstructured and structured news sources that were consistent with our definition, and we added functions to enable the complete capture of content, in accordance with Requirements 2 and 3.

The captured data was then fed into a data-storage pipeline before being converted into a flat-file database storage solution. Incoming data was merged with existing records when required to avoid duplicate data.
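The merge step can be illustrated with a minimal sketch. It is not the news-scraper's actual code: the record fields and the choice of the article URL as the de-duplication key are assumptions made for the example.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def merge_records(existing: Iterable[Record],
                  incoming: Iterable[Record],
                  key: str = "url") -> List[Record]:
    """Merge incoming records into existing ones, keeping one row per key.

    The key field ("url" by default) is a hypothetical choice; any stable
    identifier for an article would serve the same purpose.
    """
    merged: Dict[str, Record] = {}
    for record in list(existing) + list(incoming):
        identifier = record[key].strip().lower()
        if identifier in merged:
            # Duplicate article: only fill fields that are still empty.
            stored = merged[identifier]
            for field, value in record.items():
                if not stored.get(field):
                    stored[field] = value
        else:
            merged[identifier] = dict(record)
    return list(merged.values())


stored_rows = [{"url": "https://example.org/a", "title": "A", "author": ""}]
new_rows = [
    {"url": "https://EXAMPLE.org/a", "title": "A", "author": "Jane"},
    {"url": "https://example.org/b", "title": "B", "author": "Sam"},
]
result = merge_records(stored_rows, new_rows)
```

Here the second sighting of article A does not create a new row; it only supplies the author field that was missing from the stored record.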

3.2 The search terms

To fulfill Requirement 1, we followed the precedent of Schatz et al. (2017), who sought to derive a more precise definition of security by utilizing Google Trends to automatically collect the phrases that individuals were using to search for security content. As this had to be accomplished from the perspective of our individual user, this excluded the possibility of replicating the work of Humayun et al. (2020) who looked at primary studies undertaken within academia. Instead, we followed the Systematic Mapping Study protocol of Kosar et al. (2016).

We defined a set of search and inclusion/exclusion criteria (for example, Cybersecurity OR Cyber AND Security) and additional queries containing both base search terms and queries derived from Google Trends (online OR advice OR protection OR protect OR prevent OR preventative OR tips OR email OR social network OR password OR hack OR hacked OR hacking).
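As an illustration of how such queries might be assembled, the sketch below combines the base query with the Google Trends terms listed above. The grouping of the Boolean operators, the quoting of multi-word terms and the function names are assumptions; the syntax accepted by a particular news-aggregation API may differ.

```python
from typing import List

# Base query; the parenthesization is an assumed reading of
# "Cybersecurity OR Cyber AND Security".
BASE_QUERY = "(cybersecurity OR (cyber AND security))"

# Terms derived from Google Trends, as listed in the text.
TREND_TERMS = [
    "online", "advice", "protection", "protect", "prevent", "preventative",
    "tips", "email", "social network", "password", "hack", "hacked", "hacking",
]


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are kept as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(base: str, terms: List[str]) -> str:
    """Require the base query plus at least one of the additional terms."""
    alternatives = " OR ".join(quote(term) for term in terms)
    return f"{base} AND ({alternatives})"


example_query = build_query(BASE_QUERY, ["advice", "social network"])
full_query = build_query(BASE_QUERY, TREND_TERMS)
```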

We augmented the Google Trends queries with phrases pertaining to 20 cyber security events that (1) had occurred in the previous 24 months and (2) had been covered by at least 10 major English-language news outlets (for the queries, please refer to Table A8). Except for where it was appropriate within the event searches, all search terms were technology-agnostic—they did not include explicit references to products or services. The news-scraper then carried out searches over a 24-month time span and returned all results that included these terms within the title or body of the content. Therefore, while not exhaustive, our corpus represents security advice as accurately as possible within the confines of our scope.

3.3 Cleaning the data

First, we screened our results according to the inclusion/exclusion criteria. These were defined as follows:

Must be a news or blog article that directly addresses at least one aspect of cyber security or directly contains our search terminology. Blog articles were limited to tutorials, editorials, tool demonstrations and discussion of technical reports.
Must be written in English (due to the nature of our analysis methodology).
Must be accessible and not hidden behind a paywall or other kind of lockout mechanism (as in these cases only a few lines of text may have been retrieved).

Any article found to be in breach of these criteria was excluded.

In this way, we reduced the initial pool of 16,876 usable articles from our first cleaning process to 15,422 individual articles. For the remaining articles our focus and technique were informed by recent work, such as that of Satyapanich et al. (December, 2019), which describes the process for extracting semantic information (such as people, places and events) from security articles and that of Al Moubayed et al. (2017), who used Bayesian topic modelling to ascribe classifications to, and uncover trends in, security and criminal documents. We prepared the corpus for analysis using common data pre-processing techniques. We utilized tokenization to break down the text, first into sentence units and then into individual words. We then replaced uppercase text with lowercase equivalents and removed punctuation. We lemmatized the corpus to standardize the tense and to replace any third-person words with first-person variants. Finally, we used a stemming technique to reduce words to their root form, where appropriate Porter (2006).
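The first pre-processing steps can be sketched with the standard library alone. This is a simplified illustration rather than the pipeline used in the study: the sentence splitter is a naive regular expression, only ASCII punctuation is removed, and lemmatization and stemming are omitted because they would normally be delegated to a dedicated NLP library.

```python
import re
import string
from typing import List

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Translation table that deletes ASCII punctuation characters.
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> List[List[str]]:
    """Split text into sentences, then into lowercase words without punctuation."""
    sentences = [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]
    processed = []
    for sentence in sentences:
        words = sentence.lower().translate(PUNCTUATION_TABLE).split()
        if words:
            processed.append(words)
    return processed


tokens = preprocess("Change your password! Hackers guessed it.")
```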

3.4 Classifying the data

As we saw in Section 2.4, individuals require actionable elements within their security advice. We therefore utilized an ontological framework to help us classify and integrate the data collected from the sources queried by our news-scraper. The application of ontologies to model and reason about cyber security requirements has gained significant attention in recent years, particularly for complex systems such as critical infrastructure and smart cities. These ontologies provide a formal, machine-readable representation of key concepts and relationships, enabling precise capture and communication of security needs, automated reasoning and analysis, knowledge sharing and reuse across domains and integration of security with other system aspects early in the development lifecycle De Nicola & Villani (2021). While the critical role of end-users in the overall cyber security posture of organizations and systems is increasingly recognized, current research focuses more on incorporating end-user perspectives into broader cyber security frameworks and models. For instance, some frameworks aim to identify users’ security behaviors in real-time and provide targeted interventions Ruighaver et al. (2007), while others leverage serious games to train users in detecting and responding to social engineering threats Hendrix et al. (2016). However, no clear examples of ontologies solely focused on targeting end-users and their immediate requirements were found, indicating a potential gap in current research that warrants further investigation Groš (2021), Oltramari et al. (2015).

In response to this we decided to perform a non-exhaustive search for an ontology which could be applied to end-user behaviour, even if the intended target audience is not explicitly defined as such. We began by searching for ontologies using keywords such as ‘cyber security’, ‘end-user’ and ‘actionable advice’, and then reviewing them against a set of selection criteria shown in Table 2.

TABLE 2

Inclusion criteria for selecting an ontology applicable to end-user cyber security behavior.

ID	Criteria	Justification
1	Evaluative in nature	The ontology should allow for an effective demonstration that a particular level of security has been achieved (efficacy potential) Souag et al. (2015).
2	Accessible to non-technologists	The ontology should be understandable and applicable by end-users, not just IT professionals, using clear language and unambiguous concepts Kendall & McGuinness (2019).
3	Frequently updated	The ontology should incorporate recent security concepts relevant to end-users, such as edge computing Piasecki et al. (2021).

In attempting to follow these criteria we found that many ontologies were indeed aimed at a technical or policy audience and often included several layers of abstraction within the work, used vaguely defined terminology or simply did not include our original requirement of actionable security advice⁶.

An ontology by the Center for Internet Security (CIS)⁷ was chosen, which meets all of the inclusion criteria outlined in Table 2. We discuss how it does so below.

Criterion 1: Although it was not intended specifically for individual users, this ontology prioritizes risk-based security and focuses on the practical mitigation of these risks by identifying and utilizing 20 domain-specific CIS-vectors that represent practical and actionable remedies for security threats. It does so through providing an evaluative framework that allows users to assess their security posture against specific control objectives. The controls are prioritized and provide clear guidance on essential cyber hygiene measures, making them accessible even to those with limited cybersecurity expertise. A high-level version of the framework can be seen in Figure 2. The individual CIS-vectors are discussed in detail in Section 4.
Criterion 2: CIS Controls provide specific, actionable guidance on the most critical steps organizations should take to tangibly improve their security, covering Criterion 2, whereas other ontologies may be more descriptive, or somewhere in between Adach et al. (2022). As an illustrative example of Criterion 2, we refer to Woods et al. (2017), where the CIS ontology was shown in use with insurance underwriting professionals when selecting policy controls.
Criterion 3: According to CIS, the CIS Controls are also frequently updated by a global community of experts to address the evolving threat landscape and incorporate recent security concepts (further confirmed by the fact that version 7.1 was utilized at the time of writing, which was then superseded by version 8)⁸. In addition to meeting Criterion 3, the CIS ontology has been aligned with other ontological frameworks such as that of NIST to allow for easier adoption by organizations and projects⁹.

FIGURE 2

An overview of the CIS-vectors ontological framework.

This pragmatic approach, combined with the clear tie-in to demonstrating security achievement, allows it to provide us with the requisite entity types and properties which are ascribed to individual news articles within the corpus as additional metadata. Thus, we are able to use the ontology to define the entities, relations and other factors that can be extracted from the corpus. The ontology also allowed us to focus the corpus and to restrict our vision to the research objectives, as the language utilized within security can range from extremely specific to extremely ambiguous Ruohonen & Kimppa (2019). In many cases, this range can make it difficult to apply an ontology to specific news articles within the corpus.

3.5 Additional work to encompass null values from CIS-vectors

Given our search terminology, we observed that 6,134 of the 15,422 articles (representing 36.3% of the total corpus) contained references to any of our CIS-vectors. We performed a second pass on the corpus, introducing additional syntactic variants of the terminology utilized within the CIS-vectors. For example, we separated ‘malware defenses’ into ‘malware AND defences’, ‘malware defence’ and ‘malware defense’ to correct for localization issues.
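The effect of the second pass can be sketched for the ‘malware defenses’ example. The regular expressions below are illustrative assumptions about how the variants could be matched; they are not the exact patterns used in the study.

```python
import re

# First pass: the CIS-vector wording exactly as given (case-insensitive).
EXACT_PHRASE = re.compile(r"\bmalware defenses\b", re.IGNORECASE)

# Second pass: 'malware' AND any spelling of defence/defense(s), anywhere in the text.
MALWARE = re.compile(r"\bmalware\b", re.IGNORECASE)
DEFENCE = re.compile(r"\bdefen[cs]es?\b", re.IGNORECASE)


def first_pass(text: str) -> bool:
    """Match only the literal phrase 'malware defenses'."""
    return bool(EXACT_PHRASE.search(text))


def second_pass(text: str) -> bool:
    """Match 'malware' together with defence, defense, defences or defenses."""
    return bool(MALWARE.search(text) and DEFENCE.search(text))


british = "Good malware defence starts with regular updates."
reordered = "Defences against malware are improving."
unrelated = "The defendant denied writing the malware."
```

In this sketch the British spelling and the reordered phrase are missed by the first pass but picked up by the second, while a word such as ‘defendant’ does not trigger a match.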

The results of the second pass are illustrated in Figure 4 (Section 4.1). As can be seen, the occurrence rate subsequently rose to 7,988 articles, or 51.7% of the corpus, each of which contained references to one or more CIS-vectors. For the remaining 48.3% of the corpus, we performed a Latent Dirichlet Allocation (LDA) analysis in order to generate further details, the results of which are outlined in Section 4.2. LDA is a statistical modeling tool that allows for the discovery of otherwise abstract topics within text files. It provides us with both a word distribution for each topic and a topic distribution for each document. To ensure the accurate selection of topic numbers and models, we followed the methodologies proposed by Cao et al. (2009) and Deveaud et al. (2014).

4 Results

In this section we consider our results. First, we corroborate the findings of Alagheband et al. (2020), which indicated that coverage of security topics in the New York Times has steadily increased over the last decade. Figure 3 highlights this increase over time: the ‘vast terra incognita of print’ Taylor & Wolff (2004). The data also exposed the sheer diversity of publishers, ranging from traditional outlets such as BBC News and CNN through to specialty security blogs. Even so, we must acknowledge that this list is inevitably incomplete, as our search methodology, while extensive, was non-exhaustive and was limited to English-language media.

FIGURE 3

Articles published per day between January 2015 and December 2020.

Next, we identify the prevalence and features of ‘ideal’ news articles in our corpus and use this information to help answer our research objectives. An ideal news article must contain a summary of the information that an individual user requires (in this case, regarding security advice), eliminating irrelevant and redundant information wherever possible Goldstein et al. (1999). To determine the prevalence of such articles in our corpus, we first utilized our CIS-vectors to ascertain how many of the articles contain content-specific vocabulary that users may expect to find within these articles, and we performed additional analysis on those articles that contained no such terms. We then derived statistics pertaining to sentence length and vocabulary size, which we then compared to third-party corpora (where available). Finally, we utilized sentiment analysis as an efficacy potential measurement tool, building on work by Kalra & Prasad (2019), who used it for stock market assessments. This was done to decipher any trends that could inform our efficacy potential research question.

FIGURE 4

The occurrence rate of CIS-vectors. Null values are excluded.

FIGURE 5

A correlation plot, highlighting in particular the strong correlation between CIS-4 and CIS-16.

4.1 CIS-vector occurrences

Figure 4 highlights the occurrences of our CIS-vectors in the corpus. The most-used CIS-vectors were CIS-13 (Data protection), CIS-11 (Limitation and control of network ports, protocols and services) and CIS-2 (Inventory and control of software assets). CIS-13 highlights the growing trend towards data protection awareness and its relevance for individual users; it occurred 0.4 times per article, on average.

Delving deeper into the reasons for this expanding data protection coverage, we find that, between 2018 and 2019, the most significant topics were related to data breaches, data protection guidelines for individuals and organizations (such as the EU’s General Data Protection Regulation (GDPR)¹⁰) and data privacy-related security advice for social media users. In 2020 there was a shift towards protecting health-related data in medical contexts, with advice and threat messaging geared towards disease contact and exposure tracing applications, such as those mentioned by Yasaka et al. (2020).

CIS-11 indicates network security-related information and advice, and its occurrence rate increased significantly between 2019 and the end of 2020. At least one publication (Lindner et al., 2020) notes a similar increase in interest. Again, we found that most of this network security advice was related to privacy, and it appeared in texts ranging from technical articles to installation guides for the Tor Project. In many cases, these articles contained more difficult vocabulary and technical terminology than the average publication.

CIS-2 pertains to software assets and their associated CIS-vectors, and it proved to be one of the most diffuse topics. In our corpus, we found articles linked to Internet of Things home security, smart grid and connected vehicle software, and the security issues that arise in connection with these devices and services.

Correlations between the CIS-vectors are depicted in Figure 5. The correlations were weak across the corpus, with one notable exception: the correlation between CIS-16 (Account monitoring and control) and CIS-4 (Controlled use of administrative privileges). Though CIS-16 appeared more frequently overall, tokens associated with both vectors appeared consistently between articles.
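For readers who wish to reproduce this kind of comparison, the sketch below computes a correlation coefficient between the per-article token counts of two vectors. The choice of Pearson's coefficient is an assumption, as is the toy data; the measure underlying Figure 5 may differ.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical counts of CIS-4 and CIS-16 tokens in four articles.
cis_4_counts = [0, 1, 2, 3]
cis_16_counts = [0, 2, 4, 6]
coefficient = pearson(cis_4_counts, cis_16_counts)
```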

4.2 Articles containing no CIS-vectors

Table 3 lists the most common topics that occurred in those articles that featured no CIS-vectors from our classification (representing 48.2% of the corpus). The topics were derived through LDA topic modelling, as described in Section 3.5.

TABLE 3

The five most common topics in the non-CIS articles.

Topic 3	Topic 6	Topic 9	Topic 11	Topic 14
Security	Safety	Cyber	Trump	Police
Internet	Health	Security	President	Crime
System	Recovery	Attacks	Election	Cases
Users	Covid-19	Business	Russia	Issue
Datap	Protection	Threats	U.S.	Cyber

We can see that, despite the absence of CIS-vectors, security is still a focal point in these articles. In these cases, though, the focus is on national (cyber) security (Topic 11), cyber crime (Topic 14), business threats (Topic 9) and health and safety issues related to cyber crime and security (Topic 6). Topic 3 embodies concepts similar to those of CIS-13 (Data protection), CIS-11 (Limitation and control of network ports, protocols and services) and CIS-2 (Inventory and control of software assets).

4.3 Sentence length and vocabulary size

We use sentence length, vocabulary size and a selection of readability scores as proxies for difficulty.

4.3.1 Sentence length

Sentence length is an often-utilized tool in the discovery of readability within corpora Goldstein et al. (1999), Lim et al. (2018). Figure 6 displays the distribution of article lengths. The mean article length was 9.92 sentences, and the median length was 10 sentences. We can compare this to the work of Goldstein et al. (1999) on the automated summarization of news articles, which led to a corpus of 1,000 Reuters articles with a (post-summarization) average length of 23 sentences. We can also compare this to the work of Lim et al. (2018), whose smaller corpus yielded an average of 14 sentences per article.

The gap between the publication of these comparators (1999 and 2018, respectively) may suggest an overall decline in the length of news articles. It also suggests that our corpus of security-specific news is on the shorter side of the spectrum. This last point is, however, caveated by the fact that a comparison with more historical data and a wider variety of sources would be needed to confirm this finding.

4.3.2 Vocabulary

We estimated the vocabulary growth of the corpus using Heaps’ law Heaps (1978), which describes the relationship between tokens and types. This law states that the vocabulary size, expressed as $v$ unique word types, grows as a power of $n$, the number of tokens in an arbitrary text. The relation is expressed as

\begin{aligned} v = K n^{\beta}. \end{aligned}

Here, $K$ is a positive constant and $\beta$ lies between 0 and 1. In effect, as a body of text increases, the potential to discover new distinct word types decreases. In our corpus, we can see from Figure 7 that the vocabulary range largely adheres to the predicted value (black line). This means that new vocabulary terms are continually arising in the data, which could complicate users’ informal learning.
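A Heaps' law curve of this kind can be estimated with a short sketch. The approach shown, an ordinary least-squares fit on the log-log scale, is one common way to estimate the two parameters and is an assumption here, not necessarily the fitting procedure behind Figure 7; the function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def vocabulary_growth(tokens: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (tokens seen so far, distinct types seen so far) after each token."""
    seen = set()
    points = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        points.append((n, len(seen)))
    return points


def fit_heaps(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit v = K * n**beta by least squares on log(v) = log(K) + beta * log(n).

    Needs at least two points with different n values.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


growth = vocabulary_growth(["a", "b", "a", "c"])
# Synthetic points that follow v = 2 * n**0.5 exactly, to check the fit.
k_hat, beta_hat = fit_heaps([(n, 2 * n ** 0.5) for n in (1, 4, 9, 16, 25)])
```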

4.4 Readability scores

A readability index, such as the ones shown in Table 4, is an estimation of how difficult a text is to read. In online environments, it is often measured to assess click-through rates and user satisfaction Kanungo & Orr (2009). Grinberg (2018) utilized it, alongside sentence length, to model user engagement with news articles. As such, it is an interesting variable to consider when assessing the efficacy potential of the texts in our corpus.

Readability is determined by measuring a text’s complexity, which is approximated via quantifiable attributes such as word length, sentence length, syllable count and so on. The Flesch–Kincaid test Flesch (2007) is one of the most utilized readability tests, and it calculates readability from two ratios: (1) the number of words divided by the number of sentences and (2) the number of syllables divided by the number of words. In its reading-ease form, the scoring range starts at 100 for the easiest to read and descends to 0 for unreadable texts; in its grade-level form, the same two ratios are mapped onto a US school grade. As an example, the combined Harry Potter novels have a score of 72.83 on the reading-ease scale. Other frequently used systems include the Gunning–Fog index Roberts et al. (1994), which looks at sentence length and number of polysyllabic words; the Coleman–Liau index Coleman & Liau (1975), which does not assess syllables; the Automated Readability index Senter & Smith (1967); and the Simple Measure of Gobbledygook (or SMOG) Laughlin (1969), which utilizes a methodology similar to that of the Flesch–Kincaid, but applied to sections within the text. All of these metrics can be expressed as grade levels and are broadly comparable with one another. As such, we employ all of them in this study.
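As a rough illustration of how such scores are derived from word, sentence and syllable counts, the sketch below uses the commonly published coefficients of the Flesch Reading Ease and Flesch–Kincaid Grade Level formulas. The function names and the example counts are hypothetical, and syllable counting (which in practice requires a heuristic or a dictionary) is left out.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical article: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
```

Both functions rely on the same two ratios, words per sentence and syllables per word; only the coefficients and the direction of the scale differ.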

Table 4 highlights a selection of readability scores, all expressed as approximate grade levels. To ensure the accuracy and reliability of our analysis, outliers in the readability scores data-set were identified and removed. The outlier detection was performed using the Interquartile Range (IQR) method. We calculated the first quartile (Q1) and the third quartile (Q3) for each readability measure. Outliers were defined as scores falling outside the range given by the following:

\begin{aligned} \text{Lower Bound} = Q_1 - 1.5 \times \text{IQR} \end{aligned}

\begin{aligned} \text{Upper Bound} = Q_3 + 1.5 \times \text{IQR} \end{aligned}

where $\text{IQR} = Q_3 - Q_1$. Outliers were removed to improve the accuracy and interpretability of the results. The presence of extreme values in the data can distort statistical measures such as means and variances, and can affect the distribution and visualization of the readability scores. After removing outliers, we re-evaluated the distribution and summary statistics of the readability scores.
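The outlier rule can be sketched in a few lines. The quartile convention used by Python's `statistics.quantiles` (its default "exclusive" method) is an assumption here; other tools compute quartiles slightly differently, which can shift the bounds for small samples. The example scores are invented.

```python
import statistics
from typing import List, Sequence, Tuple


def iqr_bounds(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (lower bound, upper bound) using the 1.5 * IQR rule."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def remove_outliers(scores: Sequence[float]) -> List[float]:
    """Keep only the scores that fall inside the IQR bounds."""
    lower, upper = iqr_bounds(scores)
    return [score for score in scores if lower <= score <= upper]


# Hypothetical grade-level scores with one extreme value.
raw_scores = [10, 11, 12, 13, 14, 15, 16, 60]
cleaned_scores = remove_outliers(raw_scores)
```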

FIGURE 6

The distribution of sentences per article.

FIGURE 7

A visualization of Heaps’ law.

Table 5 summarizes the number of outliers, total data points and outlier percentages for each readability measure. The percentages of outliers were relatively low, ranging from 2.73% to 4.05%, which is generally considered manageable. Figure 8 illustrates the distribution of readability scores after removing outliers.

The readability analysis of the corpus reveals that the text exhibits substantial complexity, as indicated by various readability metrics. The Flesch–Kincaid test’s Grade Level, with a score of 12.52, suggests that the text is suitable for readers who have completed secondary education. The Gunning Fog Index, at 16.03, implies that the text is intended for individuals with a college-level education, reflecting its complexity through longer sentences and a higher proportion of complex vocabulary. The Coleman–Liau Index, scoring 13.23, aligns with a reading level approximately one year beyond secondary school, focusing on average word length and sentence length. The SMOG Index, which stands at 14.55, indicates that a more advanced educational background is necessary for full comprehension, generally suggesting some college education or higher. The Automated Readability Index (ARI) of 13.00 further supports this, suggesting that the text is best understood by high school graduates or college students. Finally, the Average Grade Level (AGL) of 13.87 corroborates the high complexity of the text, pointing to a university-level readership.

These indices collectively illustrate that the corpus is tailored for an educated audience, characterized by complex sentence structures and sophisticated vocabulary. If the aim of the articles is to make the information more accessible to a wider audience, it may be beneficial to simplify the language and structure. However, due to the inherently complex and multifaceted nature of cybersecurity issues, such discussions will inevitably involve challenging concepts. As a result, the average casual reader may find limited value in engaging with these articles.

4.5 Sentiment analysis

Sentiment analysis is a group of text analysis techniques that allow for the automatic derivation of sentiment (positive or negative) from large data-sets Hussein (2018). Sentiment analysis is widely used across domains, from marketing Hussein (2018) to stock market analysis Kalra & Prasad (2019). Previous sentiment analysis work within the security field has focused on predicting cyber attacks or identifying potential perpetrators, for example, by assessing sentiment in online hacker forums Macdonald et al. (2015). The lexicons generated from these studies (for example, those pertaining to sentiment within political analysis of sovereign cyber capabilities) are of limited use within our work, as their terminology often differs substantially from what could be construed as ‘security advice’ based on our definition. As such, we utilized Latent Semantic Scaling (LSS), which is a semi-supervised technique for scaling documents based on work by Deerwester et al. (1990). It allows for a limited set of pre-generated seed words, which are words embedded with a specific positive or negative value. To produce our small library of seed words, we utilized the SENTPROP framework Hamilton et al. (2016). We chose this framework as it combines word-vector embeddings with a label propagation approach, which are well-known techniques to generate seed-word libraries. Additionally, SENTPROP can generate accurate results with smaller corpora Hamilton et al. (2016). In our system, the overall sentiment of a news article is correlated with the sentiments of individual words within that article, thereby allowing for a sentiment polarity check.
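The final step, in which an article's sentiment follows from the polarity of its words, can be illustrated with a simplified sketch. It is not an implementation of LSS or SENTPROP: the word polarity values are invented, and the sketch simply takes the frequency-weighted mean polarity of the scored words in an article.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def document_polarity(tokens: Sequence[str],
                      word_polarity: Dict[str, float]) -> Optional[float]:
    """Frequency-weighted mean polarity of the scored words in a document.

    Returns None when the document contains no scored words.
    """
    counts = Counter(token for token in tokens if token in word_polarity)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(word_polarity[word] * n for word, n in counts.items()) / total


# Hypothetical polarity scores; in practice these would come from the
# seed-word propagation step rather than being set by hand.
polarity = {"secure": 1.0, "breach": -1.0, "patch": 0.5}
article = ["the", "breach", "was", "fixed", "by", "a", "patch", "patch"]
score = document_polarity(article, polarity)
```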

TABLE 4

Total readability metric scores and quartiles.

Metric	Score	Q1	Median	Q3	IQR
Flesch–Kincaid	12.52	10.50	12.85	15.16	4.66
Gunning–Fog Index	16.03	13.72	16.40	19.06	5.34
Coleman–Liau Index	13.23	11.65	13.37	15.01	3.37
SMOG	14.55	12.66	14.55	16.46	3.80
Automated Readability Index	13.00	11.00	13.00	16.00	5.00
Average Grade Level	13.87	11.96	14.18	16.32	4.36

TABLE 5

Outlier statistics for readability scores.

Measure	Total Outliers	Total Points	Percentage
Flesch–Kincaid	510	16,852	3.03%
Gunning–Fog Index	530	16,859	3.14%
Coleman–Liau Index	682	16,847	4.05%
SMOG	461	16,876	2.73%
Automated Readability Index	629	16,749	3.76%
Average Grade Level	531	16,869	3.15%

The results of this sentiment analysis process can be seen in Figure 9. The scores suggest an overall decrease in positive sentiment over the time period; however, these results are not statistically significant, as the p-value was well above the standard alpha value (set at 0.05). This is likely because the increase in published articles over the time period distorted the results.

5 Discussion

We now consider our results in the context of the research objectives of Section 1. We focus on the results derived from our CIS framework and associated vectors in Section 5.1, and ascertain how the readability, vocabulary and sentiment of the corpus affect its efficacy potential in Section 5.2.

5.1 What kind of informally learnt and actionable security advice most often appears in news articles?

Three overarching themes prevail in our security corpus. The first is data protection (Theme 1), which is reflected in the strong focus on CIS-13 (Data protection) and Topic 3 of our LDA analysis. The second is physical and digital security (Theme 2), which is supported by CIS-11 (Limitation and control of network ports, protocols and services), CIS-2 (Inventory and control of software assets) and Topic 3 of the LDA analysis. The third is personal and collective safety (Theme 3) in the face of personal, business or sovereign threats to one’s security, which is supported by Topics 3, 11 and 14.

Each of these themes represents a distinct set of constructs and associated user behaviours. For Themes 1 and 3, a significant driver for personal safety is privacy: ‘the right of a party to maintain control over, and confidentiality of, information about itself’ Oldehoeft (1992). Although privacy is a significant token by itself (appearing 4,887 times in the corpus), further indirect references to it suggest that it is the underlying motivation for a significant number of data protection-related articles, be they in the realm of health data, shopping data or, more broadly, associated with the GDPR.

In Theme 2, personal safety entails the need for user intervention in faulty systems, either because the system cannot determine the cause of a certain threat or the appropriate corrective action to take, or, in some cases, because the system itself is acting maliciously towards the user. These articles were the most likely to contain directly actionable security advice, and thus were the most efficacious for individual users.

Theme 3 also encapsulates threats to business and sovereignty. These articles are unlikely to contain actionable security advice, but they can aid in the creation of policy Cook (1998), which may then lead to actionable advice. These articles may even influence public opinion regarding the (cyber) security of national sovereignty, much like how terrorism news shaped national opinion and policy, as seen in work by Gadarian (2010). This cycle of influence leads to the creation of policies and legislation, such as the aforementioned GDPR, which in turn influences public awareness of potential data security threats, ultimately stimulating new forms of offensive and defensive cyber capabilities. These capabilities are then disseminated to individual users, potentially as a form of security advice. Assessing future developments within these themes and re-assessing their relevance periodically could provide a lens for evaluating the past, current and future impact of news media on security advice efficacy potential.

FIGURE 8

The distributions of our readability metrics.

FIGURE 9

A visualization of changing sentiment, depicting a slight increase in negative sentiment along with a corresponding increase in article generation.

TABLE 6

An overview of the themes and supporting evidence.

Theme	Supporting evidence
1. Data protection	CIS-13, Topic 3
2. Cyber-physical systems security	CIS-2, CIS-11, Topic 3
3. Personal and collective safety	Topic 3, Topic 11, Topic 14

https://learn.cisecurity.org/cis-controls-download


5.2 What is the efficacy potential of this security advice as consumed by an individual user?

Many of the articles an individual user may access for cyber security advice may contain subject-specific vocabulary (such as that found within our ontological framework). Given that (1) there is limited overlap between advice sets within our ontological framework and (2) the average length of the articles in our corpus (expressed as sentence length) is shorter than the average length of comparator articles (see Section 4.3), there appears to be a certain level of focus within the articles that could indicate efficacy potential. However, we have also seen from ontological frameworks such as the CIS-Control schema that these tools may not encompass all of the possible security vectors within the current media environment. Furthermore, these results must be qualified given our topic modelling methodology. Our application of Heaps’ law highlights the growing vocabulary within our corpus, demonstrating that the subject-specific terminology in news articles on security advice is continuously evolving. This may point to an increasingly diversified interest in security advice that is tailored to a specific, predetermined goal. This encourages us to question the efficacy potential of all-encompassing frameworks such as the CIS-Control schema.
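To make the Heaps' law observation concrete, the sketch below fits the law in its usual form, V(n) = K·n^β, where n is the number of tokens read so far and V(n) is the number of distinct tokens seen. It is an illustration under stated assumptions rather than the procedure used in this study: the function names (`vocabulary_growth`, `fit_heaps`), the log-log least-squares fit and the toy token stream are all hypothetical.

```python
import math


def vocabulary_growth(tokens):
    """Return (tokens_seen, distinct_tokens_seen) after each token."""
    seen = set()
    points = []
    for count, token in enumerate(tokens, start=1):
        seen.add(token)
        points.append((count, len(seen)))
    return points


def fit_heaps(points):
    """Fit V = K * n**beta by least squares on log V versus log n."""
    if len({n for n, _ in points}) < 2:
        raise ValueError("need at least two distinct corpus sizes")
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


# Toy token stream (hypothetical); a real run would use the tokenized corpus.
tokens = "patch your router and patch your phone then back up your data".split()
k, beta = fit_heaps(vocabulary_growth(tokens))
print(f"K = {k:.2f}, beta = {beta:.2f}")
```

A fitted β below 1 means the vocabulary grows more slowly than the corpus; the closer β stays to 1 as more articles are added, the faster new terminology keeps appearing.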

The results of our readability tests and sentiment analysis may further challenge the efficacy potential of current media-mediated security dissemination. We find within our corpus a trend towards high reading difficulty levels: ease of reading correlated with publication type, and news articles ranked higher on all readability indices. As all five of our assessment metrics reported statistically significant results with similar distribution scores (see Table 4), we can confidently assert that just 3% of our corpus was written at a U.S. school system 6th-grade level, which is typically the recommended reading level for standard distributed materials Kher et al. (2017). Most of the articles in this corpus require a reading level of a typical college undergraduate.
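As an illustration of how a grade-level score and the share of texts at or below the 6th-grade threshold can be computed, the sketch below implements the standard Flesch–Kincaid grade formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter is a crude vowel-group heuristic, and the function names and the two sample texts are hypothetical; the tooling actually used for Table 4 may count sentences and syllables differently.

```python
import re

SIXTH_GRADE = 6.0  # recommended level cited in the text


def count_syllables(word):
    """Crude heuristic: count vowel groups, discounting a trailing 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(groups, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade: 0.39 * W/S + 11.8 * Syl/W - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def share_at_or_below(texts, grade=SIXTH_GRADE):
    """Fraction of texts whose grade level does not exceed `grade`."""
    grades = [flesch_kincaid_grade(t) for t in texts]
    return sum(g <= grade for g in grades) / len(grades)


# Hypothetical sample texts, not drawn from the corpus.
plain = "The cat sat. The dog ran."
dense = "Organizations should implement multifactor authentication immediately."
print(share_at_or_below([plain, dense]))
```

Short sentences of one-syllable words score far below grade 6, while a single sentence of long, polysyllabic security terms scores far above it, which is the pattern the readability results describe.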

Recalling that an individual user must have (1) a sense of certainty about the content, (2) a personal interest in the content and (3) sufficient ability to deploy the content in order to feel sufficiently compelled to act on the information, this threat control process could easily be derailed by the continued divergence and growth of subject-specific vocabulary and dense prose. Haney & Lutters (2018) argue that there is a rejection threshold that informs the maintenance of security in a rapidly evolving landscape, and they maintain that individual users are approaching this threshold.

Security is not the only specialized field that deals with these dissemination issues, and it may be helpful to observe the solutions pursued in other contexts. For example, medical advice dissemination to the general public (taken here as the equivalent of our ‘individual user’) also involves communicating complicated concepts and extensive vocabulary to individuals who have no relevant formal training on the subject. Britt et al. (2017) found that many readers stop reading medical texts if they gauge significant difficulty within the first few sentences. Consequently, the American Medical Association (AMA) and the U.S. Department of Health and Human Services (USDHHS) have set explicit guidelines that require public-facing information to achieve a U.S.-standardized readability level of 6th grade or below Kher et al. (2017). Extrapolating these considerations to our own corpus, it would stand to reason that increasing readability to a more generally accessible level could constitute a cost-effective remedy.

Although the overall sentiment of the corpus does not suggest that users are being treated as an enemy (as documented, for example, in the seminal paper by Adams & Sasse (1999)), what we encountered would not appear to fulfil Kerckhoffs’ criterion for ease of use. Nor would we agree that cyber security advice as portrayed in our corpus allows for self-efficacy upon reading. Instead, an individual user must face security topics using a multi-pronged approach, whereby self-efficacy is derived from multiple sources of increasing complexity. If the cyber security field is to continue down the path of increased specialization, perhaps the time has come to recognize this emerging reality and clarify, in a transparent fashion, the expectations that are being placed on users.

6 Limitations and future work

The scope of this study was limited by the type and amount of information we were able to acquire to build the corpus. In our case, this meant focusing on English-language material, even though a preliminary search conducted before implementation unearthed a rich catalogue of data in other languages. This also means that our security topics, analysis and findings likely exhibit Anglo-Saxon bias. The technical tools utilized for the readability scores were also designed for English-language articles. There is also significant scope for enhancing our search methodology, for example to reflect that users may only utilize the first page of results for any search enquiry Höchstötter & Lewandowski (2009). It is our hope that this methodology will be utilized to answer the same research objectives in other languages and cultural contexts.

Whilst we underscored the suitability of the CIS ontology, we must also recognize the drawbacks of this approach. The CIS ontology, although prescriptive in the manner in which it prioritizes controls, lacks risk assessment specifics and may lead to misaligned priorities and gaps, as the end-user may have differing priorities. Furthermore, its suitability is limited by ambiguity around its own intended target audience, and the CIS Controls have not undergone rigorous scientific analysis of their efficacy despite their popularity Groš (2021). However, as we are utilizing this ontology in an effort to answer our research questions rather than appealing directly to users, and as the other ontologies we surveyed suffer from broadly similar drawbacks, we do not consider these drawbacks sufficient to rule it out as our choice. Instead, we believe that more scientific analysis and sharing of case studies on CIS Control implementations by the community would help solidify their value proposition. The use of this ontology also underscores the need for further development of user-focused cyber security ontologies, which may serve as a better basis for a study such as ours.

We utilized automated methodologies in order to classify topics and measure sentiment and reading difficulty, and the results are tempered by the respective limitations of these methodologies, in particular the use of a bag-of-words model, which does not capture semantic meaning or context. This method treats words as independent features, potentially leading to overestimations in mapping articles to CIS controls. For example, the mere presence of keywords might incorrectly suggest relevance to a control, disregarding nuanced meanings conveyed through context. This limitation can skew our analysis, highlighting the need for advanced techniques like word embeddings or transformer models to improve semantic understanding and mapping accuracy. Moreover, our results represent a specific snapshot in the security timeline; access to a larger historical data-set would inevitably change the overall results, potentially yielding a more statistically significant sentiment analysis.
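The keyword-presence problem described above can be shown with a small sketch. The keyword sets below are hypothetical and far smaller than any real control vector, and `map_to_controls` is an illustrative helper, not the mapping code used in this study; only the control labels CIS-11 and CIS-13 come from the text.

```python
import re
from collections import Counter

# Hypothetical keyword sets for two controls (illustration only).
CONTROL_KEYWORDS = {
    "CIS-11": {"port", "ports", "protocol", "protocols", "firewall"},
    "CIS-13": {"data", "encryption", "privacy"},
}


def bag_of_words(text):
    """Lower-cased token counts; word order and context are discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def map_to_controls(text, min_hits=1):
    """Return the controls whose keywords occur at least `min_hits` times."""
    bag = bag_of_words(text)
    hits = {
        control: sum(bag[word] for word in keywords)
        for control, keywords in CONTROL_KEYWORDS.items()
    }
    return sorted(control for control, n in hits.items() if n >= min_hits)


relevant = "Close unused network ports on your router firewall."
unrelated = "The harbour port reopened after the storm."

print(map_to_controls(relevant))
print(map_to_controls(unrelated))
```

Both sentences are mapped to CIS-11, although the second concerns a harbour rather than network ports. Because the model sees only token counts, it cannot tell the two senses apart, which is the overestimation risk that context-aware representations such as word embeddings are meant to reduce.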

Our approach to tackling the second research objective may limit the usefulness of our conclusions. We approximated article efficacy potential by using text analysis to predict user engagement, and we did not consider other metrics that could have enhanced the findings. Traditionally speaking, reading-difficulty assessments in laboratory settings involve comprehension tests, eye tracking and brain-imaging. Knowledge of how users interact with our corpus in these terms would allow for a significantly richer analysis of security advice efficacy potential.

The aforementioned limitations can, of course, be addressed in future research that builds upon what is presented here—not least because our research method (described in Section 3) allows for continuous data capture. Furthermore, the data within this corpus could serve as the foundation for further analysis of security advice dissemination. Because this corpus contains a significant variety of sources, structural analysis of sentence construction for threat messaging could reveal the rhetorical structure of fear appeals, as per previous work in the field such as that of Renaud & Dupuis (2019). A fear appeal is designed to motivate the reader to execute security advice, and an in-depth analysis of its features could yield results that would improve the efficacy potential of security advice dissemination.

The corpus itself could be augmented with social media data, which would add the significant vector of digital naïve advice Schotter (2003). Bias within the articles could be used as another indicator of efficacy potential via methods like that presented by Lim et al. (2018). We believe that the results of this study can provide a basis for further reflection on security advice dissemination, and that it can stimulate a conversation about individual users’ learning environment. Importantly, we hope that it serves as a point of departure for future studies.

7 Conclusion

We have presented work on a corpus of security advice generated from mainstream news articles as might be faced by individual users on a regular basis. The work was oriented by two questions: (1) What kind of informally learnt and actionable security advice most often appears in news articles? (2) What is the efficacy potential of this security advice as consumed by an individual user?

We found that news-mediated security advice has been increasing since 2018, and that many such news articles focus on specific security topics. This level of focus may indicate efficacy potential. Additionally, we found that news-mediated security advice is characterized by short article length and low readability, making it difficult for many individual users to comprehend its content. We found that the subject-specific terminology within our security news articles is continuously evolving, potentially indicating increasingly diversified interest in goal-specific security advice. Again, this may increase the relative difficulty of acquiring and comprehending news-mediated security advice, with an associated impact on efficacy potential. Our approach involved using quantitative methods to yield qualitative findings. Our hope is that this research can help lay the foundations for various means of quantifying and improving the efficacy potential of security advice dissemination.

8 Data availability statement

The data that support the findings of this study are available in a repository and can be accessed here: https://huggingface.co/datasets/Quinm101/cybernewsarticles.

Footnotes

Given that cyber security is the discipline of concern in this paper, we shall refer to it simply as ‘security’.

https://livlab.org/seta/

https://www.nist.gov/

https://www.ncsc.gov.uk/

https://arstechnica.com/

Examples of ontologies that met many of the selection criteria but fell short of being eligible for inclusion were ENISA’s IoT Security Standards Gap Analysis (https://www.enisa.europa.eu/publications/iot-security-standards-gap-analysis) and a report by the UK’s Department for Digital, Culture, Media and Sport, mapping security recommendations for various audiences (https://www.gov.uk/government/publications/mapping-of-iot-security-recommendations-guidance-and-standards).

https://www.cisecurity.org/controls/cis-controls-faq

https://www.cisecurity.org/blog/v7-1-introduces-implementation-groups-cis-CIS-vectors/

https://gdpr-info.eu/

References

Abomhara and Køien (2015) Cyber-security and the internet of things: vulnerabilities, threats, intruders and attacks. J. Cyber Secur. Mobil. doi:10.13052/jcsm2245-1439.414

Adach, Hänninen and Lundqvist (2022) Security ontologies: A systematic literature review. In International Conference on Enterprise Design, Operations, and Computing. Springer.

Adams and Sasse M. A. (1999) Users are not the enemy. Commun. ACM. doi:10.1145/322796.322806

Ajzen (1985) From intentions to actions: A theory of planned behavior. In Action Control. Springer.

Al Hasib (2009) Threats of online social networks. Int. J. Comput. Sci. Netw. Secur., 288–293.

Al-Mhiqani M. N., Ahmad, Yassin, Hassan, Abidin Z. Z., Ali N. S. and Abdulkareem K. H. (2018) Cyber-security incidents: a review cases in cyber-physical systems. Int. J. Adv. Comput. Sci. Appl., 499–508.

Al Moubayed, Wall and McGough A. S. (2017) Identifying changes in the cyber-security threat landscape using the LDA-web topic modelling data search engine. In Tryfonas (ed), Human Aspects of Information Security, Privacy and Trust, pp. 287–295. Springer International Publishing, Cham. Lecture Notes in Computer Science, vol. 10292. doi:10.48550/arXiv.1901.02672

Alagheband M. R., Mashatan and Zihayat (2020) Time-based gap analysis of cyber-security trends in academic and digital media. ACM Trans. Manag. Inform. Syst.

Bada, Sasse A. M. and Nurse J. R. C. (2015) Cyber-security awareness campaigns: Why do they fail to change behaviour? International Conference on Cyber Security for Sustainable Society.

Bandura, Freeman W. H. and Lightsey (1999) Self-efficacy: The Exercise of Control, pp. 158–166. Springer.

Barnes S. B. (2006) A privacy paradox: Social networking in the United States. First Monday. doi:10.5210/fm.v11i9.1394

Bertino and Islam (2017) Botnets and internet of things security. IEEE Comput.

Bonaccio and Dalal R. S. (2006) Advice taking and decision-making: An integrative literature review, and implications for the organizational sciences. Organ. Behav. Hum. Decis. Process. 101, 127–151. doi:10.1016/j.obhdp.2006.07.001

Brandimarte, Acquisti and Loewenstein (2013) Misplaced confidences: Privacy and the control paradox. Soc. Psychol. Personal. Sci., 340–347. doi:10.1177/1948550612455931

Britt R. K., Collins W. B., Wilson, Linnemeier and Englebert A. M. (2017) ehealth literacy and health behaviors affecting modern college students: A pilot study of issues identified by the american college health association. J. Med. Int. Res., e392. doi:10.2196/jmir.3100

Bull, Thompson, Searson, Garofalo, Park, Young and Lee (2008) Connecting informal and formal learning experiences in the age of participatory media. Contemp. Issues Technol. Teach. Educ., 100–107.

Burghouwt, Spruit and Sips (2011) Towards detection of botnet communication through social media by monitoring user activity. In International Conference on Information Systems Security, pp. 131–143. Springer.

Byrne, Weidert, Liff, Horvath, Smith, Howe and Ray (2012) Perceptions of internet threats: Behavioral intent to click again. In Proceedings of the 27th Annual Conference of the Society for Industrial and Organizational Psychology.

Caballero (2017) Security education, training, and awareness. In Computer and Information Security Handbook, pp. 497–505. Elsevier.

Caldwell (2013) Plugging the cyber-security skills gap. Comput. Fraud Secur. 2013. doi:10.1016/S1361-3723(13)70062-9

Cao, Xia, Zhang and Tang (2009) A density-based method for adaptive LDA model selection. Neurocomputing, 1775–1781. doi:10.1016/j.neucom.2008.06.011

Casas, Soro, Vanerio, Settanni and D’Alconzo (2017) Network security and anomaly detection with big-dama, a big data analytics framework. In The 2017 IEEE 6th Int. Conf. on Cloud Networking (CloudNet). IEEE.

Cashell, Jackson W. D., Jickling and Webel (2004) The economic impact of cyber-attacks. In Congressional Research Service Documents, CRS RL32331. Washington DC.

Çelen, Kariv and Schotter (2010) An experimental test of advice and social learning. Manag. Sci., 1687–1701. doi:10.1287/mnsc.1100.1228

Chaudhuri (2011) Sustaining cooperation in laboratory public goods experiments: a selective survey of the literature. Exp. Econ. doi:10.1007/s10683-010-9257-1

Chen, Chiang R. H. L. and Storey V. C. (2012) Business intelligence and analytics: From big data to big impact. Manag. Inform. Syst. Q., 1165–1188.

Coleman and Liau T. L. (1975) A computer readability formula designed for machine scoring. J. Appl. Psychol., 283–284.

Contandriopoulos, Lemire, Denis J.-L. and Tremblay É. (2010) Knowledge exchange processes in organizations and policy arenas: a narrative systematic review of the literature. Milbank Q., 444–483. doi:10.1111/j.1468-0009.2010.00608.x

Cook T. E. (1998) Governing with the News: The News Media as a Political Institution. University of Chicago Press.

Das, Dabbish and Hong J. I. (2018) Breaking! A typology of security and privacy news and how it’s shared. In Proc. of the 2018 CHI Conf. on Human Factors in Computing Systems, Paper No. 1.

De Nicola and Villani M. L. (2021) Smart city ontologies and their applications: A systematic literature review. Sustainability, 5578.

Deerwester, Dumais S. T., Furnas G. W., Landauer T. K. and Harshman (1990) Indexing by latent semantic analysis. J. Amer. Soc. inform. Sci., 391–407. doi:10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9

Deveaud, Sanjuan and Bellot (2014) Accurate and effective latent concept modeling for ad hoc information retrieval. Doc. Numer. doi:10.3166/dn.17.1.61-84

Dreibelbis R. C., Martin, Coovert M. D. and Dorsey D. W. (2018) The looming cyber-security crisis and what it means for the practice of industrial and organizational psychology. Industr. Organ. Psychol., 346–365.

Fan and Yeung K. H. (2011) Online social networks—paradise of computer viruses. Phys. A Stat. Mech. Appl. 390, 189–197. doi:10.1016/j.physa.2010.09.034

Flesch (2007) Flesch-Kincaid readability test. Retrieved

Forget, Pearman, Thomas, Acquisti, Christin, Cranor L. F., Egelman, Harbach and Telang (2016) Do or do not, there is no try: user engagement may not improve security outcomes. In Twelfth Symposium on Usable Privacy and Security (SOUPS), pp. –111.

Frey, Rashid, Anthonysamy, Pinto-Albuquerque and Naqvi S. A. (2017) The good, the bad and the ugly: a study of security decisions in a cyber-physical systems game. IEEE Trans. Softw. Eng., 521–536.

Fulton K. R., Gelles, McKay, Roberts, Abdi and Mazurek M. L. (2019) The effect of entertainment media on mental models of computer security. In Proceedings of the 15th Symposium on Usable Privacy and Security (SOUPS).

Furnell (2024) Usable cybersecurity: a contradiction in terms? Interact. Comput.

Furnell and Thomson K.-L. (2009) Recognising and addressing ‘security fatigue’. Comput. Fraud Secur. 2009. doi:10.1016/S1361-3723(09)70139-3

Gadarian S. K. (2010) The politics of threat: How terrorism news shapes foreign policy attitudes. J. Politics, 469–483. doi:10.1017/S0022381609990910

Garrick (1998) Informal learning in corporate workplaces. Hum. Resour. Dev. Q., 129–144. doi:10.1002/hrdq.3920090205

Goldstein, Kantrowitz, Mittal and Carbonell (1999) Summarizing text documents: Sentence selection and evaluation metrics. In Proc. of the 22nd Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 121–128.

Grinberg (2018) Identifying modes of user engagement with online news and their relationship to information gain in text. In Proc. of the 2018 World Wide Web Conf., pp. 1745–1754.

Groš (2021) A critical view on cis controls. In The 16th Int. Conf. on Telecommunications (ConTEL), pp. 122–128. IEEE.

Guan and Wong M. H. (2018) Regulations and brain drain: Evidence from wall street star analysts’ career choices (Forthcoming). Management Science.

Halevi, Memon, Lewis, Kumaraguru, Arora, Dagar, Aloul and Chen (2016) Cultural and psychological factors in cyber-security. In Proc. of the 18th Int. Conf. on Information Integration and Web-based Applications and Services, pp. 318–324.

Hamilton W. L., Clark, Leskovec and Jurafsky (2016) Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proc. of the 2016 Conf. on Empirical Methods in Natural Language Processing, volume 2016, pp. 595–605. NIH Public Access.

Haney J. M. and Lutters W. G. (2018) “It’s scary it’s confusing it’s dull”: How cyber-security advocates overcome negative perceptions of security. In The 14th Symposium on Usable Privacy and Security (SOUPS), pp. 411–425.

Heaps H. S. (1978) Information Retrieval, Computational and Theoretical Aspects. Academic Press.

Hendrix, Al-Sherbaz and Victoria (2016) Game based cyber security training: are serious games suitable for cyber security training? Int. J. Serious Games. doi:10.17083/ijsg.v3i1.107

Herley C. E. (2009) So long, and no thanks for the externalities: The rational rejection of security advice by users. In Proc. of the 2009 Workshop on New Security Paradigms Workshop, NSPW ’09, pp. 133–144. Association for Computing Machinery, New York, NY, USA.

Cormac E. Herley, Brian Keogh, Aaron Michael Hulett, Adrian M. Marinescu, Jeffrey S. Williams, and Stanislav Nurilov. US patent 9,021,590: Spyware detection mechanism, 2015.

Hight S. D. (2005) The importance of a security, education, training and awareness program, November 2005. Security, 27601.

Höchstötter and Lewandowski (2009) What users see–structures in search engine results pages. Inform. Sci. 179, 1796–1812. doi:10.1016/j.ins.2009.01.028

Howe A. E., Ray, Roberts, Urbanska and Byrne (2012) The psychology of security for the home computer user. In The 2012 IEEE Symp. on Security and Privacy, pp. 209–223. IEEE.

Humayun, Mahmood Niazi N. Z., Jhanjhi M. A. and Mahmood (2020) Cyber-security threats and vulnerabilities: a systematic mapping study. Arab. J. Sci. Eng., 3171–3189. doi:10.1007/s13369-019-04319-2

Hussein D. M. E.-D. M. (2018) A survey on sentiment analysis challenges. J. King Saud Univ. Eng. Sci., 330–338. doi:10.1016/j.jksues.2016.04.002

Ion, Reeder and Consolvo (2015) “...no one can hack my mind”: Comparing expert and non-expert security practices. In The 11th Symp. on Usable Privacy and Security (SOUPS), pp. 327–346.

Jang-Jaccard and Nepal (2014) A survey of emerging threats in cyber-security. J. Comput. Syst. Sci., 973–993. doi:10.1016/j.jcss.2014.02.005

Kalra and Prasad J. S. (2019) Efficacy of news sentiment for stock market prediction. In The 2019 Int. Conf. on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), pp. 491–496. IEEE.

Kanungo and Orr (2009) Predicting the readability of short web summaries. In Proc. of the 2nd ACM Int. Conf. on Web Search and Data Mining, 202–211.

Kendall E. F. and McGuinness D. L. (2019) Ontology Engineering. Morgan & Claypool Publishers.

Kerckhoffs (1883) La cryptographie militaire. Journal des Sciences Militaires.

Kher, Johnson and Griffith (2017) Readability assessment of online patient education material on congestive heart failure. Adv. Prev. Med. 2017.

Khoo, Robertson and Deibert (2019) Installing fear: A Canadian legal and policy analysis of using, developing, and selling smartphone spyware and stalkerware applications. University of Toronto Citizen Lab Report.

Kosar, Bohra and Mernik M. A. (2016) Protocol of a systematic mapping study for domain-specific languages. J. Inform. Softw. Technol. doi:10.1016/j.infsof.2015.11.001

Kuang X. J., Weber R. A. and Dana (2007) How effective is advice from interested parties? An experimental test using a pure coordination game. J. Econ. Behav. Organ., 591–604. doi:10.1016/j.jebo.2005.03.010

Lahlou, Langheinrich and Röcker (2005) Privacy and trust issues with invisible computers. Commun. of the ACM. doi:10.1145/1047671.1047705

Lasswell H. D. (1948) The structure and function of communication in society. In Bryson (ed), The Communication of Ideas. Harper and Row.

Lawson S. T., Yeo S. K., Haoran and Greene (2016) The cyber-doom effect: The impact of fear appeals in the us cyber-security debate. In The 8th International Conference on Cyber Conflict (CyCon). IEEE.

Lee C. C. and Kim (2016) Understanding information security stress: Focusing on the type of information security compliance activity. Comput. Secur. doi:10.1016/j.cose.2016.02.004

Convertino, Tayi R. K. and Kazerooni (2019) What data should i protect? recommender and planning support for data security analysts. In Proc. of the 24th Int. Conf. on Intelligent User Interfaces, pp. 286–297.

Lim, Jatowt and Yoshikawa (2018) Understanding characteristics of biased sentences in news articles. In CIKM Workshops, pp. 121–128.

Lindner A. M., Pryciak and Elsner (2020) Tor and the city: MSA-level correlates of interest in anonymous web browsing. Surveill. Soc., 507–521. doi:10.24908/ss.v18i4.13235

Macdonald, Frank, Mei and Monk (2015) Identifying digital threats in a hacker web forum. In Proc. of the 2015 IEEE/ACM Int. Conf. on Advances in Social Networks Analysis and Mining 2015, pp. 926–933.

Maddux J. E. (1995) Self-efficacy theory. In Self-Efficacy, Adaptation, and Adjustment. Springer.

Malcolm, Hodkinson and Colley (2003) The interrelationships between informal and formal learning. J. Workplace Learn., 313–318. doi:10.1108/13665620310504783

Malmendier and Shanthikumar (2007) Are small investors naive about incentives? J. Fin. Econ., 457–489. doi:10.1016/j.jfineco.2007.02.001

Marres and Weltevrede (2013) Scraping the social? issues in live social research. J. Cult. Econ., 313–335. doi:10.1080/17530350.2013.772070

McLaughlin G. H. (1969) Smog grading—a new readability formula. J. Read., 639–646.

McCombs M. E. and Shaw D. L. (1972) The agenda-setting function of mass media. Public Opin. Q., 176–187.

Meyer (2010) The rise of the knowledge broker. Sci. Commun., 118–127. doi:10.1177/1075547009359797

Miller, Wagner, Aickelin and Garibaldi J. M. (2016) Modelling cyber-security experts’ decision making processes using aggregation operators. Comput. Secur., 229–245. doi:10.1016/j.cose.2016.08.001

Milne G. R., Labrecque L. I. and Cromer (2009) Toward an understanding of the online consumer’s risky behavior and protection practices. J. Consumer Affairs, 449–473. doi:10.1111/j.1745-6606.2009.01148.x

Mindermann (2016) Are easily usable security libraries possible and how should experts work together to create them? Proc. of the 9th Int. Workshop on Cooperative and Human Aspects of Software Engineering.

Nicholson, Coventry and Briggs (2019) “If it’s important it will be a headline” cyber-security information seeking in older adults. In Proc. of the 2019 CHI Conf. on Human Factors in Computing Systems, Paper No. 349.

Nthala and Flechais (2017) “If it’s urgent or it is stopping me from doing something, then i might just go straight at it”: a study into home data security decisions. In Int. Conf. on Human Aspects of Information Security, Privacy, and Trust, pp. 123–142. Springer.

Oldehoeft A. E. (1992) Foundations of a Security Policy for Use of the National Research and Educational Network. U.S. Department of Commerce, National Institute of Standards and Technology.

Ollis (2011) Learning in social action: The informal and social learning dimensions of circumstantial and lifelong activists. Aust. J. Adult Learn., 248–268.

Oltramari, Henshel D. S., Cains and Hoffman (2015) Towards a human factors ontology for cyber security. STIDS 2015.

Ottaviani and Sørensen P. N. (2006) Professional advice. J. Econ. Theory 126, 120–142. doi:10.1016/j.jet.2004.08.005

Park J.-y. (2012) An analysis on training curriculum for educating information security experts. Manag. Inform. Syst. Rev., 149–165. doi:10.29214/damis.2012.31.1.007

Pfleeger S. L. and Caputo D. D. (2012) Leveraging behavioral science to mitigate cyber-security risk. Comput. Secur., 597–611. doi:10.1016/j.cose.2011.12.010

Pfleeger S. L., Angela Sasse and Furnham (2014) From weakest link to security hero: Transforming staff security behavior. J. Homel. Secur. Emerg. Manag., 489–510. doi:10.1515/jhsem-2014-0035

Piasecki, Urquhart and McAuley (2021) Defence against the dark artefacts: Smart home cyber crimes and cyber-security standards. Comput. Law Secur. Rev., 105542. doi:10.1016/j.clsr.2021.105542

Porter M. F. (2006) An algorithm for suffix stripping. Program, 211–218. doi:10.1108/00330330610681286

Rader and Wash (2015) Identifying patterns in informal sources of security information. J. Cyber Secur., tyv008–tyv144. doi:10.1093/cybsec/tyv008

Rader, Wash and Brooks (2012) Stories as informal lessons about security. In Proc. of the 8th Symp. on Usable Privacy and Security. ACM.

Redmiles

E., M.

Kross

and

Mazurek

M. L.

(

2016a

)

How I learned to be secure: A census-representative survey of security advice sources and behavior

. In

Proc. of the 2016 ACM SIGSAC Conf. on Computer and Communications Security, CCS ’16

, pp.

666

–

677

Association for Computing Machinery

New York, NY, USA

Redmiles

E., M.

Kross

and

Mazurek

M. L.

(

2019

)

How well do my results generalize? Comparing security and privacy survey results from mturk, web, and telephone samples

. In

The 2019 IEEE Symp. on Security and Privacy (SP)

volume 00

, pp.

227

–

244

IEEE

Redmiles

E., M.

Malone

A. R.

and

Mazurek

M. L.

(

2016b

)

I think they’re trying to tell me something: Advice sources and selection for digital security

. In

The IEEE Symp. on Security and Privacy (SP)

, pp.

272

–

288

IEEE

Reeder

R. W.

Ion

and

Consolvo

(

2017

)

152 simple steps to stay safe online: Security advice for non-tech-savvy users

IEEE Secur. Privacy

–

10.1109/MSP.2017.3681050

Renaud

and

Dupuis

(

2019

)

Cyber-security fear appeals: Unexpectedly complicated

In Proc. of the New Security Paradigms Workshop

–

10.1001/jama.1994.03520020045012

Roberts

J. C.

Fletcher

R. H.

and

Fletcher

S. W.

(

1994

)

Effects of peer review and editing on the readability of articles published in annals of internal medicine

JAMA

272

119

–

121

Ruighaver

A. B.

Maynard

S. B.

and

Chang

(

2007

)

Organisational security culture: Extending the end-user perspective

Comput. Secur.

–

10.1016/j.cose.2006.10.008

10.1080/19331681.2019.1616646

Ruohonen

and

Kimppa

K. K.

(

2019

)

Updating the Wassenaar debate once again: Surveillance, intrusion software, and ambiguity

J. Inform. Technol. Politics

169

–

186

Ruoti

Monson

Zappala

and

Seamons

(

2017

)

Weighing context and trade-offs: How suburban adults selected their online security posture

In Proc. of the 13th Symp. on Usable Privacy and Security (SOUPS)

211

–

228

Saks

A. M.

and

Ashforth

B. E.

(

1996

)

Proactive socialization and behavioral self-management

J. Vocat. Behav.

301

–

323

10.1006/jvbe.1996.0026

Satyapanich

Finin

and

Ferraro

(

December 2019

)

Extracting rich semantic information about cyber-security events

. In

The 2019 IEEE International Conference on Big Data (Big Data)

, pp.

5034

–

5042

Schatz

Bashroush

and

Wall

(

2017

)

Towards a more representative definition of cyber-security

J. Digit. Forensics Secur. Law

–

10.15394/jdfsl.2017.1476

Schirrmacher

N.-B.

Ondrus

and

Tan

F. T. C.

(

2018

)

Towards a response to ransomware: Examining digital capabilities of the Wannacry attack

In PACIS

210

10.1257/000282803321947047

Schotter

(

2003

)

Decision making with naive advice

Amer. Econ. Rev.

196

–

201

Senter

R. J.

and

Smith

E. A.

(

1967

)

Automated readability index

Technical report,

AMRL-TR. Aerospace Medical Research Laboratories

Shillair

Cotten

S. R.

Tsai

H.-Y. S.

Alhabash

LaRose

and

Rifon

N. J.

(

2015

)

Online safety begins with you and me: Convincing internet users to protect themselves

Comput. Hum. Behav.

199

–

207

10.1080/13523260.2019.1670006

Shires

(

2020

)

Cyber-noir: Cyber-security and popular culture

Contemp. Secur. Policy

–

107

10.1108/14684520410543670

Smith

A. D.

(

2004

)

Cybercriminal impacts on online business and consumer confidence

Online Inform. Rev.

224

–

234

Šorgo

Bartol

Dolničar

and

Podgornik

B. B.

(

2017

)

Attributes of digital natives as predictors of information literacy in higher education

Brit. J. Educ. Technol.

749

–

767

Souag

Salinesi

Mazo

and

Comyn-Wattiau

(

2015

)

A security ontology for security requirements elicitation

. In

Engineering Secure Software and Systems: 7th Int. Symp., ESSoS 2015, Milan, Italy, March 4-6, 2015. Proceedings 7

, pp.

157

–

177

Springer

Sowndarajan

and

Binu

(

2017

)

Android security issues and solutions

. In

The 2017 Int. Conf. on Innovative Mechanisms for Industry Applications (ICIMIA)

, pp.

686

–

689

IEEE

Stanton

Theofanos

M. F.

Prettyman

S. S.

and

Furman

(

2016

)

Security fatigue

IT Professional

–

Steinel

Abele

A. E.

and

De Dreu

C. K. W.

(

2007

)

Effects of experience and advice on process and performance in negotiations

Group Process. Intergroup Relat.

533

–

550

10.1177/1368430207081541

10.1016/0001-8791(87)90037-6

Stumpf

S. A.

Brief

A. P.

and

Hartman

(

1987

)

Self-efficacy expectations and coping with career-related events

J. Vocat. Behav.

–

108

Taylor

and

Wolff

(

2004

)

The Victorians Since 1901: Histories, Representations and Revisions

Manchester University Press

10.1109/MITP.2018.2881373

Cyber-security in social media

(

2019

)

Challenges and the way forward

IT Professional

–

Theofanos

(

2020

)

Is usable security an oxymoron?

IEEE Comput.

–

10.1109/MC.2019.2954075

10.1016/S1363-4127(01)00304-1

Tregear

(

2001

)

Risk assessment

Information Security Technical Report

–

10.1016/j.cose.2013.04.004

Viet

H., N.

Van

Q. N.

Trang

L. L. T.

and

Shone

(

2018

)

Using deep learning model for network scanning detection

. In

Proc. of the 4th Int. Conf. on Frontiers of Educational Technologies

, pp.

117

–

121

von Solms

and

van Niekerk

(

2013

)

From information security to cyber-security

Comput. Secur.

–

102

10.1080/02684527.2012.708530

Wagner

Şahin

C. Ş.

Pena

and

Streilein

W. W.

(

2019

)

Automatic generation of cyber architectures optimized for security, cost, and mission performance: A nature-inspired approach

. In

Advances in Nature-Inspired Computing and Applications

, pp.

–

Springer

Wang

Zhang

Wang

Yan

and

Huang

(

2016

)

Targeted online password guessing: An underestimated threat

. In

Proc. of the 2016 ACM SIGSAC Conf. on Computer and Communications Security

, pp.

1242

–

1254

Wang

Liu

and

Huang

(

2014

)

A network gene-based framework for detecting advanced persistent threats

. In

The 2014 Ninth Int. Conf. on P2P, Parallel, Grid, Cloud and Internet Computing

, pp.

–

102

IEEE

Warner

(

2012

)

Cyber-security: A pre-history

Intell. Natl. Secur.

781

–

799

Weinstein

B. D.

(

1993

)

What is an expert?

Theoret. Med.

–

Wenger

(

1998

)

Communities of practice: Learning as a social system

Syst. Thinker

–

10.1017/CBO9780511803932

West

(

2008

)

The psychology of security

Commun. ACM

–

10.1145/1330311.1330320

10.1186/s13174-017-0059-y

Wiederhold

B. K.

(

2014

)

The role of psychology in enhancing cyber-security

Cyberpsychol. Behav. Soc. Netw.

131

–

132

10.1089/cyber.2014.1502

Woods

Agrafiotis

Nurse

J. R. C.

and

Creese

(

2017

)

Mapping the coverage of security controls in cyber insurance proposal forms

J. Internet Serv. Appl.

–

Yasaka

T., M.

Lehrich

B. M.

and

Sahyouni

(

2020

)

Peer-to-peer contact tracing: Development of a privacy-preserving smartphone app

JMIR mHealth and uHealth

, e18936.

10.2196/18936

Yuan

Fernando

and

Klonoff

D. C.

(

2018

)

Standards for medical device cyber-security in 2018

746

10.1177/1932296818763634

Appendix A. Search terms

TABLE A.7