Abstract

Digital Humanities has an increasing need for widely applicable and easy-to-use methods across disciplinary boundaries. We believe that the cross-disciplinary methods we have developed for the study of the ancient Near East are useful for the study of other research questions outside the field of “Digital Assyriology”. The article presents an overview of automated language processing for lexical-semantic analysis, social network analysis, and content analysis. We draw on the work in our research group, which aims to address how changing empires affect social group identities and lifeways in the first millennium BCE in Mesopotamia. The article places the methods our research team has developed into the larger methodological context of Digital Humanities (DH). The article presents an overview of the tools our research group has found useful in our study of ancient social groups. Sections 1 and 2 give the necessary background for the reader to understand the particular challenges related to the study of ancient Mesopotamia and how we have overcome them. The concrete case studies presented in Sections 3 and 4, however, are kept as general as possible, to facilitate similar approaches in adjacent fields of study. Our methodology and approaches are well documented and openly available, and in our view can be used on similar text materials, by other groups interested in DH approaches.

1. Introduction

Digital Humanities (DH) has an increasing need for widely applicable and easy-to-use methods across disciplinary boundaries. We believe that the cross-disciplinary methods we have developed for the study of the ancient Near East are useful for the study of other research questions outside the field of “Digital Assyriology.” The article presents an overview of automated language processing for lexical-semantic analysis, social network analysis (SNA), and content analysis. We draw on the work in our research group, which aims to address how changing empires affect social group identities and lifeways in the first millennium BCE in Mesopotamia. The article places the methods our research team has developed into the larger methodological context of DH.

Written history in Mesopotamia begins ca. 3000 BCE. The first language written with wedge-shaped cuneiform signs on clay was Sumerian, but from ca. 2500 BCE onward Akkadian, a language belonging to another language family, was written with the same writing system. Sumerian and Akkadian account for the majority of texts written in cuneiform script, but cuneiform was adapted for other languages as well (e.g., Old Persian and Hittite). More than 500,000 texts written in cuneiform script have survived and been excavated in the Middle East. Some of this material has been published, transliterated, and compiled into large electronic text corpora.

Projects like the Open Richly Annotated Cuneiform Corpus (Oracc)1 and the Cuneiform Digital Library Initiative (CDLI)2 have made large numbers of the ancient texts themselves available. We aim to go further and study semantic domains, social groups, and content networks in these reasonably large text materials. Because of the large datasets, we need tools, visualizations, and quantitative approaches. These approaches enable historical researchers to pinpoint interesting groups of individuals, relevant concepts describing groupness, and, more importantly, their change over time. Emphasis on change is necessary, as the historical period which is well-documented by digitized cuneiform texts covers ca. 500 years. Our research focuses on the end of the cuneiform culture: from the Neo-Assyrian Empire (934-612 BCE) to the Neo-Babylonian Empire (612-539 BCE) and the following Persian period (539-331 BCE), up until the end of most large-scale cuneiform archives (484 BCE).

The study of the ancient Near East (roughly 3000 BCE to 300 BCE in the area of the modern Middle East) has largely relied on traditional philological methods since the decipherment of cuneiform writing in the 1850s. In recent decades, Assyriology has progressed and much of the text material is now transliterated and even lemmatized. For an overview of the field, see Sahala (2021). Since 2017, our research group has developed ways to apply language technological methods to ancient Near Eastern material in order to examine networks of lexemes, people, and material culture, mainly between 934 BCE and 484 BCE.

After the introduction (Section 1), we present a short description of our data and the processing methods (Section 2). We discuss the application of the DH resources (Section 3), which we have developed for studying social groups through semantic domains (Section 3.1), networks of people (Section 3.2), and material artifact networks (Section 3.3). We also provide a comprehensive case study to more thoroughly evaluate the benefits and shortcomings of the tools (Section 4). We conclude with a discussion of the main results and some remarks about future research (Section 5).

1.1 Background and goals

Our research group, the Centre of Excellence in Ancient Near Eastern Empires (ANEE) at the University of Helsinki, Finland, aims to address the research question of how changing empires impact social group identities and lifeways during the first millennium BCE. For this purpose, ANEE consists of three teams, focusing on DH Approaches, Sociology, and Archaeology. Geographically, ANEE’s area of interest stretches from the Mesopotamian heartland to Southern Levant.

One of the goals of ANEE is to study the imperial elite and to identify the subgroups belonging to the social elite in the 1st millennium BCE. Depending on which features or roles are emphasized, some individuals belong to, and others are excluded from, a particular elite. We get various elites by highlighting features like power, trade, ownership, hierarchy, or residence. Many individuals still belong to a common core of all the elites if we emphasize the mutually well-connected individuals, as Larsen and Ellersgaard (2017) did for the Danish elite in modern times. We can use this core as an operational definition of the imperial elite. A relevant research problem is to find the minimal conditions for defining the ancient Near Eastern imperial elite and to determine how to extend this research to data from recently published sources.

There are many possible ways to define social group identity. In this article, we use a straightforward and flexible definition: group identity can be hypothesized based on the position of an individual in the social environment and the connections created between individuals participating in events together. For example, a working definition of the imperial elite can refer to its power and wealth, whereas the relations to the elite define the marginal groups. If you were among the imperial elite, you did business with various individuals, and there are deeds, transactions, letters, and so forth to document this. The ancient texts therefore (1) reveal the networks between individuals, documenting in-groups and out-groups, (2) record the semantic range of Akkadian words, and (3) yield information on the material culture and artifacts associated with the individuals. The team focusing on Archaeology provides information on material evidence in the form of artifacts and ecofacts to which historians provide context through the study of written documentation. The written documentation lets us look at networks of artifacts associated with the imperial elite in contrast to artifacts typically associated with other individuals.

From a scientific methodology perspective, we can compare our approach with that of Lagus, Pantzar, and Ruckenstein (2018) who bring big data into consumer research in their study of rhythms of fear and joy in online discussion fora. They point out that the

sociologist Dominique Boullier (2017) painted a radical change of direction for consumer research. A survey study has questions prepared by researchers, and the consumer responses are the basis for a statistical sampling analysis. However, big data analysis proceeds in the opposite direction. For example, research based on online discussions looks for questions that the data can answer instead of looking for answers to predefined questions. The researcher changes perspectives to test the big data until the data condenses in various relevant ways (Frické, 2015). Ideally, big data can trace origins, accumulation, and transfer of ideas from an individual or a group to another.

We find that our approach to big data in Akkadian is similar. We offer a lexical portal3 by Sahala et al. (2022) giving a bird’s eye view to the exploration of Akkadian data, through which researchers can preliminarily test their hypotheses to see if there are enough digital data to answer specific research questions.

The big data approach is methodologically new compared with a purely philological, theoretical, or qualitative expert analysis. We also agree with Lagus, Pantzar, and Ruckenstein (2018) that combining various data extraction methods will produce both more robust methods and more replicable research results in the long run. The goal is not to remove the researcher from the research process but to find a new role for the qualitative researcher when analyzing massive data collections.

In addition, our work strives to see the larger language-independent picture to make our research applicable to similar research topics using unstructured text in other languages. The article highlights the solutions developed for Akkadian while contrasting them with what is necessary for other languages in general. A language-independent description of the language processing methods is needed to facilitate a cross-language methodological comparison.

Assyriologists find The Prosopography of the Neo-Assyrian Empire (PNA) (eds. Radner and Baker 1998–2011) extremely relevant for the study of individual and group identities. Prosopographical information on more than 25,000 individuals who lived in the Neo-Assyrian Empire allows a historian to reconstruct archives, social contexts, and social groupings. What if we could automatically or semi-automatically update this derived dataset as more sources become available? Section 2 suggests approximating PNA by combining the ANEE lexical portal, the corpus search tool Korp, and the Oracc database. However, PNA often goes further, disambiguating person names and providing a synthesis of the contexts in which the names appear. Some solutions are offered by Sahala (2021), who documents and further develops methods for processing cuneiform documents from tablets to texts in his dissertation. Additionally, Adam Anderson (2018) documents and further develops automated prosopographical disambiguation of Akkadian names.

The methods developed by Sahala and Anderson offer relevant tools when studying concordances of names in texts that were still unavailable when PNA was compiled, and when applied in an automated way, the methods can give us a first overview of the imperial elite as it is conveyed by the raw data.

2. Data and processing methods

In this section, we show what cuneiform data look like originally and describe the content of Oracc and PNA (2.1). While cuneiform writing is rather complicated compared with current writing standards, it has one great advantage compared with modern writing: the names are usually clearly indicated and put in some well-defined category by determinatives. The determinatives guide the reader to interpret the syllable and logogram combinations according to the indicated context. Modern computational methods can benefit from this mark-up of names as well. Names are usually also left uninflected while other words of the language may have an intricate morphology. Before we benefit from a more involved linguistic analysis, we look at what we can get directly from the named-entity mark-up provided by the writing system (2.2). This already allows us to visualize networks of names co-occurring in texts to perform SNA, which is also what the digital version of PNA enables in a much more fine-grained and reliable fashion as the PNA has been manually curated (2.3).

When looking at networks of other words than names, it is necessary to lemmatize the words because in most cases we do not have enough data to make relevant observations directly on surface forms (2.4). In the networks of lemmatized words, we draw the researcher’s attention to surprisingly often co-occurring lemmas which constitute semantic domains. We look at two different kinds of co-occurrences: the surface neighbors (syntagmatic context) and the friends-of-friends (paradigmatic contexts) (2.5). We use network visualization methods from SNA to get a bird’s eye view of the semantic domains which we offer through the ANEE lexical portal. However, the primary lexicographical technique to study words in context is to make a concordance. Current database environments such as Korp offer the flexibility to apply complex search criteria to select the contexts for the concordance, compute frequency tables based on metadata, and the opportunity to visualize the findings on geographical maps. We link from the ANEE lexical portal to Korp to let the researcher refer to the sources to understand what the connections in the lexical portal are based on (2.6). Finally, we present some caveats about the methods we have used (2.7).

2.1 DH data on the ancient Near East

To get a feeling for what the original data that is included in Oracc and PNA looks like, you can have a glimpse of a cuneiform inscription and its transliteration in Figs. 1 and 2.

A fragment of a prism documenting campaigns of King Sennacherib (reign 705-681 BCE), museum number VK6400:6 (Ethnographic Collection, National Museum of Finland), photo by Annukka Debenjak-Ijäs. A photogrammetric 3D-digitization of the prism fragment, made by Debenjak-Ijäs, can be found here: https://skfb.ly/oqwQt (accessed 11 May 2023). See also Debenjak-Ijäs, Bonnie, and Saari 2021 (under VK6400:6) for a high resolution 3D scan. The 3D-digitization project was funded by the Finnish Cultural Foundation. The fragment’s left-hand side was identified as being a duplicate of RINAP 3/1, 23 lines iii 17-27, a Sennacherib inscription, by Dr Johannes Bach, who is in the process of publishing the fragment. For the full transcription and translation of this prism fragment, see Bach and Bonnie 2023, 10–16.
Figure 1

A detail of line A 11' of Fig. 1 and its transliteration, transcription and translation:
Figure 2

    11' […] qé-reb ᵁᴿᵁur-sa-li-mu

    qereb Ursalimmu

     “inside the city of Jerusalem.”

The whole passage documents Sennacherib’s victory over the king of Jerusalem: “As for him (Hezekiah), I confined him inside the city Jerusalem, his royal city, like a bird in a cage. I set up blockades against him and made him dread exiting his city gate. I detached from his land the cities of his that I had plundered and I gave them to Mitinti, the king of the city Ashdod, Padî, the king of the city Ekron, and Ṣilli-Bēl, the king of the city Gaza, and thereby made his land smaller. To the former tribute, their annual giving, I added the payment of gifts in recognition of my overlordship and imposed it upon them.” The translation is from RINAP 3/1, 23 lines iii 24-32 (see https://oracc.museum.upenn.edu/rinap/Q003497/) with reconstructions treated as certainties.

The ultimate goal is to have a fully automated procedure taking a 3D picture of a clay tablet, offering a transliteration, a transcription, and a translation for anyone wishing to read the source. Despite recent progress, we still need some technological development to realize this goal.

Scholars have been transcribing Akkadian for almost 200 years, so we have a good deal of material to start from, but many texts in museum collections remain untranscribed, and a large part of the tablets remain unexcavated. Over the last 30 years, many of the texts have been made openly available in various databases. The foremost among them is Oracc as it is the only corpus in which a large portion of the data has been richly linguistically annotated.

The two most sizable datasets we are currently working with in Helsinki are Oracc and PNA. Oracc brings together the results of several projects publishing online editions of cuneiform texts and provides rich metadata for words and texts in the database. The Oracc metadata includes genre, provenance, and period for each text. For a majority of the texts, Oracc also provides the transliteration, transcription, lemma, part-of-speech, and translation of each word. This metadata already allows automated lexicographical context analysis. PNA comprises 8,000 different names of more than 25,000 individuals with references to the written sources of the Neo-Assyrian period (ca. 934-612 BCE), many of which are available in Oracc.

2.2 Name clusters

To identify a group, we need to identify its individuals. In Language Technology, this is called named-entity recognition (NER) including among others person, place, and organization name recognition. Having identified the individuals or named entities, we can look at which ones appear together and how often. To accurately identify all co-occurrences of individuals in sentences, we also need pronoun resolution.

Examples 1–3 are three sentences from two royal inscriptions of Tiglath-pileser III (Q003414 and Q003420). They tell us how he went on a conquest against the Kings Mitāki and Bātānu. He did not do so at his own behest, of course, but on behalf of the god Aššur. Below, we use mark-up customary in Language Technology: person names and pronouns are marked with PER, place names with LOC, and the name of a deity with MYTH. For the pronouns, we have also indicated to which person they refer. The three tags correspond to PN (Personal Name), GN (Geographical Name), and DN (Divine Name), respectively, which are the abbreviations customarily used in Assyriology.

Example inscriptions:

  1. “At the beginning of my[PER: Tiglath-pileser III] reign, in my[PER: Tiglath-pileser III] first palû, in the fifth month after I[PER: Tiglath-pileser III] sat in greatness on the throne of kingship, [MYTH: the god]Aššur, my lord, encouraged me[PER: Tiglath-pileser III] and I[PER: Tiglath-pileser III] marched against … [the Aramean tribes] …” (Tiglath-pileser III 04 in http://oracc.museum.upenn.edu/rinap/rinap1/Q003414/, accessed 11 May 2023).

  2. “[PER]Mitāki … entered [LOC: the city]Uršanika. I[PER: Tiglath-pileser III] captured [LOC: the city]Uršanika and [LOC: the city]Kianpal, and … him[PER: Mitāki], his[PER: Mitāki] wife, his[PER: Mitāki] sons, his[PER: Mitāki] daughters, …” (Tiglath-pileser III 07 in http://oracc.museum.upenn.edu/rinap/rinap1/Q003420, accessed 11 May 2023).

  3. “[PER]Bātānu of the [LOC: the land]Bīt-Kapsi …, submitted, and became my[PER: Tiglath-pileser III] vassal so that his[PER: Bātānu] district would not be dispersed. I[PER: Tiglath-pileser III] left him[PER: Bātānu] [LOC: the city]Karkariḫundir. …” (Tiglath-pileser III 07 in http://oracc.museum.upenn.edu/rinap/rinap1/Q003420, accessed 11 May 2023).

In the examples, some annotations are prefixes while some are suffixes. The distinction between the two is that the Akkadian scribes already provided the prefixed annotations in the original text, whereas the suffixed annotations are added based on the interpretation. The original scribal convention using determinatives helps significantly with the NER. A recognized name should ideally be disambiguated (Anderson 2018) before extracting it into an adjacency matrix to let the Gephi software (Bastian, Heymann, and Jacomy 2009) calculate and visualize the clustering of co-occurring names.
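As a rough illustration of how name mark-up can feed a co-occurrence network, the following Python sketch extracts the prefixed names from simplified versions of Examples 2 and 3 and counts which names share a sentence. The regular expression, the function names, and the simplified sentences are assumptions made for this illustration and do not reproduce the pipeline behind the portal; pronoun resolution and name disambiguation are left out.

```python
import re
from collections import Counter
from itertools import combinations

# Simplified versions of Examples 2 and 3 (assumption: only the
# prefixed, scribal annotations are kept; pronouns are ignored).
SENTENCES = [
    "[PER]Mitāki entered [LOC: the city]Uršanika.",
    "I captured [LOC: the city]Uršanika and [LOC: the city]Kianpal.",
    "[PER]Bātānu of [LOC: the land]Bīt-Kapsi submitted.",
]

# A tag such as [PER] or [LOC: the city], followed directly by the name.
TAGGED_NAME = re.compile(r"\[(PER|LOC|MYTH)(?::[^\]]*)?\]([^\s.,;]+)")


def names_in(sentence):
    """Return the set of (tag, name) pairs marked up in one sentence."""
    return set(TAGGED_NAME.findall(sentence))


def cooccurrence_counts(sentences):
    """Count how often two names appear in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        names = sorted(name for _, name in names_in(sentence))
        for pair in combinations(names, 2):
            counts[pair] += 1
    return counts


edges = cooccurrence_counts(SENTENCES)
for (source, target), weight in sorted(edges.items()):
    # One weighted, undirected edge per line.
    print(f"{source}\t{target}\t{weight}")
```

Each printed line is one weighted edge of an undirected network; the same counts could equally be written out as an adjacency matrix before visualization.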

Even without using any particular linguistic processing or name disambiguation, we can, for example, look at the names of deities in Neo-Assyrian texts. The deity names are tagged as divine names in the Oracc metadata. We still needed to standardize the spelling of the deity names across different Oracc projects. Deities are invoked for various purposes in the same document, so we studied the names of deities in a context window (Alstola et al. 2019). With this surface approach, we were still able to show that although Aššur was the top god in the Assyrian pantheon, he was not in a central position from the perspective of SNA based on the sources available in Oracc in 2018. The available sources mention him so frequently among the other deities that additional data is unlikely to significantly change the picture. Instead, the network analysis corroborates the view that the pantheon was much older and already fully formed when the Assyrians gained prominence; they merely placed their god Aššur in front and on top of the pantheon.

2.3 Context annotation, event analysis and visualizing name clusters

Letters and legal documents in Akkadian often describe a single event, for example, deeds that document information sharing, money lending, or property acquisitions with witnesses present. Such events have actors in predefined roles. The same actors in several events connect the events into two-mode affiliation networks, which serve as the basis for further analysis of the social network structure formed by the individuals attending the events (Newman 2018: 60–2, 115–8). One can simplify such two-mode networks based on the properties of the actors to study the direct flow of information, money, or goods through directed one-mode social networks. The mutual connections formed by the individuals present at events reinforce their belonging to the same social group in an undirected fashion. That they are invited, present, and mentioned in predefined roles is a sign that they belong not only to a particular event but also to one or more social groups.
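The step from a two-mode affiliation network to an undirected one-mode network of persons can be sketched in a few lines of Python. The event names and person labels below are hypothetical placeholders, and the projection shown (one tie per shared event) is only one of several possible weightings.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical events (documents) and the persons attested in them.
EVENTS = {
    "loan_1": {"A", "B", "C"},
    "sale_1": {"B", "C"},
    "letter_1": {"C", "D"},
}


def project_to_persons(events):
    """Project an event-person affiliation network onto persons.

    Two persons receive one unit of tie weight for every event
    in which both of them are attested.
    """
    weights = defaultdict(int)
    for participants in events.values():
        for a, b in combinations(sorted(participants), 2):
            weights[(a, b)] += 1
    return dict(weights)


ties = project_to_persons(EVENTS)
for (a, b), weight in sorted(ties.items()):
    print(a, b, weight)
```

In this toy data, B and C share two events and therefore receive the strongest tie, which is the kind of reinforcement of group membership described above.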

In general, language technological tools for extracting events or frames require some syntactic processing of the data for good performance. The Akkadian language has a rich inflectional morphology indicating word functions and a writing system indicating person and place names. These features of Akkadian allow actors and their activities to be extracted from the Akkadian data using hfst-pmatch (Hardwick, Silfverberg, and Lindén 2015). Hfst-pmatch is a tool for matching regular expressions on linguistically rich data.

As a starting point for the event analysis, and to demonstrate its usefulness before embarking on dedicated tool development, we have converted the PNA data into digital format. We provide interactive access to it through the ANEE portal as two-mode and one-mode networks. The network can also be downloaded as a Gephi project and further processed into directed or undirected social networks. For access to the dataset and details on how the PNA was processed into a Gephi project, see Jauhiainen and Alstola (2022). Researchers and developers of derived networks may publish4 their Gephi projects through the ANEE portal hosted by the Language Bank of Finland.5

2.4 Language processing of Akkadian

When looking at networks of other words than names, it is necessary to lemmatize the words because in most cases we do not have enough data to make relevant observations directly on the surface forms. The analysis using Korp and Gephi is already possible based on the manually transcribed data available in Oracc. However, a substantial portion of digitized Akkadian documents in Oracc has only been transliterated. As we wanted to dig deeper into all the Akkadian texts, we needed to create full-fledged language processing tools for Akkadian. For this purpose, we used the morphological analyzer for Akkadian developed by Sahala (2020). In addition, we have annotated parts of the RIAO treebank for Akkadian (Luukko et al. 2020).

Using the morphological analyzer and the treebank, we have created a lemmatizer for Akkadian. The treebank was published in November 2020 and is currently available on the Universal Dependencies web page.6 The lemmatizer also selects the part-of-speech in context, and as mentioned previously, we get most of the Akkadian named entities for free. In the processing pipeline, we first process all the transliterated Oracc data to add lemmas to the untranscribed data with our tools for Akkadian,7 and upload the processed data into Korp. For an overview of tools for processing Akkadian, see Sahala (2021).

2.5 Context analysis and semantic domains

A traditional way of doing context analysis in language technology is to use pointwise mutual information (PMI)8 to identify words that co-occur surprisingly often. We call this syntagmatic context analysis because it counts co-occurrences within a context window, for example, “capture” and “Kianpal” in Example 4. How much more often two words co-occur than chance would predict serves as an indication of how likely they are to be neighbors in the same semantic domain.
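A minimal sketch of window-based PMI, assuming the common definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))), may help to make the idea concrete. The toy corpus of English glosses, the window size, and the simple probability estimates are assumptions chosen for illustration; they are not the settings used for the portal.

```python
import math
from collections import Counter

# Hypothetical lemmatized sentences (English glosses stand in for lemmas).
CORPUS = [
    ["king", "capture", "city"],
    ["king", "capture", "city"],
    ["king", "build", "temple"],
    ["god", "love", "temple"],
]


def pmi_scores(sentences, window=1):
    """Compute PMI for unordered word pairs within a context window."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, left in enumerate(tokens):
            # Pair each word with the next `window` words to its right.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    pair_counts[tuple(sorted((left, right)))] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (x, y), n_xy in pair_counts.items():
        # Simplification: pair and word probabilities are estimated
        # from separate totals.
        p_xy = n_xy / total_pairs
        p_x = word_counts[x] / total_words
        p_y = word_counts[y] / total_words
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


scores = pmi_scores(CORPUS)
for pair, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(value, 2))
```

Note that the pair seen only once between two rare words ("god", "love") scores highest in this toy corpus, which illustrates why raw PMI has to be treated with care on small datasets.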

In recent years, it has become customary to represent a keyword by its word contexts, where each word in all the contexts of the keyword is given a weight. For each keyword, this is represented as a list of all context words with their weight. Mathematically, this is a context vector. The similarity of two words is calculated by the similarity of their context vectors, which has become the dominant paradigm for lexical context analysis on large textual data collections. With this representation, we can theoretically find similarities between words appearing in similar contexts, for example, “besiege” and “conquer” in Examples 5–7. Although the words never appear next to each other, they have friends in common. We call this a paradigmatic context. Examples 4 and 5–7 illustrate the differences between the two types of contexts introduced by Svärd et al. (2018).

For calculating context vector similarity, we primarily used fastText (Bojanowski et al. 2017) although Aleksi Sahala later developed software for calculating PMI embeddings,9 that is, context vectors using PMI weights. For an introduction to the data preprocessing and parameter selections for fastText, see Jauhiainen and Alstola (2023).

Syntagmatic context example:

(4) I[nsubj: Tiglath-pileser] capture[past] [the city]Uršanika[obj] and [the city]Kianpal[obj]

Paradigmatic context example:

(5) I[nsubj: Tiglath-pileser] besiege[past] [the city]Uršanika[obj] and [the city]Kianpal[obj]

(6) I[nsubj: Tiglath-pileser] conquer[past] [the city]Uršanika[obj] and [the city]Kianpal[obj]

(7) I[nsubj: Tiglath-pileser] capture[past] [the city]Uršanika[obj] and [the city]Kianpal[obj]
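The paradigmatic examples can be turned into a small Python sketch of context vectors and their cosine similarity. This is a count-based illustration only: fastText learns dense vectors from subword information rather than from raw counts, and the fourth sentence, the window size, and the function names are assumptions added for contrast.

```python
import math
from collections import Counter, defaultdict

# Examples 5-7 as token lists, plus one hypothetical contrast sentence.
SENTENCES = [
    ["I", "besiege", "Uršanika", "and", "Kianpal"],
    ["I", "conquer", "Uršanika", "and", "Kianpal"],
    ["I", "capture", "Uršanika", "and", "Kianpal"],
    ["I", "build", "a", "temple"],
]


def context_vectors(sentences, window=2):
    """Represent each word by the counts of the words around it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            context = tokens[start:i] + tokens[i + 1 : i + 1 + window]
            vectors[word].update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = context_vectors(SENTENCES)
similar = cosine(vectors["besiege"], vectors["conquer"])
dissimilar = cosine(vectors["besiege"], vectors["build"])
print(round(similar, 2), round(dissimilar, 2))
```

Here “besiege” and “conquer” never occur in each other's context, yet their context vectors coincide, whereas “besiege” and “build” share only the subject pronoun.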

The syntagmatic similarity is related to the paradigmatic similarity. The syntagmatic similarity measures the expectation that a particular word appears in the vicinity of another word. The paradigmatic similarity compares the similarity of the word contexts, ignoring syntactic structure. If the words often appear in each other’s vicinity, they will be both syntagmatically and paradigmatically similar. For a low-frequency word, the syntagmatic similarity can only yield words from the few local contexts in which the word appears. However, a high-frequency context word indirectly connects a low-frequency word to many other words in paradigmatic contexts. As paradigmatic similarity compares contexts and not only individual co-occurring words, it requires more data than syntagmatic similarity to yield relevant results.

We initially looked at the group identity of individuals identified by their proper names and their document context. Akkadian proper names usually do not have inflectional endings, so they are easier to deal with from a language processing perspective than regular words. However, regular nouns, verbs, or adjectives have a rather complicated system of inflections in Akkadian, which requires more elaborate language processing (Sahala 2021) before identifying their group identity. When clustering the lemmas of regular words, one usually talks about discovering their semantic domains.

We created semantic domains using both syntagmatic (PMI) and paradigmatic (fastText) similarity. We visualized the word similarities with Gephi and uploaded the files to the ANEE lexical portal, where they are available for searching by Akkadian words or by English sense words.

2.6 Visualizing word clusters and word contexts

In the ANEE portal, word clusters are visualized by tying each word to other words based on the strength of their context similarity. We also provide links to the concordances of word instances in their sentence contexts in Korp with further links to the original texts in Oracc and pictures in CDLI. In Fig. 3, we show our lexical portal based on the Gephi software, in which one can search for an English word and its translations into Akkadian or vice versa. The links to Korp and the document context are provided for names as well as regular words in their semantic domains.

Translation of the English word “to fear” into Akkadian.
Figure 3

In Fig. 4, we can look closer at the ego graph of an Akkadian lexeme, palāhu (eng. to fear). The figure is a close-up of the most significantly co-occurring words of palāhu in the Oracc database. We have created a portal with the Gephi web interface10 where one can click on a word and see its most significant relations to all other words in the Oracc database. We created two separate views in the portal11: one based on PMI and the other on fastText (Jauhiainen et al. 2021). One can navigate through this web of words by clicking on the lexemes in the portal or the sidebar.

Ego graph of the Akkadian word palāhu (eng. to fear).
Figure 4

As can be seen in the sidebar in Fig. 3, there is also a link to the online search tool Korp, which gives a concordance of the sentences in which the word occurs in the original texts. When clicking on “Search in Korp,” you are automatically taken to the Korp search interface shown in Fig. 5 with the concordance for palāhu. We see that palāhu has several inflected forms, and all the instances are listed in the concordance. In the Korp sidebar, you also see the metadata of the selected sentence.

Korp search interface displaying a concordance of the Akkadian word palāhu (eng. to fear).
Figure 5

In addition, the findings can be visualized on a map in Korp that shows the source locations of the texts. Fig. 6 visualizes the geographical spread of some of the synonyms of fear: palāhu, puluhtu, adīru, and adirtu. The map is interactive, that is, it can be enlarged and words can be hidden to reduce clutter. You can also go from the map to a concordance of the instances represented by the circles on the map.

Distribution of texts mentioning synonyms of fear: palāhu, puluhtu, adīru, and adirtu.
Figure 6

From the sidebar in the Korp search interface, you can go further to the original text in the Oracc database, as shown in Fig. 7. From there, you can in many cases link to a picture of the original clay tablet in CDLI.

Displaying an instance of the Akkadian word palāhu (eng. to fear) in the original text in Oracc.
Figure 7

2.7 Some observations on our methods

From a computational point of view, Akkadian datasets are still rather small, and the data is repetitive by nature, as the writing is often formulaic. To remedy this, we introduced a method for downscaling the impact of the repetitiveness on PMI (Sahala et al. 2020), which we have used to improve the relevance of the word clusters and the semantic domains they represent. The processing method has been published on GitHub.12

Our second observation is that co-occurrence (PMI) and context vector (fastText) similarity should, by definition, give very different results, but in practice, the results are often relatively similar. FastText just seems to be noisier than PMI. We believe this to be a consequence of having a relatively small dataset for fastText, and we cannot afford to discard words with a frequency below five. Low-frequency words tend to become very significant in the context vectors. Unrelated words having a rare word in common in their context will seem to be similar.

The third observation is that since we have relatively small datasets with only a few million words, we can still compute a full co-occurrence matrix and replace the word counts in it with PMI values. In addition, we can use the downscaling proposed by Sahala et al. (2020). On the full co-occurrence matrix, we can then use singular value decomposition for dimensionality reduction. As a consequence, the vector clustering will be much cleaner. However, we still have the problem that vector clustering introduces spurious similarities.
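The steps in this observation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the group's published pipeline: the function names, the window size, the toy sentences, and the clipping of negative PMI values to zero are all choices made for the example, and the downscaling of repetitive contexts is not included.

```python
import math
from collections import Counter

import numpy as np


def cooccurrence_counts(sentences, window=2):
    """Count words and symmetric word pairs within `window` tokens of each other."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                pair_counts[(word, other)] += 1
                pair_counts[(other, word)] += 1
    return word_counts, pair_counts


def ppmi_matrix(sentences, window=2):
    """Build a full word-by-word matrix whose cells hold positive PMI values."""
    word_counts, pair_counts = cooccurrence_counts(sentences, window)
    vocab = sorted(word_counts)
    index = {word: i for i, word in enumerate(vocab)}
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    matrix = np.zeros((len(vocab), len(vocab)))
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        # Clipping negative values to zero is an assumption of this sketch.
        matrix[index[a], index[b]] = max(math.log2(p_ab / (p_a * p_b)), 0.0)
    return vocab, matrix


def reduce_dimensions(matrix, k):
    """Keep the k strongest singular directions of the matrix."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


# Toy lemmatized "sentences" (invented for illustration only).
sentences = [
    ["adirtu", "gilittu", "pirittu"],
    ["gilittu", "pirittu", "šarru"],
    ["šarru", "kussû"],
]
vocab, matrix = ppmi_matrix(sentences, window=2)
vectors = reduce_dimensions(matrix, k=2)
print(vocab)
print(vectors.shape)
```

Each row of `vectors` is a reduced representation of one word, which could then be passed to a clustering step. With a corpus of a few million words the dense matrix may need a sparse representation instead.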

A fourth observation is that syntagmatic methods like PMI are already able to pick up words with synonymous or very similar meanings due to natural language using words with similar meaning in coordination constructions. For example, the words adirtu, gilittu, and pirittu all mean “fear” or “terror” and are regularly attested together (Svärd et al. 2021; Alstola & Svärd 2024). This may in some cases reduce the need to use noisier paradigmatic methods.

3. Using the ANEE lexical portal for studying social groups

In the following, we look at social groups from a linguistic, sociological, and archaeological perspective. We also wish to show how one can investigate semantic domains, social group identities, and related artifacts with the same language processing and network analysis tools.

3.1 Conceptual networks or semantic domains

We first discuss the possibilities offered by semantic domains. In our published work, semantic domains have been used to study the deities of the era. The supreme elite in the Ancient Near East was the pantheon. It served to confirm and support the ruling elite. We studied this by analyzing the semantic domains of Aššur (Alstola et al. 2019) using data from Oracc, which as mentioned above indicated that Aššur was a newcomer introduced in the pantheon with the expansion of the Assyrian Empire.

Another case study that used the semantic domains approach focused on terms for objects that were part of the royal insignia in Mesopotamia. We asked two questions: firstly, whether these terms appear as a unified “set” in the ANEE Lexical Portal, which may indicate their use as a fixed set of objects in specific contexts, and secondly, if the respective terms showed strong links to the word šarrūtu (kingship).

The answer to the first question was negative: the words used to denote the regalia do not appear as a “set” within the ANEE Lexical Portal. Most of these terms are not connected in the networks or are part of separate clusters. The lack of connection in the PMI network shows that the words for regalia never or seldom occur as collocates in Akkadian texts; their separation in the fastText network indicates that the terms have different textual contexts. In other words, Akkadian texts rarely or never talk about all the royal insignia as a set. When they mention an individual piece, they do so in widely disparate contexts.

Still, strong connections could be identified between some terms, indicating subsets, for example, between ḫaṭṭu (scepter) and agû (crown). The scepter and the crown appear together relatively often in texts, creating a tie between the two lexemes in the syntagmatic network. Moreover, also in the paradigmatic network, both nodes are linked, indicating that they have similar contexts. Although scholars had previously been aware of this strong tie between the two objects, the ANEE Lexical Portal demonstrated the measurable strength of this connection.

The connection between three words that indicate scepters or staffs (i.e., ḫaṭṭu, šibirru, and ušparu) was even tighter. The problem here is that the Portal offers no help in understanding whether these words are synonyms referring to the same object or not, as their respective semantic domains do not include terms relating to the objects’ range of use or appearance. The lack of usage information is due to both šibirru and ušparu having relatively few occurrences in the underlying corpus. The inclusion of more textual data would shed more light on the matter.

When answering the second question, we note that the words had diverging levels of connection to the word for “kingship” (šarrūtu) in the respective networks. First, for words denoting weapons, such as kakku (weapon) or qaštu (bow), there was no visible link to kingship or to other insignia that are not weapons. Second, there were terms whose semantic field indicated the objects’ allegorical use, as was the case for agû (crown) and ḫaṭṭu (scepter). Finally, there was the exceptional case of kussû (throne), which has a strong tie to the institution of kingship, not only sharing a tie with it but also having significant semantic overlap with šarrūtu (kingship). The findings in the Portal suggest that only a few objects held the inherent meaning of status symbol relating to kingship, whereas others were more relevant in specific settings.

The nature of the ANEE Lexical Portal set some limitations to the study. The results are diachronically flat13, as we cannot yet study chronological changes in the Portal although the chronological period can be used as a search criterion in Korp. In addition, the results are also somewhat skewed due to the underlying dataset14 and strongly influenced by the corpus of Neo-Assyrian royal inscriptions and archives. The lack of data also meant that there were many terms we could not study because they did not occur in the Portal, as they were not represented (often enough) in the underlying dataset. Another matter relates to only investigating Akkadian terms; it would be fruitful and relevant to also include Sumerian words. Currently, there are plans to import Sumerian texts during 2022–5 into Korp and the ANEE Lexical Portal.

These two case studies show that connections between named entities (Assyrian deities) or concrete items (king’s regalia) can benefit from the semantic domains approach. The case studies also show the limitations of the method, of which the historian must be aware to make full use of the approach.

3.2 Social groups or elites

In the PNA network, individuals tend to form clusters, and some individuals are more central than others. In addition, some clusters are more central than others. We can quantify this with various centrality measures such as degree (how many “friends” an individual has), betweenness (how often an individual lies on the shortest paths between other individuals, a proxy for influence), and eccentricity (how far away the most distant individual is). Eccentricity seems to be a promising measure for distinguishing the elite from the marginal groups. In addition, k-core is a centrality measure used by Larsen and Ellersgaard (2017).

An individual belongs to a k-core when connected to at least k other individuals who themselves belong to that k-core. By gradually removing all individuals with lower k-core values, we can arrive at an elite of highly interconnected individuals. A preliminary study based on the k-core centrality without an additional weighting scheme or the additional parameters introduced by Larsen and Ellersgaard did not yield interesting results. However, we have found that by introducing a relatively simple weighting scheme that imposes as little bias on the dataset as possible while reflecting the event size, a weighted version of k-core centrality can aid us in identifying members of the elite in co-attestation networks based on the PNA and starting from the data (Bennett et al., 2024).
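The peeling procedure behind the k-core can be illustrated with a short sketch. It covers only the plain, unweighted k-core; the weighting scheme of Bennett et al. (2024) and the additional parameters of Larsen and Ellersgaard are not reproduced here. The function names and the toy co-attestation pairs are hypothetical.

```python
def build_adjacency(pairs):
    """Turn undirected co-attestation pairs into a symmetric adjacency map."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def core_numbers(adjacency):
    """Assign each node the largest k for which it survives in the k-core."""
    remaining = {node: set(neighbours) for node, neighbours in adjacency.items()}
    core = {}
    k = 0
    while remaining:
        # Nodes that cannot belong to the (k+1)-core are peeled off at level k.
        low = [node for node, neighbours in remaining.items() if len(neighbours) <= k]
        if not low:
            k += 1
            continue
        for node in low:
            core[node] = k
            for neighbour in remaining.pop(node):
                remaining[neighbour].discard(node)
    return core


def k_core_members(adjacency, k):
    """Return the individuals that remain once all lower cores are removed."""
    return sorted(n for n, c in core_numbers(adjacency).items() if c >= k)


# Toy network (invented): a tightly knit triangle plus one loosely attached person.
pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
adjacency = build_adjacency(pairs)
print(core_numbers(adjacency))
print(k_core_members(adjacency, 2))
```

In the toy network, the triangle forms the 2-core, while the person attested with only one other individual drops out when the lower core is peeled away.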

Another way to study social group dynamics, and even identify social groups in a social network, is to select a group of people sharing a characteristic and analyze whether the identified members occupy a different position than the other people in the network. Their positions may indicate a social boundary between the group and the rest of the network. In particular, we are interested in identifying elites and marginal groups in social networks. We have tested different techniques to divide a network into the core (or center) and periphery and compare the composition of these subgraphs to see if their social structure is different. As a case study, we investigated how “Arameans”—people designated as Arameans and people with Aramaic names—are distributed in the Neo-Assyrian prosopographical network. As the social location of these people has been studied with traditional historical methods using the same dataset (Nissinen 2014), we were able to compare our results with previous work on the subject.

To study the social location of Arameans, we divided the largest component (12,497 actors) of our PNA network into three parts using the eccentricity measure. The eccentricity of an actor in a network equals the maximum geodesic distance between the actor and any other actor in the network (Hage and Harary 1995). In our network, the eccentricity scores are between 9 and 18. Based on these scores, we divided our network into three parts: the core (actors with eccentricity scores of 9–10; 356 actors), middle ground (11–13; 10,051), and periphery (14–18; 2,090). This division has the assumption that more actors should belong to the middle ground than to the core or periphery. This division seems to capture at least some aspects of the social hierarchy of the Neo-Assyrian Empire, as the four great Sargonid kings of Assyria are at the core.
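As a rough illustration of this division, the sketch below computes eccentricity with a breadth-first search and assigns each actor to one of the three parts. The default thresholds (core up to 10, middle ground up to 13) follow the score ranges given above; the function names and the toy graph are assumptions made for the example, and the code presumes a connected network such as the largest component.

```python
from collections import deque


def eccentricity(adjacency, source):
    """Maximum geodesic distance from `source` to any other actor."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return max(distances.values())


def partition_by_eccentricity(adjacency, core_max=10, middle_max=13):
    """Split actors into core, middle ground, and periphery by eccentricity."""
    groups = {"core": [], "middle ground": [], "periphery": []}
    for node in adjacency:
        score = eccentricity(adjacency, node)
        if score <= core_max:
            groups["core"].append(node)
        elif score <= middle_max:
            groups["middle ground"].append(node)
        else:
            groups["periphery"].append(node)
    return groups


# Toy path-shaped network (invented); thresholds are lowered to fit its size.
toy = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(partition_by_eccentricity(toy, core_max=2, middle_max=3))
```

Running one breadth-first search per actor is affordable for a network of roughly 12,000 actors, although a graph library would usually be used in practice.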

If Arameans were attested in all social spheres of the Neo-Assyrian Empire, their relative distribution in the core, middle ground, and periphery should be similar to the overall distribution of actors in these three parts of the network. However, we found that the distribution is not equal: compared with their overall share of the network (4.46 per cent), Arameans are attested less often in the core (3.09 per cent) and middle ground (3.99 per cent) but more often in the periphery (6.94 per cent). This very slightly skewed distribution suggests that not all social spheres in Neo-Assyrian society were equally open to Arameans. The result nuances Nissinen’s (2014: 296) view that Arameans “could be found at all levels of Assyrian society.” They certainly could, but they were, nevertheless, statistically under-represented in the highest echelons of the empire.
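The comparison of observed and expected shares can be made explicit with a small calculation. The group sizes are those reported above; the Aramean counts per group are not given in the text and are an assumption of this sketch, back-calculated from the reported percentages and rounded to whole actors.

```python
# Group sizes as reported for the largest component of the PNA network.
group_sizes = {"core": 356, "middle ground": 10_051, "periphery": 2_090}

# ASSUMPTION: counts back-calculated from the reported percentages
# (3.09, 3.99, and 6.94 per cent); they are not stated in the article.
aramean_counts = {"core": 11, "middle ground": 401, "periphery": 145}

total_actors = sum(group_sizes.values())
overall_share = sum(aramean_counts.values()) / total_actors

print(f"overall share: {100 * overall_share:.2f} per cent")
for group, size in group_sizes.items():
    observed = aramean_counts[group]
    # Expected count if Arameans were spread evenly over the three parts.
    expected = size * overall_share
    share = 100 * observed / size
    print(f"{group}: observed {observed}, expected {expected:.1f}, share {share:.2f} per cent")
```

A formal significance test would be the natural next step, but it would need the exact counts rather than these reconstructed ones.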

More generally, if we assume that 3–4 per cent of the actors belong to the core elite, and approx. 15 per cent to the periphery, we can study the individuals of the three groups based on their eccentricity score. For further analysis, we still need to see how well the eccentricity division correlates with what a qualitative study of elite vs. periphery says about a random sample of individuals from each group. The time-saving aspect is that instead of going through all the 12,000 individuals one by one, we can now look at the characteristics of the 350 most prominent individuals with the lowest eccentricity score. By contrasting the properties of the core group with the properties of the other groups, we are likely to find some defining characteristics. A problem we observed with the eccentricity score is that it is a rather blunt tool for separating the core elite from the periphery.

We also present a more comprehensive study of Egyptians active in Assyria in the late 8th and 7th c. BCE based on the Korp data available in 2020.15 The key difference from the Aramean case study is that three additional levels of complexity will be addressed: (1) the fact that in the period in question political and geographical Egypt differ from each other, (2) that this had been a recurring phenomenon for centuries by this time, and (3) that the official and the individual perception of one’s origin, status, and agency may be at variance.

Overall, it has to be acknowledged that the operability of Oracc, Korp, PNA, and the ANEE Lexical Portal for the topic in question is currently rather limited. Each one allows access only to a fraction of the available data, thus heavily multiplying the amount of effort for extracting the information plus enforcing an additional time-consuming check of whether the extracted information is representative for the currently easily available information and the principally available information. For the paper at hand, the question is whether this is largely due to the historiographical approach of Assyriology and Ancient Near Eastern Studies (ANE studies; separation of philological and archaeological data; primary focus on the ruling strata of society/ies), due to the preliminary stage of feeding in the available information (inclusion of DH approach in ANE studies) or also due to the design of the tools (DH and crossover to ANE studies).

The Korp version of Oracc (as provided by the Finnish Language Bank) is designed to allow simple and complicated searches across the various sub-projects. The primary search, compilation, and download functions work effectively. Exceedingly helpful for socio-historical questions is the additional metadata, for example, on the provenance of the objects, which can be viewed, searched, and downloaded in bulk for further analysis.

Regarding Egyptians in Assur, substantially more data than provided across the different digital platforms tend to be available within a single print volume (e.g. Faist 2007), making it much easier to find the needed information and to appreciate its complexity than the digital tools currently allow. However, the socio-historically challenging questions sketched out above require getting beyond what is published in the print volumes, that is, the basic archaeological and philological data compilation. This is where the immense additional potential of the digital tools presented in this paper comes into play. In Korp, the combination of meta-level information across the various Oracc projects invites complex socio-historical searches, though, obviously, only to the degree that the basic data have been collected. Thus, on the level of methodological potential, Korp allows one to search, for example, for all explicitly denoted “Egyptians” recorded in the different (imperial and/or private) archives and stray finds from the urban center of Assur. In addition, semantic and syntactic comparison (e.g., via the tools for syntagmatic and paradigmatic analysis introduced above, PMI and fastText) should make it possible to indicate whether “Egyptian” is likely to be associated with a political, a geographical, or a cultural entity, and whether the perception is that of the scribe, of one of the principal parties of the contract, deed, etc., or of the “Egyptian” themself. In practice, however, this is not yet the case. For our “Egyptians,” this currently results in a discrepancy: sixty-one matches for “Egypt or Egyptian” in the sub-project ATAE (Archival Texts of the Assyrian Empire; still partially un-lemmatized, and marked as unfinished and not fully exportable from Oracc), including ten from Assur, versus only three matches exported to Korp, none of which is from Assur. The majority of the currently known explicitly denoted “Egyptians” in Assyria are by now included in Oracc (cf. the list in Wasmuth 2016: 105–12), but not yet exportable to Korp and its search and analysis tools.16

Another challenge to be solved is thus what is meant by “Egypt” in the Assyrian sources from the 7th c. BCE. Interestingly, the royal inscriptions of King Esarhaddon, whose army managed to conquer parts of Lower Egypt and to reach the capital of Memphis at the traditional administrative border of Upper and Lower Egypt, reflect a perception of three entities: “Egypt,” “Upper Egypt,” and “Kush” (see Fig. 8). While “Egypt” (Akk. miṣir) does not match local terms (especially kmt “the black land,” t3-mrj “the beloved land,” t3.wj “the two lands [i.e., t3-rsj + t3-mḥw],” jdbwj “the two river banks”), “Upper Egypt” (Akk. paturisu) goes back to Eg. (p3) t3-rsi “the south land” and “Kush” to Eg. k3š.

Upper Egypt in Oracc in Korp.
Figure 8

The socio-historically relevant questions here are whether the textual sources allow us to assess if the terms used reflect administrative, political, or geographical entities; to what degree they match ancient realities at the time; and to what degree potential results from the royal inscriptions concerning royal actions within geographical Egypt during the first half of the 7th c. BCE are also indicative for private archival sources from the second half of the 7th c. BCE. For the contribution at hand, the further question is how the digital tools can help us out of the dilemma. Here, too, the current answer is that they cannot, not because the tools are deficient for assessing the questions, but because the data available for processing via the tools is still too limited. A huge advantage of Korp over Oracc is the systematic and stable search and filter function across different genres, and the relatively easy implementation of complex search and filter parameters. These include not only the syntactic and semantic information from within the individual ancient text document, but also metadata like the archaeological (especially the archival) context, and the separation of the base form and the sense of translation for each word.

A key characteristic of the ANEE Lexical Portal and the ANEE PNA tool is that they inherently aim at standards, not the unusual. Therefore, the research questions need to match the research tools. The set-up of the Lexical Portal and ANEE PNA is oriented toward macro patterns, that is, elements that are (or have become) standard at a certain point in time or within a specific text genre, etc. Furthermore, the research question needs to be oriented toward a large number of occurrences (large at least from the perspective of a historian doing qualitative work). The digital tools introduced in this article are at their best when, for example, one is comparing 10–30 different lexemes, each occurring 50–1000 times in the corpus, or when one needs a graph of which genres of text have the most occurrences of a single lexeme that occurs 3000 times (Svärd et al. 2018; Svärd et al. 2021). Such research questions would require a very long time to answer with traditional methods, yet can be easily answered with the digital methods presented in this article. If we, for example, had wanted to investigate the currently more than 700 mentions of Egypt and Egyptians in the Oracc in Korp dataset, the Korp search tool could have provided metadata, characteristic co-occurrences, map visualizations, and frequency summaries. However, for the case focusing on the Egyptians in Assur with its characteristically limited direct textual person identifiers, and in general for studies searching for more vague preliminary indications of social change and hidden social complexities, a more traditional philological (plus stratigraphic plus visual-material) approach currently still proves more effective (Wasmuth, M., forthcoming; Wasmuth and Debourse, in press).

3.3 Archaeological studies

From the ANEE Lexical Portal, it is possible to download and customize the network of events by extracting people who trade or possess particular material items. These individuals form networks dealing with material artifacts or goods. Visualizing the locations of sources mentioning these individuals and the locations mentioned in such sources on a map in the Korp corpus server may also highlight important trade or supply routes. Furthermore, investigating the link between these materials (e.g. woods, precious stones, minerals, exotic foods) and people may also provide information on who belongs to the elite, their relationship with central power, how the networks were structured (e.g. trades, roads), and how the consumption occurred.

Paralleling textual artifact locations with archaeologically identified geographic artifact clusters or distribution patterns can provide historical contextualization for intercommunal material activities, and vice versa, paralleling archaeological context with textual mentions of such events, may potentially identify user groups of archaeological artifacts. Even mundane artifacts can be symbols of group identity when they are manufactured and exchanged under elite control. Thus, correlating archaeological and textual data can confirm interpretive models of both datasets.

In archaeology, scientific sourcing can link geochemical or isotopic signatures of artifacts to specific source areas and reveal intergroup goods transport, yet it offers limited insight into the reasons for exchange. Ancient textual evidence mentioning, for instance, economic activities, market places, exchanged products, travel, conflicts, diet, and religious activities can explain material culture characteristics and variation (Holmqvist 2019: 125–32; Lorenzon and Wallis, 2023). Events affecting elite-controlled factors such as availability of resources, supply or distribution networks, and consumer preferences can initiate changes in artifact technologies that are recognizable in the archaeological artifact data. Thus, combining different approaches can yield novel insights.

As a brief example, we can compare the distribution of texts mentioning ivory (šinnu, literally “tooth”) and texts mentioning silver (kaspu). Ivory was a very specific imported luxury material in Mesopotamia, whereas silver was used as currency all over the Empire, at least in wealthy circles of the society. The distribution map (Fig. 9) shows the difference in the distribution of these two words in the texts and the result is what we would expect. Silver is written about all over the Empire (red) whereas the word ivory (green) appears in the major urban centers only. This study could be expanded by displaying the location of all the period-specific material artifacts that were made of silver or ivory and comparing their geographical distributions.

Distribution of silver used as currency and ivory imported as a precious material. Created in Korp.
Figure 9

4. A comprehensive evaluation

The methods developed by the Centre of Excellence were put to the test by Ellie Bennett (2023). The methods were replicated and used to investigate the intersection of age and masculinities as expressed through the preserved (and digitized) Neo-Assyrian documents. This section uses the age and masculinities data to evaluate the methods.

The immediate challenge of digital techniques in Assyriology is their suitability for any proposed research project. Digital methods cannot—and should not—be universally applied to every research topic. The present methodology should therefore be seen as one tool in a toolbox of different methods which Assyriologists can use. As Bennett’s focus was on the intersection of two different identities (age and gender), and how they were expressed in Akkadian documents from the Neo-Assyrian period, a digital method was deemed appropriate for the study even if the tools and methods developed at ANEE would allow for an overview of a more complex intersection of identities. However, Bennett also utilized the more traditional close-reading of texts that is more familiar to Assyriologists.

When Bennett began her study, there was no precompiled subset of the ANEE lexical portal built on only Neo-Assyrian texts.17 In addition, the lexical portal aims to display the relationships between all words in Akkadian, so Bennett chose to use ANEE methods to build her own lexical network focusing on the relationships between words related to masculinities and other Akkadian words.18

Bennett began by downloading all texts from Oracc whose metadata were tagged as “Neo-Assyrian” and “Akkadian.”19 This dataset included 6,235 texts. The words in the texts were then annotated with: lemma[guideword]EPOS. Bennett identified 524 words that, based on previous scholarship,20 were related to Assyrian masculinities. This stage was heavily influenced by hegemonic masculinities theory, as Bennett included words that not only referred to ideal ways of being a man, but also how not to be a man.21 These words were the “target words” that would form the basis of Bennett’s lexical network representing masculinities expressed in Assyrian texts. Bennett chose to use PMI (specifically PMI2) for measuring the co-occurrences of the 524 masculinities words.22
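For readers unfamiliar with the measure, the following sketch contrasts plain PMI with PMI2, which squares the joint probability and thereby dampens the tendency of PMI to overrate rare words. It is an illustration from raw counts, not the pmizer2 script itself; the function names and the example counts are hypothetical, and using a single `total` for both joint and marginal probabilities is a simplification.

```python
import math


def pmi(n_ab, n_a, n_b, total):
    """Pointwise mutual information: log2 of p(a,b) / (p(a) * p(b))."""
    p_ab, p_a, p_b = n_ab / total, n_a / total, n_b / total
    return math.log2(p_ab / (p_a * p_b))


def pmi2(n_ab, n_a, n_b, total):
    """PMI2: log2 of p(a,b)^2 / (p(a) * p(b)); its maximum is 0."""
    p_ab, p_a, p_b = n_ab / total, n_a / total, n_b / total
    return math.log2(p_ab ** 2 / (p_a * p_b))


# Hypothetical counts: a target word seen 40 times, a collocate seen 25 times,
# 10 co-occurrences, in a corpus of 100,000 tokens.
print(pmi(10, 40, 25, 100_000))
print(pmi2(10, 40, 25, 100_000))
```

Because PMI2 equals PMI plus the logarithm of the joint probability, scores are zero or negative, and values closer to zero indicate stronger association.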

The pmizer2 script produced the top fifteen PMI results for each target word, which resulted in a list with thousands of results. Following the Semantic Domains workflow, Bennett chose to import the results list into the visualization software Gephi to visualise broader patterns than the top 15 lists would allow for each word. Nodes represented words—both masculinities and non-masculinities words were included. Edges represented the PMI2 scores.23 Bennett used the ForceAtlas2 layout algorithm with no-overlap and LinLog mode to aid the visibility of clusters and results. She chose to size the nodes according to their weighted degree score. Weighted degree sums the weight of the edges surrounding a node. In this network, it functions as an indicator of the words that have the strongest ties in the network. Weighted degree can therefore indicate which words appeared most prominently with masculinities.24
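Weighted degree, as used for the node sizes, can be sketched as a simple sum over an edge list. The edge weights below are invented positive numbers; how the raw PMI2 scores were converted into edge weights is not specified in this section, so that step is left out. The function names are hypothetical.

```python
from collections import defaultdict


def top_collocates(scores, n=15):
    """Keep the n highest-scoring collocates of one target word."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


def weighted_degree(edges):
    """Sum the weights of all edges touching each node (undirected network)."""
    totals = defaultdict(float)
    for source, target, weight in edges:
        totals[source] += weight
        totals[target] += weight
    return dict(totals)


# Toy edge list (invented): (word, word, edge weight).
edges = [
    ("eṭlu", "word_x", 2.0),
    ("eṭlu", "word_y", 1.5),
    ("word_x", "word_y", 0.5),
]
print(weighted_degree(edges))
print(top_collocates({"word_x": 2.0, "word_y": 1.5, "word_z": 0.2}, n=2))
```

A node with many strong ties receives a high weighted degree and would be drawn larger in the visualization.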

Bennett then posed the question: what is the role of age in the lexical network of Neo-Assyrian masculinities? She first identified fifteen Akkadian words related to age, as shown in Table 1.

Table 1.

Table of the Akkadian “age” words Bennett wanted to investigate.

“Young” words
  • ayyaru, “young man”
  • batūlu, “young man (adolescent)”
  • duppussû, “younger brother”
  • eṭlu, “young man/man”
  • šerru, “baby, infant, young child”
  • šubultinbi, “young”ᵃ
  • ṣuḫurtu, “adolescent”

“Old” words
  • labāriš, “become old”
  • labāru, “growing to old age”
  • labīru, “old”
  • labīrūtu, “old age”
  • littūtu, “extreme old age”
  • puršumu, “elder”
  • šību, “old man”
  • šibūtu, “old age”

ᵃ All translations given in this section follow the Assyrian Dictionary of the Oriental Institute of the University of Chicago (CAD). CAD Š/3 187 s.v. “šubulti inbi” is given as “apprentice.”

Bennett then located these words in her network. She noticed that the following three “age” words were not present in her network: ayyaru, duppussû, and šerru. Bennett utilized Oracc in Korp to understand why they were not there. After a search for each of these words, it was clear that these words did not meet the parameters used for the pmizer2 script. They either did not occur more than twice in the dataset, or did not co-occur with another word more than five times. For these words, it was appropriate to only carry out a close reading based on the Korp results. It is important to remember that the parameters intended to reduce noise in the lexical network development process may also hide some information, so researchers must ask why certain words may not appear in the results.

For those words that were in the masculinities network, Bennett undertook two layers of analysis: macro and micro. For the macro approach, Bennett identified network-wide measures to contextualize ‘age’ words within the wider masculinities network. This would contextualize ‘age’ within the construction of Assyrian masculinities. Only two age-related words were found in the core network: eṭlu (‘young man’) and puršumu (‘old man’). This suggests that when Assyrian scribes wrote about masculinities, these two age words were the most intertwined with this concept. Age was therefore a minor component of the construction of masculinities in the preserved (and digitized) Neo-Assyrian texts, but of the age words, eṭlu (‘young man’) and puršumu (‘old man’) were the most likely to be used when discussing age and masculinities. After identifying these words, Bennett used Oracc in Korp to find the texts that were the basis of this result. Bennett found that an important strength of Korp was the ability to conduct complex searches in the Oracc dataset. For these searches, she used the following formula: base form = “age word” AND period = Neo-Assyrian; any word 0–10 times; base form = “other word.” She then repeated the search with the “age word” and “other word” swapped, in order to best reflect the symmetrical window used in the pmizer2 script.25
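The search formula can be written out as a corpus query. The sketch below builds query strings in CQP syntax, the query language behind Korp's advanced search; the attribute names `lemma` and `_.text_period` are assumptions about how base form and period are exposed in the Oracc in Korp corpus, and the second word is an arbitrary example.

```python
def proximity_query(first, second, period="Neo-Assyrian", max_gap=10,
                    lemma_attr="lemma", period_attr="_.text_period"):
    """Build a CQP query: `first`, then 0..max_gap arbitrary tokens, then `second`."""
    return (
        f'[{lemma_attr} = "{first}" & {period_attr} = "{period}"] '
        f'[]{{0,{max_gap}}} '
        f'[{lemma_attr} = "{second}"]'
    )


def both_directions(age_word, other_word, **options):
    """Return the query and its mirror image to mimic a symmetric window."""
    return [
        proximity_query(age_word, other_word, **options),
        proximity_query(other_word, age_word, **options),
    ]


# "šarru" is used here only as a placeholder for the "other word".
for query in both_directions("eṭlu", "šarru"):
    print(query)
```

Running the search in both orders matters because a single query only matches the age word when it precedes the other word.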

This workflow was most important for understanding the co-occurrence results for the words labāru (‘long duration, longevity, growing to old age, disrepair’), šibūtu (‘old age’), and littūtu (‘extreme old age’). The words in their co-occurrence lists suggested that old age was a desirable outcome for Assyrian men, and particularly royal men. Bennett used Oracc in Korp to explore the texts that were the basis for these results, and discovered that it was not enough to simply reach old age – men had to have achieved a full healthy life, whilst kings’ long lives were tied to the longevity of their reign.

Not all of the “age” words are related to masculinities. After using Oracc in Korp for labīru (‘old one’), Bennett found that labīru was attested 168 times, but was only used to describe items, rites, buildings, and documents as “old.” Neither the PMI approach, the lexical network, nor the close reading of Assyrian texts revealed any connection to masculinities. Labīru therefore acts as a reminder that lexical networks can only point to patterns, and close reading is the only method that can be used to explain these patterns.

After utilizing the micro (or ego network) approach to the lexical network, Bennett came to two key conclusions. The first is that there were differences in how young and old men were conceptualized by Assyrians. Young men in the prime of their life were prone to anger and were upright or tall. Old men were in a time of life others should aspire to, but old age was bestowed upon them by the gods.

As with more traditional approaches to Assyriology, digital approaches are only as good as the data. Bennett’s data was taken from Oracc, which is quite comprehensive for the Neo-Assyrian period, with 8,659 texts available. This is not the case for other time periods. Furthermore, the Neo-Assyrian data in Oracc is skewed toward royal inscriptions, with 2,588 texts in the RIAo (1,457), RIBo (126), and RINAP (1,005) sub-projects. A further 5,058 texts, found in the royal archives of Assyria, are in the SAAo sub-project.26 A corpus in which roughly 30 per cent of the available digitized texts are royal inscriptions surely cannot reflect daily language use. Even the SAAo texts were not written by laypeople; they were largely written by scribes on behalf of someone, or by scholars for a particular purpose. It is therefore important in any digital lexical project to keep data representativity in mind: digital lexical approaches do not magically create new data, but only offer an alternative perspective on the data we already have.
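
The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the raw figures given in the text (which, as footnote 26 notes, still include duplicate texts); the variable names are illustrative.

```python
# Sketch: corpus composition from the raw Oracc figures quoted in the text.
# These counts include duplicates across sub-projects (see footnote 26).

royal_inscriptions = {"RIAo": 1457, "RIBo": 126, "RINAP": 1005}
saao_texts = 5058
total_neo_assyrian = 8659

royal_total = sum(royal_inscriptions.values())
royal_share = royal_total / total_neo_assyrian
saao_share = saao_texts / total_neo_assyrian
# Texts that belong to neither the royal inscriptions nor SAAo.
remaining = total_neo_assyrian - royal_total - saao_texts

print(f"Royal inscriptions: {royal_total} texts ({royal_share:.1%})")
print(f"SAAo: {saao_texts} texts ({saao_share:.1%})")
print(f"Other sub-projects: {remaining} texts")
```

The royal inscriptions add up to 2,588 texts, just under 30 per cent of the 8,659 Neo-Assyrian texts, which matches the rounded figure in the text.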

As a final note, Bennett’s article underscores the key finding from developing and using lexical networks for Akkadian corpora: digital approaches should accompany traditional methods, not replace them. Lexical networks are simply one tool available for Assyriologists to study ancient texts, and should complement other, more traditional methods like close reading of texts.

5. Discussion and conclusion

The article has presented an overview of the tools our research group has found useful in our study of ancient social groups. Sections 1 and 2 gave the necessary background for the reader to understand the particular challenges related to the study of ancient Mesopotamia and how we have overcome them. The concrete case studies presented in Sections 3 and 4, however, have been kept as general as possible, to facilitate similar approaches in adjacent fields of study. Our methodology and approaches are well documented and openly available, and in our view can be used on similar text materials, by other groups interested in DH approaches.

In Section 3, we discussed the application of the resources we have developed on three fronts. Section 3.1 demonstrated that analyzing the semantic range (semantic domains) of Assyrian deities or material objects (the king’s regalia) can be useful in some cases, but also that the limitations of the data and of the applicable research questions quickly become apparent. In Section 3.2, we demonstrated how pre-existing prosopographical data can be used to study social groups, e.g. with a weighted k-core centrality measure. As a first case study, we presented a small study on Aramaic names in the Assyrian Empire, with the preliminary result that people with Aramaic names seem to be further removed from the halls of power when measured with eccentricity scores. The second case study in Section 3.2 presented a thorough evaluation of our methods for the study of Egyptians in the ancient Near East. It revealed how important it is to recognize the limitations of these methods as well. It is all but impossible to study the micro-history of a few Egyptians with any of the methods highlighted in Section 3 for three main reasons: there are too few data, the existing data are not rich enough, and the complex and nuanced research questions are not well suited to a quantitative approach. In a nutshell, you cannot kill mosquitoes with a shotgun. The test case with Aramaic names (numbering thousands in our material) illustrates that a quantitative approach can be useful for studying social groups, but the case of Egyptian micro-history highlights the need to consider the limitations of these methods already at the level of the research question. Finally, in Section 3.3, we highlighted how the semantic domain of precious materials known from archaeology, in combination with textual evidence, can corroborate well-known facts about the material culture and residence of the elite.

Section 4 outlined a case study in which some of the methods presented in this article were used to study different masculinities during the Neo-Assyrian Period. The case study combined a lexical network based on masculinities, searches in Korp, and close reading of Akkadian words relating to age. This allowed for a more nuanced assessment of which Akkadian words were relevant for constructing masculinity during the Neo-Assyrian Period.

In conclusion, we hope that the cross-disciplinary methods that our research group has developed for the study of the ancient Near East can be useful outside the field of “Digital Assyriology.” Combining DH with more traditional research methods in history and other areas of the humanities is an evolving field that benefits from collaboration across disciplinary boundaries. While DH methods are not a deus ex machina by any means, when judiciously used they can contribute significantly to research questions related to social group identities, semantic domains, and the networks of material culture, which are at the core of many research agendas in the humanities.

Acknowledgements

We wish to acknowledge the ANEE Centre of Excellence for its gracious support of this work as well as the FIN-CLARIN research infrastructure for making available its resources through the Language Bank of Finland.

Author contributions

Krister Lindén (Conceptualization, Methodology, Supervision, Visualization, Writing—original draft, Writing—review & editing), Saana Svärd (Conceptualization, Formal analysis, Funding acquisition, Investigation, Supervision, Writing—original draft, Writing—review & editing), Tero Alstola (Data curation, Formal analysis, Investigation, Writing—review & editing), Heidi Jauhiainen (Data curation, Formal analysis, Resources, Writing—review & editing), Sam Hardwick (Software, Visualization, Writing—review & editing), Aleksi Sahala (Data curation, Methodology, Software, Writing—review & editing), Céline Debourse (Formal analysis, Investigation, Validation, Writing—original draft, Writing—review & editing), Ellie Bennett (Formal analysis, Investigation, Validation, Writing—original draft, Writing—review & editing), Lena Tambs (Formal analysis, Investigation, Resources, Writing—original draft, Writing—review & editing), Melanie Wasmuth (Formal analysis, Investigation, Validation, Visualization, Writing—original draft, Writing—review & editing), Elisabeth Holmqvist-Sipilä (Investigation, Validation, Writing—original draft, Writing—review & editing), Rick Bonnie (Investigation, Writing—review & editing), Marta Lorenzon (Investigation, Validation, Writing—original draft, Writing—review & editing)

Funding

Funding support for this article was provided by the Research Council of Finland through the Center of Excellence funding for ANEE and infrastructure funding for FIN-CLARIN (Award numbers: 298647, 358720).

Footnotes

1

http://oracc.museum.upenn.edu/ (accessed 11 May 2023).

2

https://cdli.ucla.edu/ (accessed 11 May 2023).

6

https://universaldependencies.org/ (accessed 11 May 2023).

7

https://github.com/asahala/BabyFST (accessed 11 May 2023); https://github.com/asahala/oraccnlp (accessed 11 May 2023).

10

https://github.com/raphv/gexf-js (accessed 11 May 2023).

12

https://github.com/asahala/ (accessed 11 May 2023).

13

The new version of the Portal allows for analysis between 2nd millennium, 1st millennium, and Neo-Assyrian texts.

14

In summer 2021, the dataset was limited to the manually lemmatized Oracc projects, that is, many relevant texts were not yet included at the time of the case studies. This has been remedied and the Oracc texts without manual lemmatization have been autolemmatized and made available in Korp. https://www2.helsinki.fi/en/researchgroups/ancient-near-eastern-empires/research-data/anee-datasets (accessed 11 May 2023).

15

Note that a substantial addition of autolemmatized data was made in Autumn 2021, which may not be reflected in the provided analysis.

16

We note that since Oracc in Korp was updated in 2021 and autolemmatization was added to the Oracc texts, more than 700 mentions of Egypt and Egyptians can be found in the Oracc in Korp dataset. However, the ATAE is still labeled as work in progress in Oracc, so the ATAE dataset is not fully included in the Oracc download distribution and is therefore not available through the ANEE Lexical Portal and Korp.

17

As of the time of writing (July 2024), the ANEE Lexical Portal has been updated to include several networks based on chronology (Sahala et al. 2023). One of these is based only on Neo-Assyrian documents.

18

The details of the construction of these networks can be found on the article’s accompanying Zenodo https://doi.org/10.5281/zenodo.7677180 (accessed 19 July 2024).

19

Bilingual texts were therefore excluded from this dataset. Bennett used the Python script developed by Niek Veldhuis for this stage. It is available at https://github.com/niekveldhuis/compass (accessed 11 May 2023).

20

The field of masculinities studies in the Neo-Assyrian period is still relatively new, but key studies include N’Shea 2016 and 2018, Bennett 2019, and the volume Being a Man, edited by Ilona Zsolnay (2017).

21

As explained by Raewyn Connell, there are many ways to ‘be a man’ in any given society; masculinity is not contingent on biological sex, nor is it stable through time. Four key categories of men Connell identified in 1990s Australia are: hegemonic (idealized masculinity); complicit (upholding ideals of hegemonic masculinity, though not adhering to all of them); marginalized (similar to complicit masculinity, but collides with another, marginalized element of identity); and subordinate (men who embody elements of how not to be a man) (Connell 1995; Connell and Messerschmidt 2005).

22

Bennett used Aleksi Sahala’s pmizer2 Python script, which can be accessed on Sahala’s GitHub: https://github.com/asahala/Pmizer/blob/master/pmizer2.py (Sahala 2019).

23

PMI2 produces negative scores, which Gephi cannot visualize. Bennett therefore added an arbitrary number to bring the scores into the positive range to facilitate visualizations.

24

Bennett points out that more work is to be done regarding which formal Network Analysis measures are applicable to these lexical networks.

26

These are raw figures, and include duplicate texts that appear in multiple Oracc sub-projects. The removal of these duplicates is important in the data processing stage of research.

References

Anderson, A. G. (2018). ‘The Old Assyrian Social Network: An Analysis of the Texts from Kültepe-Kanesh (1950–1750 BCE)’, Doctoral dissertation, Harvard University.

Alstola, T., Svärd, S. (2024). ‘Digital Humanities Meets Ancient Languages’, in Nissinen, M., Jokiranta, J. (eds) Changes in Sacred Texts and Traditions: Methodological Encounters and Debates, pp. 193–233. Atlanta: Society of Biblical Literature.

Alstola, T. et al. (2019). ‘Aššur and His Friends: A Statistical Analysis of Neo-Assyrian Texts’, Journal of Cuneiform Studies, 71: 159–80.

Bach, J., Bonnie, R. (2023). ‘Three Cuneiform Texts from the National Museum of Finland – Provenance, Editions, and Commentary’, Advances in Ancient, Biblical, and Near Eastern Research, 3(1). https://doi.org/10.35068/aabner.v3i1.1036.

Bennett, E. (2019). ‘“I Am A Man”: Masculinities in the Titulary of the Neo-Assyrian Kings in the Royal Inscriptions’, KASKAL, 16: 373–92.

Bennett, E. (2023). ‘Age and Masculinities during the Neo-Assyrian Period’, Journal of Cuneiform Studies, 75: 123–154. http://hdl.handle.net/10138/578930.

Bastian, M., Heymann, S., Jacomy, M. (2009). ‘Gephi: an open source software for exploring and manipulating networks’, in Proceedings of the International AAAI Conference on Weblogs and Social Media, pp. 361–62. Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/icwsm.v3i1.13937 (accessed 16 March 2021).

Bennett, E., Tambs, L., Lindén, K. (2024). ‘Letters have Weights: Weighted k-Shells in a Neo-Assyrian Co-Attestation Network’, Journal of Historical Network Research, 10: 150–197. http://doi.org/10.25517/jhnr.v10i1.95.

Bojanowski, P. et al. (2017). ‘Enriching Word Vectors with Subword Information’, Transactions of the Association for Computational Linguistics, 5: 135–46.

Boullier, D. (2017). ‘Big Data Challenge for Social Sciences and Market Research: From Society and Opinions to Replications’, in Cochoy, F., Hagberg, J., Sörum, N., Peterson-McIntyre, M. (eds) Digitalizing Consumption: How Devices Shape Consumer Culture, pp. 20–40. London: Routledge.

Connell, R. W. (1995). Masculinities. Cambridge: Polity Press.

Connell, R. W., Messerschmidt, J. W. (2005). ‘Hegemonic Masculinity: Rethinking the Concept’, Gender & Society, 19: 829–59. https://doi.org/10.1177/0891243205278639.

Debenjak-Ijäs, A., Bonnie, R., Saari, S. (2021). Making Home Abroad 3D Digitizations. Faculty of Arts, University of Helsinki. https://doi.org/10.23729/196c8d4c-9cc4-4463-b644-11776cce0b3e.

Faist, B. (2007). Alltagstexte aus neuassyrischen Archiven und Bibliotheken der Stadt Assur. Studien zu den Assur-Texten 3. Wiesbaden: Harrassowitz.

Frické, M. (2015). ‘Big Data and Its Epistemology’, Journal of the Association for Information Science and Technology, 66: 651–61.

Hage, P., Harary, F. (1995). ‘Eccentricity and Centrality in Networks’, Social Networks, 17: 57–63.

Hardwick, S., Silfverberg, M., Lindén, K. (2015). ‘Extracting semantic frames using hfst-pmatch’, in Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), pp. 305–8. Linköping University Electronic Press, Sweden. https://aclanthology.org/W15-1842.pdf (accessed 19 July 2024).

Holmqvist, E. (2019). Ceramics in Transition: Production and Exchange of Late Byzantine–Early Islamic Pottery in Southern Transjordan and the Negev. Oxford: Archaeopress. https://www.archaeopress.com/Archaeopress/download/9781789692242

Jauhiainen, H., Alstola, T. (2023). ‘Fast(Text) Analysis of Mesopotamian Divine Names’, in Bigot Juloux, V., di Ludovico, A., Matskevich, S. (eds) The Ancient World Goes Digital (CyberResearch vol. 2): Case Studies on Archaeology, Texts, Online Publishing, Digital Archiving, and Preservation. Digital Biblical Studies, 6. Leiden: Brill. http://hdl.handle.net/10138/563599.

Jauhiainen, H., Alstola, T. (2022). ‘A Social Network of the Prosopography of the Neo-Assyrian Empire’, Journal of Open Humanities Data, 8: 8. http://doi.org/10.5334/johd.74.

Jauhiainen, H. et al. (2021). Lexicographical Portal – The Dataset. Zenodo. https://doi.org/10.5281/zenodo.4646661.

Lagus, K., Pantzar, M., Ruckenstein, M. (2018). ‘Kansallisen tunnemaiseman rakentuminen: Pelon ja ilon rytmit verkkokeskusteluissa [Constructing the Citizens Mindscape: the rhythms of fear and joy in internet discussions]’, Kulutustutkimus.Nyt, 12: 62–83.

Larsen, A. G., Ellersgaard, C. H. (2017). ‘Identifying Power Elites – K-Cores in Heterogeneous Affiliation Networks’, Social Networks, 50: 55–69. https://doi.org/10.1016/j.socnet.2017.03.009.

Lorenzon, M., Wallis, C. (2023). ‘Building Walls, Social Groups and Empires: A Study of Political Power and Compliance in the Neo-Assyrian Period’, Asia Anteriore Antica. Journal of Ancient Near Eastern Cultures, 4: 47–70.

Luukko, M. et al. (2020). ‘Akkadian Treebank for early Neo-Assyrian Royal Inscriptions’. Treebanks and Linguistic Theories. Prague, Czech Republic.

Newman, M. (2018). Networks. 2nd edn. Oxford: Oxford University Press.

Nissinen, M. (2014). ‘Assyria’, in Niehr, H. (ed.) The Aramaeans in Ancient Syria, pp. 273–96. Leiden: Brill.

N’Shea, O. (2016). ‘Royal Eunuchs and Elite Masculinity in the Neo-Assyrian Empire’, Near Eastern Archaeology, 79: 214–21.

N’Shea, O. (2018). ‘Empire of the Surveilling Gaze: The Masculinity of King Sennacherib’, in Svärd, S., Garcia-Ventura, A. (eds) Studying Gender in the Ancient Near East, pp. 315–35. University Park, PA: Eisenbrauns.

Radner, K., Baker, H. D. (eds) (1998–2011). The Prosopography of the Neo-Assyrian Empire. Helsinki: The Neo-Assyrian Text Corpus Project.

Sahala, A., Lindén, K. (2020). ‘Improving Word Association Measures in Repetitive Corpora with Context Similarity Weighting’, in Fred, A., Felipe, J. (eds) Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - KDIR. SciTePress, Setúbal, Portugal, November.

Sahala, A. (2019). ‘Pmizer2’, https://github.com/asahala/Pmizer/blob/master/pmizer2.py (accessed 11 May 2023).

Sahala, A. (2021). ‘Contributions to Computational Assyriology’, PhD Dissertation. University of Helsinki.

Sahala, A. et al. (2020). ‘Towards a Finite-State Based Computational Model of Ancient Babylonian’, in Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 3886–3894, Marseille, France. European Language Resources Association, France.

Sahala, A. et al. (2023). ‘ANEE Lexical Networks v.2.0’, http://urn.fi/urn:nbn:fi:lb-2022100301 (accessed 11 May 2023).

Svärd, S. et al. (2018). Semantic Domains in Akkadian Text.

Svärd, S. et al. (2021). ‘Fear in Akkadian Texts: New Digital Perspectives on Lexical Semantics’, in Hsu, S.-W., Llop Raduà, J. (eds) The Expression of Emotions in Ancient Egypt and Mesopotamia. Culture and History of the Ancient Near East, pp. 470–502. Leiden: Brill. https://doi.org/10.1163/9789004430761_019

Wasmuth, M. (In preparation). Living in an Ancient Urban Metropolis. Assur in the 7th c. BCE: the Potential of a Case Study Approach.

Wasmuth, M., Debourse, C. (In press). ‘Semantic Domains in Digital Assyriology: The Case of the “Foreign Other”’, Journal of Cuneiform Studies.

Wasmuth, M. (2016). ‘Cross-Regional Mobility in ca. 700 BCE: The Case of Ass. 8642a/IstM A 1924’, Journal of Ancient Egyptian Interconnections, 12: 89–112.

Zsolnay, I. (ed.) (2017). Being a Man. Negotiating Ancient Constructs of Masculinity. Studies in the History of the Ancient Near East. Abingdon, NY: Routledge.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.