Valentine Bernasconi, Leonardo Impett, Decoding early modern gestural patterns through hand networks, Digital Scholarship in the Humanities, 2025, fqaf016, https://doi.org/10.1093/llc/fqaf016
Abstract
In this article, we present a new methodology and visualization system for Italian early modern painted hands. The goal is to outline potential gestural systems over a large corpus of study, in contrast to the many art historical works that focus on a single hand gesture or on a specific painter. Furthermore, as most current research in art history is dedicated to symbolic hands, the global approach aims to shed light on less documented hands, such as functional gestures. We first perform unsupervised clustering on a collection of hands, before combining the clusters with classes for symbolic gestures and creating a system of recurrent hand combinations according to their corresponding iconographies. We then introduce an interactive visualization with a network graph, where the user can select an iconography and see the types of hands used in the specific narrative context. Through this experiment, we provide an analysis of the most common types of hands found in our collection, as well as the way they can be associated for particular iconographies. We finally discuss the impact on the results of the machine learning models and of the subjective choices made at each step, as well as the implications of studying the pictorial detail at large scale.
1. Introduction
Early modern painted hands represent a complex object of research for digital art history. The early modern period saw the creation of compendiums, such as Chirologia or the natural language of the hand by John Bulwer (Bulwer 1974) or L’arte de’ cenni by Giovanni Bonifacio (Bonifacio 2018), and a growing interest in gestures mirroring that in classical rhetoric more widely. However, historical sources do not provide much information on the use of an adequate hand language system in early modern art. Most of the research on hands focuses on the analysis of the corpus of a single painter and on the social and cultural implications of the depiction process, with the works of Moshe Barasch on Giotto (Barasch 1987) or Rudolf Wittkower on El Greco (Wittkower 1977); on a specific type of representation, for example the series of studies by Mauro Zanchi on the specific gesture of the horn (Zanchi 2015, 2017); on the more general question of the body gesture, with the writings of Baxandall (1972, 1979, 1981) and Peter Burke (1992) on the language of gesture in early modern Italy; or on the medieval period, also rich in signs and gestural symbols, as outlined by Schmitt (1990) and Garnier (1989, 1996). We also note more experimental research on the use of the detail of the hand for attribution, with the controversial work of Giovanni Morelli in the late 19th century (Morelli 1893). The study of gestures as a resurgence of classical patterns is also at the heart of innovative methodologies, with the works of Andrea de Jorio in the 1830s for the reading of contemporary hand gestures (de Jorio 2007), or the Bilderatlas Mnemosyne of Aby Warburg (Ohrt 2020). This diversity of approaches in art history illustrates a plurality of analytical points of view for hand gestures. These many perspectives are also evidence of a change in the function of gestures that took place toward the 16th century (Eck 2007).
The recent work of the art historian Dimova (2020) represents the most comprehensive work on the subject to date. It proposes a novel approach based on a large corpus of paintings, mostly from the French baroque period, outlining the necessity to study hand gestures and their meaning through the confrontation of a multiplicity of images in relation to their context. Beyond the creation of a lexicon of hands, Dimova introduces the possibility of studying these hands according to their combination within a composition, as ‘gestural chords’,1 also called gestural eurythmics. It consists, among others, of the assessment of the variety of hands represented, typical associations among similar iconographies, and their spatial arrangement on the painting.
From a computational perspective, the study of hands relies on the use of machine learning models for human pose estimation (HPE), which are usually pre-trained on real-world images and present lower accuracies on paintings (Madhu et al. 2022; Bernasconi, Cetinić and Impett 2023). Due to the lack of accuracy of HPE models on artworks, some projects further address their sophistication through fine-tuning methods (Madhu et al. 2022). New annotation platforms are created (Bernasconi and del Castillo 2023), as well as new training datasets (Springstein et al. 2022; Ju et al. 2023; Schneider and Vollmer 2023; Zinnen et al. 2023). So far, HPE has been mostly used for body pose analysis in artworks in projects focusing on the understanding of the body in visual narratives of specific artists or periods (Marsocci and Lastilla 2021; Madhu et al. 2022; Zhao, Salah and Salah 2022). Additionally, the categorization system from Dimova has been translated into a proper classification task with the definition of keypoints features (Bernasconi, Cetinić and Impett 2023), but which does not present convincing predictions yet due to the visual complexity of these images and the small amount of training data. The latter is a consequence of the weak performance of HPE models, the important manual cleaning process required (Bernasconi, Cetinić and Impett 2023) and the fact that the categories proposed by Dimova focus on symbolic gestures. They do not take functional gestures into consideration, which significantly reduces the proportion of hands detected by the machine used for the classification task. Another potential factor impairing the recognition of hands is the lack of contextual information, such as the character performing the gesture, the narrative of the scene and the historical context of the artwork, that is somehow difficult to capture in a computational model (Bernasconi, Cetinić and Impett 2023).
Yet, the number of hands that an HPE model can accurately detect on a corpus remains significant. The collection of hands obtained from such a process deserves to be considered in its entirety, as it has great potential to unveil gestural configurations that have not been addressed in art history so far. Furthermore, the possibility to add more contextual information does not only depend on the available metadata from the dataset used. Such information can also be created from commonly found data such as the titles of the artworks, from which the iconography represented can be inferred through a well-established classification system for iconography (Couprie 1983; Sherman 1987).
In this work, we propose the definition of a new method and visualization system to better disentangle the gestural systems at play in early modern representations.2 Hands retrieved with the help of the OpenPose model are clustered and grouped according to their corresponding iconography before being linked based on the number of co-occurrences in the same painting. An interactive visualization system representing hand combinations as a network is then created.
We also produce a network of hand combinations merging our clusters and predefined categories of symbolic hands from art history (Bernasconi, Cetinić and Impett 2023). Through this process, we aim to better showcase the diversity and use of hands often overlooked in art history and the way they are combined for the narrative purpose of a specific iconography. In other words, the goal of the present work is to showcase how early modern visual narratives can be expressed through hands.
Finally, we propose a new theoretical framework for the computational analysis of early modern hands, distinguishing between geometric and gestural patterns. In this context, geometric patterns serve the understanding of a visual rhythm and coherence within the composition, whereas gestural patterns serve the possibility to better define painted gestures derived from live gestures and their potential symbolic and functional meaning.
2. Materials and methods
2.1 The hands collection
Similarly to previous works on the computational analysis of hand gestures in art (Bernasconi 2022; Bernasconi, Cetinić and Impett 2023), hands were extracted from a corpus of paintings built from the photographic collection of the Bibliotheca Hertziana in Rome. The collection illustrates the research interests of the many art historians who worked at the institute over the 20th century. It therefore holds a large number of black and white photographs of architectural monuments of the city of Rome and their frescoes. It was also enriched by acquisitions from the different directors of the photographic collection, compiling a great majority of objects from the Italian Renaissance and Baroque periods, as well as a smaller proportion of artworks from Western Europe (Schallert and Röll 2014). The photographic collection holds about 23,784 digitized paintings, of which 5,234 from the 15th to the 17th centuries are available to download and were used. The pretrained machine learning model OpenPose (Cao et al. 2021) was directly used on the digital images to detect the different bodies. Based on the keypoints information given through coordinates, the hands were then automatically cropped and extracted. A total of 18,641 hands were detected by the model, which were then manually cleaned by removing misdetected hands. As a result, 5,995 hands and their corresponding keypoints information were obtained from the process. On average, the OpenPose model presents a detection rate of 56 per cent on early modern paintings and 32 per cent precision. This collection of 5,995 hands constitutes the foundation of various experiments towards the computational analysis of hand gestures in pictorial art and the existence of a language of hands in art.
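The cropping step is not detailed further, so the following is only a minimal sketch of how a crop box could be derived from OpenPose-style hand keypoints given as (x, y, confidence) triples. The function name `hand_crop_box`, the confidence threshold and the margin are hypothetical choices, not values taken from this work.

```python
from typing import List, Optional, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence), as OpenPose reports them


def hand_crop_box(
    keypoints: List[Keypoint],
    image_size: Tuple[int, int],
    min_confidence: float = 0.1,  # hypothetical threshold
    margin: float = 0.2,          # hypothetical padding, as a fraction of the box size
) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) around the confident keypoints, or None."""
    points = [(x, y) for x, y, c in keypoints if c >= min_confidence]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    width, height = image_size
    # Clamp the padded box to the image borders.
    left = max(0, int(min(xs) - pad_x))
    top = max(0, int(min(ys) - pad_y))
    right = min(width, int(max(xs) + pad_x) + 1)
    bottom = min(height, int(max(ys) + pad_y) + 1)
    return left, top, right, bottom


# Share of raw detections that survived the manual cleaning reported above.
kept_share = 5995 / 18641
```

The last line simply relates the two counts given in the text: 5,995 of the 18,641 detections were kept, which is consistent with the reported precision of about 32 per cent.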
2.2 Clustering hands
Based on the keypoints information, a set of features was defined following previous works on the geometric characterization of painted hands (Bernasconi, Cetinić and Impett 2023). It consists of unit vectors and angles that represent the pose of the hand through the direction of the fingers and their articulations. The hands were then clustered using these features, as other visual features did not yield good results according to a visual inspection of the content of the clusters. The unsupervised clustering process presented two challenges: on the one hand, the choice of a clustering method and, on the other, the definition of the right number of clusters. First, the features from the keypoints were pre-processed and standardized using a standard scaler algorithm. Then, different common unsupervised clustering techniques were tested for the creation of fifty clusters, including a K-Means algorithm, a Principal Component Analysis for dimensionality reduction combined with a K-Means algorithm, a Spectral clustering, a Gaussian Mixture model and an Agglomerative clustering. The performance of these different models was evaluated by the main author based on the following visual criteria: the general direction of the hand, the position of the fingers and the angles shaped by their articulations. To visualize the content of each cluster, various projections were used, such as the T-SNE projection seen in Fig. 1 and in Appendix A. From these visual inspections, we determined that the K-Means algorithm offered the most convincing results. The optimal number of clusters was then refined for the use of the K-Means algorithm.
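To make the idea of "unit vectors and angles" concrete, the sketch below derives one possible feature vector from 21 two-dimensional hand keypoints. It is an illustration only: the exact features are those of Bernasconi, Cetinić and Impett (2023) and are not reproduced here, and both the keypoint ordering (wrist first, then four joints per finger) and the helper names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Assumed hand layout: index 0 is the wrist, followed by four joints per finger.
FINGERS = {
    "thumb": [1, 2, 3, 4],
    "index": [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring": [13, 14, 15, 16],
    "little": [17, 18, 19, 20],
}


def unit_vector(a: Point, b: Point) -> Point:
    """Direction from a to b, normalized to length 1 (zero vector if a == b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 0 else (0.0, 0.0)


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at b, in radians, between the segments b->a and b->c."""
    u = unit_vector(b, a)
    v = unit_vector(b, c)
    dot = max(-1.0, min(1.0, u[0] * v[0] + u[1] * v[1]))
    return math.acos(dot)


def hand_features(keypoints: Sequence[Point]) -> List[float]:
    """Five features per finger: its direction (2 values) and 3 joint angles."""
    wrist = keypoints[0]
    features: List[float] = []
    for joints in FINGERS.values():
        tip = keypoints[joints[-1]]
        features.extend(unit_vector(wrist, tip))
        chain = [wrist] + [keypoints[j] for j in joints]
        for a, b, c in zip(chain, chain[1:], chain[2:]):
            features.append(joint_angle(a, b, c))
    return features
```

Because the features only use directions and angles, they do not depend on the size of the hand in the image or on its position in the painting, which is what makes hands from different artworks comparable.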

As demonstrated in Fig. 2, the elbow method, where the within-cluster sum of squares is calculated for each number of clusters, did not reveal a clear elbow point. Moreover, although the lexicon of hands already defined by Dimova presents a total of thirty hand poses related to symbolic gestures, the clusters are not representative of the categorization proposed in art history and the number of clusters could not be established on this ground. After a visual inspection of the different clusters, it was decided to define a total of forty clusters, which appeared to be the right trade-off between a fair portrayal of the variety of hands in the dataset and the readability of future results.

Estimation of the right number of clusters for the k-Means algorithm with the help of the elbow method.
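The standardization, K-Means clustering and elbow curve described in this section can be sketched with scikit-learn as follows. The use of scikit-learn, the random seed, the candidate values of k and the synthetic feature matrix are all assumptions made for the sake of a runnable example; they are not the settings or data of this study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def elbow_curve(features, candidate_ks, seed=0):
    """Return {k: within-cluster sum of squares} for K-Means on standardized features."""
    scaled = StandardScaler().fit_transform(features)
    curve = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scaled)
        curve[k] = float(model.inertia_)  # inertia_ is the within-cluster sum of squares
    return curve


# Synthetic stand-in for the real feature matrix (one row per hand).
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(300, 25))
curve = elbow_curve(toy_features, candidate_ks=[2, 5, 10, 20])
for k, wcss in curve.items():
    print(k, round(wcss, 1))
```

On data without a strong group structure, such a curve decreases smoothly with k, which is the situation reported above and the reason why the final number of clusters had to be chosen by visual inspection.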
2.3 Flipping left hands
As we can see in the T-SNE projection based on the keypoints features in Fig. 3, left and right hands seem to be separable. This tendency is confirmed by the distribution of the left and right hands within each cluster, as seen in Fig. 4a. Hand types are not equally represented within every cluster, with an average standard deviation of thirty-five samples between left and right hands. After a visual inspection of the clusters, such as the ones found in the projection in Appendix Fig. A.1, we determined that, although left and right hands with a similar hand shape and orientation are found in the same cluster, they do not necessarily represent the same gestural patterns. Similarly, a left and a right hand performing the same gesture but with different horizontal orientations, such as hands on chest, are not found in the same cluster. Indeed, the orientation of the hands plays an important role in the definition of the clusters, but their meaning can significantly differ. To create a proper clustering according to gestural patterns, left and right hands were split, and the original keypoints of the left hands were then horizontally flipped. The features were then computed again for the left hands according to Bernasconi, Cetinić and Impett (2023) and merged with the untouched right hand keypoints features. Based on the same process as stated above, forty clusters were created with a K-Means algorithm. The analysis of the left and right hands distribution for each cluster shows much more balanced clusters for each hand type, with an average standard deviation of 8.6 samples between left and right hands. Their visual examination confirmed a gestural coherence among the clusters. We therefore differentiate gestural patterns, obtained by flipping left hands, from geometric patterns. 
By geometric patterns we understand the general orientation and shape of the hands independently from the hand type, and which rather illustrate the visual effect produced by these hands for the composition. For the creation of the networks, gestural patterns are used.

2D T-SNE projections (a) for left and right hands with their original orientation, (b) with left hands horizontally flipped.

Left and right hands distribution (a) with original hands orientation, (b) with left hands horizontally flipped and the addition of symbolic categories.
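The flipping step amounts to mirroring the x coordinate of every keypoint of a left hand before the features are recomputed. The sketch below assumes keypoints expressed in pixel coordinates of a crop of known width; the dictionary layout of a hand record and the function names are hypothetical.

```python
def flip_horizontally(keypoints, width):
    """Mirror (x, y) keypoints inside a crop that is `width` pixels wide."""
    return [(width - 1 - x, y) for x, y in keypoints]


def to_right_hand_frame(hand):
    """Flip left hands so that all hands share the same orientation convention."""
    if hand["side"] == "left":
        flipped = flip_horizontally(hand["keypoints"], hand["width"])
        return {**hand, "keypoints": flipped, "flipped": True}
    return {**hand, "flipped": False}
```

Right hands pass through unchanged, so the features of both hand types can be merged into a single matrix and clustered together.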
2.4 Adding symbolic categories
Many clusters hold symbolic gestures that were already described in previously published works (Bernasconi, Cetinić and Impett 2023) and introduced by Dimova (2020). We decided to integrate this existing knowledge into the network to enhance the readability and analysis of the hand combinations. In Bernasconi, Cetinić and Impett (2023), nine categories were manually populated in order to create a training dataset. These categories, which represent symbolic gestures, were added to the forty clusters, and the symbolic hands found in these clusters were re-labeled accordingly, as seen in Fig. 4b. Hence, clusters are labeled with a corresponding number from 0 to 39, and symbolic hands are labeled with their original description: benedictio, joint palms praying, hand on chest, opened palm forward, opened hand up, opened hand forward, fist, pointing index, and intertwined fingers.
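The re-labeling can be read as a simple override: a hand keeps its cluster number unless it already belongs to one of the nine manually populated symbolic categories. A minimal sketch, in which the function name and the way annotations are passed are assumptions:

```python
from typing import Optional

SYMBOLIC_CATEGORIES = [
    "benedictio",
    "joint palms praying",
    "hand on chest",
    "opened palm forward",
    "opened hand up",
    "opened hand forward",
    "fist",
    "pointing index",
    "intertwined fingers",
]


def final_label(cluster_id: int, symbolic_label: Optional[str] = None) -> str:
    """Return the symbolic category if the hand has one, else its cluster number."""
    if symbolic_label is None:
        return str(cluster_id)
    if symbolic_label not in SYMBOLIC_CATEGORIES:
        raise ValueError(f"unknown symbolic category: {symbolic_label}")
    return symbolic_label


# Forty numbered clusters plus nine symbolic categories give 49 possible labels.
all_labels = [str(i) for i in range(40)] + SYMBOLIC_CATEGORIES
```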
2.5 Creating a network
The creation of an interactive interface for graph network visualization, as shown in Fig. 5, consists primarily of painted hands that are combined together based on keypoints features and iconography represented by their original painting. Nodes correspond to the clusters, and edges to the occurrence of these clusters within at least one painting from a specified iconography, such that:

Nodes: $V = \{c \mid c \text{ is a cluster}\}$
Edges: $E = \{(c_a, c_b) \mid c_a \text{ and } c_b \text{ co-occur within an image of the same iconography}\}$
The overall process can be broken down into the following steps:
Retrieval of an iconography for each painting.
Creation of pairs of hands based on their coexistence among the same painting.
Matching the hands with their corresponding cluster and with the iconography of the original painting.
Creation of a network for each iconography where the edges correspond to the representation of two clusters of hands among the same painting.
Ultimately, ensuring a straightforward and reproducible process allows for the creation of distinct clusterings—one for geometric patterns and the other for gestural patterns. For the presentation of the methodology, we focus on gestural patterns as previously defined.
2.5.1 Extracting the iconography
Because of the need to bring more context to understand gestural patterns, and based on the elements available from the metadata, it was decided to group hands according to the corresponding iconography of their paintings. The iconography is defined with the help of the title provided for each painting and the Iconclass system.3 The Iconclass system (Brandhorst and Posthumus 2016) consists of a tree-like structure, with ten main classes that are subdivided into finer categories. Each category is given an identification code, where each element of the code corresponds to the degree of affiliation with one of the main and subsequent classes. In order to simplify the readability of the results, only the third level of refinement was used, which corresponds to a total of 400 categories. The titles of the paintings were processed based on natural language processing techniques, and the sentences were tokenized with stopwords removed. The lists of tokens were then tagged4 and only proper nouns, plural nouns, singular nouns, and foreign words were kept. For each noun in the list, a request was sent to Iconclass through their application programming interface (API). The query result is a list of potential iconographies for the given keyword, which are provided with their identification code.
Overall, each keyword returned at least one iconography. These codes were then assembled for each title, reduced to their first three elements, corresponding to their broader category, counted and sorted in order to retrieve the most probable broad iconography for the given painting. Most titles present in the metadata are written in German, and a language detection library5 was used to detect other potential languages, such as Italian, English or French, and to query the Iconclass API accordingly. Because some German words are compound terms, a library for compound splitting6 was also used to split these keywords into pairs of keywords. As described in Fig. 6, religious iconographies dominate the dataset: more than half of the images portray the stories of saints, followed by the public life of Christ and representations of the Virgin Mary. In order to balance this predominance of Saints iconographies, these paintings went through a second query process with Iconclass. This time a fourth degree of specificity was included when sorting and counting the most common iconography output for each title. The results in Table 1 show that the Saints category was reduced by about 10 percentage points to the benefit of a new category corresponding to Female saints.7

Occurrence of the main iconographies from the Iconclass system found in the dataset.
Table 1. Occurrence of the ten most common iconographies with third and fourth levels.
Iconography name | Occurrence at 3rd level (%) | Occurrence at 4th level (%) |
---|---|---|
Saints (11H) | 55.97 | 45.63 |
Female saints (11HH) | None | 9.22 |
Public life of Christ: from his baptism until the Passion (73C) | 7.07 | 7.08 |
The Virgin Mary (11F) | 4.90 | 4.91 |
Male persons from classical history (98B) | 2.89 | 3.56 |
Passion of Christ (73D) | 2.51 | 2.51 |
(scenes from the life of) John the Baptist and Mary (73A) | 1.94 | 1.94 |
The Greek heroic legends (II): heroes (95A) | 1.47 | 1.47 |
Genesis: the patriarchs (71C) | 1.07 | 1.39 |
Birth and youth of Christ (73B) | 1.25 | 1.25 |
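The counting step that turns a list of candidate Iconclass codes into one broad iconography per title can be sketched as a majority vote over truncated codes. The function below is an illustration under assumptions: the candidate codes are taken as already retrieved (no API call is shown), bracketed text such as a saint's name is dropped before truncation, and the example notations are illustrative rather than actual query results.

```python
from collections import Counter
from typing import Iterable, Optional


def broad_iconography(candidate_codes: Iterable[str], depth: int = 3) -> Optional[str]:
    """Most frequent code after truncating every candidate to `depth` characters."""
    truncated = [code.split("(", 1)[0][:depth] for code in candidate_codes]
    truncated = [code for code in truncated if code]
    if not truncated:
        return None
    return Counter(truncated).most_common(1)[0][0]


# Illustrative candidates for the keywords of one title.
title_codes = ["11H(JOHN)", "11HH(MARY MAGDALENE)", "73C12", "11H(PETER)1"]
third_level = broad_iconography(title_codes, depth=3)

# Second pass with a fourth degree of specificity, here on a Saints painting.
saints_codes = ["11HH(MARY MAGDALENE)", "11HH(CATHERINE)", "11H(JOHN)"]
fourth_level = broad_iconography(saints_codes, depth=4)
```

At depth three, a code such as 11HH collapses into 11H, which is why the Female saints category only appears once the fourth level is taken into account.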
2.5.2 Creating combinations of hands
The different hands were then combined based on their corresponding cluster and iconography in order to produce a network. The overall process toward the creation of such a network, described in Fig. 5, is based on the co-occurrence of clusters in the same painting, which are then grouped based on their iconography. The initial creation of the hand combinations requires the following definitions:
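$P$ as the set of paintings in the corpus, where $P = \{p_1, \dots, p_N\}$ and $N$ is the number of paintings
$H_i$ as the set of hands detected in each $p_i$, with $H_i = \{h_{i,1}, \dots, h_{i,M_i}\}$, and $h_{i,j}$ being the jth hand detected in $p_i$.
$C_i$ as the set of clusters, where $c$ in $C_i$ is the cluster of a hand $h_{i,j}$ with $h_{i,j} \in H_i$
Pairs as the list of all hand combinations found
Then, the set of clusters is sorted and a list of unique pair combinations of clusters is created for each painting, such that:
for each painting $p_i$ in $P$ do
    $a \leftarrow 0$
    while $a < |C_i| - 1$ do
        $b \leftarrow a + 1$
        while $b < |C_i|$ do
            append $(C_i[a], C_i[b])$ to Pairs
            $b \leftarrow b + 1$
        end while
        $a \leftarrow a + 1$
    end while
end for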
With this process, we are able to show the most common combinations of cluster pairs, as shown in Fig. 7. Finally, each painting is linked to its corresponding iconography. Based on graph theory, all pair combinations for the paintings belonging to the same iconography are then merged, such that vertices correspond to clusters and edges to their co-occurrence within a painting. The occurrences of each pair of clusters are counted for each iconography in order to produce a weighted graph. As we can see in Fig. 5, the graph expresses co-occurrences of types of hand gestures based on iconography.

Most common cluster combinations with left hands flipped and categories of symbolic gestures.
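The pairing and weighting steps of this section can be sketched with the Python standard library. The sketch assumes that each painting is given as an iconography code together with the labels of its hands (cluster numbers as strings, or symbolic category names); the function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cluster_pairs(labels):
    """Unique, sorted pairs of the distinct clusters present in one painting."""
    return list(combinations(sorted(set(labels)), 2))


def weighted_edges(paintings):
    """Map each iconography to a Counter of {(cluster_a, cluster_b): co-occurrences}."""
    graphs = defaultdict(Counter)
    for iconography, labels in paintings:
        graphs[iconography].update(cluster_pairs(labels))
    return graphs


# Toy input: (iconography code, labels of the hands found in the painting).
toy_paintings = [
    ("11H", ["3", "pointing index", "3", "12"]),
    ("11H", ["12", "3"]),
    ("73C", ["fist"]),
]
graphs = weighted_edges(toy_paintings)
```

Sorting the distinct labels before pairing guarantees that a combination is always stored in the same order, so that the same two clusters are never counted as two different edges. A painting with a single type of hand contributes no edge.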
2.5.3 Building a network graph
A network graph was built with the Pyvis library8 and the visJS framework for web rendering with JavaScript (Perrone, Unpingco and Lu 2020). In this network, each node corresponds to a cluster, and each weighted edge represents the number of times two clusters occur together in the same painting, per iconography, as shown in Fig. 5. Additionally, a close study of these networks for each iconography is made possible through the implementation of an interactive graphic interface. First, we built the network in Python, adding each node and weighted edge to the network object. The Pyvis library generates an HTML page, which we then transformed into an HTML template that can be reused with any network. Fig. 8 shows the simple structure chosen for the interactive interface. In the center of the page, the network is displayed, where each node is represented by the centroid image of its cluster, whose size correlates with the cluster size. The color of the edges corresponds to the iconography they belong to, and their width to the number of times the pair of clusters appears. Nodes can be dragged and moved around. On the left-hand side, a checkbox list shows all the iconographies encountered in the dataset, sorted according to the Iconclass ranking system. In order to improve the readability of the displayed network graph, the depth of the Iconclass classification was once again evaluated. The distinction between Saints and Female saints was kept, as it is considered an important piece of information for a potential future analysis of male/female differentiation, but the iconographies represented by fewer than ten paintings were updated to their parent class, corresponding to the second degree of classification. With this process, the total number of iconographies was reduced from eighty-eight to fifty-seven.
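The merging of rare iconographies into their parent class can be sketched as follows, assuming that each painting carries one Iconclass code and that the parent at the second degree is obtained by keeping the first two characters of the code. The function name and the toy codes are hypothetical.

```python
from collections import Counter


def merge_rare_iconographies(painting_codes, min_paintings=10, parent_depth=2):
    """Replace codes carried by fewer than `min_paintings` paintings with their parent."""
    counts = Counter(painting_codes)
    return [
        code if counts[code] >= min_paintings else code[:parent_depth]
        for code in painting_codes
    ]


# Toy corpus: one frequent iconography and two rare ones.
toy_codes = ["11H"] * 12 + ["95A"] * 3 + ["98B"] * 2
merged = merge_rare_iconographies(toy_codes)
```

In this toy corpus, 11H is kept as is, while 95A and 98B fall back to their parents 95 and 98.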
One or several iconographies can be selected; in the latter case, the color of the edges in the network graph helps differentiate the iconographies displayed. Thus, as seen in Fig. 9, it is possible to combine networks for a broader category type, to compare networks for different categories, or to study in more detail the combinations from a specific iconography.


Detail of a network made of a sub-selection of iconographies, from which: material aspects of daily life (in purple), social and economic life, transport and communication (in turquoise), crafts and industries (in red) and the arts; artists (in light blue).
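The visual encoding described in this section (centroid image and size for the nodes, color and width for the edges) can be expressed as plain dictionaries that follow the vis.js naming of node and edge options. This is a sketch only: the scaling factor, the minimum size, the file paths and the palette are hypothetical, and the code stops short of calling any rendering library.

```python
def node_spec(label, centroid_image, cluster_size, scale=0.05, min_size=10):
    """One node per cluster, drawn as its centroid image and sized by cluster size."""
    return {
        "id": label,
        "label": label,
        "shape": "image",
        "image": centroid_image,
        "size": max(min_size, cluster_size * scale),
    }


def edge_spec(pair, count, iconography, palette):
    """One edge per cluster pair and iconography; `value` drives the edge width."""
    source, target = pair
    return {
        "from": source,
        "to": target,
        "value": count,
        "color": palette[iconography],
        "title": f"{iconography}: {count}",
    }


# Hypothetical palette and data for two clusters of a single iconography.
palette = {"11H": "#7b3fa0"}
nodes = [
    node_spec("8", "centroids/8.png", 400),
    node_spec("pointing index", "centroids/pointing_index.png", 120),
]
edges = [edge_spec(("8", "pointing index"), 7, "11H", palette)]
```

With Pyvis, such dictionaries could then be handed to a `Network` object through its `add_node` and `add_edge` methods, which forward extra keyword arguments to vis.js. Keeping the encoding separate from the rendering makes it easy to regenerate the network for any selection of iconographies.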
3. Results and discussion
3.1 Clusters
In both geometric and gestural clustering types, we can see that there is an unbalanced distribution of hands among the clusters. With a closer look at the content of these clusters, as represented in Fig. 4, we are able to see that more complex hand poses are found in smaller clusters, whereas large clusters present less complex positions. A first explanation can be found in the detection capability of the machine learning model used, which may operate better on less complex configurations made of hands lying flat with all the fingers distinctly visible. The performance of the model therefore has to be taken into account when analyzing the results, as it affects any statement derived from them.
Beyond the computational bias, another hypothesis to explain the correlation between the complexity of the hand represented and the proportion of hands found is that the most represented types of hands in early modern paintings are indeed simpler hand positions. This idea of a majority of simpler hand gestures contradicts theories about the great variety of complex gestures produced in the Renaissance period (Wittkower 1977; Barasch 1987; Dimova 2020), as well as the emphasis of historical sources and the many preparatory studies on the morphology and the anatomy of the hand (Dimova 2020). Nevertheless, simple might not be the right adjective to describe these hands and, from a historical perspective, they might as well be described as more natural expressions of the hand, in opposition to symbolic signs. Indeed, after a renewed interest in antique texts towards the end of the Middle Ages in Europe (Schmitt 1990), as well as a renewed interest in gestures (Burke 1992), the Renaissance period saw the emergence of new types of pictorial representations and, with them, the creation of new forms of gestural expressions to communicate these new narratives (Warburg 2017). This phenomenon of more natural expressions was also noticed in the early 14th century works of Giotto (Barasch 1987) and is explained by the multiplicity of sources of inspiration for the production of these hand poses (Chastel 1986).
A last hypothesis to explain these most common hands is linked to the prevalence of religious iconographies among the dataset used. Many well-known symbolic hand gestures involve hands joined in prayer, as well as hands on chests (Dimova 2020). This gestural code was also used in various allegoric combinations found in the breviary Iconologia of Cesare Ripa in 1593 and might have been an important source for numerous painters at the time (Dimova 2020). This last hypothesis is in fact hereafter confirmed with the addition of symbolic categories of hands to the clusters. The process removed symbolic hands known in art history from their original clusters and reassigned them to symbolic categories as seen in Fig. 4. When comparing the graphs 4a and 4b, we can see how clusters 1 and 9 are significantly reduced in favor of the populating of the classes joint palms praying and hand on chest.
Finally, the high population seen in pointing index, the fourth most important class, could be seen as an indicator of the narrative complexity at play in early modern paintings, which fosters a need to guide the viewer (Dimova 2020). The mechanism of the pointing gesture to direct the gaze, also known as designation, was already prevalent in the Middle Ages (Garnier 1989, 1996; Bell, Schlecht and Ommer 2013) and its efficiency was confirmed by early modern humanists (Alberti 1956). On the other hand, the prevalence of the opened hand forward, the sixth most important class, which represents a speaking gesture, confirms the re-appropriation of rhetoric in early modern times (Schmitt 1990) and the need to express speech. The gesture seems specific to that time period, as the opened hand carried a different meaning in medieval times (Garnier 1996), and might represent an iconic gesture of early modern paintings.
3.2 Flipping left hands
The results for the occurrence of hands in each cluster in Fig. 4a reveal that most clusters present a majority of either left or right hands. With a closer look at the hands present in these clusters, we realize that they have similar hand poses but not similar hand gestures. For example, a left hand in cluster 2 will have the same orientation as a right hand from that same cluster, as shown in Fig. 10. However, where the first shows the palm to the spectator, the other reveals the back of the hand. Therefore, the two hands present the same orientation features, as well as the same shape of the fingers, but do not represent the same gesture. The left hand in 10a most probably represents an opened hand up, known as a sign of adoration and veneration of God (Dimova 2020), whereas the right hand in 10b portrays a hand on chest, generally thought of as an act of self-designation (Dimova 2020). Although we do not aim to reproduce the categorization already proposed in art history, the knowledge on symbolic hand gestures serves as a guide towards a proper clustering of hands according to the potential gesture they portray, whether symbolic or functional.

(a) Left hand found in cluster 2, (b) Right hand found in cluster 2, before flipping left hands
The fact that these two hands are clustered together is not wrong in a formal sense, and produces what we introduced as a geometric pattern. Yet, by horizontally flipping all the left hands, we obtained in Fig. 4b a much more balanced hand type distribution among the clusters, showing the use of specific hand gestures independently from the hand types. In this new context, we increase the chances of grouping together similar hand gestures, as seen with the left and right hands found in cluster 8 in Fig. 11, which are both holding the body of a child. Although the horizontal orientations no longer seem to match visually, the vertical orientation as well as the shape of the fingers are coherent and both hands show the same rotation, thus fostering the analysis of gestural patterns.

(a) Left hand found in cluster 8 after flipping left hands, (b) Right hand found in cluster 8 after flipping left hands.
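The flipping step described above can also be expressed on keypoint coordinates rather than on pixels. The snippet below is a minimal sketch, assuming each hand is stored as a list of (x, y) keypoints normalized to a [0, 1] bounding box and tagged with a `side` label. The function name `mirror_left_hand`, the data layout, and the normalization are assumptions made for illustration, not the implementation used in this study.

```python
from typing import List, Tuple

Keypoints = List[Tuple[float, float]]


def mirror_left_hand(keypoints: Keypoints, side: str) -> Keypoints:
    """Mirror a left hand horizontally; return right hands unchanged.

    Assumes x coordinates are normalized to [0, 1] within the hand's
    bounding box (a hypothetical convention, not taken from the paper).
    """
    if side != "left":
        return list(keypoints)
    # Reflect across the vertical axis x = 0.5. The y values are untouched,
    # so the vertical orientation of the hand is preserved.
    return [(1.0 - x, y) for x, y in keypoints]


# Toy example with three keypoints instead of a full hand skeleton.
left = [(0.2, 0.9), (0.4, 0.5), (0.7, 0.1)]
flipped = mirror_left_hand(left, "left")
```

Because only the x coordinate is reflected, the vertical orientation and the shape of the fingers remain comparable across hand types, which is the property the clusters obtained after flipping rely on.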
These newly shaped clusters based on features from flipped left hands reveal a balanced distribution of hand types. Nevertheless, the assumption that no specific hand gesture belongs to a specific hand type has to be taken with caution. The benedictio hand gesture, for example, is known to be executed only with the right hand (Bernasconi, Cetinić and Impett 2023). In our original clustering, many of these benedictio gestures were actually found with hands up in Fig. 12, which shows the imperfection of these clusters and the potential need to increase the number of clusters for a finer gestural categorization.

(a) Benedictio hand gesture found in cluster 2, (b) a right hand up found in cluster 2, (c) a right hand up found in cluster 2.
The right number of clusters is a complex question that involves various parameters, including the proportion of images available to shape representative samples for each cluster, the degree of precision required for the analysis, and the readability of the results. The clusters have to be approved through visual inspection, which corresponds to choices based on personal judgment and knowledge of hand gestures, and further emphasizes the inherently anachronistic and subjective nature of the work. Because we lack proper annotations of these hands from early modern humanists, the process implies a contemporary apprehension of these hands constrained by a contemporary culture and use of hand gestures.
3.3 Visualizing hand combinations
In this specific digital art history context, the format of a heatmap graph would quickly show its limits. Clusters are represented by numbers and lack visual information about their content, and the number of hand combination pairs is devoid of circumstances, such as the type of painting they belong to. Because it is indeed difficult to perform a proper visual analysis through statistical numbers only, the potential iconography for the paintings was added and an alternative model for the visualization of the data, an interactive network graph, was considered. The graphical interface as we implemented it offers two advantages: the possibility to select a network based on the type of iconography, and thus to understand the types of hands most commonly used in specific contexts, and the use of the centroid image of each cluster in order to directly visualize the types of hands in question. Furthermore, the size of the nodes corresponds to the amount of data present in the cluster, which allows us to keep track of the primary biases of the machine learning model for body pose estimation.
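The data behind such a network can be sketched as follows. This is a minimal illustration, assuming each painting is reduced to an iconography label and the list of cluster ids of its detected hands. The record layout, the labels, and the function name `build_network` are hypothetical, and node sizes are counted here within the selected iconography, which is only one possible reading of the description above.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

# Each record is (iconography label, cluster ids of the hands detected
# in one painting). Labels and ids below are made up for illustration.
Painting = Tuple[str, List[int]]


def build_network(paintings: Iterable[Painting], iconography: str):
    """Return (node_sizes, edge_weights) for one iconography."""
    node_sizes: Counter = Counter()
    edge_weights: Counter = Counter()
    for label, clusters in paintings:
        if label != iconography:
            continue
        node_sizes.update(clusters)
        # Every unordered pair of hands in the painting adds one to the edge
        # between their clusters; sorting makes (2, 9) and (9, 2) identical.
        for a, b in combinations(sorted(clusters), 2):
            edge_weights[(a, b)] += 1
    return node_sizes, edge_weights


paintings = [
    ("Saints", [9, 9, 2]),
    ("Saints", [9, 2]),
    ("Portrait", [4, 2]),
]
nodes, edges = build_network(paintings, "Saints")
```

In this sketch, two hands from the same cluster in one painting produce a self-loop such as `(9, 9)`. Whether such pairs should be displayed is a design choice that the sketch leaves open.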
Yet, outlining specific trends through the network system remains difficult. The main reason is that what these clusters represent is a proposition of a potential hand gesture, not a specific sign. Hence, many hand gestures are found for all iconographies, either because they are the most commonly found hand types or functional gestures, or because the cluster encompasses too many hand gestures that should be separated into smaller clusters.
After a first series of experiments and analytical attempts on the combination of hands, and in order to reduce the aforementioned problems, it was decided to differentiate well-known symbolic hands from undefined ones, also called functional gestures. Through this process, we are able to better perceive the diversity of hands that have not yet been addressed in art history and to determine their potential associations with symbolic ones and their iconographic circumstances. With the comparison in Figs 7 and 13 of the most common hand combinations with and without the differentiation of symbolic hands, we can see the dominance of hands used for praying gestures, which were mostly found in cluster 9. This observation is a strong indicator of the predominance of religious content in the collection used, as well as the manifest use of this gesture for religious expression. These findings are further confirmed by the visual inspection of the network, where non-religious iconographies do not present joint palms praying but rather other symbolic gestures such as pointing indexes and opened hand forward, as shown in Fig. 9. Additionally, the analysis of the network shows that symbolic hands are most likely to be found in religious iconographies, whereas non-religious iconographies will present more functional hand gestures. The fact that most hands in non-religious contexts were not properly documented in art history so far could also potentially reveal a tendency toward a greater art historical concern with religious art, partially explained by the important knowledge on symbolic gestures produced in medieval times (Schmitt 1990). It might also be that in the sacred context such gestures are less ambiguous and easier to interpret, as many symbolic aspects have been preserved over time through religious practice.

Overall, we must acknowledge the fact that these results are greatly influenced by the imbalanced distribution among the clusters, as seen in Fig. 4, which is itself biased by the types of hands that were most commonly detected by the OpenPose model, as described in section 3.1. Note that for the hand combinations, absolute values were taken into account, and the results inevitably tend to showcase the biggest clusters. Relative values, however, did not show significant differences, because of the large number of smaller clusters.
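The difference between absolute and relative values can be made concrete with a toy computation. The text does not specify how relative values were derived, so the normalization below (dividing each pair count by the product of the two cluster sizes) is an assumption chosen for illustration, as are the function name `relative_weights` and all numbers.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]


def relative_weights(pair_counts: Dict[Pair, int],
                     cluster_sizes: Dict[int, int]) -> Dict[Pair, float]:
    """Divide each pair count by the product of the two cluster sizes.

    This is one possible normalization, assumed here for illustration.
    """
    return {
        (a, b): count / (cluster_sizes[a] * cluster_sizes[b])
        for (a, b), count in pair_counts.items()
    }


# Toy numbers: clusters 9 and 2 are large, clusters 3 and 5 are small.
cluster_sizes = {9: 100, 2: 50, 3: 4, 5: 5}
pair_counts = {(2, 9): 40, (3, 5): 4}

relative = relative_weights(pair_counts, cluster_sizes)
top_absolute = max(pair_counts, key=pair_counts.get)
top_relative = max(relative, key=relative.get)
```

With these made-up numbers, the absolute ranking favors the pair of large clusters, while the relative ranking favors the pair of small clusters. A handful of co-occurrences is enough to give a small cluster a high relative weight, which is one reason why relative values can be hard to interpret when many clusters are small.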
4. Conclusion
We introduce in this paper a new visualization system for hand combinations in early modern paintings. The interactive network graph is highly sensitive to different critical steps, such as the definition of visual features for the images of hands and the right number of clusters, as well as the degree of depth in the iconographic classification system used. All these steps represent a trade-off between the readability and manual assessment of the results at each progression, as well as the granularity produced for the network graph.
Each of these steps also relies on the computational bias of the primary detection model used for the recognition of the hands in the paintings. Hence, from the very first phases, we are constrained by the number of hands that the machine was able to detect, which raises the question of how to engage with all the ones we cannot see. Indeed, if only two hands are detected in an image with five people, the final results illustrate only a small portion of what can actually be found in the painting. The use of pairs of hands instead of larger units for combinations is a way to alleviate the disparity produced by the machine learning model, but does not offer any sort of analytical perspective on the missing hands.
Another issue induced by the model, as well as by the size of the original dataset, is the proportion of hands detected. In order to better outline trends and allow researchers to make claims based on the results, more data would be needed. The possibility to compare the network with a synchronic, yet culturally divergent dataset, such as a collection of Flemish paintings, could also be an opportunity to further characterize the specificities of a collection in relation to its iconography and geography, as well as to understand the degree of impact of the computational bias on the resulting graph. Nevertheless, this iconography-centric methodology is based on western early modern representations and presupposes the use of a corpus encompassing this constraint.
Overall, the process of creation of a new visual framework and its analysis is limited by the accuracy of machine learning models for body pose estimation on paintings. There is a crucial need to enhance the accuracy of these models through fine-tuning methods and the production of large training datasets for artistic material.
The process we present also shows the complexity of finding proper representations of the results at each step. Portraying visual content through numbers only is very restrictive and such a great level of abstraction does not offer an appropriate analytical environment. As we have seen, textual description of the additional categories from Dimova seems to bring more clarity. Yet, the question of the textual description also represents a trade-off between considerations of length and precision of the text, and further addresses issues on the proper way to express and describe hand poses through common language. This issue was also raised by Dimova and is inherent to the study of hand gestures (Dimova 2020). Where some hand poses already benefit from naming conventions established over time in art history, which implies cultural appropriation, some others do not have strict significations and will depend on the context. We end up with gestures that are difficult to describe through their meaning, or that are also difficult to describe with words. There is therefore a disparity in naming conventions that introduces ambiguities and does not ease their study. A proper combination of images and written descriptions as it was initiated in the work of Dimova and implemented here should be further investigated to overcome this problem.
Therefore, the interactive network graph is a basis for future research tracks and demonstrates the importance of new modes of data visualization and narration for art historical material, especially with large data focusing on a pictorial detail.
More specifically on the notion of the pictorial detail and the art historical viewpoint, the process offers a very global perspective on hands. The automated detection system goes beyond what has already been seen and studied in art history. It introduces the possibility to focus on the detail of the hand at large scale, encompassing all types of hands, whether they are symbolic or natural gestures. Nevertheless, with this approach, we deviate from the singularity of the detail as theorized by the art historian Daniel Arasse (Arasse 1992; Longo 2022). Traditionally, art historical analysis of the detail starts from the painting, where the approach implies that the detail holds specific information that is not found elsewhere. The computer also allows one to start from the detail, but the latter is directly extracted from its milieu and put in comparison with other details, blurring the specificities of a particular hand in the mass of the corpus. Although the computational process represses the singularity of the detail, it seems to be the appropriate approach when considering the painted hand as a language. Languages are culturally bound communication technologies with norms of usage (Mufwene 2019). We therefore need to identify these norms and constitutive cultural factors, a process that involves a global comparative study, with a form of contextualization.
The attempt to contextualize these hands based on iconography puts the detail in relation to the overall meaning of the document in its historical context. Yet, this iconographic approach only uses a small portion of the painting as a historical piece of evidence, and we rely on interpretive strategies from former art historians who labeled these artworks and shaped the Iconclass system. A more complex system involving other aspects in relation to the historical documents has to be envisioned, such as the time period and origin of the painter.
Because of the great variety of perspectives offered by hands in early modern paintings, many enhancements for this interface can be imagined. A first extension to be considered is the possibility to display all the hands from a cluster of a specific iconography when hovering over a node. The possibility to access the content of the clusters would not only help to better understand the specific gestures represented, but would also bring more transparency regarding the primary detection bias. Another addition to the present work would be access to the original artworks presenting specific hand combinations. Access to the original painting remains essential to better contextualize hands, as the reading of hand gestures and their meaning depends on multiple contextual factors that are partially missing in the present configuration. Such factors include the character performing the hand gestures and the position of the hand in relation to their body and to the other characters present in the composition. Ultimately, we believe that such computational approaches work best in combination with a more detailed analysis of the content of these clusters and their manual curation. Hence, the clusters proposed through the unsupervised method should undergo a refinement, with more precise categories of specific hand poses, to ease the reading of the graph and to distinguish potential uncommon combinations from extensively used ones. It is through this back and forth between the global or distant approaches of the computational methods and a closer manual analysis of these results that we can advance research on the understanding of hand gestures in art.
As we already started to demonstrate here, the production and use of such an interface, as well as the enhancement of the visualization system with a close reading of the results at each step, are essential aspects to deepen primary knowledge brought by traditional art historical methods. By providing a maximum of visual and contextual information to researchers in a clear manner, as well as including knowledge acquired along the process, our methodology has the potential to fully and positively change the study of western early modern hand gestures in art history and to allow a refinement of our comprehension of the core subject at stake, the painted hand.
Author contributions
Valentine Bernasconi (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Validation, Visualization) and Leonardo Impett (Conceptualization, Supervision)
Notes
Gestural chords, originally called ‘accords gestuels’ in French, is a descriptor used by Dimova in her book ‘Le langage des mains dans l’art’ (Dimova 2020). It refers to a set of gestures usually used together in the same composition to support the narrative.
The visualization system has been made available on the online platform https://vbernasconi.github.io/.
The NLTK library was used to this end. https://www.nltk.org/api/nltk.tag.pos_tag.html.
Note that although Saints is numbered 11H and Female Saints 11HH, Female Saints is not a subcategory of the first. In the Iconclass system, both categories are at the same level and the Saints category only holds subcategories about male saints.
References
Appendix A. Hands projection

Map of painted hands using t-SNE for dimensionality reduction based on the scaled keypoint features, with labels from the K-Means algorithm represented by different colors. In order to make the graph more readable, only one third of the dataset is represented.