Han Xu, Jinghang Gu, Kanglong Liu, Qinyi Li, Can professional interpreters truly convey the speaker’s sentiment? Exploring the potential of a computational approach, Digital Scholarship in the Humanities, 2025, fqaf017, https://doi.org/10.1093/llc/fqaf017
Abstract
This study investigates whether sentiment analysis, a natural language processing technique, can be used to examine accuracy in interpreting. The data were obtained from a parallel bidirectional corpus of original speeches delivered at the United Nations and their simultaneous renditions provided by professional interpreters. Specifically, this study explores how much sentiment can be conveyed across languages via accurate renditions, how interpreting direction affects the conveyance of sentiment, and how sentiment analysis may help with accuracy assessment. The results show that the sentiment orientation and distribution expressed in the source text can be largely projected into the target text via accurate renditions. This finding confirms the validity of using translational language to create cross-lingual sentiment analysis tools. It also reveals the potential of integrating sentiment analysis into automated interpreting quality assessment frameworks. In addition, this study shows that the amount of sentiment conveyed in each direction seems to vary, suggesting that directionality has an impact on the emotional tone being communicated by the interpreters.
1. Introduction
In cross-lingual communication, interpreting services are essential to enable effective communication between parties who do not share a common language. Interpreters play a vital role in facilitating mutual understanding by conveying messages accurately. To this end, professional codes of ethics for interpreters worldwide emphasize accuracy as a critical indicator of interpreting quality (Hale 2007). For example, the International Association of Conference Interpreters (AIIC) states that ‘interpreters shall strive to translate the message to be interpreted faithfully and precisely’ (AIIC 2022: 3). The Code of Ethics of the Australian Institute of Interpreters and Translators (AUSIT) identifies accuracy as an ethical principle, which requires that ‘interpreters and translators use their best professional judgement in remaining faithful at all times to the meaning of texts and messages’ (AUSIT 2012: 5). The Code further explains that achieving accuracy means ‘optimal and complete message transfer into the target language preserving the content and intent of the source message or text without omission or distortion’ (AUSIT 2012: 5).
While it is critical for interpreters to fulfil this ethical obligation, the codes of professional ethics do not seem to recognize the difficulty involved in producing accurate renditions (Jacobsen 2003; Hale 2007). In practice, interpreters may not always achieve accuracy, as it is a highly demanding task (Gile 1995; Hale 2007; Seeber and Zelger 2007; Xu 2022, 2024). As existing studies show, accuracy is a complex notion, and its achievement is influenced by a wide range of factors. These include the interpreter’s own attributes, such as level of training, experience, understanding of the professional role, and professional competence (Cheung 2007, 2016; Liu and Hale 2018). Empirical evidence suggests that professionally trained interpreters tend to outperform untrained ad hoc interpreters and learners in producing accurate renditions (Liu and Hale 2018; Stachowiak-Szymczak and Korpal 2019; Xu 2021, 2022; Hale et al. 2022a). This is because, through years of training and experience, professional interpreters tend to develop a better understanding of their ethical role and have more linguistic resources for effective coordination, allowing them to convey messages across languages successfully. The achievement of accuracy is also related to various external factors, such as directionality, working conditions, interpreting users’ expectations, institutional constraints, and the specific requirements of each interpreting setting (Hale 2007; Xu, Hale, and Stern 2020; Xu 2021). For instance, accuracy may carry different connotations in different interpreting settings. In court interpreting, accuracy means that interpreters should convey not only what is said but also how it is said by the speaker, as both the propositional content and the pragmatic force of the message may reveal the speaker’s character and credibility in a legal context (Hale 2004; Liu and Hale 2018). 
Therefore, it is crucial for court interpreters to adhere strictly to the accuracy norm without making any unjustified additions, omissions, or distortions. This approach includes maintaining features of the source speech, such as the speaker’s tone, hesitations, pauses, hedges, and false starts, in the rendition (Hale 2004; Liu and Hale 2018). By comparison, when working in a conference or business setting, interpreters are often expected to improve on the speaker’s style by omitting self-corrections or hesitations so that the interpreted speech is smooth and fluent (Hale 2007).
Notably, the assessment of accuracy has been a crucial step in previous research, as its results reflect the interpreter’s ability to achieve accuracy. The way in which accuracy is assessed also has important practical implications for the interpreting profession, informing training, certification, and recruitment processes (Han 2022). A common approach to assessing accuracy is error-based analysis, which involves identifying and categorizing various types of interpreting errors, including omissions, additions, and distortions (Gile 1999, 2003, 2011; Napier 2004; Turner, Lai and Huang 2010). This approach relies primarily on human assessors’ subjective evaluations, which are often time-consuming and labour-intensive. Han (2022: 41) argued that an ideal assessor should have relevant experience in practising, learning, teaching, and assessing interpreting. Yet such qualified assessors may not always be available. Moreover, multiple assessor-related factors, such as fatigue, time pressure, inter-assessor disparity, inconsistent attention span across assessment tasks, and order of assessment, may affect the assessment results (Liu 2013; Shlesinger 1994). It is therefore necessary to explore methods that assess accuracy in a more systematic and objective manner, both to increase reliability and rigour and to corroborate the results of existing human-based accuracy assessment approaches.
Against this research background, this study explores the possibility of using sentiment analysis, a natural language processing technique, to examine accuracy in interpreting. Based on a parallel bidirectional corpus of original speeches delivered at the United Nations (UN) and their simultaneous renditions by highly professional interpreters, it investigates how much of the sentiment expressed in the source text can be conveyed by the interpreters, how interpreting direction affects the transfer of sentiment across languages, and how the results of sentiment analysis may help to assess accuracy in interpreting. Following this introduction, Section 2 reviews existing literature on accuracy in interpreting. Section 3 introduces the concept of sentiment analysis. Section 4 describes the corpus design, compilation, and data analysis methods. Sections 5 and 6 present the results and discussion, respectively. Section 7 concludes the study by summarising the key findings and pointing out future research directions.
2. The challenge of achieving accuracy in interpreting
The concept of accuracy has been widely examined in interpreting studies, as its conceptualization largely determines the interpreter’s practice approach (Pöchhacker 2004/2022). A key concern within this line of inquiry is defining what constitutes accuracy in the context of interpreting. While many theoretical frameworks have been proposed, most researchers concur that the lexicosemantic interpreting approach, which concentrates on finding source-target correspondence at the lexical or semantic level, is not sufficient (Gile 1992, 1995; Hale 2004, 2007; Seeber and Zelger 2007). Interpreters need to consider the pragmatic function of the source message and understand the ‘text as discourse rather than as words or sentences strung together’ (Hale 2007: 23). Gile (1992, 1995), for instance, argues that an accurate rendition conveys both the informational content and the style of the message. Seeber and Zelger (2007: 290) view accuracy as a ‘truthful rendition’, which means interpreters should convey three principal message components: verbal, semantic, and intentional. However, there are situations where the three levels are not congruent. In such cases, interpreters need to assess the weight of each level and prioritize the information that should be conveyed to achieve accuracy. Misjudging the weight of each component may lead to renditions that seem accurate at the semantic level but fail to convey the speaker’s communicative intention. Similarly, Hale (2004, 2007) distinguishes between semantic and pragmatic interpreting approaches. A semantic rendition only aims to be accurate at the semantic level; it may appear ‘correct’ on the surface but fail to capture the original intention and illocutionary force of the source text. 
The pragmatic interpreting approach, on the other hand, not only maintains the propositional content of the source text but also creates the same communicative effect as the source message does.
The many proposed interpreting approaches are undeniably useful in guiding interpreters’ practice from a linguistic perspective. However, achieving accuracy in real practice is much more complicated. Considering only the linguistic components of accuracy can hardly capture its dynamic nature, given that factors beyond the linguistic sphere affect its achievement. Studies have shown that how interpreters perceive their role and how they understand the interpreting user’s intention affect their interpreting approach (Hale 2007; Hsieh 2007; Liu and Hale 2018; Xu 2021, 2024). Xu (2021), focusing on interpreted lawyer-client interviews in Australia, found that when interpreters assumed the role of the lawyer’s helper, they modified the client’s message based on their own understanding of the context. For instance, one interpreter intentionally omitted what he believed was ‘irrelevant’ information from the client in order to ‘help’ the lawyer ‘save time’. However, such a ‘mediated’ interpreting approach (Hale 2008), in which interpreters decide what should or should not be conveyed, was strongly criticized by the lawyer, as it prevented the lawyer from communicating directly with the client. By contrast, Seeber and Zelger (2007) reported a case in which several conference interpreters intentionally omitted offensive remarks made by a host towards a head of state. In this case, the interpreters’ practice was considered justified because the host could not plausibly have intended to insult a head of state. The interpreters’ unanimous omission therefore helped to avoid a potential face-threatening act towards a guest of honour, which was presumably in line with the host’s intention in that context.
Existing studies have thus shown that an accurate rendition should successfully convey the speaker’s intention across languages. Interpreters can infer the intentional component of a message from contextual manifestations of the speaker’s intention, such as the sentiment, attitudes, and emotions the speaker expresses. However, unless speakers directly inform interpreters of their intention, the interpreters’ understanding is always an assumption, which may not always be correct (Seeber and Zelger 2007). When there is a mismatch, it can be difficult for interpreters to achieve accuracy. Seen from this perspective, the subjective nature of the speaker’s intention adds yet another layer of complexity to achieving accuracy, which also makes accuracy difficult to assess.
3. Sentiment analysis
The present study proposes to use sentiment analysis to examine accuracy in interpreting. Sentiment analysis sits at the intersection of natural language processing, machine learning and computational linguistics (Buscemi and Proverbio 2024). It is the process of using computational methods to examine subjective information, such as opinions, appraisals, feelings, evaluations, and attitudes expressed in texts, sentences or words (Serrano-Guerrero et al. 2015; Taboada 2016; Liu and Lei 2018; Mäntylä, Graziotin, and Kuutila 2018). Since its emergence in the early 2000s, sentiment analysis has been known by many names, such as opinion mining, opinion extraction, or subjectivity analysis (Nasukawa and Yi 2003; Liu 2022). The aim of sentiment analysis is to identify the semantic orientation of language in use by classifying its sentiment polarity, that is, whether its emotional disposition is positive, negative, or neutral. Largely because it can identify subjective information in a systematic and automated way, sentiment analysis has been applied in different domains to address real-world problems. Its applications include analysing social media commentary, film reviews, or consumer feedback to evaluate public attitudes or to predict the outcomes of social events, such as political elections (Medhat, Hassan, and Korashy 2014; Wankhade, Rao, and Kulkarni 2022).
Conventionally, sentiment analysis is conducted via two approaches: lexicon-based and supervised machine learning-based methods. The lexicon-based approach relies on a sentiment lexicon to determine the sentiment polarity of a given text. The lexicon is a list of words that have already been categorized in terms of sentiment polarity and relative strength. A set of linguistic rules is often embedded in the lexicon to increase the accuracy of the analysis. Depending on the target of analysis, both domain-specific and general-purpose lexicons are available. Domain-specific lexicons are designed for texts in a particular field, while general-purpose lexicons can be applied across domains but may fail to recognize semantic features unique to certain domains or genres (Lei and Liu 2021; Mukhtar, Khan, and Chiragh 2018; Taboada et al. 2011). The supervised machine learning-based approach can be further divided into traditional machine learning methods and deep learning methods. Traditional machine learning methods mainly rely on classification techniques to identify the sentiment polarity of a given text. First, a classifier is built using a set of training data, which contains texts annotated by humans for sentiment polarity. The trained classifier can then be evaluated on held-out test data and applied to new, unlabelled texts to predict their sentiment polarity or score. Commonly used classification models include decision trees, random forests, support vector machines (SVMs), and logistic regression. The machine learning-based approach is highly effective for domain-specific sentiment analysis. However, a classifier trained for one domain may not perform well in other domains (Taboada 2016). Deep learning methods, on the other hand, use deep neural network architectures that automatically learn feature representations from raw input data (Prabha and Srikanth 2019). Ain et al. (2017) reviewed the use of deep learning techniques such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Deep Belief Networks (DBN) for tasks including sentiment classification, cross-lingual problems, and product review analysis. Deep learning models can learn complex, hierarchical representations of the input data, which can lead to superior performance compared to traditional machine learning methods, especially for large and complex datasets. However, they generally require more training data and computational resources (Zhang, Wang, and Liu 2018).
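To make the lexicon-based approach more concrete, the following Python sketch scores a sentence by summing the polarity values of the words it contains and applies one illustrative linguistic rule: a negator flips the polarity of the next sentiment-bearing word. The lexicon `TOY_LEXICON`, the negator set, and the functions `lexicon_score` and `polarity` are hypothetical and were not used in this study; real lexicons are far larger and their rules more elaborate.

```python
# Hypothetical toy lexicon: word -> polarity and relative strength.
# The values are invented for illustration only.
TOY_LEXICON = {
    "welcome": 2.0,
    "support": 1.0,
    "progress": 1.5,
    "concern": -1.0,
    "violence": -2.0,
    "crisis": -1.5,
}

# One illustrative linguistic rule: a negator flips the polarity
# of the next sentiment-bearing word that follows it.
NEGATORS = {"not", "no", "never"}


def lexicon_score(sentence: str) -> float:
    """Sum the lexicon values of the words in a sentence,
    applying the simple negation rule above."""
    total = 0.0
    negate = False
    for raw in sentence.lower().split():
        word = raw.strip(".,;:!?")  # drop trailing punctuation
        if word in NEGATORS:
            negate = True
            continue
        if word in TOY_LEXICON:
            value = TOY_LEXICON[word]
            total += -value if negate else value
            negate = False  # the negation is consumed by this word
    return total


def polarity(sentence: str) -> str:
    """Map the summed score onto a three-way polarity label."""
    score = lexicon_score(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `polarity("This is not progress.")` returns `"negative"`, because the negation rule flips the positive value of *progress*, while a sentence containing no lexicon entries is labelled `"neutral"`.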
More recently, pre-trained models, a new paradigm in natural language processing, have been increasingly adopted in sentiment analysis (Mathew and Bindu 2020). A pre-trained model is first trained on a massive amount of text, largely through self-supervised objectives that do not require manual labels, and can then be fine-tuned on a smaller set of labelled examples for a downstream task such as sentiment classification. BERT (Devlin et al. 2019), RoBERTa (Liu et al. 2019), and GPT (Radford and Narasimhan 2018) are all popular pre-trained models used for sentiment analysis. Notably, many pre-trained models are themselves Large Language Models (LLMs). LLMs are trained on vast datasets to analyse, understand, and produce human-like language, making them powerful tools in natural language processing (Wei et al. 2022). Compared to the conventional machine learning-based approach, LLM-based sentiment analysis tools tend to show a higher level of contextual understanding and robustness across domains. They can capture intricate language patterns and semantic nuances in new data with improved accuracy, reliability, and generalizability. At the same time, however, pre-trained models, including LLM-based sentiment analysis tools, may struggle with specialized terminology or domain-specific contexts that were underrepresented in their training data. Moreover, biases in the training data may also affect sentiment predictions.
Given the successful use of sentiment analysis across fields, there is an emerging research interest in extending its application to the study of languages (Taboada 2016; Wen and Lei 2022). However, most sentiment analysis tools are only available for resource-rich languages, such as English. For low-resource languages, which lack annotated corpora and sentiment lexicons, machine translation is often used to convert available sentiment resources into the target language, enabling cross-lingual sentiment analysis (Vilares, Alonso, and Gómez-Rodríguez 2017: 596; Wan 2008; Xu et al. 2022). This approach rests on the presumption that when translation is accurate, sentiment is conveyed across languages (Lu et al. 2011). However, the quality of machine translation varies across language pairs, which may affect the performance of cross-lingual sentiment analysis tools (Al-Shabi et al. 2017).
At the same time, some researchers have argued that, due to cross-cultural differences, inherent structural disparities between languages, and the contexts in which languages are used, parallel words, sentences, or even texts might not share the same sentiment orientation even when they are accurately translated (Demirtas and Pechenizkiy 2013; Ghorbel and Jacot 2011). This is understandable, as sentiment is a culturally and linguistically sensitive concept: a positive expression in one language may be perceived negatively in another linguistic context for cultural reasons (Lei and Liu 2021). Largely supporting this view, Chen and Zhu (2014) posited that texts carrying more sentiment are more difficult to translate. Demirtas and Pechenizkiy (2013) even argued that cultural differences have a greater impact than inaccurate machine translation on the performance of sentiment analysis tools. However, empirical evidence is lacking on how much sentiment can be conveyed via translation and how potential cross-lingual sentiment variation relates to accuracy.
Focusing on simultaneous interpreting, an oral form of translation, this study sets out to fill this research gap by exploring the extent to which sentiment can be conveyed across languages via accurate rendition and discussing its implications for interpreting accuracy assessment. The data were obtained from a corpus consisting of professional interpreters’ accurate renditions at UN Security Council meetings. Specifically, this study attempts to address the following research questions:
RQ1: How does the sentiment score of a speech vary when the speech is accurately interpreted into another language?
RQ2: If sentiment score variation is observed, how does interpreting direction affect the variation?
RQ3: How much sentiment can be conveyed across languages via accurate renditions?
4. The study
4.1 Data description
The data used in the present study were obtained from the UN Chinese-English Simultaneous Interpreting Corpus (UNSI) (Xu and Liu 2024). UNSI collects speeches given by Chinese and English speakers at UN Security Council meetings and their simultaneous renditions. UN Security Council meetings are high-stakes international meetings where delegates of different countries and regions discuss issues related to global governance, crisis management, and international cooperation. As the delegates speak different languages, simultaneous interpreting is provided to enable real-time communication. Given that miscommunication in such a setting can have serious consequences, highly competent professional interpreters are engaged to ensure the quality of interpreting services. These interpreters can access the scripts of the speeches beforehand to prepare for the assignment (Cheung 2019). Owing to UN interpreters’ professional competence and their thorough preparation, their renditions feature a very high level of accuracy. Since an accurate rendition includes a complete transfer of both the propositional content of the message and the speaker’s intention (Hale 2007; Seeber and Zelger 2007), sentiment is expected to be conveyed across languages.
The UNSI consists of four sub-corpora: original English speeches (NE) and their simultaneous renditions into Chinese (IC), as well as original Chinese speeches (NC) and their simultaneous renditions into English (IE). The English speeches were delivered by delegates from the United Kingdom, while the Chinese speeches were delivered by delegates from China. These speeches were sampled from onsite UN Security Council meetings that featured both British and Chinese delegates, focusing specifically on those that included significant discussions of international and regional affairs. Consequently, the original Chinese and English speeches form pairs, with each pair delivered at the same meeting and addressing the same agenda. This pairing ensures that the four sub-corpora are comparable in terms of genre, topic, and speaker identity. As these speeches reflect the delegates’ attitudes, opinions, and perceptions regarding various international and regional affairs, the UNSI is a suitable corpus for sentiment analysis. A total of 207 pairs of original speeches and their simultaneous renditions were collected from meetings held between 2021 and 2022. The transcripts of all the speeches in their original languages and renditions can be freely downloaded from the UN Digital Library¹. The accuracy of the transcription was manually verified against the audio files of these meetings. Details of the four sub-corpora are summarized in Table 1.
| Sub-corpus | No. of texts | No. of tokens | Source | Producer |
|---|---|---|---|---|
| NE | 207 | 100,137 | Original English speech | Native English speakers |
| IE | 207 | 143,498 | Interpreted English speech | Interpreters |
| NC | 207 | 189,866 | Original Chinese speech | Native Chinese speakers |
| IC | 207 | 161,992 | Interpreted Chinese speech | Interpreters |
4.2 Sentiment analysis
The present study employed two leading pre-trained and LLM-based sentiment analysis tools: multilingual BERT (Devlin et al. 2019) and Llama2 (Touvron et al. 2023). Multilingual BERT, developed by Google, is a pre-trained language model built on the transformer encoder architecture. Llama2, whose name derives from Large Language Model Meta AI, is a series of LLMs developed by Meta AI. It has been pre-trained on diverse datasets to enhance its generalization capabilities, making it suitable for tasks such as text classification. These two models were chosen for their strong adaptability to multilingual contexts, reliable performance across domains, and open-access availability (Xu et al. 2022; Zhao et al. 2024). To provide a fine-grained representation of the sentiment polarity expressed in the speeches (Lei and Liu 2021: 14), sentiment analysis was conducted at the sentence level using both models. The two pairs of parallel sub-corpora (NE vs IC and NC vs IE) were aligned at the sentence level, yielding 4,400 pairs of parallel sentences for NE and IC and 5,891 pairs for NC and IE. Multilingual BERT and Llama2 assign each sentence one of three sentiment polarities: positive, neutral, or negative. To facilitate subsequent statistical analysis, each polarity was assigned a numerical value: positive is 1, neutral is 0, and negative is −1. The overall sentiment of a text was calculated by averaging the sentiment scores of its sentences.
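The sentence-to-text scoring scheme described above can be sketched as follows. The sketch assumes that every sentence of a speech has already received one of the three labels from the sentiment model; the function name `text_sentiment` is hypothetical, and the label-to-score mapping is the one stated in the text.

```python
from statistics import mean

# Numerical value assigned to each sentence-level polarity label.
POLARITY_SCORE = {"positive": 1, "neutral": 0, "negative": -1}


def text_sentiment(sentence_labels: list[str]) -> float:
    """Return the overall sentiment of one text as the mean of its
    sentence-level scores (ranging from -1 to 1)."""
    if not sentence_labels:
        raise ValueError("a text must contain at least one labelled sentence")
    return mean(POLARITY_SCORE[label] for label in sentence_labels)
```

For instance, a speech whose four sentences are labelled positive, neutral, positive, and negative receives a score of (1 + 0 + 1 − 1) / 4 = 0.25. Averaging keeps scores comparable across speeches of different lengths.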
To test the accuracy and reliability of the sentiment analysis results, 200 sentences were randomly selected from each sub-corpus for manual sentiment coding. As shown in Table 2, the accuracy performance rate of multilingual BERT against the human benchmark was much higher than that of Llama2 in every sub-corpus. This suggests that, despite the wide cross-domain generalisability attributed to LLMs, multilingual BERT is more suitable for analysing the bilingual dataset in this study. In addition, multilingual BERT performed reliably on both the Chinese and English sub-corpora. Interestingly, a slightly higher accuracy performance rate was observed for original speeches (NE and NC) than for interpreted speeches (IE and IC). This discrepancy may stem from the use of a pre-trained multilingual BERT model without further fine-tuning, as no parallel sentiment-annotated corpora of original texts and their renditions were available. The lack of domain-specific fine-tuning likely accounts for the lower accuracy on interpreted texts, as the model may have difficulty capturing the linguistic variations, semantic shifts, and contextual adaptations inherent in translational language. Unlike original speeches, interpreted renditions may involve paraphrasing, condensation, and structural reformulation, which can introduce subtle sentiment modifications that the model was not explicitly trained to recognize. Despite this limitation, the considerably higher accuracy rates for both original and interpreted speeches (0.66 to 0.80, compared with 0.23 to 0.37 for Llama2) indicate that multilingual BERT is a suitable model for this analysis. The results from multilingual BERT were exported to a spreadsheet for further statistical analysis to address the research questions.
| Sub-corpus | Accuracy performance rate of multilingual BERT | Accuracy performance rate of Llama2 |
|---|---|---|
| NE | 0.80 | 0.32 |
| IE | 0.67 | 0.25 |
| NC | 0.71 | 0.23 |
| IC | 0.66 | 0.37 |
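The accuracy performance rates in Table 2 compare model labels with the manual coding of the 200 sampled sentences per sub-corpus. A minimal sketch of such a comparison follows, assuming that the rate is the simple proportion of sentences on which the model label matches the human label (the exact formula is not spelled out in the study); `agreement_rate` is a hypothetical helper.

```python
def agreement_rate(model_labels: list[str], human_labels: list[str]) -> float:
    """Proportion of sentences on which the model's polarity label
    matches the manually coded label (assumed definition)."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must be aligned sentence by sentence")
    if not human_labels:
        raise ValueError("no labels to compare")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)
```

With 200 coded sentences, a rate of 0.80 would correspond to 160 matching labels under this assumed definition.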
4.3 Data analysis
The average sentiment scores of the texts in the four sub-corpora were calculated and compared to explore how the sentiment score of a speech varies after it is interpreted and how directionality may affect this variation, addressing the first and second research questions. Pairwise comparisons were used to determine whether the observed variation is statistically significant. To explore how much sentiment can be conveyed across languages via accurate renditions, linear regression analyses were conducted to investigate how the sentiment polarity distribution in the source speech relates to that in the target speech, addressing the third research question. The findings are discussed in relation to previous research to explore the potential of integrating sentiment analysis into the interpreting accuracy assessment process.
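The linear regression step can be sketched in plain Python as an ordinary least-squares fit of target-text scores on source-text scores, together with Pearson's r. This is a minimal illustration that assumes one average sentiment score per text and source and target texts aligned pair by pair; the function `fit_line` is hypothetical, and the study does not state which software it used.

```python
from math import sqrt


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Fit y = intercept + slope * x by ordinary least squares and
    return (slope, intercept, Pearson's r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary across texts")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r
```

Passing the source-text averages of NE as `x` and the matching IC averages as `y`, for example, would correspond to one of the two analyses, provided the texts are aligned by pair.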
5. Results
5.1 Sentiment score variation across the four sub-corpora
The average sentiment score of the texts in each of the four sub-corpora was calculated. The results are presented in Fig. 1, and the descriptive statistics are summarized in Table 3. The average sentiment scores at the corpus level are all above zero, indicating that speakers maintained a positive tone most of the time in their speeches. This overall positive tone was also maintained in the renditions. In addition, the average sentiment score of IC (M = 0.34, SD = 0.32) is close to that of NE (M = 0.36, SD = 0.24), with a very slight decrease. In the other direction, the average sentiment score of IE (M = 0.51, SD = 0.26) is notably higher than that of NC (M = 0.36, SD = 0.30). This result shows that when the same sentiment tool was used to assign sentiment scores to parallel sentences, the values varied between source and target texts. This is understandable, as the same tool may rate texts differently across languages because of differences in training routines, data labelling approaches, and the quality of training data (Buscemi and Proverbio 2024). Therefore, the increased or reduced sentiment scores found in IC and IE do not necessarily mean that the interpreted speech is more or less positive than the original speech.

| Sub-corpus | Mean | SD | Median | Maximum | Minimum |
|---|---|---|---|---|---|
| NE | 0.36 | 0.24 | 0.39 | 0.89 | −0.50 |
| IE | 0.51 | 0.26 | 0.55 | 1.00 | −0.59 |
| NC | 0.36 | 0.30 | 0.39 | 0.91 | −0.59 |
| IC | 0.34 | 0.32 | 0.38 | 1.00 | −0.72 |
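The statistics in Table 3 can be computed for any list of text-level sentiment scores with Python's standard `statistics` module, as in the sketch below. It assumes the sample standard deviation (`statistics.stdev`); whether the study used the sample or the population formula is not stated, and the `describe` helper is hypothetical.

```python
from statistics import mean, median, stdev


def describe(scores: list[float]) -> dict[str, float]:
    """Descriptive statistics for the text-level scores of one sub-corpus.
    Uses the sample standard deviation (an assumption)."""
    if len(scores) < 2:
        raise ValueError("need at least two scores for a standard deviation")
    return {
        "mean": mean(scores),
        "sd": stdev(scores),
        "median": median(scores),
        "max": max(scores),
        "min": min(scores),
    }
```

Applied to the 207 text-level scores of one sub-corpus, this would produce one row of Table 3.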
What is interesting to note is that the amount of sentiment score variation differs between the two interpreting directions: there seems to be a considerable sentiment gap between NC and IE, but an almost negligible one between NE and IC. Pairwise comparisons of the average sentiment scores across the four sub-corpora show that the notable difference between NC and IE is statistically significant, whereas the small difference between NE and IC is not. This is shown in Table 4. The result suggests a disparity in sentiment variation between the two directions at the corpus level. To test whether this disparity also holds for each pair of source and target texts, the difference between the sentiment scores of each source-target pair was calculated and compared using similarity tests. Given that the underlying sentiment data are numerical, the standard Euclidean similarity test was applied first. To avoid potential bias arising from the relatively low dimensionality of the data, the cosine similarity measure was also applied. As shown in Table 5, both tests confirm a greater sentiment gap for NC-IE, that is, when interpreters work from Chinese into English. From a statistical point of view, this finding seems to suggest that interpreting direction has an impact on the conveyance of sentiment during interpreting.
Direction | Euclidean | Cosine |
---|---|---|
NE-IC | 3.42 | 0.85 |
NC-IE | 3.58 | 0.91 |
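One way to read the procedure above, offered only as a sketch: the text-level scores of the source texts in one direction form one vector, the scores of the aligned target texts form a second vector, and the two vectors are compared. The code assumes this vector construction, which the section does not spell out, and the function names are hypothetical. The two measures run in opposite directions: a larger Euclidean distance means a larger gap, whereas a larger cosine similarity means closer alignment (cosine distance is 1 minus the similarity). Euclidean distance also tends to grow with the number of aligned pairs, so values are most comparable when both directions contain similar numbers of texts.

```python
import math


def _check_aligned(source: list[float], target: list[float]) -> None:
    """Both vectors must be non-empty and aligned text by text."""
    if not source or len(source) != len(target):
        raise ValueError("source and target scores must be non-empty and aligned pair by pair")


def euclidean_distance(source: list[float], target: list[float]) -> float:
    """Straight-line distance between aligned source and target score vectors."""
    _check_aligned(source, target)
    return math.dist(source, target)  # sqrt(sum((s - t) ** 2)); Python 3.8+


def cosine_similarity(source: list[float], target: list[float]) -> float:
    """Cosine of the angle between the two score vectors (1.0 = same direction)."""
    _check_aligned(source, target)
    dot = sum(s * t for s, t in zip(source, target))
    norm_s = math.sqrt(sum(s * s for s in source))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_s == 0 or norm_t == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_s * norm_t)


# Toy vectors for illustration only; these are NOT the study's data.
src = [0.5, -0.2, 0.8, 0.1]
tgt = [0.6, -0.1, 0.7, 0.3]
print(f"Euclidean distance: {euclidean_distance(src, tgt):.3f}")
print(f"Cosine similarity:  {cosine_similarity(src, tgt):.3f}")
print(f"Cosine distance:    {1 - cosine_similarity(src, tgt):.3f}")
```

Reporting cosine distance alongside similarity makes it explicit which way a "larger gap" should point for each measure.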
5.2 Conveyance of sentiment across languages
To explore how much sentiment can be conveyed across languages via accurate renditions, this study compared the sentiment polarity distribution of the source texts with that of the target texts. Given that accurate rendition involves the successful transfer of the speaker’s communicative intention, including their attitudes, emotions, and perceptions (Hale 2007; Seeber and Zelger 2007), the sentiment polarity distribution of the source text is expected to be maintained in the target text. To this end, two separate linear regression analyses, one for each interpreting direction, were conducted to examine the relationship between the average sentiment scores of the source and target texts. The results are shown in Table 6 and visualised in Figs. 2 and 3. Statistically significant correlations were found in both analyses. Specifically, the correlation coefficient (r) is positive in both analyses, showing that the average sentiment scores of the source texts are positively related to those of the target texts. This indicates that sentiment orientation can be successfully conveyed into the target text. In addition, the r values of 0.64 (NE-IC) and 0.78 (NC-IE) indicate a strong linear relationship between the two variables. This finding suggests that, in accurate renditions, the sentiment orientation and distribution of the source speech can be largely projected into the target texts.
Direction | r | p value |
---|---|---|
NE-IC | 0.64 | < 0.01 |
NC-IE | 0.78 | < 0.01 |


6. Discussion
Drawing on data from a parallel bidirectional corpus of original speeches delivered at the UN and their simultaneous renditions by professional interpreters, this study mainly explored the extent to which sentiment can be conveyed by interpreters via accurate renditions, how interpreting direction affects the transposition of sentiment across languages, and how the results of sentiment analysis may help to assess accuracy in interpreting.
6.1 Sentiment conveyance across languages
To start with, the present study shows that, in accurate renditions, the sentiment orientation and distribution of the source text can transcend language barriers and be largely projected into the target language. This suggests that sentiment can be systematically conveyed across languages via accurate translation, providing empirical support for the common approach of leveraging translation to create cross-lingual sentiment analysis tools (Gopaldas 2014). This contrasts with previous research showing that certain emotional dispositions may not be easily translated across languages because of cultural or contextual differences (Ghorbel and Jacot 2011; Demirtas and Pechenizkiy 2013). The difference may arise because this study examined speeches delivered at the UN and their simultaneous renditions. These speeches feature explicit expressions of sentiment so that the representatives’ attitudes towards various international affairs can be communicated effectively to an international audience. This type of text may therefore make it easier for interpreters to convey sentiment across languages than more nuanced and culturally embedded forms of communication. The result has implications for both developing and applying cross-lingual sentiment analysis tools. When designing such tools, it is important to use a more varied corpus covering different genres. For texts that require more contextual and cultural understanding, human annotation may be added to increase the cultural sensitivity of cross-lingual sentiment analysis (Buscemi and Proverbio 2024). When using such tools, it is important to recognize that they may work better for genres such as news articles or business reports, where sentiment is more overt, but may struggle with implicit, contextual, or culturally specific communication. Their results therefore require careful interpretation and may need to be compared against human benchmarks to ensure reliability.
6.2 Sentiment conveyance and directionality
At the same time, this study shows that while sentiment orientation and distribution can be largely conveyed across languages, the amount of sentiment conveyed seems to vary by interpreting direction. The impact of directionality on interpreters’ performance has been widely investigated. The interpreting profession traditionally holds that interpreters should work into their first language (L1) rather than their second language (L2) (Seleskovitch 1978; Donovan 2004). This preference stems from the recognition that L2 production demands greater cognitive effort, leading to challenges such as reduced accuracy and fluency (Ortega 2014). Consequently, interpreters are often seen as having a natural advantage when interpreting into their L1, where they possess greater linguistic and cognitive resources. Over the years, research has largely confirmed the impact of directionality on interpreters’ performance. For instance, examining professional interpreters’ performance in English-Chinese simultaneous interpreting, Chang and Schallert (2007) found that interpreters adjusted their strategies to cope with the demands of each direction. When rendering the message into their L2, in which they may be less proficient, interpreters tended to adopt a meaning-based approach, using generalization, transformation and inferencing. In contrast, when working into their L1, they relied on existing phrases and idioms to convey meaning rather than on generalization. However, counter-evidence continues to emerge, suggesting that the impact of directionality may depend on interpreters’ level of qualification: Nicodemus and Emmorey (2015), for example, found that professional interpreters’ renditions were of comparable quality in both directions.

In the present study, the results suggest that interpreting direction affects the emotional tone communicated by the interpreters. This finding is consistent with previous research showing that, because of their asymmetric command of the two working languages, interpreters may display different performance patterns in different directions (Sandrelli and Bendazzoli 2005; Chang and Schallert 2007; Dayter 2018). These findings underscore the importance of interpreters recognising the impact that direction can have on their performance. Interpreters should be more aware of how the intended emotional impact can be preserved in each direction. Given that different cultures may understand sentiment and emotion in distinct ways (Buscemi and Proverbio 2024: 4), interpreters need to evaluate carefully how sentiment is perceived by the target audience so that specific strategies can be developed to convey emotional dispositions across languages.
6.3 Sentiment conveyance and accuracy assessment
In addition, this study reveals that sentiment analysis is effective in detecting the systematic conveyance of sentiment in accurate renditions. This aligns with the theoretical conception of accuracy, which holds that interpreters should convey the intentional content of a message in addition to its semantic content (Hale 2007; Seeber and Zelger 2007). This finding has practical implications for research exploring automated approaches to assessing interpreting quality (Yu and van Heuven 2017; Ouyang, Lv, and Liang 2021; Lu and Han 2023). An important line of work in this area uses linguistic or paralinguistic features that can be automatically extracted from interpreted speech to predict certain aspects of quality (Yu and van Heuven 2017; Ouyang, Lv, and Liang 2021). Since sentiment analysis is effective in measuring how much sentiment is conveyed across languages, the sentiment score of a given speech may serve as an indicator of its level of accuracy, a major measure of interpreting quality. It is worth pointing out, however, that the sentiment score can hardly serve as a standalone indicator of accuracy. A sentiment score reflects only whether semantic polarity has been conveyed, not whether the remaining information in a message has been transferred: messages with comparable sentiment levels may still differ substantially in meaning and substance. It is therefore essential for automated quality assessment models to include multiple indicators that capture the various dimensions of accuracy.
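Because sentiment can only be one indicator among several, the sketch below shows one hypothetical way of turning a source-target pair of scores into features that a multi-indicator quality model could consume alongside others, such as semantic-similarity or fluency measures (not modelled here). The names `polarity`, `sentiment_features` and the `neutral_band` threshold are illustrative assumptions, not part of the study. Since the same tool may rate languages differently, as the corpus-level results indicate, a polarity check is likely more robust than the raw size of the score gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentFeatures:
    polarity_match: bool  # did the rendition keep the source polarity (+, 0 or -)?
    abs_gap: float        # magnitude of the score difference, ignoring direction


def polarity(score: float, neutral_band: float = 0.05) -> int:
    """Map a score in [-1, 1] to +1, 0 or -1; neutral_band is a hypothetical threshold."""
    if score > neutral_band:
        return 1
    if score < -neutral_band:
        return -1
    return 0


def sentiment_features(source_score: float, target_score: float,
                       neutral_band: float = 0.05) -> SentimentFeatures:
    """Sentiment-based features for one source-target pair (one indicator among many)."""
    return SentimentFeatures(
        polarity_match=polarity(source_score, neutral_band) == polarity(target_score, neutral_band),
        abs_gap=abs(source_score - target_score),
    )


# Hypothetical usage: flag renditions whose polarity changed, for human review.
pairs = [(0.42, 0.38), (0.30, -0.25), (-0.10, -0.02)]
flagged = [i for i, (s, t) in enumerate(pairs) if not sentiment_features(s, t).polarity_match]
print(flagged)  # prints [1, 2]
```

The third pair is flagged because its target score falls inside the neutral band while the source is negative; where that band sits is a design choice that would need calibrating against human accuracy judgements.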
7. Conclusion
Sentiment analysis has been widely adopted across a variety of domains to address real-world problems, yet its application to the study of language use in multilingual contexts is still an emerging area of research. Adopting a corpus-based computational approach, this study explored the potential of using sentiment analysis to objectively evaluate the transfer of semantic polarity across language barriers. Based on a parallel bidirectional corpus consisting of speeches delivered at the UN and their simultaneous renditions, the study shows that, despite interpreters’ varying performance in different directions, the sentiment orientation and distribution expressed in the source text can be largely projected into the target language via accurate renditions. This finding shows the effectiveness of sentiment analysis in measuring the transfer of the speaker’s communicative intention, an important component of accuracy. It highlights the promise of integrating sentiment analysis into interpreting accuracy assessment frameworks and advances the use of computational linguistic methods to assess quality automatically. In addition, the findings of this study hold significance for the field of digital humanities, as they help bridge the gap between natural language processing and the nuanced understanding of human sentiment in cross-linguistic communication. By employing sentiment analysis to examine the accuracy of professional interpreters’ renditions, this research contributes to the growing body of knowledge at the intersection of technology and humanistic inquiry. The findings underscore the importance of integrating computational methods into the analysis of translational language, thus facilitating a deeper understanding of how sentiment is conveyed across languages.
This study has several limitations. It focuses on a single domain in which speakers’ sentiments and attitudes are stated clearly. It would be interesting to test the effectiveness of sentiment analysis in assessing interpreted speech that contains irony, sarcasm, or other contextually ambiguous information. In addition, this study examined only one language pair, namely English and Chinese, using two sentiment analysis tools, and both Chinese and English are resource-rich languages for sentiment analysis (Xu et al. 2022). Further research is needed to explore whether these findings generalize across diverse interpreting contexts, language combinations, and sentiment analysis tools. Moreover, because of the inherent limitations of sentiment analysis, such as bias introduced by the training data and a potentially insufficient grasp of deep semantic and pragmatic meaning beyond surface sentiment (Liu 2022), the results of sentiment analysis should always be interpreted with caution. Ongoing research is therefore required to examine the predictive power of sentiment scores for accuracy by comparing them with human-assigned accuracy scores.
Author contributions
Han Xu (Conceptualization, Data curation, Funding acquisition, Methodology, Supervision), Jinghang Gu (Formal analysis, Methodology, Software, Validation), Kanglong Liu (Conceptualization, Investigation, Methodology), Qinyi Li (Conceptualization, Formal analysis, Investigation, Methodology, Visualization)
Notes
The UN Digital Library can be accessed at https://digitallibrary.un.org/?ln=en.
Conflict of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Data availability
The data underlying this article will be shared on reasonable request to the corresponding author.
Funding
This study was funded by The Hong Kong Polytechnic University (Projects No. P0043847, P0051009).