Abstract

This article investigates the authorship question surrounding William Shakespeare’s works using a novel approach called the ‘Deep Impostor’ methodology. The approach uses a set of known impostor texts to analyze the origin of a target text collection. Both the target texts and the impostors are divided into an equal number of word segments. A deep neural network, either a Convolutional Neural Network (CNN) or a pre-trained BERT transformer, is then trained and fine-tuned to differentiate between the segments of a pair of impostors. Once its segments have been assigned to the impostors by the trained network, each target text is transformed into a numerical signal by averaging its segment assignments. The Dynamic Time Warping distance between these signals is then computed to measure their similarity. For each impostor pair, the Isolation Forest algorithm identifies outliers within the target text collection by assigning an anomaly score to each tested text. In the final step, the tested works are clustered into two groups. With the CNN-based model, the first resulting cluster contains fifteen works, and general evaluations suggest that these were not authored by Shakespeare. The remaining documents are classified as authentic Shakespearean works.
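To make the signal-comparison step concrete, the following is a minimal sketch of a Dynamic Time Warping distance between two numerical signals. It is not the article's implementation: the function name `dtw_distance`, the absolute-difference local cost, and the example signals are assumptions chosen for illustration only.

```python
def dtw_distance(a, b):
    """Textbook dynamic-programming DTW (assumed local cost: absolute difference)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical signals: each value stands for an averaged segment assignment
# between 0 (first impostor) and 1 (second impostor). The data are invented.
signal_a = [0.0, 0.5, 1.0, 0.5]
signal_b = [0.0, 0.0, 0.5, 1.0, 0.5]  # same shape as signal_a, stretched in time
signal_c = [1.0, 1.0, 0.0, 0.0]       # a differently shaped signal

print(dtw_distance(signal_a, signal_b))
print(dtw_distance(signal_a, signal_c))
```

Because DTW allows samples to be repeated during alignment, `signal_a` and `signal_b` are at distance zero even though their lengths differ, whereas `signal_c` is at a positive distance from both. Distances of this kind could then be passed to an outlier detector such as Isolation Forest, as the abstract describes.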

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)