Reference:
Zenkov A.V.
The numbers reveal the author: a stylometric comparison of German-language modernist texts
// Philology: scientific researches.
2024. № 11.
P. 50-62.
DOI: 10.7256/2454-0749.2024.11.72167 EDN: PDWIOX URL: https://en.nbpublish.com/library_read_article.php?id=72167
Abstract:
The present study belongs to stylometry (and, more broadly, to quantitative linguistics). A novel quantitative method for studying the authorial style of literary texts, based on the statistics of the numerals they contain, is applied to literary texts in German. A computer program has been developed that searches a text for cardinal and ordinal numerals expressed both in figures and in words (in different word forms). The program automatically removes from the text phraseological units and stable combinations that contain numerals incidentally (without the author's intention). Beforehand, the text is manually cleared of auxiliary numerals such as page numbers, chapter numbers, etc. It is shown that the numerals an author uses in a literary text are individual to that author; taken together, they form a characteristic feature (an authorial invariant, or "fingerprint") that distinguishes texts written by different authors. A comparative stylometric analysis is performed on a number of literary works by Thomas Mann, Hermann Broch, Robert Musil, and Elias Canetti, representatives of 20th-century German-language literary modernism. Substantial differences between the authors in their use of numerals were found. The results were subjected to hierarchical cluster analysis (Manhattan metric; complete linkage and between-groups linkage methods). The cluster analysis correctly grouped the texts by authorship. Applying several clustering methods strengthens the significance of the results and confirms that they are not random. This demonstrates that the novel stylometric method can accurately attribute literary texts to their authors.
Keywords:
E. Canetti, H. Broch, R. Musil, T. Mann, numerals in texts, authorship of texts, attribution of texts, quantitative linguistics, stylometric, stylometry
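The numeral-counting and clustering steps summarized in the abstract above can be illustrated with a small Python sketch. It is not the author's program: the lexicon `NUMERAL_FORMS`, the phrase list `STABLE_PHRASES`, the use of relative frequencies as features, and all helper names are hypothetical assumptions. It deliberately ignores ambiguous forms such as the article/numeral *ein*, and it implements the Manhattan metric with complete linkage (and, via `statistics.fmean`, average "between-groups" linkage) naively with the standard library only.

```python
import re
from collections import Counter
from itertools import combinations
from statistics import fmean

# Hypothetical mini-lexicon: German numeral word forms -> normalized value.
# Ordinals get a trailing "." as in German digit notation ("3." = "dritte").
NUMERAL_FORMS = {
    "zwei": "2", "drei": "3", "vier": "4", "fünf": "5", "zehn": "10",
    "hundert": "100", "tausend": "1000",
    "erste": "1.", "ersten": "1.", "erster": "1.",
    "zweite": "2.", "zweiten": "2.", "dritte": "3.", "dritten": "3.",
}

# Hypothetical stable combinations whose numerals are not a stylistic choice.
STABLE_PHRASES = ("unter vier augen", "auf allen vieren")


def numeral_profile(text: str) -> dict[str, float]:
    """Relative frequency of each numeral found in `text`."""
    text = text.lower()
    for phrase in STABLE_PHRASES:          # drop idioms before counting
        text = text.replace(phrase, " ")
    counts = Counter()
    for token in re.findall(r"\d+|\w+", text):
        if token.isdecimal():              # number written in figures
            counts[str(int(token))] += 1
        elif token in NUMERAL_FORMS:       # number written as a word
            counts[NUMERAL_FORMS[token]] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def manhattan(p: dict[str, float], q: dict[str, float]) -> float:
    """L1 (Manhattan) distance between two sparse frequency profiles."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in p.keys() | q.keys())


def cluster(profiles, n_clusters, aggregate=max):
    """Naive agglomerative clustering down to `n_clusters` groups.
    aggregate=max -> complete linkage; aggregate=fmean -> average
    (between-groups) linkage."""
    clusters = [{name} for name in profiles]

    def linkage(a, b):
        return aggregate([manhattan(profiles[x], profiles[y]) for x in a for y in b])

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]         # merge the closest pair (i < j)
        del clusters[j]
    return clusters


# Invented toy texts, two per hypothetical author "A" and "B".
texts = {
    "A1": "Drei Tage und drei Nächte, dann der erste Brief.",
    "A2": "Nach drei Stunden sprach der erste Gast unter vier Augen mit drei anderen.",
    "B1": "Hundert Schritte, tausend Jahre, 1913.",
    "B2": "Tausend Dinge und hundert Fragen seit 1913.",
}
profiles = {name: numeral_profile(t) for name, t in texts.items()}
print(cluster(profiles, 2))             # complete linkage
print(cluster(profiles, 2, fmean))      # between-groups (average) linkage
```

The toy sentences are invented, so the clean two-way split they produce says nothing about the real corpus; the point is only to show how idiom removal ("unter vier Augen" contributes no "4"), a Manhattan distance over numeral profiles, and a linkage rule fit together.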
Reference:
Severina E.M., Fyodorov N.A.
The Chekhov Digital project: semantic markup of a parallel corpus of translations of Chekhov's fiction into German
// Philology: scientific researches.
2024. № 4.
P. 73-82.
DOI: 10.7256/2454-0749.2024.4.70560 EDN: PXMQSB URL: https://en.nbpublish.com/library_read_article.php?id=70560
Abstract:
The article discusses the development of principles for a semantically marked-up parallel corpus of translations of Chekhov's fiction into German within the framework of the Chekhov Digital project, a digital academic edition of the writer's collected works in TEI (Text Encoding Initiative) format. The parallel corpus project aims to create a digital infrastructure for studying the writer's works, allowing researchers to analyze and compare the original texts with their translations. Difficulties were identified in interpreting significant elements of the writer's works, in the specifics of their translation into German, and in the semantic markup of translated fiction; for example, it proved difficult to define the boundaries of, and relationships between, semantic markup elements. Ways to overcome these difficulties are proposed, including the use of digital methods and natural language processing technologies together with the TEI digital publishing standard. A text markup structure based on the TEI standard makes documents machine-readable, which makes it possible to develop tools for complex semantic information retrieval. Including parallel corpora of translations of A. P. Chekhov's works into different languages in the Chekhov Digital project expands the research toolkit of translation studies: it becomes possible to compare translations with the originals, detect similarities and differences in vocabulary, grammar, style, and cultural references, and automate routine research processes, which makes searching and analyzing information in large volumes of text much more efficient. The results of the project will contribute to the development of the digital humanities environment and to the preservation and popularization of the literary heritage of A. P. Chekhov.
The creation of a semantically marked-up parallel corpus of translations will be important for literary scholars, linguists, and translators, allowing them to study the specifics of translations of Chekhov's works and to develop new forms of text analysis and interpretation. The experience gained during the project will be valuable for future research and practical applications, demonstrating the effectiveness of digital technologies in humanities research and education.
Keywords:
Natural Language Processing, Digital Technologies, Semantic Search, Machine-readable Markup, Text Encoding Initiative, Parallel Corpora, Chekhov, Digital Edition, Chekhov Digital project, Parsing
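To illustrate how TEI markup can make an aligned parallel corpus machine-searchable, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. The fragment is invented and simplified (it lacks the mandatory `teiHeader`, so it is not schema-valid), and the encoding convention shown, with `<seg>` elements linked by `@corresp` and places tagged as `<rs>` with `type` and `key` attributes, is an assumption rather than the actual Chekhov Digital scheme. The hypothetical helper `entity_renderings` pairs each tagged entity in a translated segment with its counterpart in the original.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented toy fragment: a Russian original and a German translation whose
# <seg> elements are aligned via @corresp; places are tagged with <rs>.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div xml:lang="ru">
      <p><seg xml:id="ru-s1">Он уехал в <rs type="place" key="moscow">Москву</rs>.</seg></p>
    </div>
    <div xml:lang="de">
      <p><seg xml:id="de-s1" corresp="#ru-s1">Er fuhr nach <rs type="place" key="moscow">Moskau</rs>.</seg></p>
    </div>
  </body></text>
</TEI>"""


def seg_text(el: ET.Element) -> str:
    """Plain text of an element, including text inside nested tags."""
    return "".join(el.itertext()).strip()


def entity_renderings(root: ET.Element, entity_type: str) -> list[tuple[str, str, str]]:
    """(key, original wording, translated wording) for entities of the given
    type that are tagged in both an original segment and its aligned translation."""
    segs = {s.get(XML_ID): s for s in root.iter(f"{TEI}seg")}
    found = []
    for seg in segs.values():
        ref = seg.get("corresp", "")
        source = segs.get(ref[1:]) if ref.startswith("#") else None
        if source is None:
            continue                      # not an aligned translation segment
        originals = {rs.get("key"): seg_text(rs)
                     for rs in source.iter(f"{TEI}rs") if rs.get("type") == entity_type}
        for rs in seg.iter(f"{TEI}rs"):
            key = rs.get("key")
            if rs.get("type") == entity_type and key in originals:
                found.append((key, originals[key], seg_text(rs)))
    return found


root = ET.fromstring(SAMPLE)
print(entity_renderings(root, "place"))   # [('moscow', 'Москву', 'Moskau')]
```

Because both the alignment and the entity tags live in the markup itself, a query like this needs nothing beyond a generic XML parser.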