Artificial Intelligence and Data Science
Reference:
Debenova, Z.A., TSipilova, S.S., Tsyrenova, N.D. (2025). Monuments in Mongolian Writing: An Experience of Creating a Parallel Corpus. Historical informatics, 2, 1–10. https://doi.org/10.7256/2585-7797.2025.2.73930
Abstract:
This article presents the results of work on creating a parallel corpus of Buryat sources in Mongolian script. The project is supported by the Russian Science Foundation and draws on archival materials from the Center of Oriental Manuscripts and Xylographs (CVRK) of the IMBT SB RAS. The subject of the research is the process of creating a database for the corpus and the specifics of compiling it, particularly the selection of materials. The corpus under development currently includes the following documents from the archival collections of the CVRK IMBT SB RAS: two historical texts, "A Brief Outline of the History of Khori-Mongolian Buryats" and "On the History of the Zugalai Region"; an official document, "Protocol of the All-Buryat Assembly in Chita in 1917"; an ethnographic composition, "Narrative of Samdan Noyon"; a medical work, "Notes of Tibetan Doctor Donduba Munkuyev"; and a work of Buddhist didactic literature, "Subhashita," translated by Galsan-Jimba Tuguldur. General scientific and source-study methods were applied to the analysis of handwritten, printed, and xylographic texts in Mongolian script. The article examines the selection of materials, their transliteration and translation, and both substantive aspects (thematic, lexical) and technical aspects (typos, pagination, numerals). The parallel Russian-language version is being created by the research group. The authors emphasize the significance of the parallel corpus as a resource for further research in Buryat linguistics, translation studies, and cultural studies, as well as its role in promoting Old Mongolian script among the general public and preserving the intangible heritage of the Baikal region. The corpus is a unique database for further research in a range of disciplines.
The texts considered will serve as a basis for the development of machine translation algorithms, and the work being conducted at this stage will help future developers create more effective algorithms. The creation of a specialized database that is open not only to researchers but also to representatives of the educational sector, professional translators, and anyone showing a scientific or cultural interest in written heritage appears promising.
Keywords:
machine translation, intangible heritage, Baikal region, Center of Oriental Manuscripts and Xylographs, Buryatia, written sources, parallel corpus, Mongolian script, digitization, text corpus
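The abstract above describes three aligned layers for each text: the original in Mongolian script, a transliteration, and a parallel Russian translation, with pagination tracked as a technical detail. A minimal Python sketch of how one such aligned record might be stored and searched is given below. The class name, field names, and helper functions are assumptions made for illustration only; they are not taken from the article and do not describe the project's actual database.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Segment:
    """One aligned unit of a parallel corpus (hypothetical schema)."""
    source_title: str      # title of the archival document
    page: int              # page in the source, since pagination is tracked
    original: str          # text in Mongolian script
    transliteration: str   # Latin transliteration of the original
    translation_ru: str    # parallel Russian version


def build_segments(source_title: str, page: int,
                   originals: Sequence[str],
                   transliterations: Sequence[str],
                   translations: Sequence[str]) -> List[Segment]:
    """Zip three equally long layers into aligned segments."""
    if not (len(originals) == len(transliterations) == len(translations)):
        raise ValueError("all three layers must have the same number of segments")
    return [Segment(source_title, page, o, t, r)
            for o, t, r in zip(originals, transliterations, translations)]


def find_segments(segments: Sequence[Segment], query: str) -> List[Segment]:
    """Case-insensitive search over the transliteration and translation layers."""
    q = query.lower()
    return [s for s in segments
            if q in s.transliteration.lower() or q in s.translation_ru.lower()]


# Placeholder strings only; no real corpus text is reproduced here.
demo = build_segments(
    "Subhashita", 1,
    ["<original 1>", "<original 2>"],
    ["<transliteration 1>", "<transliteration 2>"],
    ["<translation 1>", "<translation 2>"],
)
hits = find_segments(demo, "translation 2")
```

Keeping the three layers in one record means a search hit in the transliteration or the translation immediately yields the matching passage in the other layers, which is the property that makes such a corpus usable as training material for machine translation.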
Computerized analysis of historical texts
Reference:
Buranok, S.O. (2025). Palestine in the US Press 1918: a computerized analysis of historical texts. Historical informatics, 2, 11–22. https://doi.org/10.7256/2585-7797.2025.2.72395
Abstract:
The article studies American periodical press coverage of Palestine in 1918, using databases and computer programs to analyze statistical indicators of texts. This approach addresses three tasks. The first is to trace how interest in the Middle East changed geographically (in every American state). The second is to analyze how interest in Palestine in American newspapers evolved chronologically. The third is to study the statistical indicators of the texts of the 10 most relevant articles on Palestine in 1918. The object of research is American information discourse. Analyzing statistical indicators of the US periodical press on Palestine makes it possible to determine more accurately the content and place of the Middle East problem in that discourse. Mentions of Palestine in newspapers were counted using the Chronicling America portal created by the Library of Congress. Online newspaper repositories of each state and of individual newspapers were used to verify the results. Statistical analysis of the texts was carried out using the Leximancer program. The author's main findings are as follows: the number of mentions of Palestine in 1918 shows that democratic newspapers in the United States led in covering the Middle East topic; three concepts ("Palestine", "Britain", "Jews") were key in 1918; the Middle East conflict was not yet considered in America as a conflict between Arabs and Jews, but it was no longer considered a confrontation between two empires, the British and the Ottoman. The evolution of American interest in Palestine in 1918 falls into three periods: 1) January – February; 2) June – August; 3) November – December. At each of these stages, the number of mentions of Palestine and of the keywords in American newspaper publications increases.
Keywords:
Great Britain, statistics, USA, imperialism, press, Middle East, information discourse, colonialism, Palestine, Zionism
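The abstract above counts mentions of Palestine over time and identifies periods in which the counts rise. A minimal Python sketch of a monthly mention count is given below. It assumes the article texts have already been collected as (publication date, text) pairs; the function name and the sample strings are invented for illustration, and the sketch neither queries the Chronicling America portal nor reproduces the Leximancer analysis used in the study.

```python
import re
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def monthly_mentions(articles: Iterable[Tuple[date, str]],
                     term: str = "Palestine") -> Counter:
    """Count whole-word, case-insensitive occurrences of `term` per (year, month)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts: Counter = Counter()
    for published, text in articles:
        counts[(published.year, published.month)] += len(pattern.findall(text))
    return counts


# Made-up sample texts, not quotations from any newspaper.
sample = [
    (date(1918, 1, 5), "Palestine and the Palestine front"),
    (date(1918, 1, 20), "No relevant text"),
    (date(1918, 6, 3), "News from palestine"),
]
counts = monthly_mentions(sample)
```

Sorting the resulting counter by key gives a month-by-month series in which rises such as those reported for the three periods of 1918 would be visible.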