
Batura T.V. Techniques of determining author’s text style and their software implementation

Published in the journal "Software systems and computational methods", 2014-2, in the rubric "Systems analysis, search, analysis and information filtering", pages 197-216.

Resume: The article reviews formal methods of text attribution. The problem of determining the authorship of a text arises in different fields and is important for philologists, literary critics, historians, and lawyers. In text attribution, the main interest and the main difficulty lie in analyzing the syntactic, lexical/idiomatic, and stylistic levels of a text. Sentiment analysis (determining the tone of a text) is, in a sense, a narrower task, and the techniques used to solve it can also be useful for identifying authorship. Expert analysis of an author's style is complex and time-consuming, so new approaches that automate the experts' work at least partially are desirable. The article therefore pays special attention to formal methods of author identification and to their software implementation. Data compression algorithms, methods of mathematical statistics and probability theory, neural network algorithms, and cluster analysis algorithms are currently applied to text attribution. The article describes the most popular software systems for identifying an author's style in Russian-language texts. The author attempts a comparative analysis and identifies the features and drawbacks of the reviewed approaches. Among the problems that hinder research in text attribution are the selection of linguostylistic parameters of a text and the selection of sample texts. The author concludes that further research is needed to find new methods of text attribution or improve existing ones, and to find new characteristics that clearly distinguish an author's style, including in cases of short texts and a small number of sample texts.
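
The resume names data compression algorithms among the formal methods applied to text attribution. The sketch below illustrates one common formulation of that idea. It is not taken from the article itself: the unknown text is appended to each candidate author's sample, and the author whose sample makes the unknown text cheapest to compress is chosen. The function names (`compressed_size`, `attribute`), the use of `zlib`, and the toy texts are assumptions made for illustration only.

```python
import zlib
from typing import Mapping


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of the text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def attribute(unknown: str, samples: Mapping[str, str]) -> str:
    """Return the candidate author whose sample best 'predicts' the unknown text.

    For each author, the score is the number of extra compressed bytes needed
    when the unknown text is appended to that author's sample. A smaller score
    means the sample already contains patterns that recur in the unknown text.
    """
    def extra_cost(sample: str) -> int:
        return compressed_size(sample + "\n" + unknown) - compressed_size(sample)

    return min(samples, key=lambda author: extra_cost(samples[author]))


# Toy data (hypothetical): real use needs long, representative samples.
samples = {
    "A": "the quick brown fox jumps over the lazy dog. " * 20,
    "B": "lorem ipsum dolor sit amet consectetur adipiscing elit. " * 20,
}

if __name__ == "__main__":
    print(attribute("the quick brown fox jumps over the lazy dog.", samples))
```

Note that zlib only looks back about 32 KB, so with long samples a compressor with a larger context would be a more reasonable choice. As the resume points out, short texts and a small number of sample texts remain difficult for any such method.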

Keywords: text attribution, defining authorship, formal text parameters, author’s style, text classification, machine learning, statistical analysis, computer linguistics, identification of author’s style, analysis of textual information

DOI: 10.7256/2305-6061.2014.2.11705


