'Technique of normalization of the alphabet of search for quality improvement of entity identification based on data frequency characteristics' Software systems and computational methods nbpublish.com

Mal'shakov G.V., Mal'shakov V.D. Technique of normalization of the alphabet of search for quality improvement of entity identification based on data frequency characteristics

Published in the journal "Software systems and computational methods", 2015-4, in the rubric "Software for innovative information technologies", pages 407-413.

Abstract: Frequency distributions of data can serve as identifiers, making it possible to find the data of one system in other systems it must interact with and to coordinate their work. In this approach, the entities of a subject domain are identified using a search alphabet: a set of lexemes together with the frequencies of their use in the data, stored as records of a relational database. The object of the research is a technique for normalizing the search alphabet to improve the quality of entity identification in a subject domain based on the frequency characteristics of the entities' data. The technique deletes from the alphabet every lexeme that occurs inside another lexeme of the alphabet and has a similar frequency of repetition in the entity. The research methods include systems analysis, information theory, the theory of algorithms, the algebra of logic, set theory, comparative analysis, data mining methods, and methods of software and database development. The authors show experimentally, on a sample of 178 entities, that the technique reduces the size of the search alphabet by a factor of 5 on average, which considerably increases the speed of identifying entities by the frequency characteristics of their data. By reducing the number of shorter lexemes, normalization also reduces the recognition error by 0.02036 per identification on average, as the experiments show.
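The deletion rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `normalize_alphabet`, the in-memory dictionary standing in for the database records, the sample lexemes, and the `tolerance` parameter are all hypothetical, and "similar frequency" is assumed here to mean a relative difference of at most 10%, a threshold the abstract does not specify.

```python
def normalize_alphabet(alphabet, tolerance=0.1):
    """Drop each lexeme contained in another lexeme of similar frequency.

    alphabet:  dict mapping lexeme -> frequency of use in the entity's data
               (assumed in-memory form of the search alphabet).
    tolerance: assumed relative frequency difference that counts as "similar".
    """
    kept = {}
    for lexeme, freq in alphabet.items():
        # A lexeme is redundant if some other lexeme contains it as a
        # substring and the two frequencies differ by at most `tolerance`.
        redundant = any(
            lexeme != other
            and lexeme in other
            and abs(freq - other_freq) <= tolerance * max(freq, other_freq)
            for other, other_freq in alphabet.items()
        )
        if not redundant:
            kept[lexeme] = freq
    return kept


# Illustrative data only: "inv" and "voice" occur inside "invoice" with
# nearly the same frequency, so they are removed; "or" occurs inside
# "order" but is far more frequent, so it is kept.
alphabet = {"invoice": 40, "inv": 42, "voice": 41, "order": 40, "or": 95}
normalized = normalize_alphabet(alphabet)
```

Each lexeme is compared against the original alphabet rather than the partially filtered one, so a chain of nested lexemes with similar frequencies collapses to its longest member regardless of iteration order.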

Keywords: correlation, frequency analysis of data, entity, search, the alphabet, normalization, database, software, identification, method

DOI: 10.7256/2305-6061.2015.4.17813

This article can be downloaded freely in PDF format.

