Reference:
Rudometkin V.A.
Monitoring and troubleshooting in distributed high-load systems
// Cybernetics and programming. – 2020. – No. 2.
– P. 1-6.
DOI: 10.25136/2644-5522.2020.2.32996.
Abstract: The subject of the research is monitoring and troubleshooting in distributed high-load systems. The article describes the most common design and development mistakes, ways to anticipate them, and ways to resolve them. The author reviews the most popular tools currently used in developing high-load systems and the main mistakes made when working with them, from a developer's point of view. The article presents a set of tools whose adoption can significantly reduce the time spent searching for vulnerabilities, discusses the difficulties of choosing a metrics technology stack (ELK / EFK) along with the advantages and disadvantages of each, and analyzes analogs of these tools in detail. The main conclusions of the work are:
- The monitoring infrastructure should be developed from the very beginning of the project, which makes it possible to address the project's high complexity while it is still being developed.
- The most popular tools should be used, since a large amount of information about them is available in open sources, for example on the Internet. This reduces the time spent fixing errors caused by a specific set of tools.
- The company should not economize on highly qualified personnel; this will later save considerable time on fixing problems, shorten the development of new functionality, and keep the time needed to support and test existing functionality to a minimum.
- When analyzing problems, it is worth consulting public resources where other companies have most likely already solved similar problems. For example, Facebook has been working on the monitoring problem for a long time and has developed a large number of tools for it. It also collects a large volume of system records in order to analyze system behavior under any circumstances.
Keywords: quality control, testing, black box, white box, EFK, ELK, metrics, high-load system, monitoring, architecture