Software systems and computational methods
Reference:

Screenshot testing as a multi-aspect type of automated dynamic verification for web applications

Makarov Konstantin Sergeevich

PhD in Technical Sciences

Head of the Department; Department of Software and Information Systems Administration; Kursk State University

33 Radishcheva str., Kursk, Russia, 305000

makarov_ks@kursksu.ru
Fatkin Ruslan Igorevich

ORCID: 0009-0006-3481-6122

Postgraduate student; Department of Software and Information Systems Administration; Kursk State University

305014, Russia, Kursk region, Ryabinovaya str., 26B, sq. 96

ruslan4631@yandex.ru

DOI: 10.7256/2454-0714.2025.1.73535

EDN: UVGEBC

Received: 02-03-2025

Published: 03-04-2025


Abstract: The subject of this study is multi-aspect screenshot testing as a modern method of automated dynamic verification of web applications, combining functional testing and user interface (UI) validation. Contemporary testing methods face challenges such as high labor intensity, false positives, and low scalability, especially in complex projects. The main objective of the research is to develop and implement a method that improves defect detection accuracy, reduces testing time, and lowers test case development costs. The study explores image comparison algorithms, dynamic element filtering techniques, and automated UI analysis approaches to enhance efficiency and standardization in the web application verification process. Unlike functional and UI testing conducted separately, the proposed method enables simultaneous analysis of multiple aspects of the interface and functionality, minimizing labor costs and increasing testing reliability. The approach employs automated comparison of reference and test screenshots at the pixel, structural element, and content levels using Python, Selenium, PIL, and Pytest-xdist for parallel test execution, effectively addressing the challenges of web application verification. Some researchers in the field of testing agree that the testing process lacks standardization and clear evaluation criteria. The proposed method ensures the achievement of verification objectives even under evolving strategies and approaches to system performance assessment by creating a flexible and precise validation system that integrates various testing types into a unified structure, making it suitable for modern software development challenges. The experimental section demonstrates the advantages of multi-aspect screenshot testing over other methods, including reduced testing time, improved defect detection accuracy, and enhanced analysis of test reports. 
This approach can be adapted to various testing scenarios and is particularly beneficial for high-load projects requiring regular regression testing.


Keywords:

screenshot testing, multi-aspect screenshot testing, automated testing, dynamic verification of web applications, Python in testing, Selenium and PIL, UI testing, testing optimization, testing criteria, verification tasks

This article is an automatic translation of the original Russian text.

Introduction

In the development of any software, testing the solutions being created is an important step. Modern applications, especially web and mobile ones, require regular updates and improvements in the face of rapidly changing requirements from both the market and the introduction of new technologies. To ensure a high level of quality and maintain stable operation, it is important to regularly analyze the state of the system: even minor errors can lead to loss of stability, a degraded user experience and, as a result, economic costs. Annual economic losses due to low-quality software in the United States alone are estimated at tens of billions of dollars [1, p. 14], which underlines the urgency of developing reliable software testing methods.

Web applications have become an integral part of life for billions of people around the world, and most of these solutions have a commercial focus. For a product whose ultimate goal is profit, the economic losses associated with low-quality software can become one of the main sources of problems, since any errors that degrade the user experience reduce the number of real users. Product testing is designed to solve precisely these problems.

Screenshot testing, focused on automatic interface comparison, is actively used in dynamic software verification. However, traditional approaches such as functional or standalone UI testing have limitations: high labor costs, an excessive number of false positives, and poor performance when scaling. In the case of highly loaded and complex web applications, tracking all these processes manually is a time-consuming task prone to tester error.

Gurin R. E., Rudakov I. V., and Rebrikov A. V. consider existing approaches to software verification and analyze their limitations and effectiveness [2]. The problems highlighted in their article point to the need for new solutions capable of overcoming the limitations of traditional approaches.

Studies [3-5] also note that software verification has a number of limitations related to:

  1. limited testing time and budget;
  2. the human factor (the person conducting the test directly affects its result);
  3. changes in software requirements during development or at the product support stage;
  4. the formulation of the verification task: if the task is formulated incompletely or incorrectly, the result may be invalid;
  5. the lack of modern metrics for evaluating testing effectiveness.

Multidimensional screenshot testing offers a hybrid approach that combines functional and UI testing, which can improve the accuracy and performance of software verification.

Kudryavtseva E. Yu. also notes that the use of automated testing is most justified and relevant in large projects [6]. Thus, the proposed methodology addresses current challenges and contributes to the further development of software verification.

It is also worth noting that, at present, issues of software verification, particularly for publicly available software, are rarely considered in the Russian scientific literature or are limited to reviews of existing methods. Because of this, the field of software testing develops less rapidly than software development itself, although the rapid pace of development requires verification methods to adapt just as quickly to current challenges. Thus, the purpose of this work is to develop a method of multidimensional screenshot testing that combines functional and UI approaches to improve the accuracy of defect detection and optimize resources in web application verification. To achieve this goal, the Python programming language with the Selenium and PIL libraries is used, which automates tests and reduces labor costs.

1. Research methods

The research methodology is based on the development and application of multidimensional screenshot testing, a combination of functional and UI testing that makes it possible to effectively solve web application verification tasks. The method allows several aspects of the software to be tested simultaneously in a single run of the test scenario. This approach is designed to reduce the number of false positives and increase the accuracy of testing.

The main idea of the method is to conduct a parallel analysis of the functional and visual characteristics of a web application by automatically comparing reference and test screenshots. This is achieved by:

  1. Creating reference screenshots that capture the correct state of visual and functional elements.
  2. Developing test scenarios aimed at automated comparison of the current state of the application with the reference at several levels:
  • pixel level: checking the differences between the reference and test screenshots at the level of individual pixels;
  • structural element level: analyzing the correct display of interface elements (dimensions, location, margins, fonts);
  • content level: verifying that text and graphic information corresponds to the reference.
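As an illustration of the pixel level, a minimal comparison sketch using PIL follows; the function name and the exact-match policy (any channel difference counts) are illustrative assumptions, not details fixed by the article:

```python
from PIL import Image, ImageChops

def count_diff_pixels(ref: Image.Image, test: Image.Image) -> int:
    """Return how many pixels differ between two same-size screenshots."""
    if ref.size != test.size:
        raise ValueError("screenshots must have identical dimensions")
    diff = ImageChops.difference(ref.convert("RGB"), test.convert("RGB"))
    # A pixel counts as different if any of its RGB channels differs.
    return sum(1 for px in diff.getdata() if px != (0, 0, 0))
```

In practice a small per-channel tolerance is often added to absorb anti-aliasing noise; the strict comparison above is the simplest starting point.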

The methodology is focused on minimizing false positive results that may occur with dynamic interface changes (for example, pop-up notifications or animations). For this purpose, filters and masking of dynamic elements have been implemented, which makes it possible to exclude their influence on the final test results.
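The masking of dynamic elements can be sketched as a small JavaScript snippet injected through Selenium before each capture; the CSS selectors and the helper name below are assumptions for illustration only:

```python
# Build a script that hides dynamic elements (pop-ups, animations) so they
# do not affect the pixel comparison. Selectors here are hypothetical.
HIDE_TEMPLATE = (
    "document.querySelectorAll('{selectors}')"
    ".forEach(function (el) {{ el.style.visibility = 'hidden'; }});"
)

def build_mask_script(selectors):
    """Join CSS selectors into one hiding script for driver.execute_script."""
    return HIDE_TEMPLATE.format(selectors=", ".join(selectors))

# With a live Selenium session this would be applied as:
#   driver.execute_script(build_mask_script([".popup", ".banner-animation"]))
```

Using `visibility: hidden` rather than removing the nodes keeps the page layout stable, so masking does not itself introduce pixel differences.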

The experimental part of the methodology is aimed at a comparative analysis of multidimensional screenshot testing with traditional methods such as functional and UI testing. For this purpose, key evaluation metrics have been identified:

  • The number of identified defects;
  • Accuracy of testing (taking into account false positives and missed errors);
  • Test execution time;
  • Labor costs for the development and support of tests;
  • Reliability and stability of tests.

The methodology was tested on a web application developed for experimental purposes and covered aspects such as:

  • Checking the display of forms (registration, authorization);
  • Checking the operation of interface elements (buttons, input fields);
  • Adaptive interface display in various configurations (dark/light themes, screen resolution).

The methodology used provides a deeper analysis of the interaction of interface elements, which makes it possible to identify defects that could have been missed when using other types of dynamic verification.

For example, multidimensional screenshot testing performs configurable verification of several test situations for a given verification task in one run of automated test scenarios. Each aspect focuses on verifying certain functional or visual elements of the interface, and during the test run all violations in the tested test situations must be detected. Test situations are the situations in which testing is performed; the procedures describing how these situations are created and what checks must be performed on the results are called tests [7, p. 68]. Within software verification through multidimensional screenshot testing, the functional and UI characteristics of the object are considered.

The object of research in this paper is a web application that requires regular verification to ensure stable operation and compliance with user requirements. Thus, an online store developed on the WordPress platform was chosen, with an emphasis on the "My Account" page. The application is a typical representative of commercial web resources that are widely used to provide online services. It contains standard user interface elements such as data entry forms, buttons, checkboxes, drop-down lists, and visual elements (icons, color schemes). The authorization and registration page is critically important for any online store, as it provides access to the functionality of the application and affects the user experience. The test environment used was a simulation of various real-world operating conditions: different screen resolutions, light and dark interface themes, as well as various browsers and devices (desktop and mobile). The use of multidimensional screenshot testing for such an object makes it possible to effectively demonstrate the advantages of the method, including identifying complex errors that could go unnoticed when using only functional or UI testing.

Thus, the chosen research object is a universal example that allows us to evaluate the effectiveness and applicability of the proposed method for a wider range of web applications.

To implement the proposed methodology of multidimensional screenshot testing, a set of tools and software libraries was used to automate verification, image processing, and analysis of results. The method was built on the high-level Python programming language: thanks to its flexibility and an extensive ecosystem of tools for working with images and user interfaces, the language provides everything needed to implement the multidimensional approach [8]. The following were used:

  1. the Selenium library for interacting with web interfaces;
  2. PIL for capturing and processing screenshots, including comparing reference images with current versions;
  3. pytest-xdist for parallel execution of test scenarios, allowing tests to run on multiple device configurations or browsers simultaneously;
  4. JavaScript for masking dynamic interface elements (such as animations or pop-ups) to avoid false positives when comparing images.

The listed technologies integrate well into a single system, support customization for various devices, browsers and interface themes, and also allow you to run tests in parallel, which reduces the time for verification. This made it possible to automate the screenshot testing process, which simultaneously checks several aspects of the software.
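The parallel runs can be organized around a configuration matrix, where each combination of browser, theme, and resolution becomes one test scenario that pytest-xdist distributes across workers (for example, via `pytest -n auto`). A sketch, with illustrative values:

```python
from itertools import product

# Illustrative environment matrix; a real project would read this from config.
BROWSERS = ["chrome", "firefox"]
THEMES = ["light", "dark"]
RESOLUTIONS = [(1920, 1080), (390, 844)]  # desktop and mobile

def build_config_matrix():
    """Expand the matrix into one dict per test scenario; with pytest-xdist
    (pytest -n auto) the scenarios are distributed across parallel workers."""
    return [
        {"browser": b, "theme": t, "resolution": r}
        for b, t, r in product(BROWSERS, THEMES, RESOLUTIONS)
    ]
```

In a pytest suite this list would typically feed `@pytest.mark.parametrize`, so adding a browser or theme multiplies the covered scenarios without new test code.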

It is also worth noting that multidimensional screenshot testing increases the chances of identifying potential defects that could go unnoticed with other types of dynamic verification, for example, functional testing, which only checks the ability to interact with the system. Different aspects may behave differently depending on their relationship to other system elements or display conditions. For example, when hovering over or interacting with a button, its color should change, but functional testing only verifies that the button can be interacted with. It is the simultaneous verification that makes it possible to identify and analyze complex relationships between different aspects of the software, errors in which could go unnoticed during step-by-step testing of individual aspects. For example, multidimensional screenshot testing of a web application can simultaneously check the location of buttons, the filled state of inputs, color schemes, and the state of various interface elements, which makes testing more complete and less prone to errors caused by inattention or cross-platform differences.

2. Description of the proposed method

The verification task was formulated so that all functional and non-functional checks are combined into a single stream, verifying the largest number of software aspects in the shortest test execution time. Multidimensional screenshot testing thus makes it possible to fit hundreds of decomposed test cases into one scenario, which provides good performance results and economic benefits.

Each test case has its own verification task, which defines the correct behavior. By comparing screenshots of the reference version of the software with the current version, the system analyzes the interface for matches and discrepancies. If a discrepancy is found, the test case returns false and places an error label at the point where the screenshot differs from the expected behavior of the software. All of this is recorded using the error-trace output algorithm.

An error trace is a sequence of steps or operations performed by a system or program that led to an error. Traces help localize failures, which simplifies diagnosis and error correction.

The result of the verification task can have one of the following states:

  • Passed: the functional and UI aspects correspond to the expected results in all screenshots, which confirms correct operation within the scope of the task.
  • Failed: a discrepancy has occurred and an error has been recorded. The error trace makes it possible to determine which steps led to the failure, helping to fix the problem faster.

The main stages of the development and implementation of multidimensional screenshot testing can be presented as a list:

  • Preparing the test environment

At this stage, it is necessary to determine which aspects of the software will be tested. In this case: transitions; form filling; registration and authorization; screen resolution; orientation; themes (dark/light); browsers or platforms (mobile and desktop devices).

It is also important to formulate the dynamic verification task before creating test scenarios, for example, checking the correct display of visual elements, text, icons, etc., and the possibility of interacting with them. After that, a test environment should be deployed that is as close as possible to real conditions of use. This includes setting up device simulators or browsers, using different screen configurations, and loading the necessary fonts and libraries.

For each of the aspects involved in screenshot testing, you can use different levels of verification. For example, you can perform a deeper check of only those elements that have already been identified as potentially problematic at the initial stage of the analysis. This allows you to reduce excessive resource costs.

  • Creating reference screenshots

By reference screenshots, we mean those images that reflect the desired or correct state of the interface. They will serve as the basis for a subsequent comparison. Screenshots are taken based on the version of the program that we take as the reference (it has either been previously tested and debugged, or a design layout of the system is used). When creating screenshots, it is suggested to rely on the technical requirements for the environments and devices that were defined in the previous step. As part of the verification task, when preparing test cases, several specific aspects of the software can be identified that will require verification using various parameters. Within the framework of multidimensional screenshot testing, the parameters can be divided as follows:

- Global deviations: significant changes in the expected behavior of the software (for example, missing elements, server errors).

- Detailed deviations: small changes, such as color changes or slight displacement of elements.

- Functional deviations: errors in the operation of interactive elements.
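For the two visual categories, the division can be expressed as a simple classifier over the share of changed pixels; the 5% threshold below is an assumed example, not a value from the article, and functional deviations are detected separately through interaction checks:

```python
def classify_visual_deviation(diff_pixels: int, total_pixels: int,
                              global_threshold: float = 0.05) -> str:
    """Label a visual deviation by the share of changed pixels.

    The 5% default threshold is illustrative; a real project would tune it
    per page and per resolution.
    """
    if diff_pixels == 0:
        return "passed"
    share = diff_pixels / total_pixels
    return "global" if share >= global_threshold else "detailed"
```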

  • Automated capture of test screenshots

After collecting technical requirements and preparing reference versions of screenshots, automated test scenarios are launched that take pictures of the current state of the software under the same conditions in which the reference ones were made. For multidimensional screenshot testing, it is important to provide for the creation of certain conditions for verification. For example, checking different themes, localization languages, devices with different screens and resolutions. This allows you to expand the testing coverage and ensure that the software works correctly in various situations.

  • Multidimensional comparison

After the automated capture of test screenshots on the current version of the software, automated test scenarios compare the two screenshots in the library for matches and discrepancies. The comparison takes place on three levels:

- the pixel level, at which any differences between the reference and test screenshots are detected (changes in positions, fonts, colors);

- the structural element level, which checks the correct display of interface elements: sizes of margins/paddings, buttons and inputs, and their location in the viewport;

- the content level, which checks not only the size and color of the font but also the correctness of the content itself; in addition to text information, it can verify that all graphical interface elements are displayed correctly and are in their places.
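At the structural level, the geometry of elements (as returned, for example, by Selenium's `element.rect`, which yields `x`, `y`, `width`, `height`) can be compared against the reference with a small tolerance. A sketch with an assumed data shape and tolerance:

```python
def compare_layout(ref_elems: dict, test_elems: dict, tolerance: int = 2):
    """Compare element rectangles ({'x', 'y', 'width', 'height'} dicts,
    as produced e.g. by Selenium's element.rect) and list mismatches."""
    mismatches = []
    for name, ref_rect in ref_elems.items():
        test_rect = test_elems.get(name)
        if test_rect is None:
            mismatches.append((name, "missing"))
            continue
        for key in ("x", "y", "width", "height"):
            if abs(ref_rect[key] - test_rect[key]) > tolerance:
                mismatches.append((name, key))
    return mismatches
```

The tolerance absorbs sub-pixel rendering differences between browsers, so only genuine shifts or resizes are reported.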

  • Reporting and analysis of results

After the tests are completed, a startup parameter of the framework is set to generate reports describing all negative and positive outcomes. These reports can include screenshots with highlighted problem areas, which makes it easier for both the tester and the developer to analyze and understand the problem.
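The highlighted problem areas mentioned above can be produced by framing the differing region on the test screenshot; a minimal sketch with PIL, where the red frame and the function name are illustrative choices:

```python
from PIL import Image, ImageChops, ImageDraw

def annotate_diff(ref: Image.Image, test: Image.Image):
    """Return a copy of the test screenshot with the differing region framed,
    plus the bounding box itself (None when the images match)."""
    diff = ImageChops.difference(ref.convert("RGB"), test.convert("RGB"))
    bbox = diff.getbbox()  # smallest box containing all non-zero pixels
    annotated = test.convert("RGB").copy()
    if bbox:
        ImageDraw.Draw(annotated).rectangle(bbox, outline="red", width=2)
    return annotated, bbox
```

The annotated image can then be saved into the report, giving the reviewer an immediate visual pointer to the defect.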

Based on the reports received, teams can make decisions about the need to fix certain problems. This can be manual analysis for complex cases or automatic decision-making for obvious deviations.

To reduce false positives, it may be necessary to adjust the filtering conditions for the results obtained. All detected deviations should be classified by criticality: for example, missing layout elements or a broken registration form are far more important than a minor deviation in the shade of a button or its shift by a few pixels. Attention should also be paid to dynamically changing elements: pop-up windows, animations, phone numbers with auto-substitution, etc. The proposed method uses the PIL library for image processing to visually check the interface, while JavaScript helps mask dynamic elements such as pop-up notifications or animations that may change between launches and interfere with correct comparison. This provides a more detailed and accurate comparison, eliminating unwanted errors that are not related to functional changes.

Thus, to minimize false positives, the sensitivity levels of the tests should be adjusted, or an algorithm should be implemented that excludes undesirable elements from the comparison. This helps focus on critical bugs and avoids reports about minor issues that do not affect the functioning of the system.

  • Refactoring and optimizing tests

If the application has changed in accordance with the new requirements, the old reference screenshots may become obsolete. In this case, they need to be updated so that future tests are relevant to the current state of the software. This is an important process that helps to avoid false positives or inconsistencies in the set verification tasks. Optimization of the test system may also be required. Continuous improvement of screenshot comparison algorithms and automation tools is key to improving testing efficiency. For example, you can improve the results filtering system or introduce more accurate analysis methods. Optimization allows you to speed up testing and reduce the time for manual verification of results.

Also, in some cases, it is additionally possible to implement an iterative verification process, in which verification of the correctness or compliance of the system, program, or its individual components is performed repeatedly after each stage of development. For example, one of the development teams is working on a new feature that they plan to integrate into an already created product, in which case there is no need to wait for the new feature to appear in the final product — you can start testing it right away. This will allow you to identify errors or inconsistencies at an early stage and make the necessary changes before moving on to the next phase of work or integration with the main project. In this case, the subsequent launch of multidimensional screenshot testing will reduce the number of errors that could have been detected earlier, which will allow you to focus on checking the interaction of new and old elements in the system.

3. Experimental approbation of multidimensional screenshot testing

As part of the work, an experiment was conducted to test the following hypothesis: multidimensional screenshot testing is a more effective type of web application verification compared to functional and UI testing, thanks to a hybrid analysis of all aspects of the software within the selected test scenario in a single test run.

The purpose of the experiment is to prove that multidimensional screenshot testing is a more effective type of verification.

As part of the experiment, an online store website on WordPress was created to facilitate documentation and analysis of the results.

The verification task was set as follows: it is necessary to check the operability of the online store's "My Account" web page, which is intended for user authorization and registration. By operability we mean the following: correct page display relative to the reference software version, the possibility of functional interaction with various elements, and correct transmission of information (login/password).

The effectiveness was assessed by comparative analysis using examples of functional, UI, and multidimensional screenshot testing.

Performance evaluation criteria: the effectiveness of the created solutions was evaluated according to the following criteria:

  • Number of detected defects: evaluation of testing effectiveness based on the number of detected deviations, including graphical artifacts and functional errors. Metric: the total number of defects detected by each type of testing; percentage comparison between types.
  • Test execution time: analysis of the time spent on conducting tests by each method. Metric: the average time spent executing each test scenario within the method.
  • Labor costs for the development and support of tests: assessment of the complexity of developing and supporting tests at each stage. Metric: the number of hours spent developing and maintaining tests for each method, compared as a percentage.
  • Test accuracy: comparison of error detection accuracy for each method, including false positives and missed defects. Metric: the percentage of false positives and missed defects.
  • Reliability: assessment of test stability, measured by the number of unpredictable test failures.

The following test scenarios were identified as part of the verification task:

  • for functional testing: 2 test situations (Table 1);
  • for UI testing: 16 test situations (Table 2);
  • for multidimensional screenshot testing: 4 test situations (Table 3).

Let's look at them in more detail:

Table 1

Functional Testing Test Model

Case 1. Checking the registration form. User registration.
  1) Go to the page http://study-test.local/my-account/. Expected result: the page opens successfully and the authorization form is displayed.
  2) Select "Register" from the menu. Expected result: the registration form is displayed.
  3) Fill in the fields: mail field, full name, password, password confirmation, checkboxes, Register button. Expected result: the fields are filled in successfully.
  4) Click the Register button. Expected result: registration is completed successfully.

Case 2. Checking the authorization form. User login.
  1) Go to the page http://study-test.local/my-account/. Expected result: the page opens successfully and the authorization form is displayed.
  2) Fill in the User Name field. Expected result: the name is entered successfully.
  3) Fill in the Password field. Expected result: the password is entered.
  4) Click the Login button. Expected result: login is completed successfully.

Table 2

The UI Testing Model

Case 1. Checking the registration form (UI).
  1) Go to the page http://study-test.local/my-account/. Expected result: the page opens successfully and the registration form is displayed.
  2) Select "Register" from the menu. Expected result: the registration form is displayed.
  3) Check the fields of the registration form. Expected result: the mail field, full name, password, password confirmation, checkboxes, and the Register button are displayed.

Case 2. Checking the authorization form (UI).
  1) Go to the page http://study-test.local/my-account/. Expected result: the page opens successfully and the authorization form is displayed.
  2) Check the login form display. Expected result: the following are displayed: Email or login, Password, the Forgot your Password link, the Remember Me checkbox, and the Login tab.

Cases 3-10. Checking the registration form (UI): checking the properties of individual fields (mail field, surname, name, password, password confirmation, checkbox 1 and checkbox 2, Register button).
  1) Go to the page http://study-test.local/my-account/. Expected result: the page opens successfully and the registration form is displayed.
  2) Check the CSS properties of the field under test. Expected result: the properties of the selected element match the specified values.

Table 3

A test model for multidimensional screenshot testing

Case 1. Checking the registration form (UI).
  1) Go to the page http://study-test.local/my-account/. Expected result: the page opens successfully and the authorization form is displayed.
  2) Select "Register" from the menu. Expected result: the registration form is displayed.
  3) Check the fields of the registration form. Expected result: the mail field, full name, password, password confirmation, checkboxes, and the Register button are displayed.

Case 2. Checking the registration form (UI): registration.
  1) Go to the page http://study-test.local/my-account/. Expected result: the page opens successfully and the authorization form is displayed.
  2) Select "Register" from the menu. Expected result: the registration form is displayed.
  3) Fill in the fields: mail field, full name, password, password confirmation, checkboxes, Register button. Expected result: the fields are filled in successfully.
  4) Click the Register button. Expected result: registration is completed successfully.

Case 3. Checking the authorization form (UI).
  1) Go to the page http://study-test.local/my-account/. Expected result: the page opens successfully and the authorization form is displayed.
  2) Check the login form display. Expected result: the following are displayed: Email or login, Password, the Forgot your Password link, the Remember Me checkbox, and the Login tab.

Case 4. Checking the authorization form. User login.
  1) Go to the page http://study-test.local/my-account/. Expected result: the page opens successfully and the authorization form is displayed.
  2) Fill in the User Name field. Expected result: the name is entered successfully.
  3) Fill in the Password field. Expected result: the password is entered.
  4) Click the Login button. Expected result: login is completed successfully.

Figure 1 below shows a flowchart of the proposed method of multidimensional screenshot testing.


Fig. 1 — Block diagram of test execution

Checking the authorization and registration page

Step 1. The first test run was performed on the reference version of the software in order to evaluate and debug the created code. Within this step, all tests should pass. Figure 2 shows the results of testing the reference version of the software.

Fig. 2 — results of functional, UI and multidimensional screenshot testing of the reference software version

Step 2. To evaluate the results against the selected metrics, changes must be made to the model under test in order to assess the ability of the created solution to detect errors. Changes were made in the following aspects: the width of the authorization form fields was changed from 50 pixels to 60 pixels, and the fill of the form was changed from white to gray. The results are shown in Fig. 3 and Fig. 4.

Fig. 3 — visual demonstration of the reference version of the software

Fig. 4 — visual demonstration of the software version with simulated errors

According to the test results after making changes (Fig. 5), simulating the occurrence of real errors on the tested model, the following results were obtained:


Fig. 5 — results of functional, UI and multidimensional screenshot testing of the software version with introduced errors

Figure 5 shows that functional testing could not fully complete the verification task, since its capabilities do not cover all aspects that require checking. UI testing, in turn, revealed a larger number of errors, owing to the wider set of test cases provided for checking the user interface. However, this type of testing did not cover functionality and therefore cannot serve as a full substitute for functional testing. In contrast, multidimensional screenshot testing showed the best result, identifying and documenting all errors.

When an error is found, multidimensional screenshot testing writes a message to the report whose content depends on the aspect being checked. If the error is in the visual component, the message looks like this: "Failed: Error when comparing screenshots: The difference was found in the screenshots: 419178 pixels are different from the previous screenshot: Checking the authorization form (UI). The presence of all elements_the name of the .png file", where "the name of the .png file" points to the screenshot obtained after the comparison. This simplifies the analysis and documentation of errors after verification for both the tester and the developer.
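The visual check behind such a message can be sketched as a pixel-level comparison with PIL, the image library named in the article. The function name, the exact message wording and the diff-image handling below are illustrative assumptions, not the authors' exact implementation:

```python
from PIL import Image, ImageChops

def compare_screenshots(reference_path, current_path, diff_path):
    """Pixel-level comparison of a reference and a test screenshot.

    Returns a pass/fail message; on failure, saves the per-pixel
    difference image so it can be attached to the test report.
    """
    ref = Image.open(reference_path).convert("RGB")
    cur = Image.open(current_path).convert("RGB")
    diff = ImageChops.difference(ref, cur)  # per-channel absolute difference
    # Count pixels where at least one channel differs from the reference
    changed = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    if changed == 0:
        return "Passed: screenshots match"
    diff.save(diff_path)
    return ("Failed: Error when comparing screenshots: "
            f"{changed} pixels are different from the previous screenshot: {diff_path}")
```

A structural or content-level check would follow the same pattern, comparing cropped regions of individual elements instead of the full page.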

Functional and UI testing respond to an error by reporting the status "AssertionError: <message>", where <message> may indicate, for example, "Did not go to the successful authorization page" or "Invalid color: rgba(160, 160, 160, 1)". This is less self-explanatory and requires additional work to prepare the reports.

The result of overlaying the reference screenshot on the screenshot taken during testing is shown in Fig. 6. Reports are generated in this form after the tests have run.

Fig. 6 — An example of the final screenshot based on the test results

As a result of the experiment, the following results were obtained:

Testing accuracy:

  • Functional testing: 0 of 2 defects were detected, 0 false positives, 2 defects were missed.
  • UI testing: 7 defects were detected, 3 of them false alarms, and 1 defect was missed.
  • Multidimensional screenshot testing: 4 defects were detected, 0 false positives, 0 defects were missed.

The following formula was used to evaluate the accuracy of the testing types under comparison:

x = ((y - z - f) / (y + f)) × 100%,

where x is the accuracy, y is the total number of detected defects, z is the number of false alarms, and f is the number of missed defects.

Thus, the following results were obtained:

- The accuracy of functional testing is 0%, since no defects were detected; moreover, UI defects were present on the page that this type of testing is not supposed to find. From the point of view of the verification task, functional testing covers only part of the errors.

- The accuracy of UI testing is 37.5%, as 3 false alarms were detected and 1 error was missed.

- The accuracy of multidimensional screenshot testing is 100%, as all defects were identified without false positives and missed errors.

This calculation confirms that multidimensional screenshot testing outperforms functional and UI testing in accuracy, which makes it a more effective method for detecting defects in the system.

Stability of test scenarios:

The stability of test scenarios can be assessed as the share of test runs that complete successfully over several iterations. The following formula was used to calculate stability:

x = (y / z) × 100%,

where x is the stability of the tests, y is the number of successful test runs, and z is the total number of test runs.

  • Successful test runs are tests that passed without failures or errors.
  • The total number of test runs is the number of runs performed over several iterations.

To obtain stability data, the tests were run several times and the results were recorded:

  • Successful runs: the number of times the test runs completed successfully.
  • Unsuccessful runs: the number of times the test runs failed due to unpredictable factors (for example, environment failures or network problems).

The following results were obtained:

  1. Functional testing: 10 successful runs out of 10; stability is 100%.
  2. UI testing: 5 successful runs out of 10; stability is 50%.
  3. Multidimensional screenshot testing: 9 successful runs out of 10; stability is 90%.

This calculation shows that functional testing demonstrated the most stable results in the experiment. It should be borne in mind, however, that within the set verification task its test scenarios completed successfully while significant errors remained undetected.

A summary table of the results obtained can be presented as follows:

Table 4

The results obtained during the experiment

| Compared characteristic | Functional testing | UI testing | Screenshot testing |
|---|---|---|---|
| Number of identified defects | 0 of 2 | 7 of 16 | 4 of 4 |
| Test execution time | 11 seconds | 45 seconds | 28 seconds |
| Labor costs for test development and support | Relatively low | High: each interface aspect must be covered separately by styles and properties | Slightly higher than for functional testing, but lower than for UI testing |
| Testing accuracy (including false positives and missed defects) | 0% | 37.5% | 100% |
| Test stability | 100% | 50% | 90% |

Thus, the hypothesis is confirmed: multidimensional screenshot testing is a more effective type of web application verification than functional and UI testing.

Discussion of the results

The proposed method was tested on an artificially created web application, limited to checking a single critical component within a clearly defined, narrow verification task. The proposed verification method has shown its viability, but it requires additional validation when integrated into more complex systems under a more detailed verification task, one covering not only the critical functionality of a single page but also less critical yet important verification criteria of the web application.

It is worth noting that implementing multidimensional screenshot testing can be a complex and resource-intensive task. As the number of verified aspects grows, so does the likelihood of running into exponential complexity, which can make it difficult to process all possible states of the system. In multidimensional screenshot testing this manifests itself in complex interfaces with many dynamically changing elements, which require manual configuration of the notification system.

As part of the proposed method, we circumvent this limitation by creating a system that compares screenshots and then saves them for reporting. The scheme of this solution is shown above in Fig. 1. Each screenshot in the repository is named according to a fixed scheme: the name encodes the date of the test, the tested functionality and the aspect (for example, when one form on the page is tested, the screenshot contains only that form; otherwise it contains the full page with all aspects). In the context of regression testing, this approach allows regular software stability checks without spending much time on repetitive test cases.
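The naming logic described above can be sketched as follows; the exact pattern (date format, separators, scope suffix) is an illustrative assumption, since the article only states which components the name must encode:

```python
from datetime import date

def screenshot_name(functionality, aspect, run_date=None, full_page=False):
    """Build a screenshot file name encoding the test date, the tested
    functionality and the checked aspect.

    full_page distinguishes a capture of one element (e.g. a single form)
    from a capture of the whole page with all aspects.
    """
    run_date = run_date or date.today()
    scope = "full-page" if full_page else "element"
    return f"{run_date:%Y-%m-%d}_{functionality}_{aspect}_{scope}.png"
```

For example, `screenshot_name("authorization-form", "UI", date(2025, 1, 15))` yields `"2025-01-15_authorization-form_UI_element.png"`, which sorts chronologically and can be filtered by functionality or aspect when preparing regression reports.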

For dynamic elements, such as automatically substituted numbers, a solution is described in which these elements are masked using JavaScript, which makes it possible to configure an automatic filtering system that excludes them from verification.
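A minimal sketch of such masking via Selenium's `execute_script`, assuming dynamic elements are addressed by CSS selectors (the selector convention and helper names are illustrative, not the authors' implementation):

```python
def build_mask_script():
    """JavaScript that hides elements marked as dynamic before a screenshot.

    Hiding via visibility keeps the page layout intact while blanking the
    changing content (order numbers, timestamps, counters, and so on).
    """
    return (
        "arguments[0].forEach(function (sel) {"
        "  document.querySelectorAll(sel).forEach(function (el) {"
        "    el.style.visibility = 'hidden';"
        "  });"
        "});"
    )

def mask_dynamic_elements(driver, selectors):
    """Run the masking script in the browser before capturing a screenshot."""
    driver.execute_script(build_mask_script(), selectors)
```

Called right before the screenshot is taken, this keeps the reference and test images comparable even when the underlying values change between runs.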

It is also worth emphasizing that a disadvantage of multidimensional screenshot testing may be its dependence on computing resources. Since the method checks many aspects simultaneously, it requires substantial computing power, especially when working with large and complex programs. This manifests itself in the need to store a large amount of data (screenshots) and to process images for analysis, which creates additional load on the system and may increase test execution time.

This limitation can be mitigated in various ways. In this work, lossless image compression is used, which significantly reduces the amount of stored data: the image is saved in PNG format without losing any information.

It is also worth considering that long-term data storage is necessary for preparing quarterly and annual reports on many projects. When reports have already been processed and corrected after test runs, and quick access to them is no longer required, data archiving can be used. Archiving combines multiple files into a single file and then compresses it to reduce the total amount of data. For example, the data obtained during testing can be split by the quarter in which the checks were performed, or by the type of verified functionality, to simplify file management; exactly the needed layer of information can then be retrieved at any moment. Archiving is also an effective way to prepare data for backup or transfer to remote servers.
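The archiving step described above can be sketched with the standard zipfile module; the directory layout and the archive naming are illustrative assumptions:

```python
import zipfile
from pathlib import Path

def archive_screenshots(screenshot_dir, label, archive_dir):
    """Pack all PNG screenshots of one period (e.g. a quarter) into a single
    compressed archive, e.g. screenshots_2025-Q1.zip.

    Non-PNG files (notes, logs) are deliberately left out of the archive.
    """
    archive_path = Path(archive_dir) / f"screenshots_{label}.zip"
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for png in sorted(Path(screenshot_dir).glob("*.png")):
            zf.write(png, arcname=png.name)  # store without directory prefix
    return archive_path
```

ZIP_DEFLATED applies an additional lossless compression pass on top of the PNG encoding, and a single archive per quarter is also convenient to move to a backup or remote server.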

Thus, the implementation of multidimensional screenshot testing can become a serious problem if the technical requirements, processes and methods necessary for the particular software have not been analyzed and described [9].

In the context of the proposed method, the following recommendations can be formulated for further research:

  1. Conduct additional research on applying multidimensional screenshot testing to mobile applications, where dynamic interface elements and the variety of devices can significantly complicate verification;
  2. Validate the method of multidimensional screenshot testing on various verification tasks using a more complex web application, which will require developing algorithms to reduce the resource intensity of the method on large, highly loaded applications;
  3. Explore integrating the multidimensional approach with other types of testing, and improve the proposed method by adding a system that simulates human behavior during software testing, using recent advances in computer vision [10].

Conclusion

As part of the research, the goal was achieved — to prove the effectiveness of multidimensional screenshot testing as a method of automated dynamic verification of web applications. Based on experimental data, the hypothesis has been confirmed that multidimensional screenshot testing is superior to traditional methods such as functional and UI testing due to a hybrid analysis of the functional and visual characteristics of the application.

The experiment conducted in preparing this article showed that the benefits of multidimensional screenshot testing outweigh its possible limitations and disadvantages, owing to reduced testing time (especially in regression testing), detailed analysis of various aspects, and flexibility in changing the testing strategy. The experimental data can be extrapolated to more complex projects, which makes multidimensional screenshot testing a more effective way of dynamic software verification.

The proposed technique is based on theoretically sound image comparison algorithms that ensure high accuracy of deviation analysis. The developed verification model makes it possible to effectively structure test scenarios and simplify their analysis.

Thus, multidimensional screenshot testing makes it possible to build a flexible and accurate verification system that covers different aspects of the software by combining several types of testing in a single test run, automating complex processes with a minimal number of false positives. Moreover, comparison between different versions of the application makes it possible to track changes and ensure that edits do not break the functionality or appearance of the software. Using Python, Pytest and PIL makes it easy to analyze changes between versions and quickly identify unexpected deviations.

References
1. National Institute of Standards and Technology. (2002). The economic impacts of inadequate infrastructure for software testing. Retrieved October 5, 2024, from https://www.nist.gov/system/files/documents/director/planning/report02-3.pdf
2. Gurin, R. E., Rudakov, I. V., & Rebrikov, A. V. (2015). Methods of software verification. Machine Engineering and Computer Technologies, 10, 235-251.
3. Quadri, S. M. K., & Farooq, S. U. (2010). Software testing: Goals, principles, and limitations. International Journal of Computer Applications, 6(9), 7-9.
4. Kumar, S. (2023). Reviewing software testing models and optimization techniques: An analysis of efficiency and advancement needs. Journal of Computers, Mechanical and Management, 2(1), 43-55.
5. Xie, Q., & Memon, A. M. (2007). Designing and comparing automated test oracles for GUI-based software applications. ACM Transactions on Software Engineering and Methodology, 16(1), 4.
6. Kudryavtseva, E. Y. (2014). Automated testing of web interfaces. Mining Information and Analytical Bulletin, S, 354-356.
7. Kulyamin, V. V. (2008). Methods of software verification. ISP RAN.
8. Persival, G. (2018). Python: Test-driven development. DMK Press.
9. Beregeiko, O. P., & Dubovsky, A. S. (2016). Automation of web application testing. Bulletin of the Master's Program, 12-4, 39-41.
10. Dwarakanath, A., Neville, D., & Sanjay, P. (2018). Machines that test software like humans. arXiv preprint arXiv:1809.09455.

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

The presented article on "Screenshot testing as a multidimensional type of automated dynamic verification of web applications" corresponds to the scope of the journal "Software Systems and Computational Methods" and is devoted to ensuring a high level of software quality and maintaining stable software operation. Since even minor errors can lead to loss of stability, deterioration of the user experience and, as a result, economic costs, it is important to analyze the state of the software regularly. The article presents a broad analysis of Russian and foreign sources on the research topic. The aim of the study is to develop a method of multidimensional screenshot testing that combines functional and UI approaches to improve the accuracy of defect detection and optimize resources in the process of web application verification. The style and language of the material are scientific and accessible to a wide range of readers. The volume of the article meets the recommended minimum of 12,000 characters. The article is well structured: there is an introduction, a conclusion, and an internal division of the main part (research methods, description of the proposed method, experimental approbation of multidimensional screenshot testing, discussion of the results). The work contains graphic material, represented by 6 figures and 4 tables. As the research methodology, the authors indicate the development and application of multidimensional screenshot testing, a combination of functional and UI testing that makes it possible to solve web application verification tasks effectively. The main idea of the method is a parallel analysis of the functional and visual characteristics of a web application by automatic comparison of reference and test screenshots. 
The authors also conducted an experiment aimed at testing the following hypothesis: multidimensional screenshot testing is a more effective type of web application verification compared to functional and UI testing, thanks to a hybrid analysis of all aspects of the software within the selected test scenario in a single test run. The effectiveness was assessed by comparative analysis using examples of functional, UI, and multidimensional screenshot testing. The practical significance of the article is clearly justified – the effectiveness of multidimensional screenshot testing as a method of automated dynamic verification of web applications has been proven. Based on experimental data, the authors have confirmed the hypothesis that multidimensional screenshot testing is superior to traditional methods such as functional and UI testing due to a hybrid analysis of the functional and visual characteristics of the application. The method proposed by the authors was tested on an artificially created web application with the limitation of checking only the critical component within the framework of a clearly defined limited verification task. The proposed verification method has shown its viability, but requires additional verification when integrated into more complex systems and setting up a detailed verification task, which will require not only verification of the critical functionality of one page, but also less critical, but important web application verification criteria. The article "Screenshot testing as a multidimensional type of automated dynamic verification of web applications" can be recommended for publication in the journal Software Systems and Computational Methods.