Reference:
Makarov, K. S., & Fatkin, R. I. (2025). Screenshot testing as a multi-aspect type of automated dynamic verification for web applications. Software Systems and Computational Methods, 1, 32–54. https://doi.org/10.7256/2454-0714.2025.1.73535
Screenshot testing as a multi-aspect type of automated dynamic verification for web applications
DOI: 10.7256/2454-0714.2025.1.73535
EDN: UVGEBC
Received: 02-03-2025
Published: 03-04-2025

Abstract: The subject of this study is multi-aspect screenshot testing as a modern method of automated dynamic verification of web applications, combining functional testing and user interface (UI) validation. Contemporary testing methods face challenges such as high labor intensity, false positives, and low scalability, especially in complex projects. The main objective of the research is to develop and implement a method that improves defect detection accuracy, reduces testing time, and lowers test case development costs. The study explores image comparison algorithms, dynamic element filtering techniques, and automated UI analysis approaches to enhance efficiency and standardization in the web application verification process. Unlike functional and UI testing conducted separately, the proposed method enables simultaneous analysis of multiple aspects of the interface and functionality, minimizing labor costs and increasing testing reliability. The approach employs automated comparison of reference and test screenshots at the pixel, structural element, and content levels using Python, Selenium, PIL, and Pytest-xdist for parallel test execution, effectively addressing the challenges of web application verification. Some researchers in the field of testing agree that the testing process lacks standardization and clear evaluation criteria. The proposed method ensures the achievement of verification objectives even under evolving strategies and approaches to system performance assessment by creating a flexible and precise validation system that integrates various testing types into a unified structure, making it suitable for modern software development challenges. The experimental section demonstrates the advantages of multi-aspect screenshot testing over other methods, including reduced testing time, improved defect detection accuracy, and enhanced analysis of test reports.
This approach can be adapted to various testing scenarios and is particularly beneficial for high-load projects requiring regular regression testing.

Keywords: screenshot testing, multi-aspect screenshot testing, automated testing, dynamic verification of web applications, Python in testing, Selenium and PIL, UI testing, testing optimization, testing criteria, verification tasks

This article is automatically translated.

Introduction

Testing is an essential step in the development of any software. Modern applications, especially web and mobile ones, require regular updates and improvements in the face of rapidly changing market requirements and the introduction of new technologies. To maintain a high level of quality and stable operation, the state of the system must be analyzed regularly: even minor errors can lead to loss of stability, a degraded user experience, and, as a result, economic costs. Annual economic losses due to low-quality software in the United States alone are estimated at tens of billions of dollars [1, p. 14], which underlines the urgency of developing reliable software testing methods.

Web applications have become an integral part of life for billions of people around the world, and most of these solutions have a commercial focus. For a product whose ultimate goal is profit, the economic losses associated with low-quality software can become a major source of problems, since any error that degrades the user experience translates into a loss of real users. Product testing is designed to address exactly this class of problems. Screenshot testing, focused on automatic interface comparison, is actively used in dynamic software verification.
However, traditional approaches such as functional testing or stand-alone UI testing have limitations: high labor costs, an excessive number of false positives, and poor performance when scaled. In highly loaded and complex web applications, tracking all of these processes manually is a time-consuming task that is prone to tester error. Gurin R. E., Rudakov I. V., and Rebrikov A. V. review existing approaches to software verification and analyze their limitations and effectiveness [2]. Studies [3-5] likewise note that software verification suffers from a number of inherent limitations.

The problems highlighted in these works indicate the need for new solutions that can overcome the limitations of traditional approaches. Multidimensional screenshot testing offers a combined approach that unites functional and UI testing and can improve both the accuracy and the performance of software verification. Kudryavtseva E. Yu. also notes that automated testing is most justified and relevant in large projects [6]. The proposed methodology thus responds to these challenges and contributes to the further development of software verification. It is also worth noting that, at present, Russian scientific literature rarely addresses software verification, particularly for publicly available software, or limits itself to reviews of existing methods. As a consequence, the field of software testing is developing less rapidly than software development itself, even though the rapid pace of development requires verification methods to adapt just as quickly. The purpose of this work is therefore to develop a method of multidimensional screenshot testing that combines functional and UI approaches to improve the accuracy of defect detection and optimize resource use in the web application verification process. To achieve this goal, the Python programming language with the Selenium and PIL libraries is used, which automates the tests and reduces labor costs.

1. Research methods

The research methodology is based on the development and application of multidimensional screenshot testing, a combination of functional and UI testing that makes it possible to verify several aspects of the software in a single run of a test scenario. This approach is designed to reduce the number of false positives and increase the accuracy of testing.
The main idea of the method is to conduct a parallel analysis of the functional and visual characteristics of a web application by automatically comparing reference and test screenshots at the pixel, structural-element, and content levels.
The methodology is focused on minimizing false positives that may occur with dynamic interface changes (for example, pop-up notifications or animations). For this purpose, filters and masking of dynamic elements have been implemented, which excludes their influence on the final test results. The experimental part of the methodology is aimed at a comparative analysis of multidimensional screenshot testing against traditional methods such as functional and UI testing. For this purpose, key evaluation metrics were defined: the accuracy of defect detection and the stability of test scenarios.
The methodology was tested on a web application developed for experimental purposes and covered aspects such as page transitions, form filling, registration and authorization, screen resolutions and orientations, light and dark themes, and different browsers and platforms.
The methodology provides a deeper analysis of the interaction of interface elements, which makes it possible to identify defects that could be missed by other types of dynamic verification. Multidimensional screenshot testing performs configurable verification of several test situations for a given verification task in one run of automated test scenarios. Each aspect focuses on verifying certain functional or visual elements of the interface, and all violations for the tested test situations must be detected in the course of the run. Test situations are the situations in which testing is performed; the procedures describing the process of creating these situations and the checks that need to be performed on the results obtained are called tests [7, p. 68]. Within software verification through multidimensional screenshot testing, both the functional and the UI characteristics of the object are considered.

The object of research in this paper is a web application that requires regular verification to ensure stable operation and compliance with user requirements. An online store built on the WordPress platform was chosen, with an emphasis on the "My Account" page. The application is a typical representative of commercial web resources widely used to provide online services. It contains standard user interface elements such as data entry forms, buttons, checkboxes, drop-down lists, and visual elements (icons, color schemes). The authorization and registration page is critically important for any online store, as it provides access to the functionality of the application and affects the user experience. The test environment simulated various real-world operating conditions: different screen resolutions, light and dark interface themes, and various browsers and devices (desktop and mobile).
Applying multidimensional screenshot testing to such an object makes it possible to demonstrate the advantages of the method effectively, including identifying complex errors that could go unnoticed when using only functional or UI testing. The chosen research object is thus a representative example that allows the effectiveness and applicability of the proposed method to be evaluated for a wider range of web applications. To implement the proposed methodology, a set of tools and software libraries was used to automate verification, image processing, and analysis of results. The method was implemented in the high-level Python programming language, whose flexibility and extensive ecosystem of libraries for working with images and user interfaces provide all the necessary tooling for a multidimensional approach [8]. The following were used:
- Python as the core implementation language;
- Selenium for browser automation and interaction with interface elements;
- PIL for image processing and screenshot comparison;
- Pytest with the pytest-xdist plugin for parallel test execution.
The listed technologies integrate well into a single system, support customization for various devices, browsers, and interface themes, and allow tests to be run in parallel, which reduces verification time. This made it possible to automate a screenshot testing process that checks several aspects of the software simultaneously. Multidimensional screenshot testing also increases the chances of identifying potential defects that could go unnoticed with other types of dynamic verification, such as functional testing, which only checks the ability to interact with the system. Different aspects may behave differently depending on their relationship to other system elements or display conditions. For example, when a button is hovered over or interacted with, its color should change, but functional testing only verifies that the button can be interacted with. It is simultaneous verification that makes it possible to identify and analyze complex relationships and interactions between different aspects of the software, errors in which could go unnoticed during step-by-step testing of individual aspects. With multidimensional screenshot testing of web applications, one can simultaneously check the location of buttons, the contents of input fields, color schemes, and the state of various interface elements, which makes testing more complete and less vulnerable to oversight or cross-platform discrepancies.

2. Description of the proposed method

The verification task was formulated so that all functional and non-functional tests are combined into one stream, which verifies the largest number of software aspects in the shortest test-case execution time. Multidimensional screenshot testing thus makes it possible to pack hundreds of decomposed test cases into one scenario, which yields good performance results and economic benefits.
Each test case has its own verification task, which determines the correct behavior. By comparing screenshots of the reference version of the software with the current version, the system analyzes the interface for matches and discrepancies. If a discrepancy is found, the test case returns false and shows an error label at the point where the screenshot differs from the expected behavior of the software. All of this is recorded by the error-trace output algorithm. An error trace is the sequence of steps or operations performed by a system or program that led to an error; it helps to localize failures, which simplifies diagnosis and correction. The result of the verification task can have one of the following states: passed, when the current state matches the reference, or failed, when a discrepancy is detected and recorded in the error trace.
The main stages of the development and implementation of multidimensional screenshot testing can be presented as a list: (1) selection of the aspects to be tested and preparation of the test environment; (2) creation of reference screenshots; (3) automated capture of test screenshots; (4) multi-level comparison of reference and test screenshots; (5) report generation and filtering of results; (6) updating reference screenshots and optimizing the test system.
At this stage, it is necessary to determine which aspects of the software will be tested. In this case: transitions; form filling; registration and authorization; screen resolution; orientation; themes (dark/light); browsers or platforms (mobile and desktop devices). It is also important to formulate the dynamic verification task before creating test scenarios. For example, checking the correctness of the display of visual elements, text, icons, etc. and the possibility of interacting with them. After that, it is worth deploying a test environment that will be as close as possible to the actual conditions of use. This includes setting up device simulators or browsers, using different screen configurations, and downloading the necessary fonts and libraries. For each of the aspects involved in screenshot testing, you can use different levels of verification. For example, you can perform a deeper check of only those elements that have already been identified as potentially problematic at the initial stage of the analysis. This allows you to reduce excessive resource costs.
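The combination of aspects and environments described above can be organized as a configuration matrix that drives the test scenarios. The sketch below shows one way to build such a matrix in Python, the language the method is implemented in; the concrete resolutions, themes, and browsers are illustrative assumptions, not values from the article's experimental setup.

```python
from itertools import product

# Hypothetical configuration axes; the concrete values are assumptions.
RESOLUTIONS = [(1920, 1080), (1366, 768), (390, 844)]  # desktop and mobile viewports
THEMES = ["light", "dark"]
BROWSERS = ["chrome", "firefox"]

def build_test_matrix(resolutions, themes, browsers):
    """Return every environment combination a test scenario should cover."""
    return [
        {"resolution": r, "theme": t, "browser": b}
        for r, t, b in product(resolutions, themes, browsers)
    ]

matrix = build_test_matrix(RESOLUTIONS, THEMES, BROWSERS)
print(len(matrix))  # 3 resolutions x 2 themes x 2 browsers = 12 combinations
```

Such a matrix can then be fed to a parametrized test runner, so that adding a new theme or resolution automatically extends the coverage without new test code.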
By reference screenshots we mean images that reflect the desired or correct state of the interface. They serve as the basis for subsequent comparison. Screenshots are taken from the version of the program taken as the reference (it has either been previously tested and debugged, or a design layout of the system is used). When creating screenshots, it is suggested to rely on the technical requirements for the environments and devices defined in the previous step. As part of the verification task, when preparing test cases, several specific aspects of the software can be identified that will require verification using various parameters. Within multidimensional screenshot testing, the parameters can be divided as follows:
- Global deviations: significant changes in the expected behavior of the software (for example, missing elements, server errors).
- Detailed deviations: small changes, such as color changes or slight displacement of elements.
- Functional deviations: errors in the operation of elements that can be interacted with.
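The three deviation categories above can be assigned automatically from the raw comparison result. A minimal sketch, assuming a simple rule: a functional check failure maps to a functional deviation, and visual differences are split into global versus detailed by the share of differing pixels. The 5% threshold is an illustrative assumption, not a value from the article.

```python
def classify_deviation(diff_pixels, total_pixels, functional_failure=False,
                       global_threshold=0.05):
    """Map a comparison result onto deviation categories.

    The threshold is an illustrative assumption, not a value from the article.
    """
    if functional_failure:
        return "functional deviation"
    ratio = diff_pixels / total_pixels
    if ratio == 0:
        return "match"
    if ratio >= global_threshold:
        return "global deviation"   # e.g. a missing element or a server error page
    return "detailed deviation"     # e.g. a slight color change or element shift
```

In practice the threshold would be tuned per page to keep the false-positive rate low.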
After collecting technical requirements and preparing reference screenshots, automated test scenarios are launched that capture the current state of the software under the same conditions in which the reference screenshots were made. For multidimensional screenshot testing, it is important to provide for specific verification conditions, such as different themes, localization languages, and devices with different screens and resolutions. This expands testing coverage and ensures that the software works correctly in various situations.
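Capturing test screenshots under controlled conditions can be sketched with Selenium, which the method uses for browser automation. The sketch below assumes a locally available chromedriver; the `--force-dark-mode` flag is a Chromium-specific assumption for emulating the dark theme, not a setting taken from the article.

```python
def build_chrome_args(width, height, dark_theme):
    """Build the Chrome command-line arguments for one test environment."""
    args = ["--headless=new", f"--window-size={width},{height}"]
    if dark_theme:
        # Chromium-specific flag for emulating a dark UI theme (an assumption).
        args.append("--force-dark-mode")
    return args

def capture_screenshot(url, path, width=1920, height=1080, dark_theme=False):
    """Capture a screenshot of `url` under the given viewport and theme (sketch)."""
    from selenium import webdriver  # lazy import: needs a browser and chromedriver
    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(width, height, dark_theme):
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        driver.save_screenshot(path)
    finally:
        driver.quit()
```

The same function is called once per entry of the environment matrix, so reference and test screenshots are always taken under identical conditions.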
After the test screenshots of the current software version have been captured automatically, the test scenarios compare the two screenshots stored in the library for matches and discrepancies. The comparison takes place on three levels:
- The pixel level, at which any differences between the reference and test screenshots are detected (changes in positions, fonts, colors).
- The structural-element level, which checks the correctness of the display of interface elements: the sizes of margins and paddings, buttons and inputs, and their location in the viewport.
- The content level, which checks not only the size and color of the font but also the correctness of the content itself. In addition to textual information, it can verify that all graphical interface elements are displayed correctly and are in their places.
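The pixel level of this comparison can be sketched with the PIL library named in the article. A minimal example: `ImageChops.difference` produces a per-pixel difference image, and counting its non-black pixels yields the number of differing pixels, the same kind of figure the method later reports.

```python
from PIL import Image, ImageChops

def count_diff_pixels(reference, test):
    """Pixel-level comparison: count pixels that differ between two screenshots."""
    diff = ImageChops.difference(reference.convert("RGB"), test.convert("RGB"))
    return sum(1 for px in diff.getdata() if px != (0, 0, 0))

# Tiny synthetic demonstration instead of real screenshots.
ref = Image.new("RGB", (10, 10), (255, 255, 255))
cur = ref.copy()
cur.putpixel((3, 3), (160, 160, 160))  # simulate a one-pixel color change
print(count_diff_pixels(ref, cur))  # 1
```

Structural and content checks build on top of this primitive by restricting the comparison to cropped regions of individual interface elements.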
After the tests are completed, a launch parameter of the framework is set to generate reports describing all negative and positive outcomes. These reports can include screenshots with highlighted problem areas, which makes it easier for both the tester and the developer to analyze and understand the problem. Based on the reports received, teams can decide whether particular problems need to be fixed; this can be manual analysis for complex cases or automatic decision-making for obvious deviations.

To reduce false positives, it may be necessary to adjust the filtering conditions for the results obtained. All deviations found should be classified by criticality: missing layout elements or a broken registration form are far more important than minor deviations in the shade of a button or its shift by a few pixels. Attention should also be paid to dynamically changing elements: pop-up windows, animations, phone numbers with autocorrect, and so on. The proposed method uses the PIL library to process images for visual interface checks, while JavaScript helps mask dynamic elements, such as pop-up notifications or animations, that may change between launches and interfere with correct comparison. This provides a more detailed and accurate comparison, eliminating unwanted errors that are not related to functional changes. Thus, to minimize false positives, the sensitivity levels of the tests should be adjusted, or an algorithm should be devised to exclude undesirable elements from the comparison. This helps focus on critical bugs and avoid reporting minor issues that do not affect the functioning of the system.
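The JavaScript-based masking of dynamic elements can be sketched as follows: before a screenshot is taken, a script is injected through Selenium's `execute_script` that hides every element matching a list of CSS selectors. The selectors below are illustrative assumptions; the article does not specify them.

```python
# CSS selectors of dynamic regions to exclude; illustrative assumptions.
DYNAMIC_SELECTORS = [".popup-notification", ".carousel", "#live-chat"]

def build_masking_script(selectors):
    """Return a JavaScript snippet that hides every matching dynamic element."""
    joined = ", ".join(selectors)
    return (
        "document.querySelectorAll('%s')"
        ".forEach(el => el.style.visibility = 'hidden');" % joined
    )

def mask_dynamic_elements(driver, selectors=DYNAMIC_SELECTORS):
    """Run the masking script in the browser before taking a screenshot (sketch)."""
    driver.execute_script(build_masking_script(selectors))
```

Using `visibility: hidden` rather than `display: none` keeps the page layout intact, so masking does not itself shift surrounding elements and trigger false diffs.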
If the application has changed in accordance with new requirements, the old reference screenshots may become obsolete. In this case, they need to be updated so that future tests remain relevant to the current state of the software. This is an important process that helps avoid false positives and inconsistencies in the defined verification tasks.

Optimization of the test system may also be required. Continuous improvement of screenshot comparison algorithms and automation tools is key to improving testing efficiency: for example, the results filtering system can be improved, or more accurate analysis methods introduced. Optimization speeds up testing and reduces the time spent on manual verification of results. In some cases, an iterative verification process can additionally be implemented, in which verification of the correctness or compliance of the system, program, or its individual components is performed repeatedly after each stage of development. For example, if one of the development teams is working on a new feature that they plan to integrate into an existing product, there is no need to wait for the feature to appear in the final product: it can be tested right away. This makes it possible to identify errors or inconsistencies at an early stage and make the necessary changes before moving on to the next phase of work or to integration with the main project. The subsequent run of multidimensional screenshot testing then has fewer errors to detect, allowing attention to focus on checking the interaction of the new and old elements of the system.
3. Experimental approbation of multidimensional screenshot testing

As part of the work, an experiment was conducted to test the following hypothesis: multidimensional screenshot testing is a more effective type of web application verification than functional and UI testing, thanks to a hybrid analysis of all aspects of the software within the selected test scenario in a single test run. The purpose of the experiment is to confirm this hypothesis. For the experiment, an online store website on WordPress was created to facilitate documentation and analysis of the results. The verification task was set as follows: check the operability of the "My Account" page of the online store, which is intended for user authorization and registration. By operability we mean the correctness of the page display relative to the reference software version, the possibility of functional interaction with various elements, and the correct transmission of information (login/password). Effectiveness was assessed by comparative analysis using functional, UI, and multidimensional screenshot testing.
Performance evaluation criteria: the effectiveness of the created solutions was evaluated by the accuracy of defect detection and the stability of the test scenarios.
The test scenarios identified for the verification task are summarized in the test models below (Tables 1-3).
Let's look at them in more detail:
Table 1. Test model for functional testing
Table 2. Test model for UI testing
Table 3. Test model for multidimensional screenshot testing
Figure 1 below shows a flowchart of the proposed method of multidimensional screenshot testing.
Fig. 1 — Block diagram of test execution
Checking the authorization and registration page

Step 1. The first test run was performed on the reference version of the software to evaluate and debug the created code. Within this step, all tests should pass. Figure 2 shows the results of testing the reference version of the software.

Fig. 2 — Results of functional, UI, and multidimensional screenshot testing of the reference software version

Step 2. To evaluate the results using the selected metrics, changes were made to the model under test in order to assess the ability of the created solution to detect errors. The changes were as follows: the width of the fields of the authorization form was changed from 50 pixels to 60 pixels, and the fill of the form was changed from white to gray. The results are shown in Fig. 3 and Fig. 4.

Fig. 3 — Visual demonstration of the reference version of the software
Fig. 4 — Visual demonstration of the software version with simulated errors

According to the test results after these changes (Fig. 5), which simulate the occurrence of real errors in the tested model, the following was observed:

Fig. 5 — Results of functional, UI, and multidimensional screenshot testing of the software version with introduced errors

Figure 5 shows that functional testing could not fully complete the verification task, as its capabilities do not cover all aspects that require verification. UI testing, in turn, revealed a higher number of errors, owing to a wider set of test cases for checking the user interface; however, it did not cover functionality and therefore cannot serve as a full substitute for functional testing. In contrast, multidimensional screenshot testing showed the best result, identifying and documenting all errors.
When an error is found, multidimensional screenshot testing writes a message to the report whose content depends on the aspect being checked. If the error is in the visual component, the message looks like this: "Failed: Error when comparing screenshots: A difference was found in the screenshots: 419178 pixels differ from the previous screenshot: Checking the authorization form (UI). The presence of all elements_<name of the .png file>", where <name of the .png file> points to the screenshot obtained after comparison. This simplifies the analysis and documentation of errors after verification for both the tester and the developer. Functional and UI testing respond to an error by displaying the status "AssertionError: <message>", where <message> may indicate, for example, "Did not go to the successful authorization page" or "Invalid color: rgba(160, 160, 160, 1)"; this is less obvious and requires additional work to prepare the reports. The result of overlaying the reference screenshot and the screenshot taken during testing is shown in Fig. 6. Reports are generated in this form after the tests pass.

Fig. 6 — An example of the final screenshot based on the test results
As a result of the experiment, the following results were obtained.

Testing accuracy:
The following formula was used to evaluate the accuracy of the types of testing:

x = (y − z − f) / (y + f) × 100%,

where x is the accuracy, y is the total number of detected defects, z is the number of false alarms, and f is the number of missed defects.
Thus, the following results were obtained:
- The accuracy of functional testing is 10.0%; however, UI defects were present on the page, which this type of testing is not meant to detect. From the point of view of the verification task, functional testing covers only a part of the errors.
- The accuracy of UI testing is 37.5%, as 3 false alarms were raised and 1 error was missed.
- The accuracy of multidimensional screenshot testing is 100%, as all defects were identified without false positives or missed errors.
This calculation confirms that multidimensional screenshot testing outperforms functional and UI testing in accuracy, which makes it a more effective method for detecting defects in the system.
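The reported figures can be reproduced with a short computation. Both the formula form x = (y − z − f) / (y + f) × 100 and the value y = 7 for UI testing are assumptions inferred here from the reported z = 3, f = 1, and x = 37.5%; the value y = 5 for multidimensional testing is purely illustrative (any y with z = f = 0 gives 100%).

```python
def accuracy(detected, false_alarms, missed):
    """Accuracy as assumed above: (y - z - f) / (y + f) * 100."""
    return (detected - false_alarms - missed) / (detected + missed) * 100

# UI testing: 3 false alarms and 1 missed defect; y = 7 is an inferred value.
print(accuracy(7, 3, 1))   # 37.5
# Multidimensional screenshot testing: no false alarms, no missed defects.
print(accuracy(5, 0, 0))   # 100.0
```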
Stability of test scenarios: the stability of test scenarios is assessed as the share of test runs that complete successfully over several iterations, relative to the total number of test runs. The following formula was used to calculate stability:

x = (y / z) × 100%,

where x is the stability of the tests, y is the number of successful test runs, and z is the total number of test runs.
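As a worked example of the stability formula (with illustrative numbers only; the article's per-run counts are not reproduced here): a scenario that succeeds in 9 out of 10 runs has 90% stability.

```python
def stability(successful_runs, total_runs):
    """Stability of a test scenario: x = y / z * 100."""
    return successful_runs / total_runs * 100

# Illustrative numbers, not the article's experimental counts.
print(stability(9, 10))   # 90.0
```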
To obtain stability data, test runs were conducted several times, after which the results were recorded.
The stability results obtained for each type of testing are included in the summary below.
This calculation shows that functional testing produced the most stable results in the experiment. However, it should be borne in mind that its test scenarios, within the defined verification task, completed with significant errors. A summary of the results obtained is presented below.

Table 4. The results obtained during the experiment
Thus, the hypothesis is confirmed: multidimensional screenshot testing is a more effective type of web application verification than functional and UI testing.

Discussion of the results

The proposed method was tested on an artificially created web application, limited to checking a single critical component within a clearly defined, bounded verification task. The method has shown its viability but requires additional validation when integrated into more complex systems with a detailed verification task that covers not only the critical functionality of one page but also less critical, yet important, verification criteria. It is worth noting that implementing multidimensional screenshot testing can be a complex and resource-demanding task. As the number of verifiable aspects grows, so does the likelihood of running into exponential complexity, which can make it difficult to process all possible states of the system. In multidimensional screenshot testing this manifests itself in complex interfaces with many dynamically changing elements, which require manual configuration of the notification system. Within the proposed method, we circumvented this limitation by creating a system for comparing screenshots and then saving them for reporting; the scheme of the solution is shown above in Fig. 1. Each screenshot in the repository is named according to a certain logic: the name indicates the date of the test, the tested functionality, and the aspect (for example, when one of the forms on the page is tested, the screenshot contains only that form; otherwise it contains the full page with all aspects). In the context of regression testing, this approach allows regular software stability checks without spending large amounts of time on running through identical test cases.
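The repository naming logic described above (date, tested functionality, aspect) can be sketched as a small helper; the exact pattern, the separator, and the example labels are assumptions based on the description, not the article's actual scheme.

```python
from datetime import date

def screenshot_name(functionality, aspect, run_date=None):
    """Build a repository file name from date, tested functionality, and aspect.

    The exact naming pattern is an assumption based on the description above.
    """
    run_date = run_date or date.today()
    return f"{run_date:%Y-%m-%d}_{functionality}_{aspect}.png"

print(screenshot_name("auth-form", "ui-elements", date(2025, 3, 2)))
# 2025-03-02_auth-form_ui-elements.png
```

A date-first name keeps the repository sorted chronologically, which simplifies locating the screenshots of a particular regression run.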
For dynamic elements, such as automatic phone-number substitution, a solution is described in which these elements are masked using JavaScript, which makes it possible to configure an automatic filtering system that excludes them from verification.

It should also be emphasized that a disadvantage of multidimensional screenshot testing may be its dependence on computing resources. Since the method involves simultaneous verification of many aspects, it requires substantial computing power, especially when working with large and complex programs. This manifests itself in the need to store a large amount of data (screenshots) and in the image processing required for their analysis, which creates additional load on the system and may increase test execution time. This limitation can be circumvented in various ways. In this work, lossless image compression is used, which significantly reduces the amount of stored data: the original image is saved in PNG format without losing any important information. It should also be taken into account that long-term data storage is needed for preparing quarterly and annual reports on many projects. In cases where the reports have already been processed and acted upon, and quick access to them is no longer essential, data archiving can be used. Archiving combines multiple files into a single file, followed by compression to reduce the total amount of data. For example, the data obtained during testing can be split by the quarters in which the checks were performed, or by the type of verified functionality, to simplify file management. At any moment it is then possible to retrieve exactly the layer of information that is needed. Archiving is also an effective way to prepare data for backup or transfer to remote servers.
Thus, the implementation of multidimensional screenshot testing can become a serious problem if the technical requirements, processes, and methods needed for the particular software have not been analyzed and described [9]. In the context of the proposed method, recommendations for further research include validating the approach on larger production systems and broadening the range of verification criteria beyond the critical functionality of a single page.
Conclusion

As part of the research, the stated goal was achieved: the effectiveness of multidimensional screenshot testing as a method of automated dynamic verification of web applications was demonstrated. Based on the experimental data, the hypothesis was confirmed that multidimensional screenshot testing is superior to traditional methods such as functional and UI testing due to its hybrid analysis of the functional and visual characteristics of the application. The experiment conducted in preparing this article showed that the capabilities of multidimensional screenshot testing outweigh its possible limitations and disadvantages, thanks to reduced testing time (especially for regression testing), detailed analysis of various aspects, and flexibility when the testing strategy changes. The experimental data can be extrapolated to more complex design solutions, which makes multidimensional screenshot testing a more effective way of performing dynamic software verification. The proposed technique is based on theoretically sound image comparison algorithms that ensure high accuracy of deviation analysis, and the developed verification model makes it possible to structure test scenarios effectively and simplify their analysis. Multidimensional screenshot testing thus allows a flexible and accurate verification system to be built that takes different aspects of the software into account by combining several types of testing within a single test run, automating complex processes with a minimum number of false positives. Moreover, the ability to compare different versions of the application makes it possible to track changes and verify that edits do not disrupt the functionality or appearance of the software. Using Python, Pytest, and PIL makes it easy to analyze changes between versions and quickly identify unexpected deviations.

References
1. National Institute of Standards and Technology. (2002). The economic impacts of inadequate infrastructure for software testing. Retrieved October 5, 2024, from https://www.nist.gov/system/files/documents/director/planning/report02-3.pdf
2. Gurin, R. E., Rudakov, I. V., & Rebrikov, A. V. (2015). Methods of software verification. Machine Engineering and Computer Technologies, 10, 235-251.
3. Quadri, S. M. K., & Farooq, S. U. (2010). Software testing - goals, principles, and limitations. International Journal of Computer Applications, 6(9), 7-9.
4. Kumar, S. (2023). Reviewing software testing models and optimization techniques: An analysis of efficiency and advancement needs. Journal of Computers, Mechanical and Management, 2(1), 43-55.
5. Xie, Q., & Memon, A. M. (2007). Designing and comparing automated test oracles for GUI-based software applications. ACM Transactions on Software Engineering and Methodology, 16(1), 4.
6. Kudryavtseva, E. Y. (2014). Automated testing of web interfaces. Mining Information and Analytical Bulletin, S, 354-356.
7. Kulyamin, V. V. (2008). Methods of software verification. ISP RAN.
8. Percival, H. (2018). Python: Test-driven development. DMK Press.
9. Beregeiko, O. P., & Dubovsky, A. S. (2016). Automation of web application testing. Bulletin of the Master's Program, 12-4, 39-41.
10. Dwarakanath, A., Neville, D., & Sanjay, P. (2018). Machines that test software like humans. arXiv preprint arXiv:1809.09455.