An Autonomous RL Agent Methodology for Dynamic Web UI Testing in a BDD Framework

Authors

A. H. Mughal

DOI:

https://doi.org/10.4108/airo.8895

Keywords:

Reinforcement Learning, Web Applications, UI Testing, BDD, Automated Testing

Abstract

Modern software applications demand efficient and reliable testing methodologies to ensure robust user interface functionality. This paper introduces an autonomous reinforcement learning (RL) agent integrated within a Behavior-Driven Development (BDD) framework to enhance UI testing. By leveraging the adaptive decision-making capabilities of RL, the proposed approach dynamically generates and refines test scenarios aligned with specific business expectations and actual user behavior. A novel system architecture is presented, detailing the state representation, action space, and reward mechanisms that guide the autonomous exploration of UI states. Experimental evaluations on open-source web applications demonstrate significant improvements in defect detection and test coverage, together with a reduction in manual testing effort. This study establishes a foundation for integrating advanced RL techniques with BDD practices, aiming to transform software quality assurance and streamline continuous testing processes.
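The paper's implementation is not reproduced here, but the core idea of the abstract (an RL agent whose state is the current UI page, whose actions are UI interactions, and whose reward encodes a BDD "Then" outcome) can be sketched as a minimal tabular Q-learning loop. Everything below is a hypothetical toy: the page names, actions, transition table, and reward values are illustrative assumptions, not the paper's actual design, which uses a richer state representation of the live web UI.

```python
import random

# Toy web UI modeled as a finite MDP. All page/action names are
# illustrative assumptions; a real agent would derive states from
# the live DOM rather than a hand-written transition table.
TRANSITIONS = {
    ("login_page", "type_credentials"): "credentials_entered",
    ("login_page", "click_submit"): "error_page",  # submitting an empty form
    ("credentials_entered", "click_submit"): "dashboard",
    ("error_page", "go_back"): "login_page",
    ("dashboard", "logout"): "login_page",
}
ACTIONS = ["type_credentials", "click_submit", "go_back", "logout"]
GOAL = "dashboard"  # the BDD "Then" outcome: the user reaches the dashboard

def step(state, action):
    """Apply a UI action; undefined actions leave the state unchanged."""
    next_state = TRANSITIONS.get((state, action), state)
    reward = 10.0 if next_state == GOAL else -1.0  # goal bonus vs. step cost
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration of UI states."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated value
    for _ in range(episodes):
        state = "login_page"
        for _ in range(20):  # cap episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q

def extract_scenario(q, max_steps=10):
    """Greedy rollout of the learned policy: a candidate BDD scenario."""
    state, steps = "login_page", []
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        steps.append(action)
        state, _, done = step(state, action)
        if done:
            break
    return steps

q = train()
scenario = extract_scenario(q)
print(scenario)
```

Each action in the extracted rollout maps naturally to a Gherkin "When" step (e.g. "When the user performs type_credentials"), with the goal check serving as the "Then" assertion, which is one plausible way the RL exploration and the BDD scenario layer could be connected.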




Published

22-07-2025

How to Cite

[1] A. H. Mughal, “An Autonomous RL Agent Methodology for Dynamic Web UI Testing in a BDD Framework”, EAI Endorsed Trans AI Robotics, vol. 4, Jul. 2025.