Wykorzystanie sztucznej inteligencji, w tym modeli językowych (LLM), w testowaniu aplikacji mobilnych

Podwysocki, Jan

Master's thesis

Alternative title

The Use of Artificial Intelligence, Including Large Language Models (LLM), in Mobile Application Testing

Author

Podwysocki, Jan

Reviewer

Misztal, Krzysztof

Roman, Adam

Advisor

Misztal, Krzysztof

Date of defence

2025-10-16

Keywords in English

mobile application testing, artificial intelligence, large language models, LLM, test automation, user interface, UI element detection, black-box testing, AI agents, multimodal models, functional testing, anomaly detection, Android, GPT-4o, Qwen-VL-Max, Claude Opus, Gemini, UIED, Omniparser, precision, recall, F1-score, regression testing, test determinism, test scenarios, screenshots, OpenAI

Language

Polish

Abstract in English

The aim of this work was to design, implement and evaluate an intelligent agent that tests mobile applications using large language models (LLMs). The agent works autonomously: it explores the user interface, executes defined test scenarios, and detects anomalies and errors. A key assumption of the project is that the agent has no access to the source code or internal data of the application under test. It operates solely by interacting with the graphical user interface, using multimodal models (GPT-4o, Claude Opus 4.1, Gemini 2.5 Pro and Qwen-VL-Max) for screenshot analysis and test decision-making.

The system follows a two-phase model of operation. The Learn phase generates and saves an execution path for a given test scenario; several alternative paths can be saved for each scenario in a tree structure. The Run phase replays the saved steps, and the LLM's role is limited to detecting deviations and anomalies and to checking that the outcome matches the expected result. An adaptation mechanism lets the system learn new paths when it encounters a situation that does not match the stored path, such as an unexpected dialog box, an informational pop-up or some other interface state.

The system architecture comprises five main components: DeviceController (communication with the device), ElementsController (UI element detection, with three implementations: XmlDumpController, UiedController and OmniparserController), NextActionController (deciding on the next action), AnomaliesVerifier (detecting nine classes of errors) and ResultVerifier (checking that the result matches expectations).

The evaluation was conducted on five mobile applications: Gmail, PlayNow, Viva Payments, TOK FM and Spotify. The study was limited to tests that do not directly change the application state.

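
The abstract describes the Learn/Run mechanism only at a high level. The sketch below shows one possible way to model it, assuming each learned step can be keyed by a short description of the screen on which it was taken; the names `Step`, `PathNode`, `add_path` and `replay` are hypothetical and are not taken from the thesis. Learned paths are merged into a tree that shares common prefixes, and replay follows whichever stored branch matches the screen currently observed. It returns `False` when no branch fits, which is the point at which the described system would switch back to learning a new path. Exact string matching stands in for the model-based deviation detection described in the abstract, and result checking (the ResultVerifier's role) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Step:
    """One recorded action (hypothetical representation, not the thesis's format)."""
    screen: str   # description of the screen expected before the action
    action: str   # e.g. "tap:Inbox"


@dataclass
class PathNode:
    step: Optional[Step] = None                      # None only for the root
    children: list[PathNode] = field(default_factory=list)

    def add_path(self, steps: list[Step]) -> None:
        """Learn phase: merge a learned path into the tree, reusing shared prefixes."""
        node = self
        for step in steps:
            child = next((c for c in node.children if c.step == step), None)
            if child is None:
                child = PathNode(step)
                node.children.append(child)
            node = child


def replay(root: PathNode, observe: Callable[[], str], act: Callable[[str], None]) -> bool:
    """Run phase: follow the branch whose expected screen matches the observed one.

    Returns True when a leaf is reached, False when no stored branch fits
    (the point at which a new path would have to be learned).
    """
    node = root
    while node.children:
        current = observe()
        child = next((c for c in node.children if c.step.screen == current), None)
        if child is None:
            return False
        act(child.step.action)
        node = child
    return True


# Toy usage with a simulated device whose screens are plain strings.
tree = PathNode()
tree.add_path([Step("home", "tap:Inbox"), Step("inbox", "tap:FirstMail")])
# Alternative branch learned for runs where a pop-up covers the home screen.
tree.add_path([Step("popup", "tap:Close"), Step("home", "tap:Inbox"), Step("inbox", "tap:FirstMail")])

transitions = {
    ("popup", "tap:Close"): "home",
    ("home", "tap:Inbox"): "inbox",
    ("inbox", "tap:FirstMail"): "mail",
}
state = {"screen": "popup"}


def observe() -> str:
    return state["screen"]


def act(action: str) -> None:
    state["screen"] = transitions[(state["screen"], action)]


print(replay(tree, observe, act), state["screen"])  # True mail
```

With a pop-up on screen, replay takes the second stored branch, closes the pop-up and ends on the opened message; an unrecognised screen makes it return `False` instead of guessing.
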
The two UI element detectors compared (UIED and Omniparser) performed very similarly overall; UIED achieved slightly better results and slightly more stable detection.

Four language models were tested: GPT-4o, Qwen-VL-Max, Gemini and Claude. Gemini and Claude were excluded from the main evaluation because they did not reliably produce structured responses. GPT-4o achieved the best results: in the learning phase an average F1-score of 90.9% (precision 94.2%, recall 88.1%), in the evaluation phase an F1-score of 83.2% (precision 98.0%, recall 72.6%), and in anomaly detection an F1-score of 70.9% (precision 62.3%, recall 82.5%). Qwen-VL-Max scored on average 8.7 percentage points lower in the learning phase and 4.4 percentage points lower in anomaly detection. The system performs best on applications with stable interfaces, while applications with dynamic elements are more challenging. In its current state of development, the system can support testers' work and can reduce the need for manual testing by about 50%; the reduction is limited mainly by the modest effectiveness of anomaly detection.

The main conclusions are that LLMs can effectively understand and execute test scenarios regardless of how detailed the test description is, and that the proposed method of saving and replaying test paths is highly robust to changes in application data. Anomaly detection, however, needs substantial improvement: its error-detection results are relatively low and it raises a high number of false alarms.
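
As a quick consistency check on the GPT-4o figures: the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The snippet below recomputes it from the precision and recall quoted above; the variable names are illustrative only. The recomputed values differ from the reported ones by at most about 0.2 percentage points, which is consistent with the reported F1-scores being averages of per-application (or per-scenario) scores, since the mean of several F1 values is generally not equal to the F1 of the mean precision and recall. The abstract does not state which averaging was used, so this is an interpretation rather than a documented fact.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for GPT-4o, as quoted in the abstract.
reported = {
    "learn": (0.942, 0.881, 0.909),
    "run": (0.980, 0.726, 0.832),
    "anomalies": (0.623, 0.825, 0.709),
}

for phase, (p, r, f1_reported) in reported.items():
    print(f"{phase}: F1 from averages = {f1(p, r):.3f}, reported = {f1_reported:.3f}")
# learn: F1 from averages = 0.910, reported = 0.909
# run: F1 from averages = 0.834, reported = 0.832
# anomalies: F1 from averages = 0.710, reported = 0.709
```
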

dc.affiliation	Wydział Matematyki i Informatyki (Faculty of Mathematics and Computer Science)
dc.area	exact sciences
dc.contributor.advisor	Misztal, Krzysztof - 104632
dc.contributor.author	Podwysocki, Jan - USOS211451
dc.contributor.departmentbycode	UJK/WMI2
dc.contributor.reviewer	Misztal, Krzysztof - 104632
dc.contributor.reviewer	Roman, Adam - 142015
dc.date.accessioned	2025-10-23T22:30:53Z
dc.date.available	2025-10-23T22:30:53Z
dc.date.createdat	2025-10-23T22:30:53Z
dc.date.submitted	2025-10-16
dc.fieldofstudy	computer science
dc.identifier.apd	diploma-158743-211451
dc.identifier.uri	https://ruj.uj.edu.pl/handle/item/563693
dc.language	pol
dc.source.integrator	false
dc.title	Wykorzystanie sztucznej inteligencji, w tym modeli językowych (LLM), w testowaniu aplikacji mobilnych
dc.title.alternative	The Use of Artificial Intelligence, Including Large Language Models (LLM), in Mobile Application Testing
dc.type	master
dspace.entity.type	Publication

No access

Collections

Masters theses