AutoGUI interaction (#36)

After trying out the devtools protocols in BugHog, the two simple implementations I had prepared worked only for Chromium 70+ and Firefox 127+. Although other protocols were implemented in earlier browser versions, implementing all of them and correctly matching them to the supported browser versions would require a lot of effort. Further, there would likely be no way to target versions approx. 20-50.

As a result, I decided to use PyAutoGUI as you suggested. It works in all browsers and versions, and for practical usage it seems almost as good as the devtools protocols.

Instead of `url_queue.txt`, users can choose to provide the experiment configuration in `interaction_script.cmd`. These are the commands implemented so far:

- `NAVIGATE url` Terminates the previous browser window and opens a new browser window on the specified URL. It then waits for the page to load (1 sec for Chromium, 2 secs for Firefox).
- `CLICK_POSITION x y` Clicks on specific coordinates on the screen (not necessarily within the browser window). Each argument can be an absolute value in pixels or a percentage of the screen, and the two forms can be mixed. E.g., `CLICK_POSITION 100 50%` clicks 100px from the left screen border and at 50% of the screen height.
- `CLICK element_id` Clicks on an element with the specified ID. Currently, the ID can be one of `one`, `two`, `three`, `four`, `five`, `six`. PyAutoGUI can search for the location of a visual match on the screen, so I prepared styles in `res/bughog.css` that render elements with these IDs as boxes of distinct colors. This bypasses the limitation of having to know the exact screen coordinates of an element.
- `WRITE text` Types the text into the focused element.
- `PRESS key`
- `HOLD key`
- `RELEASE key`
- `HOTKEY key1 key2 ...` A combination of `HOLD` and `RELEASE` for multiple keys. E.g., `HOTKEY ctrl c`.
- `SLEEP seconds` Usually not necessary, because navigation implicitly includes sleeping.
- `SCREENSHOT file_name` Captures the screen and stores the result in `logs/screenshots/{PROJECT}-{EXPERIMENT}-{file_name}-{BROWSER}-{VERSION}.jpg`. Very useful for debugging.

A simple experiment can be found in `Support/AutoGUI`. It was reproduced successfully in all versions of both browsers.

We could also implement some browser-specific behaviour, e.g., bookmarking, where the script would include only `BOOKMARK text` and, based on the browser version, the correct shortcut would be pressed and the right screen positions clicked.

Besides this, I made the following changes:

- Extracted the default file templates to separate files and added templates for all file types
- Implemented adding and modifying `interaction_script.cmd` and `url_queue.txt` from the web UI
- Implemented a custom highlighting mode for `interaction_script.cmd` in the experiment editor
- Fixed loading resources from `/res/`
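
To illustrate the script format described above, here is a minimal sketch of how an `interaction_script.cmd` could be split into commands and arguments. The sample script, its URL, and the `parse_script` helper are assumptions made for illustration; they are not taken from the commit.

```python
# Hypothetical script contents; the URL and the screenshot name are made up.
SAMPLE_SCRIPT = """\
# Lines starting with '#' are treated as comments
NAVIGATE https://a.test/Support/AutoGUI/main

CLICK one
WRITE hello
HOTKEY ctrl a
SCREENSHOT after_typing
"""


def parse_script(text: str) -> list[tuple[str, list[str]]]:
    """Split a script into (COMMAND, [args...]) pairs, skipping blanks and comments."""
    statements = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # The first whitespace-separated token is the command, the rest are arguments.
        command, *args = line.split()
        statements.append((command.upper(), args))
    return statements


for command, args in parse_script(SAMPLE_SCRIPT):
    print(command, args)
```

Because arguments are split on whitespace, a `WRITE` argument containing spaces would arrive as several tokens in this sketch.
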
DistriNet · Nov 13, 2024 · 2d1ad6d
2 parents e7cfbd9 + 792447e
Showing 39 changed files with 667 additions and 86 deletions.
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -24,7 +24,10 @@
 				"Vue.volar"
 			]
 		}
-	}
+	},
+
+	// Install pip requirements
+	"postCreateCommand": "pip install -r requirements.txt"
 
 	// Features to add to the dev container. More info: https://containers.dev/features.
 	// "features": {},
@@ -35,9 +38,6 @@
 	// Uncomment the next line if you want to keep your containers running after VS Code shuts down.
 	// "shutdownAction": "none",
 
-	// Uncomment the next line to run commands after the container is created.
-	// "postCreateCommand": "cat /etc/os-release",
-
 	// Configure tool-specific properties.
 	// "customizations": {},
 }
diff --git a/.gitignore b/.gitignore
@@ -6,6 +6,17 @@ nginx/ssl/keys/*
 !**/.gitkeep
 **/node_modules
 **/junit.xml
+
+# Screenshots
+logs/screenshots/*
+!logs/screenshots/.gitkeep
+
+# Fish shell
+$HOME
+
+# JetBrains IDEs
+.idea
+
 # Created by https://www.toptal.com/developers/gitignore/api/intellij,python,flask,macos
 # Edit at https://www.toptal.com/developers/gitignore?templates=intellij,python,flask,macos
 

diff --git a/Dockerfile b/Dockerfile
@@ -54,7 +54,11 @@ RUN cp /app/scripts/daemon/xvfb /etc/init.d/xvfb
 # Install python packages
 COPY requirements.txt /app/requirements.txt
 RUN pip install --user -r /app/requirements.txt
+RUN apt-get install python3-tk python3-xlib gnome-screenshot -y
 
+# Initiate PyAutoGUI
+RUN touch /root/.Xauthority && \
+    xauth add ${HOST}:0 . $(xxd -l 16 -p /dev/urandom)
 
 FROM base AS core
 # Copy rest of source code

diff --git a/bci/browser/automation/terminal.py b/bci/browser/automation/terminal.py
@@ -7,31 +7,35 @@
 
 
 class TerminalAutomation:
-
     @staticmethod
-    def run(url: str, args: list[str], seconds_per_visit: int):
-        logger.debug("Starting browser process...")
+    def visit_url(url: str, args: list[str], seconds_per_visit: int):
         args.append(url)
+        proc = TerminalAutomation.open_browser(args)
+        logger.debug(f'Visiting the page for {seconds_per_visit}s')
+        time.sleep(seconds_per_visit)
+        TerminalAutomation.terminate_browser(proc, args)
+
+    @staticmethod
+    def open_browser(args: list[str]) -> subprocess.Popen:
+        logger.debug('Starting browser process...')
         logger.debug(f'Command string: \'{" ".join(args)}\'')
-        with open('/tmp/browser.log', 'a') as file:
-            proc = subprocess.Popen(
-                args,
-                stdout=file,
-                stderr=file
-            )
+        with open('/tmp/browser.log', 'a+') as file:
+            proc = subprocess.Popen(args, stdout=file, stderr=file)
+            return proc
 
-        time.sleep(seconds_per_visit)
+    @staticmethod
+    def terminate_browser(proc: subprocess.Popen, args: list[str]) -> None:
+        logger.debug('Terminating browser process using SIGINT...')
 
-        logger.debug(f'Terminating browser process after {seconds_per_visit}s using SIGINT...')
         # Use SIGINT and SIGTERM to end process such that cookies remain saved.
         proc.send_signal(signal.SIGINT)
         proc.send_signal(signal.SIGTERM)
 
         try:
             stdout, stderr = proc.communicate(timeout=5)
         except subprocess.TimeoutExpired:
-            logger.info("Browser process did not terminate after 5s. Killing process through pkill...")
+            logger.info('Browser process did not terminate after 5s. Killing process through pkill...')
             subprocess.run(['pkill', '-2', args[0].split('/')[-1]])
 
         proc.wait()
-        logger.debug("Browser process terminated.")
+        logger.debug('Browser process terminated.')
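
The "signal first, force only on timeout" pattern used by `terminate_browser` can be tried in isolation with a harmless child process. The sketch below is a simplified variant, not the code from the diff: it sends only SIGINT and falls back to `Popen.kill()` rather than `pkill`, and the `stop_process` name and the sleeping Python child are assumptions. It targets POSIX systems.

```python
import signal
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask a process to exit with SIGINT; kill it if it does not exit in time."""
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Fallback for processes that ignore SIGINT (simplification of the pkill step).
        proc.kill()
        proc.wait()
    return proc.returncode


# Stand-in for a browser process: a Python child that just sleeps.
child = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(30)'])
exit_code = stop_process(child)
print('child exited with', exit_code)
```
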
diff --git a/bci/browser/configuration/browser.py b/bci/browser/configuration/browser.py
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 import os
+import subprocess
 from abc import abstractmethod
 
 import bci.browser.binary.factory as binary_factory
@@ -15,9 +16,13 @@
 
 
 class Browser:
+    process: subprocess.Popen | None
 
-    def __init__(self, browser_config: BrowserConfiguration, eval_config: EvaluationConfiguration, binary: Binary) -> None:
+    def __init__(
+        self, browser_config: BrowserConfiguration, eval_config: EvaluationConfiguration, binary: Binary
+    ) -> None:
         self.browser_config = browser_config
+        self.process = None
         self.eval_config = eval_config
         self.binary = binary
         self.state = binary.state
@@ -34,10 +39,22 @@ def visit(self, url: str):
         match self.eval_config.automation:
             case 'terminal':
                 args = self._get_terminal_args()
-                TerminalAutomation.run(url, args, self.eval_config.seconds_per_visit)
+                TerminalAutomation.visit_url(url, args, self.eval_config.seconds_per_visit)
             case _:
                 raise AttributeError('Not implemented')
 
+    def open(self, url: str) -> None:
+        args = self._get_terminal_args()
+        args.append(url)
+        self.process = TerminalAutomation.open_browser(args)
+
+    def terminate(self):
+        if self.process is None:
+            return
+
+        TerminalAutomation.terminate_browser(self.process, self._get_terminal_args())
+        self.process = None
+
     def pre_evaluation_setup(self):
         self.__fetch_binary()
 
@@ -80,11 +97,17 @@ def _get_executable_file_path(self) -> str:
         return os.path.join(self.__get_execution_folder_path(), self.binary.executable_name)
 
     @abstractmethod
-    def _get_terminal_args(self):
+    def _get_terminal_args(self) -> list[str]:
+        pass
+
+    @abstractmethod
+    def get_navigation_sleep_duration(self) -> int:
         pass
 
     @staticmethod
-    def get_browser(browser_config: BrowserConfiguration, eval_config: EvaluationConfiguration, state: State) -> Browser:
+    def get_browser(
+        browser_config: BrowserConfiguration, eval_config: EvaluationConfiguration, state: State
+    ) -> Browser:
         from bci.browser.configuration.chromium import Chromium
         from bci.browser.configuration.firefox import Firefox
 

diff --git a/bci/browser/configuration/chromium.py b/bci/browser/configuration/chromium.py
@@ -32,6 +32,9 @@
 
 class Chromium(Browser):
 
+    def get_navigation_sleep_duration(self) -> int:
+        return 1
+
     def _get_terminal_args(self) -> list[str]:
         assert self._profile_path is not None
 

diff --git a/bci/browser/configuration/firefox.py b/bci/browser/configuration/firefox.py
@@ -2,10 +2,9 @@
 
 from bci import cli
 from bci.browser.configuration.browser import Browser
-from bci.browser.configuration.options import Default, BlockThirdPartyCookies, PrivateBrowsing, TrackingProtection
+from bci.browser.configuration.options import BlockThirdPartyCookies, Default, PrivateBrowsing, TrackingProtection
 from bci.browser.configuration.profile import prepare_firefox_profile
 
-
 SUPPORTED_OPTIONS = [
     Default(),
     BlockThirdPartyCookies(),
@@ -21,11 +20,15 @@
 
 class Firefox(Browser):
 
+    def get_navigation_sleep_duration(self) -> int:
+        return 2
+
     def _get_terminal_args(self) -> list[str]:
         assert self._profile_path is not None
 
         args = [self._get_executable_file_path()]
         args.extend(['-profile', self._profile_path])
+        args.append('-setDefaultBrowser')
         user_prefs = []
 
         def add_user_pref(key: str, value: str | int | bool):
@@ -45,6 +48,7 @@ def add_user_pref(key: str, value: str | int | bool):
         # add_user_pref('network.proxy.type', 1)
 
         add_user_pref('app.update.enabled', False)
+        add_user_pref('browser.shell.checkDefaultBrowser', False)
         if 'default' in self.browser_config.browser_setting:
             pass
         elif 'btpc' in self.browser_config.browser_setting:

diff --git a/bci/browser/interaction/__init__.py b/bci/browser/interaction/__init__.py
diff --git a/bci/browser/interaction/elements/five.png b/bci/browser/interaction/elements/five.png
diff --git a/bci/browser/interaction/elements/four.png b/bci/browser/interaction/elements/four.png
diff --git a/bci/browser/interaction/elements/one.png b/bci/browser/interaction/elements/one.png
diff --git a/bci/browser/interaction/elements/six.png b/bci/browser/interaction/elements/six.png
diff --git a/bci/browser/interaction/elements/three.png b/bci/browser/interaction/elements/three.png
diff --git a/bci/browser/interaction/elements/two.png b/bci/browser/interaction/elements/two.png
diff --git a/bci/browser/interaction/interaction.py b/bci/browser/interaction/interaction.py
@@ -0,0 +1,57 @@
+import logging
+from inspect import signature
+
+from bci.browser.configuration.browser import Browser as BrowserConfig
+from bci.browser.interaction.simulation import Simulation
+from bci.evaluations.logic import TestParameters
+
+logger = logging.getLogger(__name__)
+
+
+class Interaction:
+    browser: BrowserConfig
+    script: list[str]
+    params: TestParameters
+
+    def __init__(self, browser: BrowserConfig, script: list[str], params: TestParameters) -> None:
+        self.browser = browser
+        self.script = script
+        self.params = params
+
+    def execute(self) -> None:
+        simulation = Simulation(self.browser, self.params)
+
+        if self._interpret(simulation):
+            simulation.sleep(str(self.browser.get_navigation_sleep_duration()))
+            simulation.navigate('https://a.test/report/?bughog_sanity_check=OK')
+
+    def _interpret(self, simulation: Simulation) -> bool:
+        for statement in self.script:
+            if statement.strip() == '' or statement[0] == '#':
+                continue
+
+            cmd, *args = statement.split()
+            method_name = cmd.lower()
+
+            if method_name not in Simulation.public_methods:
+                raise Exception(
+                    f'Invalid command `{cmd}`. Expected one of {", ".join(map(lambda m: m.upper(), Simulation.public_methods))}.'
+                )
+
+            method = getattr(simulation, method_name)
+            method_params = list(signature(method).parameters.values())
+
+            # Allow different number of arguments only for variable argument number (*)
+            if len(method_params) != len(args) and (len(method_params) < 1 or str(method_params[0])[0] != '*'):
+                raise Exception(
+                    f'Invalid number of arguments for command `{cmd}`. Expected {len(method_params)}, got {len(args)}.'
+                )
+
+            logger.debug(f'Executing interaction method `{method_name}` with the arguments {args}')
+
+            try:
+                method(*args)
+            except:
+                return False
+
+        return True
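
The argument-count check in `_interpret` relies on `inspect.signature`. A hedged sketch of the same idea follows, using `Parameter.VAR_POSITIONAL` to detect `*args` instead of inspecting the parameter's string form. `FakeSimulation` and `accepts_arg_count` are hypothetical names, and the stand-in class avoids the PyAutoGUI dependency.

```python
from inspect import Parameter, signature


class FakeSimulation:
    """Stand-in with the same method shapes as two of the script commands."""

    def click_position(self, x: str, y: str) -> None:
        pass

    def hotkey(self, *keys: str) -> None:
        pass


def accepts_arg_count(method, arg_count: int) -> bool:
    """Return True if a bound method can be called with arg_count positional arguments."""
    params = list(signature(method).parameters.values())
    fixed = [p for p in params if p.kind is not Parameter.VAR_POSITIONAL]
    if len(fixed) != len(params):
        # A *args parameter is present: any count covering the fixed parameters is fine.
        return arg_count >= len(fixed)
    return arg_count == len(params)


sim = FakeSimulation()
print(accepts_arg_count(sim.click_position, 2))  # True
print(accepts_arg_count(sim.click_position, 1))  # False
print(accepts_arg_count(sim.hotkey, 3))          # True
```

`signature` on a bound method already omits `self`, which is why the counts line up with the script arguments.
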
diff --git a/bci/browser/interaction/simulation.py b/bci/browser/interaction/simulation.py
@@ -0,0 +1,85 @@
+import os
+from time import sleep
+
+import pyautogui as gui
+import Xlib.display
+from pyvirtualdisplay.display import Display
+
+from bci.browser.configuration.browser import Browser as BrowserConfig
+from bci.evaluations.logic import TestParameters
+
+
+class Simulation:
+    browser_config: BrowserConfig
+    params: TestParameters
+
+    public_methods: list[str] = [
+        'navigate',
+        'click_position',
+        'click',
+        'write',
+        'press',
+        'hold',
+        'release',
+        'hotkey',
+        'sleep',
+        'screenshot',
+    ]
+
+    def __init__(self, browser_config: BrowserConfig, params: TestParameters):
+        self.browser_config = browser_config
+        self.params = params
+        disp = Display(visible=True, size=(1920, 1080), backend='xvfb', use_xauth=True)
+        disp.start()
+        gui._pyautogui_x11._display = Xlib.display.Display(os.environ['DISPLAY'])
+
+    def __del__(self):
+        self.browser_config.terminate()
+
+    def parse_position(self, position: str, max_value: int) -> int:
+        # Screen percentage
+        if position[-1] == '%':
+            return round(max_value * (int(position[:-1]) / 100))
+
+        # Absolute value in pixels
+        return int(position)
+
+    # --- PUBLIC METHODS ---
+    def navigate(self, url: str):
+        self.browser_config.terminate()
+        self.browser_config.open(url)
+        self.sleep(str(self.browser_config.get_navigation_sleep_duration()))
+
+    def click_position(self, x: str, y: str):
+        max_x, max_y = gui.size()
+
+        gui.moveTo(self.parse_position(x, max_x), self.parse_position(y, max_y))
+        gui.click()
+
+    def click(self, el_id: str):
+        el_image_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), f'elements/{el_id}.png')
+        x, y = gui.locateCenterOnScreen(el_image_path)
+        self.click_position(str(x), str(y))
+
+    def write(self, text: str):
+        gui.write(text, interval=0.1)
+
+    def press(self, key: str):
+        gui.press(key)
+
+    def hold(self, key: str):
+        gui.keyDown(key)
+
+    def release(self, key: str):
+        gui.keyUp(key)
+
+    def hotkey(self, *keys: str):
+        gui.hotkey(*keys)
+
+    def sleep(self, duration: str):
+        sleep(float(duration))
+
+    def screenshot(self, filename: str):
+        filename = f'{self.params.evaluation_configuration.project}-{self.params.mech_group}-{filename}-{type(self.browser_config).__name__}-{self.browser_config.version}.jpg'
+        filepath = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../../../logs/screenshots', filename)
+        gui.screenshot(filepath)
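
For reference, the screenshot naming scheme `{PROJECT}-{EXPERIMENT}-{file_name}-{BROWSER}-{VERSION}.jpg` can be reproduced without PyAutoGUI. The sketch below only builds the path; the `screenshot_path` helper and the sample values (project `Support`, experiment `AutoGUI`, version `100`) are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath


def screenshot_path(
    project: str,
    experiment: str,
    file_name: str,
    browser: str,
    version: int,
    root: str = 'logs/screenshots',
) -> PurePosixPath:
    """Build the screenshot path following the naming scheme from the commit message."""
    name = f'{project}-{experiment}-{file_name}-{browser}-{version}.jpg'
    return PurePosixPath(root) / name


# Sample values only; real values come from the evaluation parameters.
path = screenshot_path('Support', 'AutoGUI', 'after_click', 'Chromium', 100)
print(path)  # logs/screenshots/Support-AutoGUI-after_click-Chromium-100.jpg
```

Since the parts are joined with `-`, a `file_name` that itself contains hyphens makes the resulting name harder to split back into its components.
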