Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Try to append batches web console lines for record #4922

Closed
1 of 4 tasks
ovaar opened this issue Aug 2, 2024 · 4 comments
Closed
1 of 4 tasks

[BUG]: Try to append batches web console lines for record #4922

ovaar opened this issue Aug 2, 2024 · 4 comments

Comments

@ovaar
Copy link

ovaar commented Aug 2, 2024

What happened?

While executing long-running python tasks tasks using PythonScript@0 or just bash, which invokes python, the agent seems to lose connection to the host, maybe because it didn't receive enough messages in the stdout/stderr to feed the watchdog, resulting in a broken connection that makes the build freeze?

    - task: PythonScript@0
          displayName: "Build"
          inputs:
            scriptSource: 'filePath'
            scriptPath: "$(Pipeline.Workspace)build.py"
            arguments: '--build_type=Release --rebuild --deploy'
            workingDirectory: "$(Pipeline.Workspace)"
            pythonInterpreter: 'python3'

    - bash: |
          set -euo pipefail
          python3 build.py --build_type=Release --rebuild --deploy
        displayName: "Build"

Running the (long running) build scripts as shown above, results in Python using buffered streams for stdout and stderr.
Setting the PYTHONUNBUFFERED resolved the issue.

echo "##vso[task.setvariable variable=PYTHONUNBUFFERED]1"

Versions

Azure Devops version 3.236.1 / Windows 10

Environment type (Please select at least one enviroment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operation system

Windows 10

Version controll system

No response

Relevant log output

[2024-07-02 12:34:00Z INFO JobServerQueue] Try to append 4 batches web console lines for record 'f77843cc-c3dc-5c4f-e956-bc0deeafe94e', success rate: 4/4.
[2024-07-02 12:34:01Z INFO HostContext] Setting up value for origin: Variables_Set_repoUrlWithCred
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Exited process 9932 with exit code 0
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Finished process 9932 with exit code 0, and elapsed time 00:00:02.2383436.
[2024-07-02 12:34:01Z INFO WorkerCommandManager] Disable plugin internal command extension.
[2024-07-02 12:34:01Z INFO StepsRunner] Step result: 
[2024-07-02 12:34:01Z INFO StepsRunner] No need for updating job result with current step result 'Succeeded'.
[2024-07-02 12:34:01Z INFO StepsRunner] Current state: job state = ''
[2024-07-02 12:34:01Z INFO StepsRunner] Processing step: DisplayName='Run Unit Tests', ContinueOnError=False, Enabled=True
[2024-07-02 12:34:01Z INFO ExpressionManager] Evaluating: SucceededNode()
[2024-07-02 12:34:01Z INFO ExpressionManager] Result: True
[2024-07-02 12:34:01Z INFO StepsRunner] Starting the step.
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Work': 'E:\agent_pool\_work'
[2024-07-02 12:34:01Z INFO StepsRunner] Which: 'chcp'
[2024-07-02 12:34:01Z INFO StepsRunner] Location: 'C:\Windows\system32\chcp.COM'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Starting process:
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   File name: 'C:\Windows\system32\chcp.COM'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Arguments: '65001'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Working directory: 'E:\agent_pool\_work'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Require exit code zero: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Encoding web name:  ; code page: ''
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Persist current code page: 'True'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   High priority process: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   ContinueAfterCancelProcessTreeKillAttempt: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] OOM score adjustment is Linux-only.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Process started with process id 6304, waiting for process exit.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Exited process 6304 with exit code 0
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Finished process 6304 with exit code 0, and elapsed time 00:00:00.0075292.
[2024-07-02 12:34:01Z INFO StepsRunner] Successfully returned to code page 65001 (UTF8)
[2024-07-02 12:34:01Z INFO TaskRunner] Allow publishing telemetry for [email protected] task: True
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Work': 'E:\agent_pool\_work'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Tasks': 'E:\agent_pool\_work\_tasks'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Work': 'E:\agent_pool\_work'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'TaskZips': 'E:\agent_pool\_work\_taskzips'
[2024-07-02 12:34:01Z INFO TaskManager] Loading task definition 'E:\agent_pool\_work\_tasks\PythonScript_6392f95f-7e76-4a18-b3c7-7f078d2f7700\0.237.1\task.json'.
[2024-07-02 12:34:01Z INFO TaskRunner] Get handler data for target platform Windows
[2024-07-02 12:34:01Z INFO TaskRunner] Handler data is of type Microsoft.VisualStudio.Services.Agent.Worker.Node20_1HandlerData
[2024-07-02 12:34:01Z INFO TaskRunner] The original input is a rooted path, return absolute path: E:\agent_pool\_work\7\bimcollab-zoom\Build\test.py
[2024-07-02 12:34:01Z INFO TaskRunner] The original input is a rooted path, return absolute path: E:\agent_pool\_work\7\bimcollab-zoom\Build\
[2024-07-02 12:34:01Z INFO NodeHandler] Handler.AfterExecutionContextInitialized _continueAfterCancelProcessTreeKillAttempt = True
[2024-07-02 12:34:01Z INFO NodeHandler] AZP_AGENT_IGNORE_VSTSTASKLIB enabled, ignoring fix
[2024-07-02 12:34:01Z INFO NodeHandler] Task.json has node20_1 handler data: True node20ResultsInGlibCError = False
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Externals': 'E:\agent_pool\externals'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Externals': 'E:\agent_pool\externals'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Starting process:
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   File name: 'E:\agent_pool\externals\node20_1\bin\node.exe'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Arguments: '"E:\agent_pool\_work\_tasks\PythonScript_6392f95f-7e76-4a18-b3c7-7f078d2f7700\0.237.1\main.js"'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Working directory: 'E:\agent_pool\_work\7\s'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Require exit code zero: 'True'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Encoding web name: utf-8 ; code page: '65001'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Persist current code page: 'True'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   High priority process: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   ContinueAfterCancelProcessTreeKillAttempt: 'True'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] OOM score adjustment is Linux-only.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Process started with process id 8552, waiting for process exit.

*** Edited ***

Set the PYTHONUNBUFFERED as a workaround.

echo "##vso[task.setvariable variable=PYTHONUNBUFFERED]1"
@kirill-ivlev
Copy link
Contributor

Hi @ovaar,
thanks for reporting! We are working on more prioritized issues at the moment, but will get back to this one soon.

@aleksandrlevochkin
Copy link
Contributor

Hi @ovaar! I tried to reproduce this issue using a Python script that executes a long-running task, but did not have success both with the agent version v3.236.1 and the latest one. Could you please provide more details on what your build.py script looks like? Also, it would be great if you could provide the Worker file's log for a pipeline run with system.debug variable enabled

@aleksandrlevochkin
Copy link
Contributor

I am closing this issue for now. @ovaar if this issue is still relevant feel free to re-open

@ovaar
Copy link
Author

ovaar commented Sep 18, 2024

Thanks @aleksandrlevochkin for investigating.

Since I have a workaround the issue, for me, is not that pressing.
Though, as requested I would still like to share the build python script.
Finally, and unfortunately I am no longer able to provide the Worker's log file.

#!/usr/bin/env python3

import argparse
from contextlib import nullcontext
from enum import Enum
import platform
import shlex
import subprocess
from pathlib import Path
from typing import Final, List, Optional

try:
    from enum import StrEnum
except:
    raise Exception("Install python >= 3.11")


BUILD_PATH: Final[Path] = Path(__file__).parent / "Build"
WIN_PROJECT_FILE: Final[Path] = BUILD_PATH / "app.sln"


class BuildType(StrEnum):
    Debug = "Debug"
    Release = "Release"


def to_build_type(arg: str) -> BuildType:
    try:
        return BuildType[arg]
    except:
        raise argparse.ArgumentTypeError("Invalid argument build_type!")


class LoggingMode(Enum):
    NO_LOGGING = 0
    LOG_TO_TERMINAL = 1
    LOG_TO_FILE = 2


def run_command(
    command: str,
    logging_mode: LoggingMode = LoggingMode.LOG_TO_TERMINAL,
    throw_on_error=True,
    logfile: Optional[str] = None,
    encoding: str = "utf-8",
):
    """Invoke a command on a subprocess and redirect the outputs"""
    if logfile:
        log_dir: Path = Path(__file__).parent
        if not log_dir.exists():
            log_dir.mkdir(parents=True, exist_ok=True)

    print()
    print(f"$ {command}")

    stdout_lines: List[str] = []
    rc: int = 0

    try:
        process = subprocess.Popen(
            shlex.split(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            encoding=encoding,
        )

        with Path(logfile).open("a") if logfile else nullcontext() as f:
            while True:
                output = process.stdout.readline()
                if output == "" and process.poll() is not None:
                    break

                if output:
                    stdout = output.strip()
                    if logging_mode is LoggingMode.LOG_TO_FILE:
                        f.write(stdout + "\n")
                    if logging_mode is LoggingMode.LOG_TO_TERMINAL:
                        print(stdout)
                    stdout_lines.append(stdout)

        rc = process.poll()
        if rc and throw_on_error:
            raise subprocess.CalledProcessError(rc, cmd=command)
    except Exception as e:
        if throw_on_error:
            raise e
        rc = 1

    print()
    return stdout_lines, rc


def main(rebuild: bool, build_type: BuildType):
    if platform.system() == "Windows":
        from python_modules.pyvswhere import get_msbuild_path

        """Build using MSBuild"""
        msbuild_path = get_msbuild_path(
            version="17", product="Microsoft.VisualStudio.Product.BuildTools"
        )

        cmd: str = (
            f'"{msbuild_path}" "{WIN_PROJECT_FILE}" /property:Configuration={str(build_type)} /property:Platform=x64'
        )
        if rebuild:
            cmd += " -t:rebuild"

        output, rc = run_command(cmd)


parser = argparse.ArgumentParser()
parser.add_argument(
    "-b",
    "--build_type",
    type=to_build_type,
    help=f'Set the build type "{"|".join(list(BuildType._member_names_))}"',
    required=True,
)
parser.add_argument(
    "--rebuild",
    dest="rebuild",
    action="store_true",
    help="When set a rebuild is started.",
)
args = parser.parse_args()

build_type: BuildType = args.build_type

main(args.rebuild, build_type)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants