Commit ca305f0

Dump JVM output when bazel fails on Windows (#41844)
### What does this PR do?

This is to diagnose the [recurring](https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1166997067) failure:

```
Starting local Bazel server (8.3.1) and connecting to it...
... still trying to connect to local Bazel server (45008) after 10 seconds ...
... still trying to connect to local Bazel server (45008) after 20 seconds ...
... still trying to connect to local Bazel server (45008) after 30 seconds ...
... still trying to connect to local Bazel server (45008) after 40 seconds ...
... still trying to connect to local Bazel server (45008) after 50 seconds ...
... still trying to connect to local Bazel server (45008) after 60 seconds ...
... still trying to connect to local Bazel server (45008) after 70 seconds ...
... still trying to connect to local Bazel server (45008) after 80 seconds ...
... still trying to connect to local Bazel server (45008) after 90 seconds ...
... still trying to connect to local Bazel server (45008) after 100 seconds ...
... still trying to connect to local Bazel server (45008) after 110 seconds ...
FATAL: couldn't connect to server (45008) after 120 seconds.
```

### Motivation

The change hopefully provides more actionable material when the failure happens, see #incident-42947.

### Additional Notes

There's no need to wait 2 minutes before bailing out, so decreasing the timeout to 30 seconds will probably allow us to reproduce the failure more frequently.

Also, we'll print any running `bazel.exe` or `java.exe` before and after the `bazel` command to learn whether additional cleanup is worth adding.
1 parent 923fdba commit ca305f0

1 file changed, 11 additions and 0 deletions

tools/bazel.bat

Lines changed: 11 additions & 0 deletions
@@ -42,12 +42,16 @@ if exist "!BAZEL_REPO_CONTENTS_CACHE!" (
 
 rem Pass CI-specific options through `.user.bazelrc` so any nested `bazel run` and next `bazel shutdown` also honor them
 (
+echo startup --connect_timeout_secs=30
 echo startup --output_user_root=!BAZEL_OUTPUT_USER_ROOT!
 echo common --config=cache
 echo common --repo_contents_cache=!ext_repo_contents_cache!
 echo build --disk_cache=!BAZEL_DISK_CACHE!
 ) >"!CI_PROJECT_DIR!\user.bazelrc"
 
+:: Diagnostics: print any stalled client/server before `bazel` execution
+>&2 powershell -NoProfile -Command "Get-Process bazel,java -ErrorAction SilentlyContinue | Select-Object 🟡,ProcessName,StartTime"
+
 rem Payload: execute `bazel` and remember exit status
 "!BAZEL_REAL!" %*
 set "bazel_exit=!errorlevel!"
@@ -56,6 +60,13 @@ rem Stop `bazel` (if still running) to close files and proceed with cleanup
 >&2 "!BAZEL_REAL!" shutdown --ui_event_filters=-info
 >&2 del /f /q "!CI_PROJECT_DIR!\user.bazelrc"
 
+:: Diagnostics: print any stalled client/server after `bazel` execution and dump JVM output on failure
+>&2 powershell -NoProfile -Command "Get-Process bazel,java -ErrorAction SilentlyContinue | Select-Object 🟡,ProcessName,StartTime"
+if !bazel_exit! neq 0 (
+>&2 echo 🟡 JVM output:
+>&2 type "!BAZEL_OUTPUT_USER_ROOT!\server\jvm.out"
+)
+
 rem Reintegrate `--repo_contents_cache` to original directory
 if exist "!ext_repo_contents_cache!" (
 call :robomove "!ext_repo_contents_cache!" "!BAZEL_REPO_CONTENTS_CACHE!"
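
For readers who want to reason about the wrapper's behavior outside of a Windows runner, the two additions can be sketched in Python. This is an illustration only, not code from the commit: the function names `render_user_bazelrc` and `dump_jvm_output` are hypothetical, and the location of `jvm.out` is taken as a parameter instead of being hard-coded, since where that file lives depends on how the Bazel output directories are laid out on the machine.

```python
import io
from pathlib import Path


def render_user_bazelrc(output_user_root, repo_contents_cache, disk_cache,
                        connect_timeout_secs=30):
    """Return the option lines the wrapper echoes into user.bazelrc.

    Hypothetical helper: mirrors the `echo ...` lines of the batch file,
    with the connect timeout placed first as in the diff.
    """
    return [
        f"startup --connect_timeout_secs={connect_timeout_secs}",
        f"startup --output_user_root={output_user_root}",
        "common --config=cache",
        f"common --repo_contents_cache={repo_contents_cache}",
        f"build --disk_cache={disk_cache}",
    ]


def dump_jvm_output(exit_code, jvm_out, stream):
    """Copy the JVM log to `stream` only when the bazel command failed.

    Returns True if something was written, False on a zero exit code.
    A missing log file is reported instead of raising, so the diagnostic
    step never masks the original failure.
    """
    if exit_code == 0:
        return False
    stream.write("JVM output:\n")
    path = Path(jvm_out)
    if path.is_file():
        stream.write(path.read_text(errors="replace"))
    else:
        stream.write(f"(no file at {path})\n")
    return True
```

In a real wrapper the `stream` argument would typically be `sys.stderr`, matching the `>&2` redirections in the batch file, and the exit code would be the one remembered from the `bazel` invocation before `bazel shutdown` runs.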
