Skip to content

Conversation

@leodido
Copy link
Contributor

@leodido leodido commented Dec 2, 2025

Summary

Fixes vulnerability scanning crash when a dependency build fails in a parallel goroutine but the main build continues due to the build lock mechanism.

Fixes https://linear.app/ona-team/issue/CLC-2122/scan-vulnerabilities-of-packages-iff-their-build-status-is-successful

Problem

In gitpod-next PR #11869, the build fails with:

[api/go:lib] Package api/go:lib not found in local cache, cannot scan for vulnerabilities
build failed

Root Cause: Build Lock Race Condition

Leeway builds dependencies in parallel using goroutines. When multiple packages depend on the same package, a build lock prevents duplicate builds. However, this creates a race condition when a build fails:

The Scenario

Package A ──┐
            ├──> api/go:lib (shared dependency)
Package B ──┘

What Happens

  1. Goroutine A obtains lock for api/go:lib, starts building, build fails, releases lock, returns error
  2. Goroutine B waits on lock, wakes up when released, returns nil (success) without building
  3. Main Build continues if dependency chain goes through Goroutine B
  4. Vulnerability scanning runs and crashes because api/go:lib isn't in cache

Code Evidence

In pkg/leeway/build.go:

func (p *Package) build(buildctx *buildContext) (err error) {
    doBuild := buildctx.ObtainBuildLock(p)
    if !doBuild {
        // Another goroutine is already building this package
        return nil  // ← Returns success even if other goroutine failed!
    }
    // ... actual build ...
}

The Fix

Vulnerability scanning now:

  • Receives the pkgstatus map tracking build outcomes
  • Only scans packages with status PackageBuilt or PackageDownloaded
  • Skips packages that failed or weren't built
  • Logs which packages are skipped and why

This prevents the fatal error in vulnerability scanning while still allowing the build to fail if the dependency error propagates through the main dependency chain.

Changes

  • pkg/leeway/build.go: Pass pkgstatus to vulnerability scanning
  • pkg/leeway/sbom-scan.go: Filter packages by build status before scanning

Testing

  • ✅ Unit tests pass
  • ✅ Integration tests pass
  • ✅ Handles the specific scenario from gitpod-next PR #11869

Related

@leodido leodido self-assigned this Dec 2, 2025
@leodido leodido requested a review from geropl December 2, 2025 16:39
@leodido leodido force-pushed the ld/fix-vuln-scan-race branch from 123132f to af7bc1f Compare December 3, 2025 15:47
@leodido leodido changed the base branch from ld/fix-oci-extraction to main December 3, 2025 16:07
Root cause: When multiple packages depend on the same package (e.g., api/go:lib),
and that package's build fails, the build lock mechanism causes a race condition:

1. Goroutine A obtains the lock and starts building the shared dependency
2. Goroutine B waits on the lock
3. If Goroutine A's build fails, it releases the lock and returns an error
4. Goroutine B wakes up, sees the lock is released, and returns nil (success)
5. The error propagates up through Goroutine A's dependency chain only
6. If the target package's dependency chain goes through Goroutine B, the
   main build 'succeeds' even though the dependency failed
7. Vulnerability scanning runs and fails because the package isn't in cache

This fix:
- Passes the package build status map to vulnerability scanning
- Only scans packages with status PackageBuilt or PackageDownloaded
- Skips packages that failed verification, download, or build
- Logs which packages are skipped and why

This prevents fatal errors in vulnerability scanning when a dependency build
fails in a parallel goroutine. The build will still fail if the dependency
error propagates through the main dependency chain, but vulnerability scanning
won't cause an additional confusing error.

Fixes the issue in gitpod-next PR #11869 where api/go:lib build failed in a
parallel goroutine, but the main build continued and vulnerability scanning
crashed trying to scan a package that wasn't in cache.

Co-authored-by: Ona <no-reply@ona.com>
@leodido leodido force-pushed the ld/fix-vuln-scan-race branch from af7bc1f to 541fd78 Compare December 3, 2025 16:08
@leodido leodido merged commit 9971a07 into main Dec 3, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants