@dciangot dciangot commented Oct 2, 2025

Summary

This PR adds support for Kubernetes startup probes, including initialDelaySeconds handling, to the interlink-slurm-plugin. It also fixes the ContainerRuntime default reported in #91.

Problem

Previously, the plugin supported only readiness and liveness probes and ignored any startup probes defined in pod specifications. As a result, a startup probe's initialDelaySeconds was never honored, and probes could execute before the application had finished initializing.

Additionally, issue #91 reported that ContainerRuntime was not defaulting properly when missing from config files, causing startup failures when upgrading from older versions.

Solution

Implemented complete startup probe support that:

  • Respects initialDelaySeconds: Startup probes now wait for the configured delay (e.g., 300s) before first execution
  • Non-blocking container startup: Containers start immediately without waiting for startup probes
  • Blocks other probes: Readiness and liveness probes wait for startup probe success before executing
  • Supports both probe types: HTTP and Exec probes are fully supported
  • Proper error handling: Failed startup probes prevent other probes from starting

Fixed ContainerRuntime default handling:

  • Backward compatibility: ContainerRuntime now properly defaults to "singularity" when not specified in config
  • Smooth upgrades: Users can upgrade from pre-0.5.2 versions without modifying their config files
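
As an illustration of the defaulting behavior described above, here is a small Go sketch. It makes several assumptions: the Config struct is a trimmed-down, hypothetical stand-in for the plugin's configuration, loadConfig is an invented helper, and JSON is used only to keep the example within the standard library (the plugin's real config format and loader may differ).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical, trimmed-down plugin configuration.
type Config struct {
	ContainerRuntime string `json:"ContainerRuntime"`
}

// defaultContainerRuntime is the fallback named in the PR description.
const defaultContainerRuntime = "singularity"

// loadConfig parses raw config bytes and fills in the runtime default when
// the field is absent or empty, so older config files keep working.
func loadConfig(raw []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return Config{}, err
	}
	if cfg.ContainerRuntime == "" {
		cfg.ContainerRuntime = defaultContainerRuntime
	}
	return cfg, nil
}

func main() {
	// An older config file that predates the ContainerRuntime field.
	old, err := loadConfig([]byte(`{}`))
	fmt.Println(old.ContainerRuntime, err)

	// An explicit value is left untouched ("other-runtime" is a placeholder).
	explicit, err := loadConfig([]byte(`{"ContainerRuntime": "other-runtime"}`))
	fmt.Println(explicit.ContainerRuntime, err)
}
```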

Changes

  • Add vendor/ to .gitignore to exclude vendored dependencies
  • Update ContainerCommand struct to include startupProbes field
  • Extend translateKubernetesProbes() to handle startup probe translation
  • Implement runStartupProbe() and waitForStartupProbes() functions
  • Update probe metadata storage/loading for startup probe counts
  • Add comprehensive startup probe example with 300s initial delay
  • Fix probe script generation to properly sequence probe execution
  • Fix ContainerRuntime to default to "singularity" when not configured (fixes #91, "The default value for ContainerRuntime is not functioning")
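
To make the probe translation step in the list above more concrete, the following Go sketch shows one plausible shape for it. Everything in it is an assumption for illustration: ProbeSpec and ProbeCommand are simplified hypothetical types, translateProbe is not the plugin's translateKubernetesProbes(), and the localhost target for HTTP probes is invented. The fallback values for periodSeconds (10) and failureThreshold (3) are the Kubernetes defaults.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ProbeSpec is a hypothetical subset of a Kubernetes probe definition.
type ProbeSpec struct {
	InitialDelaySeconds int
	PeriodSeconds       int
	FailureThreshold    int
	HTTPGetPath         string   // used by HTTP probes
	HTTPGetPort         int      // non-zero marks an HTTP probe
	ExecCommand         []string // non-empty marks an Exec probe
}

// ProbeCommand is a hypothetical internal form; the plugin's struct may differ.
type ProbeCommand struct {
	Kind                string // "http" or "exec"
	Target              string // URL or command line
	InitialDelaySeconds int
	PeriodSeconds       int
	FailureThreshold    int
}

// translateProbe copies the timing fields, applies Kubernetes defaults for
// unset values, and rejects specs with zero or two handlers.
func translateProbe(spec ProbeSpec) (ProbeCommand, error) {
	cmd := ProbeCommand{
		InitialDelaySeconds: spec.InitialDelaySeconds,
		PeriodSeconds:       spec.PeriodSeconds,
		FailureThreshold:    spec.FailureThreshold,
	}
	if cmd.PeriodSeconds == 0 {
		cmd.PeriodSeconds = 10 // Kubernetes default
	}
	if cmd.FailureThreshold == 0 {
		cmd.FailureThreshold = 3 // Kubernetes default
	}
	isHTTP := spec.HTTPGetPort != 0
	isExec := len(spec.ExecCommand) > 0
	switch {
	case isHTTP && isExec:
		return ProbeCommand{}, errors.New("probe defines both httpGet and exec")
	case isHTTP:
		cmd.Kind = "http"
		// Assumption: the probe targets the local host.
		cmd.Target = fmt.Sprintf("http://localhost:%d%s", spec.HTTPGetPort, spec.HTTPGetPath)
	case isExec:
		cmd.Kind = "exec"
		cmd.Target = strings.Join(spec.ExecCommand, " ")
	default:
		return ProbeCommand{}, errors.New("probe defines neither httpGet nor exec")
	}
	return cmd, nil
}

func main() {
	cmd, err := translateProbe(ProbeSpec{
		InitialDelaySeconds: 300,
		HTTPGetPath:         "/healthz",
		HTTPGetPort:         8080,
	})
	fmt.Printf("%+v %v\n", cmd, err)
}
```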

Testing

  • All code compiles without errors (go build ./...)
  • Passes static analysis (go vet ./...)
  • Added example pod specification demonstrating startup probe usage
  • Verified backward compatibility with missing ContainerRuntime config

Kubernetes Semantics

This implementation follows Kubernetes startup probe semantics where startup probes must succeed before readiness and liveness probes begin execution, preventing premature health checks during application initialization.

Fixes #91

Add full support for Kubernetes startup probes with proper initialDelaySeconds
handling. Startup probes now:
- Respect initialDelaySeconds configuration before first execution
- Run in background without blocking container startup
- Block readiness and liveness probes until successful
- Support both HTTP and Exec probe types
- Include proper cleanup and error handling

Changes:
- Add vendor/ to .gitignore to exclude vendored dependencies
- Update ContainerCommand struct to include startupProbes field
- Extend translateKubernetesProbes to handle startup probe translation
- Implement runStartupProbe and waitForStartupProbes functions
- Update probe metadata storage/loading for startup probe counts
- Add startup probe example demonstrating 300s initial delay
- Fix probe script generation to properly sequence probe execution

Startup probes follow Kubernetes semantics where they must succeed before
other probes begin execution, preventing premature readiness/liveness checks
during application initialization.

Signed-off-by: Diego Ciangottini <diego.ciangottini@pg.infn.it>
@dciangot force-pushed the feature/startup-probe-support branch from f4f81f9 to 5e70cfd on October 2, 2025 17:48
Fix issue where ContainerRuntime defaults to empty string when not
specified in config file, causing plugin startup failure. Now properly
defaults to 'singularity' for backward compatibility with configs from
versions prior to 0.5.2.

This allows users to upgrade without modifying their existing config files.

Fixes interlink-hq#91

Signed-off-by: Diego Ciangottini <diego.ciangottini@pg.infn.it>
@dciangot dciangot requested a review from Bianco95 October 2, 2025 17:56
@dciangot dciangot merged commit e2a1cab into interlink-hq:main Oct 3, 2025
2 of 3 checks passed