Starlark tool #62

@thejhh

Description

Starlark is designed for hermetic, deterministic execution. go.starlark.net/starlark executes source from a string, with no filesystem, network, or clock access unless the host exposes it.

  • Implement the tool contract and handler for code.sandbox.starlark.run, executing source from memory. Create internal/tools/starlarkrun/handler.go exposing Name() string { return "code.sandbox.starlark.run" } and Call(ctx, json.RawMessage) (ToolResult, error). Call accepts {source:string,input:string,limits:{wall_ms:int,output_kb:int},caps:{}}, parses the JSON, and runs the source with ExecFile using a Thread and the predeclared functions read_input()->str and emit(str). Enforce wall time with context.WithTimeout and call Thread.Cancel when the context expires, because the interpreter does not observe the context on its own. Cap emitted bytes with a bounded buffer. Return {stdout:string}. DoD: unit tests pass showing that the script emit(read_input()) returns its input, that while True: pass times out (go.starlark.net rejects while loops unless the dialect option is enabled, so either enable it or use a long for loop over range inside a function), and that output over the limit is truncated with a clear error.
  • Add dependency and wiring: append require go.starlark.net vX to go.mod, create internal/tools/starlarkrun/module.go registering the tool into the tool registry (constructor + dependency-free init), and update README.md “Tools” table with usage and example input/output; DoD: go build ./... succeeds locally, registry lists the tool, README shows a runnable curl example.
  • Harden capabilities (deny-by-default): ensure only emit and read_input are available (no FS, net, clock); do not bind os, time, or any other custom builtins, and leave Thread.Load unset so load() fails; add negative tests that attempt to load modules or reference such capabilities and expect failure; DoD: tests demonstrate no ambient side effects are reachable and only the declared builtins exist.
  • Determinism test: add a table test running the same source+input 100× and asserting identical output and errors; DoD: 0/100 runs differ locally; the test name includes “deterministic”; a comment in the test cites Starlark determinism and records this DoD.
  • Structured errors & shared schema: return standardized errors {code:string,message:string,details?:object} for timeouts (TIMEOUT), output limit exceeded (OUTPUT_LIMIT), and evaluation failures (EVAL_ERROR); update shared error schema doc and example in README; DoD: unit tests assert JSON shape for each failure mode and README section is present.
  • Observability: add structured logs (trace id, tool name, wall_ms, bytes_out) and emit OpenTelemetry span attributes; DoD: local run shows JSON logs with those fields and a span named tools.starlark.run.
  • Contract examples: add docs/interfaces/code.sandbox.starlark.run.md with request/response examples (valid, timeout, error), security notes, and performance caveats; DoD: doc renders and is linked from main docs.
