implement event-sourced architecture #621
🦋 Changeset detected
Latest commit: 08ff4d1
The changes in this PR will be included in the next version bump. This PR includes changesets to release 18 packages.
🧪 E2E Test Results
❌ Some tests failed
❌ Failed Tests
▲ Vercel Production (9 failed)
- vite (9 failed)
🌍 Community Worlds (104 failed)
- mongodb (26 failed)
- redis (26 failed)
- starter (26 failed)
- turso (26 failed)
Details by Category
- ❌ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ❌ 🌍 Community Worlds
❌ Some E2E test jobs failed. Check the workflow run for details.
6ebd4c5 to 2e46b8a
eece359 to 290e879
Pull request overview
This PR introduces a performance optimization for event creation by adding a createBatch() method to the World interface. The implementation enables atomic batch creation of multiple events, significantly improving the wait completion logic in the runtime from O(n²) to O(n) complexity.
Key Changes
- Added events.createBatch() method to the World interface for creating multiple events in a single operation
- Implemented batch creation across three storage backends (world-vercel, world-postgres, world-local) with backend-specific optimizations
- Optimized runtime wait completion logic using Set-based correlation ID lookup and batch event creation
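The Set-based lookup mentioned in the last bullet can be sketched roughly as below. This is an illustration only, not the code from this PR: the `WaitEvent` shape, the `wait_created` / `wait_completed` event names, the `resumeAt` field, and `findWaitsToComplete` are all hypothetical. Only the idea (collect correlation IDs into a Set once, then filter with O(1) membership checks) comes from the overview.

```typescript
// Hypothetical event shape for illustration; the real World event types may differ.
interface WaitEvent {
  eventType: 'wait_created' | 'wait_completed';
  correlationId: string;
  resumeAt?: number; // epoch ms; only meaningful on wait_created in this sketch
}

// Returns the wait_completed events that still need to be written.
// Pass 1 collects the correlation IDs that are already completed.
// Pass 2 filters the created waits using O(1) Set lookups, so the total work
// is O(n) rather than rescanning the event list once per wait (O(n^2)).
function findWaitsToComplete(events: WaitEvent[], now: number): WaitEvent[] {
  const completed = new Set<string>();
  for (const e of events) {
    if (e.eventType === 'wait_completed') completed.add(e.correlationId);
  }

  const toCreate: WaitEvent[] = [];
  for (const e of events) {
    if (
      e.eventType === 'wait_created' &&
      !completed.has(e.correlationId) &&
      (e.resumeAt ?? 0) <= now
    ) {
      toCreate.push({ eventType: 'wait_completed', correlationId: e.correlationId });
    }
  }
  return toCreate;
}
```

The returned array would then be written with one batch call instead of one create call per wait; the exact signature of events.createBatch() is not shown on this page, so it is left out of the sketch.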
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| packages/world/src/interfaces.ts | Added createBatch() method signature with JSDoc documentation to the Storage events interface |
| packages/world-vercel/src/storage.ts | Integrated batch event creation into the storage adapter |
| packages/world-vercel/src/events.ts | Implemented createWorkflowRunEventBatch() using parallel API calls via Promise.all |
| packages/world-postgres/src/storage.ts | Implemented batch creation using a single INSERT query with multiple values for optimal database performance |
| packages/world-local/src/storage.ts | Implemented sequential batch creation to maintain monotonic ULID ordering for filesystem storage |
| packages/core/src/runtime.ts | Refactored wait completion to use Set-based lookup and batch event creation, improving from O(n²) to O(n) complexity |
| .changeset/brave-dots-bake.md | Added changeset documenting the performance improvement across all affected packages |
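The difference between the parallel strategy (world-vercel) and the sequential strategy (world-local) described in the table can be illustrated with the sketch below. The types `NewEvent`, `StoredEvent`, and `CreateOne`, and both function names, are hypothetical stand-ins and not the actual World interface.

```typescript
// Hypothetical minimal types; not the actual World interface.
interface NewEvent {
  eventType: string;
  correlationId?: string;
}
interface StoredEvent extends NewEvent {
  eventId: string;
}
type CreateOne = (runId: string, data: NewEvent) => Promise<StoredEvent>;

// Parallel strategy: start every create at once. Promise.all returns results
// in input order, but the backend may process the calls in any order.
async function createBatchParallel(
  create: CreateOne,
  runId: string,
  batch: NewEvent[]
): Promise<StoredEvent[]> {
  return Promise.all(batch.map((data) => create(runId, data)));
}

// Sequential strategy: await each create before starting the next, so IDs
// generated at write time are assigned in input order.
async function createBatchSequential(
  create: CreateOne,
  runId: string,
  batch: NewEvent[]
): Promise<StoredEvent[]> {
  const out: StoredEvent[] = [];
  for (const data of batch) {
    out.push(await create(runId, data));
  }
  return out;
}
```

The sequential form trades throughput for ordering, which matches the stated goal of keeping ULIDs monotonic on the filesystem backend.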
f6b62f0 to 08ff4d1
data.eventType === 'step_created' ||
data.eventType === 'hook_created'
) {
throw new WorkflowAPIError(
The status code thrown when creating entities on a terminal run is incorrect. The code throws 409, but the suspension handler expects 410.
📝 Patch Details
diff --git a/packages/world-local/src/storage.ts b/packages/world-local/src/storage.ts
index 1a2ead1..7135f36 100644
--- a/packages/world-local/src/storage.ts
+++ b/packages/world-local/src/storage.ts
@@ -398,7 +398,7 @@ export function createStorage(basedir: string): Storage {
) {
throw new WorkflowAPIError(
`Cannot create new entities on run in terminal state "${currentRun.status}"`,
- { status: 409 }
+ { status: 410 }
);
}
}
diff --git a/packages/world-postgres/src/storage.ts b/packages/world-postgres/src/storage.ts
index 026e9e2..17e6697 100644
--- a/packages/world-postgres/src/storage.ts
+++ b/packages/world-postgres/src/storage.ts
@@ -309,7 +309,7 @@ export function createEventsStorage(drizzle: Drizzle): Storage['events'] {
) {
throw new WorkflowAPIError(
`Cannot create new entities on run in terminal state "${currentRun.status}"`,
- { status: 409 }
+ { status: 410 }
);
}
}
Analysis
Incorrect HTTP status code when creating entities on terminal runs
What fails: When events.create() is called with step_created or hook_created event types on a run in terminal state (completed, failed, or cancelled), both packages/world-local/src/storage.ts (line 401) and packages/world-postgres/src/storage.ts (line 312) throw WorkflowAPIError with status 409, but the suspension handler in packages/core/src/runtime/suspension-handler.ts (lines 83-87) expects status 410.
How to reproduce:
// Create a workflow run and complete it
const run = await storage.events.create(null, {
eventType: 'run_created',
eventData: { deploymentId: 'test', workflowName: 'test', input: [] }
});
// Move run to terminal state
await storage.events.create(run.run.runId, {
eventType: 'run_completed',
eventData: { output: 'done' }
});
// Try to create a step on the completed run
try {
await storage.events.create(run.run.runId, {
eventType: 'step_created',
correlationId: 'step1',
eventData: { stepName: 'test-step', input: [] }
});
} catch (err) {
console.log(err.status); // Outputs 409, but handler expects 410
}
Result: The suspension handler receives status 409 and logs "Hook already exists, continuing" instead of the correct "Workflow run has already completed, skipping hook" message, resulting in misleading log messages when hooks/steps cannot be created due to the run being in a terminal state.
Expected: Status code 410 is thrown for terminal run entity creation, while status 409 remains for duplicate token conflicts (verified: duplicate token check at lines 776 in world-local and 704 in world-postgres correctly use 409).
Files fixed:
- packages/world-local/src/storage.ts line 401: Changed { status: 409 } to { status: 410 }
- packages/world-postgres/src/storage.ts line 312: Changed { status: 409 } to { status: 410 }
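To make the 409 versus 410 distinction concrete, here is a rough sketch of how a caller might branch on the status. It is not the code in suspension-handler.ts: `statusOf`, `classifyCreateError`, and the `CreateOutcome` labels are hypothetical, and errors are inspected by duck typing on a `status` field rather than through the real WorkflowAPIError class.

```typescript
// Hypothetical outcome labels for this sketch.
type CreateOutcome = 'duplicate' | 'run-terminal';

// Reads a numeric `status` field from an unknown error value, if present.
function statusOf(err: unknown): number | undefined {
  if (typeof err === 'object' && err !== null && 'status' in err) {
    const s = (err as { status: unknown }).status;
    return typeof s === 'number' ? s : undefined;
  }
  return undefined;
}

// Maps an entity-creation error to an outcome and rethrows anything else.
function classifyCreateError(err: unknown): CreateOutcome {
  const status = statusOf(err);
  if (status === 409) return 'duplicate'; // e.g. the hook already exists
  if (status === 410) return 'run-terminal'; // run already completed, failed, or cancelled
  throw err;
}
```

With the storage backends throwing 409 for both cases, a handler like this would take the 'duplicate' branch for terminal runs too, which is the misleading log message described above.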

Pranay:
corresponding workflow-server PR: https://github.com/vercel/workflow-server/pull/154
important: This is a big change to the way workflows work, since everything is now event sourced. I introduced new event types and changed the shape of the step object (lastKnownError -> error and startedAt -> firstStartedAt). New event logs that use this published version of workflow will be incompatible with event logs from previous workflow versions. This doesn't affect the runtime of workflows, since those are deployment pegged, but it does affect observability, since the event shape looks different and the world spec has changed. The web-shared package just needs to be compatible with viewing workflow runs of the old schema for this to work correctly (which I believe it is, but please double check @VaguelySerious in case I missed anything).
The currently failing e2e tests on vercel world are related to the CLI, I believe (slack x-ref). However, once we merge the workflow-server PR, we can drop the env var changes on the Vercel deployments for this PR so that it points to the main prod deployment again, and then I'll re-run the e2e tests to make sure they work :)
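For readers wondering what "compatible with viewing runs of the old schema" could look like for the step rename (lastKnownError -> error, startedAt -> firstStartedAt), here is one possible approach. It is a sketch under assumptions: the `LegacyStep` and `CurrentStep` shapes are trimmed down, and `normalizeStep` is a hypothetical helper, not something confirmed to exist in web-shared.

```typescript
// Hypothetical, trimmed-down step shapes; real step objects carry more fields.
interface LegacyStep {
  stepId: string;
  lastKnownError?: unknown;
  startedAt?: string | undefined;
}
interface CurrentStep {
  stepId: string;
  error?: unknown;
  firstStartedAt?: string | undefined;
}

// Accepts either shape and returns the current one, preferring the new
// field names and falling back to the legacy names when they are absent.
function normalizeStep(step: LegacyStep | CurrentStep): CurrentStep {
  const s = step as LegacyStep & CurrentStep;
  return {
    stepId: s.stepId,
    error: s.error ?? s.lastKnownError,
    firstStartedAt: s.firstStartedAt ?? s.startedAt,
  };
}
```

A viewer that runs every step through a function like this only has to render one shape, regardless of which workflow version wrote the event log.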
I also added a new docs page with diagrams to explain the event sourcing and state machine lifecycles (preview link).
small: I also removed the unused run paused/resumed stuff which we've never used to simplify
Summary
Implement event-sourced architecture for runs, steps, and hooks:
- Add run lifecycle events (run_created, run_started, run_completed, run_failed, run_cancelled)
- Add step_retrying event for non-fatal step failures that will be retried
- Remove fatal field from step_failed event (step_failed now implies terminal failure)
- Rename lastKnownError to error for consistency with server
- … events.create()
- … step_created event for earlier detection
- Remove run_paused/run_resumed events and paused status
This makes the system faster, easier to reason about, and resilient to data inconsistencies.
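The core idea of event sourcing here, that a run's state is derived by replaying its events, can be sketched as a small fold. The event names are the ones listed above; the `pending` and `running` status names, the `foldRunStatus` helper, and the rule that events after a terminal state are ignored are assumptions made for this sketch, not confirmed behavior of the PR.

```typescript
type RunEventType =
  | 'run_created'
  | 'run_started'
  | 'run_completed'
  | 'run_failed'
  | 'run_cancelled';

// 'pending' and 'running' are assumed names; the terminal names follow the
// "completed, failed, or cancelled" wording used in the review above.
type RunStatus = 'pending' | 'running' | 'completed' | 'failed' | 'cancelled';

const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>([
  'completed',
  'failed',
  'cancelled',
]);

// Replays run events in order and returns the resulting status.
// Returns undefined when there are no events (the run does not exist yet).
function foldRunStatus(events: RunEventType[]): RunStatus | undefined {
  let status: RunStatus | undefined;
  for (const e of events) {
    // Assumption: once a run is terminal, later events do not change it.
    if (status !== undefined && TERMINAL.has(status)) break;
    switch (e) {
      case 'run_created':
        status = 'pending';
        break;
      case 'run_started':
        status = 'running';
        break;
      case 'run_completed':
        status = 'completed';
        break;
      case 'run_failed':
        status = 'failed';
        break;
      case 'run_cancelled':
        status = 'cancelled';
        break;
    }
  }
  return status;
}
```

Because the status is a pure function of the event log, a stored status that has drifted can be recomputed from the events.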
Test plan
🤖 Generated with Claude Code