-
Notifications
You must be signed in to change notification settings - Fork 29
Bump hyperlight-host to new snapshot api #128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work updating hyperlight-wasm to use the new snapshoting API.
I left a couple of comments/questions. Let me know what you think.
1540b2a
to
d2de1b9
Compare
…ead of mshv2) Signed-off-by: Mark Rossett <marosset@microsoft.com>
Update all hyperlight dependencies from various revisions to a unified 172fcfa69b0f9064c7a0e48e512f8a86ae1fdbe1 snapshot that includes snapshot/restore functionality. Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Replace transition-based sandbox evolution with direct snapshot/restore API: - Remove EvolvableSandbox/DevolvableSandbox trait usage - Replace transition callbacks with direct method calls - Add snapshot/restore methods to LoadedWasmSandbox - Store initial snapshots in WasmSandbox for efficient unloading - Export Snapshot type in public API This provides more direct control over sandbox state management and enables features like checkpoint/restore. Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
metrics::gauge!(METRIC_ACTIVE_WASM_SANDBOXES).increment(1); | ||
metrics::counter!(METRIC_TOTAL_WASM_SANDBOXES).increment(1); | ||
WasmSandbox { inner: Some(inner) } | ||
Ok(WasmSandbox { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like a slightly weird separation of concerns that we do the restore in LoadedWasmSandbox and then actually create this here. Does it make sense to either move this code into LoadedWasmSandbox or move the restore into here? In a sense, for modularity, it feels to me like if WasmSandbox is going to bother to expose a constructor that tries to enforce its internal invariants, one of which is that inner is in a state that corresponds to snapshot, we should probably make the constructor enforce that if it can, rather than relying on the code calling it to do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The restore is made in here. Does that alleviate your concerns?
} | ||
None => Err(new_error!("WasmSandbox is None, cannot load module")), | ||
/// Helper function to finalize module loading and create LoadedWasmSandbox | ||
fn finalize_module_load(mut self) -> Result<LoadedWasmSandbox> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is taking self by value, so why do we need to move out of these fields? Is it just about the fact that we have drop glue on this type?
One option to get rid of the dynamic check here would be ManuallyDrop
, but I don't know if we like the tradeoff of (closer to what we actually want, no unnecessary dynamic checks) to increased unsafe code surface, and I suppose we would still need to have a flag that let the Drop impl know whether or not the contents had been moved (or use ManuallyDrop<Self>
to write an into_inner()
function that removes the contents of the sandbox, manually updates the counters, and never runs the drop glue at all).
Relatedly, I think that scattered throughout our errors in general, we have a mix of "this means the user did something wrong, and they should be better" with "this means this is a fundamental invariant that we believed would always be true but was somehow violated". I would like us to separate these in an easily visible way (akin to rustc errors vs ICEs), since they have quite different consequences. I wonder if it is reasonable to panic in the latter case, since if we were wrong about one invariant, who knows what else we were wrong about...
Sorry, this comment has become a bit long and large in scope, so feel free to not consider it a blocker for this PR, but perhaps think a little bit about whether the Option semantics are indeed what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with everything but will leave for another time. #136
} | ||
|
||
/// Restore the state of the sandbox to the state captured in the given snapshot. | ||
pub fn restore(&mut self, snapshot: &Snapshot) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to have a pub fn new_from_dirty_sandbox_and_snapshot(sandbox_to_be_erased: MultiUseSandbox, loaded_snapshot: Snapshot)
(probably with less verbose names) or something like that?
I think that the common use case for e.g. function services is going to be "have a pool of sandboxes in various states, and whenever a request comes in, grab one, restore a sanpshot for the correct customer into it, and continue", which means that you would indeed want a function here which can just take any hyperlight sandbox in any state and restore the snapshot of runtime + image into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that would work quite yet since restoring only works for snapshots from the same sandbox right now. But that does sound good in general
src/wasm_runtime/src/hostfuncs.rs
Outdated
)?; | ||
|
||
// Track any allocations for later cleanup | ||
if !allocated_addrs.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way that we are deferring the cleanup here makes me feel like we are not being precise about the lifetime and ownership of parameter/return data (in the informal sense, not the Rust sense). I guess that we are saying here that whenever you call into a sandbox, the parameters and the return values from any host call, both live until you return from that call. That seems like a bit of a strange and arbitrary invariant to me, and if we really want to go down that path we should definitely document it.
Why doesn't the wasm inside the component free these values itself, whenever it is done with them? It seems like their lifetimes only affect wasm memory (since there aren't any e.g. resource handles which refer out of the sandbox in the module variant), and so it seems like it ought to be a guest concern to free them? I think that would make all of this more similar to the component model design as well.
Instead of enforcing this lifetime this way, we could perhaps think about finding a way to add something to audit the guest heap usage, or something like that? I'm not super sure what would make sense here, but I don't really like this approach to deciding how long the memory in wasm lives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed, please have another look at marshal.rs top of file comment.
- Reuse wasmtime Store and Instance across guest function calls instead of creating new one per call. - Establish memory contract between host and guest. - Guest functions takes ownership of input parameters - Guest transfer ownership of return values - Host functions parameters are borrowed from guest - Host function return values are owned by guest and guest must free them. - Component: Add post_return calls for proper WASM function cleanup - Fix ABI mismatch in parameter of guest_dispatch_function Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Bumps hyperlight dependency. Reworks uses of MultiuseSandboxContext to use snapshots instead.
Memory is no longer restored after a call to
LoadedWasmSandbox::call_guest_function
which revealed that memory is leaked during mashalling, and a missingpost_call
after calling a wasmtime func. In addition, a Store and wasm Instance are no longer created per guest function call. Rather, it's being created once and reused.