Skip to content

Commit 27c7be6

Browse files
committed
fix: saving/restoring async IO engine transport state
When saving the state of a microVM with one or more block devices backed by the async IO engine, we need to take a few steps extra steps before serializing the state to the disk, as we need to make sure that there aren't any pending io_uring requests that have not been handled by the kernel yet. For these types of devices that need that we call a prepare_save() hook before serializing the device state. If there are indeed pending requests, once we handle them we need to let the guest know, by adding the corresponding VirtIO descriptors to the used ring. Moreover, since we use notification suppression, this might or might not require us to send an interrupt to the guest. Now, when we save the state of a VirtIO device, we save the device specific state **and** the transport (MMIO or PCI) state along with it. There were a few issues with how we were doing the serialization: 1. We were saving the transport state before we run the prepare_save() hook. The transport state includes information such as the `interrupt_status` in MMIO or `MSI-X config` in PCI. prepare_save() in the case of async IO might change this state, so us running it after saving the transport state essentially looses information. 2. We were saving the devices states after saving the KVM state. This is problematic because, if prepare_save() sends an interrupt to the guest we don't save that "pending interrupt" bit of information in the snapshot. These two issues, were making microVMs with block devices backed by async IO freeze in some cases post snapshot resume, since the guest is stuck in the kernel waiting for some notification for the device emulation which never arrives. Currently, this is only a problem with virtio-block with async IO engine. The only other device using the prepare_save() hook is currently virtio-net, but this one doesn't modify any VirtIO state, neither sends interrupts. Fix this by ensuring the correct ordering of operations during the snapshot phase. Signed-off-by: Babis Chalios <bchalios@amazon.es>
1 parent ff0e866 commit 27c7be6

File tree

6 files changed

+46
-34
lines changed

6 files changed

+46
-34
lines changed

src/vmm/src/device_manager/pci_mngr.rs

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -283,9 +283,13 @@ impl<'a> Persist<'a> for PciDevices {
283283

284284
for pci_dev in self.virtio_devices.values() {
285285
let locked_pci_dev = pci_dev.lock().expect("Poisoned lock");
286-
let transport_state = locked_pci_dev.state();
287286
let virtio_dev = locked_pci_dev.virtio_device();
287+
// We need to call `prepare_save()` on the device before saving the transport
288+
// so that, if we modify the transport state while preparing the device, e.g. sending
289+
// an interrupt to the guest, this is correctly captured in the saved transport state.
288290
let mut locked_virtio_dev = virtio_dev.lock().expect("Poisoned lock");
291+
locked_virtio_dev.prepare_save();
292+
let transport_state = locked_pci_dev.state();
289293

290294
let pci_device_bdf = transport_state.pci_device_bdf.into();
291295

@@ -316,7 +320,6 @@ impl<'a> Persist<'a> for PciDevices {
316320
snapshotting yet"
317321
);
318322
} else {
319-
block_dev.prepare_save();
320323
let device_state = block_dev.save();
321324
state.block_devices.push(VirtioDeviceState {
322325
device_id: block_dev.id().to_string(),
@@ -338,7 +341,6 @@ impl<'a> Persist<'a> for PciDevices {
338341
imds_compat: mmds_guard.imds_compat(),
339342
});
340343
}
341-
net_dev.prepare_save();
342344
let device_state = net_dev.save();
343345

344346
state.net_devices.push(VirtioDeviceState {

src/vmm/src/device_manager/persist.rs

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -227,11 +227,15 @@ impl<'a> Persist<'a> for MMIODeviceManager {
227227

228228
let _: Result<(), ()> = self.for_each_virtio_device(|_, devid, device| {
229229
let mmio_transport_locked = device.inner.lock().expect("Poisoned lock");
230+
let mut locked_device = mmio_transport_locked.locked_device();
231+
// We need to call `prepare_save()` on the device before saving the transport
232+
// so that, if we modify the transport state while preparing the device, e.g. sending
233+
// an interrupt to the guest, this is correctly captured in the saved transport state.
234+
locked_device.prepare_save();
230235
let transport_state = mmio_transport_locked.save();
231236
let device_info = device.resources;
232237
let device_id = devid.clone();
233238

234-
let mut locked_device = mmio_transport_locked.locked_device();
235239
match locked_device.device_type() {
236240
virtio_ids::VIRTIO_ID_BALLOON => {
237241
let device_state = locked_device
@@ -255,7 +259,6 @@ impl<'a> Persist<'a> for MMIODeviceManager {
255259
snapshotting yet"
256260
);
257261
} else {
258-
block.prepare_save();
259262
let device_state = block.save();
260263
states.block_devices.push(VirtioDeviceState {
261264
device_id,
@@ -275,7 +278,6 @@ impl<'a> Persist<'a> for MMIODeviceManager {
275278
});
276279
}
277280

278-
net.prepare_save();
279281
let device_state = net.save();
280282
states.net_devices.push(VirtioDeviceState {
281283
device_id,

src/vmm/src/devices/virtio/block/device.rs

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -82,13 +82,6 @@ impl Block {
8282
}
8383
}
8484

85-
pub fn prepare_save(&mut self) {
86-
match self {
87-
Self::Virtio(b) => b.prepare_save(),
88-
Self::VhostUser(b) => b.prepare_save(),
89-
}
90-
}
91-
9285
pub fn process_virtio_queues(&mut self) -> Result<(), InvalidAvailIdx> {
9386
match self {
9487
Self::Virtio(b) => b.process_virtio_queues(),
@@ -227,6 +220,13 @@ impl VirtioDevice for Block {
227220
self.process_virtio_queues();
228221
}
229222
}
223+
224+
fn prepare_save(&mut self) {
225+
match self {
226+
Self::Virtio(b) => b.prepare_save(),
227+
Self::VhostUser(b) => b.prepare_save(),
228+
}
229+
}
230230
}
231231

232232
impl MutEventSubscriber for Block {

src/vmm/src/devices/virtio/device.rs

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -169,6 +169,9 @@ pub trait VirtioDevice: AsAny + Send {
169169

170170
/// Kick the device, as if it had received external events.
171171
fn kick(&mut self) {}
172+
173+
/// Prepare the device for saving its state
174+
fn prepare_save(&mut self) {}
172175
}
173176

174177
impl fmt::Debug for dyn VirtioDevice {

src/vmm/src/devices/virtio/net/device.rs

Lines changed: 20 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -938,26 +938,6 @@ impl Net {
938938

939939
Ok(())
940940
}
941-
942-
/// Prepare saving state
943-
pub fn prepare_save(&mut self) {
944-
// We shouldn't be messing with the queue if the device is not activated.
945-
// Anyways, if it isn't there's nothing to prepare; we haven't parsed any
946-
// descriptors yet from it and we can't have a deferred frame.
947-
if !self.is_activated() {
948-
return;
949-
}
950-
951-
// Give potential deferred RX frame to guest
952-
self.rx_buffer.finish_frame(&mut self.queues[RX_INDEX]);
953-
// Reset the parsed available descriptors, so we will re-parse them
954-
self.queues[RX_INDEX].next_avail -=
955-
Wrapping(u16::try_from(self.rx_buffer.parsed_descriptors.len()).unwrap());
956-
self.rx_buffer.parsed_descriptors.clear();
957-
self.rx_buffer.iovec.clear();
958-
self.rx_buffer.used_bytes = 0;
959-
self.rx_buffer.used_descriptors = 0;
960-
}
961941
}
962942

963943
impl VirtioDevice for Net {
@@ -1069,6 +1049,26 @@ impl VirtioDevice for Net {
10691049
self.process_virtio_queues();
10701050
}
10711051
}
1052+
1053+
/// Prepare saving state
1054+
fn prepare_save(&mut self) {
1055+
// We shouldn't be messing with the queue if the device is not activated.
1056+
// Anyways, if it isn't there's nothing to prepare; we haven't parsed any
1057+
// descriptors yet from it and we can't have a deferred frame.
1058+
if !self.is_activated() {
1059+
return;
1060+
}
1061+
1062+
// Give potential deferred RX frame to guest
1063+
self.rx_buffer.finish_frame(&mut self.queues[RX_INDEX]);
1064+
// Reset the parsed available descriptors, so we will re-parse them
1065+
self.queues[RX_INDEX].next_avail -=
1066+
Wrapping(u16::try_from(self.rx_buffer.parsed_descriptors.len()).unwrap());
1067+
self.rx_buffer.parsed_descriptors.clear();
1068+
self.rx_buffer.iovec.clear();
1069+
self.rx_buffer.used_bytes = 0;
1070+
self.rx_buffer.used_descriptors = 0;
1071+
}
10721072
}
10731073

10741074
#[cfg(test)]

src/vmm/src/lib.rs

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -436,6 +436,12 @@ impl Vmm {
436436
/// Saves the state of a paused Microvm.
437437
pub fn save_state(&mut self, vm_info: &VmInfo) -> Result<MicrovmState, MicrovmStateError> {
438438
use self::MicrovmStateError::SaveVmState;
439+
// We need to save device state before saving KVM state.
440+
// Some devices, (at the time of writing this comment block device with async engine)
441+
// might modify the VirtIO transport and send an interrupt to the guest. If we save KVM
442+
// state before we save device state, that interrupt will never be delivered to the guest
443+
// upon resuming from the snapshot.
444+
let device_states = self.device_manager.save();
439445
let vcpu_states = self.save_vcpu_states()?;
440446
let kvm_state = self.kvm.save_state();
441447
let vm_state = {
@@ -450,7 +456,6 @@ impl Vmm {
450456
self.vm.save_state(&mpidrs).map_err(SaveVmState)?
451457
}
452458
};
453-
let device_states = self.device_manager.save();
454459

455460
Ok(MicrovmState {
456461
vm_info: vm_info.clone(),

0 commit comments

Comments
 (0)