Corrupt unusable qcow image with multipass after power off

erny · January 6, 2021, 1:35pm

Hi.

I’ve got a corrupt image (qcow2) after power off on OS X and I’m not able to recover it. The initial symptoms were that during start up the computer was unresponsive.

Can anyone give me a pointer on how to try to repair this type of file on OS X?

My system:

OS X 10.13.6
multipass 1.5.0

What I learned:

It seems that qemu-image has / had a bug on OS X with sparse images (lseek)
qemu-image 5.1.0 installed with homebrew creates images that multipass cannot read (probably with qcow3 format):

hyperkit: [ERROR] Mirage block device raised exception: (Failure "Read a header_length of 112 but we computed 104")
ls -alh hangs the computer when accessing the corrupt image
A cp -a <corrupt image> <new destination on other disk> worked initially but I still don’t know if it will be useful
Converting an image with qemu-image convert -O qcow2 <old image> <new image> creates an unusable image (same header_length error).
Converting an image with qemu-image convert -S 0 ... creates a thick provisioned, i.e. non-sparse, image
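
To tell whether an image is actually sparse, one can compare its apparent size with the space allocated on disk. A minimal sketch, assuming a POSIX shell; `is_sparse` and `image_is_sparse` are hypothetical helper names, not part of qemu or Multipass. Since merely listing the corrupt file hung the machine here, this is better tried on a copy than on the original.

```shell
# is_sparse APPARENT_BYTES ALLOCATED_KIB
# Succeeds when fewer bytes are allocated on disk than the file claims to hold.
is_sparse() {
    [ $(( $2 * 1024 )) -lt "$1" ]
}

# image_is_sparse FILE: gather both numbers for FILE and compare them.
# wc -c may read the whole file, so this can be slow on large images.
image_is_sparse() {
    apparent=$(wc -c < "$1" | tr -d ' ')
    allocated=$(du -k "$1" | awk '{print $1}')
    is_sparse "$apparent" "$allocated"
}
```

For example, `image_is_sparse copy-of-image.img && echo sparse` prints `sparse` only when the file occupies less disk space than its nominal size.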

saviq · January 8, 2021, 9:03am

Hi @erny, is your main goal to retrieve data from the image?

By qemu-image, do you mean the qemu-img utility? We ship a known-compatible version alongside Multipass: /Library/Application Support/com.canonical.multipass/bin/qemu-img. That should be a better choice for any manipulation of the image. Running qemu-img convert -p -O qcow2 <source> <target> may help you obtain a hyperkit-compatible image.
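
As a rough sketch of that conversion with the bundled binary: the path is the one quoted above, while `convert_image` and the `DRY_RUN` switch are hypothetical helpers added for illustration, and the file names in the usage line are placeholders.

```shell
# Bundled qemu-img path as given in this thread; override QEMU_IMG if yours differs.
QEMU_IMG="${QEMU_IMG:-/Library/Application Support/com.canonical.multipass/bin/qemu-img}"

# convert_image SRC DST: re-encode SRC as qcow2 into DST; SRC is only read.
# With DRY_RUN=1 the command is printed instead of executed.
convert_image() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s convert -p -O qcow2 %s %s\n' "$QEMU_IMG" "$1" "$2"
        return 0
    fi
    "$QEMU_IMG" convert -p -O qcow2 "$1" "$2"
}
```

Usage would look like `convert_image copy-of-image.img recovered.img`, ideally run against a copy of the image rather than the only original.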

“The initial symptoms were that during start up the computer was unresponsive.”
“ls -alh hangs the computer when accessing the corrupt image”

These actually suggest there’s a problem with the host disk/filesystem at this location. Are you certain your drive is in good health?

“A cp -a <corrupt image> <new destination on other disk> worked initially but I still don’t know if it will be useful”

That, on the other hand, suggests things could be well.
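
If the copy goes through, it is worth confirming that it matches the source before working on it. A minimal sketch, assuming a POSIX shell; `rescue_copy` is a hypothetical helper name. Note that the comparison reads the source a second time, which may be undesirable if the disk itself is failing.

```shell
# rescue_copy SRC DST: copy SRC to DST preserving attributes, then compare
# the two byte for byte. Returns non-zero if the copy fails or contents differ.
rescue_copy() {
    cp -a "$1" "$2" || return 1
    cmp -s "$1" "$2"
}
```

A call such as `rescue_copy <corrupt image> <new destination on other disk> && echo "copy verified"` reports success only when both steps pass.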

You should be able to mount the image as well.

Another utility is qcow-tool, using the same underlying QCOW implementation that hyperkit does. You can get that one through OPAM.

Let us know how it goes!

erny · January 18, 2021, 7:30pm

Hi @saviq, thanks for the info. I used the qcow-tool provided with Docker, which also uses hyperkit. In the end I had to throw away the image. It may be related to the suspend / wakeup issue we discussed here: https://github.com/canonical/multipass/issues/1924

Now, when I try to shut down the VM, which runs several containers that also need time to shut down, I get “Stopping primary [2021-01-18T09:23:54.422] [error] [primary] process error occurred Crashed”, but only if I don’t stop the containers beforehand.

Is there a shutdown timeout for “multipass stop”?

Thanks

saviq · January 18, 2021, 7:54pm

Yes, we only give it 15s to shut down, then we kill it… Unfortunately there’s not much of a way for us to know the difference between the instance being stuck and still shutting down. That said, maybe 15s is a bit on the short side.
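
Given that timeout, the workaround mentioned above (stopping the containers before the instance) can be scripted. A rough sketch, assuming the containers are managed by Docker inside the instance (the thread does not say which runtime is in use) and that the instance is named primary; `run`, `graceful_stop` and `DRY_RUN` are hypothetical names for illustration.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# graceful_stop [INSTANCE]: stop all running containers inside the instance
# first, so the instance has less to do within the shutdown window.
graceful_stop() {
    instance=${1:-primary}
    # Assumption: Docker is the container runtime inside the instance.
    run multipass exec "$instance" -- sh -c 'docker ps -q | xargs -r docker stop' || return 1
    run multipass stop "$instance"
}
```

Running `DRY_RUN=1 graceful_stop primary` prints the two commands without executing them, which is a safe way to check them before a real run.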

steventillson02 · July 29, 2021, 11:28am

Thanks man, A cp -a <corrupt image> <new destination on other disk> worked for me.