I’ve got a corrupt image (qcow2) after a power-off on OS X and I’m not able to recover it. The initial symptoms were that during start up the computer was unresponsive.
Can anyone give me a pointer on how to try to repair this type of file on OS X?
My system:
OS X 10.13.6
multipass 1.5.0
What I learned:
It seems that qemu-image has / had a bug on OS X with sparse images (lseek)
qemu-image 5.1.0 installed with homebrew creates images that multipass cannot read (probably with qcow3 format):
hyperkit: [ERROR] Mirage block device raised exception: (Failure "Read a header_length of 112 but we computed 104")
ls -alh hangs the computer when accessing the corrupt image
A cp -a <corrupt image> <new destination on other disk> worked initially but I still don’t know if it will be useful
Converting an image with qemu-image convert -O qcow2 <old image> <new image> creates an unusable image (same header_length error).
Converting an image with qemu-image convert -S 0 ... creates a thick provisioned, i.e. non-sparse, image
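For anyone hitting the same header_length error: the message suggests the reader expected a 104-byte header while the file declares 112. A rough way to see what a file declares is to read the qcow2 header fields directly. This is only a sketch, assuming the layout from the qcow2 specification (magic at byte 0, version at byte 4, header_length at byte 100 for version 3, all big-endian). The function names are made up for illustration.

```shell
# Read a 4-byte big-endian unsigned integer from file $1 at byte offset $2.
read_be32() {
    # od prints the four bytes as unsigned decimals; set -- splits them.
    set -- $(od -An -tu1 -j "$2" -N 4 "$1")
    [ $# -eq 4 ] || return 1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the version and, for version 3 images, the declared header length.
qcow2_header_info() {
    image=$1
    magic=$(read_be32 "$image" 0) || { echo "file too short" >&2; return 1; }
    # 0x514649fb is the qcow2 magic "QFI\xfb".
    if [ "$magic" -ne $((0x514649fb)) ]; then
        echo "not a qcow2 image" >&2
        return 1
    fi
    version=$(read_be32 "$image" 4) || return 1
    echo "version=$version"
    # Only version 3 headers carry a header_length field (offset 100).
    if [ "$version" -ge 3 ]; then
        echo "header_length=$(read_be32 "$image" 100)"
    fi
}

# Usage (path is a placeholder): qcow2_header_info /path/to/image.img
```

This only reads the first bytes of the file, so it should be safe to run against a copy of the image.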
Hi @erny, is your main goal to retrieve data from the image?
By qemu-image, do you mean the qemu-img utility? We ship a known-compatible version alongside Multipass: /Library/Application Support/com.canonical.multipass/bin/qemu-img - that should be a better choice for any manipulation of the image. A qemu-img convert -p -O qcow2 <source> <target> may help you obtain a hyperkit-compatible image.
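As a sketch, the inspect-then-convert sequence could look like the following. The binary path and the convert invocation are the ones mentioned above; the info and check steps are an added suggestion, and convert_image is just a hypothetical helper name. It is best run on a copy of the image rather than the original.

```shell
# Path to the qemu-img bundled with Multipass (quoted from the post above).
# Set QEMU_IMG beforehand to try a different build.
QEMU_IMG=${QEMU_IMG:-"/Library/Application Support/com.canonical.multipass/bin/qemu-img"}

# convert_image <source> <target>
convert_image() {
    src=$1
    dst=$2
    # Show what qemu-img thinks the file is (format, virtual size, ...).
    "$QEMU_IMG" info "$src"
    # Consistency check; a non-zero exit means problems were found
    # or the check could not complete. Carry on to the conversion anyway.
    "$QEMU_IMG" check "$src" || echo "check reported problems (exit $?)" >&2
    # Rewrite the image into a fresh qcow2 file, showing progress.
    "$QEMU_IMG" convert -p -O qcow2 "$src" "$dst"
}

# Usage (file names are placeholders): convert_image copy-of-image.img rescued.img
```

Whether the result boots still depends on how badly the source is damaged.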
The initial symptoms were that during start up the computer was unresponsive. ls -alh hangs the computer when accessing the corrupt image
These actually suggest there’s a problem with the host disk/filesystem at this location. Are you certain your drive is in good health?
A cp -a <corrupt image> <new destination on other disk> worked initially but I still don’t know if it will be useful
That, on the other hand, suggests things could be well.
Hi @saviq, thanks for the info. I used the qcow-tool provided with Docker, which also uses hyperkit. In the end I had to throw away the image. It may be related to the suspend / wakeup issue we discussed here: https://github.com/canonical/multipass/issues/1924
Now, when I try to shut down the VM, which runs several containers that also need time to shut down, I get “Stopping primary [2021-01-18T09:23:54.422] [error] [primary] process error occurred Crashed”, but only if I don’t stop the containers first.
Is there a shutdown timeout for “multipass stop”?
Yes, we only give it 15s to shut down, then we kill it… Unfortunately there’s not much of a way for us to know the difference between the instance being stuck and still shutting down. That said, maybe 15s is a bit on the short side.
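Until that changes, one workaround is to stop the slow services inside the instance first and only then call multipass stop, as described above. A minimal sketch, assuming the containers can be stopped with a single command inside the guest; stop_instance_cleanly is a hypothetical helper, and the Docker command in the usage line is only an example since the thread doesn't say which container runtime is in use.

```shell
# stop_instance_cleanly <instance> <command to run inside the guest...>
stop_instance_cleanly() {
    instance=$1
    shift
    # Run the in-guest stop command first, so the instance has little
    # left to do when multipass asks it to shut down.
    if multipass exec "$instance" -- "$@"; then
        multipass stop "$instance"
    else
        echo "in-guest stop command failed; not stopping $instance" >&2
        return 1
    fi
}

# Usage (assumed runtime): stop_instance_cleanly primary sudo systemctl stop docker
```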