Repairing corrupted partition from encrypted volume that won't mount

Ubuntu 22.04 LTS, Gnome

I am trying to recover around 1 to 1.5 GB of data from a 3TB external HDD. It was disconnected while mounted under Ubuntu, and now fails to mount. The first mount attempt after that was under Windows 10, where it was briefly connected.

I originally created the filesystem as a single non-bootable partition (IIRC ext3 or ext4) and encrypted with LUKS-2, using all the drive space, after zeroing out the whole drive first.

TestDisk analysis seems to find a large EFI GPT partition, with a bad sector count, overlapped near the end by a ~9GB extended partition, containing a FAT32 partition. Elsewhere it finds a sketchy FAT16 partition it can’t recover, and reports that the HDD “seems too small” and should actually be over 4000MB.

The HDD is visible to my Ubuntu install.

$ sudo fdisk -l

Disk /dev/sdb: 2.73 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: ST3000DM001-9YN1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start        End    Sectors Size Id Type
/dev/sdb1           1 4294967295 4294967295   2T ee GPT

Partition 1 does not start on physical sector boundary.

The partition does not mount

###@Stallo:~$ sudo mount /dev/sdb1 /home/###/mount
mount: /home/###/mount: special device /dev/sdb1 does not exist.

Indeed, /dev doesn’t list any partitions, just /dev/sdb. Running sudo partprobe /dev/sdb detects nothing.

What I’ve Tried:

PhotoRec got past the partition mess, but found a bunch of “pdf” files up to a few dozen GB in size. There are many such large files on the volume, but not PDFs. I guess PhotoRec is confused by the encryption.

Main Questions:

  1. Am I going to have to fix the partition table first, to be able to decrypt the contents? I’m expecting this is how LUKS-2 works.
  2. If so, can it be done in pieces, i.e. creating a partial image, or will I have to image the whole thing onto another drive? And will that drive have to be >3TB?
  3. Any advice on going about the repair once it’s imaged would be greatly appreciated.

Time to review your backup regime

https://superuser.com/questions/1660630/proper-alternative-method-to-recover-lost-directory-ext4

Looks useful

touch /forcefsck and it will run fsck whenever you reboot the server.

@pavlos Are you sure about this?

If I am not mistaken, the adoption of systemd changed things.

See, for example, this topic here.

I stand corrected, thank you for the link.

The first step would be to copy the data to another drive, which you should be able to do with gparted as described at the link below.

https://unix.stackexchange.com/questions/778236/can-msdos-partition-be-located-past-the-2tb-point-on-a-disk

You indicate that you created a Linux filesystem on that drive initially, and that the first mount attempt was when it was briefly connected under Windows 10. Why and how would you do that, given that a default Windows install cannot read or write a Linux filesystem?

Your testdisk output shows an EFI partition which also shows GPT. A drive larger than 2TB will be GPT. Your last image shows an extended partition which will not exist on a GPT drive. Your fdisk output shows the Disklabel type as dos not gpt? Did you convert from gpt to dos?

Did you try running fsck (filesystem check) from Ubuntu on the drive/partition? There is nothing in your output indicating a Linux filesystem. It’s interesting that fdisk shows the drive/partition (sdb1) but you are unable to mount it. I don’t know why that would be, but the differing output for disklabel type (gpt and dos) is problematic. I’d start by cloning/copying what, if anything, you can from that drive to another.

I’m not entirely sure it was ext4, it may have been NTFS or something Windows-compatible.

That’s why I mention having used it under W10. It’s a dual booting machine.

I am sure I created no such dos partition, and the disk was zeroed out before I created any partitions. Notably it starts a few k blocks before the endpoint of the GPT partition. I didn’t try to write anything to it under W10, but I wouldn’t put it past Windows to do that on its own. However I think that’s a long shot, as previously it was disconnected while mounted, IMHO a more plausible way to damage the filesystem.

sudo fsck /dev/sdb
fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks…
fsck.ext2: Bad magic number in super-block while trying to open /dev/sdb

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Found a PMBR partition table in /dev/sdb

sudo fsck /dev/sdb1 returns “no such file or directory.”

Before being corrupted, the drive automounted under Ubuntu.

May take a few days to get another 3TB drive to clone to. On the other hand, maybe there’s a way to decrypt the files individually, as PhotoRec seems capable of finding them.

thanks for the input

The fsck output seems to indicate it is not an ext (linux) partition so you might try running chkdsk from windows on it.

If you had the drive attached while in windows, you should verify that bitlocker is not on, that hibernation is off (some windows updates turn it on without notice) and any fastboot option in windows or the BIOS is off.

The reference to ‘dos’ is not to a partition but to the disk label, i.e. the partition table type. Was this just a data disk?

Disklabel type: dos

That is a huge problem. Drives over 2TiB must be gpt, as MBR does not support them. You forced your drive to be at most 2TB. I did not think any newer partitioning tools would let you make that mistake.
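For reference, the 2 TiB ceiling comes from the 32-bit sector fields in an MBR partition entry. The arithmetic below is a small sketch that uses only the numbers from the fdisk output earlier in the thread; the variable names are made up for illustration. A single type ee entry starting at sector 1 is also what the protective MBR on a GPT disk looks like, so the dos label might mean that fdisk found no valid GPT header rather than that the disk was partitioned as MBR on purpose. That reading is an assumption, not something the output proves.

```shell
#!/bin/sh
# Sketch: why one MBR entry tops out near 2 TiB with 512-byte sectors.
sector_size=512            # logical sector size reported by fdisk
max_sectors=4294967295     # 2^32 - 1, the "Sectors" value fdisk shows for sdb1
disk_bytes=3000592982016   # size fdisk reports for /dev/sdb

# Largest span a single MBR entry can describe.
mbr_limit_bytes=$((max_sectors * sector_size))

# Bytes beyond the end of sdb1; the extra sector_size accounts for sector 0,
# which holds the MBR itself because sdb1 starts at sector 1.
beyond_mbr=$((disk_bytes - mbr_limit_bytes - sector_size))

echo "largest MBR entry: $mbr_limit_bytes bytes"
echo "bytes past the end of sdb1: $beyond_mbr"
```

Under these figures roughly 800 GB of the disk lies outside anything the MBR entry can address.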

If you run testdisk with the MBR partition table type rather than GPT, does it show anything? If the deeper search shows files, copy them immediately; some people did not and could never get them back later. Or, if you can mount the MBR partition(s), can you copy data?

You cannot run fsck on a drive, only on partitions, fsck /dev/sda is incorrect. It needs to be an ext4 partition and then fsck or e2fsck on that partition with parameters to make fixes. Examples here:

Trying various repairs on original drive might make it worse. It would probably be a good start to create a full image of the failing drive (for example using dd command) and only work on this image (or a copy of image) to recover lost data.

chkdsk and other Windows tools failed with messages like “can not access”.

Yes. zeroed out and made one big encrypted partition. I regret the encryption because PhotoRec was finding stuff easily. There were no subpartitions of any kind.

I’m not sure the disklabel was dos before. It was working well (for years) before being corrupted. Immediate events before corruption were a) disconnected while mounted under Ubuntu and b) connected under Windows 10. Could the disklabel have been rewritten?

In TestDisk 7.1 I’m not seeing an MBR mode option.

What method or software did you use to ‘zero out’ the drive? If the drive in question was purchased since 2012, it was likely gpt, and the disk label is not going to change on its own, meaning without some user intervention. I don’t think incorrectly unmounting would do that. I’m not sure what the method/software you used to ‘zero out’ the drive would do. If you had something on the drive previously or wanted to start fresh, a long format should do the job. Creating one large partition for that drive would not be a problem in itself for either a Linux or Windows drive/partition.

it was pre-owned, and IIRC had a single NTFS partition before I wiped it. It would have been Disk Utility, shred or dd, and only zeroes, I didn’t care about randomizing.

In any case, that was a long time ago, and it always worked fine. So if the disk label is the problem, it only became a problem recently, and the cause would plausibly be recent.

It will be a few days before I have a 2nd 3TB drive to tinker with the image. Maybe I can find a way to decrypt the data without recovering the partitions.

Update: got another 3TB drive, cloned the first. Space problem, possibly due to having more bad sectors than the original

sudo dd if=/dev/sdb of=/dev/sdc bs=64K conv=noerror,sync
dd: error writing '/dev/sdc': No space left on device
45785415+1 records in
45785415+0 records out
3000592965632 bytes (3.0 TB, 2.7 TiB) copied, 24495.1 s, 122 MB/s
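A side note on conv=noerror,sync: with sync, dd pads every short read up to the full block size before writing it. That includes the final partial block, so the last write can be larger than what is left on the source. The sketch below reproduces that padding on throwaway files created with mktemp (not real devices). It only illustrates dd's behaviour and is not a diagnosis of this particular clone.

```shell
#!/bin/sh
# Sketch: show how conv=sync pads a short final block (throwaway files only).
src=$(mktemp)
dst=$(mktemp)

# 100000 bytes of input: one full 65536-byte block plus a 34464-byte remainder.
head -c 100000 /dev/zero > "$src"

# Same conv options as the clone, but on files; dd's statistics go to stderr.
dd if="$src" of="$dst" bs=65536 conv=noerror,sync 2>/dev/null

# $(( ... )) strips any padding some wc implementations add around the number.
src_bytes=$(($(wc -c < "$src")))
dst_bytes=$(($(wc -c < "$dst")))
echo "source: $src_bytes bytes, copy: $dst_bytes bytes"

rm -f "$src" "$dst"
```

The copy comes out as a whole number of 64 KiB blocks, which is why the "records in" line can end in +1 (a partial read) while the data written is padded.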

fdisk -l shows several differences: a Microsoft reserved partition has appeared out of nowhere, the type is now gpt not dos, and the logical sector size is up from 512 to 4096 bytes.

Disk /dev/sdc: 2.73 TiB, 3000592965632 bytes, 732566642 sectors
Disk model: FA GoFlex Desk
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 5207####-####-####-####-C619BB85####
Device     Start   End Sectors Size Type
/dev/sdc1      6  4095    4090  16M Microsoft reserved

TestDisk analyze found no partitions but the new Microsoft reserved, though on the first run it reported a cryptic “bad number of sectors per cluster” error. Disk is a Hitachi but is hooked up to a Seagate usb interface, DiskUtility recognizes the brand correctly.

check_FAT: Bad number of sectors per cluster
Unknown 257329194 256329193 0 [0~KM-)^?M-K]

sdc has 16,384 fewer bytes than sdb, at 4096 bytes/sector that’s exactly 4 sectors short. Could that be enough to erase the whole-disk partition still visible on sdb, by erasing the endpoint? If so, is a workaround possible?
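The shortfall can be checked with plain shell arithmetic from the two fdisk sizes. The sketch below also expresses it in 512-byte sectors, the unit sdb reports. It compares that with the space a GPT backup area would take at the end of a disk, assuming the usual layout of one header sector plus 128 entries of 128 bytes. Whether sdb ever held a valid backup GPT there is not known from the output. Separately, the logical sector size changed between the two disks, and sector 1 sits at byte 512 on sdb but at byte 4096 on sdc, so the tools may be reading the start of the disk differently as well.

```shell
#!/bin/sh
# Sketch: size difference between the two disks, using the fdisk figures above.
sdb_bytes=3000592982016    # source, 512-byte logical sectors
sdc_bytes=3000592965632    # clone target, 4096-byte logical sectors

missing_bytes=$((sdb_bytes - sdc_bytes))
missing_4k=$((missing_bytes / 4096))     # in sdc's sector size
missing_512=$((missing_bytes / 512))     # in sdb's sector size

# Assumed GPT backup area at 512-byte sectors: one header sector plus an
# entry array of 128 entries x 128 bytes, all at the very end of the disk.
gpt_backup_sectors=$((1 + 128 * 128 / 512))

echo "missing: $missing_bytes bytes = $missing_4k x 4096 = $missing_512 x 512"
echo "a backup GPT would occupy the last $gpt_backup_sectors sectors of 512 bytes"
```

If that assumption holds, the missing tail overlaps most of the backup area but none of the structures at the start of the disk.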

At least there’s some LUKS data apparently within the Windows partition:

sudo hexdump -C /dev/sdc |grep LUKS
3652f530 5a cb 72 88 3e 0f 74 b4 a7 4c 55 4b 53 10 77 93 |Z.r.>.t..LUKS.w.|
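A caution on reading that hit: LUKS1 and LUKS2 headers start with six magic bytes, "LUKS" followed by 0xBA 0xBE, whereas the line above shows "LUKS" followed by 10 77. In about 3 TB of random-looking ciphertext the bare four letters would be expected to turn up several hundred times by chance, so this match may not be a header. hexdump -C piped to grep also cannot see a match that straddles two 16-byte rows. Below is a minimal sketch of searching for the full six-byte magic instead. find_luks_magic is a made-up helper name, and the demo runs on a throwaway file rather than a disk. On a real device grep reads line by line and may need a lot of memory on long stretches without a newline byte.

```shell
#!/bin/sh
# Sketch: list byte offsets where the 6-byte LUKS magic occurs in a file or device.
# find_luks_magic is a hypothetical helper, not part of cryptsetup.
find_luks_magic() {
    # "LUKS" followed by 0xBA 0xBE, written with octal escapes for portability.
    magic=$(printf 'LUKS\272\276')
    # -a: treat binary as text, -b: print byte offset,
    # -o: print only the match, -F: fixed string rather than a regex.
    LC_ALL=C grep -a -b -o -F -e "$magic" "$1" | LC_ALL=C cut -d: -f1
}

# Demo on a throwaway file: 4096 zero bytes, then the magic, then more zeros.
demo=$(mktemp)
{ head -c 4096 /dev/zero; printf 'LUKS\272\276'; head -c 100 /dev/zero; } > "$demo"
offsets=$(find_luks_magic "$demo")
echo "magic found at byte offset(s): $offsets"
rm -f "$demo"
```

Any offset this reports on the clone would still need checking, for example by looking at the bytes that follow it.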

will poke around for more of it a bit later. That may take a while. Might get somewhere with

https://unix.stackexchange.com/questions/741404/overwritten-luks-with-a-partition-table

Am I the only one realizing that you may need to unlock the LUKS2 container first? It is still an encrypted drive, if I understand correctly.

Plus, you really shouldn’t be using device nodes like /dev/sdb directly. Those can change from one reboot to another; and just like that your source is now the sink: irreversible data loss!
See this Redhat article on persistent naming attributes:
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/managing_storage_devices/persistent-naming-attributes_managing-storage-devices

The short of it: use the symbolic links in /dev/disk/by-id. But in this case it’s going to be more like /dev/mapper/<unlocked LUKS2 container>. And that’s what you hand to testdisk as the device to search through. Or, just run fsck on that one. But if it’s an NTFS filesystem inside (and it looks like that’s the case), you should have Windows do the filesystem check, after unlocking the container, of course.

Testdisk also finds false positives, if some “magic bytes” are encountered; since an encrypted device is basically just random noise to the outside, chances are that some bytes will look like a partition signature; or like PDF files.

To see what’s what:

lsblk -fe7

Unless you have actually crippled the partition table, which shouldn’t happen by simply yanking the USB cord of a mounted device, that may provide some clue what to do next.

Where would I locate this on my system? Drive started life as an external.

Currently I’m browsing through the copied drive looking for the keys and metadata, which hopefully allow manual reconstruction of the partition.

You should start by running lsblk -fe7 in a terminal. Or just list all the symlinks in /dev/disk/by-id, like so:

ls -l /dev/disk/by-id

Those are somewhat easily identifiable, because they follow a pattern like <interface>-<vendor>-<serial number>-<partition>, or similar. Or try gparted, which should be able to recognize the LUKS2 container/partition. But do keep in mind that device names like /dev/sdb may change between reboots, so always double check that you are operating on the correct device.
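One way to do that double check is to print what each by-id link currently resolves to just before running anything destructive. This is a minimal sketch: show_targets is a made-up helper name, and the demo uses a fake directory with an invented link name so that it touches no real disk. On a real system the argument would be /dev/disk/by-id.

```shell
#!/bin/sh
# Sketch: print which node each symlink in a directory currently resolves to.
# show_targets is a hypothetical helper; use /dev/disk/by-id on a real system.
show_targets() {
    for link in "$1"/*; do
        [ -L "$link" ] || continue    # skip anything that is not a symlink
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}

# Demo with a fake directory; the link name below is invented for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/sdb"
ln -s "$demo_dir/sdb" "$demo_dir/usb-Example_Disk_SERIAL123-0:0"
listing=$(show_targets "$demo_dir")
echo "$listing"
rm -rf "$demo_dir"
```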

Also, what do you mean by:

Drive started life as an external.

Did you dismantle the case and install it as an internal drive? That should only change the very beginning of the link in /dev/disk/by-id from usb-* to ata-* or whatever the true interface.

It also helps to remember how you configured it as a LUKS2 device.

P.S.: You could also try to unlock the whole disk device, because LUKS2 can also work on whole disks without even a partition table:

sudo cryptsetup open /dev/disk/by-id/<device link> <temporary name>

(names, such as <device link>, are placeholders to be replaced by your real device)

That, if it works, should get you a new device node /dev/mapper/<temporary name>.
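Before trying to open anything, it may help to check whether the device (or an image of it) starts with a LUKS header at all. cryptsetup isLuks and cryptsetup luksDump are the proper tools for that. The sketch below is a much cruder stand-in that only reads the first 8 bytes: the six magic bytes plus a 2-byte big-endian version. luks_version_at_start is a made-up helper name, and the demo files are throwaway files, not devices.

```shell
#!/bin/sh
# Sketch: report the LUKS version found at the start of a device or image file.
# luks_version_at_start is a hypothetical helper, far cruder than `cryptsetup isLuks`.
luks_version_at_start() {
    # First 8 bytes as lowercase hex: "LUKS" 0xBA 0xBE, then a 2-byte big-endian version.
    hex=$(dd if="$1" bs=8 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    case "$hex" in
        4c554b53babe0001) echo 1 ;;
        4c554b53babe0002) echo 2 ;;
        *) echo none ;;
    esac
}

# Demo: one file that begins like a LUKS2 header, one that is all zeros.
with_header=$(mktemp)
without_header=$(mktemp)
{ printf 'LUKS\272\276\000\002'; head -c 504 /dev/zero; } > "$with_header"
head -c 512 /dev/zero > "$without_header"

found=$(luks_version_at_start "$with_header")
missing=$(luks_version_at_start "$without_header")
echo "with header: $found, without header: $missing"
rm -f "$with_header" "$without_header"
```

If the check reports none at the start of the whole disk, the header may sit at a partition offset instead, or may have been overwritten.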