I recently worked on a SAN failure that resulted from a perfect storm of bad backups, broken offsite replication and disabled notifications. The data was sent off to professionals for recovery. What was returned to me was a set of VHD files, one for each of the LUNs this SAN had. The SAN had about 20 LUNs, with some Windows and Linux VMs attaching directly to the SAN for a data drive. I assume this was done because VMware ESXi 4.1 had a 2 TB LUN limitation, which also explains why we had two massive 1.5 TB LUNs attached to the hosts. I won’t detail the complete 16 hours of trial and error I spent trying to get this working, which started on a Windows box, moved to my Mac and finally ended on the Ubuntu box I settled on.
During my initial testing of the VHDs, I found that Windows 7 wouldn’t attach any of them, saying they were corrupted. I then thought to try Microsoft’s iSCSI Target software to push the VHDs out as iSCSI drives to the hosts again, but this also failed. I then started copying the data off the 5 TB recovery drive I had in my hands to other external drives via USB 3, so I would have copies. I didn’t want to risk modifying or damaging any of the files and then waiting to get another copy from the recovery specialists. I also found out during this time that the recovery specialists don’t handle this portion of the job, but they were able to verify the files and tell me which ones were good and bad before they sent me the data.
I then moved over to my Mac, as Windows just didn’t have the tools for my needs that my Mac had. I first tried to get the Mac to open the VHDs from the GUI, with little result. Dropping into Terminal.app, I ran “head -n20” on the VHDs, and in the output from one of them I saw a good old “NTLDR is missing” along with some other lines. I determined, without any research on the matter, that this was an MBR-formatted Windows drive.
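Eyeballing “head” output worked here, but the same guess can be checked against the on-disk layout. The sketch below is a minimal example, assuming the image begins with a raw disk sector (as a fixed-size VHD does, since its 512-byte footer sits at the end of the file). It looks for the MBR boot signature at bytes 510-511 and reads the type byte of each partition entry. The function name and the small lookup table are my own illustration, not something the recovery used.

```python
# Partition type bytes relevant to this recovery; others are shown as hex.
KNOWN_TYPES = {
    0x07: "NTFS/exFAT/HPFS",
    0x83: "Linux",
    0xEE: "GPT protective",
    0xFB: "VMware VMFS",
}


def partition_types(first_sector: bytes) -> list:
    """Return the type of each used entry in an MBR partition table.

    `first_sector` is the first 512 bytes of the disk image.
    """
    if len(first_sector) < 512:
        raise ValueError("need the first 512 bytes of the image")
    # A valid MBR ends with the boot signature bytes 0x55 0xAA.
    if first_sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature at bytes 510-511")

    types = []
    for i in range(4):
        # Four 16-byte entries start at byte 446; byte 4 of each is the type.
        entry = first_sector[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]
        if ptype != 0:  # type 0 marks an unused slot
            types.append(KNOWN_TYPES.get(ptype, f"unknown (0x{ptype:02x})"))
    return types
```

To try it on a copy of an image, read the first sector with something like open(path, "rb").read(512) and pass the result in. The path is whatever your copy is called.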
After some research I decided I needed MacFUSE, so I downloaded and installed it. I attempted to mount the VHD again, but this failed. More research led me to the hdiutil command, but I had no luck mounting the drive with it either. While discussing this with another tech, the idea came up to change the file extension. Normally I don’t put much faith in this, as deciding what a file contains based only on its extension seems ridiculous. Well, I changed that .vhd to a .img and attempted to mount it via the GUI, and to my shock I had the Windows drive showing in my sidebar along with plenty of Windows data!
I attempted to repeat this result with a VHD containing a Linux EXT3 volume, but while it would attach, it couldn’t be read properly. I assumed this was because my Mac couldn’t read EXT2/3 volumes, so I downloaded the EXT2 FUSE module and installed it. No difference. I never did work out why I couldn’t see what was inside the mounted volume, because by then I had decided it was time to use a Linux VM for this. FUSE works on Linux (and is included by default in Ubuntu 14.04), so no download was needed for that. Ubuntu would be my go-to for this, as I figured it had lots of public support and would have the tools.
However, while my Mac would instantly mount the drive and work, Linux just said no to doing this at all. After more research I found out I could run “fdisk -l” on the .VHD and see the drive’s partition data. This was great. I also saw that the partition had a start offset. While researching other methods to attach this, I found this guide on using xmount, in which they had to know the partition start and set an offset in bytes before the drive would mount. Taking that hunch, I did the same thing: I calculated the offset in bytes from the start sector and was able to mount both the Windows and Linux VHDs. I had data at this point.
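The offset calculation is just the partition’s start sector multiplied by the sector size. Here is a minimal sketch of that arithmetic, assuming the 512-byte sectors that “fdisk -l” usually reports (check the “Units” or “Sector size” line of your own output). The function names are mine, and the option string is one common way to pass the result to “mount”, not necessarily the exact invocation I used at the time.

```python
SECTOR_SIZE = 512  # assumption: use the sector size fdisk reports for the image


def byte_offset(start_sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert the Start column from fdisk -l into a byte offset."""
    if start_sector < 0 or sector_size <= 0:
        raise ValueError("start sector must be >= 0 and sector size > 0")
    return start_sector * sector_size


def loop_mount_options(start_sector: int, sector_size: int = SECTOR_SIZE) -> str:
    """Build a read-only option string for `mount -o` on a partition in an image."""
    return f"ro,loop,offset={byte_offset(start_sector, sector_size)}"


if __name__ == "__main__":
    # Illustrative start sectors only: 63 is typical of older MBR layouts,
    # 2048 of newer ones. Use the value from your own fdisk output.
    for start in (63, 2048):
        print(start, byte_offset(start), loop_mount_options(start))
```

With a start sector of 63, for example, the offset is 63 × 512 = 32256 bytes, so the mount would look something like “mount -o ro,loop,offset=32256 image.vhd /mnt/temp”, where the file name and mount point are placeholders. Mounting read-only is my suggestion for recovered data rather than something the process requires.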
However, I still didn’t have my big 1.5 TB VHDs/LUNs yet. I knew these would be a problem, as they were attached to the VMware servers themselves. Sure enough, when I checked them out, I found them formatted with the VMFS file system. I quickly did more research and found out that there are VMFS tools out there to mount VMFS volumes on Ubuntu and Mac. My first attempt, installing VirtualBox and then compiling some code that would let me mount this on my Mac, failed. All the guides that instructed me to do a simple “apt-get install” on Ubuntu didn’t work, and “apt-cache search” didn’t return any results. I ended up locating the .deb file on the Ubuntu Packages page, downloaded it manually, then installed it with “dpkg -i”. Because of my earlier trial and error on Ubuntu, this broke the package manager afterwards: I had to force the install past a dependency error I had created while installing VirtualBox.
After getting vmfs-fuse installed, I attempted to mount the VHD, but it returned an error. After looking around, I soon realized the error was because of that 128 byte offset on the VHD. This was proving to be my enemy in the entire process, and I wasn’t having any luck getting into the data. I decided to quickly try another route: I set up iscsitarget on the Ubuntu box and pointed the LUNs directly at the VHD files. I had hoped that the hosts would then see them as a valid iSCSI resource and see the drives. This didn’t work. I even set one up with one of the LUNs/VHDs that had Windows data and attempted to present it to a Windows server via iSCSI Target services. It would connect but wouldn’t recognize the drive.
I don’t know where in my research it dawned on me, but I decided that what I wanted to do was create a loopback of the .VHD to attach it to a /dev/ resource and hopefully see the partition tables. Each VHD/LUN was basically a drive, and if I could attach it to a /dev/ resource, it would hopefully expose each partition as its own /dev/ resource and I would be free to use the mount command to get my data. Well, this just wasn’t working. While “mount -o loop” could mount the drive to a /mnt, it didn’t make it show up as a device resource.
Finally I stumbled onto a post about using losetup. This looked promising, but while it would attach the VHD/LUN to a /dev/ resource, it wouldn’t attach each of the partitions. I attempted to use “sfdisk -R” to force it to do this, without any luck. Research wasn’t turning up any results as to why. I then realized that losetup had a -o option for an offset. After calculating my offset again, I removed the original losetup mapping and ran it again with the offset. After I did this I had a /dev/loop1 that had a VMware file system on it. I then issued my vmfs-fuse command to mount that to a /mnt/ directory and I could see data! That was it: I had folders with VMDKs and other files for me to restore.
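To make the sequence concrete, here is a minimal sketch that only builds the command lines for the attach and mount described above, plus a matching teardown, without running anything. The image path, loop device and mount point are hypothetical defaults, the 512-byte sector size is an assumption, and the “fusermount -u” and “losetup -d” teardown is my addition rather than part of what I ran at the time.

```python
import shlex


def attach_and_mount_plan(image, start_sector, loop_dev="/dev/loop1",
                          mount_point="/mnt/temp", sector_size=512):
    """Return the commands that attach a partition and mount it with vmfs-fuse."""
    offset = start_sector * sector_size  # bytes from the start of the image
    return [
        # Attach only the partition, skipping everything before its start.
        ["losetup", "-o", str(offset), loop_dev, image],
        # Mount the VMFS volume now visible on the loop device.
        ["vmfs-fuse", loop_dev, mount_point],
    ]


def teardown_plan(loop_dev="/dev/loop1", mount_point="/mnt/temp"):
    """Return the commands that unmount the FUSE mount and free the loop device."""
    return [
        ["fusermount", "-u", mount_point],
        ["losetup", "-d", loop_dev],
    ]


def render(commands):
    """Format command lists as shell-quoted lines for review before running."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


if __name__ == "__main__":
    # Hypothetical image name and start sector; take the real start from fdisk -l.
    for line in render(attach_and_mount_plan("lun01.vhd", 2048)):
        print(line)
    for line in render(teardown_plan()):
        print(line)
```

Printing the plan first and running the lines by hand as root seemed safer to me than letting a script touch recovered data directly, which is why this sketch stops short of executing anything.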
So now it was time to repeat this setup on other Ubuntu systems so we could import data faster, as the USB 3.0 interface would be the slow point, not the network or the new SAN with RAID 50. The basic summary of the working steps is:
- Install Ubuntu
- Install the VMFS tools from the manually downloaded package with “dpkg -i”
- Run “fdisk -l” on the VHD to get the start offset and calculate it in bytes
- Run losetup with “-o” and the byte offset to attach the VHD to a /dev/loop device
- Run vmfs-fuse to mount the /dev/loop device to a /mnt/temp directory
- scp the data up