EonNAS ZFS RAID Data Recovery - Case Study
International Architecture Studio Gets Its Failed EonNAS ZFS Server Recovered After a Competitor's Unsuccessful Attempt
If you were diagnosed with a serious or rare disease, you would probably seek a second opinion to double-check the diagnosis. The same is true for data recovery. Although we recommend choosing the right data recovery company the first time to avoid further damage to the storage media, we still encourage customers to send us their failed devices when competitors have declared them unrecoverable. In many cases, ACE Data Recovery engineers have proven their expertise by recovering data thought to be lost.
In October 2018, an international architecture studio based in San Francisco suffered a catastrophic failure of its EonNAS 3016RT1 server. The server housed three decades of archived projects as well as live project data for current clients. The failure happened because the studio's IT personnel were stretched to the limit, barely able to keep up with the demands of what was then a 100-person architecture firm. The perfect storm followed: with the array already in a critical state, another disk failed without anyone noticing. Then a second critical member failed during the attempted remediation.
Company files were stored on a set of disks configured as RAID 10 to protect against drive failure. The underlying storage was a ZFS pool, ZFS being the file system and volume manager originally developed by Sun Microsystems. Despite the sophistication of ZFS, the configuration chosen years earlier to balance redundancy, storage capacity and performance could not survive the two additional disk failures: in a striped-mirror layout, the whole pool is lost once both disks of any one mirror pair fail. To make matters worse, the server backup that should have contained a copy of the archive was nowhere to be found. Just like that, their production files and historical data evaporated.
The senior systems administrator at the studio explained what happened next:
Luckily for us, nearly every team had a local copy of its current project. This single advantage gave us a way to move forward without losing an entire year of business. Based on past success with single disks, we immediately shipped the EonNAS to DriveSavers. We thought they could open each drive, clone the platters to new hardware and then rebuild the array. It was not that simple. DriveSavers spent months trying to get a readable file system to mount, only to ship the EonNAS back with the dreaded response: "Unfortunately after extensive time and effort we confirmed recovery was not possible in this case." Our dead server's unusual configuration had proved too difficult to reconstruct, or so we thought.
In disbelief, we searched the web for a reputable company specializing in RAID, SAN or NAS systems until something caught our eye: "We recover data other companies can't." Could it be true? Don Wells seemed to think so. With his reassuring voice and Texan accent, he talked us through the steps and quoted an affordable price for the recovery, assuming that ACE would be successful. There is, of course, no guarantee, but drawing on 40 years of industry experience, Don believed ACE could take on the challenge and return at least some valuable data. Convinced it was worth an attempt, we shipped our EonNAS to Texas. Once the engineers had examined the drives, the estimated cost increased significantly. This was not going to be an easy extraction. We decided to proceed, accepted the new estimate and crossed our fingers.
During clean room inspection, ACE Data Recovery engineers found that four drives had already been opened and had debris inside. Initial drive diagnostics revealed that five drives had failed head-rack assemblies. Engineers began the recovery by imaging all viable drives, then rebuilt the damaged drives in the clean room and imaged them as well, with only minor errors caused by ECC (error-correcting code) failures. During imaging, one head-rack assembly died, causing light media damage.
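Imaging a failing drive differs from a plain copy in one key way: an unreadable sector must not stop the job or shift later data. The sketch below illustrates that idea only; it is not ACE's tooling. It assumes the source is any object with `seek()`/`read()` that raises `OSError` on an uncorrectable sector, and `FlakyDisk` is a hypothetical test double standing in for a real device. Production imagers such as GNU ddrescue add multi-pass reads, large-block reads with later retries, and persistent map files.

```python
import io


def image_device(src, dst, total_sectors, sector_size=512):
    """Copy src to dst one sector at a time.

    Sectors whose read raises OSError (e.g. an uncorrectable ECC error)
    are written as zeros so every later sector keeps its original offset.
    Returns the list of unreadable LBAs.
    """
    bad_lbas = []
    for lba in range(total_sectors):
        offset = lba * sector_size
        src.seek(offset)
        try:
            data = src.read(sector_size)
        except OSError:
            bad_lbas.append(lba)
            data = b""
        # Pad failed or short reads so the image preserves the device geometry.
        dst.seek(offset)
        dst.write(data.ljust(sector_size, b"\x00"))
    return bad_lbas


class FlakyDisk:
    """Hypothetical test double: an in-memory disk with unreadable sectors."""

    def __init__(self, data, bad_lbas, sector_size=512):
        self._buf = io.BytesIO(data)
        self._bad = set(bad_lbas)
        self._sector_size = sector_size

    def seek(self, offset):
        return self._buf.seek(offset)

    def read(self, size):
        if self._buf.tell() // self._sector_size in self._bad:
            raise OSError("simulated uncorrectable read error")
        return self._buf.read(size)


# Usage: a 4-sector disk whose sectors 1 and 3 cannot be read.
disk = FlakyDisk(bytes(range(256)) * 8, bad_lbas={1, 3})
image = io.BytesIO()
print(image_device(disk, image, total_sectors=4))  # [1, 3]
```

Keeping a list of bad LBAs matters later: once the file system is reassembled, it tells the analyst which files may contain zero-filled gaps.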
Our next step was logical analysis. We found that two drives were standalone and not active members of the RAID. The metadata on the disks identified the original configuration as ZFS RAID 10 on 14 physical disks. Four disks had been partially overwritten, and several others belonged to split mirror pairs whose missing mates were the failed disks. After matching the split pairs and repairing the metadata, we reassembled the ZFS RAID and started scavenging for storage volumes.
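Matching split pairs relies on the vdev labels ZFS writes to every member disk. Each label records, among other things, the pool's GUID (`pool_guid`), the GUID of the top-level vdev (here, the mirror) the disk belongs to (`top_guid`), the disk's own `guid`, and the transaction group (`txg`) of the last label update. The sketch below is a simplified illustration, not the procedure ACE used: it assumes the labels have already been decoded into the hypothetical `DiskLabel` records, groups disks by mirror, orders each mirror's members newest first (an older `txg` suggests a member that dropped out early and holds stale data), treats unlabeled or foreign disks as standalone, and flags mirrors that are missing a member.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DiskLabel:
    """Hypothetical, already-decoded subset of one disk's ZFS vdev label."""
    disk: str                  # e.g. the name of the disk image file
    pool_guid: Optional[int]   # None if no valid label was found
    top_guid: Optional[int]    # GUID of the mirror vdev this disk belongs to
    guid: Optional[int]        # GUID of this disk
    txg: int                   # transaction group of the last label update


def group_mirrors(labels: List[DiskLabel],
                  pool_guid: int) -> Tuple[Dict[int, List[DiskLabel]], List[str]]:
    """Group disks into mirror vdevs; return (mirrors, standalone disks)."""
    mirrors: Dict[int, List[DiskLabel]] = defaultdict(list)
    standalone: List[str] = []
    for lab in labels:
        if lab.pool_guid != pool_guid or lab.top_guid is None:
            standalone.append(lab.disk)  # not an active member of this pool
            continue
        mirrors[lab.top_guid].append(lab)
    for members in mirrors.values():
        # Newest label first: the most up-to-date copy of that mirror.
        members.sort(key=lambda lab: lab.txg, reverse=True)
    return dict(mirrors), standalone


def split_pairs(mirrors: Dict[int, List[DiskLabel]], width: int = 2) -> List[int]:
    """Return the mirror GUIDs that have fewer surviving members than `width`."""
    return sorted(top for top, members in mirrors.items() if len(members) < width)


# Usage with made-up GUIDs: mirror 200 has lost its mate, d3 has no label.
labels = [
    DiskLabel("d0", 7, 100, 1, 500),
    DiskLabel("d1", 7, 100, 2, 480),
    DiskLabel("d2", 7, 200, 3, 500),
    DiskLabel("d3", None, None, None, 0),
]
mirrors, standalone = group_mirrors(labels, pool_guid=7)
print(standalone, split_pairs(mirrors))  # ['d3'] [200]
```

A real reconstruction must also parse the label's `vdev_tree` to learn each mirror's expected width and child order, which this sketch replaces with a fixed `width` parameter.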
As mentioned earlier, the file-system signature was identified as ZFS. Given the number and size of the physical disks, the total space was 28TB. Scavenging this space and extracting the two iSCSI LUN volumes, 15TB and 10TB in size, took a long time. Inside these logical volumes, files had been corrupted by incomplete CHKDSK runs against the NTFS partitions. Because damaged sectors had caused additional corruption, we had to scavenge every partition to extract the requested files. This was very time-consuming, so we custom-configured dedicated servers to speed up the process. After scavenging was complete, we identified approximately 80 million files on the first LUN and approximately 28 million on the second.
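A common first step when scavenging a LUN image for NTFS partitions is to scan for NTFS boot sectors. A valid boot sector carries the OEM ID `NTFS` followed by four spaces at byte offset 3 and the signature `0x55 0xAA` at offset 510. Its BIOS parameter block gives bytes per sector (offset `0x0B`), sectors per cluster (`0x0D`), total sectors (`0x28`) and the starting cluster of the MFT (`0x30`). The sketch below is a generic illustration of that scan, not ACE's tooling. It assumes the image is available as `bytes` or an `mmap.mmap` object and that partitions start on 512-byte boundaries. Because NTFS also keeps a backup boot sector at the end of each volume, a scan can still locate a partition whose primary boot sector was damaged.

```python
import struct


def find_ntfs_boot_sectors(image, step=512):
    """Yield (offset, bytes_per_sector, sectors_per_cluster, total_sectors, mft_lcn)
    for every 512-byte-aligned sector that looks like an NTFS boot sector.

    `image` may be bytes or an mmap.mmap object (slicing either returns bytes).
    """
    for off in range(0, len(image) - 511, step):
        sector = image[off:off + 512]
        # OEM ID "NTFS    " at offset 3 and the 0x55AA end-of-sector marker.
        if sector[3:11] != b"NTFS    " or sector[510:512] != b"\x55\xaa":
            continue
        bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
        total_sectors, mft_lcn = struct.unpack_from("<QQ", sector, 0x28)
        yield off, bytes_per_sector, sectors_per_cluster, total_sectors, mft_lcn


# Usage: a synthetic 4 KiB image with one boot sector planted at offset 1024.
boot = bytearray(512)
boot[3:11] = b"NTFS    "
struct.pack_into("<HB", boot, 0x0B, 512, 8)  # 512-byte sectors, 8 sectors/cluster
struct.pack_into("<QQ", boot, 0x28, 7, 4)    # tiny volume, MFT at cluster 4
boot[510:512] = b"\x55\xaa"
image = bytearray(4096)
image[1024:1536] = boot
print(list(find_ntfs_boot_sectors(bytes(image))))  # [(1024, 512, 8, 7, 4)]
```

For multi-terabyte LUNs, mapping the image file with `mmap` avoids loading it into memory, and every hit should be validated further (for example by checking that the MFT it points to actually starts with `FILE` records) before it is trusted.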
The senior administrator continued his story:
In September 2019, we received incredible news. Don explained that ACE was using a technique to rebuild the failed high-density SAS drives, a proprietary solution developed at ACE that DriveSavers does not have. Don never promised us anything until he was completely certain our archive was recoverable. Then, to our amazement, screenshots of our restored volumes arrived, showing the structure of the lost data, along with files from various projects that proved the image was real. Thanks to the tremendous effort of his tenacious engineers, who refused to give up, Don was able to declare a 99-100% recovery.
This multi-level recovery was extremely complex and time-consuming, and it could not have been performed by just any data recovery firm. It required expertise in storage hardware, file systems, data layout, clean room techniques and RAID reassembly. So if you have to decide where to send your failed RAID, think twice: not all data recovery companies are equal.