Tech Meltdown 2011


Long-time readers might recall a very bad day in 2005 when we had a hard drive die, successfully rescued it and, in general, pulled a lot of hair out. This was a hard drive that held photos of Chuck and Leta from their entry into my life. Back then, there were 40 gigabytes of critical life data in danger of being lost. Hard drives have ballooned to many times that size, so recovering lost data takes even longer.

Last week, I decided to plunge in and upgrade my operating system to Mac OS 10.7, aka Lion, aka anal probe. I didn’t realize this at the time, but this upgrade occurred almost 6 years TO THE DAY after the aforementioned tech meltdown. I should have known better than to tempt any of the surviving fates. The upgrade appeared to go okay. I spent a few hours upgrading software I forgot to upgrade prior to the install of 10.7: drivers, utilities and apps that I rely on to work productively. I am lucky that most of the drivers and software I need are represented to be 10.7 “ready”. The following day, I began to “do work” on the “computer” that I “upgraded”. I noticed some slowdowns, but I attributed them to the upgrade as opposed to any of a million other things that might cause a computer to seem slower. New system things like re-indexing the hard drive can slow things down, so no big whoop.

I decided to further tempt fate by choosing to do a project in the recently upgraded Final Cut Pro X, an app that worked great for me when I did the work from home video for dooce.com. Seemed faster than the old Final Cut Pro. I’ll talk about Final Cut Pro X at a later time, so save your questions. As I began to pull in source video files from my scratch drive, Final Cut suddenly became extremely crashy. Not good. Not good at all. I did a lot of searching and a lot of reading of horror stories. This is mine. It’s at this point that I opened up the nerd-dilettante and friendly Console.app. I noticed that in between crashes, the kernel of the OS was spitting out the following message:

8/13/11 4:42:05 PM kernel disk2s2: I/O error.
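
If you want to check a saved log for the same kind of message, here is a minimal Python sketch. It assumes log lines shaped like the one above. The function name `count_io_errors` and the second and third sample lines are made up for illustration; only the first sample line comes from my Console.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the Console message above, e.g.
# "8/13/11 4:42:05 PM kernel disk2s2: I/O error."
# The log format is an assumption based on that single line.
IO_ERROR = re.compile(r"kernel\s+(disk\d+(?:s\d+)?): I/O error")


def count_io_errors(lines):
    """Return a Counter mapping disk identifier -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "8/13/11 4:42:05 PM kernel disk2s2: I/O error.",   # from the post
    "8/13/11 4:42:09 PM kernel disk2s2: I/O error.",   # hypothetical
    "8/13/11 4:43:00 PM Finder some unrelated message",  # hypothetical
]

for disk, errors in count_io_errors(sample).items():
    print(disk, errors)
```

A disk identifier that keeps showing up in that count is the one to stop writing to and start copying from.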


This is not a good message. Ever. Note the best answer in this thread on Apple’s support discussion site. I spent half a day trying to get Final Cut Pro X to simply keep event files and not jack the project I was trying to start. It was hopeless. I figured that something in the upgrade hosed the drive and I needed to roll back my system. I had tried Safe Boot, Single User Boot and a number of other tricks to “fix” the bad drive. I tried the 10.7 Recovery Tool. I decided that getting work done was more important than beta testing 10.7. Since I had completed a project in Final Cut Pro X, I was less concerned about that being the culprit.

A lot of things have changed since 2005. One of them is that Apple introduced a great app called Time Machine. Time Machine automatically backs up your hard drive. A lot. I use Time Machine as a first course of action for situations exactly like this. Problem: upgrading to 10.7 and then rolling back to 10.6.8 nearly filled the Time Machine drive. I was able to roll back successfully and start working again. For a short while. Until Saturday, when I realized that I was facing a near-catastrophic hard drive failure. Even copying the critical files I needed (source video) was spitting out I/O errors. I stopped trying to copy files and ran to a local Mac store (Apple stores don’t sell internal drives) to buy a 2 terabyte hard drive.

I formatted that drive and found, on the Time Machine drive, a backup of the scratch drive. An incomplete backup that Time Machine CAN’T SEE, but I can via the Finder. I’ve spent two days reconstructing the data from the Time Machine backup. Before I assign the malfunctioning drive to the scrap heap, I wanted to ask: has anybody used anything besides Disk Warrior to rescue data on a Mac?

Finally, I usually say that backing up files onto a second drive, while great, isn’t enough. For any mission-critical data (yes, photos of family and kids are mission-critical), it’s better to back it up on two separate drives and then have a third drive offsite. We usually follow this. Except on the scratch drive. Since we’ve been doing more videos for dooce.com, the scratch drive was the only drive large enough to hold all the source video and captured (and processed) video. And I had backed up some of the files to our ultra-redundant Drobos. Except for the ones that the previous version of Final Cut Pro hides inside a folder called “Capture Scratch”. There sat over 300 gigabytes of processed and rendered video files. Fortunately, the freaky, now visible Time Machine backup has all those files. I’m very lucky. I figure I’ve lost a fraction of what I could have. I’ve lost several days of time, which, frankly, is the worst part about the 2011 Tech Meltdown.

If you use a computer: Back up your data. The data is worth more than the computer, worth more than your time. After I get it all recovered, I’m going to do a major backup and spend another day getting the new scratch drive sorted. And then I’ll buy another, even more massive drive to do weekly backups of my whole system; the boot drive, the Time Machine drive and the scratch drive. I’ll be moving most of these files over to the Drobo. Normally, I copy all my photos and any other files I work on to the Drobo. That Drobo wasn’t fast enough to do video editing, which is why I bought the scratch drive in the first place. I need to automate copying files from the scratch to the Drobo, as the Drobo is backed up by another Drobo. Precisely for times like this when drives fail. This was one of the to do list items that kept getting pushed to the bottom because I wasn’t editing video or hitting the scratch drive that hard.
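
For the "automate copying files from the scratch to the Drobo" item, something like the following Python sketch is roughly what I have in mind. It is not what I actually run. The function name `mirror_new_files` is made up, the demo uses temporary folders in place of real volumes, and it assumes that "same size and a backup at least as new" means a file is unchanged. It only copies; it never deletes anything from the backup.

```python
import os
import shutil
import tempfile


def mirror_new_files(source_dir, backup_dir):
    """Copy files that are missing from backup_dir or differ in size/mtime.

    Returns the relative paths that were copied. Nothing is ever deleted.
    """
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            if os.path.exists(dst):
                s, d = os.stat(src), os.stat(dst)
                # Assumption: same size and a backup at least as new
                # as the source means the file has not changed.
                if s.st_size == d.st_size and int(d.st_mtime) >= int(s.st_mtime):
                    continue
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied


# Demo with temporary folders standing in for the scratch drive and the Drobo.
with tempfile.TemporaryDirectory() as scratch, tempfile.TemporaryDirectory() as backup:
    os.makedirs(os.path.join(scratch, "Capture Scratch"))
    with open(os.path.join(scratch, "Capture Scratch", "clip.mov"), "wb") as f:
        f.write(b"fake video data")
    first = mirror_new_files(scratch, backup)   # copies the one new file
    second = mirror_new_files(scratch, backup)  # nothing changed, copies nothing
    print(first)
    print(second)
```

Pointed at the real volumes and run on a schedule, a script like this would have caught the “Capture Scratch” folder, because it walks every folder instead of relying on me to remember what lives where.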

Final note: This is the first Western Digital drive I’ve had that failed. I know, many of you have horror stories about any and all drive manufacturers. I have them as well. However, I’ve had an extremely good run with the Western Digital Caviar Black internal drives.

  • http://epcostello.com/ e.p.c.

    Never, ever, ever install the first release of any O/S on your primary system. 
    Am deducting two points of geek cred from your balance for that alone.

    I just assume drives will go bad.  Disk drives in laptops will definitely go bad.  No matter how well you pad and carry the laptop, there’ll be that one day that you thought it was safely asleep and the laptop fell, only slightly you say to yourself, and it turns out that that was the one day that OS X (or Windows) decided “nah, not going to sleep for a while, let’s just run the battery down”.

    Do you use different drives in the Drobo(s)?  I.e., I have a mix of Seagate and WD drives in my Drobo Pro, simply to avoid the possibility that a manufacturing flaw in one brand takes out the entire rack.

    Another thing to consider with the Drobo and RAID drives in general is to physically change/cycle the drives after a while.  I have not done this yet with my Drobo, but am thinking that after ~3 years of use I’ll start rotating in new physical drives and use the “old” drives as spares.

    Is the bad backup in a separate .sparsebundle on that drive or is it just showing up as a standard Time Machine directory hierarchy?

    • http://blurbomat.com blurb

      I had been running Lion on my laptop with no problems. Using the newer video apps, I was hoping for a performance boost. I will take your nerd point reduction with my head held low.

      The Time Machine directory shows up as a standard directory. I’m missing about 100 gigs, and most of that will likely be recovered once I finish the project I’m working on and can spend a day letting Disk Warrior work the magic. I have stayed with Western Digital Caviar Blacks (Heather’s Drobos have WD Caviar Greens in them) and a few months back, I started swapping the 1 terabyte drives out for 2 terabyte ones. I’m halfway through that project. I’ve heard about mixing vendors as another way to avoid catastrophic issues. The hard part in mixing drives is finding reliable reviews. Seems like everybody has a horror story or two. My old Time Machine drive from way back was one of the ganked Seagate ones that were guaranteed to fail without a firmware update. I yanked that out and threw in the current 2 terabyte drive. It’s worked great so far… knocking on wood right now.

      • http://epcostello.com/ e.p.c.

        I’ve had zero complaints about both WD and Seagate drives in the Drobo (I guess I lucked out on the firmware thing), had loads of problems with Seagates as external drives.  Have yet to try the Hitachi (née IBM) drives.

        How are you attaching the scratch drives? I know USB 2 is supposed to be as fast or faster than FW400 on paper, but I consistently got better sustained speed out of FireWire-attached drives than USB.  Now I’m stuck on a MBP with only USB and regretting that slightly.

        Just noodling here, but if I was in this situation I’d probably try for a low-level copy of the drive as suggested in other comments, and then try to rebuild the inode/directory tree using DW.  

        I’m a little confused, are the “hidden” files on the scratch drive expected to be files created/saved by Time Machine, or a separate set of files that happened to be on the same drive?

        The genius of Time Machine is that it uses hard links, so if a file or directory hasn’t changed from one backup to the next, Time Machine just creates another link to the existing copy.  The downside of this is that you really don’t have a dozen copies of that file, you have one copy with a dozen entries in the filesystem pointing at it.  Is fantastic for routine stuff, but if that one copy gets hosed you’ve lost “all” of them, and not just the one copy.
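
        To see why one damaged copy takes out "all" of them: Time Machine on HFS+ shares unchanged files between backups through hard links, where several directory entries point at the same data on disk. The Python sketch below is only an illustration on a temporary folder (the file names are made up, and it does not touch a real Time Machine volume).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Two made-up names standing in for the same file in two backup snapshots.
    original = os.path.join(tmp, "backup-1.txt")
    linked = os.path.join(tmp, "backup-2.txt")

    with open(original, "w") as f:
        f.write("family photo data")

    os.link(original, linked)  # second directory entry, same data on disk

    same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
    link_count = os.stat(original).st_nlink

    # Damage the data through one name...
    with open(original, "w") as f:
        f.write("corrupted")

    # ...and the other name shows the same damage.
    with open(linked) as f:
        seen_via_link = f.read()

    # Removing one name leaves the data reachable through the other.
    os.remove(original)
    survives = os.path.exists(linked)

print(same_inode, link_count, seen_via_link, survives)
```

        So deleting an old snapshot is safe, but a bad block under the shared data hurts every snapshot that links to it.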

  • http://pulse.yahoo.com/_QCJM72ISCFBQFEVDTJTWWPGMKE Duane Pinkerton

    I haven’t re-imaged the hard drive on my Mac Pro since I got it in 2007.  I decided that 10.7 is a good opportunity to do a fresh install on one of those fancy new SSDs.  I plan to mount my home directories on my paltry little 250GB drive and continue keeping my real data on the RAID (backed up with Time Machine and to the cloud, of course).  I decided that just for fun, I’d throw a new Blu-ray drive into the mix at the same time. :^) Wish me luck, the drives should arrive today!

    • http://blurbomat.com blurb

      I wondered if I should do that as well when I finally do jump to 10.7. Is there an option in the installer or does one need to hack about?

      • http://epcostello.com/ e.p.c.

        You should be able to do a clean install and then a selective restore from Time Machine. I haven’t done this with Lion yet, but used it to set up someone’s MBA (IIRC, restoring data + apps from an iMac Time Machine image).

  • http://twitter.com/jessicadennis Jessica Dennis

    Sometimes, and it’s good you didn’t have to go there, I had better luck recovering files from Mac volumes by hooking them up to Windows and using MacDrive to read them. This was when things had gone disastrously wrong (e.g. a user ignoring the click of death for weeks or months) and other Macs found it just too upsetting to look at those drives. I saved some professors’ butts that way (professors NEVER back up their data, generally). Once I used the freezer trick, which actually did help, followed by connecting to a PC running MacDrive. That hard drive had been *screeching* for weeks, and the user thought that was ok.

    Good times. And yeah, Disk Warrior is totally awesome, and a must-have for any Mac user who cares about his data.

    • http://blurbomat.com blurb

      Thanks for responding. I’m gonna give Disk Warrior a shot before I try the freezer. :-)

      • http://twitter.com/jessicadennis Jessica Dennis

        It’s only a very narrow set of circumstances that the freezer is useful for; Disk Warrior should be able to do its magic and rescue your data.

        What I might be inclined to do if I were you, and thus had data that was actually valuable, is a low-level copy of the drive to another one, in case it’s a physical failure that’s just going to get worse as time goes by. Manufacturers usually provide a utility that will do that, if Disk Utility’s Restore capability balks. Then run DW on that and toss the old drive immediately. Just a thought.

  • http://profiles.google.com/silentgoddess.etsy Lane

    Some may call it woo-woo but I call it Mercury in Retrograde (will be until August 26th). I NEVER do anything techie during a Retrograde as it is just asking for a whole lotta trouble.

  • Rosswog

    This actually brings up a question for me.  Who are you using for the cloud?  Has anyone tried AWS’s service?  

    • http://blurbomat.com blurb

      I can’t begin to think about how long it would take me to copy all our data to the cloud. Months, for sure. I’ve heard that AWS is great, but can get spendy fast. Don’t take that as gospel. Pricing could have changed.

      • Rosswog

        We’ve used Mozy in the past with mixed results, but got notice that AWS was giving out 5 gig of cloud space for free for this sort of thing.  We’ve always kind of had a ‘super critical photos and documents that must never be lost’ file that is about 20G, and I’m thinking these prices here are probably well worth it, if it works as stated.

        At $20-$50 a year it seems worth the relative peace of mind.  My wife always reminds me that my super duper RAID setup doesn’t do much against fire. https://www.amazon.com/clouddrive/manage/

  • Jimmy Hill

    DataRescue has saved my (and others’) bacon many a time. Well worth the small expense.

  • http://blurbomat.com blurb

    Just an update: Finished my project and I’ve been running Tech Tool on the drive. Approximately halfway through the surface scan and so far 103 bad blocks, including  a sparsebundle of my boot drive. I have no idea how that even got on the damaged drive in the first place.

    Time to tighten up my workflow. 

    • http://blurbomat.com blurb

      I ran Disk Warrior as well. The drive needed a surface scan. Running Tech Tool showed a whole bunch of bad blocks. I’ll zero out the drive, but I think I’m going to pull it. I’m wishing that there was a Thunderbolt option for the big local drives, but until then I’m looking at either USB 3 or eSATA as a stopgap.

      And thanks for the tip on the hard drive dock. That’s been on my to do list for a very long time.

      I should add that all of my photography is redundantly backed up every night. It’s the most recent work with video files that wasn’t redundant. That is about to change.

      Thanks for taking the time to comment. I need the butt kick.

  • http://twitter.com/innundatedQ jon quattlebaum

    Jon, also invest in a copy of DiskWarrior.  Much better than TechTool.  (from someone who’s done Mac IT for 20 years)

    Nevertheless, strike two more nerd points for not implementing a redundant backup solution.  Just a simple hard drive dock and a couple of bare 2TB drives to swap out plus Carbon Copy Cloner will get you a full backup twice a month (just clone your drive to them overnight on the 1st and 15th), then add a third drive as your Time Machine disk and Carbonite or a similar cloud-based solution.  Combine that with cloning your computer before you ever do any updates and you will never worry about losing data or production time again.

  • http://twitter.com/innundatedQ jon quattlebaum

    Also… Bad blocks (not to be confused with bad sectors) happen when the directory (where your computer stores the location of every bit of info) becomes dangerously fragmented from use over time.  Disk Warrior run once every few months will help keep this from happening.  Just think of it like oil changes in a car… the more you use it, the more frequently you need service.  I can give you a full-on boring explanation if you’re interested.  jonq at mac dot com.