Bit rot and silly mistakes. Was: I miss my viewfinder

Discussion in 'Digital SLR' started by Elliott Roper, May 23, 2011.

    This rambling rant started off as a reply to Andrew Reilly, but it
    wandered off...

    We need some numbers here. The error correction hardware and
    mathematics are remarkably strong. For the purposes of this discussion
    (disk bit rot) we are looking at mis-corrected errors: in other words,
    the chance of a bit being returned in the wrong state without the next
    level up in the hardware and software chain being aware of it.
    I found it hard to find a non-technical article on the web about the
    ECC design of modern-ish disk drives. This one
    http://www.pcguide.com/ref/hdd/geom/errorRead-c.html
    claims p(miscorrection) = 1e-21.
    That's a pretty small number. If you ran a disk at full belt (let's be
    generous and say 100MBytes/sec) for its entire service life of 5 years,
    it would transfer 1.26e17 bits. You'd need to run about 8000 disks at
    once, each for its whole service life, before you could expect a single
    error bit to slip through undetected on any of them. Don't try this at
    home: your electricity bill would be about $700,000 at 10c per kWh
    (with nice green 20W disks).
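The arithmetic above is easy to check with a few lines of script. This is a minimal Python sketch, assuming the 1e-21 figure is a per-bit probability and that errors are independent (neither assumption comes from the article). It reproduces the 1.26e17 bits per disk and the power bill, and also shows the difference between "one expected error" and a true even chance.

```python
import math

# Figures from the post. Treating 1e-21 as a per-bit probability with
# independent errors is an assumption made for this sketch.
P_MISCORRECT = 1e-21        # chance a bit is silently mis-corrected
RATE_BYTES_PER_S = 100e6    # "generous" 100 MB/s, flat out
SERVICE_YEARS = 5
DISK_WATTS = 20             # "nice green 20W disks"
PRICE_PER_KWH = 0.10        # 10c per kWh

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def bits_per_disk_lifetime() -> float:
    """Bits one disk transfers running flat out for its whole service life."""
    return RATE_BYTES_PER_S * 8 * SERVICE_YEARS * SECONDS_PER_YEAR


def expected_errors_per_disk() -> float:
    """Mean number of undetected bit errors over one disk's service life."""
    return bits_per_disk_lifetime() * P_MISCORRECT


def disks_needed(target_probability: float) -> float:
    """Disks needed before P(at least one undetected error) reaches the target.

    With p per bit and N disks each moving b bits,
    P(>=1 error) = 1 - (1 - p)**(N*b) ~= 1 - exp(-N*m), where m = p*b,
    so N = -ln(1 - target) / m.
    """
    return -math.log1p(-target_probability) / expected_errors_per_disk()


def electricity_cost(disks: float) -> float:
    """Cost of powering `disks` drives for the whole service life."""
    hours = SERVICE_YEARS * SECONDS_PER_YEAR / 3600
    return disks * DISK_WATTS * hours / 1000 * PRICE_PER_KWH


if __name__ == "__main__":
    print(f"bits per disk lifetime:     {bits_per_disk_lifetime():.3g}")
    print(f"disks for 1 expected error: {1 / expected_errors_per_disk():,.0f}")
    print(f"disks for a 50% chance:     {disks_needed(0.5):,.0f}")
    print(f"power bill for 8000 disks:  ${electricity_cost(8000):,.0f}")
```

Run as a script, it gives roughly 1.26e17 bits per disk, about 7,900 disks for one expected undetected error, about 5,500 disks for a genuine 50% chance, and a bill of about $701,000 for 8000 disks. Either way, the conclusion stands.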

    You are better off worrying about something else, like your house
    burning down, your computer going berserk during a backup, some other
    catastrophic hardware failure or making a silly mistake. Bits don't
    rot!

    All of this is very close to my heart at the moment. I made a chain of
    silly mistakes and came close to losing all 30,000 of my photos despite
    having layers and layers of backup. I use Apple's Aperture. I cycle
    through three vaults, one offsite. You would think I'd be safe? Wrong!

    I was running out of disk space, and the library and the three vaults
    were around 600GB each. So I chucked one vault away and moved some
    movies and music into the gap it left, to free up room for the
    remaining vault to grow. By sheer luck, that remaining vault was the
    oldest of the three. Stupid mistake No 1 - weaken your safety net
    before doing something dangerous.

    Next I edited the backup schedules to back up the movies and music
    from their new location to a sparsebundle on the same 2TB RAID that
    the main Aperture library was on. Then I tested it. Whoops! Instead of
    erasing the sparsebundle, the modified script erased the whole RAID.
    Goodbye Aperture library! Stupid mistake No 2 - fail to consider the
    consequences of editing the script incorrectly.

    So now I'm left with a fresh offsite vault and an old one still
    spinning on an internal disk. No problem: I restore the library from
    the offsite vault (that takes many hours, but hey, it's only computer
    time). With the library restored, I discover a lot of the masters need
    reprocessing for the updated version of Aperture. I'm unsure whether
    the library I accidentally deleted also had them unreprocessed, or
    whether there is a bug in Aperture that stops it noticing reprocessed
    masters when updating vaults. Or is that an undocumented feature?
    Again, just computer time. But wait! It starts snivelling about a
    hundred-odd missing masters as it does so. OK, I'm a geek, I know how
    to dive into the package contents of the library and vault and copy
    them over. Oh no! Missing on the offsite vault too! Stupid mistake
    No 3 - failing to test backups. My excuse was the original problem:
    not enough disk space. That, it is now clear, is not an excuse!

    Now I start getting worried. First I buy another two 2TB disks from
    Amazon, next day delivery. I make a new vault from the partially fixed
    library. I restore that to a new library. It tests OK. I restore the
    missing masters from the old vault. It is only luck that none of my
    new photos is missing, or is it a bug in Aperture? Stupid mistake
    No 3 again. You should never trust a database you didn't write
    yourself. I still don't know why the masters went walkies.
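Mistake No 2 (a script that erased the whole RAID instead of one sparsebundle) is the kind of thing a defensive check can catch. The post doesn't say what the backup script was written in, so this is not that script, just a minimal Python sketch of a guard. The helper name `safe_remove_bundle`, the `.sparsebundle` suffix check and the dry-run default are all illustrative choices: it refuses to delete anything except an existing `.sparsebundle` directory sitting directly inside a folder you name.

```python
import os
import shutil
from pathlib import Path


def safe_remove_bundle(target: str, allowed_parent: str, dry_run: bool = True) -> bool:
    """Hypothetical guard: delete `target` only if it is a .sparsebundle
    directory directly inside `allowed_parent`.

    Raises ValueError rather than deleting anything that fails a check.
    Returns True when the bundle was removed (or would be, in a dry run).
    """
    # resolve() follows symlinks, so a link named "x.sparsebundle" that
    # points at a volume root fails the suffix check below.
    t = Path(target).resolve()
    parent = Path(allowed_parent).resolve()

    if t.suffix != ".sparsebundle":
        raise ValueError(f"refusing: {t} is not a .sparsebundle")
    if t.parent != parent:
        raise ValueError(f"refusing: {t} is not directly inside {parent}")
    if os.path.ismount(t):
        raise ValueError(f"refusing: {t} is a mount point")
    if not t.is_dir():
        raise ValueError(f"refusing: {t} is not an existing bundle directory")

    if dry_run:
        print(f"[dry run] would remove {t}")
        return True
    shutil.rmtree(t)
    return True
```

With placeholder paths, `safe_remove_bundle("/Volumes/RAID", "/Volumes/RAID", dry_run=False)` raises `ValueError` instead of wiping the volume. Leaving `dry_run=True` as the default means an edited script has to be deliberately switched to destructive mode, ideally after a trial run on a throw-away copy.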

    I'm leaving the wreckage alone now. I won't be touching it till I am
    very sure the fixes all worked.
    I'm thinking about writing an Aperture integrity checker application.
    There does not seem to be anything much that does that inside Aperture.
    The file status -> missing filter lied to me. These are some of the
    lessons.
    1. Have enough spare disk to check that restores from backups actually
    work.
    2. Do the restore checks regularly.
    3. Don't trust the software you use. Have an independent integrity
    checker. Freeze a spare backup before going to a new version of your
    software.
    4. Test your backup scripts on throw-away copies. Have enough disk
    space for the throw-away results. See lesson 1.
    5. Big disk drives are amazingly cheap. A 2TB bare drive is cheaper
    than 500 blank DVDs, holds about the same amount, is 10-100 times faster
    and takes a twentieth of the cupboard space. See lesson 1.
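Lessons 1 to 3 come down to checking, independently of the photo software, that the files you care about are really there. As a minimal sketch of such a checker (not something Aperture provides, and not an existing tool), this Python builds a manifest of SHA-256 hashes for every file under a folder and compares two manifests, listing missing, changed and added files. The function names and the JSON manifest format are illustrative assumptions. Point it at wherever your masters actually live: database files inside a library change legitimately and would show up as "changed".

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large masters needn't fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Map each file's path, relative to root, to its SHA-256 digest."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): file_sha256(p)
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def compare(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """List files missing, altered or new relative to `expected`."""
    common = expected.keys() & actual.keys()
    return {
        "missing": sorted(expected.keys() - actual.keys()),
        "changed": sorted(k for k in common if expected[k] != actual[k]),
        "added": sorted(actual.keys() - expected.keys()),
    }


def save_manifest(manifest: dict[str, str], path: str) -> None:
    """Write the manifest as JSON; keep it outside the folder being checked."""
    Path(path).write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(path: str) -> dict[str, str]:
    """Read a manifest written by save_manifest."""
    return json.loads(Path(path).read_text())
```

A typical cycle would be `save_manifest(build_manifest(masters_dir), "masters.json")` before an upgrade or a risky script, then `compare(load_manifest("masters.json"), build_manifest(restored_dir))` after a test restore, where `masters_dir` and `restored_dir` are placeholders. Anything listed under `missing` is a master that didn't come back, whatever the software's own filters claim.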
     
    Elliott Roper, May 23, 2011