We Need Simple Backup Solutions for Complicated Data

It’s that time of year when our thoughts of New Year’s resolutions are just beginning to fade, so let me remind you of one resolution you should probably keep. Do you have backups of your irreplaceable data? Are those backups recent enough that you would not lose anything serious? If the answer to either of these questions is yes then congratulations, you are solidly in the minority. Could you restore or work from those backups and not lose more than a couple of hours of work? If so, then you are in great shape, but hopefully I’ll still have something for you in this article. I will talk about online backup and storage services that serve as excellent complements to disk-based storage.

The Mean Time Between Failures (MTBF) ratings of hard disks are largely disconnected from reality. Common street wisdom says you are lucky if your drive makes it 30 days past its warranty period. As many have said, the question of data loss is when, not if. Once you assume that data loss is inevitable, backups are clearly essential, unless you are prepared to sacrifice your email, photographs, bookmarks, draft letters and other data to the great data cemetery in the sky.

The problem, of course, is that our data, and thus our backups, have gotten far more complicated in recent years. Increasingly, portions of our data and even the backups themselves are stored in the cloud: data and services that reside in a collection of online services, network drives, virtual servers and places we often just refer to as online.

These days, I suspect most of us have complicated data. Some of it is stored on home desktops or laptops, some on work machines and some online in the cloud; some is backed up and some is not. We may have multiple copies of some sections of data and no copies at all of others. Complicated data lends itself to complicated backup solutions. We often have good intentions about complicated solutions but, as with New Year’s resolutions, we never get around to most of them.

Online Backup Services

Online backup services are a viable option for many people to back up at least their critical data. They are relatively inexpensive, commonly about $50 USD a year for 50 GB of data, and are a good complement to other backup options. I currently do not recommend using an online service as your sole backup option unless you have less than 10 GB total, as larger collections may take a week or more to upload, depending on your broadband connection, and many days to recover most of the data. I do recommend keeping multiple backups and multiple types of backups, at least a clone and an incremental backup, just to insure against accidental failure of the backup itself.
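
To put the upload-time concern in rough numbers, here is a minimal sketch of the arithmetic in Python. The uplink speeds and the 80% efficiency factor are my own assumptions for illustration, not figures from any provider, and the function name is hypothetical. Real-world throughput is often lower than the advertised rate.

```python
def upload_days(size_gb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the days needed to upload size_gb of data.

    size_gb      -- data size in decimal gigabytes (1 GB = 10**9 bytes)
    uplink_mbps  -- advertised upload speed in megabits per second
    efficiency   -- assumed fraction of the advertised speed actually achieved
    """
    if size_gb < 0 or uplink_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    total_bits = size_gb * 8 * 10**9
    bits_per_second = uplink_mbps * 10**6 * efficiency
    return total_bits / bits_per_second / 86_400  # 86,400 seconds per day


if __name__ == "__main__":
    # Assumed uplink speeds, chosen only to illustrate the range.
    for size_gb in (10, 50):
        for mbps in (0.5, 1.0, 5.0):
            days = upload_days(size_gb, mbps)
            print(f"{size_gb} GB at {mbps} Mbps: about {days:.1f} days")
```

Under these assumptions, 50 GB over a 0.5 Mbps uplink works out to more than eleven days of continuous uploading, which is why I treat online services as a complement to local backups rather than a replacement.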

Most collections of emails, word processing documents, spreadsheets and presentation decks are of relatively modest size and thus simple and fast to back up. Music and photos take substantially more space, with video being even larger. Current online services are less than ideal for very large collections. Another issue is that while many services can make backups to a local disk, they do not back up the entire disk and thus there is no way to do a bare-metal recovery. You will still want to keep a clone of your hard disk to dramatically reduce the amount of downtime in case of a disk failure.

Some common online backup services are Carbonite, CrashPlan, Jungle Disk and Mozy. Carbonite and Mozy both work with Windows and Mac OS X, while CrashPlan and Jungle Disk work with Windows, Mac OS X and Linux. My current favorite is CrashPlan. The software seems easy on my system resources and the UI is reasonably well done. One interesting feature is that CrashPlan has a peer-to-peer mode where you can back up to another local machine, even on a different platform, or to any other machine accessible on the Internet. CrashPlan even lets you back up to a local hard disk, carry that disk to a friend’s house and continue the backup from that point, thus potentially saving weeks of waiting for the initial transfer. A single CrashPlan instance can accept backups from multiple clients. There is a business version of CrashPlan that will let you manage multiple clients in small business settings. CrashPlan comes in two versions, a $60 USD version that will make continuous backups as files change and a free version without continuous backups. The online CrashPlan backup service costs $50 USD a year (for 50 GB), which is similar to Carbonite. CrashPlan says they plan to offer storage on Amazon’s S3 in the future.

Jungle Disk offers many features similar to CrashPlan and Carbonite and offers the additional feature of working as a general network drive. The service stores your data on Amazon S3, which gives you the benefit of only being charged for the storage you use. However, this means that billing is month-to-month; you cannot simply pay a yearly fee and be done with it. You will also have to set up an Amazon S3 account. Jungle Disk also offers a feature called Jungle Disk Plus, which costs $1 USD a month and gives you partial file updates, restartable transfers, potentially faster file transfers and, optionally, access to your files from a Web interface. I found Jungle Disk’s user interface and billing slightly more complicated than CrashPlan’s, but it is a solid choice overall. Jungle Disk plans to use Amazon’s upcoming aggregated storage feature in the future to simplify billing.

Mozy is the one service that offers unlimited data storage for a single price, although only for its home users and not for business users. Unfortunately, I find Mozy has become increasingly problematic, especially on the Mac, and I have seen it fail repeatedly without adequate warning. Mozy is also the most resource-intensive of the bunch, often dramatically slowing down my Apple notebook when calculating a backup.

Amazon’s Simple Storage Service (S3) is increasingly used as the storage back-end for many Web-based services. It is possible for consumers to store data on S3 directly; however, the service is not yet particularly consumer friendly. The most sophisticated application for end users to work with S3 and CloudFront, Amazon’s CDN, is probably Bucket Explorer, a $50 USD Java application that works on Windows, Mac OS X and Linux. Bucket Explorer’s UI is serviceable, but could be simplified. The application provides control over low-level details of S3 buckets, the way Amazon partitions space for individual users. The software comes with a command line tool called Bucket Commander that can automate transfers to S3. S3Fox is a free Firefox extension that allows you to work directly with S3 and CloudFront.

Cloud-based Storage

Another complement to online backups is cloud-based storage. Originally, these services were typically called network drives and were little more than a virtual flash memory drive; recent offerings, however, are much more sophisticated. Services such as Microsoft Live Mesh, Apple’s MobileMe, SugarSync and Dropbox offer replication between the local disk and the cloud-based storage.

My current favorite online storage service is Dropbox. I like the service because the user interface is largely invisible. The software works like this: Dropbox appears as a local folder and you simply copy documents in and out of it. These documents are then automatically synchronized with the Dropbox service and with any other machines you have running the Dropbox software. As files change, Dropbox keeps copies of the changes, allowing you to retrieve older versions of files or files that have been deleted. Dropbox also makes it easy to share files with other people. You simply share a file with the email address of another Dropbox user and the file will appear in their Dropbox folder. Dropbox is free for up to 2 GB of storage, and costs $10 USD per month or $100 USD per year for 50 GB. Overall the service is impressive; however, it would be nice if they provided better tools to clean up extraneous versions of files after a time and if they added additional pricing levels. Dropbox uses Amazon’s S3 Web service for file storage.

As a final word of caution, I advise you to never blindly trust your backups. At a minimum, test them periodically by attempting to recover at least a small amount of data. If you have never tested your backup, how can you really know it exists? It would be painful to believe you have viable backups right up to the point at which you actually need to recover something, only to find that your backups were never there.
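
One way to make such a test concrete is to restore a folder to a scratch location and compare it against the original by checksum. The following is a minimal sketch in Python, not a feature of any of the services above; the function names and the example paths are hypothetical. It only checks that every file in the original folder exists in the restored copy with identical contents.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original: Path, restored: Path) -> Dict[str, List[str]]:
    """List files that are missing from, or differ in, the restored copy."""
    report: Dict[str, List[str]] = {"missing": [], "different": []}
    for source in sorted(p for p in original.rglob("*") if p.is_file()):
        relative = source.relative_to(original)
        target = restored / relative
        if not target.is_file():
            report["missing"].append(relative.as_posix())
        elif file_digest(source) != file_digest(target):
            report["different"].append(relative.as_posix())
    return report


if __name__ == "__main__":
    # Hypothetical paths: point these at a real folder and its restored copy.
    result = compare_trees(Path("Documents"), Path("restore-test/Documents"))
    print("Missing from restore:", result["missing"])
    print("Contents differ:", result["different"])
```

Files you have edited since the backup ran will naturally show up as different, so this check is most useful on folders that rarely change, such as photo archives.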

* This article originally appeared as The Need for Simple Backup Solutions for Complicated Data in the February 2009 issue of Messaging News magazine.
