Q&A with ISIS: Outsourced Backup

Q: What do you think of outsourced backup solutions? Are they secure? Would you use one? I want to back up my data, but I’m not sure I can trust an outsourced backup provider.

A: If you’re concerned about keeping the data in your possession, why not keep it there? The ready-made NAS market is starting to mature, and these devices make great backup appliances. I wrote about this previously when I bought a Synology CS407. The market has changed a bit since then, and I now see four good options: Synology; Netgear (which bought Infrant); an Apple Time Capsule with Time Machine; or even FreeNAS, once it is based on FreeBSD 7 and uses ZFS internally (you can’t get much better than that).

But then what happens when I click the wrong button and delete my data (if I’m a terrible sysadmin), or when someone breaks into my apartment and steals my hardware? Maybe I don’t want to spend that much money, or maybe (and this sounds like you) I’m more concerned with the availability of the data, and the confidentiality issue is just a distraction. If someone breaks into Amazon S3 (or already works there), will they care about your data when they find it? Or, if someone is after your data, will they want to, or be able to, break into Amazon S3? If it were me, I’d go after your data before it leaves your laptop; it’s an easier target. Besides, you can mitigate the risk to confidentiality by using something like TrueCrypt or PGPDisk (needing two passwords to get at your data isn’t so bad). Rather, you’d turn to an outsourced backup provider because they:

  • are better sysadmins than you
  • have more reliable hardware and systems than you
  • have lower overhead costs than doing it yourself (probably the biggest motivator)

Since an outsourced backup provider is only as useful as the three advantages above, it’s important not to choose one based on cost alone. Its value drops rapidly if it turns out to be smaller and less distributed than you thought, to have less expertise than you thought, or if it gets taken over in an acquisition. That’s why I immediately, almost unconsciously, recommended only Tier 1 backup providers like Amazon S3, which we know a) are experts and b) will be around as long as we need our data.

With all that out of the way, the big question becomes: would I use it? And the answer is: absolutely not. First, let me admit that I’m a bit of a hypocrite, since I obviously outsource my e-mail. For some reason, all of us underestimate how confidential our *communications* online are relative to the *stored* data on our hard drives. Even when we recognize this, there is little we can do to protect the confidentiality of our e-mail in the hands of others, because the medium is incredibly difficult to encrypt.

So why not outsource backup too? I’m not at all comfortable with how little control over, and visibility into, my data I would have. All I can do to check on my backups is log in to S3 and see that they are still there. I don’t know if one more HD failure will pop them out of existence, I don’t know if they’ve been compromised, I don’t know if they are planning downtime, I don’t know if they are rebuilding part of their infrastructure; I don’t know anything. In other words, I am not in control of the risks I *have to* accept when using their service.

Contrast this with the Linux NAS in my apartment, where I control the level of risk I’m comfortable with. I can see when HDs fail (none have, go Seagate!), and I can SSH in and view logs if I think I’ve been compromised; I’m in complete control of what I do with it. If I want more reliability, I can add another hard drive. If I want less, I can kick it really, really hard when I get angry. If my appliance dies, I can take the HDs to Ontrack and have them recovered. With an outsourced provider, you can’t do, control, or even know about any of these things.

Hopefully this gives you enough information to make an informed decision about how you want to back up your data.

-Dan

This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution 3.0 License.

4 Responses to “Q&A with ISIS: Outsourced Backup”


  1. Brad Schonhorst

    Dan’s answer was very thorough and definitely made some excellent points. I would just like to add two other ideas to what has already been stated about backup solutions.

    I think it is also important to consider where your data will physically be stored. Google, for example, presumably hosts my email data in climate-controlled data centers around the world, and that data is stored many times over. Should one data center be wiped off the map, I won’t lose anything. (Most likely, I am making some HUGE assumptions here based on what I know about the Google File System rather than on any guarantees from Google. They offer no guarantees at this point, and technically their services are still considered beta versions.)

    However, no matter how secure I make the hard drives in my NYC apartment, it only takes one careless tenant to set my building on fire, or, if you’re from the Midwest like I am, one tornado to ruin your day. I suppose another option would be to swap NAS units with a friend on the other side of town (or the country) and back up locally as well as across the net to that friend’s network.

    On the other hand, internet backups, whether to a friend’s place or to a provider, require bandwidth. I looked into some internet backup solutions for a network I manage, and we simply didn’t have enough bandwidth for the amount of data we wanted to back up. Adding or upgrading our internet connection enough to handle that much data was well outside the budget.
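    To put rough numbers on the bandwidth problem, here is a back-of-the-envelope sketch of how long an initial upload would take. The 500 GB data set, 1 Mbps upstream link, and 80% efficiency factor are hypothetical figures chosen for illustration, not measurements from the network described above.

```python
def upload_days(data_gb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the days needed to push data_gb over an uplink of uplink_mbps.

    efficiency is the assumed fraction of the nominal link rate actually
    achieved after protocol overhead and competing traffic (a guess).
    """
    bits = data_gb * 8 * 1000**3                      # decimal GB -> bits
    bits_per_second = uplink_mbps * 1000**2 * efficiency
    return bits / bits_per_second / 86_400            # seconds -> days


if __name__ == "__main__":
    # Hypothetical: 500 GB over a 1 Mbps upstream link at 80% efficiency.
    print(f"{upload_days(500, 1):.1f} days")  # prints: 57.9 days
```

    At these assumed figures, the first full upload alone would take about two months, which is the kind of mismatch between data volume and link speed the comment describes.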

    To answer your original question, I prefer to use a combination of backup solutions for different types of data. I think the confidentiality risk is worth accepting for certain things (like family photos) but probably not for others (like financial documents).

  2. Michal Piekarczyk

    I remember that Professor Simson L. Garfinkel was using Amazon’s S3 online backup system for a very large Naval Postgraduate School installation, and he was very satisfied. He was using it for drive after drive full of people’s private information. (His presentation details are below.)

    It sounded almost as good as an endorsement, just without the money.

    > Title: The Drives Project: From Disk Forensics to Media Exploitation
    >
    > Speaker: Simson Garfinkel, Naval Postgraduate School
    >
    > Time and Location: Monday 10/1 at 11am in LC102
    >
    > Abstract:
    >
    > A hard drive is a window into the past and a door into the mind. Using forensic techniques, the data on a hard drive can reveal who broke into a computer system, what was done, and the perpetrators. Hard drives have proved so useful that they are now routinely seized or imaged in DoD, intelligence, law enforcement, and even civil actions. But analyzing the information on a hard drive today takes the time of a skilled analyst; today’s tools lack significant automation and intelligence, and frequently crash. As a result, there is a large backlog of hard drives waiting to be analyzed; important information is easily missed or not analyzed for months after it is acquired.
    >
    > This talk discusses the work to date of the Drives Project, a 9-year (and counting) effort that is creating a large-scale collection of real disk drive images, open source tools, and new techniques for automatically processing data recovered from disk drives and other kinds of storage devices. Today the Drives Project has assembled a corpus of more than 1000 forensically interesting images from hard drives and USB storage devices that were collected all over the world. We have created open source formats, tools and algorithms for automatically analyzing this data in bulk and rapidly producing answers to questions that are relevant to the Defense, Intelligence and Law Enforcement communities. The Project is now in the process of dramatically expanding the global reach of data being acquired and exploring new research opportunities for using this data.
    >
    > Bio:
    > Simson L. Garfinkel is an Associate Professor at the Naval Postgraduate School in Monterey, California, and a fellow at the Center for Research on Computation and Society at Harvard University. Dr. Garfinkel has research interests in computer forensics, the emerging field of usability and security, and privacy. He is the author or co-author of fourteen books on computing. He is perhaps best known for his book Database Nation: The Death of Privacy in the 21st Century. Garfinkel’s most successful book, Practical UNIX and Internet Security (co-authored with Gene Spafford), has sold more than 250,000 copies in more than a dozen languages since the first edition was published in 1991.
    >
    > Garfinkel received three Bachelor of Science degrees from MIT in 1987, a Master’s of Science in Journalism from Columbia University in 1988, and a Ph.D. in Computer Science from MIT in 2005.

  3. Michal Piekarczyk

    Actually, I should qualify what I wrote. I looked at one of Simson L. Garfinkel’s papers reviewing Amazon S3, and he found that Amazon’s service uses standard privacy techniques but has potentially weak authentication. But this was back in early 2007 [1], so maybe they have changed their ways.

    He pointed out that the service uses HMAC to authenticate every transaction request sent to S3. Each write request is hashed with SHA-1 along with a 20-character ID, a 41-character secret key, and a timestamp. Amazon checks the timestamp to see whether the request has already been issued, and since it also holds your (ID, key) pair, it can check that you are who you say you are. This is standard practice, but he also noted that someone could reset your password if they had access to your email. In fact, because most email is not encrypted, I think someone would just have to send a reset request on your behalf and then sniff your traffic, waiting for the reset link.

    Still, data you send to Amazon is at least exchanged over SSL, so it is encrypted in transit until… someone has your key. SSL’s key exchange between you and Amazon is, of course, unrelated to your Amazon ID.
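    The request signing described above can be sketched with Python’s standard hmac module. This is a simplified illustration, not Amazon’s implementation: it assumes a string-to-sign built from the HTTP verb, Content-MD5, Content-Type, Date, and resource path (the general shape of S3’s original REST authentication, minus details such as x-amz- headers), an assumed 15-minute freshness window for the timestamp check, and made-up key values.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from email.utils import formatdate, parsedate_to_datetime

MAX_SKEW = timedelta(minutes=15)  # assumed freshness window for the Date header


def string_to_sign(verb, content_md5, content_type, date, resource):
    """Simplified canonical description of a request (omits x-amz- headers)."""
    return "\n".join([verb, content_md5, content_type, date, resource])


def sign(secret_key: str, message: str) -> str:
    """Base64-encoded HMAC-SHA1 of message, keyed with the secret key."""
    digest = hmac.new(secret_key.encode("utf-8"), message.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(key_id, secret_key, verb, resource, date,
                         content_md5="", content_type=""):
    """Client side: the ID travels in the clear, the secret key never does."""
    sts = string_to_sign(verb, content_md5, content_type, date, resource)
    return f"AWS {key_id}:{sign(secret_key, sts)}"


def server_verify(header, secret_key, verb, resource, date,
                  content_md5="", content_type="", now=None):
    """Server side: reject stale timestamps, then recompute and compare.

    A real service would look up secret_key by the key ID; here it is passed in.
    """
    if not header.startswith("AWS ") or ":" not in header:
        return False
    sent = parsedate_to_datetime(date)
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    if abs(now - sent) > MAX_SKEW:
        return False
    key_id = header[len("AWS "):].split(":", 1)[0]
    expected = authorization_header(key_id, secret_key, verb, resource, date,
                                    content_md5, content_type)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(header, expected)


if __name__ == "__main__":
    key_id = "EXAMPLEKEYID00000000"      # hypothetical 20-character ID
    secret = "hypothetical-secret-key"   # placeholder, not a real credential
    date = formatdate(usegmt=True)       # current time as an RFC 2822 date string
    hdr = authorization_header(key_id, secret, "PUT", "/my-bucket/photo.jpg",
                               date, content_type="image/jpeg")
    print(hdr)
    print(server_verify(hdr, secret, "PUT", "/my-bucket/photo.jpg", date,
                        content_type="image/jpeg"))
```

    Because only the holder of the secret key can produce a matching signature, the key itself never has to travel with the request, and the timestamp check limits how long a captured request could be replayed.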

    [1] Garfinkel, S. “Commodity Grid Computing with Amazon’s S3 and EC2,” ;login:, February 2007, pp. 7-13, USENIX.

  4. Michal Piekarczyk

    Let me also add to what Dan and Brad said about the reliability of network backup, again regarding S3. I left out another of Garfinkel’s observations, also from early 2007: Amazon doesn’t provide a Service Level Agreement backing its reliability claims, nor does it provide an emergency recovery alternative.

    But, as is often the case, things have changed: Amazon has since issued an SLA that gives you partial credits against future S3 charges [1]. If their uptime is less than 99%, you get a 25% credit on a future month’s charges, and 10% if it’s between 99% and 99.9%. The uptime is not what it sounds like, though. You get no credit for planned “downtime”; they count as downtime only the average error rate you experience. And because they take that average over all the 5-minute periods in a month, you might get back very little, and there’s still no recovery assurance.

    But I would still go for it if S3 were something I could afford. A few cents per gigabyte sounds okay to me. I’m okay with the risks; there are always risks. Even though Amazon doesn’t really let you blame it for many of them, they are probably worth the convenience of not having to worry about your own data storage.
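    To see why averaging over 5-minute periods can leave you with little, here is a rough sketch of the calculation as described above: compute an error rate for each 5-minute period, subtract the monthly average from 100%, and map the result to a credit tier. The thresholds follow the percentages quoted in this comment; the request counts are made-up inputs, and the real SLA’s exact definitions (which errors count, rounding, exclusions) may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Request counts observed in one 5-minute period (hypothetical inputs)."""
    requests: int
    server_errors: int

    def error_rate(self) -> float:
        # Assumption: a period with no requests counts as error-free.
        return self.server_errors / self.requests if self.requests else 0.0


def monthly_uptime_percent(periods: list[Period]) -> float:
    """100% minus the average of the per-period error rates."""
    if not periods:
        raise ValueError("need at least one 5-minute period")
    avg_error = sum(p.error_rate() for p in periods) / len(periods)
    return 100.0 * (1.0 - avg_error)


def credit_percent(uptime: float) -> int:
    """Credit tiers as quoted above: below 99% -> 25%, 99% to 99.9% -> 10%."""
    if uptime < 99.0:
        return 25
    if uptime < 99.9:
        return 10
    return 0


if __name__ == "__main__":
    # A 30-day month has 30 * 24 * 12 = 8640 five-minute periods.
    # Suppose one full hour (12 periods) fails completely and the rest are clean.
    month = [Period(1000, 0)] * 8628 + [Period(1000, 1000)] * 12
    uptime = monthly_uptime_percent(month)
    print(f"uptime {uptime:.3f}% -> credit {credit_percent(uptime)}%")
    # prints: uptime 99.861% -> credit 10%
```

    Under these assumptions, even a complete one-hour outage lowers the monthly figure by only about 0.14 percentage points, which is why the averaging can shrink the credit you actually receive.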

    [1] Amazon S3 Service Level Agreement, October 2007.
