
Boing Boing 

Tahoe-LAFS: a P2P filesystem that lets you use the cloud without trusting it

By Cory Doctorow at 10:32 pm Fri, Feb 5, 2010

Zooko sez,
Tahoe-LAFS is a p2p filesystem. You pool your spare hard drive space together with that of your friends. This forms a distributed filesystem which endures even if some of your friends' computers are unreachable. Everything is automatically encrypted, so backing up your files onto the distributed filesystem doesn't necessarily mean sharing the files with your friends. But, it is easy to share specific files or directories with specific friends.

It comes with a command-line interface and a web interface. If you choose, you can allow remote HTTP clients to connect to the web interface. We've configured our test grid to do that so that you can take Tahoe-LAFS for a test drive just by clicking here.

Please try it out and contribute bug reports! We are an all-volunteer project of Free Software hackers in the public interest. We need encouragement, love, and bug reports.

This looks like some exciting stuff! From the announcement:
In addition to the core storage system itself, volunteers have developed related projects to integrate it with other tools. These include frontends for Windows, Macintosh, JavaScript, and iPhone, and plugins for Hadoop, bzr, duplicity, TiddlyWiki, and more. As of this release, contributors have added an Android frontend and a working read-only FUSE frontend. See the Related Projects page on the wiki [3].

We believe that the combination of erasure coding, strong encryption, Free/Open Source Software and careful engineering makes Tahoe-LAFS safer than RAID, removable drives, tape, on-line backup or other Cloud storage systems.

ANNOUNCING Tahoe, the Least-Authority File System, v1.6 (Thanks, Zooko!)

(Image: King Cloud, a Creative Commons Attribution ShareAlike photo from akakumo's photostream)

Previously:
  • Grooveshark -- DRM-free P2P music -- pays uploaders - Boing Boing
  • Verizon teaming up with P2P companies, Yale, to make file-sharing ...
  • Boing Boing: Congress moving to criminalize P2P
  • The military applications for P2P - Boing Boing
  • Christian P2P: is it a sin? - Boing Boing
  • P2P Spam Filter - Boing Boing

20 Responses to “Tahoe-LAFS: a P2P filesystem that lets you use the cloud without trusting it”

  1. zog says:
    February 6, 2010 at 5:17 am

    Could this be a “mojonation” that’s workable?

    • Anonymous says:
      February 10, 2010 at 8:39 am

      It literally IS mojonation, just a few versions newer. (ok ok, there may have been a rewrite but it’s the same thing)

  2. Day Vexx says:
    February 6, 2010 at 5:24 am

    In English, please?

    • crashsystems says:
      February 6, 2010 at 11:04 am

With traditional backups, you store a copy of your own files on external media. The problem with this is that the media could get lost, stolen or destroyed (your house could burn down). You could keep another copy at the office or a friend’s house, but it could still get lost or stolen. Also, you would have to trust your friend or coworkers not to take a peek at your data.

There are so-called “cloud based” backup solutions, such as those built on Amazon’s S3. They provide backups over the Internet, and have multiple geographically distributed copies. This is a great solution, but there is still the problem that you must trust Amazon not to take a peek at your data, or to let others (such as the feds) do so.

      What Tahoe-LAFS does is implement the same type of distributed remote backup technology that other cloud backup services do, but it also has a layer of encryption on your data designed so that you are the only person who can view that data. That way you get the privacy of personal at-home backups, but the reliability and redundancy of professional remote backup services.
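crashsystems’ point — that the storage nodes only ever see ciphertext, so only the key holder can read the data — can be sketched with a toy cipher. To be clear: Tahoe-LAFS actually uses AES; this hash-based XOR keystream is only an illustration of the encrypt-before-you-upload idea, not Tahoe’s construction.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudorandom keystream by hashing key || counter (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice decrypts."""
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key = b"only-you-hold-this-key"
plaintext = b"my private backup data"
ciphertext = xor_cipher(key, plaintext)

# The storage nodes only ever receive ciphertext...
assert ciphertext != plaintext
# ...and only someone holding the key can turn it back into the file.
assert xor_cipher(key, ciphertext) == plaintext
```

The redundancy and the privacy are independent layers: the nodes replicate ciphertext exactly as a conventional backup service would replicate plaintext.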

  3. Anonymous says:
    February 6, 2010 at 9:16 am

    I currently accomplish “safety” in the cloud by using the duplicity tool to store data at a standard provider (in my case, rsync.net). The only problem with this is that I am effectively locked into this particular provider, as very few providers offer plain old SFTP as a transport.

    I’d love to use Tahoe-LAFS with an established, fixed infrastructure. They’ve done some clueful FOSS-related things in the past, and I wonder if they would implement this…

  4. Anonymous says:
    February 6, 2010 at 12:54 pm

    SFTP exists – but it’s not working ‘out of the box’ quite yet.

    http://allmydata.org/trac/tahoe/browser/src/allmydata/frontends/sftpd.py

    http://allmydata.org/trac/tahoe/ticket/531

    http://allmydata.org/trac/tahoe/ticket/645

  5. David-Sarah Hopwood says:
    February 6, 2010 at 1:33 pm

    zog: Tahoe is designed by some of the same people who developed MojoNation and Mnet (including Zooko). In fact it’s the second or third rewrite of the MojoNation code base, after Mnet and “Mountain View”. This article is slightly dated but gives a good summary of the design.

    Anonymous: Tahoe does support SFTP. Currently this is a bit difficult to set up and may have some bugs, but I think getting SFTP working properly is likely to be a priority for the next version. allmydata.com is a commercial backup provider that uses Tahoe-LAFS.

  6. Anonymous says:
    February 6, 2010 at 3:24 pm

    This is definitely one for the gearheads.

  7. Anonymous says:
    February 6, 2010 at 9:50 pm

    This sounds remarkably similar to LOCKSS, a project out of Stanford.
    http://lockss.stanford.edu/lockss/Home

    • dshr says:
      February 7, 2010 at 8:45 pm

      Not at all like the LOCKSS system, which is a tool allowing libraries to collaborate to preserve published, copyright material for the long term (with permission from the copyright holder), not a backup system. The LOCKSS system does not use either erasure coding or encryption, both of which are dangerous for long-term digital preservation.

      • Anonymous says:
        February 8, 2010 at 9:18 am

        I’m curious, how is erasure coding dangerous for long-term digital preservation?

        Is it just that a large number of full copies is preferable?

        My understanding of the purpose of erasure coding is that it allows a flexible (at coding time) trade-off to be made between redundancy and storage size. I.e. without erasure coding you can have redundancy in whole number increments at a storage cost of the size of a full copy on each node. With erasure coding you can choose to require only n out of m sources at a storage cost less than n (or is it m?) full copies.

I can see how, in the very long term, where fewer than n shares might survive and the encoding scheme may no longer be in common use, the material could be lost where even one full (and not otherwise encrypted or even just anachronistically encoded) copy would have preserved it…
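The trade-off the commenter describes can be made concrete with a little arithmetic. With k-of-n coding, each of the n shares is 1/k of the file, so total storage is n/k times the file size and any n−k shares can be lost. The 3-of-10 figure below is Tahoe-LAFS’s documented default encoding; the helper name is mine.

```python
def erasure_profile(k: int, n: int, file_mb: float) -> dict:
    """For k-of-n erasure coding: each of the n shares is 1/k of the file,
    so total stored = n/k times the file size, and any n-k shares can fail."""
    share_mb = file_mb / k
    return {
        "expansion": n / k,          # storage blow-up factor
        "total_mb": n * share_mb,    # total bytes on the grid
        "losses_tolerated": n - k,   # shares that may vanish
    }

# Tahoe-LAFS's default 3-of-10 encoding for a 30 MB file:
p = erasure_profile(3, 10, 30.0)
# ~3.33x expansion, 100 MB stored, survives the loss of any 7 shares.
assert p["losses_tolerated"] == 7

# Plain replication with the same fault tolerance needs 8 full copies (240 MB):
assert erasure_profile(1, 8, 30.0)["total_mb"] == 240.0
```

So erasure coding buys the same loss tolerance as replication for a fraction of the storage, at the price of needing the decoder to reassemble the file.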

      • Anonymous says:
        February 8, 2010 at 9:48 am

        I’m one of the contributors to Tahoe-LAFS. I disagree that encryption and erasure-coding are dangerous for long-term digital preservation. Making sure you don’t lose all copies of the decryption key is easier than making sure that you don’t lose all copies of the file because the key is smaller. Engrave it on a steel plate.

        In fact, encryption can make long-term digital preservation safer, because it allows the set of people whom you can ask to store the ciphertext to be larger than the set of people who are allowed to read the files.

Likewise with erasure coding — I’ve read papers by archivists arguing that fancy data formats are a potential problem, and I appreciate the argument, but the erasure coding used in Tahoe-LAFS is bog-standard Reed-Solomon, which was invented decades ago and has been implemented many times. I believe the added robustness of being able to lose most of your storage and still recover all the data is worth the cost of a layer of Reed-Solomon.

        Cyborg archaeologists digging through our rubble a hundred years from now are going to have no problem with the erasure coding. They might have a problem with the encryption, so make a couple of duplicates of that steel plate.
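For readers who want to see the “lose shares, keep the data” property in the smallest possible form, here is a toy 2-of-3 erasure code built from nothing but XOR parity. Tahoe-LAFS itself uses full Reed-Solomon (via the zfec library), which generalizes this to arbitrary k-of-n; the sketch below is an illustration of the idea, not Tahoe’s implementation.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def encode(data: bytes) -> list:
    """Split data into two halves and add an XOR parity share (a toy 2-of-3 code).
    Assumes even-length data for simplicity."""
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return [a, b, xor_bytes(a, b)]

def decode(shares: list) -> bytes:
    """Rebuild the data from any two of the three shares (None marks a lost share)."""
    a, b, p = shares
    if a is None:
        a = xor_bytes(b, p)   # first half = second half XOR parity
    elif b is None:
        b = xor_bytes(a, p)   # second half = first half XOR parity
    return a + b

data = b"hello world!"          # even length
a, b, p = encode(data)
# Lose any single share and the data still comes back intact:
assert decode([None, b, p]) == data
assert decode([a, None, p]) == data
assert decode([a, b, None]) == data
```

Each share is half the size of the original, so storing all three costs 1.5x the file size — versus 2x for keeping two full copies with the same single-loss tolerance.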

  8. RevEng says:
    February 6, 2010 at 11:42 pm

This reminds me of the Freenet project. It’s not exactly the same, but it has a lot of similarities. It’s a P2P system using distributed hosting of files to ensure high reliability, but instead of adding encryption, they add anonymity. The idea is that people could post dissenting or otherwise questionable content without fear of censorship or retribution.

  9. DSMVWL THS says:
    February 7, 2010 at 5:04 pm

    Concerns:

    1. Could you be held liable for unknowingly hosting material on your system — e.g., someone else’s child pr0n?

    2. Could your computer be confiscated by authorities investigating a crime committed by someone else, whose info might have been stored on it?

    • Dewi Morgan says:
      February 7, 2010 at 5:55 pm

      1. If you’re not knowingly distributing any illegal material, then you can’t get in trouble for doing so.

2. Sure you could, if they needed your logs for network analysis. They might return it though. Maybe even in one piece. They might not even install any back doors on it.

      Freenet is a bit like this, but includes anonymity as well as encryption, so you would be unlikely to get your machine seized.

    • Anonymous says:
      February 8, 2010 at 9:09 am

There is no requirement that you host any files for anyone to use this. Unlike, say, BitTorrent, you are welcome to be purely a client and not also a provider in a Tahoe-LAFS system.

If you want to contribute storage in a reciprocal manner but are concerned about such things then, as suggested in the article, consider setting up a “friends net” where you and some friends or family supply reciprocal backup storage. That way you can be reasonably certain that your buddy so-and-so or your uncle what’s-his-name are not going to expose you to such liability (because you know and trust their character), but also be assured that your data is safe from snooping — not so much by your friends and relatives as by anyone who may have compromised their computers. I certainly trust my friends’ and relatives’ moral judgment much further than I trust their IT security prowess!

  10. Dewi Morgan says:
    February 9, 2010 at 5:16 pm

Encryption of secrets is really good for preserving both the secrecy and the data!

Encryption of public data is counterproductive and even destructive.

So, say, a library digitising its collection would be required to encrypt the parts not in the public domain, if it’s allowed to digitise them in the first place.

But the PD parts would be best preserved by publishing them publicly.

There’s nothing at all wrong, secrets or no, with Reed-Solomon. If future people can’t interpret R-S, then they’ll never even understand CDs!

  11. Anonymous says:
    February 10, 2010 at 4:13 pm

    Have you considered more modern erasure coding systems? Particularly fountain or LDPC codes? Fountain codes seem particularly well suited to this type of problem.

  12. Zooko Wilcox-O'Hearn says:
    February 10, 2010 at 9:31 pm

    Yes! An ancestor of Tahoe-LAFS (the first version of allmydata.com, which was never open sourced) used Digital Fountain’s proprietary erasure codes. However it turns out that this provided no actual advantage over Reed-Solomon in the Tahoe-LAFS architecture. Reed-Solomon (with the zfec implementation that we use) is sufficiently fast that it is hard to even measure the time spent doing erasure coding on a live server.

  13. @arcieri says:
    February 17, 2010 at 8:16 am

If security and privacy of data is a concern, and it should be, companies of all sizes should familiarize themselves with Next Generation Backup. If you run a few servers, I assume you are virtualized already. Online services have a big bottleneck when backing up increasing numbers of VMs.

    If you add a need for replication to the equation, then the case for backup to disk is more compelling.

    If you add data deduplication on top of that, it is actually easy to prove that TCO is lower, complexity is less and automation is easier when you use a Next Generation Backup approach.

