What's really on BitTorrent anyway?

Ed Felten of the Freedom to Tinker blog has written a post with Princeton senior Sauhard Sahi called Census of Files Available via BitTorrent. The survey takes a random sample of files available on a trackerless BitTorrent system. The article is full of caveats, and discussion is continuing in the comments, but it does dig into the likely copyright status of the works they found.
"[A]ll files that were available were equally likely to appear in the sample -- the sample was not weighted by number of downloads, and it probably contains files that were never downloaded at all. So we can't say anything about the characteristics of BitTorrent downloads, or even of files that are downloaded via BitTorrent, only about files that are available on BitTorrent."
The final breakdown?

File types:
46% movies and shows (non-pornographic)
14% games and software
14% pornography
10% music
1% books and guides
1% images
14% could not classify
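To make the quoted caveat concrete, here is a minimal sketch of why a census of available files can look very different from a census of downloads. The catalogue, its categories, and every download count below are invented for illustration; none of these numbers comes from the Felten and Sahi survey, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical catalogue of (category, download_count) pairs.
# Every number here is invented for illustration; none comes from the census.
catalogue = [
    ("movies", 0),   # available but never downloaded
    ("movies", 5),
    ("movies", 5),
    ("music", 90),
]


def share_by_availability(files):
    """Fraction of available files in each category (each file counts once)."""
    counts = Counter(category for category, _ in files)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def share_by_downloads(files):
    """Fraction of downloads in each category (each file weighted by its downloads)."""
    counts = Counter()
    for category, downloads in files:
        counts[category] += downloads
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


print(share_by_availability(catalogue))
print(share_by_downloads(catalogue))
```

In this made-up catalogue, movies are three of the four available files but only a tenth of the downloads. The survey measures only the first kind of share, which is why it cannot say what people actually download.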


  1. “14% could not classify”

    I don’t understand. Could not classify /by file name alone/? That seems more like “did not” than “could not classify”. I’d be hard pressed to be unable to classify a file AT ALL after actually downloading it and having a look.

    1. I would say that they mean “Could not classify, without downloading the file.” It appears that they did not actually download any of the files, especially since they point out that nearly all of the files were probably being distributed illegally.

      1. Then the whole study is shoddy. A file’s name need have no relation to its content, which is what the study purports to measure. How many torrents are out there that say “XXX crazy hot porn.rar” that decompress to some virus-ridden EXE?

        This is like the census determining gender counts by going with if your name sounds like a boy’s or a girl’s.

        1. That’s still interesting. A study of the files which appear to be on bittorrent. What we need now is a comparison with the files that are actually on bittorrent.

          I figure half of it is porn and malware.

    2. My best guess would be that the 14% that are unknown just have strange file extensions. Or they could be .rar files that are untagged and unnamed. I’ve gotten some very strange files before and then had to rename them to .exe or .txt

  2. I’d be more interested in some other data. For example, how much of that material is stuff that isn’t available legally through any other means? How much of it is being downloaded in countries where it is not otherwise available? How much of it is out of print? etc, etc.

    Really, how much of it is stuff that I can buy on iTunes or Amazon in a digital form?

  3. i was surprised that only 10% was music. that’s the bulk of my usage. or would be…if i used bittorrent….

  4. Favorite part:

    “Still, the result suggests strongly that copyright infringement is widespread among BitTorrent users.”

    Nooooooooo, say it isn’t so!

    1. >”Still, the result suggests strongly that copyright infringement is widespread among BitTorrent users.”

      But you just can’t do anything FUN on the internet without infringing copyright in some respect.

      I mean, I’ve tried.

  5. “…of the 46% that are movies, fully 53% are sh***y cammed copies of Transformers 2 with half the dialogue in Russian, shot from behind some tall guy’s head.”

  6. On a slight tangent, wasn’t there a study recently that established that avid bt users, especially music downloaders, also were more likely to purchase music and media, due to them being enthusiastic fans of the aforementioned?

    And on a further tangent regarding music and artists, I’m sure we’re all aware that artists receive a higher percentage of profits from live gigs and merchandise at live gigs.

    Which is to say, if you illegally d/l some music, the least you could do (if you like them) is to go to the gig, if they’re playing nearby.

    1. Dr Wally, the only thing the survey established is what people tell researchers and what they do are completely different things.

      As an example, according to the survey the average number of music CDs & DVDs bought in the six months prior to the survey (carried out in February 2006) by just the 35-44 year-old demographic was 9.7.

      If the sample is representative of all Canadians in that age group, then it means 35-44 year-old Canadians bought just under 23 million music CDs and DVDs in 6 months.

      Total unit sales for music CDs and DVDs in Canada for all of 2005 were actually 22.3 million.

  7. A file that was ‘not downloaded at all’ is in actual fact non-infringing.

    It’s interesting that music is so much lower than movies and tv, given that both now have somewhat-acceptable legal download options, and streaming tv is more likely to be free!

    So tv and movies should be separated out. Also, regional analysis by availability would be useful. For a show available on Hulu, how many Americans torrent it? Less every day, I’d guess.

    When everyone can watch the show they want when they want, legally and easily on their big HDTV, piracy of TV will all but vanish.

    People accept small inconveniences for greater conveniences, but forced scheduling PLUS 25% commercials is too much to ask for anything but live sports.

    This is why both the vcr and dvds did so well.


  8. >>”14% could not classify”

    >>I don’t understand…

    14% are probably viruses and non-files that are used to open your firewall and turn it into swiss cheese.

  9. Only 14% was porn? What is this world coming to? (no pun intended!)

    Anyway, this study is likely misleading. What constitutes “infringement”, or “copyright”? Are we using the USA model? Because to my knowledge not all material actually originates in the USA. Should we be applying US copyright law to a work created in Russia, which I downloaded in Canada?

    Also, many, many things are technically under “copyright”, but for all intents and purposes “copying” them is the only way to obtain them. Abandonware is a good example. Companies orphan software all the time that they no longer wish to support, yet they maintain “copyright” on it forever. Rarely is something actually released into the public domain. Does that mean there is any harm in the copying? They have abandoned it, offer no support, and don’t try to sell it. I fail to see the moral ground for the copyright justification.

    Another example is, say, anime from Japan. If I download a fansub of some anime TV show that was released in Japan back in the 80’s, does that constitute infringement? Likely there is copyright. However, it has been modified by fans to add subtitles. Perhaps it has never even been exported from Japan, or never had subtitles. So they are not trying to sell it to me, and I can’t physically get it anywhere else. Anyway, I certainly don’t see things as black and white as many would like them to be.

    I don’t argue that downloading the latest Hollywood blockbuster would perhaps be infringement, but there is a LOT out there that isn’t so cut and dry IMHO.

    BTW, fansubs vary in quality. For example, I watched a copy of “Giant Robo”, an anime TV show that only survived one season in Japan in the 80’s. The copy I got was dubbed in Italian from Japanese and subtitled in English. Watching it nearly blew my mind and drove me insane. It also made me giggle a bit. You could see where the “translations” didn’t quite match. In one episode there is opera music, which made me wonder whether that was only in the Italian version. Anyway, certainly something you could only find on BitTorrent. “GIANT ROBO!” LOL

    1. I don’t believe there was a 1980s Giant Robo. You may be thinking of the 7-episode ’90s OVA series.

  10. Non-pornographic movies? On bit torrent?

    I..I..I…don’t understand. I’m reading the words, but the sentence doesn’t parse.

  11. For what it’s worth…I download from the torrents. Movies mostly. I also have a Netflix account. I don’t consider what I do piracy. I download a copy of a DVD screener from bittorrent, watch it then erase it…or I wait for it to arrive from Netflix, copy it, transcode it, send the disc back, watch it then erase it. The only difference is ease of use.

    I own a TVisto (basically an external hard drive that also has video out) that can play back movies and xvids. I prefer xvid so I can keep a larger library of things to watch.

  12. Almost everything I download (from Usenet, but it’s the same difference) is TV shows. I probably average 1 – 2 seasons of a show per month (closer to the 1 than the 2). However, I also own stacks and stacks of DVD box sets. I was trying to think of all the boxes I’ve purchased, but it’s really just too many. Just adding up the South Park, Family Guy, and Futurama boxes is more than 20, and most of those were purchased within the last 5 years.

    I’d say that the bastards are getting their money’s worth from me and I’m tired of them bothering me for more.

  13. He should have included a separate category for anime. I’d like to see how it would break down when you take away anime from movies and live action TV shows.
