How fanfic archives lead the world in data organization

Since the earliest days of the "semantic web," millions of dollars and hours of coding effort have been thrown at the problem of really organizing large corpuses of information, and two approaches have emerged: rigid ontologies (like the Dewey Decimal System) that require a system's users to be deeply expert in the structures they're working in; and "folksonomies" (aka hashtags), which allow anyone to tag anything with anything, and which lead to fragmentation (like #sign or #signs; or #photos, #pix, and #pictures).

Writing in Wired, Gretchen McCulloch looks at the "tag wranglers" of Archive of Our Own (AO3), the fanfic community maintained by the nonprofit Organization for Transformative Works (disclosure: I am a volunteer advisor to the OTW). The wranglers are a group of about 350 volunteers who work behind the scenes to hand-craft equivalences between different tags, so that fanfic about John Watson and Sherlock Holmes can be tagged with "Johnlock or Sherwatson or John/Sherlock or Sherlock/John or Holmes/Watson" and still all be grouped together.
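To make the idea concrete, here is a minimal Python sketch of what a wrangled synonym table might look like, assuming each variant tag simply points at one canonical tag. The table contents, the canonical name "John Watson/Sherlock Holmes", and the function names are hypothetical illustrations, not AO3's actual data model; the point is that the hard part (deciding which variants are equivalent) lives in a human-curated table, and the grouping itself is trivial once that table exists.

```python
from collections import defaultdict

# Hypothetical synonym table: each variant tag (lowercased) points at one
# canonical tag. On AO3 this mapping is curated by human wranglers; here it
# is hard-coded for illustration only.
SYNONYMS = {
    "johnlock": "John Watson/Sherlock Holmes",
    "sherwatson": "John Watson/Sherlock Holmes",
    "john/sherlock": "John Watson/Sherlock Holmes",
    "sherlock/john": "John Watson/Sherlock Holmes",
    "holmes/watson": "John Watson/Sherlock Holmes",
}


def canonical(tag: str) -> str:
    """Return the canonical form of a tag, or the trimmed tag if it is unwrangled."""
    cleaned = tag.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)


def group_by_canonical(works: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each canonical tag to the set of work titles tagged with any of its synonyms."""
    groups: dict[str, set[str]] = defaultdict(set)
    for title, tags in works.items():
        for tag in tags:
            groups[canonical(tag)].add(title)
    return dict(groups)


# Hypothetical works, each tagged with a different variant of the same pairing.
works = {
    "Story A": ["Johnlock"],
    "Story B": ["Sherlock/John", "Fluff"],
    "Story C": ["Holmes/Watson"],
}
print(sorted(group_by_canonical(works)["John Watson/Sherlock Holmes"]))
# ['Story A', 'Story B', 'Story C']
```

A tag nobody has wrangled yet ("Fluff" above) just passes through as its own group, which mirrors the folksonomy fragmentation problem until a human adds it to the table.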

The organization and flexibility this brings to AO3 are significantly better than the systems used by "news sites, library catalogs, commercial sales websites, customer helpdesk websites, and PubMed (the most prominent database of medical research)," and where those sites do perform well, it's thanks to the "ghost labor" of Mechanical Turk workers and other low-paid clickworkers who do tag-wrangling comparable to AO3's, except that clickworkers are unlikely to bring the deep understanding and commitment that AO3's tag wranglers bring to the project.

Archive of Our Own is up for a Hugo Award this year for "Best Related Work."

Another of the Tag Wrangling Chairs, Qem, also thinks that machine tag wrangling is unlikely, pointing to machine translation as a cautionary tale. "There are terms in fandom which, while commonly understood in context among fans, would not be when you take it out of the fandom context," Qem says. For example, seemingly innocuous words like "slash" and "lemon" do not refer to a punctuation mark or a citrus fruit in fannish contexts, and tag wranglers are already well aware that machine translation can only manage the literal, not the subcultural, meanings. Qem's co-chair, briar_pipe, is slightly more sanguine: "I personally think it might be interesting to have AI/human partnerships for this type of data work, but you have to have humans who are aware of AI limitations and willing to call AIs on mistakes, or else that partnership is useless."

Fans Are Better Than Tech at Organizing Information Online [Gretchen McCulloch/Wired]