Automated book-culling software drives librarians to create fake patrons to "check out" endangered titles

Two employees at the East Lake County Library created a fictional patron called Chuck Finley -- entering fake driver's license and address details into the library system -- and then used the account to check out 2,361 books over nine months in 2016, in order to trick the system into believing that the books they loved were being circulated to the library's patrons, thus rescuing the books from automated purges of low-popularity titles.

Library branch supervisor George Dore was suspended for his role in the episode; he said that he was trying to game the algorithm because he knew that these books would come back into vogue and that his library would have to spend extra money re-purchasing them later. He said that other libraries were doing the same thing.

This is datafication at its worst: as Cennydd Bowles writes, the pretense that the data can tell you what to optimize as well as how to optimize it makes systems incoherent -- it's the big data version of "teaching to the test." The library wants to be efficient at stocking books its patrons will enjoy, so it deploys software to measure popularity, and raises the outcomes of those measurements over the judgment of the skilled professionals who acquire and recommend books, who work with patrons every day. Instead of being a tool, the data becomes a straitjacket: in order to get the system to admit the professional judgment of librarians, the librarians have to manufacture data to put their thumbs on its scales. The point of the library becomes moving books by volume (which is only one of the several purposes of a library), and "the internal framing of users shifts. Employees start to see their users not as raison d’être but as subjects, as means to hit targets. People become masses, and in the vacuum of values and vision, unethical design is the natural result. Anything that moves the needle is fair game: no one is willing to argue with data."

Software is not objective. The designers of the library's software made a subjective decision to take the measurements they are taking, and to respond to them in the way they are responding to them. The librarians who use the software are treated as adversaries, not allies -- they are there to be controlled by the software, not informed by it. Just like the nurses who assign junior staffers to hit the spacebar at 10-second intervals to keep their terminals from re-prompting them for a password, the librarians who could not override the software by executive edict resorted to chicanery to get their jobs done.

That's the important takeaway here: these librarians didn't monkeywrench their software for personal gain. They did it because they wanted to make the system better, to teach it how to weight the circulation data to reflect the on-the-ground intelligence and historical perspective they had on their libraries, their collections and their patrons.

Science fiction has grappled with this exact problem in the past: Connie Willis's 20-year-old classic novella Bellwether features a patron (a social scientist who specializes in fads!) who goes to the library every week to check out titles that she knows to be out of vogue, but significant, to trick the library systems into retaining them.

The problem here isn't the collection of data: it's the blind adherence to data over human judgment, the use of data as a shackle rather than a tool. As the article in the Orlando Sentinel hints, this is because "money wars" have made enemies out of the city and its librarians -- and as this episode highlights, there is no good way to proceed amidst that enmity. Just as treating teachers as lazy welfare bums who must be measured with standardized tests has lowered educational standards and driven out the best teachers, so will any other system that treats employees as problems rather than solutions engender a continuous, spiraling arms race that will never solve the problem.

Dore and library assistant Scott Amey created “Chuck Finley” simply to save certain books from being ditched at the library, according to Dore and the inspector general’s notes.

Records show that dozens of books were checked out and then checked back in again all in the same hour.

The fictional Chuck Finley was named after “a ballplayer,” according to the inspector’s notes. Chuck Finley is a retired major league baseball pitcher who played mostly for the California and Los Angeles Angels during a 17-year career.

Dore said in interviews with the inspector general’s office that it was happening elsewhere but didn’t provide specifics.

“He did know that other libraries have had ‘dummy’ patron cards and institutional cards,” according to the interview notes. “There was a lot of bad blood between the libraries because of money wars.” The inspector general’s report said creation of a fake library card “amounts to the creation of a false public record.”

To save books, librarians create fake 'reader' to check out titles [Jason Ruiter/Orlando Sentinel]

(via /.)

(Image: PR2 Robot reads the Mythical Man-Month, Troy Straszheim, CC-BY-SA; The Leeds Library, Michael D Beckwith, CC0)

Notable Replies

  1. Or even checking the book out.

    Even if a book is not checked out, it may still be referenced. Now I'm sure there are some fancy libraries that will know what books have been removed from the shelf and need to be re-shelved, but the libraries I'm most familiar with have no means of tracing that.

  2. When clueless people (generally upper management) see "data" - they assume some supercomputer will see something they're not seeing and magically solve all their problems. Algorithms are written by people. The computer just runs them over and over. What is the goal of the person writing the program? Do they know what they're doing? In many cases, the answer is no. Look at Facebook's famous algorithms. They somehow treat fake news stories as real news. Of course! How does one write an algorithm to determine whether news is real or fake? It's not really possible.

    Unfortunately, upper management of companies, governments, institutions of all kinds will continue to think "big data" is infallible and will solve all their problems. Then punish any human who disagrees with the results of the sacred algorithms.

  3. I miss card catalogs.

    Especially the smell.

  4. That librarians had to resort to such measures indicates that the whole process was pretty messed up. Weeding is (unfortunately) a necessary and inevitable endeavor in non-TARDIS based libraries, but there are ways of minimizing the possibility of throwing away unrecognized gems. The automated identification of low-circulation or uncirculated titles can be useful, but you should also:

    1) Have librarians and informed/expert interested parties (like University departmental faculty) look at the candidates and have an opportunity to save them - basically what these librarians were trying to do, and should have had the open chance to do.
    2) Check possible discards for copies available in electronic archives or library consortia - if it's the last available physical copy, even if it's in an archive electronically, it might be a good idea to hang on to it. Tangential to that - is there something inherent to the object that can't be replicated electronically? Pop-up ebooks aren't much fun.
    3) Consider the purpose/scope of the library - is it a comprehensive research collection or a community college nursing program - in one it'd be useful to have outdated information for history of medicine purposes. In the other where students may not yet be savvy enough to recognize stuff that shouldn't be used in current practice, books even 10 years old can be mostly noise.
    4) Look at frequently circulated stuff as well - is a 7-year-old ACT study guide that has checked out 300 times really still worth keeping?

    I'm sure I'm missing at least a few other considerations as well, but those are some of the things I've learned from the weeding I've been involved in.
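
The four checks listed in this reply can be sketched as a small routine that annotates titles for a librarian rather than discarding anything on its own. This is a minimal sketch under stated assumptions, not how any real library system works: the `Item` fields, the `review_notes` function, and the `idle_days` and `max_age_years` thresholds are all hypothetical names and values chosen for illustration. The `max_age_years` parameter stands in for the library's scope (a nursing program might set it low, a research collection very high).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical catalog record; field names are assumptions."""
    title: str
    pub_year: int
    checkouts: int
    last_checkout: Optional[date]  # None if the item never circulated
    other_copies: int              # copies known elsewhere (consortium, archive)


def review_notes(item: Item, today: date,
                 idle_days: int = 730, max_age_years: int = 7) -> List[str]:
    """Return notes for a human reviewer; nothing is discarded here."""
    notes = []
    idle = (item.last_checkout is None
            or (today - item.last_checkout).days > idle_days)
    if idle:
        # Check 1: low circulation only nominates a title for review.
        notes.append("low circulation: candidate, pending librarian review")
        # Check 2: be cautious with the last available copy.
        if item.other_copies == 0:
            notes.append("last known copy: consider keeping")
    # Checks 3 and 4: popular titles can still be outdated for this collection.
    if not idle and item.checkouts > 0 and today.year - item.pub_year >= max_age_years:
        notes.append("popular but old: check whether content is outdated")
    return notes


today = date(2017, 1, 1)
gem = Item("Obscure gem", 1985, 2, date(2012, 5, 1), 0)
guide = Item("ACT study guide", 2010, 300, date(2016, 12, 1), 5)
novel = Item("New novel", 2016, 40, date(2016, 12, 20), 3)
for item in (gem, guide, novel):
    print(item.title, review_notes(item, today))
```

The design choice that matters is that the function returns notes instead of a keep/discard verdict, which leaves the final call with the librarian.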

  5. So many things to comment on! I am a librarian (MLIS) but I don't work in a library. On the other hand, I've worked with hundreds of public libraries as a materials handling consultant. Here's my take on this thread:

    1) About weeding algorithms. There are lots of reports that librarians can run that will generate lists of books to be considered for weeding. Dusty Shelves List is basically the one being discussed and it refers to an item that hasn't circulated in some period of time (set by the librarian running the report). These books are always reviewed by a librarian before being confirmed for discard.

    2) Last Copy. Librarians pay close attention to the "last copy" of a title and won't get rid of it unless they are sure it can be gotten from somewhere else (downloaded for free from Project Gutenberg, borrowed from a partner library or interlibrary loaned from afar) or if it is really outdated and not "earning its position on the shelf."

    3) Public libraries versus Academic Libraries versus Archives. Public libraries (as many have noted) are not about retaining the knowledge of humankind, they are about providing entertainment, current news and reference material. They are very limited in what they can afford to keep on the shelves so have to work very hard to keep their collections rotating out the old stuff (or just unpopular stuff) to make room for the new stuff. They can't keep all those Catcher in the Rye copies when they have a waitlist of 150 people for the latest Patterson novel. However, they would sure be happy to get you a Catcher in the Rye or direct you to the Internet Archive where you can download a copy in multiple formats (https://archive.org/details/CatcherInTheRye).

    4) About tossing out good DVDs. Most public libraries toss out DVDs after some number of circulations just because they start causing problems during playback and that ends up being extremely frustrating for patrons who go home, make their popcorn, and are ready to enjoy a good movie but can't. A high circulating DVD will "usually" be replaced when taken out of the collection unless they can't afford it.

    5) Automated storage and retrieval systems. Academic libraries use big storage warehouses to keep low circulating items or "last copies" and they can usually be ordered online and delivered to their patrons within a day or two. These systems cost millions of dollars to build and maintain but they allow many more books to be archived. Public libraries cannot afford such a thing although NYPL is building one (first one for public libraries). This will allow them to keep more of the kinds of books we've been discussing while leaving room for the popular titles that are more actively circulating in their 150 branches (or something like that).

    6) Resource sharing and Interlibrary loan. Many libraries partner with other libraries and have "reciprocal borrowing" relationships that allow everyone's stuff to be freely shared with everyone else. You can find it in the catalog and request it and it comes to your library, even if it actually belongs to another library. This is happening more and more (thankfully). But sometimes, the deals these libraries make are that they'll share everything except, say, the popular DVDs. They do that to ensure the money they spend on those pricey DVDs goes to their patrons. Perhaps their library board or city council is ornery because they don't want them lending their stuff to other libraries (without recognizing the benefit of the totality of a resource sharing arrangement to provide a 10-20x larger collection for their patrons to choose titles from). Interlibrary loan (ILL) is when you have no such relationship but you can still ask another library for a title. Pretty much all libraries will request anything you really want for you, you just have to ask. And ILL takes longer since it involves a bunch of paperwork.

    7) LGBTQ issues and books on the shelf. As a lesbian, I very much appreciate that libraries keep things on the shelves for a questioning LGBTQ person to find as I did when I was younger (thank you for Patience and Sarah). I agree that this is critical and provides more privacy than a Google search can. And besides, with the Google algorithms, there's a good chance they are going to find out that Gays are Evil or some such horrible thing and not be guided to authoritative resources that they would otherwise find courtesy of a professional librarian. Also, many libraries are forced to employ Internet content filters which are not well managed (IMHO) so they might not be able to find ANYTHING that would help them if the content filters were blocking that category of content (which some unfortunately do but the folks in the library aren't always paying as much attention as I wish they were to this issue).

    I think that's it. I hope you don't mind my sharing my experience! I'm hoping you will find it interesting to hear more about the actual workings of public libraries (beyond what your particular library might be doing).
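
The "Dusty Shelves List" described in point 1) of this reply amounts to a simple date filter whose cutoff is chosen by the librarian running it. The following is a rough sketch of that idea only; the `dusty_shelves` function, the `months` parameter, and the dictionary of last-circulation dates are hypothetical and do not reflect any particular library system's report.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def dusty_shelves(last_circulated: Dict[str, Optional[date]],
                  today: date, months: int = 24) -> List[str]:
    """List titles with no circulation inside the chosen period.

    The result is a list for a librarian to review, not a discard list.
    A month is approximated as 30 days to keep the sketch short.
    """
    cutoff = today - timedelta(days=30 * months)
    dusty = [title for title, last in last_circulated.items()
             if last is None or last < cutoff]
    return sorted(dusty)


records = {
    "A": date(2016, 6, 1),   # circulated recently
    "B": date(2010, 1, 1),   # idle for years
    "C": None,               # never circulated
}
report = dusty_shelves(records, today=date(2017, 1, 1), months=24)
print(report)
```

Because the period is a parameter, the same report can be run with a longer window for titles a librarian expects to come back into vogue.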
