Fact-Checkers and Certified Public Logicians

Guestblogger Paul Spinrad is a freelance writer/editor, and is Projects Editor for MAKE magazine. He is the author of The VJ Book and The Re/Search Guide to Bodily Fluids, and was an early contributor to bOING bOING when it was an online zine. He lives in San Francisco.

It's fantastic that so much written knowledge is becoming generally accessible and cross-linked these days, but this is just an intermediate stage: a universal library on the way to becoming a universal brain. The missing piece is encoding the underlying meaning of the stored text, the deep-structure logic behind it. It's one of the oldest challenges in computer science, and there has been a lot of progress, with whole companies dedicated to it. Powerset, for example, has software that has parsed all of Wikipedia and can answer questions from it.

The thing is, you still need a person to get it right most reliably, because people understand how the world works. Luckily, we already have people whose job is very close to this: they're called fact-checkers or researchers, and they work for every reputable publication.

I don't think the fact-checking process is very well understood by the public; it's hidden from view and uncredited (which is lame), and I didn't understand it myself until I began working with magazines. Basically, someone combs through a piece of text and makes sure every fact is verified. They look things up in established references, they call people on the phone, they call friends who have experience in some area, or they do whatever else it takes. If they're working on paper, they start with a printout of the article, and by the time they're done, every word, every clause, and every spelling of every proper name has a pencil mark through it.

I have wondered for years, as magazines, newspapers, and other news organizations have been hemorrhaging money and employees, why someone hasn't gone into the contract fact-checking business. Like, it could be an extension of Snopes.com. There's a huge redundancy in every publication having its own research desk, so they could all lay off their fact-checkers and outsource the job to a new, independent company, which the best of those checkers would then go to work for. Meanwhile, the company could also be hired by anyone else. Then, when the public sees the "Fact-Checked by MiniTrue (SM)" seal on someone's independent blog, they would know the information there has the same credibility as the big boys'.

Now, what if these fact-checkers didn't just vet and correct the text? While they dig into the logic and accuracy of everything, as usual, they could also use some simple application to diagram the sentences and disambiguate the semantics into a machine-friendly representation. With just a little extra clicking, they could bind all the pronouns to their antecedents, and select from a dropdown box to specify whether an instance of the string "Prince" refers to the musician Prince or to Erik Prince (the president of Xe, the company formerly known as Blackwater) within an article that for whatever reason mentions both of them.

Then you would really have something. The text wouldn't just be fact-checked; its underlying meaning could be added into a shared pool of human knowledge, chained through, verified or denied, and used in other ways by any technology that may now exist or may exist in the future.
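Here is a minimal sketch of what that machine-friendly output might look like, assuming (hypothetically) that each checked statement is stored as a subject/predicate/object claim keyed by entity IDs the checker picks from a dropdown. The entity IDs, predicate names, and the `resolve` and `follow` helpers are all made up for illustration; this is not Powerset's or anyone else's actual format.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical entity IDs a fact-checker might choose from a dropdown
# when a string like "Prince" is ambiguous.
ENTITIES = {
    "prince_musician": "Prince (the musician)",
    "erik_prince": "Erik Prince",
    "xe": "Xe",
    "blackwater": "Blackwater",
}


@dataclass(frozen=True)
class Claim:
    """One machine-friendly statement: subject -> predicate -> object."""
    subject: str      # entity ID, never the raw ambiguous string
    predicate: str    # hypothetical relation name
    obj: str          # entity ID
    verified: bool    # the fact-checker's pencil mark


def resolve(mention: str, choice: str) -> str:
    """Bind an ambiguous mention to the entity ID the checker selected."""
    if choice not in ENTITIES:
        raise ValueError(f"no entity {choice!r} for mention {mention!r}")
    return choice


def follow(pool: list[Claim], start: str, predicates: list[str]) -> str | None:
    """Chain through verified claims, one predicate at a time."""
    current = start
    for pred in predicates:
        nxt = next(
            (c.obj for c in pool
             if c.subject == current and c.predicate == pred and c.verified),
            None,
        )
        if nxt is None:
            return None  # chain breaks on a missing or unverified claim
        current = nxt
    return current


# The checker resolves "Prince" in a sentence about Xe's president.
who = resolve("Prince", "erik_prince")
pool = [
    Claim(who, "president_of", "xe", verified=True),
    Claim("xe", "formerly_known_as", "blackwater", verified=True),
]

print(follow(pool, "erik_prince", ["president_of", "formerly_known_as"]))
# prints: blackwater
```

The design choice worth noticing is that `follow` only walks verified claims, so an unchecked statement breaks the chain instead of quietly propagating into the shared pool.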

Many of the big ideas that computer visionary Douglas Engelbart came up with in the 1960s have come true, but a couple of them haven't yet. One of these is his notion of the "Certified Public Logician." Engelbart predicted that a new class of knowledge worker would act as a front end to machine-enabled collective intelligence. Part logician, part notary, these Certified Public Logicians would review texts for logical consistency, tag them with appropriate envelope information, and enter them into the machine. It's a great idea, and I think we could promote all of our fact-checkers to Certified Public Logicians pretty easily.



  1. Thank you for this rare encomium to my profession! There are far fewer of us employed on magazines these days than when I first started, but we take our jobs seriously and fight behind the scenes for truth and clarity of expression. It’s cool to think of becoming a Certified Public Logician, but we generally work under pretty tight deadlines & I can’t imagine having the time to diagram sentences as well as check their accuracy.

    As the “Fact Checkers Unit” Funny or Die video with Bill Murray reveals, there’s more to this job than consulting Wikipedia.

  2. Fact checkers in many publishers and news organizations have a legally driven bias, and the outcome is not always a better article.

    Specifically, their employers are most concerned with avoiding lawsuits. Therefore, facts in an article which could lead to a lawsuit if incorrect are checked more carefully than facts that could not. If the article states that “Senator Smith said that the moon is made of green cheese”, then it will be checked that he did indeed say this. However, “The moon is made of green cheese” is less likely to be checked, because the moon is much less likely to sue. Again, this varies a lot from publication to publication – The New Yorker for instance employs superb fact checkers and would undoubtedly check both things – but this generally seems to me to be the bias.

  3. this strikes me simultaneously as an awesome idea and as an idea that will probably never be implemented, unfortunately. we’re ‘merkins. we don’t care about facts :[

  4. Certified Public Logicians already exist. They’re called librarians. And they already review texts for logical consistency and tag them with appropriate envelope information. It’s called cataloging and metadata production. Y’know, the people and activity we don’t need and can’t provide for the web because there’s too much stuff on the internet and you can’t trust people to tell the truth. Besides, web resources should describe themselves and we can automate the meaning generation process. Right?

  5. All well and good, except for the part where you’re forgetting that facts are less and less important in the big media organizations which you’re talking about as customers.

  6. Makes me think of the ‘fair witness’ idea, in _Stranger in a Strange Land_.

    Certified public accountants exist because business can’t happen smoothly if everyone’s tallying their books differently. If there were a standardized fact-checking system, then lying powerbrokers would be at a competitive disadvantage, and they’re not going to allow *that*.

    I’m trying to imagine Wikipedia growing some teeth and becoming a grassroots government in itself. Nah, there’s too much money to be made in a wild-west internet, right up until the whole thing collapses because someone has been gaming the power grid.

  7. Sounds kind of like a Fair Witness, except for the “underlying meaning” part. But if you centralize it, by God, who fact-checks the fact-checkers?

  8. Checking facts may be doable. Harder would be checking the use of statistics in support of an argument: most of the time when someone quotes numbers to show that a policy is effective, those numbers are worthless because there is no control group or because they don’t allow for reversion to the mean.

  9. Also librarians. More of them, and they’re even less prone to bias than folks who work for a particular publication.

  10. I think people are misremembering, or maybe idealizing, Heinlein’s notion of the Fair Witness. The Fair Witness didn’t check facts, she just reported what she personally witnessed, and had been trained not to make suppositions about things she hadn’t witnessed.

    Remember the example of the house? “It’s painted white on this side.” A fact checker would have either gone to the house and circled it to find out the color on all sides, or (more likely) phoned the owners or someone who lived nearby.

  11. I wholeheartedly support this position.

    I could see the need for not one, but several competing organizations concerned with analyzing and verifying both the facts and logic of a variety of texts, positions, and public statements. Truthout.org and Snopes come to mind, as well as the science section of Ars Technica, though I would love to see these expanded even further.

    These groups’ members should be held to rigorous standards to keep the organization’s standards as high as possible.

    As reputations become more valuable as a form of social currency, some sort of truthing group, network, or process becomes vital to stopping those who would game the system.

  12. the logicians you describe already exist as copyeditors, another underappreciated and uncredited (and largely forgotten?) profession.

  13. fact-checking, copyediting, tagging, … this sounds like the kind of minutiae I would love to do for a living.

  14. Doesn’t a logician know a smattering about – um – logic? If that bar’s unrealistically high, do go with the librarians. No need for sci-fi. Theirs is an interesting world, one that deals with relevance issues by comfortably viewing a master’s in library science as somehow equivalent to a Ph.D. in computer science. See http://bit.ly/6qVDCa

  15. Powerset, for example, has software that has parsed and can answer questions from all of Wikipedia.

    O Rly? Parsed? It can’t even parse the questions. It looks to me like it just does a keyword search.

    And anyway, parsing is the easy bit. Encoding the underlying meaning is, as you say, the real missing piece. And AFAIK, we’ve so far gotten precisely nowhere on that one. (I’d be delighted to be shown that I’m wrong. But you’ll need to do better than powerset.)

    (And btw your powerset hotlink is borked too.)

    1. misterfricative,

      Agreed! Although they do tout “understanding meaning” on the website, I asked Powerset “not including tuna, which fish have yellow fins?” and got back tuna plus generic fish-related results. It can pick out the meaning of an obvious query, but it doesn’t understand conditions.

      That said, I was comparing Ask and Powerset, and asked both a question I thought they might falter at: “Where is the earth?” Powerset surprised me with a terse “Earth: Contained by Solar System” :)

      And the Powerset link should be fixed now, thanks.

  16. Every day I see more and more of a need for librarians, and yet everyone still seems to be afraid to use the “L” word. Perhaps it conveys some sense of the dusty, the disused, the pre-modern? As if we’re all just stack mice who scurry about tut-tutting about the noise those darn kids are making by the card catalog or something.

    Librarians not only provide information; we spend our careers parsing sources for reliability and usefulness. We might not be experts in medicine or biotech or law, but we can look at a database covering one of those topics and give a sense of how on the ball it is. We evaluate, process, fact-check, and deliver everything from cursory questions to in-depth reference projects. Academic librarians are often the first people thanked in any decent treatise or work of nonfiction, and rightly so. Oh, and public librarians do all that while providing programming from lapsits to advice on retirement.

    We’re also criminally under-paid, under-staffed and under-valued.

    Honestly, if you want to super-charge the economy, get science and math and general education going gangbusters and give families a “third place” to build a stronger community, INVEST IN LIBRARIES.

    Seriously, bail out the libraries, save the world, man.

  17. This is an interesting read, but if anthropology, literary studies, linguistics and the philosophy of language have taught us anything over the past century, it’s that there isn’t any innate meaning underlying a text outside of what gets attributed to it.

    Meaning is always contested. And it is these debates and conflicts over meaning that are the heart of politics. So even if the technical and programming issues can be overcome, would people really want to live in a social system where someone was deciding for them what stuff ‘means’?

    P.S. big props to the fact-checkers, copy-editors, and other unsung heroes of the publishing industry.

    1. Leviathan,

      But surely there are two layers to meaning: the intrinsic (almost mathematical) relationship of words and sentence structures on one hand, and the deeper interpretive meaning on the other.

      eg. “Horses can fly.”

      The first meaning invites little subjective argument, is unconcerned with the veracity of the statement, and is likely machine-achievable: [object][relationship][action]

      The second, as you suggest, is factual, political and social, and dependent on the reader: Can horses fly? What is the reason for asserting that they can? What is the context of the statement? Is it fantasy?

      I think the first is doable (far better than now) without getting into the sort of conundrums of meaning that incite argument and side-taking. I imagine the second will take heavy AI with a massive data cache. I can picture the big search engines of the day, each with its flagship AI personality, being promoted on the value of its informed and creative opinion rather than just the matrix of facts it can access.

      “Jesus, Johnny Bing has rubbish taste in music. I usually just ask Lisa Limewire for what’s new.”

  18. Wait, isn’t this what the Semantic Web was supposed to be? Did Clay Shirky say that categorizing things is a waste of time?

  19. mgfarrelly,

    I’m with you. I’m currently working on my MLIS degree and I’m pleased to echo what you’ve said. From the American Library Association’s Code of Ethics (which has been practically bludgeoned into my tiny little mind):

    I. We provide the highest level of service to all library users through appropriate and usefully organized resources; equitable service policies; equitable access; and accurate, unbiased, and courteous responses to all requests.

    VI. We do not advance private interests at the expense of library users, colleagues, or our employing institutions.

    VII. We distinguish between our personal convictions and professional duties and do not allow our personal beliefs to interfere with fair representation of the aims of our institutions or the provision of access to their information resources.

    (From the ALA’s website: http://www.ala.org/ala/aboutala/offices/oif/statementspols/codeofethics/codeethics.cfm, but that link is long and may break. Try this one: http://tinyurl.com/6x3246 )

    Fact-checking is what we do; it’s our lifeblood. Unfortunately, in this world of reality TV and the cult of personality, veracity just ain’t what it used to be.


    re: the comparative worth of MLIS v. PhD in CS,

    They’re asking for someone highly qualified in the field to help develop and organize CS information, a person who will partner with others in the university “in initiating projects to enhance the university’s research and scholarly data management and curation programs”, among other things. They’re not looking for a computer scientist, but for a highly informed person to collect, organize, and make available the best CS information currently known. If anything, it’s a step down for a PhD to take that job. If they get someone with an MLIS, that’s good; with a PhD, all the better.

  20. Good article. I agree with others that versions of this already exist. My mother (a librarian by training) worked for Shepard’s, later Shepard’s/McGraw-Hill, and still later a wholly owned subsidiary of LexisNexis. She and her coworkers parsed judicial rulings for precedent and for their relationship to other rulings and case law. Essentially, Shepard’s and companies like it created enormous books of logical metadata (the leather-bound books you have seen behind lawyers’ desks) that make case law work. Lexis still does the same thing, though it has all been computerized now. The main impediment to this becoming the envisioned world brain is the enormous fees required to access this data.

  21. Why bother outsourcing? From what I have seen over the past several years, they have already eliminated the practice.

