The citation graph is one of humankind's most important intellectual achievements
When researchers write, we don't just describe new findings -- we place them in context by citing the work of others. Citations trace the lineage of ideas, connecting disparate lines of scholarship into a cohesive body of knowledge, and forming the basis of how we know what we know.
Today, citations are also a primary source of data. Funders and evaluation bodies use them to appraise scientific impact and to decide which ideas are worth funding in support of scientific progress. Because of this, the data that forms the citation graph should belong to the public. The Initiative for Open Citations was created to achieve this goal.
Back in the 1950s, reference works like Shepard's Citations provided lawyers with tools to identify which relevant cases to cite in the context of a court trial. No such tool existed at the time for identifying citations in scientific publications. Eugene Garfield -- the pioneer of modern citation analysis and citation indexing -- described the idea of extending this approach to science and engineering as his Eureka moment. Garfield's first experimental Genetics Citation Index, compiled by the newly formed Institute for Scientific Information (ISI) in 1961, offered a glimpse into what a full citation index could mean for science at large. It was distributed, for free, to 1,000 libraries and scientists in the United States.
Fast forward to the end of the 20th century. The Web of Science citation index -- maintained by Thomson Reuters, which acquired ISI in 1992 -- had become the canonical source for scientists, librarians, and funders searching scholarly citations, and for researchers in scientometrics studying the structure and evolution of scientific knowledge. ISI could have turned into a publicly funded initiative, but it started instead as a for-profit effort. In 2016, Thomson Reuters sold its Intellectual Property & Science business to a private-equity fund for $3.55 billion. Its citation index is now owned by Clarivate Analytics.
Given that raw citation data is not copyrightable, it's ironic that the vision of building a comprehensive index of scientific literature has turned into a billion-dollar business, with academic institutions paying cripplingly expensive annual subscriptions for access and the public locked out.
Companies such as Clarivate Analytics or Elsevier (which owns its own citation index, Scopus) have put substantial effort into creating proprietary, high-quality indexes out of raw citation data, and proprietary metrics based on this data to assess the impact of scientific publications. But the fact that the citation data itself -- produced by the labor of millions of researchers as part of their scientific communication activity -- is not a public good that anyone can access is nothing short of "a scandal", as long-standing open citations advocate David Shotton eloquently put it.
"Openness is central to the research endeavor," say Cassidy Sugimoto and collaborators in an open letter published by the International Society for Scientometrics and Informetrics. "It is essential to promote reproducibility and appraisal of research, reduce misconduct, and ensure equitable access to and participation in science. Yet, calls for increased openness in science are often met with initial resistance."
Proprietary citation databases are available to universities and funding bodies via expensive subscriptions, but their restrictive licenses mean that these databases don't allow any kind of reuse or fully reproducible data analysis. Building on citation data is only possible for those people and organizations licensed to access proprietary databases.
There are no citation databases that support the open, unconstrained reuse of their underlying data. Opening up the data that forms the citation graph -- to quote the open letter from ISSI -- "is a matter of scientific integrity, scientific progress, and equity."
Enter the Initiative for Open Citations.
In 2016, a small group founded the Initiative for Open Citations (I4OC) as a voluntary effort to work with scholarly publishers -- who routinely publish this data -- to persuade them to release it in the open and promote its unrestricted availability. Before the launch of the I4OC, only 1% of indexed scholarly publications with references were making citation data available in the public domain. When the I4OC was officially announced in 2017, we were able to report that this number had shifted from 1% to 40%. In the main, this was thanks to the swift action of a small number of large academic publishers.
In April 2018, we are celebrating the first anniversary of the initiative. Since the launch, the fraction of indexed scientific articles with open citation data (as measured by Crossref) has surpassed 50% and the number of participating publishers has risen to 490. Over half a billion references are now openly available to the public without any copyright restriction. Of the 20 biggest publishers with citation data, all but 5 -- Elsevier, IEEE, Wolters Kluwer Health, IOP Publishing, ACS -- now make this data open via Crossref and its APIs. Over 50 organizations -- including science funders, platforms and technology organizations, libraries, research and advocacy institutions -- have joined us on this journey to advocate for and promote the reuse of open citations.
Data liberated by the I4OC is now integrated into bibliometric analysis tools, reused as linked open data in citation corpora, and used by volunteer contributors to collaborative knowledge bases, and it powers the catalogues of a growing number of scholarly databases.
The publishers who have released their raw citation data into the public domain are making the vision of an open citation graph a reality. But we are only halfway there. We urge the remaining publishers to join this effort -- and researchers, practitioners, librarians, scholarly societies, and members of the public who believe in this vision to help us reach our 100% target. The world is waiting for the citation graph to become a public good.
Dario Taraborelli (@readermeter) is an open knowledge advocate and the Director of Research at the @Wikimedia Foundation.
(Image: Dartar, CC-BY)