The citation graph is one of humankind's most important intellectual achievements
When researchers write, we don't just describe new findings -- we place them in context by citing the work of others. Citations trace the lineage of ideas, connecting disparate lines of scholarship into a cohesive body of knowledge, and forming the basis of how we know what we know.
Today, citations are also a primary source of data. Funders and evaluation bodies use them to appraise scientific impact and decide which ideas are worth funding to support scientific progress. Because of this, data that forms the citation graph should belong to the public. The Initiative for Open Citations was created to achieve this goal.
Back in the 1950s, reference works like Shepard's Citations provided lawyers with tools to reconstruct which relevant cases to cite in the context of a court trial. No such tool existed at the time for identifying citations in scientific publications. Eugene Garfield -- the pioneer of modern citation analysis and citation indexing -- described the idea of extending this approach to science and engineering as his Eureka moment. Garfield's first experimental Genetics Citation Index, compiled by the newly formed Institute for Scientific Information (ISI) in 1961, offered a glimpse into what a full citation index could mean for science at large. It was distributed, for free, to 1,000 libraries and scientists in the United States.
Fast forward to the end of the 20th century: the Web of Science citation index -- maintained by Thomson Reuters, who acquired ISI in 1992 -- has become the canonical source for scientists, librarians, and funders to search scholarly citations, and for the field of scientometrics to study the structure and evolution of scientific knowledge. ISI could have turned into a publicly funded initiative, but it started instead as a for-profit effort. In 2016, Thomson Reuters sold its Intellectual Property & Science business to a private-equity fund for $3.55 billion. Its citation index is now owned by Clarivate Analytics.
Because raw citation data is not copyrightable, it's ironic that the vision of building a comprehensive index of scientific literature has turned into a billion-dollar business, with academic institutions paying cripplingly expensive annual subscriptions for access and the public locked out.
Companies such as Clarivate Analytics or Elsevier (which owns its own citation index, Scopus) have put substantial efforts into creating proprietary high-quality indexes out of raw citation data, and proprietary metrics based on this data to assess the impact of scientific publications. But the fact that the citation data itself -- produced by the labor of millions of researchers as part of their scientific communication activity -- is not a public good that anyone can access is nothing short of "a scandal", as long-standing open citations advocate David Shotton eloquently put it.
"Openness is central to the research endeavor," say Cassidy Sugimoto and collaborators in an open letter published by the International Society for Scientometrics and Informetrics. "It is essential to promote reproducibility and appraisal of research, reduce misconduct, and ensure equitable access to and participation in science. Yet, calls for increased openness in science are often met with initial resistance."
Proprietary citation databases are available to universities and funding bodies via expensive subscriptions, but the restrictive nature of their licenses means that these databases don't allow any kind of reuse or fully reproducible data analysis. Building on citation data is only possible for those people and organizations licensed to access proprietary databases.
There are no citation databases that support the open, unconstrained reuse of their underlying data. Opening up the data that forms the citation graph -- to quote the open letter from ISSI -- "is a matter of scientific integrity, scientific progress, and equity."
Enter the Initiative for Open Citations.
In 2016, a small group founded the Initiative for Open Citations (I4OC) as a voluntary effort to work with scholarly publishers -- who routinely publish this data -- to persuade them to release it in the open and promote its unrestricted availability. Before the launch of the I4OC, only 1% of indexed scholarly publications with references were making citation data available in the public domain. When the I4OC was officially announced in 2017, we were able to report that this number had shifted from 1% to 40%. In the main, this was thanks to the swift action of a small number of large academic publishers.
In April 2018, we are celebrating the first anniversary of the initiative. Since the launch, the fraction of indexed scientific articles with open citation data (as measured by Crossref) has surpassed 50% and the number of participating publishers has risen to 490. Over half a billion references are now openly available to the public without any copyright restriction. Of the 20 biggest publishers with citation data, all but 5 -- Elsevier, IEEE, Wolters Kluwer Health, IOP Publishing, ACS -- now make this data open via Crossref and its APIs. Over 50 organisations -- including science funders, platforms and technology organizations, libraries, research and advocacy institutions -- have joined us in this journey to help advocate for and promote the reuse of open citations.
Data liberated by the I4OC is now integrated into bibliometric analysis tools, reused as linked open data in citation corpora, used by volunteer contributors in collaborative knowledge bases, and powers the catalogues of a growing number of scholarly databases.
The publishers who have released their raw citation data into the public domain are making the vision of an open citation graph a reality. But we are only halfway there. We urge the remaining publishers to join this effort -- and researchers, practitioners, librarians, scholarly societies, and members of the public who believe in this vision to help us reach our 100% target. The world is waiting for the citation graph to become a public good.
Dario Taraborelli (@readermeter) is an open knowledge advocate and the Director of Research at the @Wikimedia Foundation.
(Image: Dartar, CC-BY)