The citation graph is one of humankind's most important intellectual achievements
When researchers write, we don't just describe new findings -- we place them in context by citing the work of others. Citations trace the lineage of ideas, connecting disparate lines of scholarship into a cohesive body of knowledge, and forming the basis of how we know what we know.
Today, citations are also a primary source of data. Funders and evaluation bodies use them to appraise scientific impact and decide which ideas are worth funding to support scientific progress. Because of this, data that forms the citation graph should belong to the public. The Initiative for Open Citations was created to achieve this goal.
Back in the 1950s, reference works like Shepard's Citations provided lawyers with tools to reconstruct which relevant cases to cite in the context of a court trial. No such tool existed at the time for identifying citations in scientific publications. Eugene Garfield -- the pioneer of modern citation analysis and citation indexing -- described the idea of extending this approach to science and engineering as his Eureka moment. Garfield's first experimental Genetics Citation Index, compiled by the newly formed Institute for Scientific Information (ISI) in 1961, offered a glimpse into what a full citation index could mean for science at large. It was distributed, for free, to 1,000 libraries and scientists in the United States.
Fast forward to the end of the 20th century: the Web of Science citation index -- maintained by Thomson Reuters, which acquired ISI in 1992 -- has become the canonical source for scientists, librarians, and funders to search scholarly citations, and for the field of scientometrics to study the structure and evolution of scientific knowledge. ISI could have turned into a publicly funded initiative, but it started instead as a for-profit effort. In 2016, Thomson Reuters sold its Intellectual Property & Science business to a private-equity fund for $3.55 billion. Its citation index is now owned by Clarivate Analytics.
Because raw citation data is not copyrightable, it's ironic that the vision of building a comprehensive index of scientific literature has turned into a billion-dollar business, with academic institutions paying cripplingly expensive annual subscriptions for access and the public locked out.
Companies such as Clarivate Analytics or Elsevier (which owns its own citation index, Scopus) have put substantial effort into creating proprietary high-quality indexes out of raw citation data, and proprietary metrics based on this data to assess the impact of scientific publications. But the fact that the citation data itself -- produced by the labor of millions of researchers as part of their scientific communication activity -- is not a public good that anyone can access is nothing short of "a scandal", as long-standing open citations advocate David Shotton eloquently put it.
"Openness is central to the research endeavor," say Cassidy Sugimoto and collaborators in an open letter published by the International Society for Scientometrics and Informetrics. "It is essential to promote reproducibility and appraisal of research, reduce misconduct, and ensure equitable access to and participation in science. Yet, calls for increased openness in science are often met with initial resistance."
Proprietary citation databases are available to universities and funding bodies via expensive subscriptions, but their restrictive licenses rule out any kind of reuse or fully reproducible data analysis. Building on citation data is only possible for those people and organizations licensed to access proprietary databases.
There are no citation databases that support the open, unconstrained reuse of their underlying data. Opening up the data that forms the citation graph -- to quote the open letter from ISSI -- "is a matter of scientific integrity, scientific progress, and equity."
Enter the Initiative for Open Citations.
In 2016, a small group founded the Initiative for Open Citations (I4OC) as a voluntary effort to work with scholarly publishers -- who routinely publish this data -- to persuade them to release it in the open and promote its unrestricted availability. Before the launch of the I4OC, only 1% of indexed scholarly publications with references were making citation data available in the public domain. When the I4OC was officially announced in 2017, we were able to report that this number had shifted from 1% to 40%. In the main, this was thanks to the swift action of a small number of large academic publishers.
In April 2018, we are celebrating the first anniversary of the initiative. Since the launch, the fraction of indexed scientific articles with open citation data (as measured by Crossref) has surpassed 50% and the number of participating publishers has risen to 490. Over half a billion references are now openly available to the public without any copyright restriction. Of the 20 biggest publishers with citation data, all but 5 -- Elsevier, IEEE, Wolters Kluwer Health, IOP Publishing, ACS -- now make this data open via Crossref and its APIs. Over 50 organisations -- including science funders, platforms and technology organizations, libraries, research and advocacy institutions -- have joined us in this journey to advocate for and promote the reuse of open citations.
Data liberated by the I4OC is now integrated into bibliometric analysis tools, reused as linked open data in citation corpora, and used by volunteer contributors in collaborative knowledge bases, and it powers the catalogues of a growing number of scholarly databases.
The publishers who have released their raw citation data into the public domain are making the vision of an open citation graph a reality. But we are only halfway there. We urge the remaining publishers to join this effort -- and researchers, practitioners, librarians, scholarly societies, and members of the public who believe in this vision to help us reach our 100% target. The world is waiting for the citation graph to become a public good.
Dario Taraborelli (@readermeter) is an open knowledge advocate and the Director of Research at the @Wikimedia Foundation.
(Image: Dartar, CC-BY)