GDELT, a digital news monitoring service backed by Google Jigsaw, has released a massive, open set of linking data, containing 1.78 billion links in CSV, with four fields for each link: "FromSite,ToSite,NumDays,NumLinks."
The dataset has been purged of boilerplate links from headers and footers and is intended to help researchers analyze trends in linking behavior, in service of GDELT's mission to "support new theories and descriptive understandings of the behaviors and driving forces of global-scale social systems from the micro-level of the individual through the macro-level of the entire planet."
It's 396MB compressed, or 986MB uncompressed.
One of the most useful ways to use this dataset is to sort by the "NumDays" field to rank the top outlets linking to a given site, or the top sites that a given outlet links to. Using the NumDays field allows you to rank connections based on their longevity and filter out momentary bursts (such as a major story leading an outlet to run dozens of articles linking to an outside website for several days and then never linking to that website again).
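As a rough illustration of that ranking, here is a minimal Python sketch. It assumes the CSV starts with a header row naming the four fields and that NumDays and NumLinks are integers; the function name `top_linkers` and the sample domains are made up for this example and are not part of the GDELT release.

```python
import csv
import io


def top_linkers(csv_file, to_site, limit=10):
    """Rank the outlets linking to `to_site` by NumDays, highest first.

    Assumes a header row of FromSite,ToSite,NumDays,NumLinks (an assumption
    about the file layout, not something confirmed by the dataset notes).
    """
    reader = csv.DictReader(csv_file)
    matches = [
        (row["FromSite"], int(row["NumDays"]), int(row["NumLinks"]))
        for row in reader
        if row["ToSite"] == to_site
    ]
    # Sort by NumDays, using NumLinks only to break ties (both descending).
    matches.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return matches[:limit]


# Invented sample rows: b.example links heavily but only for three days.
SAMPLE = """FromSite,ToSite,NumDays,NumLinks
a.example,target.example,300,450
b.example,target.example,3,500
c.example,target.example,120,130
a.example,other.example,10,12
"""

result = top_linkers(io.StringIO(SAMPLE), "target.example")
for from_site, num_days, num_links in result:
    print(from_site, num_days, num_links)
```

In the invented sample, b.example has the most links overall but lands last, because its links were concentrated in three days. To run this against the real file, pass an open file handle instead of the `io.StringIO` sample; reading row by row keeps memory use modest even for a file of roughly 1GB.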
The entire dataset was created with a single line of SQL in Google BigQuery, taking just 64.9 seconds and processing 199GB.
Who Links To Whom? The 30M Edge GKG Outlink Domain Graph April 2016 To Jan 2019 – The GDELT Project [GDELT Project]
(via Naked Capitalism)