The European Organization for Nuclear Research, or CERN, has been releasing portions of its research data to the public for years. This week's 300-terabyte release of Large Hadron Collider (LHC) data is by far the biggest yet, and it's all out there, open and free for anyone to explore.
"As scientists, we should take the release of data from publicly funded research very seriously," CMS collaboration physicist Salvatore Rappoccio said in the release statement. "In addition to showing good stewardship of the funding we have received, it also provides a scientific benefit to our field as a whole. While it is a difficult and daunting task with much left to do, the release of CMS data is a giant step in the right direction."
CMS stands for Compact Muon Solenoid. The CMS experiment is one of two large general-purpose particle physics detectors at the Large Hadron Collider.
From CERN's announcement:
Today, the CMS Collaboration at CERN has released more than 300 terabytes (TB) of high-quality open data. These include over 100 TB, or 2.5 inverse femtobarns (fb−1), of data from proton collisions at 7 TeV, making up half the data collected at the LHC by the CMS detector in 2011. This follows a previous release from November 2014, which made available around 27 TB of research data collected in 2010.
Available on the CERN Open Data Portal (which is built in collaboration with members of CERN's IT Department and Scientific Information Service), the collision data are released into the public domain under the CC0 waiver and come in two types. The so-called "primary datasets" are in the same format used by the CMS Collaboration to perform research. The "derived datasets", on the other hand, require a lot less computing power and can be readily analysed by university or high-school students, and CMS has provided a limited number of datasets in this format.
Notably, CMS is also providing the simulated data generated with the same software version that should be used to analyse the primary datasets. Simulations play a crucial role in particle-physics research and CMS is also making available the protocols for generating the simulations that are provided. The data release is accompanied by analysis tools and code examples tailored to the datasets. A virtual-machine image based on CernVM, which comes preloaded with the software environment needed to analyse the CMS data, can also be downloaded from the portal.
[via Washington Post]