The new paper describes a major advantage of this approach. Traditionally, biological information has come from two kinds of sources: data mining, which involves parsing existing information to identify semantic content and connections within it, and curation, which involves expert, manual analysis of data. By importing information from both types of sources, WikiProteins should theoretically combine the best properties of each: reliable information supplied by experts and potential connections among data that haven't previously been explored.
The paper provides a number of measures of the success of this approach. For one, the import process has identified over a million individual authors, and a similar number of concepts that connect them and the other items stored in the database. Drawing on different data sources also seems to have paid off, as the authors determined that well over half of the protein-protein interactions brought in from curated databases could not have been identified by data-mining PubMed abstracts.
In calling for biologists to get involved in the beta process, the people who created WikiProteins have a number of roles in mind. For starters, they expect that the data mining process has generated a significant number of spurious connections, and hope that the community will help prune those. For example, they noted that the gene abbreviation "CLB2" mapped to at least five different genes (depending on the organism), as well as a material used in dentistry, Clearfil Liner Bond 2; manual intervention may be needed to sort these out. They're also hoping that contributors will simply dump sentences from the literature into WikiProteins so that they can be indexed and mined for further connections.
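To make the "CLB2" problem concrete, here is a minimal sketch of how ambiguous abbreviations might be flagged for manual review. It is only an illustration: the function name, the record layout, the placeholder organisms, and the "XYZ1" symbol are all assumptions made up for this example, not anything taken from the paper or from WikiProteins itself.

```python
from collections import defaultdict

# Hypothetical mined records: (symbol, concept). The organism labels and the
# "XYZ1" symbol are placeholders for illustration, not data from the paper.
MINED_MAPPINGS = [
    ("CLB2", "gene in organism A"),
    ("CLB2", "gene in organism B"),
    ("CLB2", "Clearfil Liner Bond 2 (dental material)"),
    ("XYZ1", "gene in organism A"),
]


def find_ambiguous_symbols(mappings):
    """Return {symbol: sorted concepts} for symbols mapped to more than one concept."""
    by_symbol = defaultdict(set)
    for symbol, concept in mappings:
        by_symbol[symbol].add(concept)
    # Keep only symbols that point at two or more distinct concepts.
    return {s: sorted(c) for s, c in by_symbol.items() if len(c) > 1}


ambiguous = find_ambiguous_symbols(MINED_MAPPINGS)
for symbol, concepts in ambiguous.items():
    print(symbol, "needs manual review:", concepts)
```

A check like this can only say that a symbol is ambiguous; deciding which meaning a given sentence intends is the part that would still fall to human contributors.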
