"pete warden"

A machine-learning wishlist for hardware designers

Pete Warden (previously) is one of my favorite commentators on machine learning and computer science. Yesterday he gave a keynote at the IEEE Custom Integrated Circuits Conference on the ways that hardware specialization could improve machine learning. His main point is that though there's a wealth of hardware specialized for creating models, we need more hardware optimized for running models.

Compression could be machine learning's "killer app"

Pete Warden (previously) writes persuasively that machine learning companies could make a ton of money by turning to data compression: for example, ML systems could convert your speech to text, then back into speech using a high-fidelity facsimile of your voice at the other end, saving enormous amounts of bandwidth in between.

Machine learning may be most useful in tiny, embedded, offline processors

The tiny embedded processors in smart gadgets -- including much of the Internet of Shit -- are able to do a lot of sensing without exhausting their batteries, because sensing is cheap in terms of power consumption.

Garbage In, Garbage Out: machine learning has not repealed the iron law of computer science

Pete Warden writes convincingly about computer scientists' focus on improving machine learning algorithms, to the exclusion of improving the training data that the algorithms interpret, and how that focus has slowed the progress of machine learning.

Machine learning has a reproducibility crisis

Machine learning is often characterized as being as much an "art" as a "science," and in at least one regard, that's true. Its practitioners are prone to working under loosely controlled conditions: using training data that is continuously tweaked with no versioning, modifying parameters during runs (because it takes too long to wait for the whole run before making changes), and squashing bugs mid-run. These and other common practices mean that researchers often can't replicate their own results -- and virtually no one else can, either.

How much energy can dust-sized computers harvest from sun and motion, and how much work can they do with it?

Pete Warden reports in from the ARM Research Summit, where James Myers presented on "energy harvesting" by microscopic computers -- that is, using glints of sunlight and the jostling of motion from bumping into things or riding on our bodies to provide power for computation.

Anonymizing data is hard-verging-on-impossible -- what do we do about it?

Pete Warden writes on O'Reilly Radar about the problems of anonymizing datasets. AOL, Netflix and others have been burned by releasing datasets that they thought had been stripped of identifiable elements, only to discover that de-anonymizing some or all of the data was easier than they thought. He cites research by Arvind Narayanan, and then makes some practical recommendations for handling user data (including the most important principle: minimize what data you collect in the first place):

Precisely because there are now so many different public datasets to cross-reference, any set of records with a non-trivial amount of information on someone's actions has a good chance of matching identifiable public records. Arvind first demonstrated this when he and his fellow researcher took the "anonymous" dataset released as part of the first Netflix prize, and demonstrated how he could correlate the movie rentals listed with public IMDB reviews. That let them identify some named individuals, and then gave access to their complete rental histories. More recently, he and his collaborators used the same approach to win a Kaggle contest by matching the topography of the anonymized and a publicly crawled version of the social connections on Flickr. They were able to take two partial social graphs, and like piecing together a jigsaw puzzle, figure out fragments that matched and represented the same users in both.
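To make the cross-referencing idea concrete, here is a toy sketch of a linkage attack. It is not the researchers' actual method (which, as described above, had to cope with noisy real-world data); it only matches on exact `(title, date)` pairs. The function name `link_records`, the `min_overlap` threshold, and all of the sample records are made up for illustration.

```python
from collections import defaultdict


def link_records(anonymous, public, min_overlap=2):
    """Guess which public name belongs to each anonymous ID.

    Both arguments map an identifier to a set of (title, date) events.
    An anonymous ID is linked to a name only if they share at least
    `min_overlap` events and no other name shares as many.
    """
    # Index public events so each one points back to the people who posted it.
    index = defaultdict(set)
    for name, events in public.items():
        for event in events:
            index[event].add(name)

    matches = {}
    for anon_id, events in anonymous.items():
        # Count how many events each public name shares with this anonymous ID.
        counts = defaultdict(int)
        for event in events:
            for name in index.get(event, ()):
                counts[name] += 1
        if not counts:
            continue
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        best_name, best = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        # Require a clear winner to avoid linking on coincidental overlap.
        if best >= min_overlap and best > runner_up:
            matches[anon_id] = best_name
    return matches


# Hypothetical sample data: an "anonymized" rental log and public reviews.
anonymous = {
    "user_17": {("Film A", "2005-03-01"), ("Film B", "2005-03-04"), ("Film C", "2005-04-11")},
    "user_42": {("Film A", "2005-03-01"), ("Film D", "2005-05-20")},
}
public = {
    "alice": {("Film A", "2005-03-01"), ("Film B", "2005-03-04")},
    "bob": {("Film D", "2005-05-20")},
}

print(link_records(anonymous, public))  # {'user_17': 'alice'}
```

Once `user_17` is linked to `alice`, every other event in that "anonymous" record (here, Film C) is attributed to a named person, which is the exposure the passage above describes.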

Why you can't really anonymize your data

iOS devices secretly log and retain record of every place you go, transfer to your PC and subsequent devices

Security researchers presenting at the Where 2.0 conference have revealed a hidden, secret iOS file that keeps a record of everywhere you've been. The record is synched to your PC and subsequently resynched to your other mobile devices. The file is not transmitted to Apple, but constitutes a substantial privacy breach if your PC or mobile device is lost or seized. The researchers, Alasdair Allan and Pete Warden, have released a free/open application called "iPhone Tracker" that allows you to retrieve the location data on your iOS device and examine it. They did not discover a comparable file on Android devices.

The file contains the latitude and longitude of the phone's recorded coordinates along with a timestamp, meaning that anyone who stole the phone or the computer could discover details about the owner's movements using a simple program.

For some phones, there could be almost a year's worth of data stored, as the recording of data seems to have started with Apple's iOS 4 update to the phone's operating system, released in June 2010.

"Apple has made it possible for almost anybody - a jealous spouse, a private detective - with access to your phone or computer to get detailed information about where you've been," said Pete Warden, one of the researchers.
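To give a sense of how little a "simple program" needs, here is a rough sketch that reads timestamped coordinates back in chronological order. Everything about the storage is an assumption for illustration: the SQLite format, the `locations` table, its column names, the Unix-epoch timestamps, and the sample rows are all hypothetical and are not taken from the file the researchers describe.

```python
import sqlite3
from datetime import datetime, timezone


def read_track(conn):
    """Return (ISO time, latitude, longitude) tuples, oldest first.

    Assumes a hypothetical `locations` table with Unix-epoch timestamps.
    """
    rows = conn.execute(
        "SELECT timestamp, latitude, longitude FROM locations ORDER BY timestamp"
    )
    return [
        (datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(), lat, lon)
        for ts, lat, lon in rows
    ]


# Build a throwaway in-memory database with made-up rows to read back.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (timestamp REAL, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO locations VALUES (?, ?, ?)",
    [(1300000000, 37.77, -122.42), (1290000000, 51.51, -0.13)],
)

track = read_track(conn)
for when, lat, lon in track:
    print(when, lat, lon)
```

A dozen lines of standard-library code is enough to turn such a log into a timeline, which is why an unencrypted copy on a lost phone or computer matters.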

iPhone keeps record of everywhere you go

iPhone Tracker