# Cory's Berlin talk: "It's not a fax machine connected to a waffle iron"

Here's the video of "It's not a fax machine connected to a waffle iron," the talk I gave at the Re:publica conference in Berlin this week: "Lawmakers treat the Internet like it's Telephone 2.0, the Second Coming of Video on Demand, or the World's Number One Porn Distribution Service, but it's really the nervous system of the 21st Century. Unless we stop the trend toward depraved indifference in Internet law, making – and freedom – will die."

# Scratch 2.0: programming for kids, now in the browser

The MIT Media Lab's Lifelong Kindergarten Group has shipped version 2.0 of Scratch, the justly famed and much-loved programming language for kids. Scratch makes it easy to create powerful simulations and games, even for small kids (basically, if you can read, you're ready for Scratch). The new version of Scratch runs right in a browser (no downloads or installs required), and is remarkable in its polish and power to excite. The programming environment is embedded in a sharing and shareable community, with millions of Scratch projects ready to be downloaded and remixed. It's just amazing.

With Scratch, you can program your own interactive stories, games, and animations — and share your creations with others in the online community.

Scratch helps young people learn to think creatively, reason systematically, and work collaboratively — essential skills for life in the 21st century.

# OpenWorm: an artificial life sim of an earthworm

Wagner James Au sez, "OpenWorm, as the name suggests, is a collaborative open source project to computationally create a simple artificial life form -- an earth worm -- from the cellular level to a point where it's sophisticated enough to solve basic problems. They're still in early stages, with the latest demo, a developer on the project tells me, being 'a particle simulation of five connected muscle segments moving together through a body of water.'"
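The demo described above can be approximated with a toy spring-and-drag model; a minimal sketch, not the project's actual code (the particle count, constants, and integrator here are invented for illustration):

```python
def step(positions, velocities, rest_len=1.0, k=10.0, drag=0.5, dt=0.01):
    """Advance a 1-D chain of particles joined by springs one time step.

    A toy stand-in for 'connected muscle segments moving through water':
    springs model the segments, a drag term models the surrounding fluid.
    """
    n = len(positions)
    forces = [0.0] * n
    for i in range(n - 1):
        # Hooke's law on the segment between particle i and i+1
        stretch = (positions[i + 1] - positions[i]) - rest_len
        forces[i] += k * stretch
        forces[i + 1] -= k * stretch
    # Semi-implicit Euler: update velocity first, then position
    new_v = [v + (f - drag * v) * dt for v, f in zip(velocities, forces)]
    new_p = [p + v * dt for p, v in zip(positions, new_v)]
    return new_p, new_v

# Five segments -> six particles; stretch the chain and let it relax.
pos = [i * 1.2 for i in range(6)]
vel = [0.0] * 6
for _ in range(2000):
    pos, vel = step(pos, vel)
```

After enough steps the drag damps the oscillation and each segment settles back toward its rest length.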

# HOWTO play Tetris forever

Given a standard Tetris engine (which drops pieces in a pseudorandom order, has previews, and allows holding), this method will allow you to play Tetris forever. As always, the most fascinating thing about this is the specialized vocabulary used to describe the method:

Worst case bag distributions such as H?XX?X? and H?XXX?? deserve a special mention. The first piece 'H' denotes a piece which must be placed in Hold in order to follow the STZ loop procedure. Pieces from the LJO loop are denoted by '?', and the remaining pieces are denoted by 'X'. Using 3 previews and Hold, it is only possible to see the first 4 pieces of the bag before the second piece enters the screen. This means you only see H?XX, and only know the first piece of the LJO loop. Because H must be put in Hold, you are forced to make a decision without knowing the order of the rest of the LJO loop. If the O comes first, you can follow the procedure above without problems. The rest of the time you will run into complications like this:
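For readers new to the vocabulary, here is a minimal sketch of the two mechanics the passage leans on: the standard 7-bag randomizer and the Hold slot. This is assumed guideline behavior, not the loop procedure itself:

```python
import random

PIECES = "IJLOSTZ"

def bag_sequence(n_bags, seed=0):
    """Standard 7-bag randomizer: each 'bag' is a shuffled copy of all
    seven pieces, so any piece can drought for at most 12 drops."""
    rng = random.Random(seed)
    seq = []
    for _ in range(n_bags):
        bag = list(PIECES)
        rng.shuffle(bag)
        seq.extend(bag)
    return seq

class Hold:
    """The Hold slot: stash the current piece and get back whatever was
    held before (None means: draw the next piece from the queue)."""
    def __init__(self):
        self.held = None

    def swap(self, piece):
        self.held, piece = piece, self.held
        return piece

seq = bag_sequence(3)
```

With 3 previews you see 4 pieces at once (current plus three), which is exactly why the H?XX prefix in the worst-case bags forces a blind decision.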

# Teaching TCP/IP headers with legos

A Hal Pomeranz post from 2010 suggests a great way to teach TCP/IP header structure to students: he builds header diagrams out of legos, then mixes them up and has the students reconstruct them.

The use of color here really highlights certain portions of the packet header. For example, the source and destination addresses and ports really jump out. But there are some other, more subtle color patterns that I worked in here. For example, if you look closely you’ll see that I matched the color of the ACK bit with the blue in the ACK number field. Similarly the colors of the SYN bit and the sequence number match, as do the URG bit and urgent pointer field.

Actually I wish I had a couple of more colors available. Yes, Lego comes in dozens of colors these days, but they only make 2×8 blocks (aka one “Lego Byte”) in six colors: White, Black, Red, Yellow, Blue, and Beige.

So while I tried to use Beige exclusively for size fields, Red for reserved bits, Yellow for checksums, and so on, I ultimately ended up having to use these colors for other fields as well -- for example, the yellow sequence number fields in the TCP header. Maybe I should have just bought a bunch of “nibbles” (2×4 blocks) in other colors and not been so choosy about using full “Lego Bytes”.
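The layout the bricks encode can also be written down in code; a sketch that packs the same 20-byte TCP header fields with Python's struct module (checksum left zero, no options):

```python
import struct

def tcp_header(src_port, dst_port, seq, ack, flags=0, window=65535):
    """Pack a minimal 20-byte TCP header (no options) in network byte order.

    The field order mirrors the Lego diagram: two 16-bit ports, 32-bit
    sequence and ACK numbers, data offset + flags, window, checksum,
    urgent pointer.
    """
    offset_flags = (5 << 12) | flags  # data offset = 5 words (20 bytes)
    return struct.pack("!HHIIHHHH",
                       src_port, dst_port, seq, ack,
                       offset_flags, window,
                       0,   # checksum (left zero in this sketch)
                       0)   # urgent pointer

hdr = tcp_header(12345, 80, seq=1, ack=0, flags=0x02)  # SYN
```

Each "Lego Byte" in the diagram corresponds to one byte of this packed structure, so the struct format string is essentially the brick layout written sideways.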

Since 2010, the Lego patent has expired and cheapish wire-extrusion 3D printing has become a reality -- and there are cool procedural models for generating arbitrary-sized bricks and labelling them with arbitrary type. Someone needs to make a printable TCP diagramming set on Thingiverse!

# Computer-vision boob-tracker

There's precious little info available about Mizirk's "Boob Tracker," a computer vision project (based on a Kinect?) that automatically detects boob-like objects and masks them with user-selectable bitmaps, following them as they move around the field of view. Mizirk's total delight in the performance of this little confection is what makes it.

(Thanks, Fipi Lele!)

# Controlling a robot arm with an Android phone

Paul sez, "This past semester, three engineering grad students at the University of Toronto (myself and two others) created an Android app for a course project that allows for wireless and intuitive control of a robotic arm from an Android-powered smartphone. We're pretty proud of the results (the link is to a demo we put together) and have released the code open source."

Android Robotic Manipulator Demo (Thanks, Paul!)

# Detect your pulse with your webcam

Thearn released a free/open program for detecting and monitoring your pulse using your webcam. The code is on github for you to download, play with and modify. If this stuff takes your fancy, be sure and read Eulerian Video Magnification for Revealing Subtle Changes in the World, an inspiring paper describing the techniques Thearn uses in his code:

This application uses openCV (http://opencv.org/) to find the location of the user's face, then isolate the forehead region. Data is collected from this location over time to estimate the user's heartbeat frequency. This is done by measuring average optical intensity in the forehead location, in the subimage's green channel alone. Physiological data can be estimated this way thanks to the optical absorption characteristics of oxygenated hemoglobin.

With good lighting and minimal noise due to motion, a stable heartbeat should be isolated in about 15 seconds. Other physiological waveforms, such as Mayer waves (http://en.wikipedia.org/wiki/Mayer_waves), should also be visible in the raw data stream.

Once the user's pulse signal has been isolated, temporal phase variation associated with the detected heartbeat frequency is also computed. This allows for the heartbeat frequency to be exaggerated in the post-process frame rendering, causing the highlighted forehead location to pulse in sync with the user's own heartbeat (in real time).

Support for pulse-detection on multiple simultaneous people in a camera's image stream is definitely possible, but at the moment only the information from one face is extracted for cardiac analysis.
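The core estimation step -- pulling the dominant frequency out of the green-channel time series -- can be sketched with numpy alone, using a synthetic signal standing in for real webcam data (the band limits and clip length here are illustrative, not Thearn's actual parameters):

```python
import numpy as np

def estimate_bpm(green_means, fps):
    """Estimate heart rate from a series of mean green-channel intensities.

    `green_means` holds one sample per video frame (the average green value
    of the forehead region); the strongest frequency in a plausible
    heart-rate band (0.75-4 Hz, i.e. 45-240 bpm) is taken as the pulse.
    """
    x = np.asarray(green_means, dtype=float)
    x = x - x.mean()                          # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= 0.75) & (freqs <= 4.0)   # plausible pulse range
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0

# Synthetic 15-second clip at 30 fps with a 72 bpm (1.2 Hz) pulse plus noise.
np.random.seed(0)
t = np.arange(15 * 30) / 30.0
signal = 100 + 0.5 * np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)
bpm = estimate_bpm(signal, fps=30)
```

The 15-second window matches the claim in the excerpt: at 30 fps that gives 450 samples, enough frequency resolution (about 4 bpm per FFT bin) to isolate a stable heartbeat.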

# Solving classic NES games computationally

Dr. Tom Murphy VII gave a research paper called "The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel . . . after that it gets a little tricky," (PDF) (source code) at SIGBOVIK 2013, in which he sets out a computational method for solving classic NES games. He devised two libraries for this: learnfun (learning function) and playfun (playing function). In this accompanying video, he chronicles the steps and missteps he took getting to a pretty clever destination.
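A toy illustration of the lexicographic-ordering idea at the heart of the paper (the memory addresses, simulate function, and inputs below are invented; the real playfun searches far more cleverly than this one-step greedy):

```python
def lexicographic_value(ram, locations):
    """Score a RAM snapshot by the bytes at `locations`, in priority order.

    Learnfun's core idea in miniature: treat an ordering of memory
    addresses as digits of a number, so "making progress" means the tuple
    of bytes increases lexicographically (score first, then level, then
    x-position, say).
    """
    return tuple(ram[addr] for addr in locations)

def best_input(simulate, ram, inputs, locations):
    """One greedy playfun step: try each input, keep whichever one makes
    the resulting RAM rank highest under the lexicographic objective."""
    return max(inputs,
               key=lambda i: lexicographic_value(simulate(ram, i), locations))

# Hypothetical 3-byte "RAM": [score, level, x-position].
def simulate(ram, button):        # stand-in for an emulator step
    ram = list(ram)
    if button == "right":
        ram[2] += 1               # moving right increases x-position
    elif button == "jump":
        ram[0] += 10              # pretend jumping grabs a coin
    return ram

choice = best_input(simulate, [0, 0, 5], ["left", "right", "jump"],
                    locations=[0, 1, 2])
```

Because score sits first in the ordering, the sketch prefers the coin over walking right, which is exactly the kind of myopia the paper's "time travel" lookahead exists to fix.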

# Algorithmically constructed news

In Wired, Steven Levy has a long profile of the fascinating field of algorithmic news-story generation. Levy focuses on Narrative Science, and its competitor Automated Insights, and discusses how the companies can turn "data rich" streams into credible news-stories whose style can be presented as anything from sarcastic blogger to dry market analyst. Narrative Science's cofounder, Kristian Hammond, claims that 90 percent of all news will soon be algorithmically generated, but that this won't be due to computers stealing journalists' jobs -- rather, it will be because automation will enable the creation of whole classes of news stories that don't exist today, such as detailed, breezy accounts of every little league game in the country.

Narrative Science’s writing engine requires several steps. First, it must amass high-quality data. That’s why finance and sports are such natural subjects: Both involve the fluctuations of numbers—earnings per share, stock swings, ERAs, RBI. And stats geeks are always creating new data that can enrich a story. Baseball fans, for instance, have created models that calculate the odds of a team’s victory in every situation as the game progresses. So if something happens during one at-bat that suddenly changes the odds of victory from say, 40 percent to 60 percent, the algorithm can be programmed to highlight that pivotal play as the most dramatic moment of the game thus far. Then the algorithms must fit that data into some broader understanding of the subject matter. (For instance, they must know that the team with the highest number of “runs” is declared the winner of a baseball game.) So Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event. But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various “angles” from the data. Who won the game? Was it a come-from-behind victory or a blowout? Did one player have a fantastic day at the plate? The algorithm considers context and information from other databases as well: Did a losing streak end?

Then comes the structure. Most news stories, particularly about subjects like sports or finance, hew to a pretty predictable formula, and so it’s a relatively simple matter for the meta-writers to create a framework for the articles. To construct sentences, the algorithms use vocabulary compiled by the meta-writers. (For baseball, the meta-writers seem to have relied heavily on famed early-20th-century sports columnist Ring Lardner. People are always whacking home runs, swiping bags, tallying runs, and stepping up to the dish.) The company calls its finished product “the narrative.”
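The pivotal-play selection Levy describes reduces to finding the largest win-probability swing and dropping it into a template; a hypothetical sketch (the plays, threshold, and template wording are invented, not Narrative Science's):

```python
def most_dramatic_play(plays):
    """Pick the play with the biggest swing in win probability.

    `plays` is a list of (description, win_prob_before, win_prob_after)
    tuples, standing in for the fan-built models the article mentions.
    """
    return max(plays, key=lambda p: abs(p[2] - p[1]))

def render(play):
    """A toy 'meta-writer' template: angle choice driven by the data."""
    desc, before, after = play
    swing = "swung the game" if abs(after - before) >= 0.15 else "nudged the odds"
    return (f"{desc} {swing}, moving the home team's chances "
            f"from {before:.0%} to {after:.0%}.")

plays = [
    ("A leadoff single", 0.50, 0.53),
    ("A two-run homer in the eighth", 0.40, 0.62),
    ("A routine groundout", 0.62, 0.60),
]
story_line = render(most_dramatic_play(plays))
```

The real system layers rules, vocabulary, and cross-database context on top, but the data-to-angle-to-sentence pipeline has this shape.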

Both companies claim that they'll be able to make sense of less-quantifiable subjects in the future, and will be able to generate stories about them, too.

# Corrected notes on the feeding of yes to yes

This morning, I posted M Tang's funny experiment in feeding the Unix "yes" command to itself. Now, Seth David Schoen writes in to correct and expand upon the principles therein:

yes `yes no`

is based on a bit of a misconception. The problem is _not_ about combining one yes command with another yes command. Whenever you use the backtick syntax, like in a hypothetical command

foo `bar`

the shell will first run the command bar (to completion) before it even tries to start foo. The shell will also save the complete output of bar in memory, and then present it as a set of command-line arguments to foo.

In this case, the shell is trying to run the command "yes no" to completion, saving its output in memory, before even starting the other yes command. Of course, "yes no" never finishes, but it does use up an arbitrarily large amount of memory.

To see that the problem is with the use of yes rather than with the combination of two yes commands, just try

echo `yes no`

or even

true `yes no`

Both of these forms have exactly the same memory-consumption problem as the original command, and for exactly the same reason! So, Tang is wrong to think that he is somehow creating a problem by combining multiple yesses. The problem is in asking the shell to remember an infinite amount of output.

As other people have mentioned in comments, the backtick syntax is also not piping. Piping is done with |, while backticks perform substitution. The distinction is whether the output of program A appears as input to program B (piping) or as command-line arguments to program B (substitution). For example,

echo foo bar | wc -w

outputs the number 2 (that's the total number of words in the text "foo bar"), while

wc -w `echo foo bar`

counts the number of words in the files foo and bar.

# Stupid Unix trick: why you shouldn't pipe yes into yes

Update: M Tang's explanation for this is wrong, but Seth Schoen sent us a great correction.

There's a GNU-coreutils program called yes whose function is to "output a string repeatedly until killed." M Tang tried piping the output of one yes command into another. It ended badly:

Taking a look at the source code for yes, it looks like the single argument is being stored in a char array, then, in a while(true) and for loop, each character is printed to the stdout, followed by a new line (\n) character.

So when we use the output of one yes command as the argument for another, the outer yes command fills up the computer’s memory with the output of the inner yes command. Then I have to restart my computer and feel stupid.

yes 'yes no' (via Hacker News)

# Aaron Swartz's unfinished monograph on the "programmable Web"

Michael B. Morgan, CEO of Morgan & Claypool Publishers, writes:

In 2009, we invited Aaron Swartz to contribute a short work to our series on Web Engineering (now The Semantic Web: Theory and Technology). He produced a draft of about 40 pages -- a "first version" to be extended later -- which unfortunately never happened.

After his death in January, we decided (with his family's blessing) that it would be a good idea to publish this work so people could read his ideas about programming the Web, his ambivalence about different aspects of Semantic Web technology, his thoughts on Openness, and more.

(Thanks, Michael!)

# Big Data: A Revolution That Will Transform How We Live, Work, and Think

Big Data is a new book from Viktor Mayer-Schonberger, a respected Internet governance theorist; and Kenneth Cukier, a long-time technology journalist who has been at the Economist for many years. As the title and pedigree imply, this is a business-oriented book about "Big Data," a computational approach to business, regulation, science and entertainment that uses data-mining applied to massive, Internet-connected data-sets to learn things that previous generations weren't able to see because their data was too thin and diffuse.

Big Data is an eminently practical and sensible book, but it's also an exciting and excitable text, one that conveys enormous enthusiasm for the field and its fruits. The authors use well-chosen examples to show how everything from shipping logistics to video-game design to healthcare stand to benefit from studying the whole data-set, rather than random samples. They even pose this as a simple way of thinking of big data versus "small data." Small data relies on statistical sampling, and emphasises the reliability and accuracy of each measurement. With big data, you sample the entire pool of activities -- all the books sold, all the operations performed -- and worry less about inaccuracies and anomalies in individual measurements, because these are drowned out by the huge numbers of observations performed.
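That claim about inaccuracies being drowned out is the law of large numbers at work; a quick sketch comparing a small careful sample with a huge sloppy one (the noise levels and sample sizes are invented for illustration):

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

random.seed(42)
true_value = 10.0

# "Small data": 100 careful, low-noise measurements.
small = [true_value + random.gauss(0, 0.1) for _ in range(100)]

# "Big data": a million sloppy measurements, each 50x noisier.
big = [true_value + random.gauss(0, 5.0) for _ in range(1_000_000)]

small_err = abs(mean(small) - true_value)
big_err = abs(mean(big) - true_value)
```

Despite each big-data measurement being far worse, the standard error of the mean shrinks with the square root of the sample count, so the sloppy million lands at least as close to the truth as the careful hundred.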

As you'd expect, Big Data is particularly fascinating when it explores the business implications of all this: the changing leverage between firms that own data versus the firms that know how to make sense of it, and why sometimes data is best processed by unaffiliated third parties who can examine data from rival firms and find out things from which all parties stand to benefit, but which none of them could have discovered on their own. They also cover some of the bigger Big Data business blunders through history -- companies whose culture blinkered them to the opportunities in their data, which were exploited by clever rivals.

The last fifth of the book is dedicated to issues of governance, regulation, and public policy. This is some of the most interesting material in the book and probably needs to be expanded into its own volume. As it is, there's a real sense that the authors are just scraping the surface. For example, many of the stories told in the book have deep privacy implications, and the authors make a point of touching on these, cabining them with phrases like "so long as the data is anonymized" or "adhering to privacy policy, of course." But in the final third, the authors examine the transcendental difficulty of real-world anonymization, and the titanic business blunders committed by firms that believed they'd stripped out the personal information from the data, only to have the data "de-anonymized" and their customers' privacy invaded in small and large ways. These two facts -- that many of the opportunities require effective anonymization and that no one knows how to do anonymization -- are a pretty big stumbling block in the world of Big Data, but the authors don't explicitly acknowledge the conundrum.

While Big Data is an excellent primer on the opportunities of the field, it's thin on the risks, overall. For example, Big Data is rightly fascinated with stories about how we can look at data sets and find predictors of consequential things: for example, when Google mined its query-history and compared it with CDC data on flu outbreaks, it found that it could predict flu outbreaks ahead of the CDC, which is amazingly useful. However, all those search-strings were entered by people who didn't expect to have them mined for subsequent action. If searching for "scratchy throat" and "runny nose" gets your neighborhood quarantined (or gets it extra healthcare dollars), you might get all your friends to search on those terms over and over -- or not at all. Google knows this -- or it should -- because when it started measuring the number of links between sites to define the latent authority of different parts of the Internet, it got great results, but immediately triggered a whole scummy ecosystem of linkfarms and other SEO tricks that create links whose purpose is to produce more of the indicators Google is searching for.

Another important subject is looking at algorithmic prediction in domains where the outcome is punishment, instead of reward. British Airways may get great results from using an algorithm to pick out passengers for upgrades, trying to find potential frequent fliers. But we should be very cautious about applying the same algorithm to building the TSA's No-Fly list. If BA's algorithm fails 20% of the time, it just means that a few lucky people get to ride up front of the plane. If the TSA has a 20% failure rate, it means that one in five "potential terrorists" is an innocent whose fundamental travel rights have been compromised by a secretive and unaccountable algorithm.

Secrecy and accountability are the third important area for examination in a Big Data world. Cukier and Mayer-Schonberger propose a kind of inspector-general for algorithms who'll make sure they're not corrupted to punish the undeserving or line someone's pockets unjustly. But they also talk about the fact that these algorithms are likely to be illegible -- the product of a continuously evolving machine-learning system -- and that no one will be able to tell you why a certain person was denied credit, refused insurance, kept out of a university, or blackballed for a choice job. And when you get into a world where you can't distinguish between an algorithm that gets it wrong because the math is unreliable (a "fair" wrong outcome) from an algorithm that gets it wrong because its creators set out to punish the innocent or enrich the undeserving, then we can't and won't have justice. We know that computers make mistakes, but when we combine the understandable enthusiasm for Big Data's remarkable, counterintuitive recommendations with the mysterious and oracular nature of the algorithms that produce those conclusions, then we're taking on a huge risk when we put these algorithms in charge of anything that matters.

# How an algorithm came up with Amazon's KEEP CALM AND RAPE A LOT t-shirt

You may have heard that Amazon is selling a "KEEP CALM AND RAPE A LOT" t-shirt. How did such a thing come to pass? Well, as Pete Ashton explains, this is a weird outcome of an automated algorithm that just tries random variations on "KEEP CALM AND," offering them for sale in Amazon's third-party marketplace and printing them on demand if any of them manage to find a buyer.

The t-shirts are created by an algorithm. The word “algorithm” is a little scary to some people because they don’t know what it means. It’s basically a process automated by a computer programme, sometimes simple, sometimes complex as hell. Amazon’s recommendations are powered by an algorithm. They look at what you’ve been browsing and buying, find patterns in that behaviour and show you things the algorithm thinks you might like to buy. Amazon’s algorithms are very complex and powerful, which is why they work. The algorithm that creates these t-shirts is not complex or powerful. This is how I expect it works.

1) Start a sentence with the words KEEP CALM AND.

2) Pick a word from this long list of verbs. Any word will do. Don’t worry, I’m sure they’re all fine.

3) Finish the sentence with one of the following: OFF, THEM, IF, THEM or US.

4) Lay these words out in the classic Keep Calm style.

5) Create a mockup jpeg of a t-shirt.

6) Submit the design to Amazon using our boilerplate t-shirt description.

7) Go back to 1 and start again.

There are currently 529,493 Solid Gold Bomb clothing items on Amazon. Assuming they survive this and don’t get shitcanned by Amazon I wouldn’t be at all surprised if they top a million in a few months.

It costs nothing to create the design, nothing to submit it to Amazon and nothing for Amazon to host the product. If no-one buys it then the total cost of the experiment is effectively zero. But if the algorithm stumbles upon something special, something that is both unique and funny and actually sells, then everyone makes money.
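The seven steps Ashton lists reduce to blind string generation; a sketch of steps 1-3 (the verb list here is a tiny stand-in for the "long list of verbs" -- the point being that nothing in the loop ever reviews what the verbs mean):

```python
import random

VERBS = ["CARRY", "ROCK", "CODE", "DANCE"]  # stand-in for the long verb list
ENDINGS = ["OFF", "THEM", "IF", "US"]       # the endings from step 3

def keep_calm_slogans(n, seed=0):
    """Steps 1-3 of the process above: KEEP CALM AND <verb> <ending>.

    Steps 4-7 (layout, mockup, submission) are the parts that talk to
    Amazon, so they're out of scope for a sketch.
    """
    rng = random.Random(seed)
    return [f"KEEP CALM AND {rng.choice(VERBS)} {rng.choice(ENDINGS)}"
            for _ in range(n)]

slogans = keep_calm_slogans(5)
```

With a verb list thousands of entries long and zero marginal cost per design, an unvetted verb like "RAPE" surfacing eventually is not a freak accident but the expected outcome.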

# Profane commit-messages from GitHub

Commit Logs From Last Night: highlights funny, profane source-code commit-messages from GitHub, as bedraggled hackers find themselves leaving notes documenting their desperate situations. Some recent ones:

WHY THE GODDAMMIT WHY WHY WHY HAROGIHAROGIAHRGOIA FUCK ME

render testing I DREW SOME LINES! reverted render panel to grew (white looks shit)

Merge pull request #15 from ruvetia/font_awesome_is_fucking_awesome include font-awesome into the projcet

# Students get class-wide As by boycotting test, solving Prisoner's Dilemma

Johns Hopkins computer science prof Peter Fröhlich grades his students' tests on a curve -- the top-scoring student gets an A, and the rest of the students are graded relative to that brainiac. But last term, his students came up with an ingenious, cooperative solution to this system: they all boycotted the test, meaning that they all scored zero, and that zero was the top score, and so they all got As. The prof was surprisingly cool about it:

Fröhlich took a surprisingly philosophical view of his students' machinations, crediting their collaborative spirit. "The students learned that by coming together, they can achieve something that individually they could never have done," he said via e-mail. “At a school that is known (perhaps unjustly) for competitiveness I didn't expect that reaching such an agreement was possible.”

The story of the boycott is a sterling example of how computer networks solve collective action problems -- the students solved a prisoner's dilemma in a mutually optimal way without having to iterate, which is impressive:

“The students refused to come into the room and take the exam, so we sat there for a while: me on the inside, they on the outside,” Fröhlich said. “After about 20-30 minutes I would give up.... Then we all left.” The students waited outside the rooms to make sure that others honored the boycott, and were poised to go in if someone had. No one did, though.

Andrew Kelly, a student in Fröhlich’s Introduction to Programming class who was one of the boycott’s key organizers, explained the logic of the students' decision via e-mail: "Handing out 0's to your classmates will not improve your performance in this course," Kelly said.

"So if you can walk in with 100 percent confidence of answering every question correctly, then your payoff would be the same for either decision. Just consider the impact on your other exam performances if you studied for [the final] at the level required to guarantee yourself 100. Otherwise, it's best to work with your colleagues to ensure a 100 for all and a very pleasant start to the holidays."
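The grading mechanics that make the boycott work can be sketched in a few lines (the 90-percent-of-top-score cutoff is invented for illustration; the key case is the zero top score):

```python
def curved_grades(scores):
    """Grade on a curve: the top score anchors an A, everyone else is
    graded relative to it.

    A toy model of the system described above: if every student boycotts
    (all zeros), the top score is 0, everyone matches it, and the whole
    class gets an A.
    """
    top = max(scores)
    if top == 0:
        return ["A"] * len(scores)  # everyone ties the top scorer
    return ["A" if s / top >= 0.9 else "not A" for s in scores]

boycott = curved_grades([0, 0, 0, 0])
normal = curved_grades([100, 85, 92, 40])
```

The catch, and the reason this is a prisoner's dilemma, is that a single defector who walks in and scores anything above zero resets the curve and zeroes out everyone else.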

Fröhlich has changed the grading system -- but he's also now offering the students a final project instead of a final exam, should they choose.

Dangerous Curves [Zack Budryk/Inside Higher Ed]

# Malware-Industrial Complex: how the trade in software bugs is weaponizing insecurity

Here's a must-read story from Tech Review about the thriving trade in "zero-day exploits" -- critical software bugs that are sold off to military contractors to be integrated into offensive malware, rather than reported to the manufacturer for repair. The stuff built with zero-days -- network appliances that can snoop on a whole country, even supposedly secure conversations; viruses that can hijack the camera and microphone on your phone or laptop; and more -- is the modern equivalent of landmines and cluster bombs: antipersonnel weapons that end up in the hands of criminals, thugs and dictators who use them to figure out whom to arrest, torture, and murder. The US government is encouraging this market by participating actively in it, even as it makes a lot of noise about "cyber-defense."

Exploits for mobile operating systems are particularly valued, says Soghoian, because unlike desktop computers, mobile systems are rarely updated. Apple sends updates to iPhone software a few times a year, meaning that a given flaw could be exploited for a long time. Sometimes the discoverer of a zero-day vulnerability receives a monthly payment as long as a flaw remains undiscovered. “As long as Apple or Microsoft has not fixed it you get paid,” says Soghoian.

No law directly regulates the sale of zero-days in the United States or elsewhere, so some traders pursue it quite openly. A Bangkok, Thailand-based security researcher who goes by the name “the Grugq” has spoken to the press about negotiating deals worth hundreds of thousands of dollars with government buyers from the United States and western Europe. In a discussion on Twitter last month, in which he was called an “arms dealer,” he tweeted that “exploits are not weapons,” and said that “an exploit is a component of a toolchain … the team that produces & maintains the toolchain is the weapon.”

The Grugq contacted MIT Technology Review to state that he has made no “public statement about exploit sales since the Forbes article.”

# Flying malware: the Virus Copter

At the latest San Francisco Drone Olympics (now called DroneGames, thanks, no doubt, to awful bullying from the organized crime syndicate known as the International Olympic Committee), there were many fascinating entries, but the champion was James "substack" Halliday's Virus-Copter (github), which made wireless contact with its competitors, infected them with viruses that put them under its control, sent them off to infect the rest of the cohort, and then caused them to "run amok."

Many people have written to point out that Virus-Copter shares some DNA with one of the plot elements in my novel Pirate Cinema, but I assure you the resemblance is entirely coincidental. Drones, after all, are stranger than technothrillers.

Here's the $300 drone the competitors were flying.