Wired's Kim Zetter rounds up some of the highlights from Untangling the Web: A Guide to Internet Research [PDF], an NSA-produced guide to finding unintentionally published confidential material on the Web, released in response to a Muckrock Freedom of Information Act request. As Zetter notes, the tactics discussed are described as legal, but are the kind of thing that weev is doing 3.5 years in a Federal pen for:
Want to find spreadsheets full of passwords in Russia? Type “filetype:xls site:ru login.” Even on websites written in non-English languages the terms “login,” “userid,” and “password” are generally written in English, the authors helpfully point out.
Misconfigured web servers “that list the contents of directories not intended to be on the web often offer a rich load of information to Google hackers,” the authors write, then offer a command to exploit these vulnerabilities — intitle: “index of” site:kr password.
“Nothing I am going to describe to you is illegal, nor does it in any way involve accessing unauthorized data,” the authors assert in their book. Instead it “involves using publicly available search engines to access publicly available information that almost certainly was not intended for public distribution.” You know, sort of like the “hacking” for which Andrew “weev” Auernheimer was recently sentenced to 3.5 years in prison for obtaining publicly accessible information from AT&T’s website.
Use These Secret NSA Google Search Tips to Become Your Own Spy Agency
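The two queries quoted above follow the same pattern: a few search operators (filetype:, site:, intitle:) followed by plain keywords. Here is a minimal Python sketch of how such a query string could be assembled. The function names build_query and search_url are hypothetical, the operators are written without a space after the colon (the usual form, unlike the spacing in the quoted excerpt), and the search URL shape is an assumption rather than anything taken from the NSA guide.

```python
from urllib.parse import urlencode


def build_query(terms, filetype=None, site=None, intitle=None):
    """Assemble a search query from operators and plain keywords.

    Hypothetical helper for illustration; operator order is
    filetype, intitle, site, then the keywords.
    """
    parts = []
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        # Quote the phrase so it is matched as a whole in the page title.
        parts.append(f'intitle:"{intitle}"')
    if site:
        parts.append(f"site:{site}")
    parts.extend(terms)
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query; the base URL is an assumption."""
    return f"{base}?{urlencode({'q': query})}"


spreadsheet_query = build_query(["login"], filetype="xls", site="ru")
listing_query = build_query(["password"], site="kr", intitle="index of")

print(spreadsheet_query)  # filetype:xls site:ru login
print(listing_query)      # intitle:"index of" site:kr password
print(search_url(spreadsheet_query))
```

Nothing here does more than build a string: the "tactic" is entirely in the choice of operators, which is the point the authors make about the information being publicly searchable.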
A NY federal judge handed down a terrible ruling in AP vs Meltwater, which turned on whether providing a search-engine for newswire articles that showed the first sentence or two of the article was fair use. The Electronic Frontier Foundation's Corynne McSherry sums up many of the ways in which this judge got it wrong. We can only hope for an appeal and a better ruling.
Second, the court implicitly adopted AP’s dangerous “heart of the work” theory. AP contended that sharing excerpts of a news article must weigh against fair use if those excerpts contain the lede. The court stressed that the lede is “consistently important” and takes “significant journalistic skill to craft.” But that is beside the point – there is no extra protection because something is extra difficult. More important to the fair use analysis is the fact that [the lede] (1) is primarily factual; and (2) contains precisely the information the user wishes to make known to others. As we explained in our amicus brief, this case illustrates why the heart of the work doctrine does not mesh well with highly factual, published, news articles. When it comes to news articles, an excerpt that is shared will very often be the most “important” aspect of the work – but that importance will derive from the uncopyrightable factual content, not the expression. It is not the “heart of the work,” but a piece of the factual skeleton upon which the expression hangs.
AP v. Meltwater: Disappointing Ruling for News Search
The Calvin & Hobbes Search Engine performs pretty much as you'd expect: it's a search engine that runs against the full text and descriptions of all the Calvin and Hobbes strips. For example, a search for "snowman" returns,
Mom is sitting at the table when Calvin walks by dressed in his coat and hat. Puzzled, Mom goes upstairs and opens the bedroom door. There, she finds Calvin has opened the window letting snow into the room. Calvin is working on a snowman. Mom just covers her face.
and several others. Handy!
Calvin & Hobbes Search Engine - by Bing
(Thanks, Fipi Lele!)
My latest Guardian column is "Google admits that Plato's cave doesn't exist," a discussion of how Google has changed the way it talks about its search-results, shifting from the stance that rankings are a form of pure math to the stance that rankings are a form of editorial judgment.
Google has, to date, always refused to frame itself in those terms. The pagerank algorithm isn't like an editor arguing aesthetics around a boardroom table as the issue is put to bed. The pagerank algorithm is a window on the wall of Plato's cave, whence the objective, empirical world of Relevance may be seen and retrieved.
That argument is a convenient one when the most contentious elements of your rankings are from people who want higher ranking. "We have done the maths, and your page is empirically less relevant than the pages above it. Your quarrel is with the cold, hard reality of numbers, not with our judgement."
The problem with that argument is that maths is inherently more regulatable than speech. If the numbers say that item X must be ranked over item Y, a regulator may decide that a social problem can be solved by "hard-coding" page Y to have a higher ranking than X, regardless of its relevance. This isn't censorship – it's more like progressive taxation.
Google admits that Plato's cave doesn't exist
Greg from the British Columbia Civil Liberties Association sez, "The BCCLA is releasing its 'Electronic Devices Privacy Handbook' (PDF) on Monday. It's a know-your-rights guide and a how-to manual designed to help you keep your data and devices secure when you cross the border into Canada. If you're in Vancouver, BC, handbook author Greg McMullen is giving a talk to officially launch the handbook on Monday, March 5 at 12:30 at UBC Law." This is a very interesting document -- did you know that you don't have to unlock/decrypt your password for Canadian border officials without a court order (though they'll happily ghost your hard disk and try to brute force it, and Greg adds, "you might also get arrested (or refused entry, if not Canadian) for failing to provide your password if they are feeling especially mean")?
[Video Link] The hard-working star of Der Untergang learns of a recently-launched set of tweaks to Google search results that push Google+ content to the top, integrating social information into search. Steven Levy has a smart piece up today on the launch of Google's "Search Plus Your World" (SPYW) at Wired.com. Internet critics will likely not be the only ones filing complaints about SPYW: The FTC may well have issues with this, too. (via @elinormills)
Google has changed its procedures to enable "forward secrecy" by default on all its search-traffic. This means that part of the key needed to decrypt the traffic is never stored, so that in the event that there is a security breach at Google, older, intercepted traffic can't be descrambled. It's the absolute best practice for secure communications, and Google is to be commended for adopting it.
Other web sites have implemented HTTPS with forward secrecy before — we have it enabled by default on https://www.eff.org/ — but it hasn’t yet been rolled out on a site of Google’s scale. Some sites have publicly resisted implementing forward secrecy because it is more CPU intensive than standard HTTP or HTTPS. In order to address that problem, Google made improvements to the open source OpenSSL library, and has incorporated those changes into the library for anybody to use.
Forward secrecy is an important step forward for web privacy, and we encourage sites, big and small, to follow Google’s lead in enabling it!
Long Term Privacy with Forward Secrecy
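One way to see whether a given HTTPS connection has forward secrecy is to look at the cipher suite it negotiated: suites that use an ephemeral Diffie-Hellman key exchange (names beginning with ECDHE or DHE in OpenSSL's naming) have the property described above, and TLS 1.3 always uses an ephemeral exchange. The Python sketch below uses the standard ssl module. The helper names offers_forward_secrecy and check_host are hypothetical, and the name-prefix heuristic is a simplification for illustration, not a full audit of a server's configuration.

```python
import socket
import ssl


def offers_forward_secrecy(cipher_name, protocol=""):
    """Heuristic: does this negotiated cipher suite use ephemeral keys?

    Assumes OpenSSL-style or IANA-style suite names. TLS 1.3 suites
    always use an ephemeral key exchange, so they count as forward secret.
    """
    if protocol == "TLSv1.3":
        return True
    prefixes = ("ECDHE-", "DHE-", "TLS_ECDHE_", "TLS_DHE_")
    return cipher_name.startswith(prefixes)


def check_host(host, port=443, timeout=5.0):
    """Connect to host and report the negotiated suite (needs network access)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # cipher() returns (suite name, protocol version, secret bits).
            name, protocol, _bits = tls.cipher()
    return name, protocol, offers_forward_secrecy(name, protocol)


if __name__ == "__main__":
    # Offline examples using well-known OpenSSL suite names.
    print(offers_forward_secrecy("ECDHE-RSA-AES128-GCM-SHA256", "TLSv1.2"))  # True
    print(offers_forward_secrecy("AES128-SHA", "TLSv1.2"))  # False
```

Calling check_host("www.eff.org") would report what that server negotiates with your client at that moment; the result depends on both ends' configuration, so it is a spot check rather than proof of a site's policy.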
A report from Ethnographic Research in Illinois Academic Libraries documents research done on "digital native" college students to evaluate their skill with refining searches, evaluating search results, and navigating thorny questions of authority and trust in online sources. The researchers concluded that despite their subjects' fluency with the technology, the mental models and critical thinking they brought to bear on search results had real problems:
The prevalence of Google in student research is well-documented, but the Illinois researchers found something they did not expect: students were not very good at using Google. They were basically clueless about the logic underlying how the search engine organizes and displays its results. Consequently, the students did not know how to build a search that would return good sources. (For instance, limiting a search to news articles, or querying specific databases such as Google Book Search or Google Scholar.)
Duke and Asher said they were surprised by “the extent to which students appeared to lack even some of the most basic information literacy skills that we assumed they would have mastered in high school.” Even students who were high achievers in high school suffered from these deficiencies, Asher told Inside Higher Ed in an interview.
In other words: Today’s college students might have grown up with the language of the information age, but they do not necessarily know the grammar.
Lisa Gold concludes: "How are students supposed to acquire these important digital and information literacy skills if they aren’t being taught in schools, many parents and teachers lack these skills themselves, and the librarians who have the skills are basically ignored or fired as libraries close in record numbers?"
What Students Don't Know (Inside Higher Education)
Yet another study shows that “digital natives” suck at searching (Lisa Gold)
(Image: Digital Native, a Creative Commons Attribution Share-Alike (2.0) image from wakingtiger's photostream)