Spidering Word files for embarrassing metadata

A hacker spidered every English-language Microsoft site and sucked down the Word documents he found, then used a script to identify interesting erasures left behind by the revision-tracking feature. Some interesting stuff fell out of his investigation.

A pointless idea came to my mind that instant: why not run a gentle web spider against all Microsoft sites in English, specifically looking for other instances of tracking data not removed from documents? I coded a bunch of scripts and let them run through the night, fetching approximately 10,000 unique documents; over 10% were identified as containing change tracking records. I decided to collect only those with deleted text still present, yielding a crop of over 5% of all documents. Quite impressive. Below, you will find a brief (and rest assured, incomplete) list of the most entertaining samples I've run into, along with some speculation (and only speculation) as to the reasons we see them.
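
The excerpt does not say how the scripts spotted deleted text, so the following is only a rough sketch of the idea, not the author's method. It assumes the modern .docx format, where tracked deletions are stored in word/document.xml as `w:del` elements wrapping `w:delText` runs. The function names `deleted_text` and `make_sample_docx` are made up for illustration, and the sample archive holds just enough XML for the parser to work; it is not a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def deleted_text(docx_bytes):
    """Return the text of each tracked deletion found in a .docx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    fragments = []
    # Each tracked deletion is a <w:del> element containing <w:delText> runs.
    for deletion in root.iter(f"{{{W_NS}}}del"):
        text = "".join(
            node.text or "" for node in deletion.iter(f"{{{W_NS}}}delText")
        )
        if text:
            fragments.append(text)
    return fragments


def make_sample_docx():
    """Build a tiny in-memory archive for demonstration (hypothetical content)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>Our product is </w:t></w:r>"
        '<w:del w:id="1" w:author="someone">'
        "<w:r><w:delText>slower than</w:delText></w:r>"
        "</w:del>"
        "<w:r><w:t> comparable to the competition.</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


if __name__ == "__main__":
    for fragment in deleted_text(make_sample_docx()):
        print("deleted:", fragment)
```

A spider built on this would fetch each document, pass its bytes to `deleted_text`, and keep only the files that return a non-empty list, which matches the filtering step described above.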

(Thanks, Eli the Bearded!)