Documentation Gathering, Sanitization, and Storage: an excerpt from "A Public Service"

[Yesterday, we published my review of Tim Schwartz's new guide for whistleblowers, A Public Service: Whistleblowing, Disclosure and Anonymity; today, I'm delighted to include this generous excerpt from Schwartz's book. Schwartz is an activist whom I've had the pleasure of working with and I'm delighted to help him get this book into the hands of the people who need to read it. -Cory]


As you collect documents and bring new information to light, be aware that you are in an escalating digital arms race. There will always be new ways that data forensics can identify you, or uncover information based on data that you inadvertently leave in your files, or data that is retained in logs noting who has accessed what files on what network. Recently it was discovered that noise from electrical grids can be used to quite accurately pinpoint when, and potentially where, an audio recording was made. The best way to win this war—or at least to avoid becoming collateral damage—is to work outside the standard methods and find partners who have experience.

Of course, the actual collection of documents has changed dramatically over the years. In 1969, Daniel Ellsberg systematically removed documents, including the Pentagon Papers, from the RAND Corporation in his briefcase, taking them to an advertising agency where he (sometimes with the help of his 13-year-old son) photocopied them, one page at a time. Though this took enormous courage and psychological stamina—and in 1969 all that copying was certainly time-consuming and undoubtedly tiresome—it was also technologically straightforward and relatively safe. As long as the guards didn't stop and check his briefcase, and as long as no one saw him remove and return the reports, Ellsberg could duplicate the papers undetected.

If Ellsberg were trying to do the same thing in 2019 with physical documents, he would have to be sure there weren't cameras looking over his shoulder. He would have to make sure that the documents themselves didn't have watermarks that would lead back to him. And he would have to make sure that the copying method didn't log his activity. If Ellsberg's 21st-century counterpart were to take digital documents, there would be many more potential technological risks and traps to avoid along the way.

Take Notes

Before you start collecting documents or even trying to tell anyone about the wrongs you want to expose, start documenting what you see. Jesselyn Radack, who heads the Whistleblower and Source Protection Program at ExposeFacts and has worked with Thomas Drake and Edward Snowden, says the first step is to "just keep your own little record at home in a little notebook." This should be a notebook where you methodically record everything pertinent to the wrongs you want to expose: everything that you see, everything that you hear, and everything that you say. Do this as often as you can, the same day that incidents occur. Note the time and date of each occurrence. Above all, your notes should always include any complaints you raise and to whom, as well as any retaliation against you for doing so.

This approach to notetaking played a critical role in the big Russian sports doping scandal in 2016. Grigory Rodchenkov, the whistleblower and former doctor of the Russian Olympic team, took incredibly detailed contemporaneous notes that became compelling evidence. The notes included Rodchenkov's interactions with Russian coaches, officials, and athletes, such as how and when he provided performance-enhancing drugs to athletes, and how the doping was hidden from Olympic observers and their drug tests. Aside from all of these incriminating notes, as the New York Times reported, Rodchenkov also recorded details of his daily activities, such as "6:30, I took a shower, had a smoke, got ready, had hot cereal and farmer's cheese at breakfast." These seemingly trivial details helped convince the judges to allow the journal to be considered credible evidence in the court case.

The technology you use to take notes can either help or hinder those who might seek to access and/or destroy any information you have, depending upon your situation. You can use a physical notebook, good old pen and paper, or notes on an anonymous laptop or tablet. But be sure to stay away from making entries at work or on your personal computer unless you are highly technically confident of your computer's security.

"Documentation is very important," says Debra Katz, founding partner of Katz, Marshall & Banks, LLP and the lawyer who represented Christine Blasey Ford when she was called to testify during the Kavanaugh confirmation hearings. "We increasingly have people who show up with videotapes of harassment. I've had clients who've had their iPhone rolling as their employer, predictably, would come in and do back massages or make sexual remarks." Logs of text messages on phones or even recordings of interactions can be crucial to demonstrating that harassment is taking place. Save logs of all of your conversations and interactions, because you never know how they might prove useful later on.

The text messages sent by Mike Isabella and partners to Chloe Caras (who was also represented by Debra Katz) were used as evidence in the lawsuit that eventually took down Mike Isabella Concepts restaurants for sexual harassment. If you are going to attempt to record interactions as evidence, be sure that you are aware of the relevant recording laws. In some states and countries, you must inform the other party that you are recording and you must obtain their consent to be recorded. These laws are collectively known as two-party consent laws. Do more research into your context before you start shooting video or recording audio as documentation. You don't want your evidence thrown out of court. You don't want to be sued for releasing the recording. The Reporters Committee for Freedom of the Press is a good place to learn more about two-party consent laws in the United States.

Recommended Collection Approach

In New York City in 1953, a newspaper boy was finishing his day, jingling his coins around, when he noticed that one nickel felt lighter than the rest. When he dropped the coin on the floor, it split open, revealing a tiny photograph with numbers. This turned out to be microfilm that was destined for Soviet spy Reino Hayhanen. In 1957, Hayhanen defected to the U.S., where he exposed the spycraft of the Soviets to the FBI. This included the use of microfilm and dead drops for communication. Though this example may seem far from the world of computers and smartphones, taking photos of documents with microfilm is much safer than taking the actual documents, in the same way taking a digital photo is safer than copying the digital document. In such a case, there is far less potential for a log of the interaction.

The current best way to gather information is by taking pictures of documents or computer screens using a pseudonymous digital device. This method effectively circumvents all of the normal digital surveillance systems that might come into play when you copy data off of a network or onto a USB stick (e.g., logs of the copying or digital watermarking). It also circumvents any logging software that may be installed on your computer. Company or government tracking software can record the actions of taking screenshots or other mouse and keyboard actions. Evidence from one of these loggers was used by the FBI against Terry Albury, an FBI field agent who was sent to jail for disclosing classified information to The Intercept. In an affidavit in support of the search warrant, the FBI cited a number of facts, including that Albury had "conducted cut and paste activity" while viewing one of the classified documents. This fact could only have been gathered by latent logging software installed on his computer or built into a viewing program. By skipping digital copying or screenshotting, and instead simply taking a picture of the computer screen, you can circumvent some of these monitoring systems. Of course, if you are logged in and have a document open, you should assume that there is a log of the access as well.

Keep these tips in mind:

  • Only use a pseudonymous device for taking photos; never use your personal or work device.
  • Use a small tablet with Wi-Fi turned off instead of a phone; this way there will be no location information stored as metadata in the photos.
  • Make sure the photos don't have any identifying information in them; this could be your hand, your reflection on the computer screen, images of your office, or other identifying information or marks on your computer screen.
  • Be sure to check the images afterwards for any metadata or accidental information captured, and make sure to sanitize the images if necessary.
  • Audio and video recordings can potentially replace taking photos, but these types of files can be harder to sanitize.
  • Be sure there aren't video cameras that could capture you in the act of taking photos.


Do not trust printers. Color laser printers and copiers embed metadata in the documents that they print in the form of microdots, which are patterns of tiny yellow dots that are almost invisible to the naked eye. These dots encode information, much as QR codes do, including the printer's serial number, the time and date, the network address, and potentially other information. This data can be used to pinpoint when and where documents were printed, and potentially by whom. If you want to find out more on the topic, research the terms "printer steganography" and "machine identification code."

Regular and enhanced image of a printed page from an HP Color LaserJet 3700 showing yellow microdots. Photography by Florian Heise, in the public domain via Wikimedia Commons.

Copying Digital Files

It is nearly impossible to copy files to a USB stick without leaving a trace, particularly if you are using log-in credentials at work or on a company device. Computers and networks are built to track and log file access, transfers, and printing. Do not try to make a digital transfer or to copy information onto a USB stick at work unless you can be positive that this process isn't being logged somewhere. Use the Tails operating system, or a computer that is offline, when you copy data.

If you must copy digital files, be sure to collect all your information as anonymously as possible: use a shared computer at work (not your own). Do not use your own login credentials. Also, consider your physical location. It is best not to attempt this in your own office, for instance. Gathering information in the office will become even less viable as technology and employee surveillance software evolves.

Aside from the issues around copying digital files, some sensitive documents (particularly from government agencies) come with "phone home" beacons embedded in them or with digital rights management built in, making it impossible to view or print documents if you aren't logged in. This could be a remote image or link embedded in a document, such that when you view the document, the image pings back to a server owned by the government or creator of the document. This allows the creator to see the IP address and potentially more information about you as a viewer. Microsoft files such as Word documents have been known to have "locating beacons" placed within them. PDFs may also include this type of beacon, though Adobe now tries to notify users before documents call a remote server. To combat this type of tracking, either convert a document to a safe format such as plain text with the command line, or view a document on a computer that is "air-gapped," meaning that it is not connected to the internet. Make it impossible for your adversary to know you have the documents.

Uniqueness and Backflushing

If you are one of a limited number of individuals with access to the information you are releasing, then no matter how careful you are, it will be easy to trace you. This was the case with Reality Winner. In the criminal complaint filed against Winner, the FBI noted that only six individuals had accessed the document that was disclosed to The Intercept. When this document showed up on the website, the FBI had six individuals to start investigating, including Winner. Her unique trail quickly made her the most likely suspect. One way to combat uniqueness is by increasing the number of individuals who have access to a document before it is released.

Danielle Brian, executive director of the Project on Government Oversight, described a method that has been in use in D.C. for years: "backflushing." Before disclosing a document, send it through official channels to as many legitimate places as possible. For example, include the document in a report and send the report to other departments. This makes it so others have the document as well, vastly reducing the uniqueness of your connection to the document. When you disclose the information later on, it will not be clear that you were in any way connected to it.

Another way to combat uniqueness is to gather the data through a shared digital account. For example, if someone else is logged into a computer and you copy a file while they are logged in, the document-gathering will be connected to them, not you. Of course, this should be done carefully and ethically, so as not to inadvertently cast blame on someone else. If possible, it's better to hijack a shared network account. So consider how unique the connection between the information and your identity might be. There is protection to be gained by hiding in the crowd.

Theft and Misfiling

Corporations sometimes lash back at whistleblowers by filing criminal charges for theft of company property. So be aware that by taking documents off company property, you may open yourself up to a legal battle. This was one reason that SOC, a government security contractor, gave for firing Jennifer Glover, a security guard who had been sexually assaulted and harassed at work. Her termination letter stated that Glover had used her smartphone to take a photograph of the daily schedule, an act that they viewed as justifying her termination.

As an alternative to taking physical or digital documents, consider the misfiling technique. Hide copies of documents at work, either by misnaming digital files or by storing physical copies or USB sticks somewhere at work. In the future, you can "stumble upon" the copies, providing investigators with the information. They, not you, would then be removing property from company premises. The bottom line is that it might be helpful to have a backup copy of any important material stashed somewhere at work.


Sanitization

Sanitization is the process of removing, concealing, or cleaning up information in documents before you give them to someone else. Whether the documents you're dealing with are physical or digital, images or videos, the same general process applies: you should overwrite, obscure, or remove any sensitive information. This process is ubiquitous the world over in redacting classified material to prepare it for release to the public. When attempting this, imagine that you are in a heist film: be meticulous, wear gloves, wipe down surfaces to remove fingerprints, and don't leave anything that contains your DNA.

For those who are trying to disclose information, the process of sanitization is a little more complex, but there are two goals: 1) the removal of any information that could identify you, such as fingerprints, email addresses, or unique watermarks on documents; and 2) the removal of sensitive information that might harm someone else or have undue consequences if released, such as any company or government secrets or any personally identifiable information. This is where ethics and judgment come into play. Who would be harmed if this information were released? You don't want to accidentally victimize (or revictimize) a colleague, accidentally reveal personal information that could compromise one's reputation, or put a field agent in harm's way.

To sanitize physical items with nonporous surfaces, such as USB sticks or hard drives, wipe them down with a cleaning product and towel. Paper documents and other porous surfaces are more difficult to sanitize. There are a number of techniques for attempting this, but most involve using an eraser and potentially a cornstarch mixture to remove any oils left by fingerprints. If you are providing someone with a device such as a hard drive, remove any serial numbers or identifying information that would make that product traceable, and of course, be sure to pay cash when buying any hardware that you might use. If you must provide physical documents, redact them first with a black marker or white-out and then photocopy them, providing a redacted copy instead of the original.

For digital documents, the process of sanitization can be broken down into two strategies: 1) redaction, the process of obfuscating information within a document; and 2) metadata removal, the process of deleting identifying traces from the document.


Any text-based document (rich text files, DOC and DOCX formats, CSVs, Microsoft Excel files, PowerPoint files, Adobe InDesign files, etc.) should first be converted to a PDF. This can be done on most computers with either "print to PDF" or "export to PDF" functionality. The PDF should then be opened, and each page should be exported as an image and then redacted in image-editing software. Draw black boxes over areas of sensitive or identifying information in the images. Note: If you try to redact the documents from within the PDF, it will be done in layers, leaving the actual data underneath the black boxes. This will not technically remove the sensitive information. Similarly, it is important to use only image formats that do not include layers. If layers are included, someone can later remove the redaction layer and see the sensitive information underneath. JPG is a great image format to use, as it cannot save layers. After all of the images have been edited, they should be either recombined into a new PDF using a PDF viewer or given to someone as a set of images.

An alternative option is to use PDF Redact Tools, which automates those processes for you. It is currently available on Linux or macOS and comes bundled inside the Tails operating system.


Images should be redacted just the same as text documents. Save them in a format without layers such as a JPG. Draw black boxes over any portions that need to be removed, then save them.
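To see why a flat image is safer than a layered one, it can help to picture the image as a plain grid of pixel values. The toy sketch below is an assumption-laden illustration (a real image would be edited in image software, and `redact_region` is a made-up helper): it overwrites a rectangle with black, and because each position in a flat grid holds only one value, nothing remains underneath in the returned copy.

```python
def redact_region(pixels, left, top, right, bottom):
    """Return a copy of `pixels` with a rectangle overwritten in black.

    `pixels` is a list of rows; each row is a list of (r, g, b) tuples.
    The rectangle covers columns left..right-1 and rows top..bottom-1.
    """
    black = (0, 0, 0)
    redacted = [list(row) for row in pixels]
    for y in range(max(top, 0), min(bottom, len(redacted))):
        row = redacted[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = black  # the previous value is replaced, not covered
    return redacted


if __name__ == "__main__":
    # A 4x3 "page" of light gray pixels standing in for a scanned document.
    page = [[(200, 200, 200)] * 4 for _ in range(3)]
    for row in redact_region(page, 1, 0, 3, 2):
        print(row)
```

A layered format would instead keep the original grid and store the black box as a second object on top of it, which is exactly what lets someone lift the box off later.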

Video and Audio

Redaction of video and audio files can be a bit trickier, but the same basic process of obfuscating information applies. For videos, open them in a video editing program and either delete portions of the video or add black boxes over sensitive pieces. Then export the edited video. Audio files should be edited in an audio editor (Audacity is a good free choice), and portions of the recordings can be deleted or replaced with a standard sine wave tone (like a censorship bleep).
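The tone replacement described above can be sketched in a few lines for the simplest case. This is only an illustration under stated assumptions: it handles uncompressed 16-bit mono WAV files, the function names, the 1 kHz pitch, and the amplitude are arbitrary choices, and an audio editor remains the practical tool. The important point is that the original samples in the chosen span are overwritten rather than mixed with the tone.

```python
import math
import struct
import wave


def tone_over(samples, start, end, sample_rate, freq_hz=1000.0, amplitude=8000):
    """Return a copy of 16-bit `samples` with samples[start:end] replaced by a sine tone."""
    start = max(0, start)
    end = min(len(samples), end)
    out = list(samples)
    for i in range(start, end):
        t = (i - start) / sample_rate
        out[i] = int(amplitude * math.sin(2 * math.pi * freq_hz * t))
    return out


def redact_wav(src_path, dst_path, start_sec, end_sec):
    """Replace [start_sec, end_sec) of a 16-bit mono PCM WAV with a tone and save a new file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    edited = tone_over(samples, int(start_sec * rate), int(end_sec * rate), rate)
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(struct.pack("<%dh" % len(edited), *edited))
```

For example, `redact_wav("in.wav", "out.wav", 12.0, 15.5)` would write a copy in which seconds 12 to 15.5 contain only the tone; both file names are placeholders.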

Remember, though, that there may be other information in audio and video recordings that isn't obvious at first glance. Is there background noise or imagery that can be analyzed to determine the time and place it was taken? Are there reflections or other subtle pieces of data that could compromise you or someone else? Be very careful when it comes to audio and video, because so much information is contained in each file that it can be hard to think of every single thing that should be redacted.

Metadata Removal

Of course, if you are simply trying to get a video out, but trying to make it less obvious who it was shot by, removing the underlying capture information might be all that's needed. This is where removing metadata comes into play.

Example of image metadata created by an iPhone

The image above is just a selection of the metadata produced by one photo taken with a smartphone. The metadata contains the model of the phone, the time it was taken, and possibly the location of the phone at the time of capture (if GPS location was enabled). This data needs to be removed if you are trying to make the photo, video, or any other type of file untraceable.

Before anything else, check the filename for anything that could identify you or your means of creating the image. If you have any doubt—rename it.
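As a rough aid for that filename check, the sketch below flags a few naming habits that tend to reveal a device, a tool, or a date, and produces a neutral replacement name. The patterns and the `document-001` naming scheme are hypothetical choices for illustration, not an exhaustive list; a name that passes this check can still identify you, so renaming remains the safer default.

```python
import os
import re

# Hypothetical heuristics: naming habits that often reveal a device, tool, or date.
REVEALING_PATTERNS = [
    re.compile(r"^(IMG|DSC|PXL|VID|MVIMG)[_-]?\d+", re.IGNORECASE),  # camera-roll style names
    re.compile(r"screen\s?shot", re.IGNORECASE),  # names written by screenshot tools
    re.compile(r"\d{4}[-_]?\d{2}[-_]?\d{2}"),  # embedded dates such as 2019-03-02
]


def looks_revealing(filename):
    """Return True if the base name matches any of the heuristic patterns."""
    base = os.path.basename(filename)
    return any(pattern.search(base) for pattern in REVEALING_PATTERNS)


def neutral_name(filename, index):
    """Return a generic name that keeps only the lowercased extension."""
    extension = os.path.splitext(filename)[1].lower()
    return "document-%03d%s" % (index, extension)


if __name__ == "__main__":
    for i, name in enumerate(["IMG_4032.JPG", "notes.txt"], start=1):
        print(name, looks_revealing(name), neutral_name(name, i))
```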

All digital files inherently contain some distinct information that identifies them: filename, creation date and time, last modified date and time, and file size. Some digital file formats contain even more information. Microsoft Word documents, for example, are known for automatically saving additional metadata, such as the authors who worked on the document and the names and locations of the computers where the file was saved. Unfortunately, with these documents and particularly with proprietary file formats, it might be difficult or near-impossible to remove all pieces of metadata. Instead, convert proprietary formats to simple open-source formats that have consistent metadata formatting.
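The basic facts that every file carries can be inspected, and partly reset, with a few standard library calls. The sketch below is a minimal illustration with made-up helper names. It reads the name, size, and last-modified time, and sets the access and modification times to a fixed value. Creation time is recorded differently on each operating system and is deliberately left out, and none of this touches metadata stored inside the file's contents.

```python
import os
import time


def basic_file_facts(path):
    """Return the facts a filesystem records about a file, whatever its format."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Shown in UTC so the result does not depend on the local time zone.
        "modified": time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(info.st_mtime)),
    }


def reset_times(path, timestamp):
    """Set both the access and modification times to one fixed Unix timestamp."""
    os.utime(path, (timestamp, timestamp))
```

Calling `reset_times(path, 946684800)` sets both times to midnight UTC on 1 January 2000, an arbitrary value chosen only because it says nothing about when the file was really handled.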

Some file formats use standard data wrappers to store metadata, such as EXIF (exchangeable image file format) or XMP (Extensible Metadata Platform). These are used for almost all image formats and PDFs. By converting other documents into these formats, it becomes much easier to delete metadata and know that it is really gone.

To actually remove metadata from an image, a PDF, or a video file, open it with its corresponding editing software and look for options such as "Properties," "Inspector," or "Document Inspector." This should open up a dialog with a list of all of the metadata fields and entries. Delete them all. You will also want to research format-specific metadata removal methods for specific file types. Audio and video files such as MP3s or MP4s, for example, can have proprietary ID3 tags embedded within them (PRIV frames are one example) that make it nearly impossible to know if they have been sanitized.

Alternatively, a number of applications can scrub metadata from particular file formats. Several applications can remove EXIF data from images, but the Android application "EZ UnEXIF Free (EXIF Remover)" is especially useful for those communicating via an anonymous smartphone or tablet. This application removes all EXIF data, including geolocation, from photos taken with an Android device.

The Metadata Anonymisation Toolkit (MAT) provides a simple interface for stripping metadata from a number of formats, including PNG, JPEG, PDF, MP3, and Microsoft Office Document formats. MAT comes installed on Tails. However, MAT has not been updated since January 2016, essentially making it abandonware. Fortunately, MAT2, the replacement for MAT, is under active development and currently in beta. This is a great tool that can be used to sanitize a variety of files, but please check on its current development status online before using it.


Storage

Be cautious about where you store documentation. Never store documentation at work, unless you are following the misfiling method mentioned previously. You may feel that your desk or office is a safe space, but it isn't. You can consider storing documents at home, but this is an obvious choice for all concerned. In many cases, those who are trying to disclose information have had their houses ransacked and searched by their adversaries, both legally and illegally. If a subpoena is filed, information in your home will not be protected.

A good strategy is to either store documents outside your home or office or give a backup copy of what you will be revealing to a trusted person for safekeeping. Daniel Ellsberg gave a copy of a classified nuclear study to his brother, who hid the documents under a large gas stove in a garbage dump. Unfortunately, while this protected them for a while, the documents were ultimately destroyed by water damage, and Ellsberg spent years trying to reconstitute the information they contained. Instead of your brother, choose a lawyer. In the United States, information stored with your attorney may be protected from search and seizure through attorney-client privilege. Of course, there are exceptions to this, which was the case in the raid on the office of President Trump's former attorney Michael Cohen. If investigators can make the case that attorney-client privilege is being used "in furtherance of a contemplated or ongoing crime or fraud," then they will be able to search a lawyer's office under the crime-fraud exception.

All digital documentation should be stored on either encrypted USB drives or on an encrypted pseudonymous device, such as an encrypted tablet or a Tails USB drive. Documents should never be stored in the cloud or on a personal computer or device.

Excerpted from A Public Service: Whistleblowing, Disclosure and Anonymity published by O/R Books. © 2019 Tim Schwartz