Flashbake: Free version-control for writers using git

For the past couple of weeks, I've been working with Thomas "cmdln" Gideon (host of the fabulously nerdy Command Line podcast) on a free software project for writers called "Flashbake" (which is to say, I described what I wanted and Thomas wrote the code). This is a set of Python scripts that check your hot files for changes every 15 minutes and check any changed files in to a local git repository. Git is a free "source control" program used by programmers to track changes to source code, but it works equally well on any text file. If you write in a text editor like I do, then Flashbake can keep track of your changes for you as you go.

I was prompted to do this after discussions with several digital archivists who complained that, prior to the computerized era, writers produced a series of complete drafts on the way to publication, complete with erasures, annotations, and so on. These are archival gold, since they illuminate the creative process in a way that often reveals the hidden stories behind the books we care about. By contrast, many writers produce only a single (or a few) digital files that are modified right up to publication time, without any real systematic records of the interim states between the first bit of composition and the final draft.

Enter Flashbake. Every 15 minutes, Flashbake looks at any files that you ask it to check (I have it looking at all my fiction-in-progress, my todo list, my file of useful bits of information, and the completed electronic versions of my recent books), and records any changes made since the last check, annotating them with the current timezone on the system-clock, the weather in that timezone as fetched from Google, and the last three headlines with your by-line under them in your blog's RSS feed (I've been characterizing this as "Where am I, what's it like there, and what am I thinking about?"). It also records your computer's uptime. For a future version, I think it'd be fun to have the most recent three songs played by your music player.

The effect of this is to thoroughly -- exhaustively -- annotate the entire creative process, almost down to the keystroke level. Want to know what day you wrote a particular passage? Flashbake can tell you. Want to know what passage you wrote on a given day? That too. Plus, keeping track of my todo.txt file means that I get a searchable database of all the todo items I've ever used, with timestamps for their appearance and erasure.
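As a rough illustration of the annotated check-in described above, here is a minimal Python sketch. It is not Flashbake's actual code: the function names (`build_commit_message`, `gather_context`, `commit_hot_files`) and the two context fields are hypothetical, and the weather and RSS annotations are left out. It assumes `git` is installed and that the watched files live in an existing repository.

```python
import subprocess
import time
from pathlib import Path


def build_commit_message(context):
    """Join labelled context values into a multi-line commit message."""
    lines = [f"{label}: {value}" for label, value in context.items()]
    return "\n".join(lines)


def gather_context():
    """Collect some 'where am I' context (illustrative fields only)."""
    return {
        "Timezone": time.tzname[0],
        "Checked at": time.strftime("%Y-%m-%d %H:%M:%S"),
    }


def commit_hot_files(repo_dir, hot_files, message):
    """Stage the watched files and commit only if something changed."""
    repo = Path(repo_dir)
    subprocess.run(["git", "add", "--", *hot_files], cwd=repo, check=True)
    # `git diff --cached --quiet` exits with 0 when nothing is staged.
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo)
    if staged.returncode == 0:
        return False
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)
    return True
```

Run from a scheduler every 15 minutes, something like `commit_hot_files("/path/to/repo", ["novel.txt", "todo.txt"], build_commit_message(gather_context()))` would leave one annotated commit per interval in which the files changed. The path and file names here are placeholders.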

Additionally, since git repositories are made to replicate, you can publish some or all of your projects to the public web or to a private site. I'm hoping that my publisher will use a public git repo to check out the most recent versions of my in-print books every time they go back to press for a new edition, and use the built-in compare ("diff") function to find all the typos I've fixed since the last edition.
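To give a feel for what a compare between two editions surfaces, here is a small sketch using Python's standard `difflib` module rather than git's own diff. The function name `edition_diff` and the sample sentences are made up for illustration. In practice git would produce the comparison directly from the repository history.

```python
import difflib


def edition_diff(old_text, new_text, old_label="last edition", new_label="current"):
    """Return a unified diff between two versions of a text as one string."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(diff)


# Made-up sample text: the only change is a corrected typo on the first line.
old = "It was a dark and stromy night.\nThe rain fell.\n"
new = "It was a dark and stormy night.\nThe rain fell.\n"
print(edition_diff(old, new))
```

Lines starting with `-` come from the older text and lines starting with `+` from the newer one, so a fixed typo shows up as a matched pair.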

It's all pretty nerdy, I admit. But if you're running some kind of Unix variant (I use Ubuntu Intrepid Ibex, but this'd probably do fine on a Mac with OS X, too) and you want to give it a whirl, Thomas has made all the scripts available as free software. He's working on a new version now with plugin support, which is exciting!

I love adapting programmers' tools for my writing. They tend to be extremely well-made and stable (because if they aren't, programmers will fix them or find better ones) -- it's like using chefs' knives in the kitchen.

Cory wanted the version history to carry prompts, snapshots of where he was at the time an automated commit occurred and what he was thinking. I quickly sketched out a Python script to pull the contextual information he wanted and started hacking together a shell script to drive git, using the Python script’s output for the commit comment when a cron job invoked the shell wrapper.

I added my own idea to the project, borrowing from continuous integration build systems the idea of a quiet period. I could easily imagine Cory actively working on a story and saving continually; a commit happening mechanically in the midst of that writing would be less useful than one made after the script had found a quiet time. This enhancement prompted me to ditch my shell script wrapper and pull that logic all into Python.
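The quiet-period idea can be sketched in a few lines of Python. This is an assumption about how such a check might work, not the code Flashbake actually uses: the names `is_quiet` and `files_ready_to_commit` and the 300-second default are hypothetical. A file counts as quiet once its last-modified time is at least that far in the past.

```python
import os
import time


def is_quiet(path, quiet_seconds, now=None):
    """Return True if `path` was last modified at least `quiet_seconds` ago."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) >= quiet_seconds


def files_ready_to_commit(paths, quiet_seconds=300, now=None):
    """Keep only the files that have been left alone for the quiet period."""
    return [p for p in paths if is_quiet(p, quiet_seconds, now)]
```

A file that is still being saved every minute or so would be skipped on this pass and picked up on a later run, once the writing has paused.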

Flashbake (Thanks, Thomas!)



  1. This actually sounds amazingly keen.

    Myself, I use Subversion via tortoisesvn on Windows because my machine that I use for writing is also the machine I need for work and that means MS Word. It’s not every 15 minutes, but instead a commit basis for me.

    However, being able to pull back previous versions and make diffs certainly creates an eerie experience, especially when I am wondering why I went in a direction that I did and forgot the nuance that led up to the decision.

    The side benefit of the SVN system is that it also gives me another off-site backup.

  2. Interesting — a few months ago, I did something similar. I hacked up a quick shell script to capture various kinds of notes in a date-ordered directory tree.

    For the version control, I used Mercurial. Now I can take my HP Mini-Note along with me as a digital note-taking device, secure in the knowledge that, when I get back home, I can easily sync the work done between my various other systems, all the while maintaining a complete audit trail of my note-taking activities.

    My Mini-Note’s name? Moleskine, of course! :-)

  3. I’ve been using emacs and SVN for my writing for some time now, but I prefer git for code. It makes sense to use it for text too. I’ll have to try flashbake. Maybe I’ll add a snapshot of my latest Twitter tweet since I’ve moved away from actual blogging over the last year. My only fear is that a log with flashbake’s level of detail for my mental state over time may be used as evidence to commit me.

  4. Can’t you actually own an Iphone? I mean, If I buy a car, I can rip out the factory parts and put in custom parts. If I own an electronic product for some reason the same rights of ownership do not apply.

    Soon we will simply be leasing everything we “own”.
    ..Have to pay extra to read books aloud, must pay extra to have music at work and home, must renew my subscription to my own movie collection…

    Big brother is billing you…

  5. To be honest this kind of thing should be built in to the operating system. Why does the computer ever forget any work that I’ve done, ever? For almost everything but videos computers have enough memory to remember every change you ever make to a document. Ideally it’d be like having an infinite, branched undo – with the state of the document at every point in time since its creation preserved.

    Often when I’m coding I’d like to have a kind of ‘timeline slider’ that I could just quickly scratch back to an earlier version, copy a bit of older code to the clipboard and scratch back to the latest version again to paste it back in.

  6. There is a firm commitment from Mark Shuttleworth to open the source code for the Launchpad web-based community development framework.

    Combine this with bzr, and motivated authors and editors, and you get an inexpensive, crowd-sourced, publishing house with baked-in peer review.

    Let’s see if the publishing dinosaurs take note of such chances. Doubt it.

  7. Interesting project. I’m currently working on a book, and every day I check in the day’s work in my git repository and push it to a server – version history and backup all in one.

    I wouldn’t really be very interested in Flashbake, though – for the time being, I think once a day is granularity enough for me. But I like the idea, because that means you never actually forget to commit and push.

  8. Oh that is -quite- sexy. I believe it might be enough to temporarily inspire me to work on my ‘novel’ again. Until a few months from now when I hit the computer and say ‘no more!’ I love writing by hand because of the annotations, but I’m terribly slow and tend to get indecipherable when I’m excited.


  9. Nerdy? Yes. But a fantastic idea, Cory, and one that you were in a great position to think of, and see realized. I am impressed.

  10. If you’re using text files, then I could see the value. But most word processor formats (e.g. MS .DOC, openoffice .ODT) don’t play well with version control tools. This is because word processors change far more in the files than your simple edits, tweaking dates, keeping “undo” buffers of changes, any tiny mod to a graphic, etc. Version control tools like subversion track the diffs only to save space and build your files from those diffs. One little corrupted file in that tree and you’ve lost all the versions from that point onwards. Trust me, I know from storing .DOC files in a repository in a business environment.

    For word processing files I simply depend on my own naming standard. Baseline file begins as “MyCoolStory1a” and then rev through 1b, 1c, etc, creating a new file on each pass. When I send a piece out for critiquing, I roll the number to 2a. I find it makes a lot more sense because the revisions are keyed to revisions I plan rather than simple time stamps. Combined with the Windows Briefcase utility, it all works pretty clean.

    – Kevin N. Haw

  11. I love adapting programmers’ tools for my writing. They tend to be extremely well-made and stable (because if they aren’t, programmers will fix them or find better ones) — it’s like using chefs’ knives in the kitchen.

    Or perhaps like using a drafter’s compass for one’s watercolors?

  12. Very cool Cory!

    Of course I doubt that this will see widespread use until it has a Windoze installer and doesn’t require the user to think about it very deeply.

  13. Hmm, if the goal is seeing the creative process over time it would be very easy to create a streaming file format that recorded it in its entirety.

    You’d need to convince someone to write an editor that did this, but basically, you would write out a timestamp with every keystroke. You could use offsets in time for compactness, and create snapshots at intervals to increase the play back and rewind speed. This would all be streamed and written to disk in one continuous file.

    Every delete key, every mistyped word, every pause and subtle break. You could watch yourself writing the book, sped up or slowed down. Your coffee breaks, hesitations, and bursts of inspiration would all be recorded.

    I don’t know how useful it would be in the end though.

  14. Indeed. I’m shackled to Windows at work, and because my PC is my main gaming rig, I use Windows at home. Prefer OS X, use Vista because it came on the latest machine I bought.

    A Windows variant would be brilliant! Want!

  15. Too bad git was chosen; it’s not (currently) the most portable of version control systems. Mercurial works well on both Windows and Mac and would have eased creating cross-platform systems.

    Also the system probably works fine with binary files like MS word documents. You just lose some of the wonderful diffing abilities.

  16. Since pretty much any editor keeps an undo/redo stack it ought to be possible to slightly modify that mechanism so that each change gets streamed to a storage system. In principle it would be possible to replay the entire stream from first opening the document.

    Combine this with TimeMachine and you could have something interesting.

    Oh, and Cory’s setup ought to include a snapshot from the camera attached to the machine with the rest. Historians will want to see the expressions of bemusement, wonder, boredom, worry etc that go with each step in the writing process.

  17. Cory, this is great news! I work in publishing and have been considering the potential in distributed version control systems. It’s not hard to imagine a network of writers, designers and editors working on ebooks through git or another DVCS. That kind of workflow just feels natural and very efficient. Kudos to Thomas Gideon and you!
    I wonder which text editor you use and why. I’m using Kate right now and feel very comfortable with it, but any suggestions from you or the readers would be deeply appreciated.

    Just for the record, I don’t work for a particular publishing house, just a freelance editor, but I’d like to start one of my own as nobody in my country (Argentina) seems very interested in ebooks at all. Oh well, talk about dinosaurs!

  18. Agger, the frequency is driven by cron so you could easily have it run daily.

    John Cooke, right now the limiter on a Windows version is some of the stock plugin code. If you want, I can provide you instructions on disabling the problem plugins in exchange for your willingness to try it on Windows. The current version should only depend on a valid Python installation. To get a fully compatible version, I’d need help testing Windows’ specific plugins for identifying timezone (on which the weather plugin relies, too) and uptime (if at all possible under windows).

    In the next major revision, I will have fully fleshed out the plugins, including documenting how to enable/disable the stock plugins for anyone who wants to start experimenting with altering the commit message flashbake generates.

  19. You could also run this in a virtual machine on a windows box, given that the system requirements seem light.

    cmdln, I don’t suppose you’d be interested in creating a portable apps version, would you?

  20. John Cooke: My contact info is on my website, drop me an email and we’ll see what we can do in the next couple of days (I am traveling this weekend).

  21. I think this is an amazingly cool thing. Already downloaded, and I’m checking it out.

    I used to use Subversion (now on Mercurial) for my own writing, but I’ve also started using Dropbox (no affiliation, just a user). It is commercial and not open-source, but is also remarkably easy, completely automatic, works across platform (Linux, OS X, Windows), and is free, unless you want to purchase extra storage.

    The reason I bring it up is that it interfaces seamlessly with the GUI so that those intimidated by the command line can reap the benefits of version control and off-site storage without relying on friendly nerds to help them get setup.

  22. You do know that most word processors (including OOO Write and even MS Word) have a built in per document versioning system. The file size will only be slightly larger every time you create a new version.

    I understand the need for a centralised versioning repository, but for a local repository this seems like reinventing the wheel.

  23. I wonder if anyone remembers Textra and InfoSelect, my two favorite writing programs. Textra was this wonderfully powerful, efficient, attractive and easy text-based word processor that got wiped out during the mad rush to Windows. (Not to be confused with an academic-writing program of the same name?) And infoselect was a similarly powerful etc. text-based free form database. I would love to find equivalents for them now (GNU/Linux).

  24. CMDLN, for windows, you should check out cygwin (www.cygwin.com) a POSIX layer for windows. It has cron, python, git etc, all easily installed (and maintained) from the installer.

    Not tried it in anger myself yet, but setup.py does not complain about any missing libs/whatever.

  25. To quote Terry Pratchett:

    I save about twenty drafts — that’s ten meg of disc space — and the last one contains all the final alterations. Once it has been printed out and received by the publishers, there’s a cry here of ‘Tough shit, literary researchers of the future, try getting a proper job!’ and the rest are wiped.

  26. That is really sweet, Cory and Thomas!

    I’ve moved from writing in Open Office to yWriter, since I’m shackled to Windowz by my graphics apps — same thing that keeps me using, at least some of the time, a desktop behemoth.

    yWriter has a good paranoid backup method deep down in its guts, and I add to that directory-level file sync across machines in multiple locations, but I would seriously dig having the quantity of logging that you’ve got there.

    Bugger the historians, I want that kind of data-tracking so I can reverse engineer my own process!

  27. # URL to the RSS feed, Atom support is coming soon

    Er, the URL to what RSS feed? I don’t even have a webserver. Am I supposed to set up my (so far unused) git installation to publish RSS? How do I do this?

  28. So, why does it use python? Is it because he only has a hammer so everything is a nail? Git comes with a Perl module which works with external git commands like his git interfacing code (so he basically duplicates his effort for no gain except s/\.pl/.py/g). The Git module is also more likely to be tested and guaranteed to be working with the installed version of git than hand coded calling out to ‘git’.

    This reminds me of a post I read this morning, a guy at mozilla rewriting GNU Make in Python, supposedly to increase performance, although his implementation is currently 10x slower than gmake.

  29. I’ve been hearing good things about yWriter as well. I started experimenting with that last night for the serial novel I’m 2/3s of the way through with.

  30. Dropbox (http://getdropbox.com) could be great for writers. It’s online storage (2g free) that’s pretty easy to work with. At least on Windows (they have Mac and Linux versions), the software sets up a folder in My Documents that automatically syncs with the server. Then it can sync the files to any other computers connected to your Dropbox account. It’s practically replaced USB thumbdrives for me.

    What’s good for this discussion is that the server maintains versions of every file (I think they claim infinite). So it automatically keeps a revision history, even if you delete the file locally. It makes it very hard to lose your work.

  31. In a similar vein, I’ve started using doxWiki (a really, really minimal wiki) to keep background notes for fiction pieces, and that’s working well :)

  32. Is there a git repository available (e.g. on github, on gitorious, on repo.or.cz) for Flashbake? Homepage seems to have only snapshot of the last version…

  33. #30, mirrormonkey:

    So, why does it use python? Is it because he only has a hammer so everything is a nail?

    No, python’s more like a Leatherman; if you already have it in your hand, why bother reaching for another tool?

    Your gmake example is admittedly ridiculous, but I find python extremely useful for actually getting stuff done; even if python’s not absolutely the best way to do something, it’s often the quickest way.

    I’ve just been doing something similar to this project, using python to manipulate SVN-controlled files; pysvn was very useful for this.

Comments are closed.