On the Freedom of Data

This post is part of Mothball March, where I dust off old posts I never quite finished. This is a more-final version of something I originally wrote in December 2012.

In the 1980s, a young computer programmer named Richard Stallman started the Free Software Foundation with the goal of promoting “free software”: software that grants its users a number of rights, including the right to view the source code, determine how the program works, and use that method in any other software. RMS realized that without these rights, we didn’t truly own any of our software or, more importantly, our data - if the company that made, say, your word processor went out of business, you could be stuck with a number of important business documents that you could no longer read. Over the last 30 years, we’ve made significant progress in convincing the technical community of this, and we’re starting to make inroads with non-technical folks as well. However, we now face a new fight.

Previously, having the freedom to deconstruct a piece of software (that is, to obtain the algorithm used to read its data) was all that was needed, since you always had access to your own data. Now, with cloud-based services, that is often not the case.

Take Facebook as an example. Many people upload most, if not all, of the photos they take directly to Facebook - and that’s where they continue to live. If Facebook decides to, say, start charging you (or any of your friends) $20 to view your photos, you can be outraged, you can throw a fit - but you’ll have to either pay the fee or give up those memories. This is a fairly far-fetched idea (although not as much as you might think), but any number of other things could go wrong - Facebook could suddenly go out of business, or suffer catastrophic data loss, or even just slowly fade into obscurity without ever providing an easy way for you to retrieve your data.

In today’s networked world, free data is just as important as free software, if not more so.

A previous (great!) company I worked for, XYZ Homework, provided an online homework service for customers of its books. The system we used was built on an open-source system, but if you wanted to switch away, you’d lose students’ accounts, their progress on problems, and the problems themselves (both those you created and those XYZ provided).

By contrast, GitHub’s primary website is not open-source, but I have absolutely no concerns about it. Much of it is inherently open and distributable (the decentralized nature of Git means that everyone working on a project has a full copy of it, including history), and for the parts that aren’t, they provide access through a well-documented and freely available API. If you’re concerned about losing access to, say, your bug reports, you can easily back them up on a regular basis, and a number of people have developed open-source adapters that will convert between GitHub’s issue tracking system and popular alternatives.
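To make that concrete, here is a rough sketch of what a do-it-yourself issue backup could look like. It isn't taken from any particular tool: the function names (`fetch_json`, `backup_issues`, `save_backup`) are made up for illustration, and it assumes a public repository read without authentication through GitHub's REST endpoint for listing a repository's issues. A real backup would want an access token, rate-limit handling, and comments as well.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_json(url):
    """Fetch one page of JSON from the API (unauthenticated, public repos only)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def backup_issues(owner, repo, fetch=fetch_json, per_page=100):
    """Collect every issue (open and closed) for a repository.

    Walks the paginated issue list until an empty page comes back.
    Note that this endpoint also returns pull requests alongside issues.
    `fetch` is injectable so the paging logic can be tried without a network.
    """
    issues = []
    page = 1
    while True:
        url = (
            f"{API_ROOT}/repos/{owner}/{repo}/issues"
            f"?state=all&per_page={per_page}&page={page}"
        )
        batch = fetch(url)
        if not batch:
            break
        issues.extend(batch)
        page += 1
    return issues


def save_backup(issues, path):
    """Write the collected issues to a local JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(issues, handle, indent=2)
```

Something like `save_backup(backup_issues("some-owner", "some-repo"), "issues.json")`, run from a scheduled job, would leave you with a plain JSON copy that no longer depends on the service staying around.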

If you’re a developer, please provide a way for your users to export their data into a competing service; if lock-in is the only thing keeping them with you, you’re doomed already.

And since all of us are users of something or other, it’s our responsibility to give data freedom a place in the purchasing conversation - and, if necessary, to make businesses pay attention by refusing to use their services until they free our data.