Where Does Data Go When Services Die?

If Flickr closed tomorrow and turned off its servers without a word to its users, what would happen to all our photos? They would go the way of our GeoCities sites and our AOL Hometown Pages, of course — except for the copies that users had saved on their own hard drives.

What if the Library of Congress, like most libraries across the nation, faced funding cuts so severe that it could no longer preserve our nation’s public documents? What if the outgoing President of the United States deleted all electronic records of his communications under the assertion of executive privilege?

The positive feedback loop between government transparency and data-driven decision making is reason enough to make government records publicly available online in an open, machine-readable format. The risk of losing data to unexpected catastrophes is reason enough to store copies on multiple servers. But ensuring that data endures regardless of shifting political winds, government officials’ mania for redaction, and the possibility of simply losing track of it amid the information overload that all large bureaucracies contend with makes online syndication even more imperative.

Open data advocates have thus far focused their efforts on giving government agencies mandates to post data on their websites and on establishing open data transport standards. This work is necessary for a participatory democracy based on data, but it is not sufficient. The next step is to use technology to make civic participation in distributed data storage and replication so easy that anyone can do it. Think SETI@home, but for preserving public data.

Mainstream methods for posting such data (direct client-server relationships like American FactFinder or Recovery.gov) ensure that the data will not be available in 10 years, let alone 50 or 500. Compact discs lose their integrity over time, as do tape storage and hard drives. Government agencies de-fund initiatives, and data evaporates along with them.

But when citizens, agencies, and researchers use the Information Commons to retrieve, view, and analyze copies of government data, the data is replicated across the broad network of independently owned, decentralized servers they use to access the Commons. Without even thinking about it, they have created redundant copies that ensure the data cannot be lost to accidental disasters, or to malfeasance, misfeasance, or nonfeasance. As they continue to work with the data, more copies are created dynamically. The deterioration of storage media is no longer a concern, because the data is constantly refreshed and recreated across a wide network of storage servers every time it is accessed. These servers are neither owned nor controlled by any one agency, organization, or corporation, which provides organizational redundancy as well as data redundancy.
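For readers who want to see how this could work in practice, here is a minimal sketch, in Python, of the replicate-on-access idea. The node names, the Commons class, and the sample census row are all hypothetical, invented purely for illustration; the point is simply that content-addressed copies can be verified and re-seeded every time someone reads the data.

```python
import hashlib


class Node:
    """One independently operated storage server in the commons (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # content hash -> raw bytes

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.store[digest] = data
        return digest

    def get(self, digest):
        data = self.store.get(digest)
        # Verify integrity before serving; a bit-rotted copy is simply ignored.
        if data is not None and hashlib.sha256(data).hexdigest() == digest:
            return data
        return None


class Commons:
    """A set of nodes in which every read re-replicates the data it touches."""

    def __init__(self, nodes):
        self.nodes = nodes

    def publish(self, data):
        # The publishing agency seeds one copy; the content hash becomes the
        # dataset's permanent, location-independent identifier.
        return self.nodes[0].put(data)

    def fetch(self, digest, reader_node):
        for node in self.nodes:
            data = node.get(digest)
            if data is not None:
                # Reading through the commons leaves a fresh, verified copy on
                # the reader's own node, so redundancy is a side effect of use.
                reader_node.put(data)
                return data
        return None  # every copy lost or corrupted


# Hypothetical example: one census table published once, replicated on access.
nodes = [Node(f"server-{i}") for i in range(4)]
commons = Commons(nodes)
record_id = commons.publish(b"county,population\nAllegheny,1223348\n")
commons.fetch(record_id, reader_node=nodes[2])
commons.fetch(record_id, reader_node=nodes[3])
copies = sum(1 for node in nodes if record_id in node.store)
print(f"{copies} independent copies of dataset {record_id[:12]}...")
```

Running the sketch reports three independent copies after just two reads, which is the whole argument in miniature: ordinary use of the data is what keeps it alive.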

Public agencies must make their data truly public on the Internet, with redundant copies, for the sake of transparency, accountability, efficiency, and posterity. Otherwise, the lessons of our time, recorded in more detail than those of any era before, will be lost to both the near and distant future.

Please let me know what you think in the comments; I’d love to hear your opinions.