In December, Google discontinued its Google Research Datasets service. The idea behind the service was great: Google gave scientists who needed to share very large datasets storage space in its cloud of servers. The decision to cut the service is part of a larger belt-tightening effort that follows an alarming 68% drop in the company’s fourth-quarter profit from the previous year. I don’t blame Google for taking this action, but it is nonetheless a jarring example of how dangerous it can be to put all of your data eggs in one basket.
It’s great to see researchers and others in the public sector sharing more and more of their data. Trouble is, most of the data they’re sharing exist on a single server, housed either on-site or by third parties like Google’s now-defunct service or Amazon.com’s Public Data Sets. The problem with this approach is that when servers crash, companies drop their services, or political winds change, the data can disappear forever.
Our ancestors made this same mistake during the third century BC with the creation of the Library of Alexandria. The Library was charged with collecting all of the world’s knowledge, which it accomplished with monetary and other support from the royalty of the time. When the Library was destroyed, the source copies of much of the world’s documented knowledge vanished along with it.
Will we repeat this mistake, or is there a better way?