A study recently published in Current Biology [1] investigates the availability of research data from papers published up to 20 years ago, and the findings are startling, even to those who have long advocated for better stewardship of research data.
Timothy Vines, an ecologist at the University of British Columbia, set out to explore the idea “that authors are poor stewards of their data, particularly over the long term.” This premise underlies current legislative and other efforts to ensure long-term preservation of and access to research data by mandating that datasets be made available through a public repository, and that grant applications include a formal plan for managing research data.
Vines and his team tested this premise by requesting datasets from 516 ecology articles published between 1991 and 2011. Among their findings:
• The odds of a dataset being reported as extant fell by 17% per year. In fact, for papers published in 1991, data could be confirmed as extant for only 2 of the 26 papers examined. For the oldest papers in the study, some 80% of the data may, in effect, be lost.
• Broken e-mail addresses and obsolete storage devices were the main obstacles to data sharing. The odds of finding a working e-mail address for the first, last, or corresponding author fell by 7% per year; for the oldest (1991) papers, 35% of the e-mail addresses did not work. Overall, Vines and his group received responses to their inquiries from only 37% of the studies’ authors (no working e-mail address for 25% overall; no response received for 38% overall). As to the data’s availability, Vines told Nature that
“‘[m]ost of the time, researchers said ‘it’s probably in this or that location’, such as their parents’ attic, or on a zip drive for which they haven’t seen the hardware in 15 years,’ says Timothy Vines, the lead author on the study and an evolutionary ecologist at the University of British Columbia in Vancouver. ‘In theory, the data still exist, but the time and effort required by the researcher to get them to you is prohibitive.’”
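To get a feel for how a 17% annual drop in odds compounds over two decades, here is a minimal sketch in Python. It assumes the decline acts as a constant multiplicative factor on the odds each year. The 90% starting probability is a made-up illustration, not a figure from the study, and the function names are hypothetical.

```python
def odds_multiplier(years, annual_decline=0.17):
    """Factor by which the odds shrink after `years`, compounding annually."""
    return (1.0 - annual_decline) ** years


def probability_after(baseline_probability, years, annual_decline=0.17):
    """Convert a starting probability to odds, apply the decline, convert back.

    `baseline_probability` must be strictly less than 1.
    """
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    odds = baseline_odds * odds_multiplier(years, annual_decline)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # ASSUMPTION: 0.90 is an illustrative starting probability, not study data.
    assumed_start = 0.90
    for years in (0, 5, 10, 20):
        print(
            f"{years:>2} years: odds x{odds_multiplier(years):.3f}, "
            f"probability {probability_after(assumed_start, years):.2f}"
        )
```

Under these assumptions, the odds after 20 years are roughly 2.4% of their starting value. Passing `annual_decline=0.07` applies the same arithmetic to the e-mail address figure. Note that odds are not the same as probabilities, which is why the sketch converts between the two rather than subtracting 17 percentage points per year.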
Vines concludes that “policies mandating data archiving at publication are clearly needed” and argues that journals are in a strong position to require that data be submitted to a public repository as a condition of publication. A number of well-established data repositories exist; see Databib for examples, or speak to your friendly librarian for more information. You may also wish to consult this guide to research data management, developed by the Library’s Office of Scholarly Publishing & Communication.
1. Vines TH, Albert AYK, Andrew RL, Débarre F, Bock DG, Franklin MT, Gilbert KJ, Moore J, Renaut S, Rennison DJ (2013) The availability of research data declines rapidly with article age. Current Biology, online in advance of print. doi:10.1016/j.cub.2013.11.014
Data for Vines’ study are available through the Dryad digital data repository:
Vines TH, Albert AYK, Andrew RL, Débarre F, Bock DG, Franklin MT, Gilbert KJ, Moore J, Renaut S, Rennison DJ (2013) Data from: The availability of research data declines rapidly with article age. Dryad Digital Repository. doi:10.5061/dryad.q3g37