Happy Open Data Day 2019!
It’s that special day of the year again! Well, every day should be Open Data Day, but today lots of motivated folk come together around the world to remind us all why Open Data, Open Science, and sharing of data and science in general is better for everyone.
Better for reuse, better for tracking public money flows, better for open mapping and development, and also, lest we lose sight of it, better for the researcher who produced the data!
Why better for the researchers who generated the data?
Better because the value added by sharing is manifold.
Others can reuse and reanalyse your data. If you’ve placed the data in a repository with a persistent identifier, you’ll get attributed when they are reused and you can get credit for this — and even citations.
What may not be immediately obvious is that taking a little time to ensure your data are ‘sharable’ is good practice: it ensures that when you want to use your own data again, they have been well stored and annotated and are accessible. Those data will not be lost if the laptop hard drive fails. And they’ll not be lost when the grad student or post-doc who generated them moves on from your lab. At some point down the line, when a new project might benefit from a look back at the data you’re producing now, you’ll have access, or better yet, you can directly point your new students/post-docs to exactly where those data can be found.
Finally, it is better for the researcher because that one last check of your data before you make them public could help catch minor errors before publication, rather than having them surface after the fact!
Since implementation of our policy, we’ve taken pains to work with authors to ensure that data used to generate figures are shared, requesting the underlying numerical values used to generate all plots and graphs. Although some may perceive this as a chore, it should be something a researcher can produce with minimal effort at the time of writing a manuscript. Along the way, some of our authors have identified minor errors in their data because we’ve required the data sharing step.
The transfer of numbers from lab notebook to digital file may not have been completely accurate, or an error in statistical analysis or interpretation may have crept in at any of a multitude of places along the way. As a result, our authors found and rectified the errors before we published their manuscripts, and they were very grateful. Here are quotes from two authors who really appreciated the data sharing step:
“I apologise sincerely for the errors that were included in previous versions of the manuscript. This is the first time that I have been required to provide a data spreadsheet alongside a manuscript, and the value of doing this, which meant that I ended up double-checking the values in our figures, figure legends and spreadsheets, has been illustrated amply. In the future (whether required to by the journal or not!) I will prepare a similar file prior to initial submission in order to prevent this from ever happening again. We appreciate the precision with which your staff has edited our manuscript, as its quality has improved because of it.”
“Our errors would not have been detected until after publication (if at all), if it was not for your comprehensive data policy. I hope more journals follow your lead on this, as I think it will improve data analysis by the authors and hopefully, improve reproducibility of findings between labs.”
The benefits really speak for themselves… Share, share, share!
Image credit: A PhD student keeping notes in their lab notebook. By Katherine Stember – Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=64475946