The benefits of sharing data generated by researchers have long been understood to be of great value to science (as exemplified by this British Medical Journal piece from 1994). And over recent years there has been a rapid increase in the ability to share and access research data – as can be seen in the rise of data journals (such as Scientific Data and GigaScience), the increase in research data repositories (both general and subject-specific), and the establishment of data sharing policies around the world.
However, in the medical world large amounts of clinical research can go unpublished and a large number of clinical trials go unregistered (almost 40% according to one study) – meaning we only have a partial account of what data have been gathered in medical research, let alone data that may be available to others. On top of this problem of non-publication, there is also evidence that reporting of research in medical literature favours positive results (as can be seen in this study and this one). All of which can ultimately lead to ill-informed treatment decisions by clinicians.
Compounding these problems in clinical research is the fact that it has traditionally not been easy to openly share data, mainly due to the need to protect research participants’ privacy. That said, calls for greater sharing of, and access to, clinical data have been increasing – especially in light of the Ebola crisis and, more recently, the Zika outbreak.
Voices in favour may have grown louder, but the conflict between clinical data sharing and protecting privacy remains unresolved. An obvious-seeming solution would be to de-identify participants’ data. But while guidelines and methods for anonymising data exist, real and perceived barriers to sharing clinical data remain.
Concerns about protecting research participant privacy were clearly demonstrated in a 2012 survey of several hundred clinical researchers. While most supported the sharing of de-identified data, clear concerns were expressed about privacy, as well as issues such as the legitimacy of secondary analyses of data by other researchers – a concern highlighted again by the recent ‘research parasites’ debate.
Also, although this survey confirmed that clinical researchers are interested in sharing data (despite their concerns), it indicated that they are often uncertain about how to do so – demonstrating that there is also a need to raise awareness of how (and how best) to share data.
The best method for sharing data is often seen as being via data repositories (such as in the editorial policies of the Nature journals) – as this enables greater awareness, accessibility and discoverability of data. However, surveys of researchers have previously shown that sharing of research data is more often than not carried out between individuals or research groups directly, as opposed to via public repositories.
Direct sharing may put researchers in control of who can access their data, but it does little to increase overall awareness of the clinical datasets being produced. And while many journals have policies that require authors to share the data that support their results with other researchers on request, or to state publicly what data are available and how they can be accessed, these data sharing policies (particularly in medicine) vary greatly between journals. Even though encouraging data sharing should be lauded – and might be a positive first step – it goes only partway towards allowing valuable clinical data to be effectively discovered and utilised.
Furthermore, clinical datasets that cannot be made public, such as where de-identification is not possible, are rarely discoverable without direct contact with the data generators. This has resulted in a need to develop new ways for interested parties to be able to find out about datasets that are made publicly available, as well as those that are not.
The emergence of dedicated data request and discovery systems, such as Repositive, Clinical Study Data Request and the Yale Open Data Access (YODA) project, represents one such new avenue of discovery. Another is journal articles (such as Scientific Data’s Data Descriptor) that can permanently link articles to non-public clinical datasets – such as this article describing human brain imaging data that are available on request via the UK Data Archive. In scholarly publishing, however, these capabilities are relatively new, so there needs to be more robust guidance on how researchers, repositories and journal editors can work together to best provide access to, and awareness of, clinical data.
To this end, we at Scientific Data, in conjunction with a number of relevant stakeholders, have developed a set of guidelines for publishing descriptions of non-public clinical datasets, intended to increase the discoverability and accessibility of such datasets in an effective and consistent manner. Although these guidelines do not directly address the issue of giving clinical researchers the will to share their data, they do help make the way a little easier.
By providing additional clarity on how, when and where to share, publish and peer-review clinical data, we aim to improve the interconnectedness and transparency of clinical research. Sharing data in conjunction with journal publishing and via permanent repositories will also help researchers to receive more credit for their work and, we hope, ultimately enable clinicians to be better informed in their treatment of patients.
Stay up to date with everything from Scientific Data: http://www.nature.com/nams/svc/myaccount/save/ealert?list_id=329